Power analysis #20

Michael-Howes · 2024-10-18T18:51:10Z

Overview: This pull request adds functions to perform power analyses with PPI. The methodology behind the power analyses is developed in Section 3 of [BHvL2024]. The pull request includes:

A new Python module ppi/ppi_power_analysis.py implementing the power analysis.
A Jupyter notebook examples/power_analysis.ipynb to demonstrate the power analysis with examples.
A test file tests/test_power_analysis.py.

Motivation: Power analyses inform design choices and are a desirable feature for applied researchers. The implemented power analysis captures the trade-off between expensive high-quality labels and cheaper machine learning predictions. It also quantifies the effectiveness of PPI for a given dataset.

Implementation: Functions are named ppi_[estimand]_power in line with the existing PPI functions such as ppi_[estimand]_ci. The functions output a standardized dictionary containing the recommended number of labeled and unlabeled samples. The dictionary also contains other quantities related to the power analysis. The power analysis is currently implemented for mean estimation, linear regression, logistic regression and Poisson regression.
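To make the trade-off concrete, here is a minimal sketch of the kind of search a power analysis for mean estimation might perform. It is not the code in this pull request: the function name `cheapest_pair_sketch`, its parameters, and the returned keys are hypothetical, and it assumes that the effective sample size of power-tuned PPI for a mean is n / (1 - rho^2 * N / (n + N)), where rho is the correlation between labels and predictions, n the number of labeled samples, and N the number of unlabeled samples.

```python
import numpy as np


def effective_n(n, N, rho):
    """Assumed effective sample size of power-tuned PPI for a mean."""
    return n / (1.0 - rho**2 * N / (n + N))


def cheapest_pair_sketch(target, rho, cost_labeled, cost_unlabeled, N_max=10**7):
    """Brute-force the cheapest (n, N) whose effective sample size reaches target.

    Hypothetical helper for illustration only.
    """
    best = None
    # With N = 0 the effective sample size equals n, so n never needs to exceed target.
    for n in range(1, int(np.ceil(target)) + 1):
        gap = 1.0 - n / target  # required value of rho^2 * N / (n + N)
        if gap <= 0:
            N = 0  # labeled data alone already reach the target
        else:
            frac = gap / rho**2 if rho != 0 else np.inf  # required N / (n + N)
            if frac >= 1.0:
                continue  # no finite N is enough for this n
            N = int(np.ceil(frac * n / (1.0 - frac)))
            if effective_n(n, N, rho) < target:
                N += 1  # guard against floating-point rounding
        if N > N_max:
            continue
        cost = n * cost_labeled + N * cost_unlabeled
        if best is None or cost < best["cost"]:
            best = {"n": n, "N": N, "cost": cost, "effective_n": effective_n(n, N, rho)}
    return best


pair = cheapest_pair_sketch(target=1000, rho=0.9, cost_labeled=1.0, cost_unlabeled=0.01)
print(pair)
```

Under these assumptions, a strong predictor (large rho) and cheap predictions push the search toward fewer labels and more unlabeled samples, while rho = 0 reduces it to collecting labels only.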

Testing: Tests are included in tests/test_power_analysis.py. The following features are tested:

The output satisfies the budget or effective sample size constraints.
The output is optimal given the costs.
The predicted effective sample size is close to the realized effective sample size.
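As an illustration of the last check, the sketch below compares a predicted effective sample size with one realized in simulation. It is not taken from tests/test_power_analysis.py: the function names, the Gaussian data model, the oracle tuning weight, and the 20% tolerance are all assumptions made for this example, and the prediction uses the assumed formula n / (1 - rho^2 * N / (n + N)) for mean estimation.

```python
import numpy as np


def predicted_effective_n(n, N, rho):
    """Assumed effective sample size of power-tuned PPI for a mean."""
    return n / (1.0 - rho**2 * N / (n + N))


def realized_effective_n(n, N, rho, trials=2000, seed=0):
    """Monte Carlo estimate of the effective sample size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Predictions f and labels Y are standard normal with correlation rho.
    f_lab = rng.standard_normal((trials, n))
    y_lab = rho * f_lab + np.sqrt(1.0 - rho**2) * rng.standard_normal((trials, n))
    f_unl = rng.standard_normal((trials, N))
    # Oracle tuning weight, valid here because Var(f) = 1 and Cov(Y, f) = rho.
    lam = rho * N / (n + N)
    estimates = y_lab.mean(axis=1) + lam * (f_unl.mean(axis=1) - f_lab.mean(axis=1))
    # Var(Y) = 1, so a classical mean of m labels has variance 1 / m.
    return 1.0 / estimates.var(ddof=1)


n, N, rho = 100, 1000, 0.8
predicted = predicted_effective_n(n, N, rho)
realized = realized_effective_n(n, N, rho)
assert abs(realized / predicted - 1.0) < 0.2
```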

Dependencies: No new dependencies added.

Documentation: No documentation was added beyond the Jupyter notebook (examples/power_analysis.ipynb). Let me know if you would like additional documentation.

Checklist:

Tested with pytest framework
Formatted with black
Documentation

tijana-zrnic

Approved. Thanks!

aangelopoulos · 2024-12-22T01:54:06Z

tests/test_power_analysis.py

+        1 + epsilon
+    ), f"{optimal_n}, {powerful_pair['effective_n']}"
+
+    ## Check if the estimated


Comment incomplete?

aangelopoulos · 2024-12-22T02:10:50Z

examples/power_analysis.ipynb

Fixed two errors:

First, np.concat does not exist. Made it np.concatenate.

Second, I was getting a NaN error. Added np.nan_to_num. It is a kluge to make the notebook work. Please feel free to add a different fix if there's one, @Michael-Howes.

Actually, reverted the second change np.nan_to_num because it looks like it was caused by some corrupted data on my end. Re-downloading the dataset fixed it.
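For context on the first fix: np.concatenate is available in both NumPy 1.x and 2.x, whereas np.concat was only added as an alias in NumPy 2.0, which is a plausible reason it was missing in the environment used for review. A minimal sketch of the portable call, with illustrative array names that are not taken from the notebook:

```python
import numpy as np

# Illustrative arrays; the notebook's actual variables are not shown in this thread.
labeled = np.array([1.0, 2.0])
unlabeled = np.array([3.0, 4.0, 5.0])

# np.concatenate joins a sequence of arrays along an existing axis (axis 0 by default).
combined = np.concatenate([labeled, unlabeled])
print(combined.shape)
```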

Michael-Howes and others added 30 commits August 5, 2024 18:17

Create power_ppi.py

70904c3

Update power_ppi.py

12ece40

create power analysis function mean estimation

03199e9

Update power_ppi.py

eb7c7bd

Create power_analysis.ipynb

c307a5a

update power analysis to return dict

90bad83

add ols power analysis

a27fb59

add ols example

904f410

Update power_analysis.ipynb

4edb6b9

add logisitic regression power analysis

42fb6f4

moved power analysis

faa8703

Update power_ppi.py

00973f0

add poisson power

8139a8f

add Poisson examples

6eb4f2d

added ppi_power for user supplied rho

3b2bd86

Update power_analysis.ipynb

88957c3

update .gitignore

6635528

added n_max constraint

a46b0b3

add warning for cheapest pair

d06f02d

run notebook

9c0c822

Update power_ppi.py

4899ace

Merge branch 'aangelopoulos:main' into main

6d4a77d

update power analysis

7a4f608

update notebook

a923b87

init notebook

9289979

update power analysis notebook

51e4f3f

Merge branch 'main' of https://github.com/Michael-Howes/ppi_py

06d72d4

add power tests

0f1e951

add moral machine data and notebook with AMCE function

2b7fa2f

move AMCE function to utils

6f75e73

Michael-Howes added 16 commits October 17, 2024 11:26

reformat with black

606d415

reformat with black

34833c4

remove moral machine notebook

9066071

remove moral machine function from utils

244e4f6

update gitignore

d1703c0

update .gitignore

ddabddb

add example .gitignore

cbb0951

remove moral machine function from utils

295cb82

use safe_expit

3b0387b

update references

b820440

update power analysis

21b8252

update power analysis example

7c727d3

update power analysis

b342d32

update power analysis example

425b798

rename power_ppi to ppi_power_analysis

76d4b8c

reformat with black

46f16ab

aangelopoulos requested review from aangelopoulos and tijana-zrnic October 18, 2024 18:53

aangelopoulos added the enhancement label (New feature or request) Oct 18, 2024

aangelopoulos assigned aangelopoulos and tijana-zrnic Oct 18, 2024

Michael-Howes added 3 commits December 2, 2024 11:53

update power analysis to used desired effective sample size instead of desired standard error

775e35a

update power analysis tests

0d412d7

reformat with black

37cb62f

tijana-zrnic approved these changes Dec 22, 2024

aangelopoulos reviewed Dec 22, 2024

aangelopoulos added 2 commits December 21, 2024 17:57

[black for formatting]

cbe850c

[minor] made a quick fix to the notebook and also added credit to the contributors to the readme.

5b801f0

aangelopoulos approved these changes Dec 22, 2024

aangelopoulos merged commit 63d1782 into aangelopoulos:main Dec 22, 2024
