Power analysis #20
Approved. Thanks!
1 + epsilon
), f"{optimal_n}, {powerful_pair['effective_n']}"

## Check if the estimated
Comment incomplete?
… contributors to the readme.
examples/power_analysis.ipynb
Outdated
Fixed two errors:
First, `np.concat` does not exist. Made it `np.concatenate`.
Second, was getting a `nan` error. Added `np.nan_to_num`. It is a kluge to make the notebook work. Please feel free to add a different fix if there's one @Michael-Howes.
Actually, reverted the second change (`np.nan_to_num`) because it looks like the error was caused by some corrupted data on my end. Re-downloading the dataset fixed it.
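For anyone reproducing the fix, here is a minimal sketch of the two calls discussed in this thread. The arrays are made up and are not the notebook's data. `np.concatenate` works across NumPy versions, whereas `np.concat` is only available as an alias from NumPy 2.0 onward. The explicit NaN count is one possible way to surface corrupted inputs instead of silently zeroing them with `np.nan_to_num`.

```python
import numpy as np

# Illustrative arrays only; these are assumptions, not the notebook's dataset.
labeled = np.array([1.0, 2.0, 3.0])
unlabeled = np.array([4.0, np.nan, 6.0])

# np.concatenate joins a sequence of arrays along an existing axis.
combined = np.concatenate([labeled, unlabeled])

# Count NaNs explicitly so that corrupted data is noticed rather than hidden.
n_missing = int(np.isnan(combined).sum())

# np.nan_to_num replaces NaN with 0.0 by default, which masks the problem.
masked = np.nan_to_num(combined)
```

A nonzero `n_missing` on freshly loaded data is a hint to re-download or inspect the dataset before applying any masking.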
Overview: This pull request adds functions to perform power analyses with PPI. The methodology behind the power analyses is developed in Section 3 of [BHvL2024]. The pull request includes:
- `ppi/ppi_power_analysis.py`, implementing the power analysis.
- `examples/ppi_power_analysis.ipynb`, to demonstrate the power analysis with examples.
- `tests/test_power_analysis.py`.

Motivation: Power analyses inform design choices and are a desirable feature for applied researchers. The implemented power analysis captures the trade-off between expensive high-quality labels and cheaper machine learning predictions. The power analysis also quantifies the effectiveness of PPI for a given dataset.
Implementation: Functions are named `ppi_[estimand]_power`, in line with the existing PPI functions such as `ppi_[estimand]_ci`. The functions output a standardized dictionary containing the recommended number of labeled and unlabeled samples. The dictionary also contains other quantities related to the power analysis. The power analysis is currently implemented for mean estimation, linear regression, logistic regression and Poisson regression.

Testing: Tests are included in `tests/test_power_analysis.py`. The following features are tested:

Dependencies: No new dependencies added.
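To make the idea in the Implementation paragraph concrete, here is a rough, self-contained sketch of a budget-constrained power analysis for mean estimation. It is not the code in `ppi/ppi_power_analysis.py`. The function names, the cost model, the grid search, and the dictionary keys `n` and `N` are all assumptions for illustration. It relies on the standard variance of a power-tuned PPI mean estimate, $\frac{\sigma_Y^2}{n}\left(1 - \rho^2 \frac{N}{n+N}\right)$, where $\rho$ is the correlation between labels and predictions, $n$ is the number of labeled samples, and $N$ is the number of unlabeled samples.

```python
def effective_n(n, N, rho):
    """Labeled-only sample size with the same variance as a power-tuned
    PPI mean estimate using n labeled and N unlabeled samples."""
    if n <= 0:
        return 0.0
    return n / (1.0 - rho**2 * N / (n + N))


def sketch_mean_power(budget, cost_label, cost_pred, rho):
    """Hypothetical helper: grid-search the (n, N) pair that maximizes the
    effective sample size subject to n*cost_label + N*cost_pred <= budget."""
    best = {"n": 0, "N": 0, "effective_n": 0.0}
    max_n = int(budget // cost_label)
    for n in range(1, max_n + 1):
        # Spend whatever remains after labeling on unlabeled predictions.
        N = int((budget - n * cost_label) // cost_pred)
        value = effective_n(n, N, rho)
        if value > best["effective_n"]:
            best = {"n": n, "N": N, "effective_n": value}
    return best


# Assumed example inputs: labels cost 100x more than predictions.
no_signal = sketch_mean_power(budget=100, cost_label=1.0, cost_pred=0.01, rho=0.0)
strong_signal = sketch_mean_power(budget=100, cost_label=1.0, cost_pred=0.01, rho=0.9)
```

When `rho` is zero the predictions carry no information, so the sketch spends the whole budget on labels. With a high `rho` it trades some labels for many cheap predictions and reports an effective sample size above what labels alone could buy.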
Documentation: No additional documentation was added outside of the Jupyter notebook (`examples/power_analysis.ipynb`). Let me know if you would like additional documentation.

Checklist: