Age | Commit message | Author
5 days | Clarify that the test suite is not for end users. [HEAD, main] | Arun Isaac
End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document.
6 days | Separate table of contents from introduction. | Arun Isaac
If not separated, GitHub combines the table of contents with the list of papers in the introduction.
6 days | Add table of contents to README. | Arun Isaac
A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in.
6 days | Replace csv extension with tsv extension on genotype files. | Arun Isaac
6 days | Remove txt extension from summary files. | Arun Isaac
6 days | Link to file formats documentation from README. | Arun Isaac
Readers are more likely to follow through to the file formats documentation if there is a link.
6 days | Add example key file. | Arun Isaac
6 days | Add example genotype file. | Arun Isaac
6 days | Add example summary file. | Arun Isaac
6 days | Reduce precision in test data files. | Arun Isaac
Reducing precision lowers the file size and makes the files more human-comprehensible.
6 days | Add instructions to install via Guix. | Arun Isaac
6 days | Mark virtual environment creation as optional. | Arun Isaac
Not everyone may want to create a virtual environment. For example, on some HPC machines, creating a virtual environment is complicated or does not work.
6 days | Package as a CLI utility only, not a Python library. | Arun Isaac
We have not exposed a Python library interface, and it is not clear if we need to. We can revisit this decision later, if need be.
7 days | Subset to common SNPs. | Arun Isaac
* pyhegp/pyhegp.py: Import reduce from functools. (pool_summaries, encrypt_genotype): New functions. (pool): Use pool_summaries. (encrypt): Use encrypt_genotype.
* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization. (test_pool, test_encrypt): New tests.
* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files.
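As an illustration of what subsetting to common SNPs can look like, here is a minimal sketch that uses reduce from functools to intersect SNP identifiers across several datasets. The function name common_snps, the (chromosome, position) identifiers and the plain-list representation are assumptions made for this example. The commit itself works on pandas data frames, and its actual logic is not shown in this log.

```python
from functools import reduce


def common_snps(datasets):
    """Return the SNPs present in every dataset, in the order of the first one."""
    # Fold set intersection over all datasets to find the shared SNPs.
    shared = reduce(set.intersection, (set(dataset) for dataset in datasets))
    # Keep the ordering of the first dataset so the result is deterministic.
    return [snp for snp in datasets[0] if snp in shared]


# Hypothetical SNPs, each identified by (chromosome, position).
site_a = [("chr1", 100), ("chr1", 200), ("chr2", 50)]
site_b = [("chr1", 200), ("chr2", 50), ("chr2", 75)]
site_c = [("chr2", 50), ("chr1", 200)]

common = common_snps([site_a, site_b, site_c])
```

Here only the two SNPs that appear at all three hypothetical sites survive the intersection.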
7 days | Standardize key files. | Arun Isaac
* doc/file-formats.md (File formats)[key file]: New section.
* pyhegp/serialization.py: Import numpy. (read_key, write_key): New functions.
* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization. (encrypt): Use write_key.
* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization. (test_read_write_key_are_inverses): New test.
7 days | Compute summary on encryption if not provided. | Arun Isaac
* pyhegp/pyhegp.py (genotype_summary): New function. (summary): Use genotype_summary. (encrypt): Compute summary if not provided.
* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark.
7 days | Add simple workflow. | Arun Isaac
* README.md (How to use): Indent down into "Joint/federated analysis with many data owners" section. [Simple data sharing]: New section.
* doc/generate-images.sh: Add simple workflow.
* doc/workflow.png: Rename to doc/joint-workflow.png.
* doc/workflow.uml: Rename to doc/joint-workflow.uml.
* doc/simple-workflow.png, doc/simple-workflow.uml: New files.
* tests/test_pyhegp.py: Import pytest. (test_simple_workflow): New test.
* test-data/genotype.tsv: New file.
7 days | Test joint workflow CLI. | Arun Isaac
* tests/test_pyhegp.py: Import CliRunner from click.testing, and main from pyhegp.pyhegp. (test_joint_workflow): New test.
* test-data/genotype0.tsv, test-data/genotype1.tsv, test-data/genotype2.tsv, test-data/genotype3.tsv: New files.
7 days | Standardize file formats in the likeness of plink files. | Arun Isaac
* pyhegp/pyhegp.py: Import pandas. (summary, pool, encrypt, cat): Use pandas data frames and new data format.
* pyhegp/serialization.py: Import csv and pandas. (Summary)[mean, std]: Delete fields. [data]: New field. (read_summary, write_summary, read_genotype, write_genotype): Use pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not import hypothesis.extra.numpy and approx from pytest. (tabless_printable_ascii_text, chromosome_column, position_column, reference_column, sample_names): New variables. (summaries, genotype_reserved_column_name_p, genotype_frames): New functions. (test_read_write_summary_are_inverses): Use pandas data frames and new data format. (test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new standard. [genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from (gnu packages python-science). (python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas.
7 days | Move negate to pyhegp.utils. | Arun Isaac
* tests/test_pyhegp.py (negate): Move to pyhegp.utils. Import negate from pyhegp.utils.
* pyhegp/utils.py: New file.
7 days | Add gitignore. | Arun Isaac
* .gitignore: New file.
7 days | Loosen relative tolerance in test_pool_stats. | Arun Isaac
* tests/test_pyhegp.py (test_pool_stats): Set relative tolerance to 1e-6.
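To show what a relative tolerance of 1e-6 permits, the following sketch compares two nearly equal floats with math.isclose from the standard library. The values are invented for illustration, and the test in the repository may use a different comparison helper (for example pytest's approx).

```python
import math

expected = 1.0
# A small discrepancy of the kind that floating-point accumulation can cause.
computed = 1.0 + 5e-7

# With a tight relative tolerance the two values are considered different.
strict = math.isclose(computed, expected, rel_tol=1e-9)
# With a relative tolerance of 1e-6 the same pair is considered equal.
loose = math.isclose(computed, expected, rel_tol=1e-6)
```

The tolerance is relative to the larger magnitude of the two values, so it scales with the size of the numbers being compared.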
13 days | Rename genotype_file argument in read_genotype. | Arun Isaac
* pyhegp/serialization.py (read_genotype): Rename genotype_file argument to file.
13 days | Test that read_genotype and write_genotype are inverses. | Arun Isaac
* tests/test_serialization.py: Import read_genotype and write_genotype from pyhegp.serialization. (test_read_write_genotype_are_inverses): New test.
13 days | Ensure that read genotype matrices have 2 dimensions. | Arun Isaac
* pyhegp/serialization.py (read_genotype): Ensure 2 dimensions.
13 days | Write genotype matrix with increased precision. | Arun Isaac
* pyhegp/serialization.py (write_genotype): Write with format %.8g.
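The three serialization commits above (round-trip test, two-dimensional reads, %.8g output) can be pictured with a small NumPy sketch. It assumes the genotype is a plain numeric matrix written with numpy.savetxt and read back with numpy.loadtxt. The helper names write_matrix and read_matrix and the tab delimiter are hypothetical, and the real read_genotype and write_genotype may differ (later commits move them to pandas data frames).

```python
import io

import numpy as np


def write_matrix(file, matrix):
    # "%.8g" writes each value with up to 8 significant digits.
    np.savetxt(file, matrix, fmt="%.8g", delimiter="\t")


def read_matrix(file):
    # ndmin=2 keeps a single-row or single-column file two-dimensional
    # instead of letting it collapse to a 1-D array.
    return np.loadtxt(file, delimiter="\t", ndmin=2)


# A hypothetical single-row matrix, the case that needs ndmin=2.
matrix = np.array([[0.123456789012, 1.0, 2.5]])

buffer = io.StringIO()
write_matrix(buffer, matrix)
buffer.seek(0)
recovered = read_matrix(buffer)
```

Because only 8 significant digits are kept, the round trip is exact only up to that precision, which is why a test of this kind would compare approximately rather than exactly.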
13 days | Tab-separate data section of summary files. | Arun Isaac
* pyhegp/serialization.py (read_summary, write_summary): Use tab as the delimiter.
* doc/file-formats.md (File formats)[summary file]: Update documentation.
13 days | Abstract out write_genotype. | Arun Isaac
* pyhegp/serialization.py (write_genotype): New function.
* pyhegp/pyhegp.py: Import write_genotype from pyhegp.serialization. (encrypt, cat): Use write_genotype.
13 days | Test solution of linear system after encryption. | Arun Isaac
* tests/test_pyhegp.py: Import math. (square_matrices, negate, is_singular): New functions. (test_conservation_of_solutions): New test.
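The property this test is named after can be sketched as follows, assuming that the encryption key is a random orthogonal matrix that left-multiplies both sides of a linear system, as in the HEGP scheme. The helper random_orthogonal_matrix and the example system are assumptions for illustration. The repository's own test uses hypothesis-generated matrices and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_orthogonal_matrix(n, rng):
    # The Q factor of a QR decomposition of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


# Hypothetical linear system a @ x = b; the diagonal shift keeps it
# comfortably away from singularity for this seed.
n = 5
a = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = a @ x_true

key = random_orthogonal_matrix(n, rng)

# Solve the plaintext system and the "encrypted" system.
x_plain = np.linalg.solve(a, b)
x_encrypted = np.linalg.solve(key @ a, key @ b)
```

Since the key is invertible, (key @ a) x = key @ b has the same solution as a x = b, so both solves agree up to floating-point error. Singular systems have no unique solution, which is presumably why the commit adds an is_singular helper.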
13 days | Separate standardization from encryption. | Arun Isaac
* pyhegp/pyhegp.py (hegp_encrypt, hegp_decrypt): Do not standardize or unstandardize. (encrypt): Standardize before calling hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not pass mean and standard deviation for standardization and unstandardization.
13 days | Do not test encryption on order 1 matrices. | Arun Isaac
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not test encryption on order 1 matrices.
13 days | Mention TianjingZhao2023 paper in README. | Arun Isaac
* README.md: Mention TianjingZhao2023 paper.
2025-07-18 | Add CI badge to README. | Arun Isaac
* README.md: Add CI badge.
2025-07-17 | Document usage instructions and workflow. | Arun Isaac
* doc/workflow.uml, doc/workflow.png, doc/generate-images.sh: New files.
* README.md (How to use): New section.
2025-07-17 | Add development version installation instructions. | Arun Isaac
* README.md (Install development version): New section.
2025-07-17 | Add cat subcommand. | Arun Isaac
* pyhegp/pyhegp.py (cat): New function.
2025-07-17 | Only output key optionally. | Arun Isaac
* pyhegp/pyhegp.py (encrypt): Only output key to file optionally.
2025-07-17 | Use File instead of Path for encrypt subcommand options. | Arun Isaac
* pyhegp/pyhegp.py (encrypt): Use File instead of Path for options.
2025-07-17 | Turn arguments of the encrypt subcommand into options. | Arun Isaac
Prefixed options are easier to follow than the order of positional arguments.
* pyhegp/pyhegp.py (encrypt): Turn summary, key and ciphertext arguments into options.
2025-07-17 | Move read_genotype to pyhegp.serialization. | Arun Isaac
* pyhegp/pyhegp.py: Import read_genotype from pyhegp.serialization. (read_genotype): Move to pyhegp.serialization.
2025-07-17 | Standardize before encryption. | Arun Isaac
* pyhegp/pyhegp.py (hegp_encrypt): Standardize before encryption. (hegp_decrypt): Unstandardize after decryption. (encrypt): Pass in mean and standard deviation from summary file to hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Pass in mean and standard deviation to hegp_encrypt.
2025-07-17 | Add standardization. | Arun Isaac
* pyhegp/pyhegp.py (standardize): Standardize using mean and standard deviation, instead of the minor allele frequency. (unstandardize): New function.
* tests/test_pyhegp.py: Import standardize and unstandardize from pyhegp.pyhegp. (no_column_zero_standard_deviation): New function. (test_standardize_unstandardize_are_inverses): New test.
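A minimal sketch of column-wise standardization by mean and standard deviation, and of its inverse, might look like the following. The function names match the commit, but the signatures, the use of the population standard deviation and the example matrix are assumptions, not the code in pyhegp/pyhegp.py.

```python
import numpy as np


def standardize(matrix, mean, standard_deviation):
    # Centre each column on zero and scale it to unit standard deviation.
    return (matrix - mean) / standard_deviation


def unstandardize(matrix, mean, standard_deviation):
    # Undo standardize: rescale each column, then shift it back.
    return matrix * standard_deviation + mean


# Hypothetical genotype matrix: rows are samples, columns are SNPs.
genotype = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.0]])
mean = genotype.mean(axis=0)
standard_deviation = genotype.std(axis=0)

standardized = standardize(genotype, mean, standard_deviation)
restored = unstandardize(standardized, mean, standard_deviation)
```

A column whose standard deviation is zero would cause a division by zero, which is presumably why the test excludes such columns through no_column_zero_standard_deviation.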
2025-07-17 | Add pool subcommand. | Arun Isaac
* pyhegp/pyhegp.py: Import namedtuple from collections, and read_summary from pyhegp.serialization. (Stats): New type. (pool_stats, pool): New functions.
* tests/test_pyhegp.py: Import Stats and pool_stats from pyhegp.pyhegp. (test_pool_stats): New test.
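One standard way to pool per-site statistics without sharing the underlying data is to combine sample counts, means and standard deviations. The sketch below reuses the names Stats and pool_stats from the commit, but the fields (n, mean, std), the use of the sample standard deviation and the formula itself are assumptions rather than the repository's implementation.

```python
from collections import namedtuple

import numpy as np

# Hypothetical fields: sample count, per-SNP mean, per-SNP sample
# standard deviation (ddof=1).
Stats = namedtuple("Stats", ["n", "mean", "std"])


def pool_stats(stats_list):
    n = sum(s.n for s in stats_list)
    mean = sum(s.n * s.mean for s in stats_list) / n
    # Total sum of squared deviations = within-group part + between-group part.
    sum_of_squares = sum((s.n - 1) * s.std**2 + s.n * (s.mean - mean) ** 2
                         for s in stats_list)
    return Stats(n, mean, np.sqrt(sum_of_squares / (n - 1)))


# Two hypothetical data owners, each holding some samples for the same 2 SNPs.
site1 = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]])
site2 = np.array([[2.0, 8.0], [6.0, 1.0]])

pooled = pool_stats([Stats(len(site), site.mean(axis=0), site.std(axis=0, ddof=1))
                     for site in (site1, site2)])

# Reference values computed on the concatenated data, for comparison only.
combined = np.vstack([site1, site2])
```

The pooled mean and standard deviation match those of the concatenated data up to floating-point error, which is the kind of agreement a relative-tolerance test would check.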
2025-07-17 | Add summary subcommand. | Arun Isaac
* pyhegp/pyhegp.py: Import Summary and write_summary from pyhegp.serialization. (summary): New function.
2025-07-17 | Implement the summary file format. | Arun Isaac
* doc/file-formats.md, pyhegp/serialization.py, tests/test_serialization.py: New files.
2025-07-17 | Remove decrypt subcommand. | Arun Isaac
Decryption does not make much sense with HEGP. And, the added complexity of standardization makes it even less attractive.
* pyhegp/pyhegp.py (decrypt): Delete function.
2025-07-17 | Use python-pytest built with python-hypothesis-next. | Arun Isaac
* .guix/pyhegp-package.scm: Import python-pytest with guix: prefix. (python-pytest): New variable.
2025-07-17 | Use default array shapes testing encryption/decryption. | Arun Isaac
It may be better to sample a smaller set of matrices finely than a large set of matrices coarsely.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Use default array shapes testing encryption/decryption.
2025-07-17 | Reduce maximum matrix size testing encryption/decryption. | Arun Isaac
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Reduce maximum matrix size to 100.
2025-07-17 | Organize source into directory structure. | Arun Isaac
* pyhegp/__init__.py: New file.
* pyhegp.py: Move to pyhegp/pyhegp.py.
* test_pyhegp.py: Move to tests/test_pyhegp.py. Import from pyhegp.pyhegp instead of from pyhegp.
* pyproject.toml (project.scripts)[pyhegp]: Switch to pyhegp.pyhegp:main.