Age  Commit message  Author
2025-09-02  Test cat_genotype.  Arun Isaac
Test cat_genotype extensively using hypothesis.
2025-09-02  Add is_genotype_metadata_column.  Arun Isaac
Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere.
2025-09-02  Drop duplicates in generated test genotype frames.  Arun Isaac
2025-09-02  Catenate an empty list of genotypes.  Arun Isaac
We handle this as a special case.
2025-09-02  Move hypothesis strategies to separate file.  Arun Isaac
These strategies may be used by other test modules as well.
2025-09-02  Add cat_genotype workhorse function.  Arun Isaac
Move workhorse logic of the cat command to a separate function. This will make it easy to test the logic without having to invoke the command itself.
2025-09-02  Suffix CLI subcommand functions with _command.  Arun Isaac
We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions.
2025-09-01  Do not require output ciphertext file path.  Arun Isaac
Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG.
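The naming rule can be sketched with pathlib. The helper name default_ciphertext_path is hypothetical and is not taken from pyhegp; only the ".hegp" suffix comes from the commit message.

```python
from pathlib import Path


def default_ciphertext_path(plaintext: Path) -> Path:
    """Return the plaintext path with ".hegp" appended (hypothetical helper)."""
    # Append to the full file name instead of using with_suffix, so that
    # "genotype.tsv" becomes "genotype.tsv.hegp" and the ".tsv" is kept.
    return plaintext.with_name(plaintext.name + ".hegp")


print(default_ciphertext_path(Path("data/genotype.tsv")))
```

Appending to the name, rather than replacing the extension, mirrors how GnuPG names its output files.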
2025-09-01  Pass dtype to read_csv.  Arun Isaac
read_csv can incorrectly infer that the string "00" is the integer 0. To avoid this ambiguity, pass the correct dtype to read_csv.
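The inference problem can be reproduced in a few lines. This is a minimal sketch; the column names and the in-memory file are illustrative assumptions, not pyhegp's actual reading code.

```python
import io

import pandas as pd

tsv = "chromosome\tposition\n00\t7\n"

# With default type inference, the chromosome column is parsed as an integer.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")
# With an explicit dtype, the original string is preserved.
explicit = pd.read_csv(io.StringIO(tsv), sep="\t", dtype={"chromosome": str})

print(repr(inferred["chromosome"].iloc[0]))
print(repr(explicit["chromosome"].iloc[0]))
```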
2025-09-01  Use open method of Path object, rather than the open function.  Arun Isaac
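As a small illustration of the style this commit adopts, the sketch below writes and reads a temporary file through Path.open. The file name and contents are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "example.tsv"
    # Path.open accepts the same mode arguments as the built-in open.
    with path.open("w") as file:
        file.write("sample\tvalue\n")
    with path.open() as file:
        header = file.readline()

print(repr(header))
```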
2025-09-01  Do not skip blank lines when reading TSV files.  Arun Isaac
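For reference, pandas controls this behaviour through the skip_blank_lines parameter of read_csv. The sketch below assumes that this parameter is what the commit changes, which the log does not confirm; the column names are made up.

```python
import io

import pandas as pd

tsv = "sample\tvalue\n\nx\t1\n"

# By default, the blank line is silently dropped.
skipped = pd.read_csv(io.StringIO(tsv), sep="\t")
# With skip_blank_lines=False, the blank line becomes a row of missing values.
kept = pd.read_csv(io.StringIO(tsv), sep="\t", skip_blank_lines=False)

print(len(skipped), len(kept))
```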
2025-09-01  Decide to not use logging.  Arun Isaac
Remove comments mentioning logging. Command-line error messages have their own place; they are not the same as logging.
2025-09-01  Test for existence of output files.  Arun Isaac
We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust.
2025-09-01  Title case sentence.  Arun Isaac
2025-09-01  Add phenotype file format and serialization functions.  Arun Isaac
2025-08-08  Clarify that the test suite is not for end users.  Arun Isaac
End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document.
2025-08-08  Separate table of contents from introduction.  Arun Isaac
If not separated, GitHub combines the table of contents with the list of papers in the introduction.
2025-08-08  Add table of contents to README.  Arun Isaac
A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in.
2025-08-08  Replace csv extension with tsv extension on genotype files.  Arun Isaac
2025-08-08  Remove txt extension from summary files.  Arun Isaac
2025-08-08  Link to file formats documentation from README.  Arun Isaac
Readers are more likely to follow through to the file formats documentation if there is a link.
2025-08-08  Add example key file.  Arun Isaac
2025-08-08  Add example genotype file.  Arun Isaac
2025-08-08  Add example summary file.  Arun Isaac
2025-08-08  Reduce precision in test data files.  Arun Isaac
Reducing precision lowers the file size and makes the files more human-comprehensible.
2025-08-08  Add instructions to install via Guix.  Arun Isaac
2025-08-08  Mark virtual environment creation as optional.  Arun Isaac
Not everyone may want to create a virtual environment. For example, on some HPC machines, creating a virtual environment is complicated or does not work.
2025-08-08  Package as a CLI utility only, not a Python library.  Arun Isaac
We have not exposed a Python library interface, and it is not clear if we need to. We can revisit this decision later, if need be.
2025-08-06  Subset to common SNPs.  Arun Isaac
* pyhegp/pyhegp.py: Import reduce from functools. (pool_summaries, encrypt_genotype): New functions. (pool): Use pool_summaries. (encrypt): Use encrypt_genotype.
* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization. (test_pool, test_encrypt): New tests.
* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files.
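The idea of subsetting to common SNPs can be sketched by folding set intersection over the data owners with reduce. This is only an illustration under assumed data: the log does not show how pyhegp uses reduce, and the (chromosome, position) identifiers below are made up.

```python
from functools import reduce

# Hypothetical SNP identifiers (chromosome, position) held by three data owners.
snps_per_owner = [
    {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    {("chr1", 100), ("chr2", 50), ("chr2", 75)},
    {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)},
]

# Fold set intersection over all owners to keep only the SNPs every owner has.
common_snps = reduce(set.intersection, snps_per_owner)

print(sorted(common_snps))
```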
2025-08-06  Standardize key files.  Arun Isaac
* doc/file-formats.md (File formats)[key file]: New section.
* pyhegp/serialization.py: Import numpy. (read_key, write_key): New functions.
* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization. (encrypt): Use write_key.
* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization. (test_read_write_key_are_inverses): New test.
2025-08-06  Compute summary on encryption if not provided.  Arun Isaac
* pyhegp/pyhegp.py (genotype_summary): New function. (summary): Use genotype_summary. (encrypt): Compute summary if not provided.
* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark.
2025-08-06  Add simple workflow.  Arun Isaac
* README.md (How to use): Indent down into "Joint/federated analysis with many data owners" section. [Simple data sharing]: New section.
* doc/generate-images.sh: Add simple workflow.
* doc/workflow.png: Rename to doc/joint-workflow.png.
* doc/workflow.uml: Rename to doc/joint-workflow.uml.
* doc/simple-workflow.png, doc/simple-workflow.uml: New files.
* tests/test_pyhegp.py: Import pytest. (test_simple_workflow): New test.
* test-data/genotype.tsv: New file.
2025-08-06  Test joint workflow CLI.  Arun Isaac
* tests/test_pyhegp.py: Import CliRunner from click.testing, and main from pyhegp.pyhegp. (test_joint_workflow): New test.
* test-data/genotype0.tsv, test-data/genotype1.tsv, test-data/genotype2.tsv, test-data/genotype3.tsv: New files.
2025-08-06  Standardize file formats in the likeness of plink files.  Arun Isaac
* pyhegp/pyhegp.py: Import pandas. (summary, pool, encrypt, cat): Use pandas data frames and new data format.
* pyhegp/serialization.py: Import csv and pandas. (Summary)[mean, std]: Delete fields. [data]: New field. (read_summary, write_summary, read_genotype, write_genotype): Use pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not import hypothesis.extra.numpy and approx from pytest. (tabless_printable_ascii_text, chromosome_column, position_column, reference_column, sample_names): New variables. (summaries, genotype_reserved_column_name_p, genotype_frames): New functions. (test_read_write_summary_are_inverses): Use pandas data frames and new data format. (test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new standard. [genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from (gnu packages python-science). (python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas.
2025-08-06  Move negate to pyhegp.utils.  Arun Isaac
* tests/test_pyhegp.py (negate): Move to pyhegp.utils. Import negate from pyhegp.utils.
* pyhegp/utils.py: New file.
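The log does not show the body of negate. A helper of that name conventionally wraps a predicate and inverts its result, so a minimal sketch, assuming that meaning, might look like this:

```python
from typing import Any, Callable


def negate(predicate: Callable[..., Any]) -> Callable[..., bool]:
    """Return a function that gives the logical negation of predicate."""
    def negated(*args: Any, **kwargs: Any) -> bool:
        return not predicate(*args, **kwargs)
    return negated


# Example: derive an "is odd" test from an "is even" predicate.
is_odd = negate(lambda n: n % 2 == 0)
print(is_odd(3), is_odd(4))
```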
2025-08-06  Add gitignore.  Arun Isaac
* .gitignore: New file.
2025-08-06  Loosen relative tolerance in test_pool_stats.  Arun Isaac
* tests/test_pyhegp.py (test_pool_stats): Set relative tolerance to 1e-6.
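To illustrate what a relative tolerance of 1e-6 admits, the sketch below uses math.isclose from the standard library as a stand-in; the log does not show which comparison function test_pool_stats actually calls, and the values are made up.

```python
import math

expected = 1.0
computed = 1.0 + 5e-7  # relative error of about 5e-7

# Accepted with a relative tolerance of 1e-6 ...
loose = math.isclose(computed, expected, rel_tol=1e-6)
# ... but rejected with math.isclose's stricter default of 1e-9.
strict = math.isclose(computed, expected)

print(loose, strict)
```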
2025-08-01  Rename genotype_file argument in read_genotype.  Arun Isaac
* pyhegp/serialization.py (read_genotype): Rename genotype_file argument to file.
2025-08-01  Test that read_genotype and write_genotype are inverses.  Arun Isaac
* tests/test_serialization.py: Import read_genotype and write_genotype from pyhegp.serialization. (test_read_write_genotype_are_inverses): New test.
2025-08-01  Ensure that read genotype matrices have 2 dimensions.  Arun Isaac
* pyhegp/serialization.py (read_genotype): Ensure 2 dimensions.
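One way to get this guarantee with NumPy is the ndmin argument of np.loadtxt. Whether read_genotype used np.loadtxt at this point is an assumption; the sketch only shows why a single-row file needs special care.

```python
import io

import numpy as np

row = "0.5\t1.5\t2.5\n"

# Without ndmin, a file with a single row loads as a 1-dimensional array.
flat = np.loadtxt(io.StringIO(row), delimiter="\t")
# With ndmin=2, the same file loads as a 1 x 3 matrix.
matrix = np.loadtxt(io.StringIO(row), delimiter="\t", ndmin=2)

print(flat.shape, matrix.shape)
```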
2025-08-01  Write genotype matrix with increased precision.  Arun Isaac
* pyhegp/serialization.py (write_genotype): Write with format %.8g.
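The %.8g format keeps up to 8 significant digits and drops trailing zeros. The values below are arbitrary and only show the effect of the format string.

```python
values = [0.123456789012, 1234567.891, 2.0]

# %.8g rounds to 8 significant digits and omits trailing zeros.
formatted = ["%.8g" % value for value in values]

print(formatted)
```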
2025-08-01  Tab-separate data section of summary files.  Arun Isaac
* pyhegp/serialization.py (read_summary, write_summary): Use tab as the delimiter.
* doc/file-formats.md (File formats)[summary file]: Update documentation.
2025-08-01  Abstract out write_genotype.  Arun Isaac
* pyhegp/serialization.py (write_genotype): New function.
* pyhegp/pyhegp.py: Import write_genotype from pyhegp.serialization. (encrypt, cat): Use write_genotype.
2025-08-01  Test solution of linear system after encryption.  Arun Isaac
* tests/test_pyhegp.py: Import math. (square_matrices, negate, is_singular): New functions. (test_conservation_of_solutions): New test.
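The property named by test_conservation_of_solutions can be sketched with NumPy. The sketch assumes that encryption amounts to left-multiplying by a random orthogonal key matrix; the log does not show hegp_encrypt, so the key construction below is an assumption and not pyhegp's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Assumption: the key is a random orthogonal matrix, taken here from the
# QR decomposition of a Gaussian random matrix.
key, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Shift by a multiple of the identity to make a near-singular matrix
# unlikely in this small example.
a = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# Solve the plaintext system and the system with both sides encrypted.
plain_solution = np.linalg.solve(a, b)
encrypted_solution = np.linalg.solve(key @ a, key @ b)

print(np.allclose(plain_solution, encrypted_solution))
```

Because the key is invertible, (key a) x = key b has the same solution x as a x = b, which is what the comparison checks numerically.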
2025-08-01  Separate standardization from encryption.  Arun Isaac
* pyhegp/pyhegp.py (hegp_encrypt, hegp_decrypt): Do not standardize or unstandardize. (encrypt): Standardize before calling hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not pass mean and standard deviation for standardization and unstandardization.
2025-08-01  Do not test encryption on order 1 matrices.  Arun Isaac
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not test encryption on order 1 matrices.
2025-08-01  Mention TianjingZhao2023 paper in README.  Arun Isaac
* README.md: Mention TianjingZhao2023 paper.
2025-07-18  Add CI badge to README.  Arun Isaac
* README.md: Add CI badge.
2025-07-17  Document usage instructions and workflow.  Arun Isaac
* doc/workflow.uml, doc/workflow.png, doc/generate-images.sh: New files.
* README.md (How to use): New section.
2025-07-17  Add development version installation instructions.  Arun Isaac
* README.md (Install development version): New section.