Age | Commit message | Author
2025-09-04 | Raise exception if data frame to be written has NA values. | Arun Isaac
This should never occur, but can occur due to bugs in the code; we wish to protect against that.
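A guard of this kind might look like the following minimal sketch. The function name `ensure_no_na` and the choice of `ValueError` are assumptions made for illustration; the commit does not say which exception is raised or where the check lives.

```python
import pandas as pd


def ensure_no_na(frame: pd.DataFrame) -> pd.DataFrame:
    """Return frame unchanged, or raise if any cell is NA.

    Hypothetical helper; the name and exception type are illustrative.
    """
    # isna() marks every missing cell; any() over the array reduces to one bool.
    if frame.isna().to_numpy().any():
        raise ValueError("data frame to be written has NA values")
    return frame
```

Calling such a guard immediately before serialization keeps a bug upstream from silently producing a file with missing values.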
2025-09-04 | Add --force flag to encrypt subcommand permitting file overwriting. | Arun Isaac
2025-09-04 | Support encrypting phenotypes. | Arun Isaac
2025-09-04 | Compare complete frame in test_cat_*. | Arun Isaac
Comparing the expected and actual data frames directly is simpler and more robust.
2025-09-04 | Do not import unused settings from hypothesis. | Arun Isaac
2025-09-04 | Test cat_phenotype. | Arun Isaac
2025-09-04 | Add cat-phenotype subcommand. | Arun Isaac
2025-09-02 | Rename cat subcommand to cat-genotype. | Arun Isaac
A cat-phenotype subcommand is coming. Hence rename this.
2025-09-02 | Add is_phenotype_metadata_column. | Arun Isaac
Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization.
2025-09-02 | Drop duplicates in generated test phenotype frames. | Arun Isaac
2025-09-02 | Set CI environment variable when building Guix package. | Arun Isaac
2025-09-02 | Merge, not concat, genotype frames. | Arun Isaac
pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem.
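The difference between the two approaches can be sketched as below. The metadata column names, the helper name `cat_genotype_sketch`, the inner join, and the handling of an empty list are all assumptions for illustration, not the project's actual implementation.

```python
from functools import reduce

import pandas as pd

# Hypothetical metadata column names, chosen for illustration only.
METADATA = ["chromosome", "position"]


def cat_genotype_sketch(frames):
    """Join genotype frames on their shared metadata columns."""
    if not frames:
        # Assumed special case: nothing to join, so return an empty frame.
        return pd.DataFrame(columns=METADATA)
    # pd.merge defaults to an inner join on the given key columns.
    return reduce(lambda left, right: pd.merge(left, right, on=METADATA), frames)


a = pd.DataFrame({"chromosome": ["1", "1"], "position": [10, 20], "s1": [0.1, 0.2]})
b = pd.DataFrame({"chromosome": ["1", "1"], "position": [10, 20], "s2": [0.3, 0.4]})

# Column-wise concat keeps both copies of the metadata columns.
concatenated = pd.concat([a, b], axis=1)
# Merging on the metadata columns keeps a single copy of each.
merged = cat_genotype_sketch([a, b])
```

Here `concatenated` carries `chromosome` and `position` twice, while `merged` has one copy of each followed by the sample columns `s1` and `s2`.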
2025-09-02 | Test cat_genotype. | Arun Isaac
Test cat_genotype extensively using hypothesis.
2025-09-02 | Add is_genotype_metadata_column. | Arun Isaac
Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere.
2025-09-02 | Drop duplicates in generated test genotype frames. | Arun Isaac
2025-09-02 | Catenate an empty list of genotypes. | Arun Isaac
We handle this as a special case.
2025-09-02 | Move hypothesis strategies to separate file. | Arun Isaac
These strategies may be used by other test modules as well.
2025-09-02 | Add cat_genotype workhorse function. | Arun Isaac
Move workhorse logic of the cat command to a separate function. This will make it easy to test the logic without having to invoke the command itself.
2025-09-02 | Suffix CLI subcommand functions with _command. | Arun Isaac
We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions.
2025-09-01 | Do not require output ciphertext file path. | Arun Isaac
Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG.
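One way to infer such a path with `pathlib` is sketched below. The function name `default_ciphertext_path` is hypothetical; only the ".hegp" suffix comes from the commit message.

```python
from pathlib import Path


def default_ciphertext_path(plaintext: Path) -> Path:
    """Append ".hegp" to the full file name, keeping any existing extension."""
    # with_suffix() would replace ".tsv"; appending to the name preserves it.
    return plaintext.with_name(plaintext.name + ".hegp")
```

Under this sketch, `Path("data/genotype.tsv")` maps to `Path("data/genotype.tsv.hegp")`, much as GnuPG appends ".gpg" to the input file name.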
2025-09-01 | Pass dtype to read_csv. | Arun Isaac
read_csv can incorrectly infer that the string "00" is the integer 0. To avoid this ambiguity, pass the correct dtype to read_csv.
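The ambiguity is easy to reproduce. In this minimal sketch the column names `sample` and `value` are made up for illustration and are not taken from the project's file formats.

```python
import io

import pandas as pd

# Two rows whose first field looks numeric but is meant as a string.
tsv = "sample\tvalue\n00\t1.5\n01\t2.5\n"

# Without a dtype, pandas infers integers and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")

# With an explicit dtype, the field is kept as the original string.
explicit = pd.read_csv(io.StringIO(tsv), sep="\t", dtype={"sample": str})
```

After running this, `inferred["sample"]` holds the integers 0 and 1, whereas `explicit["sample"]` holds the strings "00" and "01".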
2025-09-01 | Use open method of Path object, rather than the open function. | Arun Isaac
2025-09-01 | Do not skip blank lines when reading TSV files. | Arun Isaac
2025-09-01 | Decide to not use logging. | Arun Isaac
Remove comments mentioning logging. Command-line error messages have their own place; they are not the same as logging.
2025-09-01 | Test for existence of output files. | Arun Isaac
We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust.
2025-09-01 | Title case sentence. | Arun Isaac
2025-09-01 | Add phenotype file format and serialization functions. | Arun Isaac
2025-08-08 | Clarify that the test suite is not for end users. | Arun Isaac
End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document.
2025-08-08 | Separate table of contents from introduction. | Arun Isaac
If not separated, GitHub combines the table of contents with the list of papers in the introduction.
2025-08-08 | Add table of contents to README. | Arun Isaac
A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in.
2025-08-08 | Replace csv extension with tsv extension on genotype files. | Arun Isaac
2025-08-08 | Remove txt extension from summary files. | Arun Isaac
2025-08-08 | Link to file formats documentation from README. | Arun Isaac
Readers are more likely to follow through to the file formats documentation if there is a link.
2025-08-08 | Add example key file. | Arun Isaac
2025-08-08 | Add example genotype file. | Arun Isaac
2025-08-08 | Add example summary file. | Arun Isaac
2025-08-08 | Reduce precision in test data files. | Arun Isaac
Reducing precision lowers the file size and makes the files more human-comprehensible.
2025-08-08 | Add instructions to install via Guix. | Arun Isaac
2025-08-08 | Mark virtual environment creation as optional. | Arun Isaac
Not everyone may want to create a virtual environment. For example, on some HPC machines, creating a virtual environment is complicated or does not work.
2025-08-08 | Package as a CLI utility only, not a Python library. | Arun Isaac
We have not exposed a Python library interface, and it is not clear if we need to. We can revisit this decision later, if need be.
2025-08-06 | Subset to common SNPs. | Arun Isaac
* pyhegp/pyhegp.py: Import reduce from functools. (pool_summaries, encrypt_genotype): New functions. (pool): Use pool_summaries. (encrypt): Use encrypt_genotype.
* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization. (test_pool, test_encrypt): New tests.
* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files.
2025-08-06 | Standardize key files. | Arun Isaac
* doc/file-formats.md (File formats)[key file]: New section.
* pyhegp/serialization.py: Import numpy. (read_key, write_key): New functions.
* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization. (encrypt): Use write_key.
* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization. (test_read_write_key_are_inverses): New test.
2025-08-06 | Compute summary on encryption if not provided. | Arun Isaac
* pyhegp/pyhegp.py (genotype_summary): New function. (summary): Use genotype_summary. (encrypt): Compute summary if not provided.
* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark.
2025-08-06 | Add simple workflow. | Arun Isaac
* README.md (How to use): Indent down into "Joint/federated analysis with many data owners" section. [Simple data sharing]: New section.
* doc/generate-images.sh: Add simple workflow.
* doc/workflow.png: Rename to doc/joint-workflow.png.
* doc/workflow.uml: Rename to doc/joint-workflow.uml.
* doc/simple-workflow.png, doc/simple-workflow.uml: New files.
* tests/test_pyhegp.py: Import pytest. (test_simple_workflow): New test.
* test-data/genotype.tsv: New file.
2025-08-06 | Test joint workflow CLI. | Arun Isaac
* tests/test_pyhegp.py: Import CliRunner from click.testing, and main from pyhegp.pyhegp. (test_joint_workflow): New test.
* test-data/genotype0.tsv, test-data/genotype1.tsv, test-data/genotype2.tsv, test-data/genotype3.tsv: New files.
2025-08-06 | Standardize file formats in the likeness of plink files. | Arun Isaac
* pyhegp/pyhegp.py: Import pandas. (summary, pool, encrypt, cat): Use pandas data frames and new data format.
* pyhegp/serialization.py: Import csv and pandas. (Summary)[mean, std]: Delete fields. [data]: New field. (read_summary, write_summary, read_genotype, write_genotype): Use pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not import hypothesis.extra.numpy and approx from pytest. (tabless_printable_ascii_text, chromosome_column, position_column, reference_column, sample_names): New variables. (summaries, genotype_reserved_column_name_p, genotype_frames): New functions. (test_read_write_summary_are_inverses): Use pandas data frames and new data format. (test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new standard. [genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from (gnu packages python-science). (python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas.
2025-08-06 | Move negate to pyhegp.utils. | Arun Isaac
* tests/test_pyhegp.py (negate): Move to pyhegp.utils. Import negate from pyhegp.utils.
* pyhegp/utils.py: New file.
2025-08-06 | Add gitignore. | Arun Isaac
* .gitignore: New file.
2025-08-06 | Loosen relative tolerance in test_pool_stats. | Arun Isaac
* tests/test_pyhegp.py (test_pool_stats): Set relative tolerance to 1e-6.
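The effect of a relative tolerance can be illustrated with the standard library. This sketch uses `math.isclose` as a stand-in for the test suite's own comparison, and the numbers and the tighter tolerance of 1e-9 are made up for illustration.

```python
import math

expected = 1.0
# Hypothetical computed value; it differs from expected by about 5e-7.
actual = 1.0000005

# Too strict for this difference: the comparison fails.
strict = math.isclose(actual, expected, rel_tol=1e-9)
# A relative tolerance of 1e-6 accepts it.
loose = math.isclose(actual, expected, rel_tol=1e-6)
```

Here `strict` is `False` and `loose` is `True`, since a relative difference of roughly 5e-7 falls inside 1e-6 but well outside 1e-9.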
2025-08-01 | Rename genotype_file argument in read_genotype. | Arun Isaac
* pyhegp/serialization.py (read_genotype): Rename genotype_file argument to file.