pyhegp - Homomorphic encryption of genotypes and phenotypes

Age	Commit message	Author
2025-09-04	Avoid wildcard import from helpers.strategies.	Arun Isaac

2025-09-04	Limit values in genotype and phenotype strategies.	Arun Isaac

2025-09-04	Test that ciphertext does not contain NA values.	Arun Isaac

2025-09-04	Parameterize number of samples in phenotype frame strategy.	Arun Isaac

2025-09-04	Parameterize number of samples in genotype frame strategy.	Arun Isaac

2025-09-04	Parameterize presence of reference column in genotype frame strategy.	Arun Isaac

2025-09-04	Add keys strategy.	Arun Isaac
	Add keys strategy, and use it.
2025-09-04	Raise exception if data frame to be written has NA values.	Arun Isaac
	This should never happen, but bugs in the code could cause it; we wish to protect against that.
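A guard of this kind might look like the following minimal sketch. The helper name `assert_no_na` and the choice of `ValueError` are assumptions for illustration; the commit message does not say which exception pyhegp raises or where the check lives.

```python
import pandas as pd


def assert_no_na(frame: pd.DataFrame) -> None:
    """Raise if any cell of frame is NA (hypothetical helper)."""
    # isna() marks NaN/None cells; any() over the whole array finds one.
    if frame.isna().to_numpy().any():
        raise ValueError("data frame to be written contains NA values")


# A frame without NA values passes silently.
assert_no_na(pd.DataFrame({"sample1": [0.1, 0.2]}))
```

Calling such a check immediately before writing means a bug upstream surfaces as an error rather than as a silently corrupted output file.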
2025-09-04	Add --force flag to encrypt subcommand permitting file overwriting.	Arun Isaac
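One way to get this behaviour, sketched here under assumptions, is to open the output in exclusive-creation mode unless overwriting was requested. The function name `open_for_writing` and its `force` parameter are hypothetical; the commit message does not describe how pyhegp implements the flag.

```python
import tempfile
from pathlib import Path


def open_for_writing(path: Path, force: bool = False):
    """Open path for writing; refuse to overwrite unless force is set."""
    # Mode "x" raises FileExistsError if the file already exists;
    # mode "w" truncates and overwrites.
    return path.open("w" if force else "x")


with tempfile.TemporaryDirectory() as directory:
    output = Path(directory) / "genotype.tsv.hegp"
    with open_for_writing(output) as file:
        file.write("first\n")
    try:
        open_for_writing(output)
    except FileExistsError:
        print("refusing to overwrite without --force")
    with open_for_writing(output, force=True) as file:
        file.write("second\n")
```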

2025-09-04	Support encrypting phenotypes.	Arun Isaac

2025-09-04	Compare complete frame in test_cat_*.	Arun Isaac
	Comparing the expected and actual data frames directly is simpler and more robust.
2025-09-04	Do not import unused settings from hypothesis.	Arun Isaac

2025-09-04	Test cat_phenotype.	Arun Isaac

2025-09-04	Add cat-phenotype subcommand.	Arun Isaac

2025-09-02	Rename cat subcommand to cat-genotype.	Arun Isaac
	A cat-phenotype subcommand is coming. Hence rename this.
2025-09-02	Add is_phenotype_metadata_column.	Arun Isaac
	Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization.
2025-09-02	Drop duplicates in generated test phenotype frames.	Arun Isaac

2025-09-02	Set CI environment variable when building Guix package.	Arun Isaac

2025-09-02	Merge, not concat, genotype frames.	Arun Isaac
	pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem.
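The difference can be seen in a small sketch. The metadata column names (`chromosome`, `position`) and the sample column names are assumptions chosen for illustration, as is the column-wise use of `pd.concat`; the commit message names neither.

```python
import pandas as pd

# Two genotype frames sharing the same (assumed) metadata columns.
a = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [100, 200],
                  "sample1": [0.0, 1.0]})
b = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [100, 200],
                  "sample2": [2.0, 1.0]})

# Column-wise concat keeps both copies of the metadata columns.
concatenated = pd.concat([a, b], axis=1)

# Merging on the metadata columns keeps a single copy of each.
merged = pd.merge(a, b, on=["chromosome", "position"])

print(list(concatenated.columns))
print(list(merged.columns))
```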
2025-09-02	Test cat_genotype.	Arun Isaac
	Test cat_genotype extensively using hypothesis.
2025-09-02	Add is_genotype_metadata_column.	Arun Isaac
	Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere.
2025-09-02	Drop duplicates in generated test genotype frames.	Arun Isaac

2025-09-02	Catenate an empty list of genotypes.	Arun Isaac
	We handle this as a special case.
2025-09-02	Move hypothesis strategies to separate file.	Arun Isaac
	These strategies may be used by other test modules as well.
2025-09-02	Add cat_genotype workhorse function.	Arun Isaac
	Move workhorse logic of the cat command to a separate function. This will make it easy to test the logic without having to invoke the command itself.
2025-09-02	Suffix CLI subcommand functions with _command.	Arun Isaac
	We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions.
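The naming convention can be sketched as follows. Only the names `cat_genotype` and the `_command` suffix come from the commit messages; the use of `argparse`, the argument layout, and the placeholder body of the workhorse function are assumptions and do not reflect pyhegp's actual code.

```python
import argparse


def cat_genotype(genotypes):
    """Workhorse: combine in-memory tables (placeholder logic only)."""
    combined = []
    for genotype in genotypes:
        combined.extend(genotype)
    return combined


def cat_genotype_command(argv):
    """CLI wrapper: parse arguments, read files, call the workhorse."""
    parser = argparse.ArgumentParser(prog="cat-genotype")
    parser.add_argument("files", nargs="*")
    args = parser.parse_args(argv)
    genotypes = []
    for name in args.files:
        with open(name) as file:
            genotypes.append(file.read().splitlines())
    for line in cat_genotype(genotypes):
        print(line)
```

With this split, tests can call `cat_genotype` directly on in-memory data, while a separate test exercises `cat_genotype_command` end to end.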
2025-09-01	Do not require output ciphertext file path.	Arun Isaac
	Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG.
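A minimal sketch of inferring the output path, assuming the suffix is appended to the full file name rather than replacing the existing extension. The function name `ciphertext_path` is hypothetical.

```python
from pathlib import Path


def ciphertext_path(plaintext: Path) -> Path:
    """Return the plaintext path with ".hegp" appended to its name."""
    # with_suffix would replace ".tsv"; appending to the name keeps it.
    return plaintext.with_name(plaintext.name + ".hegp")


print(ciphertext_path(Path("data/genotype.tsv")))
```

Appending rather than replacing the extension keeps the original file type visible in the ciphertext name, as GnuPG does with its `.gpg` suffix.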
2025-09-01	Pass dtype to read_csv.	Arun Isaac
	read_csv can incorrectly infer that the string "00" is the integer 0. To avoid this ambiguity, pass the correct dtype to read_csv.
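The ambiguity is easy to reproduce. In this sketch the column names `sample_id` and `value` are assumptions for illustration and are not taken from the pyhegp file formats.

```python
import io

import pandas as pd

tsv = "sample_id\tvalue\n00\t1.5\n01\t2.5\n"

# Without a dtype, pandas infers an integer column and drops the zeros.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")

# With an explicit dtype, the identifiers stay as strings.
explicit = pd.read_csv(io.StringIO(tsv), sep="\t",
                       dtype={"sample_id": str})

print(inferred["sample_id"].tolist())  # [0, 1]
print(explicit["sample_id"].tolist())  # ['00', '01']
```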
2025-09-01	Use open method of Path object, rather than the open function.	Arun Isaac

2025-09-01	Do not skip blank lines when reading TSV files.	Arun Isaac

2025-09-01	Decide to not use logging.	Arun Isaac
	Remove comments mentioning logging. Command-line error messages have their own place; they are not the same as logging.
2025-09-01	Test for existence of output files.	Arun Isaac
	We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust.
2025-09-01	Title case sentence.	Arun Isaac

2025-09-01	Add phenotype file format and serialization functions.	Arun Isaac

2025-08-08	Clarify that the test suite is not for end users.	Arun Isaac
	End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document.
2025-08-08	Separate table of contents from introduction.	Arun Isaac
	If not separated, GitHub combines the table of contents with the list of papers in the introduction.
2025-08-08	Add table of contents to README.	Arun Isaac
	A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in.
2025-08-08	Replace csv extension with tsv extension on genotype files.	Arun Isaac

2025-08-08	Remove txt extension from summary files.	Arun Isaac

2025-08-08	Link to file formats documentation from README.	Arun Isaac
	Readers are more likely to follow through to the file formats documentation if there is a link.
2025-08-08	Add example key file.	Arun Isaac

2025-08-08	Add example genotype file.	Arun Isaac

2025-08-08	Add example summary file.	Arun Isaac

2025-08-08	Reduce precision in test data files.	Arun Isaac
	Reducing precision lowers the file size and makes the files more human-comprehensible.
2025-08-08	Add instructions to install via Guix.	Arun Isaac

2025-08-08	Mark virtual environment creation as optional.	Arun Isaac
	Not everyone may want to create a virtual environment. For example, on some HPC machines, creating a virtual environment is complicated or does not work.
2025-08-08	Package as a CLI utility only, not a Python library.	Arun Isaac
	We have not exposed a Python library interface, and it is not clear if we need to. We can revisit this decision later, if need be.
2025-08-06	Subset to common SNPs.	Arun Isaac
	* pyhegp/pyhegp.py: Import reduce from functools.
	(pool_summaries, encrypt_genotype): New functions.
	(pool): Use pool_summaries.
	(encrypt): Use encrypt_genotype.
	* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization.
	(test_pool, test_encrypt): New tests.
	* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files.
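The idea of subsetting to common SNPs with `reduce` can be illustrated on plain sets. This is a sketch only: the function name `common_snps`, the representation of a SNP as a `(chromosome, position)` tuple, and the handling of the empty case are assumptions, and the actual `pool_summaries` and `encrypt_genotype` functions are not shown in this log.

```python
from functools import reduce


def common_snps(snp_sets):
    """Return the SNPs present in every set (hypothetical helper)."""
    if not snp_sets:
        # reduce without an initial value fails on an empty sequence.
        return set()
    return reduce(set.intersection, snp_sets)


study1 = {("1", 100), ("1", 200), ("2", 50)}
study2 = {("1", 200), ("2", 50), ("3", 10)}
study3 = {("2", 50), ("1", 200)}

print(sorted(common_snps([study1, study2, study3])))
```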
2025-08-06	Standardize key files.	Arun Isaac
	* doc/file-formats.md (File formats)[key file]: New section.
	* pyhegp/serialization.py: Import numpy.
	(read_key, write_key): New functions.
	* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization.
	(encrypt): Use write_key.
	* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization.
	(test_read_write_key_are_inverses): New test.
2025-08-06	Compute summary on encryption if not provided.	Arun Isaac
	* pyhegp/pyhegp.py (genotype_summary): New function.
	(summary): Use genotype_summary.
	(encrypt): Compute summary if not provided.
	* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark.