path: root/tests
Age | Commit message | Author
2025-09-06 | Generalize split_data_frame to split along any axis. | Arun Isaac
2025-09-06 | Simplify split_data_frame so it is more composable. | Arun Isaac
split_data_frame should only split the data frame. It should not be filtering out metadata columns.
2025-09-05 | Generate unique SNPs in genotype frames without dropping duplicates. | Arun Isaac
Earlier, we were generating unique SNPs in genotype frames by dropping duplicates. This meant we couldn't control the number of SNPs. Rejection sampling is also not an option because it is too expensive. So, we now generate unique SNPs directly, by first generating a list with unique elements and then converting to a data frame.
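The approach described above can be sketched with hypothesis roughly as follows. This is a minimal illustration, not the repository's actual strategy: the names snp_keys and metadata_frames, the chromosome labels and the position range are all hypothetical. Only st.lists with unique=True and the conversion to a data frame reflect the idea in the commit message.

```python
import pandas as pd
from hypothesis import given, settings, strategies as st


def snp_keys(number_of_snps):
    # Hypothetical helper: build exactly number_of_snps distinct
    # (chromosome, position) pairs. unique=True makes hypothesis generate
    # a list without repeated elements, so nothing is dropped afterwards.
    return st.lists(
        st.tuples(st.sampled_from(["chr1", "chr2", "chrX"]),
                  st.integers(min_value=1, max_value=10**6)),
        min_size=number_of_snps,
        max_size=number_of_snps,
        unique=True)


def metadata_frames(number_of_snps):
    # Convert the unique list into a data frame only at the very end.
    return snp_keys(number_of_snps).map(
        lambda keys: pd.DataFrame(keys, columns=["chromosome", "position"]))


@settings(max_examples=25, deadline=None, database=None)
@given(metadata_frames(5))
def test_exact_number_of_unique_snps(frame):
    # The requested number of SNPs is met exactly, with no duplicates.
    assert len(frame) == 5
    assert not frame.duplicated().any()
```

Because uniqueness is enforced while the list is built, the number of SNPs is fixed by min_size and max_size rather than by however many rows survive deduplication.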
2025-09-05 | Deduplicate genotype frame metadata generation. | Arun Isaac
Abstract out generation of genotype frame metadata (namely chromosome, position and reference) from summaries and genotype_frames into a new helper function genotype_metadata.
2025-09-05 | Drop SNPs with a zero standard deviation. | Arun Isaac
2025-09-04 | Avoid wildcard import from helpers.strategies. | Arun Isaac
2025-09-04 | Limit values in genotype and phenotype strategies. | Arun Isaac
2025-09-04 | Test that ciphertext does not contain NA values. | Arun Isaac
2025-09-04 | Parameterize number of samples in phenotype frame strategy. | Arun Isaac
2025-09-04 | Parameterize number of samples in genotype frame strategy. | Arun Isaac
2025-09-04 | Parameterize presence of reference column in genotype frame strategy. | Arun Isaac
2025-09-04 | Add keys strategy. | Arun Isaac
Add keys strategy, and use it.
2025-09-04 | Compare complete frame in test_cat_*. | Arun Isaac
It is so much simpler and much more robust to simply compare expected and actual data frames.
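A whole-frame comparison of this kind might look like the sketch below, which uses pandas' own pd.testing.assert_frame_equal. The frames, the column names and the frames_match wrapper are made up for illustration; the log does not show which comparison function the tests actually call.

```python
import pandas as pd


def frames_match(expected, actual):
    # Hypothetical wrapper: True when values, labels and dtypes all agree.
    try:
        pd.testing.assert_frame_equal(expected, actual)
    except AssertionError:
        return False
    return True


expected = pd.DataFrame({"sample-id": ["s1", "s2"], "height": [1.5, 2.0]})
same = expected.copy()
different = expected.assign(height=[1.5, 2.5])

# In a test, the assertion is normally called directly; it passes silently
# on equal frames and raises AssertionError with a readable diff otherwise.
pd.testing.assert_frame_equal(expected, same)
```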
2025-09-04 | Do not import unused settings from hypothesis. | Arun Isaac
2025-09-04 | Test cat_phenotype. | Arun Isaac
2025-09-02 | Rename cat subcommand to cat-genotype. | Arun Isaac
A cat-phenotype subcommand is coming. Hence rename this.
2025-09-02 | Add is_phenotype_metadata_column. | Arun Isaac
Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization.
2025-09-02 | Drop duplicates in generated test phenotype frames. | Arun Isaac
2025-09-02 | Merge, not concat, genotype frames. | Arun Isaac
pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem.
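The difference can be seen on two toy genotype frames. This is only a sketch: the column names and values are invented, and joining with an inner merge on the metadata columns is an assumption about how the frames are combined.

```python
import pandas as pd

# Hypothetical metadata columns shared by both frames.
metadata = ["chromosome", "position"]

frame1 = pd.DataFrame({"chromosome": ["chr1", "chr1"],
                       "position": [100, 200],
                       "sample1": [0, 1]})
frame2 = pd.DataFrame({"chromosome": ["chr1", "chr1"],
                       "position": [100, 200],
                       "sample2": [2, 0]})

# Column-wise concatenation keeps both copies of the metadata columns.
concatenated = pd.concat([frame1, frame2], axis="columns")

# Merging on the metadata columns keeps one copy and aligns rows by SNP
# rather than by row position.
merged = pd.merge(frame1, frame2, on=metadata)
```

The merge also matches SNPs by their metadata values, so frames listing the same SNPs in a different order are still combined correctly.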
2025-09-02 | Test cat_genotype. | Arun Isaac
Test cat_genotype extensively using hypothesis.
2025-09-02 | Add is_genotype_metadata_column. | Arun Isaac
Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere.
2025-09-02 | Drop duplicates in generated test genotype frames. | Arun Isaac
2025-09-02 | Move hypothesis strategies to separate file. | Arun Isaac
These strategies may be used by other test modules as well.
2025-09-02 | Suffix CLI subcommand functions with _command. | Arun Isaac
We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions.
2025-09-01 | Do not require output ciphertext file path. | Arun Isaac
Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG.
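Inferring the output path could be done along these lines with pathlib. The function name ciphertext_path is hypothetical; the log only states that ".hegp" is appended to the plaintext file name.

```python
from pathlib import Path


def ciphertext_path(plaintext_path):
    # Append ".hegp" to the full file name. Path.with_suffix would replace
    # the existing extension instead, which is not what appending means.
    path = Path(plaintext_path)
    return path.with_name(path.name + ".hegp")
```

For example, a plaintext file data/genotype.tsv would map to data/genotype.tsv.hegp, much as GnuPG appends ".gpg" to the input file name.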
2025-09-01 | Use open method of Path object, rather than the open function. | Arun Isaac
2025-09-01 | Test for existence of output files. | Arun Isaac
We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust.
2025-09-01 | Add phenotype file format and serialization functions. | Arun Isaac
2025-08-06 | Subset to common SNPs. | Arun Isaac
* pyhegp/pyhegp.py: Import reduce from functools. (pool_summaries, encrypt_genotype): New functions. (pool): Use pool_summaries. (encrypt): Use encrypt_genotype.
* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization. (test_pool, test_encrypt): New tests.
* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files.
2025-08-06 | Standardize key files. | Arun Isaac
* doc/file-formats.md (File formats)[key file]: New section.
* pyhegp/serialization.py: Import numpy. (read_key, write_key): New functions.
* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization. (encrypt): Use write_key.
* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization. (test_read_write_key_are_inverses): New test.
2025-08-06 | Compute summary on encryption if not provided. | Arun Isaac
* pyhegp/pyhegp.py (genotype_summary): New function. (summary): Use genotype_summary. (encrypt): Compute summary if not provided.
* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark.
2025-08-06 | Add simple workflow. | Arun Isaac
* README.md (How to use): Indent down into "Joint/federated analysis with many data owners" section. [Simple data sharing]: New section.
* doc/generate-images.sh: Add simple workflow.
* doc/workflow.png: Rename to doc/joint-workflow.png.
* doc/workflow.uml: Rename to doc/joint-workflow.uml.
* doc/simple-workflow.png, doc/simple-workflow.uml: New files.
* tests/test_pyhegp.py: Import pytest. (test_simple_workflow): New test.
* test-data/genotype.tsv: New file.
2025-08-06 | Test joint workflow CLI. | Arun Isaac
* tests/test_pyhegp.py: Import CliRunner from click.testing, and main from pyhegp.pyhegp. (test_joint_workflow): New test.
* test-data/genotype0.tsv, test-data/genotype1.tsv, test-data/genotype2.tsv, test-data/genotype3.tsv: New files.
2025-08-06 | Standardize file formats in the likeness of plink files. | Arun Isaac
* pyhegp/pyhegp.py: Import pandas. (summary, pool, encrypt, cat): Use pandas data frames and new data format.
* pyhegp/serialization.py: Import csv and pandas. (Summary)[mean, std]: Delete fields. [data]: New field. (read_summary, write_summary, read_genotype, write_genotype): Use pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not import hypothesis.extra.numpy and approx from pytest. (tabless_printable_ascii_text, chromosome_column, position_column, reference_column, sample_names): New variables. (summaries, genotype_reserved_column_name_p, genotype_frames): New functions. (test_read_write_summary_are_inverses): Use pandas data frames and new data format. (test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new standard. [genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from (gnu packages python-science). (python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas.
2025-08-06 | Move negate to pyhegp.utils. | Arun Isaac
* tests/test_pyhegp.py (negate): Move to pyhegp.utils. Import negate from pyhegp.utils.
* pyhegp/utils.py: New file.
2025-08-06 | Loosen relative tolerance in test_pool_stats. | Arun Isaac
* tests/test_pyhegp.py (test_pool_stats): Set relative tolerance to 1e-6.
2025-08-01 | Test that read_genotype and write_genotype are inverses. | Arun Isaac
* tests/test_serialization.py: Import read_genotype and write_genotype from pyhegp.serialization. (test_read_write_genotype_are_inverses): New test.
2025-08-01 | Test solution of linear system after encryption. | Arun Isaac
* tests/test_pyhegp.py: Import math. (square_matrices, negate, is_singular): New functions. (test_conservation_of_solutions): New test.
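The property under test can be illustrated with numpy. The sketch assumes that encryption amounts to left-multiplying both sides of the system by a key matrix, and that the key is a random orthogonal matrix; neither detail is stated in this log, and random_orthogonal_key is a hypothetical helper. The identity itself holds for any invertible key.

```python
import numpy as np


def random_orthogonal_key(order, rng):
    # Hypothetical key generation: the Q factor of the QR decomposition
    # of a random Gaussian matrix is an orthogonal matrix.
    q, _ = np.linalg.qr(rng.normal(size=(order, order)))
    return q


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))   # plaintext coefficient matrix
b = rng.normal(size=4)        # plaintext right-hand side
key = random_orthogonal_key(4, rng)

# Solving a x = b and (key a) x = (key b) gives the same x, because
# multiplying both sides by an invertible matrix preserves the solution.
plaintext_solution = np.linalg.solve(a, b)
ciphertext_solution = np.linalg.solve(key @ a, key @ b)
```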
2025-08-01 | Separate standardization from encryption. | Arun Isaac
* pyhegp/pyhegp.py (hegp_encrypt, hegp_decrypt): Do not standardize or unstandardize. (encrypt): Standardize before calling hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not pass mean and standard deviation for standardization and unstandardization.
2025-08-01 | Do not test encryption on order 1 matrices. | Arun Isaac
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Do not test encryption on order 1 matrices.
2025-07-17 | Standardize before encryption. | Arun Isaac
* pyhegp/pyhegp.py (hegp_encrypt): Standardize before encryption. (hegp_decrypt): Unstandardize after decryption. (encrypt): Pass in mean and standard deviation from summary file to hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Pass in mean and standard deviation to hegp_encrypt.
2025-07-17 | Add standardization. | Arun Isaac
* pyhegp/pyhegp.py (standardize): Standardize using mean and standard deviation, instead of the minor allele frequency. (unstandardize): New function.
* tests/test_pyhegp.py: Import standardize and unstandardize from pyhegp.pyhegp. (no_column_zero_standard_deviation): New function. (test_standardize_unstandardize_are_inverses): New test.
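A minimal sketch of standardization by mean and standard deviation follows. The function names come from the commit message, but the signatures and the column-wise layout (samples in rows, SNPs in columns) are assumptions.

```python
import numpy as np


def standardize(matrix, mean, std):
    # Centre each column on its mean and scale it to unit standard
    # deviation. Columns with zero standard deviation must be excluded
    # beforehand, since they would cause a division by zero.
    return (matrix - mean) / std


def unstandardize(matrix, mean, std):
    # Exact inverse of standardize for the same mean and std.
    return matrix * std + mean


genotype = np.array([[0.0, 1.0],
                     [1.0, 2.0],
                     [2.0, 0.0]])
mean = genotype.mean(axis=0)
std = genotype.std(axis=0)
roundtrip = unstandardize(standardize(genotype, mean, std), mean, std)
```

The zero standard deviation caveat is presumably why the tests restrict their inputs with no_column_zero_standard_deviation.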
2025-07-17 | Add pool subcommand. | Arun Isaac
* pyhegp/pyhegp.py: Import namedtuple from collections, and read_summary from pyhegp.serialization. (Stats): New type. (pool_stats, pool): New functions.
* tests/test_pyhegp.py: Import Stats and pool_stats from pyhegp.pyhegp. (test_pool_stats): New test.
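Pooling per-dataset statistics without access to the raw data could work as sketched below. The field names of Stats (n, mean, std), the signature of pool_stats and the use of the sample standard deviation (ddof=1) are all assumptions; the log only records that a Stats namedtuple and a pool_stats function exist.

```python
from collections import namedtuple

import numpy as np

# Hypothetical fields: sample count, per-SNP mean, per-SNP sample std.
Stats = namedtuple("Stats", "n mean std")


def pool_stats(stats_list):
    # Total sample count and count-weighted mean.
    n = sum(stats.n for stats in stats_list)
    mean = sum(stats.n * stats.mean for stats in stats_list) / n
    # The total sum of squares is the within-dataset sums of squares plus
    # the between-dataset term for each dataset's offset from the pooled mean.
    sum_of_squares = sum((stats.n - 1) * stats.std**2
                         + stats.n * (stats.mean - mean)**2
                         for stats in stats_list)
    return Stats(n, mean, np.sqrt(sum_of_squares / (n - 1)))


def stats_of(matrix):
    # Helper for the usage example: summarize one dataset column-wise.
    return Stats(len(matrix), matrix.mean(axis=0), matrix.std(axis=0, ddof=1))


rng = np.random.default_rng(0)
first = rng.normal(size=(10, 3))
second = rng.normal(size=(7, 3))
pooled = pool_stats([stats_of(first), stats_of(second)])
```

The pooled result should agree, up to floating point error, with statistics computed directly on the concatenated data, which is the kind of check a relative tolerance in test_pool_stats would govern.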
2025-07-17 | Implement the summary file format. | Arun Isaac
* doc/file-formats.md, pyhegp/serialization.py, tests/test_serialization.py: New files.
2025-07-17 | Use default array shapes testing encryption/decryption. | Arun Isaac
It may be better to sample a smaller set of matrices finely than a large set of matrices coarsely.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Use default array shapes testing encryption/decryption.
2025-07-17 | Reduce maximum matrix size testing encryption/decryption. | Arun Isaac
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses): Reduce maximum matrix size to 100.
2025-07-17 | Organize source into directory structure. | Arun Isaac
* pyhegp/__init__.py: New file.
* pyhegp.py: Move to pyhegp/pyhegp.py.
* test_pyhegp.py: Move to tests/test_pyhegp.py. Import from pyhegp.pyhegp instead of from pyhegp.
* pyproject.toml (project.scripts)[pyhegp]: Switch to pyhegp.pyhegp:main.