| Age | Commit message | Author |
|---|---|---|
|  | Add keys strategy, and use it. | 
|  | It is so much simpler and much more robust to simply compare expected and actual data frames. |
|  | A cat-phenotype subcommand is coming. Hence rename this. | 
|  | Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization. |
|  | pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem. |
|  | Test cat_genotype extensively using hypothesis. | 
|  | Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere. |
|  | These strategies may be used by other test modules as well. | 
|  | We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions. |
|  | Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG. |
|  | We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust. |
|  | * pyhegp/pyhegp.py: Import reduce from functools.
(pool_summaries, encrypt_genotype): New functions.
(pool): Use pool_summaries.
(encrypt): Use encrypt_genotype.
* tests/test_pyhegp.py: Import pandas; Summary, read_summary and
read_genotype from pyhegp.serialization.
(test_pool, test_encrypt): New tests.
* test-data/encrypt-test-encrypted-genotype.tsv,
test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key,
test-data/encrypt-test-summary, test-data/pool-test-complete-summary,
test-data/pool-test-summary1, test-data/pool-test-summary2: New files. | 
|  | * doc/file-formats.md (File formats)[key file]: New section.
* pyhegp/serialization.py: Import numpy.
(read_key, write_key): New functions.
* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization.
(encrypt): Use write_key.
* tests/test_serialization.py: Import arrays and array_shapes from
hypothesis.extra.numpy; approx from pytest; read_key and write_key
from pyhegp.serialization.
(test_read_write_key_are_inverses): New test. | 
|  | * pyhegp/pyhegp.py (genotype_summary): New function.
(summary): Use genotype_summary.
(encrypt): Compute summary if not provided.
* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark. | 
|  | * README.md (How to use): Indent down into "Joint/federated analysis
with many data owners" section.
[Simple data sharing]: New section.
* doc/generate-images.sh: Add simple workflow.
* doc/workflow.png: Rename to doc/joint-workflow.png.
* doc/workflow.uml: Rename to doc/joint-workflow.uml.
* doc/simple-workflow.png, doc/simple-workflow.uml: New files.
* tests/test_pyhegp.py: Import pytest.
(test_simple_workflow): New test.
* test-data/genotype.tsv: New file. | 
|  | * tests/test_pyhegp.py: Import CliRunner from click.testing, and main
from pyhegp.pyhegp.
(test_joint_workflow): New test.
* test-data/genotype0.tsv, test-data/genotype1.tsv,
test-data/genotype2.tsv, test-data/genotype3.tsv: New files. | 
|  | * pyhegp/pyhegp.py: Import pandas.
(summary, pool, encrypt, cat): Use pandas data frames and new data
format.
* pyhegp/serialization.py: Import csv and pandas.
(Summary)[mean, std]: Delete fields.
[data]: New field.
(read_summary, write_summary, read_genotype, write_genotype): Use
pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames
from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not
import hypothesis.extra.numpy and approx from pytest.
(tabless_printable_ascii_text, chromosome_column, position_column,
reference_column, sample_names): New variables.
(summaries, genotype_reserved_column_name_p, genotype_frames): New
functions.
(test_read_write_summary_are_inverses): Use pandas data frames and new
data format.
(test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new
standard.
[genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas
from (gnu packages python-science).
(python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas. | 
|  | * tests/test_pyhegp.py (negate): Move to pyhegp.utils.
Import negate from pyhegp.utils.
* pyhegp/utils.py: New file. | 
|  | * tests/test_pyhegp.py (test_pool_stats): Set relative tolerance to
1e-6. | 
|  | * tests/test_serialization.py: Import read_genotype and write_genotype
from pyhegp.serialization.
(test_read_write_genotype_are_inverses): New test. | 
|  | * tests/test_pyhegp.py: Import math.
(square_matrices, negate, is_singular): New functions.
(test_conservation_of_solutions): New test. | 
|  | * pyhegp/pyhegp.py (hegp_encrypt, hegp_decrypt): Do not standardize or
unstandardize.
(encrypt): Standardize before calling hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses):
Do not pass mean and standard deviation for standardization and
unstandardization. | 
|  | * tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses):
Do not test encryption on order 1 matrices. | 
|  | * pyhegp/pyhegp.py (hegp_encrypt): Standardize before encryption.
(hegp_decrypt): Unstandardize after decryption.
(encrypt): Pass in mean and standard deviation from summary file to
hegp_encrypt.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses):
Pass in mean and standard deviation to hegp_encrypt. | 
|  | * pyhegp/pyhegp.py (standardize): Standardize using mean and standard
deviation, instead of the minor allele frequency.
(unstandardize): New function.
* tests/test_pyhegp.py: Import standardize and unstandardize from
pyhegp.pyhegp.
(no_column_zero_standard_deviation): New function.
(test_standardize_unstandardize_are_inverses): New test. | 
|  | * pyhegp/pyhegp.py: Import namedtuple from collections, and
read_summary from pyhegp.serialization.
(Stats): New type.
(pool_stats, pool): New functions.
* tests/test_pyhegp.py: Import Stats and pool_stats from
pyhegp.pyhegp.
(test_pool_stats): New test. | 
|  | * doc/file-formats.md, pyhegp/serialization.py,
tests/test_serialization.py: New files. | 
|  | It may be better to sample a smaller set of matrices finely than a
large set of matrices coarsely.
* tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses):
Use default array shapes testing encryption/decryption. | 
|  | * tests/test_pyhegp.py (test_hegp_encryption_decryption_are_inverses):
Reduce maximum matrix size to 100. | 
|  | * pyhegp/__init__.py: New file.
* pyhegp.py: Move to pyhegp/pyhegp.py.
* test_pyhegp.py: Move to tests/test_pyhegp.py. Import from
pyhegp.pyhegp instead of from pyhegp.
* pyproject.toml (project.scripts)[pyhegp]: Switch to
pyhegp.pyhegp:main. |