| Age | Commit message | Author |
|---|---|---|
|  | Add keys strategy, and use it. | 
|  | This should never occur, but can occur due to bugs in the code; we wish to protect against that. |
|  | It is so much simpler and much more robust to simply compare expected and actual data frames. |
|  | A cat-phenotype subcommand is coming. Hence rename this. | 
|  | Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization. |
|  | pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem. |
|  | Test cat_genotype extensively using hypothesis. | 
|  | Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere. |
|  | We handle this as a special case. | 
|  | These strategies may be used by other test modules as well. | 
|  | Move workhorse logic of the cat command to a separate function. This will make it easy to test the logic without having to invoke the command itself. |
|  | We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions. |
|  | Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG. |
|  | read_csv can incorrectly infer that the string "00" is the integer 0. To avoid this ambiguity, pass the correct dtype to read_csv. |
|  | Remove comments mentioning logging. Command-line error messages have their own place; they are not the same as logging. |
|  | We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust. |
|  | End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document. |
|  | If not separated, GitHub combines the table of contents with the list of papers in the introduction. |
|  | A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in. |
|  | Readers are more likely to follow through to the file formats documentation if there is a link. |
|  | Reducing precision lowers the file size and makes the files more human-comprehensible. |
|  | Not everyone may want to create a virtual environment. For example, on some HPC machines, creating a virtual environment is complicated or does not work. |
|  | We have not exposed a Python library interface, and it is not clear if we need to. We can revisit this decision later, if need be. |
|  | * pyhegp/pyhegp.py: Import reduce from functools. (pool_summaries, encrypt_genotype): New functions. (pool): Use pool_summaries. (encrypt): Use encrypt_genotype.<br>* tests/test_pyhegp.py: Import pandas; Summary, read_summary and read_genotype from pyhegp.serialization. (test_pool, test_encrypt): New tests.<br>* test-data/encrypt-test-encrypted-genotype.tsv, test-data/encrypt-test-genotype.tsv, test-data/encrypt-test-key, test-data/encrypt-test-summary, test-data/pool-test-complete-summary, test-data/pool-test-summary1, test-data/pool-test-summary2: New files. |
|  | * doc/file-formats.md (File formats)[key file]: New section.<br>* pyhegp/serialization.py: Import numpy. (read_key, write_key): New functions.<br>* pyhegp/pyhegp.py: Import write_key from pyhegp.serialization. (encrypt): Use write_key.<br>* tests/test_serialization.py: Import arrays and array_shapes from hypothesis.extra.numpy; approx from pytest; read_key and write_key from pyhegp.serialization. (test_read_write_key_are_inverses): New test. |
|  | * pyhegp/pyhegp.py (genotype_summary): New function. (summary): Use genotype_summary. (encrypt): Compute summary if not provided.<br>* tests/test_pyhegp.py (test_simple_workflow): Remove xfail mark. |
|  | * README.md (How to use): Indent down into "Joint/federated analysis with many data owners" section. [Simple data sharing]: New section.<br>* doc/generate-images.sh: Add simple workflow.<br>* doc/workflow.png: Rename to doc/joint-workflow.png.<br>* doc/workflow.uml: Rename to doc/joint-workflow.uml.<br>* doc/simple-workflow.png, doc/simple-workflow.uml: New files.<br>* tests/test_pyhegp.py: Import pytest. (test_simple_workflow): New test.<br>* test-data/genotype.tsv: New file. |
|  | * tests/test_pyhegp.py: Import CliRunner from click.testing, and main from pyhegp.pyhegp. (test_joint_workflow): New test.<br>* test-data/genotype0.tsv, test-data/genotype1.tsv, test-data/genotype2.tsv, test-data/genotype3.tsv: New files. |
|  | * pyhegp/pyhegp.py: Import pandas. (summary, pool, encrypt, cat): Use pandas data frames and new data format.<br>* pyhegp/serialization.py: Import csv and pandas. (Summary)[mean, std]: Delete fields. [data]: New field. (read_summary, write_summary, read_genotype, write_genotype): Use pandas data frames and new data format.<br>* tests/test_serialization.py: Import column, columns and data_frames from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not import hypothesis.extra.numpy and approx from pytest. (tabless_printable_ascii_text, chromosome_column, position_column, reference_column, sample_names): New variables. (summaries, genotype_reserved_column_name_p, genotype_frames): New functions. (test_read_write_summary_are_inverses): Use pandas data frames and new data format. (test_read_write_genotype_are_inverses): Use pandas for testing.<br>* doc/file-formats.md (File formats)[summary file]: Describe new standard. [genotype file]: New section.<br>* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from (gnu packages python-science). (python-pyhegp)[propagated-inputs]: Add python-pandas.<br>* pyproject.toml (dependencies): Add pandas. |
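
One entry in the log above notes that read_csv can read the string "00" as the integer 0, and that passing an explicit dtype avoids this. The following is a minimal sketch of that behaviour, not the pyhegp code itself: the column names and the inline TSV data are hypothetical and chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column TSV; the names are illustrative and not
# taken from the pyhegp file format.
tsv = "chromosome\tposition\n00\t42\n"

# Without a dtype, pandas infers an integer column, so "00" becomes 0.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")

# With an explicit dtype, the chromosome label is kept as a string.
explicit = pd.read_csv(io.StringIO(tsv), sep="\t",
                       dtype={"chromosome": str})

print(repr(inferred["chromosome"].iloc[0]))
print(repr(explicit["chromosome"].iloc[0]))
```

Only the ambiguous column is given a dtype here; the position column is left to inference since an integer is the intended reading.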