Age | Commit message | Author |
|
|
* Add hsmice dataset wrangling and test scripts.
* Add G-expression script to run test.
* Depend on the guix-bioinformatics Guix channel for r-genio.
|
|
cat_data_frames is no longer a special function used by both
cat_genotype and cat_phenotype. Specialize it and roll it into
cat_genotype.
|
|
Phenotype frames are split by sample IDs. This corresponds to
splitting along the index, unlike genotype frames, which need to be
split along the columns.
|
|
split_data_frame should only split the data frame. It should not
also filter out metadata columns.
|
|
Earlier, we were generating unique SNPs in genotype frames by dropping
duplicates. This meant we couldn't control the number of SNPs.
Rejection sampling is also not an option because it is too expensive.
So, we now generate unique SNPs directly, by first generating a list
with unique elements and then converting to a data frame.
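The idea of generating unique SNPs directly can be sketched in plain
Python. This is only an illustration, not the test suite's code: the
function name unique_snps, the chromosome labels and the position
range are all assumptions. It draws distinct integers and decodes each
one into a (chromosome, position) pair, so the result always has
exactly the requested number of SNPs.

```python
import random


def unique_snps(n, chromosomes=("1", "2", "X"), max_position=1_000_000, seed=None):
    """Return exactly n distinct (chromosome, position) pairs.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    total = len(chromosomes) * max_position
    if n > total:
        raise ValueError("cannot draw more unique SNPs than exist")
    # random.sample never repeats an element, so the codes are distinct.
    codes = rng.sample(range(total), n)
    # Each code maps to exactly one pair, so the pairs are distinct too.
    return [(chromosomes[code // max_position], code % max_position + 1)
            for code in codes]


snps = unique_snps(5, seed=0)
```

The resulting list can then be converted to a data frame, for example
with pd.DataFrame(snps, columns=["chromosome", "position"]). No rows
are dropped afterwards, so the SNP count stays under control.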
|
|
Abstract out generation of genotype frame metadata (namely chromosome,
position and reference) from summaries and genotype_frames into a
new helper function genotype_metadata.
|
|
Add keys strategy, and use it.
|
|
This should never occur, but can occur due to bugs in the code; we
wish to protect against that.
|
|
It is much simpler and more robust to compare the expected and
actual data frames directly.
|
|
A cat-phenotype subcommand is coming. Hence rename this.
|
|
Promote phenotype_reserved_column_name_p from helpers.strategies to
is_phenotype_metadata_column in pyhegp.serialization.
|
|
pd.concat duplicates the metadata columns, and is generally the wrong
approach to the problem.
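A small sketch of the duplication, assuming the frames are joined
column-wise and share the metadata columns chromosome and position
(the frame contents and sample column names are made up). The merge
shown is one possible alternative, not necessarily the approach this
commit adopted.

```python
import pandas as pd

a = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [10, 20],
                  "sample1": [0.0, 1.0]})
b = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [10, 20],
                  "sample2": [2.0, 1.0]})

# Column-wise concatenation keeps both copies of the metadata columns.
concatenated = pd.concat([a, b], axis=1)

# Merging on the metadata columns keeps a single copy of each.
merged = a.merge(b, on=["chromosome", "position"])
```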
|
|
Test cat_genotype extensively using hypothesis.
|
|
Promote genotype_reserved_column_name_p from helpers.strategies to
is_genotype_metadata_column in pyhegp.serialization, and use it
everywhere.
|
|
We handle this as a special case.
|
|
These strategies may be used by other test modules as well.
|
|
Move workhorse logic of the cat command to a separate function. This
will make it easy to test the logic without having to invoke the
command itself.
|
|
We distinguish CLI subcommand functions using the _command suffix.
This way, we don't have to concoct weird names for the actual
workhorse functions.
To remain consistent, we also add the _command suffix to the
command-testing functions.
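A minimal sketch of this naming convention, using argparse and plain
lists of lines instead of the project's real CLI framework and data
frames. Both function bodies are hypothetical. Only the split between
a workhorse function and a _command wrapper is the point.

```python
import argparse


def cat(lines_per_file):
    """Workhorse: concatenate lists of lines. No I/O, so it is easy to test."""
    return [line for lines in lines_per_file for line in lines]


def cat_command(argv):
    """CLI wrapper: parse arguments, read files, call the workhorse, print."""
    parser = argparse.ArgumentParser(prog="cat")
    parser.add_argument("files", nargs="+")
    args = parser.parse_args(argv)
    contents = []
    for path in args.files:
        with open(path) as file:
            contents.append(file.read().splitlines())
    for line in cat(contents):
        print(line)


def test_cat():
    # The workhorse is tested without invoking the command.
    assert cat([["a"], ["b", "c"]]) == ["a", "b", "c"]
```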
|
|
Make the output ciphertext file path implicit; infer it by appending
".hegp" to the plaintext file path. We take inspiration from GnuPG.
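A sketch of the path inference using pathlib. The helper name
ciphertext_path is an assumption made for illustration. The extension
is appended to the full file name rather than replacing the existing
suffix.

```python
from pathlib import Path


def ciphertext_path(plaintext_path):
    """Hypothetical helper: append ".hegp" to the plaintext file name."""
    plaintext_path = Path(plaintext_path)
    # with_suffix would replace an existing extension such as ".tsv";
    # appending to the name keeps it.
    return plaintext_path.with_name(plaintext_path.name + ".hegp")
```

For example, ciphertext_path("data/genotype.tsv") gives
data/genotype.tsv.hegp.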
|
|
read_csv can incorrectly infer that the string "00" is the integer 0.
To avoid this ambiguity, pass the correct dtype to read_csv.
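The inference problem can be reproduced with a two-row table. The
column names and values below are made up for illustration, and which
column was affected in practice is an assumption.

```python
import io

import pandas as pd

text = "chromosome\tposition\n00\t7\n01\t9\n"

# Without a dtype, pandas infers integers and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(text), sep="\t")

# With an explicit dtype, the column is kept as strings.
explicit = pd.read_csv(io.StringIO(text), sep="\t",
                       dtype={"chromosome": str})
```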
|
|
Remove comments mentioning logging.
Command-line error messages have their own place; they are not the
same as logging.
|
|
We were testing for zero exit status. Now, in addition, we test for
the existence of output files. This is slightly more robust.
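A sketch of such a check using subprocess. The helper run_and_check
and the stand-in child process are assumptions. The actual tests may
invoke the commands differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_check(command, expected_outputs):
    """Run command; check the exit status and that each output file exists."""
    result = subprocess.run(command, capture_output=True, text=True)
    assert result.returncode == 0, result.stderr
    for path in expected_outputs:
        assert Path(path).exists(), f"missing output file: {path}"


# Stand-in for a real CLI invocation: a child Python process that
# creates the file named by its first argument.
WRITE_FILE = "import pathlib, sys; pathlib.Path(sys.argv[1]).write_text('ok')"

with tempfile.TemporaryDirectory() as directory:
    output = Path(directory) / "output.txt"
    run_and_check([sys.executable, "-c", WRITE_FILE, str(output)], [output])
```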
|
|
End users who install pyhegp via pip cannot run the test suite.
Clarify this in the README. Perhaps, in the future, we should move
these developer-oriented instructions to a separate document.
|
|
If not separated, GitHub combines the table of contents with the list
of papers in the introduction.
|
|
A table of contents gives people a brief overview of what's in the
README, and allows them to jump to the section they are interested in.