Age | Commit message | Author
3 hours | Update r-mixed-model-gwas to 1.3.1. (HEAD, main) | Arun Isaac
7 hours | Add end-to-end tests for hsmice dataset. | Arun Isaac
* Add hsmice dataset wrangling and test scripts.
* Add G-expression script to run test.
* Depend on the guix-bioinformatics Guix channel for r-genio.
11 hours | Construct empty genotype frame using series. | Arun Isaac
11 hours | Roll cat_data_frames into cat_genotype. | Arun Isaac
cat_data_frames is no longer a special function used by both cat_genotype and cat_phenotype. Specialize it and roll it into cat_genotype.
11 hours | Catenate phenotype frames along the index. | Arun Isaac
12 hours | Split catenable phenotype frames along the index. | Arun Isaac
Phenotype frames are split by sample IDs. This corresponds to splitting along the index, unlike genotype frames which need to be split along the columns.
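The difference between the two split directions can be sketched with plain pandas. This is only an illustration under assumed frame layouts: `split_at` is a hypothetical helper, not the project's `split_data_frame`, and keeping sample IDs in the index of the phenotype frame is an assumption made for brevity.

```python
import pandas as pd


def split_at(frame, n, axis):
    """Hypothetical helper: cut frame in two after the first n labels
    along axis (0 = index, 1 = columns)."""
    if axis == 0:
        return frame.iloc[:n], frame.iloc[n:]
    return frame.iloc[:, :n], frame.iloc[:, n:]


# Assumed phenotype layout: one row per sample.
phenotype = pd.DataFrame({"height": [1.2, 3.4, 5.6]},
                         index=["sample1", "sample2", "sample3"])
# Assumed genotype layout: one row per SNP, one column per sample.
genotype = pd.DataFrame({"sample1": [0.0, 1.0],
                         "sample2": [2.0, 0.0],
                         "sample3": [1.0, 1.0]},
                        index=["snp1", "snp2"])

# Splitting by sample means rows for phenotypes, columns for genotypes.
pheno_a, pheno_b = split_at(phenotype, 2, axis=0)
geno_a, geno_b = split_at(genotype, 2, axis=1)
```

Catenating the parts back along the same axis (`pd.concat([pheno_a, pheno_b])` and `pd.concat([geno_a, geno_b], axis=1)`) recovers the original frames in this sketch.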
12 hours | Generalize split_data_frame to split along any axis. | Arun Isaac
13 hours | Simplify split_data_frame so it is more composable. | Arun Isaac
split_data_frame should only split the data frame. It should not be filtering out metadata columns.
29 hours | Generate unique SNPs in genotype frames without dropping duplicates. | Arun Isaac
Earlier, we were generating unique SNPs in genotype frames by dropping duplicates. This meant we couldn't control the number of SNPs. Rejection sampling is also not an option because it is too expensive. So, we now generate unique SNPs directly, by first generating a list with unique elements and then converting to a data frame.
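The idea of generating exactly n unique SNPs up front, rather than generating with repeats and dropping duplicates afterwards, can be sketched with the standard library. This is not the project's generator: `unique_snps`, its parameters and the chromosome names are hypothetical, and `random.sample` stands in for whatever the test helpers actually use.

```python
import random


def unique_snps(n, chromosomes=("1", "2", "X"), max_position=1000, seed=None):
    """Hypothetical sketch: return exactly n distinct (chromosome, position)
    pairs, with no deduplication step afterwards."""
    rng = random.Random(seed)
    # Sample n distinct integers, then map each one to a unique pair.
    # The mapping is one-to-one, so the pairs are distinct as well.
    indices = rng.sample(range(len(chromosomes) * max_position), n)
    return [(chromosomes[i // max_position], i % max_position + 1)
            for i in indices]


snps = unique_snps(50, seed=0)
```

Because uniqueness is guaranteed at the list stage, the number of rows is known before the list is converted to a data frame.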
34 hours | Deduplicate genotype frame metadata generation. | Arun Isaac
Abstract out generation of genotype frame metadata (namely chromosome, position and reference) from summaries and genotype_frames into a new helper function genotype_metadata.
37 hours | Drop SNPs with a zero standard deviation. | Arun Isaac
38 hours | Fix typo in comment: tha->that. | Arun Isaac
2 days | Avoid wildcard import from helpers.strategies. | Arun Isaac
2 days | Limit values in genotype and phenotype strategies. | Arun Isaac
2 days | Test that ciphertext does not contain NA values. | Arun Isaac
2 days | Parameterize number of samples in phenotype frame strategy. | Arun Isaac
2 days | Parameterize number of samples in genotype frame strategy. | Arun Isaac
2 days | Parameterize presence of reference column in genotype frame strategy. | Arun Isaac
2 days | Add keys strategy. | Arun Isaac
Add keys strategy, and use it.
2 days | Raise exception if data frame to be written has NA values. | Arun Isaac
This should never occur, but can occur due to bugs in the code; we wish to protect against that.
2 days | Add --force flag to encrypt subcommand permitting file overwriting. | Arun Isaac
3 days | Support encrypting phenotypes. | Arun Isaac
3 days | Compare complete frame in test_cat_*. | Arun Isaac
It is so much simpler and much more robust to simply compare expected and actual data frames.
3 days | Do not import unused settings from hypothesis. | Arun Isaac
3 days | Test cat_phenotype. | Arun Isaac
3 days | Add cat-phenotype subcommand. | Arun Isaac
4 days | Rename cat subcommand to cat-genotype. | Arun Isaac
A cat-phenotype subcommand is coming. Hence rename this.
4 days | Add is_phenotype_metadata_column. | Arun Isaac
Promote phenotype_reserved_column_name_p from helpers.strategies to is_phenotype_metadata_column in pyhegp.serialization.
4 days | Drop duplicates in generated test phenotype frames. | Arun Isaac
4 days | Set CI environment variable when building Guix package. | Arun Isaac
4 days | Merge, not concat, genotype frames. | Arun Isaac
pd.concat duplicates the metadata columns, and is generally the wrong approach to the problem.
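The duplication can be reproduced on two toy frames. The column names and values below are illustrative assumptions, and the `pd.merge` call is a generic inner join on the shared metadata columns, not necessarily the exact call used in the project.

```python
import pandas as pd

# Two hypothetical genotype frames sharing the same metadata columns.
a = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [10, 20],
                  "sample1": [0.0, 1.0]})
b = pd.DataFrame({"chromosome": ["1", "1"],
                  "position": [10, 20],
                  "sample2": [2.0, 0.0]})

# Column-wise concat keeps both copies of the metadata columns.
concatenated = pd.concat([a, b], axis=1)

# Merging on the metadata columns keeps a single copy of each.
merged = pd.merge(a, b, on=["chromosome", "position"])
```

Here `concatenated` has six columns, with `chromosome` and `position` each appearing twice, while `merged` has the four expected columns.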
4 days | Test cat_genotype. | Arun Isaac
Test cat_genotype extensively using hypothesis.
4 days | Add is_genotype_metadata_column. | Arun Isaac
Promote genotype_reserved_column_name_p from helpers.strategies to is_genotype_metadata_column in pyhegp.serialization, and use it everywhere.
4 days | Drop duplicates in generated test genotype frames. | Arun Isaac
4 days | Catenate an empty list of genotypes. | Arun Isaac
We handle this as a special case.
4 days | Move hypothesis strategies to separate file. | Arun Isaac
These strategies may be used by other test modules as well.
5 days | Add cat_genotype workhorse function. | Arun Isaac
Move workhorse logic of the cat command to a separate function. This will make it easy to test the logic without having to invoke the command itself.
5 days | Suffix CLI subcommand functions with _command. | Arun Isaac
We distinguish CLI subcommand functions using the _command suffix. This way, we don't have to concoct weird names for the actual workhorse functions. To remain consistent, we also suffix _command to the command testing functions.
5 days | Do not require output ciphertext file path. | Arun Isaac
Make output ciphertext file path implicit; infer it by appending ".hegp" to the plaintext file. We take inspiration from GnuPG.
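A minimal sketch of that inference with `pathlib` follows. The function name `ciphertext_path` is hypothetical; only the ".hegp" suffix comes from the commit message.

```python
from pathlib import Path


def ciphertext_path(plaintext):
    """Hypothetical helper: derive the output path from the plaintext path."""
    plaintext = Path(plaintext)
    # Append to the full file name instead of replacing the existing
    # extension, so genotype.tsv becomes genotype.tsv.hegp.
    return plaintext.with_name(plaintext.name + ".hegp")


output = ciphertext_path("data/genotype.tsv")
```

Appending rather than calling `with_suffix` keeps the original extension visible, which matches how GnuPG names its output files.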
5 days | Pass dtype to read_csv. | Arun Isaac
read_csv can incorrectly infer that the string "00" is the integer 0. To avoid this ambiguity, pass the correct dtype to read_csv.
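The ambiguity is easy to reproduce on a small in-memory TSV. The column names and dtypes below are illustrative assumptions, not the project's actual file schema.

```python
import io

import pandas as pd

# Hypothetical two-column TSV; the names are for illustration only.
tsv = "chromosome\tposition\n00\t7\n01\t9\n"

# Without a dtype, pandas infers integers and "00" is read as 0.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")

# With an explicit dtype, the string is preserved exactly as written.
typed = pd.read_csv(io.StringIO(tsv), sep="\t",
                    dtype={"chromosome": str, "position": int})
```

In this sketch `inferred["chromosome"]` holds the integers 0 and 1, while `typed["chromosome"]` holds the strings "00" and "01".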
5 days | Use open method of Path object, rather than the open function. | Arun Isaac
5 days | Do not skip blank lines when reading TSV files. | Arun Isaac
5 days | Decide to not use logging. | Arun Isaac
Remove comments mentioning logging. Command-line error messages have their own place; they are not the same as logging.
5 days | Test for existence of output files. | Arun Isaac
We were testing for zero exit status. Now, in addition, we test for the existence of output files. This is slightly more robust.
5 days | Title case sentence. | Arun Isaac
5 days | Add phenotype file format and serialization functions. | Arun Isaac
2025-08-08 | Clarify that the test suite is not for end users. | Arun Isaac
End users who install pyhegp via pip cannot run the test suite. Clarify this in the README. Perhaps, in the future, we should move these developer-oriented instructions to a separate document.
2025-08-08 | Separate table of contents from introduction. | Arun Isaac
If not separated, GitHub combines the table of contents with the list of papers in the introduction.
2025-08-08 | Add table of contents to README. | Arun Isaac
A table of contents gives people a brief overview of what's in the README, and allows them to jump to the section they are interested in.
2025-08-08 | Replace csv extension with tsv extension on genotype files. | Arun Isaac