From bcdb235949c06db07172b0c6355a0059436b86fb Mon Sep 17 00:00:00 2001
From: Arun Isaac
Date: Mon, 4 Aug 2025 12:52:39 +0100
Subject: Standardize file formats in the likeness of plink files.

* pyhegp/pyhegp.py: Import pandas.
(summary, pool, encrypt, cat): Use pandas data frames and new data
format.
* pyhegp/serialization.py: Import csv and pandas.
(Summary)[mean, std]: Delete fields.
[data]: New field.
(read_summary, write_summary, read_genotype, write_genotype): Use
pandas data frames and new data format.
* tests/test_serialization.py: Import column, columns and data_frames
from hypothesis.extra.pandas; pandas; negate from pyhegp.utils. Do not
import hypothesis.extra.numpy and approx from pytest.
(tabless_printable_ascii_text, chromosome_column, position_column,
reference_column, sample_names): New variables.
(summaries, genotype_reserved_column_name_p, genotype_frames): New
functions.
(test_read_write_summary_are_inverses): Use pandas data frames and new
data format.
(test_read_write_genotype_are_inverses): Use pandas for testing.
* doc/file-formats.md (File formats)[summary file]: Describe new
standard.
[genotype file]: New section.
* .guix/pyhegp-package.scm (pyhegp-package): Import python-pandas from
(gnu packages python-science).
(python-pyhegp)[propagated-inputs]: Add python-pandas.
* pyproject.toml (dependencies): Add pandas.
---
 pyproject.toml | 1 +
 1 file changed, 1 insertion(+)

(limited to 'pyproject.toml')

diff --git a/pyproject.toml b/pyproject.toml
index c0a6ab2..636a56d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -18,6 +18,7 @@ license = {file = "COPYING"}
 dependencies = [
     "click",
     "numpy",
+    "pandas",
     "scipy"
 ]
-- 
cgit 1.4.1