pyhegp is a Python library and CLI utility implementing homomorphic encryption of genotypes and phenotypes as described in - Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics - Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality
Table of contents
- Install development version
- Using pip
- Using Guix
- How to use
- Simple data sharing
- Joint/federated analysis with many data owners
- File formats
- Run tests
- License
Install development version
Using pip
Create a virtual environment (optional)
In a new directory, create a python virtual environment and activate it.
mkdir pyhegp
cd pyhegp
python3 -m venv .venv
source .venv/bin/activate
Install pyhegp
Install the development version of pyhegp. If you are in a virtual environment, pyhegp will be installed in it. If you skipped the previous step, pyhegp will be installed in your user install directory (that's typically in your home directory).
pip install git+https://github.com/encryption4genetics/pyhegp
Using Guix
Put the following into a channels.scm
file.
(use-modules (guix ci))
(list (channel
(name 'pyhegp)
(url "https://github.com/encryption4genetics/pyhegp")
(branch "main"))
(channel-with-substitutes-available %default-guix-channel
"https://ci.guix.gnu.org"))
Build a Guix profile using this channels.scm
and activate it.
guix pull -C channels.scm -p pyhegp-profile
source ./pyhegp-profile/etc/profile
Drop into a shell where pyhegp is installed.
guix shell pyhegp
Now, you can use pyhegp.
pyhegp --help
How to use
Simple data sharing
In this simple scenario, there is only one data owner and they wish to share their encrypted data with a researcher. The data owner encrypts their data with:
pyhegp encrypt -o encrypted-genotype.tsv genotype.tsv
They then send the encrypted data to the researcher. Note that data sharing is carried out-of-band and is outside the scope of pyhegp
.
Joint/federated analysis with many data owners
Data owners generate summary statistics for their data.
pyhegp summary genotype.tsv -o summary
They share this with the data broker who pools it to compute the summary statistics of the complete dataset.
pyhegp pool -o complete-summary summary1 summary2 ...
The data broker shares these summary statistics with the data owners. The data owners standardize their data using these summary statistics, and encrypt their data using a random key.
pyhegp encrypt -s complete-summary -o encrypted-genotype.tsv genotype.tsv
Finally, the data owners share the encrypted data with the broker who concatenates it and shares it with all parties.
pyhegp cat -o complete-encrypted-genotype.tsv encrypted-genotype1.tsv encrypted-genotype2.tsv ...
Note that all data sharing is carried out-of-band and is outside the scope of pyhegp
.
File formats
See File formats for documentation of file formats used by pyhegp.
Run tests
Run the test suite using
python3 -m pytest
The test suite is not meant to be run by end users. It is meant to be run by developers when hacking on the code.
License
pyhegp is free software released under the terms of the GNU General Public License, either version 3 of the License, or (at your option) any later version.