pyhegp is a Python library and CLI utility implementing homomorphic encryption of genotypes and phenotypes as described in:

- Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics
- Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality

Install development version

Using pip

Create a virtual environment (optional)

In a new directory, create a python virtual environment and activate it.

mkdir pyhegp
cd pyhegp
python3 -m venv .venv
source .venv/bin/activate
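To confirm that the virtual environment is active before installing, a quick check from Python can help. This is a generic sketch that uses only the standard library and is not part of pyhegp; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this reports False, pip will install outside the virtual environment, as described in the next step.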

Install pyhegp

Install the development version of pyhegp. If you are in a virtual environment, pyhegp will be installed in it. If you skipped the previous step, pyhegp will be installed in your user install directory (that's typically in your home directory).

pip install git+https://github.com/encryption4genetics/pyhegp

Using Guix

Put the following into a channels.scm file.

(use-modules (guix ci))

(list (channel
        (name 'pyhegp)
        (url "https://github.com/encryption4genetics/pyhegp")
        (branch "main"))
      (channel-with-substitutes-available %default-guix-channel
                                          "https://ci.guix.gnu.org"))

Build a Guix profile using this channels.scm and activate it.

guix pull -C channels.scm -p pyhegp-profile
source ./pyhegp-profile/etc/profile

Drop into a shell where pyhegp is installed.

guix shell pyhegp

Now, you can use pyhegp.

pyhegp --help

How to use

Simple data sharing

[Figure: Simple data sharing workflow]

In this simple scenario, there is only one data owner and they wish to share their encrypted data with a researcher. The data owner encrypts their data with:

pyhegp encrypt -o encrypted-genotype.tsv genotype.tsv

They then send the encrypted data to the researcher. Note that data sharing happens out-of-band and is outside the scope of pyhegp.
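To give an intuition for what encryption does here, the following is a minimal sketch of the idea from the papers cited above: the genotype matrix is multiplied by a random orthogonal key matrix. It is not pyhegp's implementation. The helper names, the toy data and the Gram-Schmidt key generation are all assumptions made for illustration.

```python
import random


def random_orthogonal(n, rng):
    """Return an n x n orthogonal matrix built by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # redraw in the very unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


rng = random.Random(0)

# Toy genotype matrix: 4 samples (rows) x 3 SNPs (columns).
genotype = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]

key = random_orthogonal(len(genotype), rng)  # the random key, one row per sample
encrypted = matmul(key, genotype)

# The SNP-by-SNP cross-products survive encryption because key^T key = I.
gram_plain = matmul(transpose(genotype), genotype)
gram_encrypted = matmul(transpose(encrypted), encrypted)
```

The individual entries of encrypted no longer resemble the original genotypes, yet gram_plain and gram_encrypted agree up to rounding error. That is why analyses built on such cross-products can run on the encrypted data.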

Joint/federated analysis with many data owners

[Figure: Joint/federated analysis workflow]

Data owners generate summary statistics for their data.

pyhegp summary genotype.tsv -o summary

They share this with the data broker who pools it to compute the summary statistics of the complete dataset.

pyhegp pool -o complete-summary summary1 summary2 ...
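Pooling works without the broker ever seeing the raw data. The sketch below shows one way to do it, assuming each summary holds the sample count plus a per-SNP mean and population variance. That content is an assumption made for illustration, as are the function names; the actual contents of pyhegp's summary files are described in the file format documentation.

```python
from statistics import fmean, pvariance


def summarize(rows):
    """Per-owner summary: sample count, per-SNP means, per-SNP population variances."""
    columns = list(zip(*rows))
    return len(rows), [fmean(c) for c in columns], [pvariance(c) for c in columns]


def pool(summaries):
    """Combine per-owner summaries into the statistics of the complete dataset."""
    total_n = sum(n for n, _, _ in summaries)
    n_snps = len(summaries[0][1])
    means, variances = [], []
    for j in range(n_snps):
        mean = sum(n * m[j] for n, m, _ in summaries) / total_n
        # Each owner's E[x^2] is variance + mean^2; pool it, then subtract
        # the square of the pooled mean to get the pooled variance.
        second_moment = sum(n * (v[j] + m[j] ** 2) for n, m, v in summaries) / total_n
        means.append(mean)
        variances.append(second_moment - mean ** 2)
    return total_n, means, variances


# Two toy data owners with 3 and 2 samples of 2 SNPs each.
owner1 = [[0.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
owner2 = [[2.0, 0.0], [0.0, 0.0]]

pooled = pool([summarize(owner1), summarize(owner2)])
direct = summarize(owner1 + owner2)  # what one would get with access to all raw data
```

Here pooled matches direct up to rounding error, even though pool only ever sees the per-owner summaries.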

The data broker shares these summary statistics with the data owners. The data owners standardize their data using these summary statistics, and encrypt their data using a random key.

pyhegp encrypt -s complete-summary -o encrypted-genotype.tsv genotype.tsv
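The standardization step can be pictured as follows. This is only a sketch: it assumes standardization means centring and scaling each SNP with the pooled mean and standard deviation, which may differ in detail from what pyhegp does. The function name and toy data are illustrative.

```python
from statistics import fmean, pstdev


def standardize(rows, means, stds):
    """Centre and scale each SNP column with the supplied (pooled) statistics."""
    # A SNP with zero standard deviation would need special handling;
    # this sketch does not attempt it.
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


owner1 = [[0.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
owner2 = [[2.0, 0.0], [0.0, 0.0]]

# Stand-in for the pooled summary received from the broker. It is computed
# from the combined data here only to keep the example short.
columns = list(zip(*(owner1 + owner2)))
means = [fmean(c) for c in columns]
stds = [pstdev(c) for c in columns]

# Each owner standardizes its own data independently, using the same pooled values.
scaled = standardize(owner1, means, stds) + standardize(owner2, means, stds)
```

Because every owner uses the same pooled statistics, each SNP in the combined standardized data ends up with mean 0 and variance 1, which would not generally hold if each owner standardized with its own local statistics.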

Finally, the data owners share the encrypted data with the broker who concatenates it and shares it with all parties.

pyhegp cat -o complete-encrypted-genotype.tsv encrypted-genotype1.tsv encrypted-genotype2.tsv ...

Note that all data sharing happens out-of-band and is outside the scope of pyhegp.
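Concatenating data encrypted under different keys still makes sense, as the following sketch suggests. It assumes, following the cited papers, that each owner's key is a random orthogonal matrix; the helper names and toy data are illustrative and this is not pyhegp's code.

```python
import random


def random_orthogonal(n, rng):
    """Return an n x n orthogonal matrix built by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # redraw in the very unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gram(m):
    """SNP-by-SNP cross-product matrix m^T m."""
    return matmul([list(col) for col in zip(*m)], m)


rng = random.Random(1)
owner1 = [[0.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
owner2 = [[2.0, 0.0], [0.0, 0.0]]

# Each owner draws its own key and never reveals it.
encrypted1 = matmul(random_orthogonal(len(owner1), rng), owner1)
encrypted2 = matmul(random_orthogonal(len(owner2), rng), owner2)

# The broker stacks the encrypted rows, as in the cat step.
combined_encrypted = encrypted1 + encrypted2
combined_plain = owner1 + owner2
```

The cross-product matrices gram(combined_encrypted) and gram(combined_plain) agree up to rounding error, because stacking independently keyed blocks is equivalent to applying one block-diagonal orthogonal key to the stacked plain data.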

File formats

See File formats for documentation of file formats used by pyhegp.

Run tests

Run the test suite using:

python3 -m pytest

The test suite is not meant to be run by end users. It is meant to be run by developers when hacking on the code.

License

pyhegp is free software released under the terms of the GNU General Public License, either version 3 of the License, or (at your option) any later version.