[CI status](https://ci.systemreboot.net/jobs/pyhegp)
pyhegp is a Python library and CLI utility implementing homomorphic encryption of genotypes and phenotypes, as described in the following papers:
- [Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics](https://academic.oup.com/genetics/article/215/2/359/5930450)
- [Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality](https://academic.oup.com/genetics/article/226/3/iyad210/7470728)
# Install development version
## Using pip
### Create a virtual environment (optional)
In a new directory, create a Python virtual environment and activate it.
```
mkdir pyhegp
cd pyhegp
python3 -m venv .venv
source .venv/bin/activate
```
### Install pyhegp
Install the development version of pyhegp. If you are in a virtual environment, pyhegp will be installed into it. If you skipped the previous step, pip will typically install pyhegp into your user install directory (usually somewhere in your home directory).
```
pip install git+https://github.com/encryption4genetics/pyhegp
```
## Using Guix
Put the following into a `channels.scm` file.
```scheme
(use-modules (guix ci))

(list (channel
        (name 'pyhegp)
        (url "https://github.com/encryption4genetics/pyhegp")
        (branch "main"))
      (channel-with-substitutes-available %default-guix-channel
                                          "https://ci.guix.gnu.org"))
```
Build a Guix profile using this `channels.scm` and activate it.
```
guix pull -C channels.scm -p pyhegp-profile
source ./pyhegp-profile/etc/profile
```
Drop into a shell where pyhegp is installed.
```
guix shell pyhegp
```
Now, you can use pyhegp.
```
pyhegp --help
```
# How to use
## Simple data sharing

In this simple scenario, there is only one data owner and they wish to share their encrypted data with a researcher. The data owner encrypts their data with:
```
pyhegp encrypt -o encrypted-genotype.tsv genotype.tsv
```
They then send the encrypted data to the researcher. Note that data sharing happens out-of-band and is outside the scope of `pyhegp`.
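To give a feel for what encryption achieves, here is a minimal NumPy sketch of the idea in the papers above, as we understand it: the data owner multiplies the genotype matrix (samples × SNPs) and the phenotype vector on the left by the same secret random orthogonal matrix `K`. Because `K^T K = I`, cross-products such as `G^T G` and `G^T y`, and therefore least-squares estimates, are unchanged. This is an illustration only, not pyhegp's implementation; the simulated data, the QR-based key generation and the helper names `random_orthogonal` and `ols` are our own assumptions.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Hypothetical key generator: an n x n random orthogonal matrix.

    QR-decompose a Gaussian matrix, then flip column signs so the result
    is uniformly (Haar) distributed over orthogonal matrices.
    """
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def ols(x, y):
    """Ordinary least-squares coefficients of y on the columns of x."""
    return np.linalg.lstsq(x, y, rcond=None)[0]


rng = np.random.default_rng(0)
n_samples, n_snps = 50, 5

# Simulated (not real) data: 0/1/2 allele dosages and a linear phenotype
genotype = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
phenotype = genotype @ rng.standard_normal(n_snps) + rng.standard_normal(n_samples)

# The data owner keeps the key secret and shares only the products
key = random_orthogonal(n_samples, rng)
encrypted_genotype = key @ genotype
encrypted_phenotype = key @ phenotype

# Rows are scrambled, but the regression estimates are preserved
print(np.allclose(ols(encrypted_genotype, encrypted_phenotype),
                  ols(genotype, phenotype)))
```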
## Joint/federated analysis with many data owners

Data owners generate summary statistics for their data.
```
pyhegp summary genotype.csv -o summary.txt
```
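The exact contents and file format of `summary.txt` are defined by pyhegp. As a hypothetical illustration, assume a summary holds the sample count and, for each SNP, the mean and sample standard deviation of the genotype dosages. A data owner could compute that with the standard library alone (`summarize` is our own name, not part of pyhegp):

```python
import statistics


def summarize(genotype):
    """Hypothetical per-SNP summary for one data owner.

    genotype: list of rows, one per sample, each a list of SNP dosages.
    Returns one (count, mean, sample standard deviation) triple per SNP.
    """
    n = len(genotype)
    return [(n, statistics.mean(column), statistics.stdev(column))
            for column in zip(*genotype)]


# Hypothetical data: 4 samples x 2 SNPs
genotype = [[0, 2],
            [1, 1],
            [2, 0],
            [2, 1]]

for count, mean, sd in summarize(genotype):
    print(count, mean, round(sd, 3))
```

Only aggregates leave the data owner; no individual row appears in the output.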
They share these summaries with the data broker, who pools them to compute the summary statistics of the complete dataset.
```
pyhegp pool -o complete-summary.txt summary1.txt summary2.txt ...
```
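Pooling can be exact even though no owner reveals individual genotypes. Under the same hypothetical (count, mean, standard deviation) summary, the pooled mean is the count-weighted mean of the owners' means, and the pooled variance adds the within-owner sums of squares to a between-owner term for each owner's mean shift. The sketch below (a hypothetical `pool` helper, not pyhegp's code) checks this against statistics computed directly on the concatenated data:

```python
import math
import statistics


def pool(summaries):
    """Combine per-owner (count, mean, std) summaries of one SNP.

    Returns the (count, mean, std) of all owners' data taken together,
    using sample (n - 1) standard deviations throughout.
    """
    total = sum(n for n, _, _ in summaries)
    mean = sum(n * m for n, m, _ in summaries) / total
    # Within-owner sum of squares plus each owner's shift from the pooled mean
    ss = sum((n - 1) * s ** 2 + n * (m - mean) ** 2 for n, m, s in summaries)
    return total, mean, math.sqrt(ss / (total - 1))


def summary(values):
    """(count, mean, sample std) of one SNP for one owner."""
    return len(values), statistics.mean(values), statistics.stdev(values)


# Hypothetical dosages of one SNP held by two data owners
owner1 = [0, 1, 2, 2]
owner2 = [1, 1, 0]

pooled = pool([summary(owner1), summary(owner2)])
direct = summary(owner1 + owner2)  # only possible if the data were shared
print(pooled, direct)
```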
The data broker shares these pooled summary statistics with the data owners. Each data owner standardizes their data using these summary statistics and encrypts it using a random key.
```
pyhegp encrypt -s complete-summary.txt -o encrypted-genotype.csv genotype.csv
```
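A minimal NumPy sketch of this step, assuming the pooled summary provides a per-SNP mean and standard deviation (the numbers and helper names below are hypothetical, not pyhegp's). Using the pooled rather than local statistics means every owner centers and scales the same way. Standardization has to happen before encryption: multiplying by a random orthogonal key mixes the rows and generally changes column means, so centering the encrypted data would not give the same result.

```python
import numpy as np


def standardize(genotype, mean, std):
    """Center and scale each SNP column using the pooled statistics."""
    return (genotype - mean) / std


def random_orthogonal(n, rng):
    """Hypothetical key generator: Haar-random orthogonal n x n matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


# Hypothetical pooled summary received from the broker (3 SNPs)
pooled_mean = np.array([1.0, 0.8, 1.2])
pooled_std = np.array([0.8, 0.7, 0.9])

# This owner's genotypes: 4 samples x 3 SNPs
genotype = np.array([[0, 1, 2],
                     [1, 1, 1],
                     [2, 0, 2],
                     [1, 2, 0]], dtype=float)

standardized = standardize(genotype, pooled_mean, pooled_std)

# Each owner draws a fresh key and never shares it
key = random_orthogonal(genotype.shape[0], np.random.default_rng())
encrypted = key @ standardized
```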
Finally, the data owners share their encrypted data with the broker, who concatenates it and shares the result with all parties.
```
pyhegp cat -o complete-encrypted-genotype.csv encrypted-genotype1.csv encrypted-genotype2.csv ...
```
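Concatenating independently encrypted blocks is safe for downstream linear analyses because stacking rows encrypted with keys `K1` and `K2` is the same as encrypting the stacked plaintext with the block-diagonal matrix `diag(K1, K2)`, which is itself orthogonal. A small NumPy check of this, assuming all owners hold the same SNPs in the same column order, using simulated standardized data and the same hypothetical QR-based key generator as above:

```python
import numpy as np


def random_orthogonal(n, rng):
    """Hypothetical key generator: Haar-random orthogonal n x n matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def block_diag(a, b):
    """Place square matrices a and b on the diagonal of a zero matrix."""
    size = a.shape[0] + b.shape[0]
    out = np.zeros((size, size))
    out[:a.shape[0], :a.shape[0]] = a
    out[a.shape[0]:, a.shape[0]:] = b
    return out


rng = np.random.default_rng(1)
data1 = rng.standard_normal((6, 3))  # owner 1: 6 samples, 3 SNPs
data2 = rng.standard_normal((4, 3))  # owner 2: 4 samples, same 3 SNPs
key1 = random_orthogonal(6, rng)
key2 = random_orthogonal(4, rng)

# What the broker's row-wise concatenation produces
combined = np.vstack([key1 @ data1, key2 @ data2])

# Equivalent to encrypting all plaintext at once with a block-diagonal key
joint_key = block_diag(key1, key2)
print(np.allclose(combined, joint_key @ np.vstack([data1, data2])))
```

No party ever needs to know the joint key; it exists only implicitly through the concatenation.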
Note that all data sharing happens out-of-band and is outside the scope of `pyhegp`.
# Run tests
Run the test suite using
```
python3 -m pytest
```
# License
pyhegp is free software released under the terms of the [GNU General Public License](https://www.gnu.org/licenses/gpl.html), either version 3 of the License, or (at your option) any later version.