From d270d35cbfe9bc94d1bef16a63e3ca89e87e739e Mon Sep 17 00:00:00 2001
From: Arun Isaac
Date: Thu, 17 Jul 2025 17:59:17 +0100
Subject: Document usage instructions and workflow.

* doc/workflow.uml, doc/workflow.png, doc/generate-images.sh: New files.
* README.md (How to use): New section.
---
 README.md              |  22 ++++++++++++++++++++++
 doc/generate-images.sh |   3 +++
 doc/workflow.png       | Bin 0 -> 27569 bytes
 doc/workflow.uml       |  16 ++++++++++++++++
 4 files changed, 41 insertions(+)
 create mode 100755 doc/generate-images.sh
 create mode 100644 doc/workflow.png
 create mode 100644 doc/workflow.uml

diff --git a/README.md b/README.md
index 1396f0f..1130c7a 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,28 @@ Install the development version of pyhegp into the virtual environment.
 pip install git+https://github.com/encryption4genetics/pyhegp
 ```
 
+# How to use
+
+![Workflow](doc/workflow.png)
+
+Data owners generate summary statistics for their data.
+```
+pyhegp summary genotype.csv -o summary.txt
+```
+They share this with the data broker who pools it to compute the summary statistics of the complete dataset.
+```
+pyhegp pool -o complete-summary.txt summary1.txt summary2.txt ...
+```
+The data broker shares these summary statistics with the data owners. The data owners standardize their data using these summary statistics, and encrypt their data using a random key.
+```
+pyhegp encrypt -s complete-summary.txt -o encrypted-genotype.csv genotype.csv
+```
+Finally, the data owners share the encrypted data with the broker who concatenates it and shares it with all parties.
+```
+pyhegp cat -o complete-encrypted-genotype.csv encrypted-genotype1.csv encrypted-genotype2.csv ...
+```
+Note that all data sharing is carried out-of-band and is outside the scope of `pyhegp`.
+
 # Run tests
 
 Run the test suite using
diff --git a/doc/generate-images.sh b/doc/generate-images.sh
new file mode 100755
index 0000000..e1ee0ba
--- /dev/null
+++ b/doc/generate-images.sh
@@ -0,0 +1,3 @@
+#! /bin/sh
+
+cat workflow.uml | guix shell plantuml -- plantuml -p > workflow.png
diff --git a/doc/workflow.png b/doc/workflow.png
new file mode 100644
index 0000000..b2ff1b2
Binary files /dev/null and b/doc/workflow.png differ
diff --git a/doc/workflow.uml b/doc/workflow.uml
new file mode 100644
index 0000000..2d1542c
--- /dev/null
+++ b/doc/workflow.uml
@@ -0,0 +1,16 @@
+actor "Data Broker" as broker
+actor "Data Owner 1" as owner1
+actor "Data Owner 2" as owner2
+actor "Data Owner 3" as owner3
+owner1 -> broker: Send summary statistics
+owner2 -> broker: Send summary statistics
+owner3 -> broker: Send summary statistics
+broker --> owner1: Send pooled statistics
+broker --> owner2: Send pooled statistics
+broker --> owner3: Send pooled statistics
+owner1 -> broker: Encrypt and share ciphertext
+owner2 -> broker: Encrypt and share ciphertext
+owner3 -> broker: Encrypt and share ciphertext
+broker -> owner1: Share concatenated ciphertext
+broker -> owner2: Share concatenated ciphertext
+broker -> owner3: Share concatenated ciphertext
-- 
cgit v1.2.3
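
Note on the pooling step: the README text added by this patch says the broker pools per-owner summaries "to compute the summary statistics of the complete dataset", but the patch does not show which statistics `pyhegp summary` writes or how `pyhegp pool` combines them. The sketch below is only an illustration of the idea, assuming each summary holds a sample count plus a per-SNP mean and variance. The names `Summary`, `summarize` and `pool` are hypothetical and are not pyhegp's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Summary:
    """Hypothetical per-owner summary (not pyhegp's actual file format)."""
    n: int             # number of samples held by this owner
    mean: List[float]  # per-SNP mean
    var: List[float]   # per-SNP population variance (divided by n)


def summarize(genotype: List[List[float]]) -> Summary:
    """Compute per-SNP statistics for a samples x SNPs matrix."""
    n = len(genotype)
    cols = list(zip(*genotype))  # one tuple per SNP column
    mean = [sum(c) / n for c in cols]
    var = [sum((x - m) ** 2 for x in c) / n for c, m in zip(cols, mean)]
    return Summary(n, mean, var)


def pool(summaries: List[Summary]) -> Summary:
    """Combine per-owner summaries without seeing any raw genotypes."""
    n = sum(s.n for s in summaries)
    k = len(summaries[0].mean)
    # Pooled mean is the sample-count-weighted average of the owners' means.
    mean = [sum(s.n * s.mean[j] for s in summaries) / n for j in range(k)]
    # Pooled variance = within-owner variance + spread of the owners' means.
    var = [
        sum(s.n * (s.var[j] + (s.mean[j] - mean[j]) ** 2) for s in summaries) / n
        for j in range(k)
    ]
    return Summary(n, mean, var)
```

Under these assumptions, pooling the summaries of two owners gives the same mean and variance as summarizing their concatenated data directly, which is why the broker never needs the plaintext genotypes for this step.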