Diffstat (limited to 'workflows')
-rw-r--r-- workflows/pull-data/genbank/README.md | 12
-rwxr-xr-x workflows/tools/pubseq-fetch-ids | 2
2 files changed, 11 insertions, 3 deletions
diff --git a/workflows/pull-data/genbank/README.md b/workflows/pull-data/genbank/README.md
index 5464d1d..188ff6f 100644
--- a/workflows/pull-data/genbank/README.md
+++ b/workflows/pull-data/genbank/README.md
@@ -11,7 +11,8 @@ The following workflow sends GenBank data into PubSeq
```sh
# --- get list of IDs already in PubSeq
-../../tools/sparql-fetch-ids > pubseq_ids.txt
+../../tools/pubseq-fetch-ids > pubseq_ids.txt
+
# --- get list of missing genbank IDs
python3 genbank-fetch-ids.py --skip pubseq_ids.txt > genbank_ids.txt
@@ -26,6 +27,13 @@ python3 ../../workflows/tools/normalize-yamlfa.py -s ~/tmp/yamlfa/state.json --s
```
+## Validate GenBank data
+
+To pull the data from PubSeq use the list of pubseq ids generated
+above.
+
+
+
# TODO
-- [ ] Add id for GenBank accession - i.e. how can we tell a record is from GenBank
+- [X] Add id for GenBank accession - i.e. how can we tell a record is from GenBank
diff --git a/workflows/tools/pubseq-fetch-ids b/workflows/tools/pubseq-fetch-ids
index 19b2d82..f5920ec 100755
--- a/workflows/tools/pubseq-fetch-ids
+++ b/workflows/tools/pubseq-fetch-ids
@@ -2,7 +2,7 @@
#
# Use a SPARQL query to fetch all IDs in the PubSeq database
#
-# sparql-fetch-ids > pubseq_ids.txt
+# pubseq-fetch-ids > pubseq_ids.txt
#
# Note: requires Ruby 3.x. Older Ruby gives a syntax error