path: root/scripts/download_genbank_data
Age | Commit message | Author
2020-09-28 | new countries; updated genbank/sra scripts to manage more specimen sources | AndreaGuarracino
2020-09-28 | genbank and sra scripts more picky on the ontologies; added utils.py for shared functions | AndreaGuarracino
2020-09-04 | added in the sra script an option to include only a subset of ids | AndreaGuarracino
2020-09-04 | synchronized the create_sra_metadata.py script with the latest updates | AndreaGuarracino
2020-08-28 | added control (locally and in the validation) that sample_id has to be the same in the metadata and in the FASTA header #103 | AndreaGuarracino
2020-08-27 | updated dependency from clustalw to minimap2; the genbank script no longer creates YAML/FASTA pairs for too short sequences | AndreaGuarracino
2020-08-26 | added option in the genbank script to ignore (already validated) IDs; code cleaning; typos | AndreaGuarracino
2020-08-25 | the YAML/FASTA pair is not created for samples where at least one mandatory field is missing | AndreaGuarracino
2020-08-23 | genbank/sra scripts update to be more generic with the specimen sources | AndreaGuarracino
2020-08-22 | genbank/sra scripts updated to read the dictionaries in a more general way | AndreaGuarracino
2020-07-12 | added a suffix to distinguish which script created the error/warning files | AndreaGuarracino
2020-07-10 | an output file is created with the accessions for which no YAML file is created | AndreaGuarracino
2020-07-09 | fixed bug that led to invalid sample_sequencing_technology values | Andrea Guarracino
2020-07-07 | fix missing authors #91 | AndreaGuarracino
2020-07-07 | if the technology is not found, the YAML file is not created; managed longer species strings | AndreaGuarracino
2020-07-06 | added seq technology in its additional information field if the term is missing in the dicts | AndreaGuarracino
2020-07-03 | Improving genbank import workflow | Peter Amstutz
2020-06-22 | moved the genbank script into its own directory | AndreaGuarracino