Age | Commit message (Collapse) | Author |
|
the date is now handled more formally (YYYY-MM-DD)
|
|
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
|
|
all the dates are saved as "YYYY-MM-DD"
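A minimal sketch of how raw dates might be normalized to "YYYY-MM-DD". The function name normalize_date and the list of accepted input formats are assumptions for illustration; the log does not say which formats the script actually handles.

```python
from datetime import datetime

# Hypothetical list of input formats; the real script may accept others.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%d %B %Y")


def normalize_date(raw):
    """Return raw as 'YYYY-MM-DD', or None if no candidate format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None
```

Returning None for unparseable values lets the caller decide whether to skip the record or report it, rather than storing a malformed date.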
|
|
swab
|
|
makes sense. This allows us to have multiple values where it makes sense
|
|
- additional_submitter_information for information not equal to name or address
- added another check for coverage
|
|
- the script checks for country and specimen_source
- now the missing terms are written to a TSV file
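As a rough illustration of the TSV report mentioned above, the sketch below writes (field, term) pairs for values that had no dictionary match. The function name write_missing_terms and the two-column layout are assumptions; the log does not describe the actual file layout.

```python
import csv


def write_missing_terms(missing, handle):
    """Write (field, term) pairs to an open text handle as tab-separated rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["field", "term"])  # assumed header row
    for field, term in sorted(missing):
        writer.writerow([field, term])
```

Passing an open handle rather than a path keeps the helper easy to test with an in-memory buffer.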
|
|
|
|
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
|
|
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
|
|
|
|
|
|
|
|
- the script is now gentler on the server: it requests metadata in batches, which also reduces the overall execution time;
- fields are now created in the YAML files for sample_sequencing_technology, sample_sequencing_technology2, sample_sequencing_technology3, specimen_source, and specimen_source2;
- in sequencing_coverage, markers such as 'x' and 'X' are stripped and ',' is replaced by '.';
- the script uses the dictionaries in /scripts/dict_ontology_standardization; so far ncbi_specesman_source.csv, ncbi_sequencing_technology.csv, and ncbi_countries.csv are used;
- 'Oxford Nanopore' and 'MinION Oxford Nanopore' were added to ncbi_sequencing_technology.csv;
- for specimen_source, when the value is one of 'NP/OP swab', 'nasopharyngeal and oropharyngeal swab', 'nasopharyngeal/oropharyngeal swab', or 'np/np swab', both swab types are recorded.
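The coverage cleanup and the combined-swab handling described above could look roughly like the sketch below. The function names and the two output labels are placeholders chosen for illustration; the actual script presumably resolves values through the CSV dictionaries rather than returning plain strings.

```python
import re

# Labels from the commit message that denote a combined swab (compared lowercased).
COMBINED_SWAB_LABELS = {
    "np/op swab",
    "nasopharyngeal and oropharyngeal swab",
    "nasopharyngeal/oropharyngeal swab",
    "np/np swab",
}


def normalize_coverage(raw):
    """Replace a decimal comma with a dot, then strip 'x'/'X' and whitespace."""
    text = raw.replace(",", ".")
    return re.sub(r"[xX\s]", "", text)


def specimen_sources(raw):
    """Return one source, or two when the label denotes a combined swab."""
    label = raw.strip()
    if label.lower() in COMBINED_SWAB_LABELS:
        # Placeholder labels; these would fill specimen_source and specimen_source2.
        return ["nasopharyngeal swab", "oropharyngeal swab"]
    return [label]
```

Returning a list keeps the single-value and two-value cases uniform, so the caller can fill specimen_source and specimen_source2 by position.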
|
|
scripts/from_genbank_to_fasta_and_yaml.py
|