bh20-seq-resource - Tool to upload SARS-CoV-2 sequences to BH20 Arvados instance and orchestrate analysis

Age	Commit message	Author
2020-05-31	Added new species and specimen sources	Andrea Guarracino

2020-05-31	Updated the host_sex and host_age management	Andrea Guarracino

2020-05-31	The NCBI Virus entries are updated automatically	Andrea Guarracino

2020-04-30	fixed UO_0000036 for year	Andrea Guarracino

2020-04-30	Merge pull request #41 from AndreaGuarracino/patch-14	LLTommy
	the date is now handled more formally (YYYY-MM-DD)
2020-04-30	Wrap import script to run as a workflow	Peter Amstutz
	Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
2020-04-29	the date is now handled more formally	Andrea Guarracino
	all dates are saved as "YYYY-MM-DD"
2020-04-28	updated to manage list fields and added new control on nasopharyngeal/throat swab	Andrea Guarracino
2020-04-28	Changes to the structure - we use lists now instead of strings where it makes sense. This allows us to have multiple values where it makes sense	lltommy
2020-04-23	code cleaning, refactoring, submitter name and address	Andrea Guarracino
	- additional_submitter_information for information not equal to name or address
	- added another check for coverage
2020-04-22	code cleaning, checking and writing missing terms to file	Andrea Guarracino
	- the script checks for country and specimen_source
	- the missing terms are now written to a TSV file
2020-04-22	Small changes all around, trying to make the importer/metadata better	lltommy

2020-04-21	Tweak handling of "coverage" also fix typo	Peter Amstutz
	Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
2020-04-21	Working on NCBI import	Peter Amstutz
	Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
2020-04-21	Updated shex and mandatory fields and stuff	lltommy

2020-04-19	added 'np/op' control for specimen_source	Andrea Guarracino

2020-04-19	fixed missing variable and managed comma in dicts	Andrea Guarracino

2020-04-18	new script release	Andrea Guarracino
	- the script is now gentler with the server: it requests metadata in batches, which reduces the overall execution time
	- the YAML files now include fields for sample_sequencing_technology, sample_sequencing_technology2, sample_sequencing_technology3, specimen_source, and specimen_source2
	- in sequencing_coverage, characters such as 'x' and 'X' are stripped, and ',' is replaced by '.'
	- the script uses the dictionaries in /scripts/dict_ontology_standardization; so far ncbi_specesman_source.csv, ncbi_sequencing_technology.csv, and ncbi_countries.csv are used
	- 'Oxford Nanopore' and 'MinION Oxford Nanopore' were added to ncbi_sequencing_technology.csv
	- for specimen_source, when the value is one of 'NP/OP swab', 'nasopharyngeal and oropharyngeal swab', 'nasopharyngeal/oropharyngeal swab', or 'np/np swab', both sources are recorded
2020-04-14	Rename script/from_genbank_to_fasta_and_yaml.py to scripts/from_genbank_to_fasta_and_yaml.py	Andrea Guarracino