feat: upgrade to SILO input format 0.8.0 - sr2silo v1.2.0
This PR upgrades sr2silo to support SILO input format version 0.8.0, implementing a new JSON schema structure that flattens metadata fields to the root level and restructures genomic segments with explicit sequence, insertions, and offset fields.
Key changes:
- Migrated from a nested JSON structure to a flat schema with root-level metadata
- Replaced padded alignments with offset-based positioning, so reads no longer carry reference-length padding
- Updated schema validation to distinguish between nucleotide and amino acid segments
- Bumped the version to sr2silo v1.2.0
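To make the new layout concrete, here is a minimal Python sketch of a record shaped the way the description above suggests: metadata keys at the root, and one object per genomic segment with `sequence`, `insertions`, and `offset`. Everything beyond those three field names is an assumption for illustration and is not taken from sr2silo or the SILO specification. This includes the key names `readId`, `sampleId`, `main`, and `S`, the `position:bases` insertion strings, the symbol alphabets, and the helpers `pad` and `validate_segment`.

```python
import json

# Hypothetical symbol sets for this sketch; the real schema may allow others.
NUCLEOTIDE_ALPHABET = set("ACGTN-")
AMINO_ACID_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYX*-")


def pad(sequence: str, offset: int, reference_length: int, fill: str = "N") -> str:
    """Rebuild the older padded, reference-length form from an offset-based segment."""
    tail = reference_length - offset - len(sequence)
    if offset < 0 or tail < 0:
        raise ValueError("segment does not fit inside the reference")
    return fill * offset + sequence + fill * tail


def validate_segment(segment: dict, is_amino_acid: bool) -> None:
    """Raise ValueError unless the segment has sane sequence/insertions/offset fields."""
    if set(segment) != {"sequence", "insertions", "offset"}:
        raise ValueError(f"unexpected keys: {sorted(segment)}")
    if not isinstance(segment["offset"], int) or segment["offset"] < 0:
        raise ValueError("offset must be a non-negative integer")
    alphabet = AMINO_ACID_ALPHABET if is_amino_acid else NUCLEOTIDE_ALPHABET
    bad = set(segment["sequence"]) - alphabet
    if bad:
        raise ValueError(f"invalid symbols: {sorted(bad)}")
    for insertion in segment["insertions"]:
        # Assumed "position:bases" format, e.g. "104:AT".
        position, _, bases = insertion.partition(":")
        if not position.isdigit() or not bases:
            raise ValueError(f"malformed insertion: {insertion!r}")


# Illustrative flat record: metadata keys sit at the root next to the segments.
# "readId", "sampleId", "main" (nucleotide) and "S" (amino acid) are made-up names.
record = {
    "readId": "read_0001",
    "sampleId": "sample_A",
    "main": {"sequence": "ACGTTGCA", "insertions": ["104:AT"], "offset": 100},
    "S": {"sequence": "MFVFL", "insertions": [], "offset": 0},
}

validate_segment(record["main"], is_amino_acid=False)
validate_segment(record["S"], is_amino_acid=True)
print(json.dumps(record))
```

Calling `pad(record["main"]["sequence"], 100, 120)` on a hypothetical 120-base reference produces a 120-character string just to carry 8 observed bases. That gap is why storing only the covered span plus an offset is cheaper per read.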
### General Use: Convert Nucleotide Alignment Reads - CIGAR in .BAM to Cleartext JSON
-sr2silo can convert millions of Short-Read nucleotide read in the form of a .bam CIGAR
-alignments to cleartext alignments. Further, it will gracefully extract insertions
-and deletions. Optionally, sr2silo can translate and align each read using [diamond / blastX](https://github.com/bbuchfink/diamond). And again handle insertions and deletions.
+sr2silo can convert millions of Short-Read nucleotide reads in the form of .bam CIGAR
+alignments to cleartext alignments compatible with LAPIS-SILO v0.8.0+. It gracefully extracts insertions
+and deletions. Optionally, sr2silo can translate and align each read using [diamond / blastX](https://github.com/bbuchfink/diamond), handling insertions and deletions in amino acid sequences as well.
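The CIGAR-to-cleartext step can be sketched as follows. This is a minimal illustration of the general technique, not sr2silo's implementation. It walks the CIGAR operations as defined in the SAM specification, writes `-` for deleted reference bases, skips soft-clipped bases, and collects insertions separately so the aligned sequence stays in reference coordinates. The function name `cigar_to_cleartext`, the `refpos:bases` insertion format, and the 0-based "inserted before this reference position" convention are assumptions.

```python
import re

# SAM CIGAR operations: M/=/X consume read and reference, I and S consume the
# read only, D and N consume the reference only, H and P consume neither.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_cleartext(read: str, cigar: str, ref_start: int) -> tuple[str, list[str], int]:
    """Return (aligned_sequence, insertions, offset) for one read.

    aligned_sequence uses '-' for deleted/skipped reference bases and omits
    inserted bases; insertions are hypothetical "refpos:bases" strings; offset
    is the 0-based reference start of the alignment.
    """
    aligned: list[str] = []
    insertions: list[str] = []
    read_pos, ref_pos = 0, ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            aligned.append(read[read_pos:read_pos + length])
            read_pos += length
            ref_pos += length
        elif op == "I":
            # Assumed convention: bases inserted before reference position ref_pos.
            insertions.append(f"{ref_pos}:{read[read_pos:read_pos + length]}")
            read_pos += length
        elif op in "DN":
            aligned.append("-" * length)
            ref_pos += length
        elif op == "S":
            read_pos += length  # soft-clipped bases are present in the read but unaligned
        # H and P consume neither the read nor the reference
    return "".join(aligned), insertions, ref_start
```

For example, `cigar_to_cleartext("TTACGGGTA", "2S3M2I2D2M", 10)` returns `("ACG--TA", ["13:GG"], 10)`. When reading a `.bam` with pysam, the inputs would come from an `AlignedSegment`'s `query_sequence`, `cigarstring`, and `reference_start`.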
@@ -75,22 +81,29 @@ For detailed information about resource requirements, especially for cluster env
### Wrangling Short-Read Genomic Alignments for SILO Database
-Originally this was started for wargeling short-read genomic alignments for from wastewater-sampling, into a format for easy import into [Loculus](https://github.com/loculus-project/loculus) and its sequence database SILO.
+Originally this was started for wrangling short-read genomic alignments from wastewater sampling into a format for easy import into [Loculus](https://github.com/loculus-project/loculus) and its sequence database SILO.
-sr2silo is designed to process a nucliotide alignments from `.bam` files with metadata, translate and align reads in amino acids, gracefully handling all insertions and deletions and upload the results to the backend [LAPIS-SILO](https://github.com/GenSpectrum/LAPIS-SILO).
+sr2silo is designed to process nucleotide alignments from `.bam` files with metadata, translate and align reads in amino acids, gracefully handling all insertions and deletions, and upload the results to the backend [LAPIS-SILO](https://github.com/GenSpectrum/LAPIS-SILO) v0.8.0+.
-For the V-Pipe to Silo implementation we carry through the following metadata: