README.md: 31 additions & 42 deletions
@@ -12,13 +12,12 @@
 
 [](https://www.repostatus.org/#concept)
 ### General Use: Convert Nucleotide Alignment Reads - CIGAR in .BAM to Cleartext JSON
-sr2silo can convert millions of Short-Read nucleotide reads in the form of .bam CIGAR
+sr2silo can convert millions of Short-Read nucleotide reads in the form of `.bam` CIGAR
 alignments to cleartext alignments compatible with LAPIS-SILO v0.8.0+. It gracefully extracts insertions
 and deletions. Optionally, sr2silo can translate and align each read using [diamond / blastX](https://github.com/bbuchfink/diamond), handling insertions and deletions in amino acid sequences as well.
 
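The hunk above says sr2silo converts `.bam` CIGAR alignments to cleartext while extracting insertions and deletions, but it does not show how. Below is a minimal sketch of that expansion for one read. It is not sr2silo's code. The function name `cigar_to_cleartext`, the use of `-` for deleted reference bases, and the insertion position convention (the 0-based reference index of the base that follows the insertion) are all assumptions made for illustration.

```python
import re

# One CIGAR operation: a run length followed by an op code (SAM op set).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_cleartext(read_seq: str, cigar: str, ref_start: int) -> tuple[str, list[str]]:
    """Expand one read into a reference-aligned string plus insertions.

    Hypothetical helper: `ref_start` is the 0-based reference position of the
    first aligned base. Deleted/skipped reference bases become '-'. Each
    insertion is reported as "position:sequence", where position is the
    0-based reference index of the base that follows it (assumed convention).
    """
    ops = CIGAR_OP.findall(cigar)
    # Reject strings with anything the regex did not consume (e.g. "4Q", "*").
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")

    aligned: list[str] = []
    insertions: list[str] = []
    read_pos, ref_pos = 0, ref_start
    for run, op in ops:
        n = int(run)
        if op in "M=X":  # consumes read and reference
            aligned.append(read_seq[read_pos:read_pos + n])
            read_pos += n
            ref_pos += n
        elif op == "I":  # consumes read only: kept aside, not in the aligned string
            insertions.append(f"{ref_pos}:{read_seq[read_pos:read_pos + n]}")
            read_pos += n
        elif op in "DN":  # consumes reference only: written as gap characters
            aligned.append("-" * n)
            ref_pos += n
        elif op == "S":  # soft clip: bases stored in the read but not aligned
            read_pos += n
        # H (hard clip) and P (padding) consume neither read nor reference.
    return "".join(aligned), insertions
```

Here is an example. `cigar_to_cleartext("ACGTTGCA", "3M2I2M1D1M", 100)` returns `("ACGGC-A", ["103:TT"])`. The two inserted `T`s are removed from the aligned string and reported separately, and the deleted reference base shows up as `-`.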
@@ -31,19 +30,18 @@ sr2silo outputs per read a JSON (compatible with LAPIS-SILO v0.8.0+):
@@ -85,24 +83,34 @@ Originally this was started for wrangling short-read genomic alignments from was
 
 sr2silo is designed to process nucleotide alignments from `.bam` files with metadata, translate and align reads in amino acids, gracefully handling all insertions and deletions and upload the results to the backend [LAPIS-SILO](https://github.com/GenSpectrum/LAPIS-SILO) v0.8.0+.
 
-**New Output Format for LAPIS-SILO v0.8.0+:**
-- Metadata fields are now at the root level (no nested "metadata" object)
+**Output Format for LAPIS-SILO v0.8.0+:**
+- Metadata fields use camelCase naming (e.g., `readId`, `sampleId`, `batchId`) to align with Loculus standards
+- Metadata fields are at the root level (no nested "metadata" object)
 - Genomic segments use a structured format with `sequence`, `insertions`, and `offset` fields
 - The main nucleotide segment is required and contains the primary alignment
 - Gene segments (S, ORF1a, etc.) contain amino acid sequences or `null` if empty
 - Insertions use the format `"position:sequence"` (e.g., `"123:ACGT"`)
-- Unaligned sequences are prefixed with `unaligned_` (e.g., `unaligned_main`)
+
+**Output Schema Configuration:**
+
+The output schema is defined in `src/sr2silo/silo_read_schema.py` using Pydantic models with field aliases for camelCase output. To modify the metadata fields:
+
+1. Edit `src/sr2silo/silo_read_schema.py` - Add/modify fields in `ReadMetadata` class
+2. Update `resources/silo/database_config.yaml` - Ensure field names match the Pydantic aliases
+3. Run validation: `python tests/test_database_config_validation.py`
+
+The validation ensures your Pydantic schema matches the SILO database configuration.
 
 For the V-Pipe to Silo implementation we include the following metadata fields at the root level:
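The added README text makes two points. The schema lives in Pydantic models whose field aliases produce camelCase keys. Metadata sits at the root level next to segment objects that carry `sequence`, `insertions`, and `offset`. The sketch below mimics that layout with standard-library dataclasses instead of Pydantic, so it runs without extra dependencies. Several pieces are assumptions and do not come from `silo_read_schema.py`: the class names, the `to_camel` helper, and the use of `main`/`S`/`ORF1a` as top-level keys. The same goes for checking aliases against a plain list of metadata names taken from `database_config.yaml`; YAML parsing is omitted.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Optional


def to_camel(name: str) -> str:
    """snake_case -> camelCase, e.g. 'read_id' -> 'readId'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class ReadMetadata:
    # Hypothetical subset of metadata fields, snake_case on the Python side.
    read_id: str
    sample_id: str
    batch_id: str


@dataclass
class Segment:
    sequence: str
    insertions: list[str] = field(default_factory=list)  # "position:sequence"
    offset: int = 0


def build_record(meta: ReadMetadata, main: Segment,
                 genes: dict[str, Optional[Segment]]) -> dict:
    """Flatten metadata to the root (camelCase keys) and add segment objects."""
    record = {to_camel(f.name): getattr(meta, f.name) for f in fields(meta)}
    record["main"] = asdict(main)  # required nucleotide segment
    for gene, seg in genes.items():
        record[gene] = asdict(seg) if seg is not None else None  # empty gene -> null
    return record


def schema_mismatches(config_metadata_names: list[str]) -> tuple[list[str], list[str]]:
    """Compare camelCase aliases with metadata names listed in a SILO config.

    Returns (missing_from_config, missing_from_schema); both empty means they match.
    """
    aliases = {to_camel(f.name) for f in fields(ReadMetadata)}
    configured = set(config_metadata_names)
    return sorted(aliases - configured), sorted(configured - aliases)


if __name__ == "__main__":
    rec = build_record(
        ReadMetadata(read_id="read_1", sample_id="sample_A", batch_id="batch_7"),
        Segment(sequence="ACGT", insertions=["123:ACGT"], offset=100),
        {"S": Segment(sequence="MFVF"), "ORF1a": None},
    )
    print(json.dumps(rec))
```

Run as a script, this prints one JSON object. Its top-level keys are `readId`, `sampleId`, `batchId`, `main`, `S`, and `ORF1a`, with `ORF1a` serialized as `null`. In Pydantic v2 you would get the same key mapping by declaring `Field(alias="readId")` (or an `alias_generator`) and serializing with `model_dump(by_alias=True)`.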
 sr2silo supports flexible configuration through environment variables, making it easy to use in different deployment scenarios including conda packages and pip installations.
 
-**Key features:**
-- CLI parameters override environment variables
--**Recommended for credentials to avoid exposing sensitive information in command history**
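The retained context line says sr2silo can be configured through environment variables. The removed "Key features" lines stated that CLI parameters take precedence over them. Below is a generic sketch of that precedence rule; it is not sr2silo's code. The helper name `resolve_setting` and the variable name `SR2SILO_EXAMPLE_USER` are hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_setting(cli_value: Optional[str], env_var: str,
                    env: Optional[Mapping[str, str]] = None,
                    default: Optional[str] = None) -> Optional[str]:
    """Return the CLI value if given, else the environment variable, else default.

    Hypothetical helper illustrating "CLI parameters override environment
    variables". An empty environment value counts as unset (assumption).
    `env` defaults to os.environ; pass a dict to make the lookup testable.
    """
    if cli_value is not None:
        return cli_value
    source = os.environ if env is None else env
    value = source.get(env_var)
    return value if value else default
```

For example, `resolve_setting("alice", "SR2SILO_EXAMPLE_USER", env={"SR2SILO_EXAMPLE_USER": "bob"})` returns `"alice"`. With `cli_value=None` it returns `"bob"`. Reading secrets this way keeps them out of shell history.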
 The code quality checks run on GitHub can be seen in
--``.github/workflows/test.yml`` for the python package CI/CD,
-
-We are using:
-
-*[Ruff](https://github.com/charliermarsh/ruff) to lint the code.
-*[Black](https://github.com/psf/black) to format the code.
-*[Pyright](https://github.com/microsoft/pyright) to check the types.
-*[Pytest](https://docs.pytest.org/) to run the unit tests code and workflows.
-*[Interrogate](https://interrogate.readthedocs.io/) to check the documentation.
-
-
-## Contributing
-
-This project welcomes contributions and suggestions. For details, visit the repository's [Contributor License Agreement (CLA)](https://cla.opensource.microsoft.com) and [Code of Conduct](https://opensource.microsoft.com/codeofconduct/) pages.