
Commit 29e66c1

release 1.6.0 (#374)
2 parents 65e1bb9 + a3f44c7 commit 29e66c1

38 files changed, +1795 / -1200 lines changed

.github/workflows/test.yml

Lines changed: 7 additions & 4 deletions
````diff
@@ -23,9 +23,6 @@ jobs:
     python-version: '3.11'
     cache: 'pip'
 
-- name: Run black formatting check
-  uses: psf/black@stable
-
 - name: Set up Conda
   uses: conda-incubator/setup-miniconda@v3
   with:
````

````diff
@@ -49,12 +46,18 @@ jobs:
     conda activate sr2silo-dev
     poetry run pytest
 
-- name: Run Ruff
+- name: Run Ruff check
   run: |
     source $(conda info --base)/etc/profile.d/conda.sh
     conda activate sr2silo-dev
     poetry run ruff check .
 
+- name: Run Ruff format check
+  run: |
+    source $(conda info --base)/etc/profile.d/conda.sh
+    conda activate sr2silo-dev
+    poetry run ruff format --check .
+
 - name: Run interrogate
   run: |
     source $(conda info --base)/etc/profile.d/conda.sh
````
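
The workflow now runs Ruff twice: `ruff check .` for linting and `ruff format --check .` for formatting, which takes over from the removed `psf/black` step. A minimal Python sketch for reproducing these two checks locally before pushing might look like the following. It assumes `poetry` is on `PATH` and the project environment is already installed, and it skips the `conda activate sr2silo-dev` step the workflow performs. The helper names `lint_commands` and `run_checks` are hypothetical.

```python
import shutil
import subprocess


def lint_commands(use_poetry: bool = True) -> list[list[str]]:
    """Return the two Ruff commands mirrored from the CI workflow (hypothetical helper)."""
    base = [["ruff", "check", "."], ["ruff", "format", "--check", "."]]
    prefix = ["poetry", "run"] if use_poetry else []
    return [prefix + cmd for cmd in base]


def run_checks(dry_run: bool = False, use_poetry: bool = True) -> int:
    """Run the commands in order and stop at the first failure, like consecutive CI steps.

    Returns the first non-zero exit code, 127 if the executable is missing, else 0.
    With dry_run=True the commands are only printed.
    """
    for cmd in lint_commands(use_poetry):
        print("$", " ".join(cmd))
        if dry_run:
            continue
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} not found on PATH")
            return 127
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_checks()` from the repository root runs both checks; `run_checks(dry_run=True)` only prints them.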

.pre-commit-config.yaml

Lines changed: 5 additions & 8 deletions
````diff
@@ -31,15 +31,12 @@ repos:
   hooks:
     - id: poetry-check
     - id: poetry-lock
-- repo: https://github.com/charliermarsh/ruff-pre-commit
-  rev: 'v0.9.9'
+- repo: https://github.com/astral-sh/ruff-pre-commit
+  rev: 'v0.14.2'
   hooks:
-    - id: ruff
-      args: [--fix, --exit-non-zero-on-fix]
-- repo: https://github.com/psf/black
-  rev: 25.1.0
-  hooks:
-    - id: black
+    - id: ruff-check
+      args: [--fix]
+    - id: ruff-format
 - repo: https://github.com/econchick/interrogate
   rev: 1.7.0
   hooks:
````

README.md

Lines changed: 31 additions & 42 deletions
````diff
@@ -12,13 +12,12 @@
 
 [![Project Status: POC – This project is currently under active development.](https://www.repostatus.org/badges/latest/concept.svg)](https://www.repostatus.org/#concept)
 [![CI/CD](https://github.com/cbg-ethz/sr2silo/actions/workflows/test.yml/badge.svg)](https://github.com/cbg-ethz/sr2silo/actions/workflows/test.yml)
-[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Pytest](https://img.shields.io/badge/tested%20with-pytest-0A9EDC.svg)](https://docs.pytest.org/en/stable/)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/charliermarsh/ruff)
 [![Pyright](https://img.shields.io/badge/type%20checked-pyright-blue.svg)](https://github.com/microsoft/pyright)
 
 ### General Use: Convert Nucleotide Alignment Reads - CIGAR in .BAM to Cleartext JSON
-sr2silo can convert millions of Short-Read nucleotide reads in the form of .bam CIGAR
+sr2silo can convert millions of Short-Read nucleotide reads in the form of `.bam` CIGAR
 alignments to cleartext alignments compatible with LAPIS-SILO v0.8.0+. It gracefully extracts insertions
 and deletions. Optionally, sr2silo can translate and align each read using [diamond / blastX](https://github.com/bbuchfink/diamond), handling insertions and deletions in amino acid sequences as well.
 
````
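
The unchanged README context describes the core conversion: CIGAR alignments from a `.bam` file become cleartext, reference-aligned reads with insertions and deletions extracted. The sketch below illustrates that general technique for a single read; it is not sr2silo's implementation. It assumes the read's 0-based reference start is known, writes deletions (`D`, and `N` skips) as `-`, drops soft-clipped bases, and records each insertion as `"position:bases"` using the 0-based reference position that the inserted bases precede. That position convention, and returning the start as the `offset`, are assumptions; the function name `cigar_to_cleartext` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an op code (SAM specification op set)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_cleartext(read: str, cigar: str, ref_start: int) -> tuple[str, list[str], int]:
    """Expand a CIGAR alignment into (aligned_sequence, insertions, offset).

    aligned_sequence covers the reference from ref_start, with '-' for deleted bases.
    insertions are "position:bases" strings (position = 0-based reference position
    the inserted bases precede; an assumption of this sketch).
    """
    aligned: list[str] = []
    insertions: list[str] = []
    read_pos = 0
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":  # consumes read and reference
            aligned.append(read[read_pos:read_pos + length])
            read_pos += length
            ref_pos += length
        elif op == "I":  # consumes read only: record as an insertion
            insertions.append(f"{ref_pos}:{read[read_pos:read_pos + length]}")
            read_pos += length
        elif op in "DN":  # consumes reference only: emit gap characters
            aligned.append("-" * length)
            ref_pos += length
        elif op == "S":  # soft clip: consumes read, not reference; bases are dropped
            read_pos += length
        # H and P consume neither read nor reference
    if read_pos != len(read):
        raise ValueError(f"CIGAR {cigar!r} consumed {read_pos} bases, read has {len(read)}")
    return "".join(aligned), insertions, ref_start
```

For example, the read `ACGTTTGCA` with CIGAR `2S3M2I2M` starting at reference position 100 yields the aligned sequence `GTTCA` and the insertion `103:TG`.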

````diff
@@ -31,19 +30,18 @@ sr2silo outputs per read a JSON (compatible with LAPIS-SILO v0.8.0+):
 
 ```json
 {
-  "read_id": "AV233803:AV044:2411515907:1:10805:5199:3294",
-  "sample_id": "A1_05_2024_10_08",
-  "batch_id": "20241024_2411515907",
-  "sampling_date": "2024-10-08",
-  "location_name": "Lugano (TI)",
-  "read_length": "250",
-  "location_code": "05",
+  "readId": "AV233803:AV044:2411515907:1:10805:5199:3294",
+  "sampleId": "A1_05_2024_10_08",
+  "batchId": "20241024_2411515907",
+  "samplingDate": "2024-10-08",
+  "locationName": "Lugano (TI)",
+  "locationCode": "5",
+  "sr2siloVersion": "1.3.0",
   "main": {
     "sequence": "CGGTTTCGTCCGTGTTGCAGCCG...GTGTCAACATCTTAAAGATGGCACTTGTG",
     "insertions": ["10:ACTG", "456:TACG"],
     "offset": 4545
   },
-  "unaligned_main": "CGGTTTCGTCCGTGTTGCAGCCGATCATCTAGGT...TACAGGTTCGCGACGTGCTCGTGTGAAAGATGGCACTTGTG",
   "S": {
     "sequence": "MESLVPGFNEKTHVQLSLPVLQVRVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV",
     "insertions": ["23:A", "145:KLM"],
````
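
The example record renames the root-level metadata keys from snake_case to camelCase, adds `sr2siloVersion`, and no longer shows `read_length` or `unaligned_main`. Downstream scripts that still read the old keys could use a small migration step like the sketch below. The key mapping is taken from the before/after example in this diff. The function names `rename_legacy_keys` and `migrate_ndjson_line` are hypothetical. Values are left untouched, and keys outside the mapping (including `read_length` and `unaligned_main`) pass through unchanged rather than being guessed at.

```python
import json

# snake_case -> camelCase mapping, read off the before/after README example
LEGACY_TO_CAMEL = {
    "read_id": "readId",
    "sample_id": "sampleId",
    "batch_id": "batchId",
    "sampling_date": "samplingDate",
    "location_name": "locationName",
    "location_code": "locationCode",
}


def rename_legacy_keys(record: dict) -> dict:
    """Return a copy of a root-level record with legacy metadata keys renamed.

    Segment objects such as "main" or "S" are copied as-is; unknown keys pass through.
    """
    return {LEGACY_TO_CAMEL.get(key, key): value for key, value in record.items()}


def migrate_ndjson_line(line: str) -> str:
    """Rename keys in one NDJSON line and re-serialize it."""
    return json.dumps(rename_legacy_keys(json.loads(line)))
```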

````diff
@@ -85,24 +83,34 @@ Originally this was started for wrangling short-read genomic alignments from was
 
 sr2silo is designed to process nucleotide alignments from `.bam` files with metadata, translate and align reads in amino acids, gracefully handling all insertions and deletions and upload the results to the backend [LAPIS-SILO](https://github.com/GenSpectrum/LAPIS-SILO) v0.8.0+.
 
-**New Output Format for LAPIS-SILO v0.8.0+:**
-- Metadata fields are now at the root level (no nested "metadata" object)
+**Output Format for LAPIS-SILO v0.8.0+:**
+- Metadata fields use camelCase naming (e.g., `readId`, `sampleId`, `batchId`) to align with Loculus standards
+- Metadata fields are at the root level (no nested "metadata" object)
 - Genomic segments use a structured format with `sequence`, `insertions`, and `offset` fields
 - The main nucleotide segment is required and contains the primary alignment
 - Gene segments (S, ORF1a, etc.) contain amino acid sequences or `null` if empty
 - Insertions use the format `"position:sequence"` (e.g., `"123:ACGT"`)
-- Unaligned sequences are prefixed with `unaligned_` (e.g., `unaligned_main`)
+
+**Output Schema Configuration:**
+
+The output schema is defined in `src/sr2silo/silo_read_schema.py` using Pydantic models with field aliases for camelCase output. To modify the metadata fields:
+
+1. Edit `src/sr2silo/silo_read_schema.py` - Add/modify fields in `ReadMetadata` class
+2. Update `resources/silo/database_config.yaml` - Ensure field names match the Pydantic aliases
+3. Run validation: `python tests/test_database_config_validation.py`
+
+The validation ensures your Pydantic schema matches the SILO database configuration.
 
 For the V-Pipe to Silo implementation we include the following metadata fields at the root level:
 ```json
 {
-  "read_id": "AV233803:AV044:2411515907:1:10805:5199:3294",
-  "sample_id": "A1_05_2024_10_08",
-  "batch_id": "20241024_2411515907",
-  "sampling_date": "2024-10-08",
-  "location_name": "Lugano (TI)",
-  "read_length": "250",
-  "location_code": "05"
+  "readId": "AV233803:AV044:2411515907:1:10805:5199:3294",
+  "sampleId": "A1_05_2024_10_08",
+  "batchId": "20241024_2411515907",
+  "samplingDate": "2024-10-08",
+  "locationName": "Lugano (TI)",
+  "locationCode": "5",
+  "sr2siloVersion": "1.3.0"
 }
 ```
 
````
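
The new README text states that `ReadMetadata` in `src/sr2silo/silo_read_schema.py` uses Pydantic field aliases for camelCase output, and that those aliases must match the field names in `resources/silo/database_config.yaml`. The standard-library sketch below imitates that contract without Pydantic. It derives camelCase aliases from snake_case attribute names, serializes a record by alias, and compares the aliases with the metadata names of an already-parsed config. The dataclass `ReadMetadataSketch`, the helper names, and the assumed `schema` → `metadata` → `name` layout of the parsed config are illustrative assumptions, not the actual sr2silo code or a confirmed SILO config format.

```python
from dataclasses import dataclass, fields


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. 'sampling_date' -> 'samplingDate'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class ReadMetadataSketch:
    # Hypothetical stand-in for the Pydantic ReadMetadata model
    read_id: str
    sample_id: str
    batch_id: str
    sampling_date: str
    location_name: str
    location_code: str
    sr2silo_version: str


def aliases(model: type) -> set[str]:
    """camelCase aliases for every dataclass field of the model."""
    return {to_camel(f.name) for f in fields(model)}


def dump_by_alias(record: ReadMetadataSketch) -> dict:
    """Serialize with camelCase keys, similar in spirit to dumping a Pydantic model by alias."""
    return {to_camel(f.name): getattr(record, f.name) for f in fields(record)}


def compare_with_config(model: type, config: dict) -> tuple[set[str], set[str]]:
    """Return (aliases missing from the config, config names missing from the model).

    Assumes config["schema"]["metadata"] is a list of {"name": ...} entries.
    """
    config_names = {entry["name"] for entry in config["schema"]["metadata"]}
    model_aliases = aliases(model)
    return model_aliases - config_names, config_names - model_aliases
```

Two empty sets from `compare_with_config` mean the model and config agree, which is the property the README's validation step is described as checking.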

````diff
@@ -202,7 +210,7 @@ sr2silo process-from-vpipe \
 
 # Example: Submit to Loculus (use environment variables for credentials)
 export KEYCLOAK_TOKEN_URL=https://auth.example.com/token
-export SUBMISSION_URL=https://api.example.com/submit
+export BACKEND_URL=https://api.example.com/submit
 export GROUP_ID=123
 export USERNAME=your-username
 export PASSWORD=your-password
````

````diff
@@ -218,15 +226,13 @@ sr2silo submit-to-loculus --processed-file output.ndjson.zst
 
 sr2silo supports flexible configuration through environment variables, making it easy to use in different deployment scenarios including conda packages and pip installations.
 
-**Key features:**
-- CLI parameters override environment variables
-- **Recommended for credentials to avoid exposing sensitive information in command history**
+**Note:** CLI parameters override environment variables
 
 **Common configuration via environment variables:**
 ```bash
 # Authentication credentials (recommended approach for security)
 export KEYCLOAK_TOKEN_URL=https://auth.example.com/token
-export SUBMISSION_URL=https://backend.example.com/api
+export BACKEND_URL=https://backend.example.com/api
 export GROUP_ID=123
 export USERNAME=your-username
 export PASSWORD=your-password
````
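
The README now names the backend endpoint `BACKEND_URL` (previously `SUBMISSION_URL`) and states that CLI parameters override environment variables. A minimal sketch of that precedence rule follows. The function names, and the check for a leftover `SUBMISSION_URL` after the rename, are illustrative assumptions and do not describe how sr2silo itself reads its configuration.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the README examples in this diff
SETTING_ENV_VARS = ("KEYCLOAK_TOKEN_URL", "BACKEND_URL", "GROUP_ID", "USERNAME", "PASSWORD")


def resolve_setting(
    cli_value: Optional[str], env_var: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """A CLI value wins; otherwise fall back to the environment variable (None if unset)."""
    if cli_value is not None:
        return cli_value
    return env.get(env_var)


def resolve_all(
    cli_values: Mapping[str, Optional[str]], env: Mapping[str, str] = os.environ
) -> dict:
    """Resolve every setting in SETTING_ENV_VARS; cli_values is keyed by variable name."""
    return {name: resolve_setting(cli_values.get(name), name, env) for name in SETTING_ENV_VARS}


def stale_variables(env: Mapping[str, str] = os.environ) -> list[str]:
    """Flag the pre-rename variable if it is exported without its replacement."""
    if "SUBMISSION_URL" in env and "BACKEND_URL" not in env:
        return ["SUBMISSION_URL (renamed to BACKEND_URL in the README)"]
    return []
```

Keeping credentials in the environment and passing only non-secret overrides on the command line fits the README's comment that environment variables are the recommended approach for credentials.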

````diff
@@ -242,20 +248,3 @@ sr2silo process-from-vpipe \
 sr2silo submit-to-loculus \
 --processed-file output.ndjson.zst
 ```
-
-### Tool Sections
-The code quality checks run on GitHub can be seen in
-- ``.github/workflows/test.yml`` for the python package CI/CD,
-
-We are using:
-
-* [Ruff](https://github.com/charliermarsh/ruff) to lint the code.
-* [Black](https://github.com/psf/black) to format the code.
-* [Pyright](https://github.com/microsoft/pyright) to check the types.
-* [Pytest](https://docs.pytest.org/) to run the unit tests code and workflows.
-* [Interrogate](https://interrogate.readthedocs.io/) to check the documentation.
-
-
-## Contributing
-
-This project welcomes contributions and suggestions. For details, visit the repository's [Contributor License Agreement (CLA)](https://cla.opensource.microsoft.com) and [Code of Conduct](https://opensource.microsoft.com/codeofconduct/) pages.
````

conda-recipe/meta.yaml

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,6 +1,6 @@
 # conda recipe
 {% set name = "sr2silo" %}
-{% set version = "1.5.0" %}
+{% set version = "1.6.0" %}
 
 package:
   name: {{ name|lower }}
````
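
The release bumps the version in `conda-recipe/meta.yaml` through a Jinja `{% set version = ... %}` line. A small release-time check could extract that value and compare it with the version being released, as in the sketch below. Where the expected version comes from (for example, the package metadata) is left out. The helper names `recipe_version` and `check_release` are hypothetical.

```python
import re
from typing import Optional

# Matches a Jinja line such as: {% set version = "1.6.0" %}
VERSION_LINE = re.compile(r"""\{%\s*set\s+version\s*=\s*["']([^"']+)["']\s*%\}""")


def recipe_version(meta_yaml_text: str) -> Optional[str]:
    """Return the version declared via Jinja `set`, or None if there is no such line."""
    match = VERSION_LINE.search(meta_yaml_text)
    return match.group(1) if match else None


def check_release(meta_yaml_text: str, expected: str) -> None:
    """Raise ValueError if the recipe version differs from the expected release version."""
    found = recipe_version(meta_yaml_text)
    if found != expected:
        raise ValueError(f"conda recipe declares {found!r}, expected {expected!r}")
```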
