Commit d09e393

disambiguate region-type onlists
1 parent 3d986d4 commit d09e393

13 files changed: 455 additions & 76 deletions

README.md

Lines changed: 22 additions & 6 deletions
@@ -16,6 +16,17 @@ We have multiple tutorials to get you up and running with `seqspec`:
 
 2. Understand how to [manipulate `seqspec` files](docs/USING_SEQSPEC.ipynb) using the `seqspec` command-line tool.
 
+## Current release
+
+`seqspec 0.4.0` keeps the Python and Rust implementations aligned around the same core command set.
+
+- `seqspec upgrade` upgrades `0.3.0` specs to `0.4.0` in both implementations.
+- `seqspec` loads gzipped specs directly, so `.yaml.gz` works anywhere a spec path is accepted.
+- `seqspec auth` manages host-matched auth profiles for remote resources, and `seqspec check` / `seqspec onlist` can use them with `--auth-profile`.
+- `seqspec onlist -s region-type` now errors when the same region type appears across multiple reads, so ambiguous joins are explicit.
+- `seqspec print -f seqspec-html` writes a self-contained HTML view of the library and reads.
+- `seqspec build` is deprecated.
+
 ## Citation
 
 The `seqspec` format and tool are described in this [publication](https://doi.org/10.1093/bioinformatics/btae168). If you use `seqspec` please cite
@@ -29,18 +40,21 @@ Ali Sina Booeshaghi, Xi Chen, Lior Pachter, A machine-readable specification for
 ## Documentation
 
 - [Install `seqspec`: `docs/INSTALLATION.md`](docs/INSTALLATION.md)
-- [Learn about the `seqspec` file format: `docs/DOCUMENTATION.md`](docs/SEQSPEC_FILE.md)
-- [Learn about the `seqspec` tool: `docs/DOCUMENTATION.md`](docs/SEQSPEC_TOOL.md)
-- [Learn about the `seqspec` specification : `docs/SPECIFICATION.md`](docs/SPECIFICATION.md)
-- [Write a `seqspec`: `docs/TUTORIAL.md`](docs/TUTORIAL.md)
+- [Learn about the `seqspec` file format: `docs/SEQSPEC_FILE.md`](docs/SEQSPEC_FILE.md)
+- [Learn about the `seqspec` tool: `docs/SEQSPEC_TOOL.md`](docs/SEQSPEC_TOOL.md)
+- [Learn about the `seqspec` specification: `docs/SPECIFICATION.md`](docs/SPECIFICATION.md)
+- [Write a `seqspec` from a simple example: `docs/TUTORIAL_SIMPLE.md`](docs/TUTORIAL_SIMPLE.md)
+- [Write a `seqspec` from a template: `docs/TUTORIAL_FROM_TEMPLATE.md`](docs/TUTORIAL_FROM_TEMPLATE.md)
+- [Write a more complex `seqspec`: `docs/TUTORIAL_COMPLEX.md`](docs/TUTORIAL_COMPLEX.md)
 - [View example `seqspec` files: `https://www.sina.bio/seqspec-builder/assays.html`](https://www.sina.bio/seqspec-builder/assays.html)
 - [Contribute a `seqspec` : `docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md)
 - [Watch a YouTube video about `seqspec`](https://youtu.be/NSj6Vpzy8tU)
 - [Read the manuscript that describes `seqspec`](https://doi.org/10.1093/bioinformatics/btae168)
 
 ## Rust implementation
 
-- [] build : Generate a complete seqspec with natural language.
+- [x] auth : Manage remote authentication profiles.
+- build : Deprecated in both CLIs.
 - [x] check : Validate seqspec file against specification (verify check)
 - [x] find : Find objects in seqspec file
 - [x] file : List files present in seqspec file
@@ -52,7 +66,9 @@ Ali Sina Booeshaghi, Xi Chen, Lior Pachter, A machine-readable specification for
 - [x] methods : Convert seqspec file into methods section
 - [x] modify : Modify attributes of various elements in seqspec file
 - [x] onlist : Get onlist file for elements in seqspec file
-- [] print : Display the sequence and/or library structure from seqspec file
+- [x] print : Display the sequence and/or library structure from seqspec file
 - [x] split : Split seqspec file by modality
 - [x] upgrade : Upgrade seqspec file to current version
 - [x] version: Get seqspec tool version and seqspec file version
+
+The standalone Rust CLI supports `library-ascii`, `seqspec-ascii`, and `seqspec-html` in `seqspec print`. `seqspec-png` remains Python-only for now.

docs/CHANGELOG.md

Lines changed: 13 additions & 9 deletions
@@ -7,23 +7,27 @@ authors:
 
 # Changelog
 
-## [0.X.X] - XXXX-XX-XX
+## [0.4.1] - Unreleased
 
 ### Added
 
-- Implemented core data objects in Rust using PyO3 for improved performance and safety.
-- Added extensive tests to ensure full parity between Python and Rust implementations.
+- `seqspec auth` in Python and Rust with `init`, `path`, `list`, and `resolve` subcommands.
+- `seqspec print -f seqspec-html`, a self-contained HTML view that shows the library molecule, reads, and nested region metadata.
+- Additional parity tests for Python and Rust command behavior.
 
 ### Changed
 
-- Switched build system in `pyproject.toml` to use `maturin` for Rust extension integration.
-- Updated packaging and development workflow to support Rust-backed modules.
+- `seqspec upgrade` now upgrades `0.3.0` specs to `0.4.0` in both implementations.
+- Python and Rust now share the same core command surface for `auth`, `check`, `find`, `file`, `format`, `index`, `info`, `init`, `insert`, `methods`, `modify`, `onlist`, `print`, `split`, `upgrade`, and `version`.
+- `seqspec build` is deprecated in both CLIs and remains as a compatibility stub.
+- Older specs are loaded more permissively before upgrade, which makes `0.2.x` and `0.3.x` specs easier to normalize.
+- `seqspec onlist -s region-type` now errors when matches span multiple reads in a modality. Use `-s read` or `-s region` to disambiguate.
 
-### Removed
-
-- Removed `to_dict` and `update_from` attributes from all objects; refactored related tests and class structures.
+### Fixed
 
-#### Breaking changes
+- Rust `load_spec` now reads gzipped seqspec YAML.
+- Python `seqspec check` and `seqspec onlist` can use auth profiles for remote resources.
+- Python local gzipped onlist validation now detects `.gz` files correctly.
 
 ## [0.4.0] - 2025-08-24
 
docs/INSTALLATION.md

Lines changed: 14 additions & 1 deletion
@@ -25,8 +25,21 @@ pip install seqspec
 uv pip install seqspec
 ```
 
-Verify the installation
+Install from source if you want the current working tree.
+
+```bash
+# Python package with the Rust core
+uv run maturin develop
+
+# standalone Rust CLI
+cargo install --path .
+```
+
+Verify the installation.
 
 ```bash
 seqspec --version
+seqspec auth path
 ```
+
+`seqspec` accepts plain YAML and gzipped YAML (`.yaml.gz`). Remote resources can be configured with `seqspec auth` and used with `--auth-profile` in commands such as `seqspec check` and `seqspec onlist`.
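The INSTALLATION change states that `seqspec` accepts both plain and gzipped YAML. A minimal Python sketch of one way a loader can accept either transparently, assuming detection by the two gzip magic bytes (`1f 8b`) rather than by the `.gz` suffix; `is_gzipped` and `read_spec_text` are hypothetical helpers, not seqspec's `load_spec`, and YAML parsing is left out so only the standard library is needed.

```python
import gzip
from pathlib import Path

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_spec_text(path: str) -> str:
    """Hypothetical helper: return spec text for plain or gzipped YAML."""
    p = Path(path)
    if is_gzipped(p):
        # Text mode decompresses and decodes in one step.
        with gzip.open(p, "rt", encoding="utf-8") as fh:
            return fh.read()
    return p.read_text(encoding="utf-8")
```

Sniffing the header instead of trusting the file name also handles a gzipped spec saved without a `.gz` suffix; whether seqspec itself checks the suffix or the bytes is not shown in this commit.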

docs/SEQSPEC_TOOL.md

Lines changed: 96 additions & 36 deletions
@@ -22,14 +22,15 @@ The `seqspec` specification is detailed in [here](SEQSPEC_FILE.md). Please revie
 ```
 usage: seqspec [-h] <CMD> ...
 
-seqspec 0.3.0: A machine-readable file format for genomic library sequence and structure.
+seqspec 0.4.0: A machine-readable file format for genomic library sequence and structure.
 
 GitHub: https://github.com/pachterlab/seqspec
 Documentation: https://pachterlab.github.io/seqspec/
 
 positional arguments:
 <CMD>
-build Generate a complete seqspec with natural language (LLM-assisted)
+auth Manage remote authentication profiles
+build Deprecated. This command will be removed.
 check Validate seqspec file against specification
 find Find objects in seqspec file
 file List files present in seqspec file
@@ -52,27 +53,73 @@ optional arguments:
 
 `seqspec` operates on `seqspec` compatible YAML files that follow the specification. All of the following examples will use the `seqspec` specification for the [DOGMAseq-DIG](https://doi.org/10.1186/s13059-022-02698-8) assay which can be found here: `seqspec/examples/specs/dogmaseq-dig/spec.yaml`.
 
+Any command that takes `yaml` also accepts gzipped specs such as `spec.yaml.gz`.
+
+The `build` command is deprecated. Use `seqspec init`, `seqspec insert`, and `seqspec modify` instead.
+
 :::{attention}
 **IMPORTANT**: Many `seqspec` commands require that the specification be properly formatted and error-corrected. Errors in the spec can be found with `seqspec check` (see below for instructions). The spec can be properly formatted (or "filled in") with `seqspec format`. It is recommended to run `seqspec format` followed by `seqspec check` after writing a new `seqspec` (or correcting errors in an existing one).
 :::
 
+## `seqspec auth`: Manage remote authentication profiles
+
+Use auth profiles when a spec points to protected remote files such as IGVF-hosted onlists or FASTQs.
+
+```bash
+seqspec auth <AUTH_CMD> ...
+```
+
+`seqspec auth` has four subcommands:
+
+- `init`: create or update a profile that maps one or more hosts to credential environment variables
+- `path`: show where the auth config file lives
+- `list`: list configured profiles
+- `resolve`: show which profile would be used for a given URL
+
+The auth config is host-based. The profile stores environment variable names, not secrets.
+
+### Examples
+
+```bash
+# create an IGVF profile
+seqspec auth init \
+--profile igvf \
+--host api.data.igvf.org \
+--host data.igvf.org \
+--kind basic \
+--username-env IGVF_ACCESS_KEY_ID \
+--password-env IGVF_ACCESS_KEY_SECRET
+
+# inspect the config path
+seqspec auth path
+
+# list configured profiles
+seqspec auth list
+
+# resolve a URL to a profile
+seqspec auth resolve https://api.data.igvf.org/reference-files/IGVFFI5429KKCK/
+```
+
 ## `seqspec check`: Validate seqspec file against specification
 
 Check that the `seqspec` file is correctly formatted and consistent with the [specification](https://github.com/IGVF/seqspec/blob/main/docs/SPECIFICATION.md).
 
 ```bash
-seqspec check [-h] [-o OUT] [--skip {igvf,igvf_onlist_skip}] yaml
+seqspec check [-h] [-o OUT] [--skip {igvf,igvf_onlist_skip}] [--auth-profile PROFILE] yaml
 ```
 
 ```python
-from seqspec.seqspec_check import run_check
+from seqspec.seqspec_check import seqspec_check
+from seqspec.utils import load_spec
 
-run_check(schema_fn: str, spec_fn: str, o: str)
+spec = load_spec("spec.yaml", strict=False)
+seqspec_check(spec, filter_type=None, auth_profile=None)
 ```
 
 - optionally, `-o OUT` can be used to write the output to a file.
 - optionally, `--skip {igvf,igvf_onlist_skip}` can filter out known IGVF-specific warnings (see source for list).
-- `yaml` corresponds to the `seqspec` file.
+- optionally, `--auth-profile PROFILE` uses a named auth profile when checking remote files.
+- `yaml` corresponds to the `seqspec` file and may be plain YAML or `.yaml.gz`.
 
 A list of checks performed:
 
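The auth section describes profiles that map hosts to credential environment variable names, and `seqspec auth resolve` reports which profile a URL would use. A minimal sketch of that host lookup, assuming exact, case-insensitive hostname matching with the first matching profile winning; the `AuthProfile` dataclass and `resolve_profile` function are hypothetical and do not reflect seqspec's on-disk config schema, which this commit does not show. The profile values are taken from the `seqspec auth init` example.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class AuthProfile:
    # Hypothetical shape mirroring the `seqspec auth init` flags.
    name: str
    hosts: List[str] = field(default_factory=list)
    kind: str = "basic"
    username_env: str = ""  # name of the env var, never the secret itself
    password_env: str = ""


def resolve_profile(url: str, profiles: List[AuthProfile]) -> Optional[AuthProfile]:
    """Return the first profile whose host list contains the URL's hostname."""
    # urlsplit().hostname is already lowercased; hosts are lowercased here too.
    host = (urlsplit(url).hostname or "").lower()
    for profile in profiles:
        if host in (h.lower() for h in profile.hosts):
            return profile
    return None


profiles = [
    AuthProfile(
        name="igvf",
        hosts=["api.data.igvf.org", "data.igvf.org"],
        username_env="IGVF_ACCESS_KEY_ID",
        password_env="IGVF_ACCESS_KEY_SECRET",
    )
]
```

Because a profile holds only variable names, the config file can be shared or inspected with `seqspec auth list` without exposing credentials.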
@@ -138,6 +185,9 @@ Below are a list of example errors one may encounter when checking a spec:
 $ seqspec check spec.yaml
 [error 1] None is not of type 'string' in spec['assay']
 [error 2] 'Ribonucleic acid' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]
+
+# check a spec with protected remote resources
+$ seqspec check --auth-profile igvf spec.yaml
 ```
 
 ## `seqspec find`: Find objects in seqspec file
@@ -556,13 +606,15 @@ $ seqspec modify -m atac -o mod_spec.yaml -i atac_R1 --files "R1_1.fastq.gz,fast
 ## `seqspec onlist`: Get onlist file(s) for elements in seqspec file
 
 ```bash
-seqspec onlist [-h] [-o OUT] [-s SELECTOR] [-f {product,multi}] -m MODALITY [-i ID] yaml
+seqspec onlist [-h] [-o OUT] [-s SELECTOR] [-f {product,multi}] [--auth-profile PROFILE] -m MODALITY [-i ID] yaml
 ```
 
 ```python
-from seqspec.seqspec_onlist import run_onlist
+from seqspec.seqspec_onlist import get_onlists
+from seqspec.utils import load_spec
 
-run_onlist(spec_fn, modality, ids, idtype, fmt, o)
+spec = load_spec("spec.yaml")
+get_onlists(spec, modality="rna", selector="region-type", id="barcode")
 ```
 
 - optionally, `-o OUT` when set with `-f`, writes the joined onlist to this file; when set without `-f`, downloads remote onlists locally and prints paths.
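For a `--kind basic` profile, the environment variables named by `--username-env` and `--password-env` still have to be turned into credentials when a protected onlist is fetched. A sketch of that step using the standard HTTP Basic scheme (RFC 7617: base64 of `user:password` in an `Authorization` header); `basic_auth_header` is a hypothetical helper, and how seqspec's HTTP client actually attaches credentials is an assumption not covered by this diff.

```python
import base64
import os
from typing import Dict, Mapping, Optional


def basic_auth_header(
    username_env: str,
    password_env: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from env vars named by a profile.

    The profile stores only the variable names; the secrets are read here.
    """
    env = os.environ if environ is None else environ
    try:
        user = env[username_env]
        password = env[password_env]
    except KeyError as missing:
        raise RuntimeError(f"auth profile needs environment variable {missing}") from None
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dict can be passed as `headers=` to `urllib.request.Request`; reading the environment only at request time keeps secrets out of the auth config entirely.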
@@ -575,9 +627,10 @@ run_onlist(spec_fn, modality, ids, idtype, fmt, o)
 - `-f` selects how to combine multiple onlists:
   - `product` (cartesian product)
   - `multi` (row-aligned, zip with padding)
-- `yaml` corresponds to the `seqspec` file.
+- optionally, `--auth-profile PROFILE` uses a named auth profile for protected remote onlists.
+- `yaml` corresponds to the `seqspec` file and may be plain YAML or `.yaml.gz`.
 
-_Note_: If, for example, there are multiple regions with the specified `region_type` in the modality (e.g. multiple barcodes), then `seqspec onlist` will return a path to an onlist that it generates where the entries in that onlist are the cartesian product of the onlists for all of the regions found.
+_Note_: `-s region-type` is only valid when the matching regions come from one read geometry. If the same `region_type` appears across multiple reads in the modality, `seqspec onlist` errors and asks you to use `-s read` or `-s region` instead.
 
 ### Examples
 
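The `-f` option combines several onlists either as a cartesian `product` or row-aligned as `multi` (zip with padding). A minimal sketch of both joins with `itertools`, assuming that `product` concatenates one entry from each onlist into a single barcode and that `multi` writes space-separated columns padded with `-`; the separator and padding token are assumptions, not read from seqspec's code.

```python
from itertools import product, zip_longest
from typing import List

FILL = "-"  # assumption: placeholder for exhausted onlists in `multi` mode


def join_product(onlists: List[List[str]]) -> List[str]:
    """Cartesian product: every combination, concatenated into one barcode."""
    return ["".join(combo) for combo in product(*onlists)]


def join_multi(onlists: List[List[str]]) -> List[str]:
    """Row-aligned: i-th entries side by side, shorter lists padded with FILL."""
    return [" ".join(row) for row in zip_longest(*onlists, fillvalue=FILL)]
```

For onlists `["AA", "CC"]` and `["GT", "TG", "TT"]`, `join_product` returns six barcodes (`AAGT` through `CCTT`), while `join_multi` returns three rows, the last being `- TT`. The product grows multiplicatively with the onlist sizes, which is why it should only be used when every combination is a valid barcode.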
@@ -589,6 +642,14 @@ $ seqspec onlist -m rna -s read -i rna_R1 spec.yaml
 # Get onlist for barcode region type
 $ seqspec onlist -m rna -s region-type -i barcode spec.yaml
 /path/to/spec/folder/RNA-737K-arc-v1.txt
+
+# Ambiguous region-type matches across reads are rejected
+$ seqspec onlist -m rna -s region-type -i barcode ambiguous_spec.yaml
+region-type 'barcode' matches regions in multiple reads for modality 'rna': rna_R1, rna_R2. Use -s read or -s region to disambiguate.
+
+# Get an onlist from a protected remote source
+$ seqspec onlist --auth-profile igvf -m crispr -s region-type -i barcode spec.yaml
+/path/to/spec/folder/IGVFFI5429KKCK.txt.gz
 ```
 
 ## `seqspec print`: Display the sequence and/or library structure from seqspec file
@@ -600,17 +661,21 @@ seqspec print [-h] [-o OUT] [-f FORMAT] yaml
 ```
 
 ```python
-from seqspec.seqspec_print import run_seqspec_print
-run_seqspec_print(spec_fn, fmt, o)
+from seqspec.seqspec_print import seqspec_print
+from seqspec.utils import load_spec
+
+seqspec_print(load_spec("spec.yaml"), "seqspec-html")
 ```
 
 - optionally, `-o OUT` to set the path of printed file.
 - optionally, `-f FORMAT` is the format of the printed file. Can be one of:
   - `library-ascii`: prints an ascii tree of the library_spec
-  - `seqspec-html`: prints an html of both the library_spec and sequence_spec (TODO this is incomplete)
+  - `seqspec-html`: prints a self-contained interactive HTML view of the library structure, reads, and metadata
   - `seqspec-png`: prints a png summary of modality structures
   - `seqspec-ascii`: prints an ascii representation of both the library_spec and sequence_spec
-- `yaml` corresponds to the `seqspec` file.
+- `yaml` corresponds to the `seqspec` file and may be plain YAML or `.yaml.gz`.
+
+The Python CLI supports all four formats. The standalone Rust CLI supports `library-ascii`, `seqspec-ascii`, and `seqspec-html`.
 
 ### Examples
 
@@ -670,17 +735,7 @@ TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
 
 # Print the sequence and library structure as html
-$ seqspec print -f seqspec-html spec.yaml
-<!DOCTYPE html>
-<html>
-<head>
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<style>
-highlight {
-color: green;
-}
-...
-# long output omitted
+$ seqspec print -f seqspec-html -o spec.html spec.yaml
 
 # Print the library structure as a png
 $ seqspec print -o spec.png -f seqspec-png spec.yaml
@@ -720,38 +775,43 @@ seqspec version [-h] [-o OUT] yaml
 ```
 
 ```python
-from seqspec.seqspec_version import run_version
-run_version(spec_fn, o)
+from seqspec.seqspec_version import seqspec_version
+from seqspec.utils import load_spec
+
+seqspec_version(load_spec("spec.yaml"))
 ```
 
 - optionally, `-o OUT` path to file to write output.
-- `yaml` corresponds to the `seqspec` file.
+- `yaml` corresponds to the `seqspec` file and may be plain YAML or `.yaml.gz`.
 
 ### Examples
 
 ```bash
 # Get versions of tool and file
 $ seqspec version spec.yaml
-seqspec version: 0.3.0
-seqspec file version: 0.3.0
+seqspec version: 0.4.0
+seqspec file version: 0.4.0
 ```
 
 ## (HIDDEN) `seqspec upgrade`: Upgrade seqspec file from older versions to the current version
 
-This is a hidden subcommand that upgrades an old version of the spec to the current one. It is not intended to be used in a production environment.
+This is a hidden subcommand that upgrades an old version of the spec to the current one. It upgrades `0.0.x`, `0.1.x`, `0.2.0`, and `0.3.0` specs to `0.4.0`.
 
 ```bash
 seqspec upgrade [-h] [-o OUT] yaml
 ```
 
 ```python
-from seqspec.seqspec_upgrade import run_upgrade
-run_upgrade(spec_fn, o)
+from seqspec.seqspec_upgrade import seqspec_upgrade
+from seqspec.utils import load_spec
+
+spec = load_spec("spec.v0_3_0.yaml", strict=False)
+seqspec_upgrade(spec, spec.seqspec_version or "0.0.0")
 ```
 
 ### Examples
 
 ```bash
-# upgrade spec
-$ seqspec upgrade -o spec.yaml spec.yaml
+# upgrade a 0.3.0 spec to 0.4.0
+$ seqspec upgrade -o spec.v0_4_0.yaml spec.v0_3_0.yaml
 ```
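The updated upgrade docs say `seqspec upgrade` brings `0.0.x`, `0.1.x`, `0.2.0`, and `0.3.0` specs to `0.4.0`, and the Python example falls back to `"0.0.0"` when a spec carries no version. A common way to structure such an upgrader is a chain of single-step migrations applied until the target series is reached. The sketch below assumes that structure; `UPGRADE_STEPS`, `series`, and `upgrade` are hypothetical, specs are plain dicts, and each step only bumps `seqspec_version` because the real per-version transformations are not part of this commit.

```python
from typing import Callable, Dict, Tuple

Spec = dict  # stand-in for a parsed spec


def _bump(target: str) -> Callable[[Spec], Spec]:
    """Placeholder migration: copy the spec and set its version to `target`."""
    def step(spec: Spec) -> Spec:
        out = dict(spec)
        out["seqspec_version"] = target
        return out
    return step


# Hypothetical chain: current series -> (next series, migration step).
UPGRADE_STEPS: Dict[str, Tuple[str, Callable[[Spec], Spec]]] = {
    "0.0": ("0.1", _bump("0.1.0")),
    "0.1": ("0.2", _bump("0.2.0")),
    "0.2": ("0.3", _bump("0.3.0")),
    "0.3": ("0.4", _bump("0.4.0")),
}


def series(version: str) -> str:
    """Reduce '0.3.0' to '0.3' so patch releases share a migration."""
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"


def upgrade(spec: Spec, target_series: str = "0.4") -> Spec:
    # Mirror the docs: a missing version is treated as "0.0.0".
    current = series(spec.get("seqspec_version") or "0.0.0")
    while current != target_series:
        if current not in UPGRADE_STEPS:
            raise ValueError(f"no upgrade path from {current}")
        current, step = UPGRADE_STEPS[current]
        spec = step(spec)
    return spec
```

Chaining single steps means each migration only needs to understand its immediate predecessor, so supporting a later release would add one entry rather than rewriting every older path.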

seqspec/seqspec_onlist.py

Lines changed: 10 additions & 1 deletion
@@ -155,6 +155,7 @@ def get_onlists(spec: Assay, modality: str, selector: str, id: str) -> List[Onli
 if selector == "region-type":
 # Prefer ordering by read orientation when possible to ensure
 # consistency with the `read` selector behavior.
+matches_by_read: List[tuple[str, List[Onlist]]] = []
 reads: List[Read] = spec.get_seqspec(modality)
 for rd in reads:
 try:
@@ -168,7 +169,15 @@ def get_onlists(spec: Assay, modality: str, selector: str, id: str) -> List[Onli
 if ol:
 ordered_onlists.append(ol)
 if ordered_onlists:
-return ordered_onlists
+matches_by_read.append((rd.read_id, ordered_onlists))
+
+if len(matches_by_read) == 1:
+return matches_by_read[0][1]
+if len(matches_by_read) > 1:
+read_ids = ", ".join(read_id for read_id, _ in matches_by_read)
+raise ValueError(
+f"region-type '{id}' matches regions in multiple reads for modality '{modality}': {read_ids}. Use -s read or -s region to disambiguate."
+)
 
 # Fallback: original region-type traversal order
 regions = find_by_region_type(spec, modality, id)
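The `get_onlists` change replaces the early `return ordered_onlists` with a per-read collection step: onlists are grouped by `read_id`, a single matching read returns its onlists, and matches in more than one read raise a `ValueError`. A self-contained sketch of that control flow, using a simplified stand-in `Read` dataclass (hypothetical; seqspec's real `Read` and region lookup are more involved) whose `onlists_by_type` maps a region type to onlist paths.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Read:
    # Simplified stand-in, not seqspec's Read class.
    read_id: str
    onlists_by_type: Dict[str, List[str]] = field(default_factory=dict)


def onlists_for_region_type(reads: List[Read], modality: str, region_type: str) -> List[str]:
    """Return onlists for `region_type`, refusing matches that span several reads."""
    matches_by_read: List[Tuple[str, List[str]]] = []
    for rd in reads:
        ordered = rd.onlists_by_type.get(region_type, [])
        if ordered:
            matches_by_read.append((rd.read_id, ordered))

    if len(matches_by_read) == 1:
        return matches_by_read[0][1]
    if len(matches_by_read) > 1:
        read_ids = ", ".join(read_id for read_id, _ in matches_by_read)
        raise ValueError(
            f"region-type '{region_type}' matches regions in multiple reads for "
            f"modality '{modality}': {read_ids}. Use -s read or -s region to disambiguate."
        )
    # No read matched: the real code falls through to find_by_region_type.
    return []
```

When no read matches, the real function continues to the `find_by_region_type` fallback shown in the diff; the sketch returns an empty list at that point instead. Raising rather than silently joining onlists from different reads is what makes an ambiguous barcode layout visible to the user.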
