+> See [`docs/formats.md`](docs/formats.md) for the exact structure of every
+> file the pipeline reads and writes — QPX inputs, the `.dat` binary,
+> `.scan_titles.txt`, cluster-DB parquets, MSP, and the pre-existing cluster
+> DB layout expected by `--existing_cluster_db`.

-SpectrafUSE is a single pipeline. New QPX projects are always converted to MaRaCluster's `.dat` binary format (~100 bytes/spectrum), sliced into precursor m/z windows, clustered, and written to a cluster DB plus an MSP spectral library. If `--existing_cluster_db <path>` is supplied, representative spectra from that DB are extracted to `.dat` and clustered alongside the new data — the rest of the pipeline is identical, and the final step merges into the existing DB instead of writing a fresh one.

-```
-(--existing_cluster_db) ┌─ EXTRACT_REPS_DAT ─┐
-│ │
-new QPX projects ─────────── ├─ PARQUET_TO_DAT ───┤
-BUILD_CLUSTER_DB ▲ GENERATE_MSP_FORMAT (*.msp.gz per partition)
-MERGE_INTO_EXISTING_DB │
-(if --existing_cluster_db)
-```
+SpectrafUSE is a single pipeline. New QPX projects are always converted to MaRaCluster's `.dat` binary format (~100 bytes/spectrum), sliced into precursor m/z windows, clustered, and written to a cluster DB plus an MSP spectral library. If `--existing_cluster_db <path>` is supplied, representative spectra from that DB are extracted to `.dat` and clustered alongside the new data — the rest of the pipeline is identical, and the final step merges into the existing DB instead of writing a fresh one.
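
The "sliced into precursor m/z windows" step in the paragraph above can be pictured with a small sketch. This is not the pipeline's code: the function name `slice_by_precursor_mz`, the `(spectrum_id, precursor_mz)` input shape, and the 10 m/z default width are all assumptions made for illustration, and the real windows may be sized or bounded differently.

```python
from collections import defaultdict


def slice_by_precursor_mz(spectra, window_width=10.0):
    """Group spectrum IDs into fixed-width precursor m/z windows.

    `spectra` is an iterable of (spectrum_id, precursor_mz) pairs.
    Both the input shape and the default width are hypothetical.
    """
    if window_width <= 0:
        raise ValueError("window_width must be positive")

    windows = defaultdict(list)
    for spectrum_id, precursor_mz in spectra:
        # Window index = how many full widths fit below the precursor m/z.
        index = int(precursor_mz // window_width)
        windows[index].append(spectrum_id)
    return dict(windows)


if __name__ == "__main__":
    example = [("s1", 402.3), ("s2", 407.9), ("s3", 415.1)]
    print(slice_by_precursor_mz(example))
```

Each window can then be clustered independently, which is presumably why the slicing happens before clustering: spectra whose precursor m/z values are far apart are not candidates for the same cluster.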