Commit ef841af

use optimized rsid lookups also in variants/phenotypes
1 parent 8f8f646 commit ef841af

2 files changed

Lines changed: 24 additions & 0 deletions

clickhouserewrite.md

Lines changed: 6 additions & 0 deletions
@@ -248,3 +248,9 @@ All other routers in `src/routers/datatypeRouters/edges/` and `src/routers/datat
 4. Use three-query pagination for endpoints that merge results from multiple sources
 5. Use lean projections + two-step ID resolution for high-cardinality string lookups on large tables
 6. Use materialized view lookup tables for array column lookups (e.g. `rsid`)
+
+### Important: always use optimized variant lookups
+
+Any endpoint that resolves variant identifiers (`spdi`, `hgvs`, `ca_id`, `rsid`) to variant IDs **must** use the optimized lookup paths: lean projections for `spdi`/`hgvs`/`ca_id` and the `rsid_to_variant` materialized view for `rsid`. Never query the `variants` table directly with `WHERE spdi = ...`, `WHERE hgvs = ...`, or `WHERE has(rsid, ...)` when selecting full rows or when used as a subquery without the lean projection.
+
+This applies to both direct variant queries (`/variants`) and any edge endpoint that accepts variant identifiers as input (e.g. `/variants/phenotypes`, `/variants/genes`). The `variantIDSearch()` and `findVariantIDByRSID()` functions in `variants.ts` already use the optimized paths and should be reused by all edge routers that resolve variant identifiers. Bypassing these functions with direct `has(rsid, ...)` or unaliased `WHERE spdi = ...` queries against the 1.2B-row `variants` table will result in 60s+ timeouts.
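
A minimal sketch of how an edge router might follow this guidance: resolve identifiers to variant IDs first, then filter the edge table by ID only. The resolver is injected so the sketch runs without a database; in the repository that role would presumably be played by `variantIDSearch()`. The names `VariantIdentifierInput`, `buildPhenotypeEdgeQuery`, the `variants_phenotypes` table, and its `variant_id` column are assumptions for illustration and do not come from the commit.

```typescript
// Hypothetical input shape; the repository's paramsFormatType may differ.
interface VariantIdentifierInput {
  spdi?: string
  hgvs?: string
  ca_id?: string
  rsid?: string
}

// Stand-in for variantIDSearch(): anything that maps identifiers to variant IDs.
type VariantIDResolver = (input: VariantIdentifierInput) => Promise<string[]>

interface EdgeQuery {
  sql: string
  params: { ids: string[] }
}

// Resolve identifiers first, then filter the edge table by ID only.
// `variants_phenotypes` and `variant_id` are assumed names for illustration.
async function buildPhenotypeEdgeQuery (
  input: VariantIdentifierInput,
  resolveVariantIDs: VariantIDResolver
): Promise<EdgeQuery | null> {
  const ids = await resolveVariantIDs(input)
  if (ids.length === 0) return null // nothing matched, so skip the edge query
  return {
    sql: 'SELECT * FROM variants_phenotypes WHERE variant_id IN {ids:Array(String)}',
    params: { ids }
  }
}
```

Because the edge query only ever sees resolved IDs, no `has(rsid, ...)` or identifier-column predicate reaches the large table from the edge router itself.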

src/routers/datatypeRouters/nodes/variants.ts

Lines changed: 18 additions & 0 deletions
@@ -614,6 +614,24 @@ export async function variantIDSearch (input: paramsFormatType): Promise<any[]>
     }
   }
 
+  // Fast path: resolve rsid via lookup table, spdi/ca_id/hgvs via lean projections
+  if (input.rsid !== undefined) {
+    const rows = await chQuery<{ variant_id: string }>(
+      'SELECT variant_id FROM rsid_to_variant WHERE rsid = {_rsid:String}',
+      { _rsid: input.rsid as string }
+    )
+    return rows.map(r => r.variant_id)
+  }
+  for (const col of LEAN_LOOKUP_COLS) {
+    if (input[col] !== undefined) {
+      const rows = await chQuery<{ id: string }>(
+        `SELECT id FROM ${TABLE} WHERE ${col} = {_lp_val:String}`,
+        { _lp_val: input[col] as string }
+      )
+      return rows.map(r => r.id)
+    }
+  }
+
   const params: QueryParams = {}
   const where = buildVariantWhere(input, params)
   if (where === '') return []
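
The precedence in the fast path above (`rsid` first, then each lean-lookup column, otherwise fall through to the general `WHERE` builder) can be restated as a pure function that returns the query instead of running it, which makes the ordering easy to unit-test. This is only a sketch: the values of `LEAN_LOOKUP_COLS` and `TABLE` are not shown in the diff and are assumed here, and `LookupInput`, `LookupPlan`, and `planFastLookup` are hypothetical names.

```typescript
// Assumed values: the diff does not show the real LEAN_LOOKUP_COLS or TABLE.
const LEAN_LOOKUP_COLS = ['spdi', 'hgvs', 'ca_id'] as const
const TABLE = 'variants'

type LookupInput = { rsid?: string, spdi?: string, hgvs?: string, ca_id?: string }

interface LookupPlan {
  sql: string
  params: Record<string, string>
}

// Returns the fast-path query for the first identifier present,
// or null when the caller should fall through to the general WHERE builder.
function planFastLookup (input: LookupInput): LookupPlan | null {
  if (input.rsid !== undefined) {
    return {
      sql: 'SELECT variant_id FROM rsid_to_variant WHERE rsid = {_rsid:String}',
      params: { _rsid: input.rsid }
    }
  }
  for (const col of LEAN_LOOKUP_COLS) {
    const value = input[col]
    if (value !== undefined) {
      return {
        // col comes from a fixed allow-list, so interpolating it is safe;
        // the user-supplied value is always passed as a bound parameter.
        sql: `SELECT id FROM ${TABLE} WHERE ${col} = {_lp_val:String}`,
        params: { _lp_val: value }
      }
    }
  }
  return null
}
```

Selecting only the ID column keeps each fast-path query narrow, which is what the lean projections described in `clickhouserewrite.md` rely on.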
