
Support historical variant page views via as_of temporal queries #657

@bencap


Summary

Variant pages are built by aggregating data across multiple score sets via ClinGen allele ID lookups. When score sets are remapped or superseded, the data shown on these pages changes — breaking bookmarks and making previously viewed data inaccessible. This issue adds support for viewing historical versions of variant pages by resolving variant data as it existed at a user-specified date.

This feature has two prerequisites:

  1. Introduce a mapping table at the target level (dcd_mapping2#86). This adds a per-target Mapping record that captures alignment provenance, QC, and revision history; the as_of mechanism layers on top of that table.
  2. Traceability and Auditing for Variant-level Job Results (mavedb-api#627). The VariantAnnotationStatus model introduced by the job monitoring work tracks per-variant, per-annotation-type status with created_at timestamps. This enables temporal resolution of annotation state, so changes in annotations over time (annotation drift) can also be accounted for.

Problem

When a user bookmarks a variant page (keyed by ClinGen allele ID), the data they see can change or disappear due to two independent staleness vectors:

  1. Mapping staleness: A score set's variants are remapped (e.g. due to a mapping API update). Old MappedVariant records are set to current=False and new ones are created. The new mapping may produce different HGVS representations, VRS IDs, ClinGen allele IDs, or annotation linkages.

  2. Score set staleness: A score set is superseded by a newer version. The old score set still exists, but the variant page aggregation shifts to the new score set's data — which may map to different alleles or have different scores.

Currently, all variant endpoints filter on MappedVariant.current == True, and the allele ID lookup in routers/variants.py returns only current mapped variants. There is no mechanism for accessing historical data. The only exception is a single endpoint (GET /vrs/{identifier}) that supports an only_current parameter, and that flag is not date-aware.

Proposed behavior

as_of query parameter

Add an optional as_of (date) query parameter to variant page endpoints. When provided, the API resolves data as it existed at that date across three dimensions:

  • Score set resolution: For each superseding chain contributing variants to the requested allele, select the latest score set in the chain with published_date <= as_of.
  • Mapping resolution: For each resolved score set's targets, select the Mapping record with the latest mapped_date <= as_of (requires the Mapping table from the prerequisite issue).
  • Annotation resolution: For each variant, resolve annotation state by selecting VariantAnnotationStatus records where created_at <= as_of for each annotation type. This enables accurate point-in-time reconstruction of which annotations (gnomAD, ClinVar, ClinGen, VEP) were present at the requested date.

When as_of is omitted, behavior is unchanged — return current data only.
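The score set resolution rule can be illustrated with a small, self-contained sketch. It assumes a superseding chain is available as a list of (URN, published_date) pairs; the `ScoreSetVersion` type, the `resolve_score_set` helper, and the example URNs are hypothetical and do not exist in mavedb-api. Unpublished score sets (no published_date) are skipped, and omitting as_of returns the newest published version, mirroring the unchanged current behavior.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScoreSetVersion:
    """Hypothetical stand-in for one score set in a superseding chain."""
    urn: str
    published_date: Optional[date]  # None for unpublished score sets


def resolve_score_set(
    chain: Sequence[ScoreSetVersion], as_of: Optional[date] = None
) -> Optional[ScoreSetVersion]:
    """Return the latest published score set in the chain as of `as_of`.

    With as_of=None the newest published version is returned (current behavior).
    Returns None if nothing in the chain had been published by that date.
    """
    candidates = [
        s for s in chain
        if s.published_date is not None
        and (as_of is None or s.published_date <= as_of)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.published_date)


chain = [
    ScoreSetVersion("urn:mavedb:00000001-a-1", date(2023, 3, 1)),
    ScoreSetVersion("urn:mavedb:00000001-a-2", date(2024, 6, 15)),
]
print(resolve_score_set(chain, date(2024, 1, 1)).urn)  # urn:mavedb:00000001-a-1
print(resolve_score_set(chain).urn)                     # urn:mavedb:00000001-a-2
print(resolve_score_set(chain, date(2022, 1, 1)))       # None
```

Selecting by `max` over filtered candidates keeps the rule independent of how the chain is ordered; a real implementation would likely push this filter into the database query.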

Response metadata

The API response must include enough context for the frontend to render versioning UI. Per score set in the response:

| Field | Purpose |
|---|---|
| mapping.id | The Mapping record being shown |
| mapping.mapped_date | When this mapping was produced |
| mapping.revision | Which revision of this target's mapping history is shown |
| mapping.current | Whether this is the latest mapping |
| current_mapping | Pointer to the current Mapping, for comparison (only when viewing historical data) |
| superseded_by | URN of the superseding score set (if applicable) |

Top-level response fields:

| Field | Purpose |
|---|---|
| as_of | The requested date, or null when viewing current data |
| viewing_latest | Boolean; false when any score set is showing non-current data |

This gives the frontend what it needs to render a banner (e.g. "Viewing data as of Jan 15, 2025. Some score sets have been remapped since then.") and per-score-set staleness indicators.
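One possible shape for this metadata, sketched with plain dataclasses rather than the project's actual view models. The class names `MappingMetadata`, `ScoreSetEntry`, and `VariantPageResponse` are hypothetical, and so is the choice to derive viewing_latest instead of storing it: here a score set counts as non-current if its mapping is not the latest revision or if it has been superseded.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


@dataclass
class MappingMetadata:
    id: int
    mapped_date: datetime
    revision: int
    current: bool


@dataclass
class ScoreSetEntry:
    urn: str
    mapping: MappingMetadata
    current_mapping: Optional[MappingMetadata] = None  # only for historical views
    superseded_by: Optional[str] = None                # URN of the superseding score set


@dataclass
class VariantPageResponse:
    as_of: Optional[date]
    score_sets: List[ScoreSetEntry] = field(default_factory=list)

    @property
    def viewing_latest(self) -> bool:
        # Assumption: "non-current data" means a non-latest mapping revision
        # or a score set that has since been superseded.
        return all(
            s.mapping.current and s.superseded_by is None for s in self.score_sets
        )


old = MappingMetadata(id=10, mapped_date=datetime(2024, 11, 2), revision=1, current=False)
new = MappingMetadata(id=42, mapped_date=datetime(2025, 3, 9), revision=2, current=True)
resp = VariantPageResponse(
    as_of=date(2025, 1, 15),
    score_sets=[ScoreSetEntry("urn:mavedb:00000001-a-1", mapping=old, current_mapping=new)],
)
print(resp.viewing_latest)  # False
```

Deriving viewing_latest from the per-score-set fields keeps the banner flag from drifting out of sync with the metadata it summarizes.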

Denormalized current on MappedVariant

Keep the current boolean on MappedVariant as a denormalized copy of the Mapping.current flag. This preserves the performance of existing hot-path queries (allele ID lookups, score set variant listings) which filter on MappedVariant.current without requiring a join. The worker job already touches every MappedVariant during remapping, so keeping the two booleans in sync within the same transaction is straightforward.
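A minimal sketch of keeping the two flags in sync during a remap, assuming one transaction per target. It uses the standard-library sqlite3 module only so the example runs standalone. The tables are simplified, `remap_target` is a hypothetical helper rather than the worker's actual `map_variants_for_score_set()`, and `is_current` stands in for the issue's `current` column to avoid the SQL keyword CURRENT.

```python
import sqlite3
from datetime import datetime

# Simplified, hypothetical tables; is_current mirrors the issue's `current` flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping (
    id INTEGER PRIMARY KEY,
    target_gene_id INTEGER NOT NULL,
    revision INTEGER NOT NULL,
    is_current INTEGER NOT NULL,
    mapped_date TEXT NOT NULL
);
CREATE TABLE mapped_variant (
    id INTEGER PRIMARY KEY,
    mapping_id INTEGER NOT NULL REFERENCES mapping(id),
    clingen_allele_id TEXT,
    is_current INTEGER NOT NULL  -- denormalized copy of mapping.is_current
);
""")


def remap_target(conn, target_gene_id, allele_ids, mapped_date):
    """Hypothetical remap of one target: retire the current Mapping and its
    variants, then insert a new revision, all inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        (last_revision,) = conn.execute(
            "SELECT COALESCE(MAX(revision), 0) FROM mapping WHERE target_gene_id = ?",
            (target_gene_id,),
        ).fetchone()
        # Clear the denormalized flag on the outgoing variants ...
        conn.execute(
            "UPDATE mapped_variant SET is_current = 0 WHERE mapping_id IN "
            "(SELECT id FROM mapping WHERE target_gene_id = ? AND is_current = 1)",
            (target_gene_id,),
        )
        # ... and on the outgoing Mapping itself.
        conn.execute(
            "UPDATE mapping SET is_current = 0 "
            "WHERE target_gene_id = ? AND is_current = 1",
            (target_gene_id,),
        )
        cur = conn.execute(
            "INSERT INTO mapping (target_gene_id, revision, is_current, mapped_date) "
            "VALUES (?, ?, 1, ?)",
            (target_gene_id, last_revision + 1, mapped_date.isoformat()),
        )
        conn.executemany(
            "INSERT INTO mapped_variant (mapping_id, clingen_allele_id, is_current) "
            "VALUES (?, ?, 1)",
            [(cur.lastrowid, allele_id) for allele_id in allele_ids],
        )
        return cur.lastrowid


remap_target(conn, 1, ["CA000001"], datetime(2024, 11, 2))
remap_target(conn, 1, ["CA000001", "CA000002"], datetime(2025, 3, 9))
print(conn.execute(
    "SELECT m.revision, mv.clingen_allele_id FROM mapped_variant mv "
    "JOIN mapping m ON m.id = mv.mapping_id WHERE mv.is_current = 1 ORDER BY mv.id"
).fetchall())
# [(2, 'CA000001'), (2, 'CA000002')]
```

Because both flag updates and the new inserts share one transaction, a failed remap leaves the previous revision current on both tables.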

Acceptance criteria

  • as_of query parameter is accepted on the allele ID lookup endpoint (POST /variants/clingen-allele-id-lookups) and mapped variant endpoints
  • When as_of is provided, the response resolves the correct Mapping revision per target based on mapped_date <= as_of
  • When as_of is provided, superseded score sets are included if they were the active version in their chain as of that date (based on the superseding score set's published_date)
  • When as_of is provided, annotation state is resolved from VariantAnnotationStatus records with created_at <= as_of where available
  • When VariantAnnotationStatus records do not exist for a given annotation (pre-deployment data), the current annotation associations on the MappedVariant are returned as a fallback
  • When as_of is omitted, behavior is identical to current behavior (only current=True mapped variants returned)
  • Response includes per-score-set mapping metadata (revision, mapped_date, current, current_mapping pointer)
  • Response includes top-level as_of and viewing_latest fields
  • MappedVariant.current is kept in sync with Mapping.current during remapping
  • Existing queries that filter on MappedVariant.current == True continue to work without joins or performance regression

Implementation notes

Prerequisites

  1. Mapping table — introduces a per-target record:

    ScoreSet ──1:N──▶ TargetGene ──1:N──▶ Mapping ──1:N──▶ MappedVariant
    

    The Mapping record stores alignment provenance, QC metrics, revision (integer, per target), current (boolean, indexed), and mapped_date (DateTime). MappedVariant gains a mapping_id FK. Fields currently duplicated across every MappedVariant in a run (mapped_date, mapping_api_version, vrs_version) move to the Mapping record.

  2. Annotation status tracking — the VariantAnnotationStatus model (from the job monitoring branch) tracks per-variant annotation outcomes with created_at timestamps, annotation_type discrimination, version tracking, and a current boolean managed by AnnotationStatusManager. This provides the temporal metadata needed to resolve annotation state at a given as_of date.
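To make the prerequisite schema concrete, here is hypothetical DDL for the Mapping table, written for SQLite via the standard-library sqlite3 module so it runs standalone; the real migration, column types, and names may differ. `is_current` stands in for the `current` flag. The partial unique index that allows at most one current Mapping per target is a suggested safeguard, not something the prerequisite issue specifies.

```python
import sqlite3

# Hypothetical DDL for the prerequisite Mapping table (SQLite dialect).
SCHEMA = """
CREATE TABLE target_gene (
    id INTEGER PRIMARY KEY,
    score_set_id INTEGER NOT NULL
);
CREATE TABLE mapping (
    id INTEGER PRIMARY KEY,
    target_gene_id INTEGER NOT NULL REFERENCES target_gene(id),
    revision INTEGER NOT NULL,              -- per-target revision counter
    is_current INTEGER NOT NULL DEFAULT 0,  -- boolean, indexed below
    mapped_date TEXT NOT NULL,
    mapping_api_version TEXT,               -- moved off MappedVariant
    vrs_version TEXT,                       -- moved off MappedVariant
    UNIQUE (target_gene_id, revision)
);
CREATE INDEX ix_mapping_is_current ON mapping (is_current);
-- Suggested safeguard: at most one current Mapping per target.
CREATE UNIQUE INDEX uq_mapping_one_current ON mapping (target_gene_id)
    WHERE is_current = 1;
CREATE TABLE mapped_variant (
    id INTEGER PRIMARY KEY,
    mapping_id INTEGER NOT NULL REFERENCES mapping(id),
    is_current INTEGER NOT NULL DEFAULT 0   -- denormalized copy of mapping.is_current
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO target_gene (id, score_set_id) VALUES (1, 100)")
conn.execute(
    "INSERT INTO mapping (target_gene_id, revision, is_current, mapped_date) "
    "VALUES (1, 1, 1, '2024-11-02')"
)

# A second *current* Mapping for the same target violates the partial index.
duplicate_rejected = False
try:
    conn.execute(
        "INSERT INTO mapping (target_gene_id, revision, is_current, mapped_date) "
        "VALUES (1, 2, 1, '2025-03-09')"
    )
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("second current Mapping rejected:", exc)
```

PostgreSQL also supports partial unique indexes, so the same guarantee could be expressed there; whether to add it is a design decision for the prerequisite work.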

as_of resolution logic

For a given allele ID and as_of date:

  1. Find all MappedVariant records matching the allele ID (across all mapping revisions, not just current)
  2. For each variant's score set, walk the superseding chain: pick the latest score set where published_date <= as_of
  3. For each resolved score set's targets, pick the Mapping where mapped_date <= as_of with the highest revision
  4. Return the MappedVariant records belonging to those resolved Mappings
  5. For each variant, query VariantAnnotationStatus for each annotation type where created_at <= as_of, ordered by created_at descending, to resolve point-in-time annotation state. Fall back to current MappedVariant annotation associations where no VariantAnnotationStatus records exist.
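Steps 1, 3, and 4 can be sketched in memory as follows, assuming step 2 has already produced the set of target IDs belonging to the resolved score sets. The `Mapping` and `MappedVariant` dataclasses and both helper functions are hypothetical simplifications of the ORM models. Because as_of is a date while mapped_date is a DateTime, the sketch compares on the date part, which treats the whole as_of day as inclusive; that convention is an assumption.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Mapping:
    id: int
    target_gene_id: int
    revision: int
    mapped_date: datetime


@dataclass(frozen=True)
class MappedVariant:
    id: int
    mapping_id: int
    clingen_allele_id: str


def resolve_mappings(
    mappings: Iterable[Mapping], target_ids: Set[int], as_of: date
) -> Dict[int, Mapping]:
    """Step 3: per target, the highest-revision Mapping produced on or before as_of."""
    resolved: Dict[int, Mapping] = {}
    for m in mappings:
        # Compare dates, not datetimes: the whole as_of day is inclusive.
        if m.target_gene_id not in target_ids or m.mapped_date.date() > as_of:
            continue
        best = resolved.get(m.target_gene_id)
        if best is None or m.revision > best.revision:
            resolved[m.target_gene_id] = m
    return resolved


def variants_as_of(
    allele_id: str,
    variants: Iterable[MappedVariant],
    mappings: Iterable[Mapping],
    target_ids: Set[int],
    as_of: date,
) -> List[MappedVariant]:
    """Steps 1 and 4: keep variants for the allele that belong to a resolved Mapping."""
    resolved_ids = {m.id for m in resolve_mappings(mappings, target_ids, as_of).values()}
    return [
        v for v in variants
        if v.clingen_allele_id == allele_id and v.mapping_id in resolved_ids
    ]


mappings = [
    Mapping(1, target_gene_id=7, revision=1, mapped_date=datetime(2024, 11, 2, 9, 30)),
    Mapping(2, target_gene_id=7, revision=2, mapped_date=datetime(2025, 3, 9, 14, 0)),
]
variants = [
    MappedVariant(101, mapping_id=1, clingen_allele_id="CA000001"),
    MappedVariant(201, mapping_id=2, clingen_allele_id="CA000001"),
]
print([v.id for v in variants_as_of("CA000001", variants, mappings, {7}, date(2025, 1, 15))])
# [101]
```

In SQL the same date-inclusive rule is usually written as `mapped_date < as_of + 1 day`, which keeps the predicate index-friendly.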

Affected code paths

  • routers/variants.py: allele ID lookup endpoint; currently hardcodes MappedVariant.current == True
  • routers/mapped_variant.py: mapped variant endpoints; filter on current.is_(True)
  • routers/score_sets.py: score set mapped variant listing and CSV export
  • worker/jobs.py: map_variants_for_score_set() must set mapping_id on new MappedVariant records and sync the current flag
  • lib/score_sets.py: superseding chain traversal logic; will need a date-aware variant
  • lib/annotation_status_manager.py: may need a get_annotation_as_of() method for temporal queries
  • view_models/mapped_variant.py: response models need mapping metadata fields
  • models/mapped_variant.py: add mapping_id FK, keep denormalized current
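A possible shape for the get_annotation_as_of() lookup mentioned above, written as a standalone function over a simplified variant_annotation_status table in sqlite3 so it runs on its own. The signature, annotation type identifiers, and status values are hypothetical. The sketch reads the fallback rule literally: an annotation type with no status rows at all (pre-deployment data) falls back to the variant's current association, while a type whose rows all postdate as_of resolves to None.

```python
import sqlite3
from datetime import date, timedelta
from typing import Dict, Optional

# Hypothetical identifiers for the annotation types named in this issue.
ANNOTATION_TYPES = ("gnomad", "clinvar", "clingen", "vep")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant_annotation_status ("
    " id INTEGER PRIMARY KEY, variant_id INTEGER NOT NULL,"
    " annotation_type TEXT NOT NULL, status TEXT NOT NULL,"
    " created_at TEXT NOT NULL)"  # consistently formatted ISO-8601 sorts as text
)


def get_annotation_as_of(
    conn: sqlite3.Connection,
    variant_id: int,
    as_of: date,
    current_fallback: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Hypothetical point-in-time annotation lookup for one variant."""
    cutoff = (as_of + timedelta(days=1)).isoformat()  # exclusive: next day's start
    rows = conn.execute(
        "SELECT annotation_type, status, created_at FROM variant_annotation_status "
        "WHERE variant_id = ? ORDER BY created_at DESC",
        (variant_id,),
    ).fetchall()
    has_history = set()
    resolved: Dict[str, Optional[str]] = {}
    for annotation_type, status, created_at in rows:
        has_history.add(annotation_type)
        if created_at < cutoff:
            resolved.setdefault(annotation_type, status)  # first match is newest
    for annotation_type in ANNOTATION_TYPES:
        if annotation_type not in has_history:
            # No status rows at all: pre-deployment data, use current association.
            resolved[annotation_type] = current_fallback.get(annotation_type)
        else:
            # Rows exist but none by as_of: not yet annotated at that date.
            resolved.setdefault(annotation_type, None)
    return resolved


conn.executemany(
    "INSERT INTO variant_annotation_status "
    "(variant_id, annotation_type, status, created_at) VALUES (?, ?, ?, ?)",
    [
        (101, "clinvar", "success", "2024-12-01T08:00:00"),
        (101, "clinvar", "failed", "2025-02-01T08:00:00"),
        (101, "vep", "success", "2025-02-01T08:00:00"),
    ],
)
fallback = {"gnomad": "present", "clinvar": "present", "vep": "present"}
print(get_annotation_as_of(conn, 101, date(2025, 1, 15), fallback))
# clinvar -> "success", gnomad -> "present" (fallback), clingen -> None, vep -> None
```

Whether a type with only later records should instead fall back to the current association is a policy question the acceptance criteria leave open; the literal reading above avoids showing annotations that did not yet exist at the requested date.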

Metadata

Assignees: none
Labels: app: backend, app: database, app: frontend, type: feature
Projects: none
Milestone: none
Relationships: none
Development: no branches or pull requests