Summary
Variant pages are built by aggregating data across multiple score sets via ClinGen allele ID lookups. When score sets are remapped or superseded, the data shown on these pages changes, breaking bookmarks and making previously viewed data inaccessible. This issue adds support for viewing historical versions of variant pages by resolving variant data as it existed at a user-specified date.
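This feature has two prerequisites:
Mapping table: introduces a per-target Mapping record that captures alignment provenance, QC, and revision history. The as_of mechanism layers on top of that table.
Traceability and Auditing for Variant-level Job Results (mavedb-api#627): the VariantAnnotationStatus model introduced by the job monitoring work tracks per-variant, per-annotation-type status with created_at timestamps. This enables temporal resolution of annotation state, closing the annotation drift gap.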
Problem
When a user bookmarks a variant page (keyed by ClinGen allele ID), the data they see can change or disappear due to two independent staleness vectors:
Mapping staleness: A score set's variants are remapped (e.g. due to a mapping API update). The old MappedVariant records are set to current=False and new ones are created. The new mapping may produce different HGVS representations, VRS IDs, ClinGen allele IDs, or annotation linkages.
Score set staleness: A score set is superseded by a newer version. The old score set still exists, but the variant page aggregation shifts to the new score set's data — which may map to different alleles or have different scores.
Currently, variant endpoints filter on MappedVariant.current == True, and the allele ID lookup in routers/variants.py only returns current mapped variants. Apart from a single endpoint (GET /vrs/{identifier}) that supports an only_current parameter, there is no mechanism to access historical data.
Proposed behavior
as_of query parameter
Add an optional as_of (date) query parameter to variant page endpoints. When provided, the API resolves data as it existed at that date across three dimensions:
Score set resolution: For each superseding chain contributing variants to the requested allele, select the latest score set in the chain with published_date <= as_of.
Mapping resolution: For each resolved score set's targets, select the Mapping record with the latest mapped_date <= as_of (requires the Mapping table from the prerequisite issue).
Annotation resolution: For each variant, resolve annotation state by selecting VariantAnnotationStatus records where created_at <= as_of for each annotation type. This enables accurate point-in-time reconstruction of which annotations (gnomAD, ClinVar, ClinGen, VEP) were present at the requested date.
When as_of is omitted, behavior is unchanged — return current data only.
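One detail worth pinning down: as_of is a date, while mapped_date and created_at are DateTime values. Python refuses to order a datetime against a date (it raises TypeError), and casting the date to midnight would silently drop anything recorded later that same day. A minimal sketch, assuming timestamps are stored as naive UTC and that "<= as_of" is meant to include the whole as_of day; both function names are hypothetical:

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def as_of_upper_bound(as_of: Optional[date]) -> Optional[datetime]:
    """Exclusive upper bound covering the whole as_of day (naive UTC assumed).

    Returns None when as_of is omitted, signalling the unchanged current-only path.
    """
    if as_of is None:
        return None
    # Midnight at the start of the following day: anything strictly earlier counts.
    return datetime.combine(as_of + timedelta(days=1), time.min)


def recorded_by(ts: datetime, as_of: date) -> bool:
    """True if a naive-UTC timestamp falls on or before the as_of day."""
    return ts < datetime.combine(as_of + timedelta(days=1), time.min)
```

In a SQLAlchemy query the same idea would read as a plain column comparison such as `Mapping.mapped_date < bound`, which typically lets the database use an index on mapped_date, unlike wrapping the column in a date cast. If the columns are timezone-aware, attach `tzinfo=timezone.utc` to the bound.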
Response metadata
The API response must include enough context for the frontend to render versioning UI. Per score set in the response:
| Field | Purpose |
| --- | --- |
| mapping.id | The Mapping record being shown |
| mapping.mapped_date | When this mapping was produced |
| mapping.revision | Which revision of this target's mapping history is shown |
| mapping.current | Whether this is the latest mapping |
| current_mapping | (if viewing historical data) Pointer to the current Mapping for comparison |
| superseded_by | (if applicable) URN of the superseding score set |
Top-level response fields:
| Field | Purpose |
| --- | --- |
| as_of | The requested date, or null if viewing current data |
| viewing_latest | Boolean; false when any score set is showing non-current data |
This gives the frontend what it needs to render a banner (e.g. "Viewing data as of Jan 15, 2025. Some score sets have been remapped since then.") and per-score-set staleness indicators.
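The metadata above could be shaped roughly as follows. This is a sketch using plain dataclasses rather than the actual response models in view_models/mapped_variant.py; all class names are hypothetical, and treating a superseded score set as "non-current" when deriving viewing_latest is an assumption:

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


@dataclass
class MappingMetadata:
    id: int
    mapped_date: datetime
    revision: int
    current: bool


@dataclass
class ScoreSetVersionInfo:
    urn: str
    mapping: MappingMetadata
    current_mapping: Optional[MappingMetadata] = None  # set only when showing a historical mapping
    superseded_by: Optional[str] = None  # URN of the superseding score set, if applicable


@dataclass
class VariantPageResponse:
    as_of: Optional[date]  # None when viewing current data
    score_sets: List[ScoreSetVersionInfo]
    viewing_latest: bool = field(init=False)  # derived, never passed in by callers

    def __post_init__(self) -> None:
        # Assumption: a score set shows non-current data if its mapping is not
        # current or if it has been superseded.
        self.viewing_latest = all(
            s.mapping.current and s.superseded_by is None for s in self.score_sets
        )
```

Deriving viewing_latest from the per-score-set fields, instead of accepting it as input, keeps the banner flag from disagreeing with the staleness indicators it summarizes.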
Denormalized current on MappedVariant
Keep the current boolean on MappedVariant as a denormalized copy of the Mapping.current flag. This preserves the performance of existing hot-path queries (allele ID lookups, score set variant listings) which filter on MappedVariant.current without requiring a join. The worker job already touches every MappedVariant during remapping, so keeping the two booleans in sync within the same transaction is straightforward.
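A minimal in-memory sketch of the sync step, assuming the worker retires the target's current Mapping and its variants and then creates the next revision in one pass. Lists stand in for tables and remap_target is a hypothetical name; in the real job this would happen inside map_variants_for_score_set() within a single database transaction:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Mapping:
    id: int
    target_id: int
    revision: int
    current: bool
    mapped_date: datetime


@dataclass
class MappedVariant:
    id: int
    mapping_id: int
    current: bool  # denormalized copy of Mapping.current


def remap_target(
    mappings: List[Mapping],
    variants: List[MappedVariant],
    target_id: int,
    new_mapping_id: int,
    new_variant_ids: List[int],
    mapped_date: datetime,
) -> Mapping:
    """Retire the target's current mapping and its variants, then add a new current revision."""
    retired_ids = set()
    for m in mappings:
        if m.target_id == target_id and m.current:
            m.current = False
            retired_ids.add(m.id)
    for v in variants:
        if v.mapping_id in retired_ids:
            v.current = False  # keep the denormalized flag in step with its Mapping

    previous = [m.revision for m in mappings if m.target_id == target_id]
    new_mapping = Mapping(
        id=new_mapping_id,
        target_id=target_id,
        revision=max(previous, default=0) + 1,  # revisions are numbered per target
        current=True,
        mapped_date=mapped_date,
    )
    mappings.append(new_mapping)
    variants.extend(
        MappedVariant(id=vid, mapping_id=new_mapping_id, current=True) for vid in new_variant_ids
    )
    return new_mapping
```

The invariant worth testing is that every MappedVariant.current equals the current flag of the Mapping it points to, which is what lets hot-path queries keep filtering on MappedVariant.current without a join.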
Acceptance criteria
as_of query parameter is accepted on the allele ID lookup endpoint (POST /variants/clingen-allele-id-lookups) and mapped variant endpoints
When as_of is provided, the response resolves the correct Mapping revision per target based on mapped_date <= as_of
When as_of is provided, superseded score sets are included if they were the active version in their chain as of that date (based on the superseding score set's published_date)
When as_of is provided, annotation state is resolved from VariantAnnotationStatus records with created_at <= as_of where available
When VariantAnnotationStatus records do not exist for a given annotation (pre-deployment data), the current annotation associations on the MappedVariant are returned as a fallback
When as_of is omitted, behavior is identical to current behavior (only current=True mapped variants returned)
Response includes per-score-set mapping metadata (revision, mapped_date, current, current_mapping pointer)
Response includes top-level as_of and viewing_latest fields
MappedVariant.current is kept in sync with Mapping.current during remapping
Existing queries that filter on MappedVariant.current == True continue to work without joins or performance regression
Implementation notes
Prerequisites
Mapping table: introduces a per-target record.
The Mapping record stores alignment provenance, QC metrics, revision (integer, per target), current (boolean, indexed), and mapped_date (DateTime). MappedVariant gains a mapping_id FK. Fields currently duplicated across every MappedVariant in a run (mapped_date, mapping_api_version, vrs_version) move to the Mapping record.
Annotation status tracking — the VariantAnnotationStatus model (from the job monitoring branch) tracks per-variant annotation outcomes with created_at timestamps, annotation_type discrimination, version tracking, and a current boolean managed by AnnotationStatusManager. This provides the temporal metadata needed to resolve annotation state at a given as_of date.
as_of resolution logic
For a given allele ID and as_of date:
Find all MappedVariant records matching the allele ID (across all mapping revisions, not just current)
For each variant's score set, walk the superseding chain: pick the latest score set where published_date <= as_of
For each resolved score set's targets, pick the Mapping where mapped_date <= as_of with the highest revision
Return the MappedVariant records belonging to those resolved Mappings
For each variant, query VariantAnnotationStatus for each annotation type where created_at <= as_of, ordered by created_at descending, to resolve point-in-time annotation state. Fall back to current MappedVariant annotation associations where no VariantAnnotationStatus records exist.
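Steps 1 to 4 can be sketched in memory as below. Assumptions, all hypothetical: each ScoreSet carries a superseding_urn forward link, chains are acyclic and ordered by publication, and Mapping carries score_set_urn directly (in the proposed schema the link runs through the target). In the API these steps would be database queries; when as_of is None the existing current == True filter stays in place.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Optional


@dataclass
class ScoreSet:
    urn: str
    published_date: Optional[date]
    superseding_urn: Optional[str] = None  # hypothetical forward link to the newer version


@dataclass
class Mapping:
    id: int
    target_id: int
    score_set_urn: str  # simplification: the real link runs through the target
    revision: int
    mapped_date: datetime


@dataclass
class MappedVariant:
    id: int
    mapping_id: int
    clingen_allele_id: str


def resolve_chain(urn: str, score_sets: Dict[str, ScoreSet], as_of: date) -> Optional[str]:
    """Latest score set in urn's superseding chain published on or before as_of."""
    older_version = {s.superseding_urn: s.urn for s in score_sets.values() if s.superseding_urn}
    head = urn
    while head in older_version:  # walk back to the first version in the chain
        head = older_version[head]
    resolved: Optional[str] = None
    cursor: Optional[str] = head
    while cursor is not None:  # walk forward, keeping the last version published by as_of
        score_set = score_sets[cursor]
        if score_set.published_date is not None and score_set.published_date <= as_of:
            resolved = score_set.urn
        cursor = score_set.superseding_urn
    return resolved


def variants_as_of(
    allele_id: str,
    as_of: date,
    score_sets: Dict[str, ScoreSet],
    mappings: List[Mapping],
    variants: List[MappedVariant],
) -> List[MappedVariant]:
    mappings_by_id = {m.id: m for m in mappings}
    # Step 1: every variant for the allele, across all mapping revisions (not just current).
    candidates = [v for v in variants if v.clingen_allele_id == allele_id]
    # Step 2: resolve each contributing score set to the version active at as_of.
    resolved_urns = set()
    for v in candidates:
        urn = resolve_chain(mappings_by_id[v.mapping_id].score_set_urn, score_sets, as_of)
        if urn is not None:
            resolved_urns.add(urn)
    # Step 3: for each target of a resolved score set, the highest revision mapped by as_of.
    best: Dict[int, Mapping] = {}
    for m in mappings:
        if m.score_set_urn in resolved_urns and m.mapped_date.date() <= as_of:
            if m.target_id not in best or m.revision > best[m.target_id].revision:
                best[m.target_id] = m
    keep = {m.id for m in best.values()}
    # Step 4: only the allele's variants that belong to those resolved mappings.
    return [v for v in candidates if v.mapping_id in keep]
```

Walking back to the head of the chain first matters: the allele may only be matched by a variant from a newer score set that was not yet published at as_of, and the older version must still be found.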
Affected code paths
routers/variants.py — allele ID lookup endpoint; currently hardcodes MappedVariant.current == True
routers/mapped_variant.py — mapped variant endpoints; filter on current.is_(True)
routers/score_sets.py — score set mapped variant listing and CSV export
worker/jobs.py — map_variants_for_score_set() must set mapping_id on new MappedVariant records and sync current flag
lib/score_sets.py — superseding chain traversal logic; will need a date-aware variant
lib/annotation_status_manager.py — may need a get_annotation_as_of() method for temporal queries
view_models/mapped_variant.py — response models need mapping metadata fields
models/mapped_variant.py — add mapping_id FK, keep denormalized current
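A possible shape for the get_annotation_as_of() helper mentioned for lib/annotation_status_manager.py, written as a standalone function over in-memory records. The status field and the current_annotations fallback dictionary are hypothetical stand-ins for the real outcome column and the MappedVariant's current annotation associations. One judgment call is encoded here as an assumption: when records exist for an annotation type but all were created after as_of, the result is None (not yet recorded) rather than the fallback.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Optional


@dataclass
class VariantAnnotationStatus:
    mapped_variant_id: int
    annotation_type: str  # e.g. "gnomad", "clinvar", "clingen", "vep"
    status: str  # hypothetical outcome field, e.g. "success" or "failed"
    created_at: datetime


def get_annotation_as_of(
    records: Iterable[VariantAnnotationStatus],
    mapped_variant_id: int,
    annotation_types: Iterable[str],
    as_of: date,
    current_annotations: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Resolve each annotation type's state for one variant on the as_of day.

    current_annotations is used only when no status record exists at all for a
    type, which is the pre-deployment case.
    """
    mine = [r for r in records if r.mapped_variant_id == mapped_variant_id]
    resolved: Dict[str, Optional[str]] = {}
    for annotation_type in annotation_types:
        history = [r for r in mine if r.annotation_type == annotation_type]
        if not history:
            resolved[annotation_type] = current_annotations.get(annotation_type)
            continue
        visible = [r for r in history if r.created_at.date() <= as_of]
        # Latest record on or before as_of; None means nothing was recorded yet.
        resolved[annotation_type] = max(visible, key=lambda r: r.created_at).status if visible else None
    return resolved
```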