Description
BRC Analytics currently ingests related data via multiple scripts and upstream sources (e.g. organism and assembly metadata from NCBI; SRA-derived summaries from ENA), and those scripts may run at different times.
As ingestion expands (e.g. adding SRA metadata), this raises concerns about internal consistency: related entities may reflect different upstream states even when sourced from the same resource.
Core concerns
- Organisms/assemblies and derived summaries (e.g. SRA run counts) are generated by different scripts.
- Different scripts can be run at different times, or against different resources (ENA and NCBI take time to sync).
- It is not yet clear how ingesting SRA metadata will work, but it does seem clear that the ingested data will relate to data produced by these existing scripts.
- Relationships between all of these entities are implicit and at risk of internal inconsistency
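To make the concern concrete, here is a minimal sketch of the kind of check that could surface these inconsistencies. It assumes each script records a small provenance record alongside its output; `IngestRecord`, `check_consistency`, the field names, the one-day skew threshold, and the example taxonomy IDs are all hypothetical and not part of the current codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IngestRecord:
    """Provenance for one ingestion output (hypothetical structure)."""
    dataset: str            # e.g. "assemblies" or "sra_run_counts"
    source: str             # e.g. "NCBI" or "ENA"
    retrieved_at: datetime  # when the upstream resource was queried
    keys: frozenset         # identifiers the dataset covers, e.g. taxonomy IDs


def check_consistency(base, derived, max_skew=timedelta(days=1)):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Flag outputs that were generated too far apart in time.
    skew = abs(base.retrieved_at - derived.retrieved_at)
    if skew > max_skew:
        problems.append(
            f"{derived.dataset} ({derived.source}) and {base.dataset} "
            f"({base.source}) were retrieved {skew} apart"
        )

    # Flag derived entries that point at entities missing from the base dataset.
    orphans = derived.keys - base.keys
    if orphans:
        problems.append(
            f"{derived.dataset} references keys missing from "
            f"{base.dataset}: {sorted(orphans)}"
        )
    return problems


# Example with arbitrary values: the summaries are three days older than the
# assemblies and mention one key the assemblies dataset does not contain.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
assemblies = IngestRecord("assemblies", "NCBI", now, frozenset({"9606", "7227"}))
run_counts = IngestRecord(
    "sra_run_counts", "ENA", now - timedelta(days=3), frozenset({"9606", "5833"})
)
for problem in check_consistency(assemblies, run_counts):
    print(problem)
```

A check like this would only make the implicit relationships explicit and report drift. Whether to block a build, re-run a script, or just warn is a separate decision.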
Impact/Urgency
This is another one that isn't really 'critical' yet, but it is probably worth some thought before we make it any worse. It seems to me we're at an inflection point with the SRA metadata discussions. In the immediate term (pre-SRA-metadata), I think our worst case is the stepper UI claiming the wrong number of sequences to browse from ENA.