Coordinated ingestion and internal consistency across related data #1049

## Description

BRC Analytics currently ingests related data through multiple scripts and upstream sources (e.g. organism and assembly metadata from NCBI; SRA-derived summaries from ENA), potentially at different times.

As ingestion expands (e.g. adding SRA metadata), this raises concerns about internal consistency: related entities may reflect different upstream states even when sourced from the same resource.

## Core concerns

 - Organisms/assemblies and derived summaries (e.g. SRA run counts) are generated by different scripts.
 - Different scripts can be run at different times, or against different resources (ENA and NCBI take time to sync).
 - It is not yet clear how ingesting SRA metadata will work, but it does seem clear that the ingested data will relate to data produced by these existing scripts.
 - Relationships between all of these entities are implicit and at risk of internal inconsistency.
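
One way to make the timing concern above concrete is to have each script record when and from where it pulled its data, then compare those records across related outputs. The following is only a minimal sketch of that idea, not a description of the current scripts: `IngestStamp`, `find_skew`, the dataset names, and the dates are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IngestStamp:
    """Provenance for one ingestion output (hypothetical structure)."""
    dataset: str            # e.g. "assemblies", "sra_run_counts"
    source: str             # e.g. "NCBI", "ENA"
    retrieved_at: datetime  # when the upstream data was fetched


def find_skew(stamps, max_skew):
    """Return (earlier, later) dataset pairs retrieved more than max_skew apart."""
    ordered = sorted(stamps, key=lambda s: s.retrieved_at)
    skewed = []
    for i, early in enumerate(ordered):
        for late in ordered[i + 1:]:
            if late.retrieved_at - early.retrieved_at > max_skew:
                skewed.append((early.dataset, late.dataset))
    return skewed


# Illustrative stamps only; these are not real ingestion times.
stamps = [
    IngestStamp("assemblies", "NCBI", datetime(2025, 1, 1, tzinfo=timezone.utc)),
    IngestStamp("organisms", "NCBI", datetime(2025, 1, 1, 6, tzinfo=timezone.utc)),
    IngestStamp("sra_run_counts", "ENA", datetime(2025, 1, 9, tzinfo=timezone.utc)),
]

skewed_pairs = find_skew(stamps, max_skew=timedelta(days=2))
print(skewed_pairs)
```

A check like this only detects that related outputs were fetched far apart; it does not show that their contents actually disagree, and it says nothing about sync lag between ENA and NCBI themselves.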
 
## Impact/Urgency

This is another one that isn't really 'critical' yet, but it is probably worth some thought before we make it any worse. It seems to me we're at an inflection point with the SRA metadata discussions. In the immediate term (pre-SRA-metadata), I think our worst case is the stepper UI claiming the wrong number of sequences to browse from ENA.
