This repository was archived by the owner on Nov 12, 2024. It is now read-only.

"Snapshot ID" Field Implementation (dependency mitigation) #450

@a27cheung

Description

As data sources wind down (& perhaps shut down?)...

"I haven't seen any winding down yet, but yes, this will be an issue in the future. In our data, we have a hack to work around data sources that stop updating: we flag them with "skip", which prevents unit tests from failing and forces the pipeline to fetch the latest valid data. But if we make any other configuration changes, the data would be lost.

A good solution to this would be to add a "snapshot ID" field. If it is populated, we wouldn't even try to go to the original data source; instead, we would fetch the intermediate file from the last successful processing of that data source (which we have saved). Unfortunately, the intermediate files are not externally available, so that makes the fetch step of the pipelines not reproducible. I don't think there's a way around that: the original data source is gone, so of course you can't reproduce our work."
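
The proposed behaviour could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's actual pipeline code: `DataSource`, `fetch`, the `snapshot_dir/<name>/<snapshot_id>` layout, and the `download` callable are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class DataSource:
    """Hypothetical config entry for one data source."""
    name: str
    url: str
    snapshot_id: Optional[str] = None  # assumption: unset means "fetch live"


def fetch(source: DataSource,
          snapshot_dir: Path,
          download: Callable[[str], bytes]) -> bytes:
    """Return raw bytes for `source`, preferring a pinned snapshot."""
    if source.snapshot_id:
        # Snapshot pinned: never contact the original source. Read the
        # intermediate file saved by the last successful processing run
        # (assumed layout: snapshot_dir/<name>/<snapshot_id>).
        path = snapshot_dir / source.name / source.snapshot_id
        return path.read_bytes()
    # No snapshot pinned: go to the original data source as usual.
    return download(source.url)
```

Passing `download` in as a parameter keeps the sketch independent of any particular HTTP client. Note that a pinned source can only be fetched by someone with access to `snapshot_dir`, which is the reproducibility limitation described above.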
