Context
Source metadata is deprecated in favor of Origin. Origins are richer, live at indicator level, and are uploaded to Grapher via origins / origins_variables; legacy Sources use variables.sourceId and sources.
PR #5978 migrates all active snapshot DVC legacy source: / source_name: metadata referenced from dag/migrated.yml to origin: and simplifies those backported snapshot scripts so they no longer use SnapshotMeta, snap_config, or fill_from_backport_snapshot.
This issue tracks the remaining active-DAG Source → Origin migration work after #5978. Scope should stay on active DAG files first; repo-wide inactive snapshots are much larger and lower priority.
Current active-DAG status after #5978
Active legacy snapshot DVC files outside migrated snapshots: 73
Active metadata files with a plural sources: key: 35 (72 keys)
Active step code files with Source / .sources-style handling: 39
dag/migrated.yml legacy snapshot DVC files remaining: 0
Repo-wide snapshot DVC files with legacy source: / source_name: (mostly inactive): 6,519
Remaining active legacy snapshot DVC files
health.yml (19)
snapshots/fasttrack/2023-04-30/paratz.csv.dvc
snapshots/fasttrack/2023-05-31/cholera.csv.dvc
snapshots/fasttrack/2024-06-17/guinea_worm.csv.dvc
snapshots/health/2023-04-18/wgm_mental_health.zip.dvc
snapshots/health/2023-04-25/wgm_2018.xlsx.dvc
snapshots/health/2023-05-04/global_wellbeing.xlsx.dvc
snapshots/health/2023-08-22/unaids_deaths_averted_art.xlsx.dvc
snapshots/oecd/2018-03-11/road_deaths_and_injuries.feather.dvc
snapshots/oecd/2023-05-01/health_pharma_market.csv.dvc
snapshots/postnatal_care/2022-09-19/postnatal_care.csv.dvc
snapshots/unicef/2023-06-16/diarrhea.xlsx.dvc
snapshots/who/2022-09-01/autopsy.csv.dvc
snapshots/who/2023-04-03/flu_elderly.xlsx.dvc
snapshots/who/2023-04-03/flu_vaccine_policy.xlsx.dvc
snapshots/who/2023-06-29/guinea_worm.csv.dvc
snapshots/who/2023-07-14/standard_age_distribution.csv.dvc
snapshots/who/2025-08-01/guinea_worm.csv.dvc
snapshots/who/latest/fluid.csv.dvc
snapshots/who/latest/flunet.csv.dvc
environment.yml (10)
snapshots/unep/2023-03-17/consumption_controlled_substances.bromochloromethane.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.carbon_tetrachloride.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.chlorofluorocarbons.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.halons.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.hydrobromofluorocarbons.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.hydrochlorofluorocarbons.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.hydrofluorocarbons.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.methyl_bromide.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.methyl_chloroform.xlsx.dvc
snapshots/unep/2023-03-17/consumption_controlled_substances.other_fully_halogenated.xlsx.dvc
main.yml (10)
snapshots/fasttrack/2023-01-03/long_term_homicide_rates_in_europe.csv.dvc
snapshots/papers/2023-06-07/commodity_prices.xlsx.dvc
snapshots/research_development/2023-05-24/us_patents.htm.dvc
snapshots/technology/2023-03-08/microprocessor_trend.dat.dvc
snapshots/technology/2023-03-16/hcctad.txt.dvc
snapshots/un/2023-08-16/un_sdg.feather.dvc
snapshots/un/2023-08-16/un_sdg_dimension.json.dvc
snapshots/un/2023-08-16/un_sdg_unit.csv.dvc
snapshots/wb/2021-07-01/wb_income.xlsx.dvc
snapshots/wvs/2023-06-25/longitudinal_wvs.csv.dvc
fasttrack.yml (9)
snapshots/fasttrack/2023-06-19/world_population_comparison.csv.dvc
snapshots/fasttrack/2023-08-07/pain_hours_days_hen_systems.csv.dvc
snapshots/fasttrack/2023-10-05/great_pacific_garbage_lebreton.csv.dvc
snapshots/fasttrack/latest/baxter_2013_gbd_adult_coverage.csv.dvc
snapshots/fasttrack/latest/democracy_freedom_house.csv.dvc
snapshots/fasttrack/latest/global_maternal_offspring_loss.csv.dvc
snapshots/fasttrack/latest/treatment_gap_anxiety_disorders_world_mental_health_surveys.csv.dvc
snapshots/fasttrack/latest/under_five_mortality_lmics.csv.dvc
snapshots/fasttrack/latest/whm_treatment_gap_anxiety_disorders.csv.dvc
war.yml (7)
snapshots/war/2023-01-09/bouthoul_carrere_1978.csv.dvc
snapshots/war/2023-01-09/clodfelter_2017.csv.dvc
snapshots/war/2023-01-09/dunnigan_martel_1987.csv.dvc
snapshots/war/2023-01-09/eckhardt_1991.csv.dvc
snapshots/war/2023-01-09/kaye_1985.csv.dvc
snapshots/war/2023-01-09/sorokin_1937.csv.dvc
snapshots/war/2023-01-09/sutton_1971.csv.dvc
education.yml (5)
snapshots/education/2023-08-09/numeracy.xlsx.dvc
snapshots/education/2023-08-09/numeracy_gender.xlsx.dvc
snapshots/education/2023-08-09/years_of_education.xlsx.dvc
snapshots/education/2023-08-09/years_of_education_gender.xlsx.dvc
snapshots/education/2023-08-09/years_of_education_gini.xlsx.dvc
covid.yml (4)
snapshots/excess_mortality/latest/hmd_stmf.csv.dvc
snapshots/excess_mortality/latest/wmd.csv.dvc
snapshots/excess_mortality/latest/xm_karlinsky_kobak.csv.dvc
snapshots/excess_mortality/latest/xm_karlinsky_kobak_ages.csv.dvc
demography.yml (4)
snapshots/fasttrack/2023-06-19/world_population_comparison.csv.dvc
snapshots/hyde/2017/baseline.zip.dvc
snapshots/hyde/2017/general_files.zip.dvc
snapshots/un/2022-07-11/un_wpp.zip.dvc
artificial_intelligence.yml (2)
snapshots/artificial_intelligence/2023-07-07/semiconductors_cset.csv.dvc
snapshots/world_risk_poll/2023-06-26/wrp_2021.zip.dvc
emissions.yml (2)
snapshots/andrew/2019-12-03/co2_mitigation_curves_1p5celsius.csv.dvc
snapshots/andrew/2019-12-03/co2_mitigation_curves_2celsius.csv.dvc
agriculture.yml (1)
snapshots/usda_ers/2023-06-07/food_expenditure_since_2017.xlsx.dvc
biodiversity.yml (1)
snapshots/biodiversity/2021-01-01/habitat_loss.feather.dvc
Remaining active metadata files with sources:
health.yml (8)
etl/steps/data/garden/postnatal_care/2022-09-19/postnatal_care.meta.yml (1 sources: keys)
etl/steps/data/garden/who/2023-06-01/cholera.meta.yml (1 sources: keys)
etl/steps/data/garden/who/2023-06-29/guinea_worm_certification.meta.yml (1 sources: keys)
etl/steps/data/garden/who/2023-07-13/autopsy.meta.yml (1 sources: keys)
etl/steps/data/grapher/postnatal_care/2022-09-19/postnatal_care.meta.yml (1 sources: keys)
etl/steps/data/grapher/who/2023-07-13/autopsy.meta.yml (1 sources: keys)
etl/steps/data/meadow/postnatal_care/2022-09-19/postnatal_care.meta.yml (1 sources: keys)
etl/steps/data/meadow/unicef/2023-06-16/diarrhea.meta.yml (1 sources: keys)
migrated.yml (7)
etl/steps/data/garden/clio_infra/2017-09-09/clio_infra__biological_standards_of_living__baten_and_blum__2015.meta.yml (2 sources: keys)
etl/steps/data/garden/clio_infra/2017-09-09/clio_infra__human_capital.meta.yml (6 sources: keys)
etl/steps/data/garden/waste/2018-02-15/waste_production_and_management.meta.yml (6 sources: keys)
etl/steps/data/garden/worldbank_wdi/2017-11-14/world_bank_se4all_database__energy_efficiency.meta.yml (16 sources: keys)
etl/steps/data/grapher/biodiversity/2022/living_planet_index.meta.yml (2 sources: keys)
etl/steps/data/grapher/gapminder/2019-05-25/fertility_rate.meta.yml (1 sources: keys)
etl/steps/data/grapher/iucn/2022-12-08/threatened_and_evaluated_species.meta.yml (2 sources: keys)
war.yml (7)
etl/steps/data/garden/war/2023-01-18/bouthoul_carrere_1978.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/clodfelter_2017.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/dunnigan_martel_1987.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/eckhardt_1991.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/kaye_1985.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/sorokin_1937.meta.yml (1 sources: keys)
etl/steps/data/garden/war/2023-01-18/sutton_1971.meta.yml (1 sources: keys)
main.yml (6)
etl/steps/data/garden/gapminder/2023-03-31/population.meta.yml (1 sources: keys)
etl/steps/data/garden/homicide/2024-07-30/homicide_long_run_omm.meta.yml (1 sources: keys)
etl/steps/data/garden/un/2024-09-11/igme.meta.yml (1 sources: keys)
etl/steps/data/garden/wvs/2023-06-25/longitudinal_wvs.meta.yml (1 sources: keys)
etl/steps/data/grapher/un/2023-08-16/un_sdg.meta.yml (2 sources: keys)
etl/steps/data/grapher/worldbank_wdi/2024-05-20/wdi.meta.yml (9 sources: keys)
artificial_intelligence.yml (2)
etl/steps/data/garden/artificial_intelligence/2023-06-26/ai_wrp_2021.meta.yml (1 sources: keys)
etl/steps/data/garden/artificial_intelligence/2023-06-26/ai_wrp_2021_grouped.meta.yml (1 sources: keys)
demography.yml (2)
etl/steps/data/garden/demography/2023-06-27/world_population_comparison.meta.yml (1 sources: keys)
etl/steps/data/garden/gapminder/2023-03-31/population.meta.yml (1 sources: keys)
education.yml (1)
etl/steps/data/garden/education/2023-08-09/clio_infra_education.meta.yml (1 sources: keys)
emissions.yml (1)
etl/steps/data/garden/andrew/2019-12-03/co2_mitigation_curves.meta.yml (1 sources: keys)
environment.yml (1)
etl/steps/data/garden/unep/2023-03-17/consumption_controlled_substances.meta.yml (1 sources: keys)
wizard.yml (1)
etl/steps/data/garden/dummy/2023-10-12/dummy_monster.meta.yml (1 sources: keys)
Active code files that still touch Source/sources
These need separate review because some are compatibility paths/backport code rather than live dataset metadata.
war.yml (12)
etl/steps/data/garden/war/2023-09-21/cow.py
etl/steps/data/grapher/war/2023-09-21/brecke.py
etl/steps/data/grapher/war/2023-09-21/cow.py
etl/steps/data/grapher/war/2023-09-21/cow_mid.py
etl/steps/data/grapher/war/2023-09-21/mars.py
etl/steps/data/grapher/war/2023-09-21/mie.py
etl/steps/data/grapher/war/2023-09-21/prio_v31.py
etl/steps/data/grapher/war/2025-06-13/ucdp.py
etl/steps/data/grapher/war/latest/ucdp_preview.py
etl/steps/data/meadow/war/2023-01-10/kaye_1985.py
etl/steps/data/meadow/war/2023-01-10/sorokin_1937.py
etl/steps/data/meadow/war/2023-01-10/sutton_1971.py
health.yml (8)
etl/steps/data/garden/health/2023-04-18/wgm_mental_health.py
etl/steps/data/garden/maternal_mortality/2024-07-08/maternal_mortality.py
etl/steps/data/garden/oecd/2023-05-01/health_pharma_market.py
etl/steps/data/garden/owid/latest/covid.py
etl/steps/data/meadow/gapminder/2024-07-08/maternal_mortality.py
etl/steps/data/meadow/health/2026-01-19/unaids.py
etl/steps/data/meadow/who/2024-01-03/gho.py
etl/steps/data/meadow/who/latest/fluid.py
poverty_inequality.yml (3)
etl/steps/data/external/owid_grapher/latest/int_dollar_conversions.py
etl/steps/data/meadow/cedlas/2025-04-01/sedlac.py
etl/steps/data/meadow/igh/2024-07-05/better_data_homelessness.py
chartbook.yml (2)
etl/steps/data/meadow/cedlas/2024-07-31/sedlac_poverty_2016.py
etl/steps/data/meadow/cedlas/2024-07-31/sedlac_poverty_2018.py
demography.yml (2)
etl/steps/data/grapher/un/2022-07-11/un_wpp.py
etl/steps/data/open_numbers/open_numbers/latest/gapminder__systema_globalis.py
faostat.yml (2)
etl/steps/data/meadow/faostat/2025-03-17/faostat_metadata.py
etl/steps/data/meadow/faostat/2026-02-25/faostat_metadata.py
open_numbers.yml (2)
etl/steps/data/open_numbers/open_numbers/latest/gapminder__systema_globalis.py
etl/steps/data/open_numbers/open_numbers/latest/open_numbers__world_development_indicators.py
agriculture.yml (1)
etl/steps/data/meadow/agriculture/2024-05-23/harris_et_al_2015.py
biodiversity.yml (1)
etl/steps/data/meadow/biodiversity/2026-04-16/cherry_blossom.py
covid.yml (1)
etl/steps/data/garden/excess_mortality/latest/excess_mortality/__init__.py
emissions.yml (1)
etl/steps/data/meadow/emissions/2025-11-26/electricity_emission_factors.py
energy.yml (1)
etl/steps/data/meadow/uk_beis/2023-12-12/uk_historical_electricity.py
equality.yml (1)
etl/steps/data/garden/wb/2025-09-08/gender_statistics.py
growth.yml (1)
etl/steps/data/garden/maternal_mortality/2024-07-08/maternal_mortality.py
main.yml (1)
etl/steps/data/grapher/un/2023-08-16/un_sdg.py
migration.yml (1)
etl/steps/data/meadow/unicef/2026-01-07/child_migration.py
minerals.yml (1)
etl/steps/data/garden/usgs/2025-12-15/mineral_commodity_summaries.py
Suggested next steps
Finish dag/migrated.yml metadata YAML: convert the 7 remaining .meta.yml files with sources:. These are not snapshot DVCs; examples include Clio Infra, Waste, World Bank SE4ALL, and a few grapher files with empty sources: [] next to existing origins:.
Convert the 73 remaining active snapshot DVC files outside dag/migrated.yml using the pattern from #5978. Use the source metadata once as migration input, then hardcode origin: in the DVC file and simplify scripts that use backported config metadata.
Convert active .meta.yml variable-level sources: to variable-level origins:. For empty sources: [] with existing origins:, just remove the empty sources:.
Review active code files that still touch Source / .sources and separate true live metadata usage from compatibility/backport helpers.
Add a guardrail to prevent new legacy source: / sources: in active DAG files once the migration is complete.
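One possible shape for the guardrail in the last step, as a minimal sketch rather than the agreed design: a line-based check that flags indented legacy keys in YAML/DVC text. The names find_legacy_keys and check_files are hypothetical, and the sketch assumes the caller has already selected which files count as active.

```python
import re

# Indented legacy keys, mirroring the rg patterns used for this audit.
# `origin:` / `origins:` and fields such as `source_data_url:` do not match.
LEGACY_KEY = re.compile(r"^\s+(source|source_name|sources)\s*:")


def find_legacy_keys(text: str) -> list:
    """Return (line_number, key) pairs for legacy keys found in YAML/DVC text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = LEGACY_KEY.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


def check_files(contents_by_path: dict) -> list:
    """Return one human-readable message per legacy key across the given files."""
    return [
        f"{path}:{lineno}: legacy `{key}:` key"
        for path, text in sorted(contents_by_path.items())
        for lineno, key in find_legacy_keys(text)
    ]
```

A CI step could read the active files into a dict of path to text, call check_files, and fail when the returned list is non-empty. Because the check is line-based, it also flags an empty sources: [] next to existing origins:, which matches the cleanup described above.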
Notes from #5978
Do not infer date_published as simply the largest year in the metadata. Codex caught this: data/projection ranges like 1970–2050 or 2030-50 can otherwise become publication dates.
Safer inference used in #5978: prefer explicit publication_date / publication_year; otherwise look at citation-like source.published_by and source.name, skipping obvious range/projection years; use the snapshot version only as a last-resort fallback that is manually reviewed.
Useful Source → Origin mapping:
source.name → origin.title
source.published_by → origin.producer
source.description → origin.description
source.url → origin.url_main
source.source_data_url → origin.url_download
source.date_accessed → origin.date_accessed
source.publication_date / publication_year → origin.date_published
Schema requires date_published and citation_full for snapshot origins.
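The notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script from #5978: the function names, the regexes for year ranges, and the rule of accepting a citation year only when exactly one candidate remains are all hypothetical choices. The sketch does not build citation_full and does not use the snapshot version fallback; it returns no date so the file can be reviewed by hand.

```python
import re
from typing import Optional

# Direct field renames taken from the mapping listed above.
FIELD_MAP = {
    "name": "title",
    "published_by": "producer",
    "description": "description",
    "url": "url_main",
    "source_data_url": "url_download",
    "date_accessed": "date_accessed",
}

# Assumed patterns: ranges such as "1970–2050" or "2030-50", and single years.
YEAR_RANGE = re.compile(r"\b(?:1[5-9]|20)\d{2}\s*[-–]\s*\d{2,4}\b")
YEAR = re.compile(r"\b((?:1[5-9]|20)\d{2})\b")


def infer_date_published(source: dict) -> Optional[str]:
    """Prefer explicit fields, then a single unambiguous citation-like year."""
    explicit = source.get("publication_date") or source.get("publication_year")
    if explicit:
        return str(explicit)
    for field in ("published_by", "name"):
        # Drop data/projection ranges before looking for a publication year.
        text = YEAR_RANGE.sub(" ", str(source.get(field) or ""))
        candidates = set(YEAR.findall(text))
        if len(candidates) == 1:
            return candidates.pop()
        if candidates:
            return None  # several years left: ambiguous, leave for review
    return None


def source_to_origin(source: dict) -> dict:
    """Map a legacy source dict to origin fields; citation_full is not built."""
    origin = {new: source[old] for old, new in FIELD_MAP.items() if source.get(old)}
    date_published = infer_date_published(source)
    if date_published:
        origin["date_published"] = date_published
    return origin
```

Returning None on ambiguity is deliberately conservative: a missing date_published fails schema validation loudly, whereas a wrong year taken from a projection range would pass silently.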
Regenerate this audit
Use a small script over active dag/*.yml files to collect active DAG references. For broad repo searches:
rg -n '^\s+(source|source_name)\s*:' snapshots --glob '*.dvc'
rg -n '^\s+sources\s*:' etl/steps snapshots dag --glob '*.yml' --glob '*.dvc' --glob '!dag/archive/**'
For dag/migrated.yml specifically, #5978 adds scripts/migrate_migrated_sources_to_origins.py; after #5978 it should print only the TSV header, confirming no migrated snapshot DVC legacy sources remain.
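For the "small script over active dag/*.yml files" mentioned above, a rough sketch of the first half (collecting snapshot references) might look like this. It rests on two assumptions that should be checked against the repo: that DAG files reference snapshots as snapshot://<namespace>/<version>/<file> URIs, and that the matching DVC file lives at snapshots/<namespace>/<version>/<file>.dvc. The function name is hypothetical, and the text is scanned with a regex instead of a YAML parser to keep the example dependency-free.

```python
import re

# Assumption: DAG dependencies are written as `snapshot://<namespace>/<version>/<file>`.
SNAPSHOT_URI = re.compile(r"snapshot://([^\s\"']+)")


def snapshot_dvc_paths(dag_text: str) -> list:
    """Return sorted, de-duplicated DVC paths for snapshot URIs in a DAG file's text."""
    paths = set()
    for line in dag_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out steps
        for match in SNAPSHOT_URI.finditer(line):
            rel = match.group(1).rstrip(":")
            # Assumption: the DVC file sits at snapshots/<uri path>.dvc
            paths.add(f"snapshots/{rel}.dvc")
    return sorted(paths)
```

Running this over the text of each active DAG file (for example via pathlib.Path("dag").glob("*.yml"), excluding any archive directory) gives the set of DVC files to inspect for legacy source: / source_name: keys, grouped per DAG file as in the lists in this issue.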