What is the purpose of DAGSTER_HOME/history/runs/index.db? #33407
I'm working on some ops to clean up DAGSTER_HOME on a SQLite backend. One of the biggest storage hogs is DAGSTER_HOME/history/runs/index.db. I can't find anything about it in the documentation. Does anyone know what the purpose of this db is?
Replies: 1 comment
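`DAGSTER_HOME/history/runs/index.db` is a SQLite index database that Dagster maintains to make run-history queries fast without scanning individual run files.

### What it stores

When Dagster uses the default local SQLite storage backend, each run's event log is stored as a separate SQLite file under `DAGSTER_HOME/storage/{run_id}/`. The `index.db` keeps a centralized, queryable index of run-level metadata across all of those per-run files.

Without this index, Dagster would have to open every individual run file to answer a query like "give me all failed runs from last week", which is O(n) in the number of runs. With the index, the same question is answered by querying a single database instead of n separate files.

### Why it gets large

For long-lived Dagster deployments with many runs, `index.db` keeps growing as entries accumulate.

### How to clean it up safely

Dagster exposes a cleanup command specifically for this. Confirm the exact flags with `dagster run delete --help` on your version before running it:

```shell
# Purge run history older than 7 days
dagster run delete --older-than 7 --all
```

Or use the Python API for finer control:

```python
from datetime import datetime, timedelta

from dagster import DagsterInstance, DagsterRunStatus, RunsFilter

with DagsterInstance.get() as instance:
    # Select successful runs last updated more than 30 days ago
    cutoff = datetime.utcnow() - timedelta(days=30)
    runs = instance.get_runs(
        filters=RunsFilter(
            statuses=[DagsterRunStatus.SUCCESS],
            updated_before=cutoff,
        )
    )
    for run in runs:
        instance.delete_run(run.run_id)
```

After deleting runs, reclaim disk space by running `VACUUM` on the database:

```python
import sqlite3

conn = sqlite3.connect("path/to/DAGSTER_HOME/history/runs/index.db")
try:
    # Rebuilds the file and returns freed pages to the filesystem
    conn.execute("VACUUM")
finally:
    conn.close()
```

### Production note

In production deployments (PostgreSQL or MySQL backend), this file does not exist; the equivalent data lives in the database. If …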
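To check what is actually taking up space, and whether the `VACUUM` step above reclaimed anything, a small inspection helper can be handy. This is a minimal sketch that uses only Python's standard `sqlite3` and `os` modules. The function names `table_row_counts` and `vacuum_and_report` are made up for illustration and are not part of Dagster. It makes no assumptions about Dagster's table layout, and it assumes no Dagster process has the file open while `VACUUM` runs.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper: it reads sqlite_master, so it does not depend on
    any particular schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
            # Skip SQLite's own bookkeeping tables (e.g. sqlite_sequence)
            if not row[0].startswith("sqlite_")
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


def vacuum_and_report(db_path):
    """Run VACUUM on the file and return (bytes_before, bytes_after)."""
    before = os.path.getsize(db_path)
    # isolation_level=None selects autocommit mode; VACUUM cannot run
    # inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

Calling `table_row_counts("path/to/DAGSTER_HOME/history/runs/index.db")` before and after deleting runs shows which tables shrank, and `vacuum_and_report` on the same path shows how many bytes the rebuild gave back. Deleting rows alone usually leaves the file size unchanged, which is why the two steps are reported separately.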