Update documentation for metric reloads and incremental reloads #2874

13 changes: 13 additions & 0 deletions docs/statsig-warehouse-native/features/incremental-reloads.md
@@ -11,3 +11,16 @@ last_update:
Incremental reloads save state from the last load and resume loading from the latest data read, applying a small buffer to ensure completeness. The job wipes data since the last load, plus that buffer, and then appends all new data to the staging datasets before calculating results for changed days.

This is the recommended way to load active experiments, and is used for ongoing, daily loads -- especially when datasets are large.

## The Buffer Period

Incremental reloads include a built-in buffer that reprocesses the last few days of data. This buffer exists because:
- Many data sources have streaming data landing continuously, which can't be tracked perfectly
- Late-landing data may arrive after initial processing
- Reprocessing a small window ensures data completeness even when data pipelines are slightly inconsistent

## Repeated Incremental Loads

If an experiment is fully caught up and you run another incremental load, Statsig will still reprocess the last few days of data due to this buffer. This ensures that any late-arriving data is incorporated into your analysis.

For example, if you run one incremental load in the morning and another in the afternoon of the same day, the afternoon load will reprocess the buffer period to catch any data that landed between the two load times.
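The exact buffer length and windowing logic are internal to Statsig, but a minimal sketch of the idea, assuming a hypothetical three-day buffer and illustrative dates, looks like this:

```python
from datetime import date, timedelta

# Hypothetical sketch only: the real buffer length and windowing logic are
# internal to Statsig. BUFFER_DAYS below is an illustrative assumption.
BUFFER_DAYS = 3


def incremental_window(last_loaded: date, latest_available: date,
                       buffer_days: int = BUFFER_DAYS) -> tuple:
    """Return the (start, end) date range an incremental load would restate.

    Data from `start` onward is wiped from staging and re-appended, so any
    rows that landed late inside the buffer are picked up on the next run.
    """
    start = last_loaded - timedelta(days=buffer_days)
    return start, latest_available


# Morning load: previously loaded through April 10, data now available through April 12.
print(incremental_window(date(2024, 4, 10), date(2024, 4, 12)))
# (datetime.date(2024, 4, 7), datetime.date(2024, 4, 12))

# Afternoon load the same day: the experiment is "caught up", but the buffer
# period is reprocessed again to catch anything that landed since the morning.
print(incremental_window(date(2024, 4, 12), date(2024, 4, 12)))
# (datetime.date(2024, 4, 9), datetime.date(2024, 4, 12))
```

Only results for days inside that window are recalculated, which is what keeps repeated incremental loads cheap relative to a full reload.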
20 changes: 20 additions & 0 deletions docs/statsig-warehouse-native/features/metric-reloads.md
@@ -11,3 +11,23 @@ last_update:
Metric reloads drop all data from staging pipelines associated with a metric, and restate that data from scratch. Where this data is interconnected (e.g. ratios or funnels), related entities will be updated as well.

This is a big time saver in cases where a new metric needs to be added to an experiment, or a metric definition has changed, since you can avoid reloading N unrelated experiment metrics.

## How Metric Reloads Work

Metric reloads:
- Drop all of the experiment's data for that metric across the entire experiment time period
- Restate that data from scratch
- Use the date range [experiment start, latest incremental/full load date loaded]

These are essentially full reloads scoped to a specific metric; they are the right tool when a metric definition or its underlying data has changed meaningfully.
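To make the scoping concrete, here is a minimal sketch with illustrative dates and names; it mirrors the steps above rather than Statsig's actual implementation:

```python
from datetime import date

# Illustrative sketch only; names and structure are assumptions, not
# Statsig's actual implementation.


def metric_reload_range(experiment_start: date, latest_loaded: date):
    """A metric reload restates the full range [experiment start, latest loaded date]."""
    return experiment_start, latest_loaded


def reload_metric(metric: str, experiment_start: date, latest_loaded: date) -> None:
    start, end = metric_reload_range(experiment_start, latest_loaded)
    # Step 1: drop all staging data for this metric over the full range.
    print(f"DROP  {metric}: {start} .. {end}")
    # Step 2: restate the same range from scratch; interconnected entities
    # (e.g. ratios and funnels built on this metric) are updated as well.
    print(f"WRITE {metric}: {start} .. {end}")


# Experiment started March 1; the last incremental load covered through April 12.
reload_metric("checkout_conversion", date(2024, 3, 1), date(2024, 4, 12))
```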

## Common Use Cases

- Adding a new metric to an existing experiment
- Updating a metric definition
- Fixing data quality issues for a specific metric
- Re-analyzing a subset of metrics without reloading everything

## Limitations

Currently, metric reloads are not available on an incremental basis. Each metric reload will perform a complete restatement of the data for that metric. This is by design to avoid complex data integrity issues that could arise if metrics were at different stages of processing.
18 changes: 18 additions & 0 deletions docs/statsig-warehouse-native/features/reloads.md
@@ -38,3 +38,21 @@ To add to this, Statsig offers turbo mode, which skips some enrichment calculati
### Cleaning Up Storage

Statsig will automatically clean up after itself for explore datasets, power analyses, and stratification artifacts. Once you make a decision on an experiment, you can choose whether or not to delete the staging datasets and/or the result datasets; you can always come back to the experiment to clean up from the experiment menu as well.

### Handling Metrics with Different Processing Times

In many data environments, some metrics become available earlier than others. For example, simple event-based metrics might be processed by 9 AM, while complex aggregations or "core" metrics might not be ready until 2 PM.

This can create challenges when you want to process experiment results incrementally, as all metrics must be available for the experiment analysis logic to run.

#### Potential Approaches

1. **Wait for all metrics**: The standard approach is to wait until all metrics are available before running an incremental load. This ensures data consistency but may delay analysis.

2. **Separate processing for core metrics**: For metrics that consistently lag behind, you could:
- Run one incremental load for the main set of metrics
- Use a targeted metric reload for just the core metrics that finish later

This approach needs to be used carefully: having metrics at different processing stages can lead to confusing analysis states.
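A minimal orchestration sketch of this second approach follows. Everything in it is hypothetical glue code: the readiness checks and the `run_incremental_load` / `run_metric_reload` triggers are stand-ins for your own pipeline sensors and for however you kick off Statsig loads (a scheduler, the Console, etc.), not real Statsig APIs.

```python
import time
from datetime import date

# Illustrative orchestration sketch only; all functions below are placeholders.


def main_metrics_landed(day: date) -> bool:
    """Placeholder readiness check for the fast-landing (~9 AM) metrics."""
    return True


def core_metrics_landed(day: date) -> bool:
    """Placeholder readiness check for the slower "core" (~2 PM) metrics."""
    return True


def run_incremental_load(experiment: str) -> None:
    print(f"Triggering incremental load for {experiment}")  # placeholder trigger


def run_metric_reload(experiment: str, metric: str) -> None:
    print(f"Triggering metric reload for {metric} on {experiment}")  # placeholder trigger


def wait_until(ready_check, poll_seconds: int = 300) -> None:
    """Poll an upstream readiness check until it passes."""
    while not ready_check():
        time.sleep(poll_seconds)


def daily_experiment_load(day: date) -> None:
    # 1. Once the fast-landing metrics are ready, run the normal incremental load.
    wait_until(lambda: main_metrics_landed(day))
    run_incremental_load("my_experiment")

    # 2. When the lagging core metrics finish, run a targeted metric reload for
    #    just those metrics. Note each reload restates the metric over the full
    #    experiment date range, not only the newest day.
    wait_until(lambda: core_metrics_landed(day))
    for metric in ["core_revenue", "core_retention"]:
        run_metric_reload("my_experiment", metric)


daily_experiment_load(date.today())
```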

Statsig currently doesn't support incremental reloads for individual metrics to avoid complex data sync issues, but this is a potential future enhancement.