An Airflow plugin for clearing recently failed task instances with only a few clicks
The Big Red Button plugin provides a web interface and REST API for viewing and clearing recently failed task instances across your Airflow DAGs. Perfect for those moments when you need to quickly recover from cascading failures or retry multiple tasks at once.
- Bulk Clearing — Clear all recently failed and upstream-failed task instances across multiple DAGs with a single click
- Tag-Based Filtering — Filter DAGs by tags to selectively clear failures for specific groups of workflows
- Time Window Selection — Choose from 1 hour, 12 hours, 1 day, or 7 days
- Two-Step Confirmation — Every clearing operation requires explicit confirmation
- Audit Logging — All clearing operations are logged to Airflow's audit log
- REST API — Programmatic access for automation and integrations
- RBAC Integration — All endpoints require authentication; admin endpoints additionally require DAG task instance edit permission
- Apache Airflow 3.1+
- Python 3.9+
- Node.js 18+ (for building the UI)
- Download the latest release:
# Download from GitHub Releases (replace <version> with the tag from the Releases page, e.g. 3.1.6)
curl -L https://github.com/slackhq/bigredbutton/releases/latest/download/big_red_button-<version>.tar.gz -o big_red_button.tar.gz
Or visit the Releases page and download the latest .tar.gz.
- Extract to your Airflow plugins directory:
tar -xzf big_red_button.tar.gz -C $AIRFLOW_HOME/plugins/
- Restart the Airflow API server (it replaces the webserver in Airflow 3):
airflow api-server
- Access the plugin:
Navigate to your Airflow UI and look for:
- "Big Red Button" in the Admin menu (tag-filtered view)
- "Big Red Button: Admin" in the Admin menu (unrestricted view)
For Airflow 2.x installations, use the 2.10.2 tag:
curl -sL "https://github.com/slackhq/bigredbutton/archive/refs/tags/2.10.2.tar.gz" \
| tar -xz --strip-components=2 -C $AIRFLOW_HOME/plugins "bigredbutton-2.10.2/plugins/big_red_button"
- Navigate to "Big Red Button" in the Airflow UI
- Select one or more tags (required)
- Choose a time window
- View failure counts grouped by DAG
- Click "Clear" on a specific DAG or "Clear All Failed DAGs"
- Confirm the operation
Route: /big-red-button
- Navigate to "Big Red Button: Admin" in the Airflow UI
- Choose a time window
- View all failures across all DAGs (tags are optional filters)
- Clear individual DAGs or all failures at once
Route: /big-red-button-admin
All API endpoints require authentication (JWT cookie or Bearer token). Admin endpoints additionally require DAG task instance edit permission (PUT on TASK_INSTANCE). Set BRB_AUTH_ENABLED=false to disable auth enforcement for deployments behind an external auth proxy.
All endpoints are mounted under /big-red-button.
| Method | Path | Description |
|---|---|---|
| GET | /api/failures?clear_window=1_hour&tags=my_tag | Get failures (tags required, dag_id optional) |
| GET | /api/tags | List all DAG tags (optional selected param marks active tags) |
| POST | /api/clear | Clear failures (tags or dag_id required) |
| Method | Path | Description |
|---|---|---|
| GET | /api/admin/failures?clear_window=1_hour | Get all failures (tags and dag_id optional) |
| POST | /api/admin/clear | Clear failures (no tag/dag_id required; can clear all) |
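The endpoints in the tables above can also be called from Python. The following is a minimal sketch that uses only the standard library and builds the requests without sending them. The base URL, the bearer token value, and the helper names `failures_url` and `build_clear_request` are assumptions for illustration; the payload fields (`clear_window`, `tags_filter`, `dag_id`) are the ones used elsewhere in this README.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Sequence

# Assumed base URL; all endpoints are mounted under /big-red-button.
BASE_URL = "http://localhost:8080/big-red-button"


def failures_url(clear_window: str, tag: Optional[str] = None, admin: bool = False) -> str:
    """Build the GET URL for a failures endpoint (hypothetical helper)."""
    path = "/api/admin/failures" if admin else "/api/failures"
    params = {"clear_window": clear_window}
    if tag is not None:
        params["tags"] = tag
    return f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"


def build_clear_request(
    clear_window: str,
    token: str,
    tags_filter: Optional[Sequence[str]] = None,
    dag_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a POST to the tag-filtered /api/clear endpoint."""
    if not tags_filter and not dag_id:
        # The non-admin endpoint requires tags or a dag_id.
        raise ValueError("tags_filter or dag_id is required")
    payload = {"clear_window": clear_window}
    if tags_filter:
        payload["tags_filter"] = list(tags_filter)
    if dag_id:
        payload["dag_id"] = dag_id
    return urllib.request.Request(
        f"{BASE_URL}/api/clear",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumes Bearer-token auth; a JWT cookie is the other documented option.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keep in mind that a POST to a clear endpoint performs a real clearing operation, so it is worth inspecting the failures endpoint first.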
Get failures:
curl "http://localhost:8080/big-red-button/api/admin/failures?clear_window=1_hour"
Clear failures by tag:
curl -X POST "http://localhost:8080/big-red-button/api/clear" \
-H "Content-Type: application/json" \
-d '{"clear_window": "1_hour", "tags_filter": ["my_team"]}'
Clear a specific DAG:
curl -X POST "http://localhost:8080/big-red-button/api/clear" \
-H "Content-Type: application/json" \
-d '{"clear_window": "1_hour", "dag_id": "etl_pipeline"}'
| Variable | Default | Description |
|---|---|---|
| BRB_AUTH_ENABLED | true | Enable JWT-based authentication on API endpoints. Set to false for deployments where Airflow runs behind an external auth proxy (e.g., with no_auth auth manager). When disabled, all endpoints are open with no permission checks and audit logs record the user as "anonymous". |
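To illustrate the toggle described above, a boolean flag like this is commonly read from the environment with a default of enabled. The sketch below is an assumption about how such parsing could look, not the plugin's actual code; the helper name `auth_enabled` and the set of accepted "off" strings are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical set of values treated as "disabled"; the README only documents "false".
_FALSY = {"false", "0", "no", "off"}


def auth_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True unless BRB_AUTH_ENABLED is set to a falsy string."""
    raw = env.get("BRB_AUTH_ENABLED", "true")
    return raw.strip().lower() not in _FALSY
```

Defaulting to enabled means that a missing or misspelled value never silently opens the endpoints.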
The plugin uses the following default settings (defined in big_red_button.py):
clear_windows = {
"1_hour": timedelta(hours=1),
"12_hours": timedelta(hours=12),
"1_day": timedelta(days=1),
"7_days": timedelta(days=7),
}
PAGE_SIZE = 200  # Tasks cleared per batch
- Python 3.9+
- Node.js 18+
# Python setup
make setup
# UI setup and build
make ui-setup
make ui-build
# Run tests
make test
# Start UI dev server (hot reload, proxies to Airflow)
make ui-dev
| Target | Description |
|---|---|
| make setup | Create venv and install Python dependencies |
| make test | Run tests |
| make test-verbose | Run tests with verbose output |
| make test-coverage | Run tests with coverage report |
| make lint | Run ruff linter |
| make lint-fix | Auto-fix lint issues |
| make format | Format code with ruff |
| make ui-setup | Install UI dependencies |
| make ui-build | Build UI bundle for production |
| make ui-dev | Start UI dev server with hot reload |
| make clean | Remove venv, node_modules, and build artifacts |
bigredbutton/
├── plugins/
│   └── big_red_button/
│       ├── big_red_button.py    # Core backend logic and plugin registration
│       ├── api.py               # FastAPI REST API
│       ├── auth.py              # Authentication and RBAC dependencies
│       ├── static/              # Built UI bundle (generated by ui-build)
│       └── ui/                  # React frontend source
│           ├── src/
│           │   ├── main.tsx
│           │   ├── App.tsx
│           │   ├── api.ts
│           │   └── styles.css
│           ├── package.json
│           └── vite.config.ts
├── tests/
│   ├── conftest.py
│   └── test_big_red_button.py
├── requirements.txt
├── requirements-dev.txt
└── Makefile
- Query: Finds DAG runs with at least one failed task within the time window (using TaskInstance.last_heartbeat_at), then collects all failed and upstream-failed tasks from those runs
- Filter: Optionally filters by DAG tags or specific DAG ID (only active, non-paused DAGs are included in tag-based filtering)
- Group: Groups failures by DAG for visualization
- Clear: Uses Airflow's built-in clear_task_instances() in batches of 200
- Log: Records the operation to Airflow's audit log with the authenticated user identity
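The query, group, and batch steps above can be sketched in plain Python without a database. This is a simplified model under stated assumptions, not the plugin's implementation: `FailedTask` is a hypothetical stand-in for an Airflow TaskInstance row, and `select_failures`, `group_by_dag`, and `in_batches` are illustrative names. The window table and batch size mirror the defaults listed earlier.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator, List, Optional, Sequence

CLEAR_WINDOWS = {
    "1_hour": timedelta(hours=1),
    "12_hours": timedelta(hours=12),
    "1_day": timedelta(days=1),
    "7_days": timedelta(days=7),
}
PAGE_SIZE = 200  # Tasks cleared per batch


@dataclass(frozen=True)
class FailedTask:
    """Hypothetical stand-in for an Airflow TaskInstance row."""
    dag_id: str
    run_id: str
    task_id: str
    state: str  # "failed" or "upstream_failed"
    last_heartbeat_at: Optional[datetime]


def select_failures(tasks: Sequence[FailedTask], clear_window: str, now: datetime) -> List[FailedTask]:
    """Query step: pick runs with a recent failure, then collect their failed tasks."""
    cutoff = now - CLEAR_WINDOWS[clear_window]
    recent_runs = {
        (t.dag_id, t.run_id)
        for t in tasks
        if t.state == "failed"
        and t.last_heartbeat_at is not None
        and t.last_heartbeat_at >= cutoff
    }
    return [
        t for t in tasks
        if (t.dag_id, t.run_id) in recent_runs
        and t.state in ("failed", "upstream_failed")
    ]


def group_by_dag(tasks: Sequence[FailedTask]) -> Dict[str, List[FailedTask]]:
    """Group step: bucket failures by DAG for display."""
    grouped: Dict[str, List[FailedTask]] = defaultdict(list)
    for t in tasks:
        grouped[t.dag_id].append(t)
    return dict(grouped)


def in_batches(items: Sequence, size: int = PAGE_SIZE) -> Iterator[Sequence]:
    """Clear step: yield slices of at most `size` items to clear one batch at a time."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    tasks = [
        FailedTask("etl_pipeline", "run_1", "load", "failed", now - timedelta(minutes=30)),
        FailedTask("etl_pipeline", "run_1", "report", "upstream_failed", None),
    ]
    for dag_id, failures in group_by_dag(select_failures(tasks, "1_hour", now)).items():
        print(dag_id, [t.task_id for t in failures])
```

Selecting runs first and then collecting their tasks is what lets upstream-failed tasks, which may never have sent a heartbeat, be picked up alongside the failure that caused them.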