|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +This is the nf-core stats dashboard project. It consists of two main components: |
| 8 | +1. **Evidence.dev frontend** - A data visualization dashboard for nf-core statistics |
| 9 | +2. **DLT data pipelines** - Python pipelines that collect data from the GitHub and Slack APIs |
| 10 | + |
| 11 | +## Build Commands |
| 12 | + |
| 13 | +### Frontend (Evidence.dev) |
| 14 | +```bash |
| 15 | +npm install # Install dependencies |
| 16 | +npm run sources # Refresh data sources |
| 17 | +npm run dev # Start development server |
| 18 | +npm run build # Build for production |
| 19 | +npm run test # Run tests (builds the project) |
| 20 | +``` |
| 21 | + |
| 22 | +### Data Pipelines (Python/DLT) |
| 23 | +```bash |
| 24 | +cd pipeline |
| 25 | +uv run python github_pipeline.py # Run GitHub stats collection |
| 26 | +uv run python slack_pipeline.py # Run Slack stats collection |
| 27 | +``` |
| 28 | + |
| 29 | +## Architecture |
| 30 | + |
| 31 | +### Data Flow |
| 32 | +1. **Data Collection**: Python DLT pipelines (`pipeline/`) fetch data from external APIs (GitHub, Slack) |
| 33 | +2. **Data Storage**: Data is stored in the MotherDuck (cloud DuckDB) database `nf_core_stats_bot` |
| 34 | +3. **Data Visualization**: Evidence.dev reads from MotherDuck and renders interactive dashboards |
| 35 | + |
| 36 | +### Key Directories |
| 37 | +- `pipeline/` - DLT data pipelines for collecting stats |
| 38 | +- `pages/` - Evidence.dev markdown pages with SQL queries and visualizations |
| 39 | +- `sources/` - Database connection configurations |
| 40 | +- `.github/workflows/` - GitHub Actions for daily pipeline runs and Netlify builds |
| 41 | + |
| 42 | +### Database Schema |
| 43 | +The MotherDuck database contains tables for: |
| 44 | +- `github_traffic_stats` - Repository views and clones |
| 45 | +- `github_contributor_stats` - Contributor activity by week |
| 46 | +- `github_issue_stats` - Issues and pull requests |
| 47 | +- `slack_messages` - Slack channel message counts |
| 48 | +- `slack_members` - Slack member statistics |
| 49 | + |
| 50 | +### Environment Variables |
| 51 | +Required secrets for pipelines (set in GitHub Actions or local `.env`): |
| 52 | +- `SOURCES__GITHUB_PIPELINE__GITHUB__API_TOKEN` - GitHub personal access token |
| 53 | +- `SOURCES__SLACK_PIPELINE__SLACK__API_TOKEN` - Slack user token |
| 54 | +- `DESTINATION__MOTHERDUCK__CREDENTIALS__DATABASE` - MotherDuck database name |
| 55 | +- `DESTINATION__MOTHERDUCK__CREDENTIALS__PASSWORD` - MotherDuck token |
| 56 | + |
| 57 | +## Development Notes |
| 58 | + |
| 59 | +- Evidence pages use SQL queries embedded in markdown to fetch data |
| 60 | +- The GitHub pipeline uses incremental loading with a merge strategy, so existing records are updated rather than duplicated |
| 61 | +- Pipelines run daily via GitHub Actions and are monitored with runitor |
| 62 | +- The frontend is deployed to Netlify and rebuilt daily after pipeline runs |
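
The secrets listed under "Environment Variables" could be checked before a pipeline run with a small preflight script. The following is a minimal sketch, not part of the repository: the helper name `missing_secrets` is hypothetical, and it assumes the variables are already exported in the process environment (it does not read a `.env` file itself).

```python
import os

# Variable names copied from the "Environment Variables" list in CLAUDE.md.
REQUIRED_SECRETS = (
    "SOURCES__GITHUB_PIPELINE__GITHUB__API_TOKEN",
    "SOURCES__SLACK_PIPELINE__SLACK__API_TOKEN",
    "DESTINATION__MOTHERDUCK__CREDENTIALS__DATABASE",
    "DESTINATION__MOTHERDUCK__CREDENTIALS__PASSWORD",
)


def missing_secrets(env=None):
    """Return required variable names that are unset or empty.

    Hypothetical helper for illustration. `env` defaults to os.environ;
    any mapping can be passed in instead, which keeps the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

Calling `missing_secrets()` before starting `github_pipeline.py` or `slack_pipeline.py` returns an empty list when all four variables are set, and otherwise names the ones that still need a value.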