Overview
To help LLM agents (like Claude Code) better understand and work with the nf-core stats data, we should document the database table schemas in a machine-readable format.
Current situation
- LLM agents working with this project need to understand the data structure
- Evidence.dev pages reference tables like github_traffic_stats, github_contributor_stats, etc.
- SQL queries in the sources/nfcore_db/ and queries/ directories reference various table schemas
- No centralized schema documentation exists for automated tools
Proposed solution
Create comprehensive table schema documentation that includes:
1. Schema documentation file
- Create docs/database-schemas.md or similar
- Document all main tables with column descriptions, types, and sample data
- Include relationships between tables
- Add notes about data collection frequency and sources
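As a rough sketch of how the column tables in such a file could be generated rather than written by hand, the snippet below introspects a table and renders its columns as a markdown table. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; the real database, its introspection query, and the column names shown here (pipeline_name, timestamp, views, clones) are assumptions for illustration, not the actual schema.

```python
import sqlite3


def markdown_schema(conn: sqlite3.Connection, table: str) -> str:
    """Render one table's columns as a markdown table."""
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [
        f"### {table}",
        "",
        "| Column | Type | Nullable |",
        "| --- | --- | --- |",
    ]
    for _cid, name, col_type, notnull, _default, _pk in rows:
        nullable = "no" if notnull else "yes"
        lines.append(f"| {name} | {col_type} | {nullable} |")
    return "\n".join(lines)


# Stand-in database with HYPOTHETICAL columns; the real table may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE github_traffic_stats ("
    "pipeline_name TEXT NOT NULL, "
    "timestamp TEXT NOT NULL, "
    "views INTEGER, "
    "clones INTEGER)"
)

doc = markdown_schema(conn, "github_traffic_stats")
print(doc)
```

Column descriptions, sample rows, and collection-frequency notes would still need to be added by hand or from another source, since they are not stored in the database catalog.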
2. Key tables to document
From the existing SQL queries and pipeline code, prioritize:
- github_traffic_stats (repository views/clones)
- github_contributor_stats (contributor activity by week)
- github_issue_stats (issues and pull requests)
- nfcore_pipelines (repository metadata)
- slack_messages (Slack channel activity)
- slack_members (Slack membership stats)
- org_members (GitHub organization members)
3. Machine-readable format considerations
- Use consistent markdown tables
- Include JSON schema definitions if helpful
- Consider adding dlt schema exports
- Make it easy for LLMs to parse and understand
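To make the JSON schema option more concrete, here is a minimal sketch that turns a plain column description into a JSON Schema document using only the standard library. The helper name to_json_schema and the columns listed for github_contributor_stats (author, week, commits) are hypothetical and only illustrate the shape; the real columns would come from the database or from the dlt schema.

```python
import json

# HYPOTHETICAL column descriptions: name -> (JSON type, description).
CONTRIBUTOR_COLUMNS = {
    "author": ("string", "GitHub login of the contributor"),
    "week": ("string", "Start of the week the activity belongs to"),
    "commits": ("integer", "Number of commits in that week"),
}


def to_json_schema(table: str, columns: dict) -> dict:
    """Build a JSON Schema object describing one row of the given table."""
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": table,
        "type": "object",
        "properties": {
            name: {"type": json_type, "description": description}
            for name, (json_type, description) in columns.items()
        },
    }


schema = to_json_schema("github_contributor_stats", CONTRIBUTOR_COLUMNS)
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Keeping the descriptions in one structure like this would allow the markdown tables and the JSON definitions to be generated from the same source, so they cannot drift apart.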
Benefits
- LLM agents can write better SQL queries
- Faster development when creating new visualizations
- Better understanding of available data for new features
- Improved onboarding for developers
- Self-documenting codebase
Acceptance criteria
This will significantly improve the ability of Claude Code and other LLM agents to understand and work with the nf-core stats data effectively.