generated from dataforgoodfr/d4g-project-template
Improve Project Documentation for Better Contributor Onboarding
Problem
Our documentation is fragmented and lacks a unified structure, making it difficult for new and existing contributors to understand how components interact. This leads to wasted time and stalled progress.
Key Issues:
- Minimal root README with no complete system overview
- Disconnected submodule documentation (rag_system, src, pipeline_scripts, taxonomy)
- No architecture diagram showing how modules interact
- Missing standardized onboarding guide for setup, testing, and deployment
Proposed Solution
1. Create Centralized Documentation Structure
Add a /docs folder as the single source of truth:
docs/
├── index.md # Project introduction and quick links
├── architecture/
│ ├── overview.md # High-level architecture + component diagram
│ ├── data-flow.md # How data moves through the system
│ └── components.md # Detailed component descriptions
├── getting-started/
│ ├── installation.md # Environment setup (poetry/uv)
│ ├── quick-start.md # Run your first pipeline
│ └── configuration.md # Settings and environment variables
├── guides/
│ ├── pdf-extraction.md # Using the PDF extraction module
│ ├── rag-system.md # Working with kotaemon RAG
│ ├── ingestion-pipeline.md # Running ingestion workflows
│ ├── taxonomy.md # Understanding taxonomies
│ └── scraping.md # OpenAlex data extraction
├── development/
│ ├── setup.md # Dev environment setup
│ ├── contributing.md # Contribution guidelines
│ ├── testing.md # Running tests
│ └── project-structure.md # Directory layout explanation
├── deployment/
│ ├── docker.md # Docker setup and compose
│ └── production.md # Production considerations
└── reference/
    ├── api.md # Code API reference
    └── scripts.md # Available scripts and their usage
2. Add Architecture Diagram
Create a visual data flow diagram (Mermaid or image) showing:
- OpenAlex scraping → PDF extraction → Taxonomy classification → RAG ingestion
- How src/, rag_system/, pipeline_scripts/, and taxonomy/ interact
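If Mermaid is chosen, the diagram source could be generated from the stage list rather than maintained by hand. The sketch below covers only the four pipeline stages named above; the node ids (`s0`, `s1`, ...) and the `to_mermaid` helper are hypothetical, and the mapping of each stage to a directory is deliberately left out because it still has to be confirmed.

```python
# Stage order taken from the list above; node ids are arbitrary placeholders.
STAGES = [
    "OpenAlex scraping",
    "PDF extraction",
    "Taxonomy classification",
    "RAG ingestion",
]


def to_mermaid(stages: list[str]) -> str:
    """Render an ordered list of stages as a left-to-right Mermaid flowchart."""
    lines = ["flowchart LR"]
    # Declare one labelled node per stage.
    for i, label in enumerate(stages):
        lines.append(f'    s{i}["{label}"]')
    # Connect each stage to the one that follows it.
    for i in range(len(stages) - 1):
        lines.append(f"    s{i} --> s{i + 1}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(STAGES))
```

The printed text can be pasted into a `mermaid` code fence in docs/architecture/overview.md or in the root README.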
3. Update Root README
Make it a lightweight entry point with:
- Project overview and mission
- Architecture diagram
- Quick start commands
- Links to full documentation in /docs
4. Standardize Subproject READMEs
Each main directory (src/, rag_system/, pipeline_scripts/, taxonomy/) should include:
- Purpose and scope
- Key files/scripts
- Connection to other modules
- Usage examples
- Link back to /docs for detailed documentation
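To keep the subproject READMEs consistent over time, a small check could run in CI. The following is a minimal sketch under stated assumptions: the heading names in `REQUIRED_HEADINGS` are hypothetical stand-ins for the five items listed above, and `missing_sections` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# Directories named in the issue.
SUBPROJECTS = ["src", "rag_system", "pipeline_scripts", "taxonomy"]

# Hypothetical heading names, one per required item; adjust to the agreed template.
REQUIRED_HEADINGS = ["Purpose", "Key files", "Connections", "Usage", "Documentation"]


def missing_sections(repo_root: Path) -> dict[str, list[str]]:
    """Map each subproject to the required headings its README lacks."""
    problems = {}
    for name in SUBPROJECTS:
        readme = repo_root / name / "README.md"
        if not readme.is_file():
            problems[name] = ["README.md"]  # the whole file is missing
            continue
        # Collect the text of every markdown heading, lowercased for comparison.
        headings = {
            line.lstrip("#").strip().lower()
            for line in readme.read_text(encoding="utf-8").splitlines()
            if line.startswith("#")
        }
        missing = [h for h in REQUIRED_HEADINGS if h.lower() not in headings]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    for project, gaps in missing_sections(Path.cwd()).items():
        print(f"{project}: missing {', '.join(gaps)}")
```

An empty result means every subproject README contains all the agreed sections.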
Expected Outcome
✅ Clear, unified documentation structure
✅ Faster contributor onboarding
✅ Easier project navigation and maintenance
✅ Better understanding of system architecture and data flow
Affected Areas
- Root README.md
- New /docs folder structure
- src/, rag_system/, pipeline_scripts/, taxonomy/ READMEs