Thanks for considering contributing to build-kg! This project turns any topic into a structured knowledge graph in your own PostgreSQL database, and there are many ways to help -- from adding domain profiles to improving the pipeline itself.
| Contribution | Difficulty | Impact |
|---|---|---|
| Add a domain profile for your industry | Low | High -- unlocks a new domain with pre-built ontology |
| Add ID extraction patterns for a jurisdiction | Low | Medium -- improves provision ID quality |
| Fix a bug | Varies | High |
| Improve documentation | Low | Medium |
| Add a new jurisdiction | Low | Medium -- expands country support |
| Improve chunking or parsing | Medium-High | High -- better graph quality |
| Improve ontology generation | Medium | High -- better auto-generated graph structures |
To get set up for development:

```bash
# 1. Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/build-kg.git
cd build-kg

# 2. Install all dependencies (venv, packages, browser)
make setup

# 3. Start the database
docker compose -f db/docker-compose.yml up -d

# 4. Configure environment
cp .env.example .env
# Edit .env with your settings

# 5. Initialize the graph
python -m build_kg.setup_graph

# 6. Run tests to verify
pytest tests/ -v
```

The repository is organized as follows:

```
build-kg/
├── src/build_kg/                     # Main package
│   ├── config.py                     # Configuration from .env
│   ├── crawl.py                      # Web crawler (Crawl4AI)
│   ├── chunk.py                      # Document chunker (Unstructured)
│   ├── load.py                       # Database loader
│   ├── parse.py                      # Sync parser (Anthropic or OpenAI)
│   ├── parse_batch.py                # Batch parser (Batch API)
│   ├── setup_graph.py                # AGE graph setup
│   ├── verify.py                     # Setup verification
│   ├── id_extractors.py              # Regex-based ID extraction
│   ├── domain.py                     # Domain profile system
│   └── domains/                      # YAML domain profiles
│       ├── default.yaml
│       ├── food-safety.yaml
│       ├── financial-aml.yaml
│       └── data-privacy.yaml
├── db/                               # Database setup
├── docs/                             # Documentation
├── examples/                         # Example manifests and data
├── tests/                            # Test suite
├── AGENTS.md                         # OpenAI Codex instructions
├── .claude/skills/                   # Claude Code / Kiro / Qoder / Antigravity skill
│   └── build-kg/
│       └── SKILL.md                  # /build-kg skill definition (Agent Skills standard)
├── .github/copilot-instructions.md   # GitHub Copilot instructions
├── .cursor/rules/build-kg.mdc        # Cursor rules
└── .windsurf/rules/build-kg.md       # Windsurf rules
```
We use Ruff for linting:

```bash
# Check for issues
ruff check src/ tests/

# Auto-fix what's possible
ruff check --fix src/ tests/
```

- Line length: 120 characters
- Rules: E (pycodestyle errors), F (pyflakes), I (isort), W (pycodestyle warnings)
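For reference, the settings above could be expressed in `pyproject.toml` roughly as follows. This is a sketch only -- the repository's actual configuration file and table layout may differ:

```toml
# Hypothetical pyproject.toml excerpt mirroring the lint rules described above.
[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F", "I", "W"]
```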
```bash
# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_id_extractors.py -v

# Run domain profile tests
pytest tests/test_domain.py -v
```

Tests are designed to run without a database connection. Integration tests that require a running database are skipped automatically if the DB is not available.
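The auto-skip behavior can be approximated with a small availability probe. This is a hypothetical sketch -- the function name, host, and port defaults are assumptions, not the suite's actual mechanism:

```python
import socket

def db_available(host: str = "localhost", port: int = 5432, timeout: float = 0.25) -> bool:
    """Return True if something is listening on the given host/port.

    Hypothetical probe; the real test suite may detect the database differently.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A test module could then opt out when the DB is down, e.g.:
#   @pytest.mark.skipif(not db_available(), reason="database not available")
```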
- Fork the repository and create a feature branch from `main`
- Make your changes and ensure tests pass
- Run the linter: `ruff check src/ tests/`
- Write tests for new functionality
- Submit a PR with a clear description of what changed and why
This is one of the highest-impact contributions you can make. Each new profile unlocks build-kg for an entirely new domain with a pre-built ontology.
1. Create `src/build_kg/domains/your-domain.yaml` using `food-safety.yaml` as a template
2. Set `extends: default` to inherit base configuration
3. Define domain-specific configuration:
   - Ontology -- the graph structure:
     - `ontology.nodes` -- node types with labels, descriptions, and properties
     - `ontology.edges` -- edge types with source/target labels and descriptions
     - `ontology.root_node` -- primary node type that maps 1:1 to source fragments
     - `ontology.json_schema` -- expected LLM output JSON format
   - Parsing -- what the LLM extracts:
     - `parsing.requirement_types` -- e.g., `[consent, data_processing, breach_notification]` for privacy
     - `parsing.target_signal_examples` -- e.g., `[data.retention_period, consent.mechanism]`
     - `parsing.scope_examples` -- e.g., `[data_controller, data_processor, data_subject]`
   - ID Patterns -- regex for domain-specific IDs:
     - `id_patterns.patterns` -- e.g., GDPR Article patterns, FATF Recommendation patterns
     - `id_patterns.authority_priorities` -- which patterns to try first for each authority
   - Discovery -- how the `/build-kg` skill finds sources:
     - `discovery.search_templates` -- search queries for finding sources
     - `discovery.sub_domains` -- checklist of sub-areas to cover
4. Add tests to `tests/test_domain.py` to verify the profile loads correctly
5. Update the profile table in `README.md`
6. Test with `/build-kg <your topic>` using your new domain profile (set `DOMAIN=your-domain` in `.env`)
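Putting the steps together, a minimal profile might look roughly like this. This is an illustrative sketch only: the field names follow the descriptions above, but every value (and the exact YAML shape expected by `domain.py`) is an assumption to check against `food-safety.yaml`:

```yaml
# Hypothetical src/build_kg/domains/pharma.yaml -- all values are illustrative.
extends: default

ontology:
  nodes:
    - label: Provision
      description: A single regulatory provision
  edges:
    - type: CITES
      source: Provision
      target: Provision
      description: One provision cites another
  root_node: Provision

parsing:
  requirement_types: [clinical_trial, labeling, gmp]
  target_signal_examples: [trial.phase, label.warning_text]
  scope_examples: [manufacturer, sponsor, investigator]

id_patterns:
  patterns:
    cfr_section: '21 CFR \d+\.\d+'

discovery:
  search_templates:
    - "FDA {topic} regulations"
  sub_domains: [clinical_trials, manufacturing, labeling]
```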
Example profile ideas:
- `pharma` -- FDA drug regulations, clinical trial requirements, GMP
- `environmental` -- EPA regulations, emissions standards, waste management
- `telecom` -- FCC rules, spectrum licensing, net neutrality
- `construction` -- building codes, safety standards, permits
- `aviation` -- FAA regulations, airworthiness, pilot licensing
- `maritime` -- IMO conventions, port state control, SOLAS
The `jurisdiction` field is a freeform `TEXT` column in the database, so no schema changes are needed to support a new country. To add support for a new jurisdiction:

- Add authority-specific regex patterns to `src/build_kg/id_extractors.py` if the jurisdiction uses a unique ID format
- Update the jurisdiction list in `.claude/skills/build-kg/SKILL.md`
To add new regex patterns for regulatory ID formats:
- Edit `src/build_kg/id_extractors.py`
- Add patterns to `ProvisionIDExtractor.PATTERNS`: `'your_pattern_name': re.compile(r'your_regex_here'),`
- Add an authority mapping to `AUTHORITY_PATTERNS`: `'Authority Name': ['your_pattern_name', 'other_patterns'],`
- Add format rules to `ProvisionIDValidator.FORMAT_RULES` (optional)
- Add test cases to `tests/test_id_extractors.py`
When reporting a bug, please include:
- What happened: Describe the error or unexpected behavior
- What you expected: What should have happened instead
- How to reproduce: Step-by-step commands to reproduce the issue
- Environment: Python version, OS, Docker version
- Error output: Full traceback or error message
We follow the Contributor Covenant Code of Conduct. Be kind, be respectful, be constructive.