Text2ODP is a reproducible Python pipeline that converts scientific text (e.g., paper abstracts) into Ontology Design Patterns (ODPs) using open-source LLMs.
Given a corpus of paper abstracts, Text2ODP performs:
- Scenario elicitation + competency question generation.
- Concept and relation extraction into a semantic graph.
- Ontology Design Pattern synthesis (classes, properties, axioms, Turtle fragment).
- Quantitative evaluation with publication-friendly metrics.
This repository is designed to be a strong starting point for scientific studies on LLM-assisted ontology engineering.
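The first stage collects the corpus from the public Semantic Scholar Graph API. Below is a minimal, self-contained sketch of that call, assuming only the documented `/graph/v1/paper/search` endpoint; the repository's own client code (and its retry/rate-limit handling) may differ.

```python
import requests  # assumes the requests package is installed


def fetch_abstracts(query: str, limit: int = 10) -> list[dict]:
    """Fetch paper titles and abstracts from the Semantic Scholar Graph API."""
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit, "fields": "title,abstract"},
        timeout=30,
    )
    resp.raise_for_status()
    # Keep only papers that actually have an abstract.
    return [p for p in resp.json()["data"] if p.get("abstract")]


papers = fetch_abstracts("ontology engineering healthcare")
```

The overall pipeline: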
Paper abstracts (Semantic Scholar API)
↓
LLM step 1: scenario + competency questions (JSON)
↓
LLM step 2: concepts/relations/triples (JSON)
↓
LLM step 3: ODP synthesis (JSON + TTL snippet)
↓
Evaluation:
- lexical_coverage
- graph_density
- cq_answerability_proxy
- self_consistency
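To make the LLM steps concrete, here is a minimal sketch of step 1 against a locally running Ollama server. It assumes only Ollama's public REST endpoint (`POST /api/generate` on port 11434); the prompt wording and the two JSON keys (`scenario`, `competency_questions`) are illustrative, not the repository's actual prompts or schema. Steps 2 and 3 follow the same pattern with different prompts.

```python
import json

import requests  # assumes the requests package is installed

# Illustrative prompt; the repository's real prompts are archived per run.
PROMPT = """You are an ontology engineer. Given the abstract below, write a short
usage scenario and 3-5 competency questions. Reply with JSON only, shaped as
{{"scenario": "...", "competency_questions": ["...", "..."]}}.

Abstract:
{abstract}"""


def elicit_scenario(abstract: str, model: str = "llama3.1:8b") -> dict:
    """Step 1 sketch: scenario + competency questions as JSON via local Ollama."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": PROMPT.format(abstract=abstract),
            "stream": False,
            "format": "json",  # ask Ollama to emit syntactically valid JSON
        },
        timeout=300,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])
```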
python -m venv .venv
source .venv/bin/activate
pip install -e .
Optional LLM extras:
pip install -e .[llm]
- Install Ollama and pull a model:
ollama pull llama3.1:8b
- Run pipeline:
text2odp --query "ontology engineering healthcare" --limit 10 --backend ollama --model llama3.1:8b
text2odp --query "knowledge graph construction" --limit 5 --backend transformers --model mistralai/Mistral-7B-Instruct-v0.3Generated under outputs/:
- dataset.jsonl: collected papers and abstracts.
- artifacts.json: scenario/CQ/graph/ODP/evaluation per paper.
- evaluation.csv: paper-level metrics.
- evaluation_summary.json: aggregate metrics.
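These files can be inspected directly; a minimal sketch using pandas (the exact columns and field names may differ from what your run produces):

```python
import json

import pandas as pd  # assumes pandas is installed

papers = pd.read_json("outputs/dataset.jsonl", lines=True)  # collected abstracts
metrics = pd.read_csv("outputs/evaluation.csv")             # paper-level metrics
print(metrics.describe())

with open("outputs/evaluation_summary.json") as f:
    summary = json.load(f)                                  # aggregate metrics
print(summary)
```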
Currently implemented metrics (see the sketch after this list):
- Lexical coverage: overlap between extracted concepts and the abstract's vocabulary.
- Graph density: structural richness of the extracted concept graph.
- CQ answerability proxy: proportion of CQs that touch the extracted concept/relation vocabulary.
- Self-consistency: overlap between generated ODP classes and extracted concepts.
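As a rough illustration of how such metrics can be computed (the repository's exact definitions, tokenization, and normalization may differ), consider:

```python
import re


def _tokens(text: str) -> set[str]:
    """Crude lowercase word tokenizer; the real pipeline may normalize differently."""
    return set(re.findall(r"[a-z][a-z-]+", text.lower()))


def lexical_coverage(concepts: list[str], abstract: str) -> float:
    """Fraction of extracted concepts whose words all appear in the abstract."""
    vocab = _tokens(abstract)
    covered = [c for c in concepts if _tokens(c) <= vocab]
    return len(covered) / len(concepts) if concepts else 0.0


def graph_density(n_nodes: int, n_edges: int) -> float:
    """Density of a directed graph: edges over all possible ordered node pairs."""
    return n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0


def cq_answerability_proxy(cqs: list[str], concepts: list[str], relations: list[str]) -> float:
    """Share of CQs mentioning at least one extracted concept or relation term."""
    terms = concepts + relations
    vocab = set().union(*map(_tokens, terms)) if terms else set()
    hits = [cq for cq in cqs if _tokens(cq) & vocab]
    return len(hits) / len(cqs) if cqs else 0.0


def self_consistency(odp_classes: list[str], concepts: list[str]) -> float:
    """Jaccard-style overlap between ODP class names and extracted concepts."""
    a = {c.lower() for c in odp_classes}
    b = {c.lower() for c in concepts}
    return len(a & b) / len(a | b) if a | b else 0.0
```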
For a publishable paper, extend with:
- Human expert annotation (inter-rater reliability, Cohen's/Fleiss' kappa).
- Baselines: rule-based IE, non-LLM ontology extraction, and alternative LLMs.
- Statistical significance tests (paired bootstrap / Wilcoxon signed-rank); see the sketch after this list.
- Robustness checks across domains, model sizes, and prompt variants.
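For the significance tests mentioned above, here is a minimal sketch with numpy/scipy comparing a paper-level metric between two systems. The arrays are illustrative placeholder values, aligned by paper:

```python
import numpy as np
from scipy.stats import wilcoxon  # assumes scipy is installed


def paired_bootstrap(a: np.ndarray, b: np.ndarray, n_boot: int = 10_000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples where system A does not beat system B."""
    rng = np.random.default_rng(seed)
    diffs = a - b
    # Resample papers with replacement and recompute the mean difference.
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    return float((boot_means <= 0).mean())


# Illustrative paper-level scores for two systems, one entry per paper.
system_a = np.array([0.61, 0.72, 0.55, 0.68, 0.70])
system_b = np.array([0.58, 0.69, 0.57, 0.60, 0.66])

print("paired bootstrap p ~", paired_bootstrap(system_a, system_b))
print("wilcoxon:", wilcoxon(system_a, system_b))
```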
Use scripts/run_experiment.py to aggregate multiple run folders:
python scripts/run_experiment.py --results-root outputs
Expected directory structure example:
outputs/
run_1/evaluation.csv
run_2/evaluation.csv
run_3/evaluation.csv
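If you prefer to aggregate without the script, here is a minimal sketch that mirrors the layout above (the exact columns of evaluation.csv may differ):

```python
from pathlib import Path

import pandas as pd

frames = []
for csv in sorted(Path("outputs").glob("run_*/evaluation.csv")):
    df = pd.read_csv(csv)
    df["run"] = csv.parent.name  # tag each row with its run folder
    frames.append(df)

all_runs = pd.concat(frames, ignore_index=True)
# Mean of every numeric metric per run, then mean/std across runs.
per_run = all_runs.groupby("run").mean(numeric_only=True)
print(per_run)
print(per_run.agg(["mean", "std"]))
```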
Reproducibility checklist (a run-manifest sketch follows the list):
- Fix random seeds (if you add sampling-heavy components).
- Log exact model name and version.
- Archive prompts and all generated JSON artifacts.
- Report hardware setup and decoding parameters.
- Provide failure analysis and representative error cases.
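One lightweight way to cover several of these points at once is to write a run manifest next to the outputs. The field names and values below are illustrative, not a format the repository defines:

```python
import json
import platform
import random
from datetime import datetime, timezone

random.seed(13)  # fix seeds for any sampling-heavy components

manifest = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model": "llama3.1:8b",  # log the exact model name and version
    "backend": "ollama",
    "decoding": {"temperature": 0.2, "top_p": 0.9},  # illustrative parameters
    "seed": 13,
    "hardware": platform.platform(),  # report the hardware/OS setup
}
with open("outputs/run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```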
Limitations and intended use:
- LLM-generated ontologies may contain hallucinations and domain bias.
- Never deploy ODPs in safety-critical domains without expert validation.
- Use this pipeline as a semi-automatic assistant, not a fully autonomous authority.