Build a conversational AI system that learns semantic definitions through natural dialogue and constructs a formal RDF ontology. The system should proactively extract meanings, definitions, and relationships from conversational statements, building a shared knowledge base collaboratively with the user. Start knowing nothing about the domain - only standard ontology foundations - and progressively learn what things mean through conversation.
The system should proactively extract definitions and meanings from natural conversational statements. When someone defines or describes something, capture that semantic knowledge without requiring formal statements.
- Extract definitions - "X is Y" or "X is how we Z" → Capture semantic relationship
- Proactively identify concepts - Recognize new terms and extract their meaning from context
- Ask clarifying questions - Only when definitions are vague, ambiguous, or missing
- Ground in standards when possible - Link to OWL-Time, org:, PROV-O, SKOS, and Dublin Core, with Schema.org as a fallback
- Remember context - Once defined, use that definition consistently
- Allow refinement - Definitions can be extended or clarified in later conversation
- Handle semantic flexibility - Recognize when the same word has multiple valid meanings in different contexts (polysemy), and model each meaning as a separate entity that shares the same signifier.
This is about building shared understanding of what things mean, not translating rules or policies. Focus on:
- Definitions: What things are ("retention is how we measure...")
- Relationships: How things relate ("files have retention periods")
- Properties: What attributes things have ("files have a creation date")
- Taxonomies: How things categorize ("a document is a type of file")
Not about: Rules, obligations, requirements, policies (that's a different system)
Example - Definitional Extraction:
User: "retention is how we measure how long a file should be kept"
System: (proactively extracts)
✓ "retention" - concept related to duration measurement
✓ "file" - entity that can be retained
✓ relationship: retention applies to files
✓ temporal aspect: involves duration (time:Duration)
I understand "retention" is about duration for keeping files.
What is a "file" in your context?
User: "a file is a digital document"
System: ✓ file → digital document
✓ grounding in Dublin Core: dcterms:Document
Perfect! Adding:
:File a owl:Class ;
rdfs:subClassOf dcterms:Document ;
rdfs:label "File" ;
rdfs:comment "A digital document" .
:Retention a owl:Class ;
rdfs:label "Retention" ;
rdfs:comment "Measurement of how long a file should be kept" ;
:appliesToType :File ;
:measurementType time:Duration .
Got it. Continue...
User: "files can be documents or records"
System: (refining earlier definition)
✓ File has subtypes: Document, Record
Updating ontology:
:Document a owl:Class ;
rdfs:subClassOf :File .
:Record a owl:Class ;
rdfs:subClassOf :File .
Refined! File now has two subtypes. What else?
DO NOT treat the first definition as immutable truth. The same word can have multiple legitimate meanings in different contexts:
- "file" could mean:
  - Digital file (computer file like .docx, .pdf)
  - Physical file (paper folder in a filing cabinet)
  - Database record
  - Tool for smoothing surfaces
- "clearance" could mean:
  - Security clearance (credential)
  - Physical clearance (space measurement)
  - Sale clearance (retail)
When a user mentions a term that already exists in the ontology, but uses it differently:
❌ WRONG - Rigid enforcement:
User: "we also track physical files in filing cabinets"
System: Error! File is defined as digital document (dcterms:Document)
✅ CORRECT - Flexible disambiguation:
User: "we also track physical files in filing cabinets"
System: I notice you mentioned "file" - I have that defined as a digital document (.docx, etc).
Are you talking about:
A) The same thing (digital files)
B) A different kind of file (physical filing cabinet files)
User: B - physical files are different
System: Got it! I'll create a distinction:
# Update ontology to handle both senses
:File a owl:Class ;
rdfs:label "File" ;
rdfs:comment "General concept - see specific types" .
:DigitalFile a owl:Class ;
rdfs:subClassOf :File, dcterms:Document ;
rdfs:label "Digital File" ;
rdfs:comment "Computer file (.docx, .pdf, etc.)" .
:PhysicalFile a owl:Class ;
rdfs:subClassOf :File ;
rdfs:label "Physical File" ;
rdfs:comment "Paper folder stored in filing cabinets" .
Now you have both digital and physical files! Which were you referring to?
When encountering a term that exists in ontology:
- Check if usage matches existing definition
  - Does context align with what's already defined?
  - Are they talking about the same thing?
- If usage conflicts, ask clarifying question:
  - "I have [term] as [existing definition]. Are you talking about that, or something different?"
  - Give user options: same concept, different concept, or refinement
- If different, create disambiguation:
  - Create qualified/scoped versions (:DigitalFile vs :PhysicalFile)
  - Update original to be more general if needed
  - Link both to appropriate standard ontologies
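
The disambiguation step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the `Ontology` class, `add_class`, and `split_senses` are hypothetical names, and classes live in a plain dictionary instead of an RDFLib graph so the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy in-memory ontology (hypothetical): class name -> {"comment", "parents"}."""
    classes: dict = field(default_factory=dict)

    def add_class(self, name, comment, parents=()):
        entry = self.classes.setdefault(name, {"comment": comment, "parents": set()})
        entry["comment"] = comment
        entry["parents"].update(parents)


def split_senses(onto, term, existing_qualifier, new_qualifier, new_comment):
    """Turn one class into a general parent with two qualified subclasses."""
    original = onto.classes[term]
    # Move the original meaning (and its grounding) to a qualified subclass.
    onto.add_class(
        f"{existing_qualifier}{term}",
        original["comment"],
        parents={term} | original["parents"],
    )
    # Add the new sense alongside it.
    onto.add_class(f"{new_qualifier}{term}", new_comment, parents={term})
    # Generalize the original class: it no longer carries the old grounding.
    original["parents"] = set()
    original["comment"] = "General concept - see specific types"
    return f"{existing_qualifier}{term}", f"{new_qualifier}{term}"


onto = Ontology()
onto.add_class("File", "A digital document", parents={"dcterms:Document"})
digital, physical = split_senses(
    onto, "File", "Digital", "Physical", "Paper folder stored in filing cabinets"
)
```

After the call, `DigitalFile` keeps the Dublin Core grounding, `PhysicalFile` is a sibling, and `File` is left as the general parent, which mirrors the Turtle shown in the flexible disambiguation example.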
Ontology already has: :File → dcterms:Document (digital)
User: "physical files need retention periods too"
System: I have "file" as digital documents. Are you now talking about physical paper files?
User: Yes
System: Creating distinction:
:DigitalFile (for computer files)
:PhysicalFile (for paper files)
Both are types of :File
Which type needs retention periods - both?
Ontology has: :Record → database record
User: "legal records must be kept for 7 years"
System: I have "record" as a database record. Is "legal record" the same thing, or different?
User: Different - legal records are official documents
System: ✓ Creating separate concept:
:DatabaseRecord (existing - database entry)
:LegalRecord (new - official legal document, subclass of dcterms:Document)
Ontology has: :Clearance → security credential
User: "equipment needs 2 meters of clearance"
System: I have "clearance" as security credential. This sounds like you mean physical space/distance?
User: Yes, different meaning
System: Creating separate concepts:
:SecurityClearance (credential)
:PhysicalClearance (spatial measurement, links to qudt:Length)
Both are valid meanings of "clearance" in different contexts.
- Never assume first definition is only definition
- Context matters - same word, different domains → likely different meanings
- Ask don't assume - when usage doesn't match existing definition, clarify
- Disambiguate with qualifiers - DigitalFile vs PhysicalFile, not File1 vs File2
- Both can be correct - multiple senses are normal in natural language
- Update ontology structure - may need to make existing concept more general
- Ground each sense appropriately - digital file → dcterms:Document, physical file → maybe prov:Entity
# Pseudocode for handling term conflicts
if term exists in ontology:
    if user_context matches existing_definition:
        use_existing_concept()
    else:
        ask_user: "I have {term} as {existing}. Same thing or different?"
        if different:
            # Create qualified versions
            create_specific_term(f"{qualifier}{term}")  # e.g., DigitalFile
            update_existing_to_general_if_needed()
            ground_both_appropriately()
            show_user_the_distinction()

Important for implementations using RAG or MCP:
When retrieving ontology knowledge:
- Don't just match on term string
- Consider context and usage
- When conflict detected, trigger disambiguation flow
- Update ontology with new sense, don't overwrite
- Use qualified names in SPARQL queries if multiple senses exist
Example SPARQL that handles multiple senses:
# Instead of:
SELECT ?file WHERE { ?file a :File }
# Use qualified:
SELECT ?digitalFile WHERE { ?digitalFile a :DigitalFile }
SELECT ?physicalFile WHERE { ?physicalFile a :PhysicalFile }

The system should recognize and ground concepts in these W3C standard ontologies:
- Namespace: http://www.w3.org/2006/time#
- Prefix: time:
- Use for: Dates, durations, temporal relations, intervals
- Key concepts:
  - time:Duration - Use ISO 8601 format (P7Y = 7 years, P12M = 12 months)
  - time:Instant - Points in time
  - Temporal relations: time:before, time:after, time:intervalDuring
- Always use ISO 8601 durations: "7 years" → "P7Y", "12 months" → "P12M"

- Namespace: http://www.w3.org/ns/org#
- Prefix: org:
- Use for: Organizations, organizational units, roles, memberships
- Key concepts:
  - org:Organization - Companies, government departments, teams
  - org:OrganizationalUnit - Divisions, departments, business units
  - org:Role - Job roles, responsibilities
  - org:Membership - Person's role in organization

- Namespace: http://www.w3.org/ns/prov#
- Prefix: prov:
- Use for: Activities, agents, entities, processes
- Key concepts:
  - prov:Activity - Processes, assessments, actions
  - prov:Agent - People, systems, organizations
  - prov:Entity - Documents, records, artifacts

- Namespace: http://www.w3.org/2004/02/skos/core#
- Prefix: skos:
- Use for: Concept schemes, taxonomies, controlled vocabularies
- Key concepts:
  - skos:Concept - Individual concepts in a scheme
  - skos:ConceptScheme - Classification systems
  - skos:prefLabel, skos:altLabel - Labeling
  - skos:broader, skos:narrower - Hierarchical relationships

- Namespace: http://purl.org/dc/terms/
- Prefix: dcterms:
- Use for: Descriptive metadata, documentation
- Key concepts:
  - dcterms:title, dcterms:description
  - dcterms:creator, dcterms:created, dcterms:modified
  - dcterms:type, dcterms:subject

- Use for: Class/property definitions, basic relationships
- Key concepts:
  - rdfs:Class, owl:Class - Defining types
  - rdfs:subClassOf - Class hierarchies
  - rdfs:label, rdfs:comment - Human-readable annotations
  - owl:ObjectProperty, owl:DatatypeProperty - Relationships
- Python 3.9+
- RDFLib - For RDF manipulation and TTL serialization
- LLM API - For the conversational interface (GPT-4 via the OpenAI API, or Claude via the Anthropic API)
- No triplestore required - File-based TTL for persistence
- Proactively extracts definitions from user statements
- Multi-turn dialogue management
- Maintains conversation context and memory
- Asks clarifying questions only when definitions are unclear
- Coordinates between user input and ontology operations
- Shows what's being added to the ontology in real-time
- Loads standard ontologies (OWL-Time, org:, PROV-O, SKOS, Dublin Core)
- Checks if terms exist in loaded ontologies
- Generates unique IRIs for new concepts (human-readable, collision-free)
- Adds new triples to the growing domain ontology
- Handles refinement of existing definitions
- Serializes to TTL format after each addition
- Identifies definitional patterns ("X is Y", "X means Z", "X is how we...")
- Extracts entities, relationships, and properties from statements
- Recognizes taxonomic relationships (subclass, instance)
- Identifies attributes and their types
- Flags terms needing clarification
- Matches extracted concepts to standard ontologies
- Suggests which standard concepts apply (e.g., "duration" → time:Duration)
- Links domain concepts to foundation ontologies
- Validates semantic grounding
- Ensures every concept connects to formal semantics
- Pre-loads W3C standard vocabularies (hardcoded mappings)
- Provides lookup functions (e.g., "is 'Activity' a known concept?")
- Suggests relevant standard concepts during extraction
- Returns namespace URIs and definitions
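
A hardcoded lookup of this kind might look like the sketch below. It is illustrative only: `STANDARD_CONCEPTS`, `NAMESPACES`, and `lookup_standard_concept` are hypothetical names, and the table covers just a handful of the terms listed earlier in this spec.

```python
# Hypothetical hardcoded mapping: plain-language term -> (prefix, local name).
STANDARD_CONCEPTS = {
    "duration": ("time", "Duration"),
    "instant": ("time", "Instant"),
    "organization": ("org", "Organization"),
    "organizational unit": ("org", "OrganizationalUnit"),
    "role": ("org", "Role"),
    "activity": ("prov", "Activity"),
    "agent": ("prov", "Agent"),
    "entity": ("prov", "Entity"),
    "concept": ("skos", "Concept"),
}

# Namespace URIs as given in this spec.
NAMESPACES = {
    "time": "http://www.w3.org/2006/time#",
    "org": "http://www.w3.org/ns/org#",
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
}


def lookup_standard_concept(term):
    """Return (prefixed_name, full_iri) for a known standard concept, or None."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    match = STANDARD_CONCEPTS.get(key)
    if match is None:
        return None
    prefix, local = match
    return f"{prefix}:{local}", NAMESPACES[prefix] + local
```

For example, `lookup_standard_concept("Activity")` answers the "is 'Activity' a known concept?" question with the prefixed name and the namespace IRI, while an unknown term returns `None` so the caller can fall back to asking the user.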
conversational-ontology-builder/
├── README.md
├── requirements.txt
├── .env
├── ontology_builder/
│ ├── __init__.py
│ ├── conversation_agent.py # Main conversational interface
│ ├── ontology_manager.py # RDFLib ontology operations
│ ├── grounding_engine.py # Progressive grounding logic
│ ├── standard_ontologies.py # W3C vocab loader
│ ├── identifier_generator.py # Human-readable IRI generation
│ └── prompts.py # LLM system prompts
├── ontologies/
│ ├── domain_ontology.ttl # The growing domain ontology (starts empty)
│ └── standards/ # Downloaded standard ontologies (optional)
│ ├── time.ttl
│ ├── org.ttl
│ ├── prov.ttl
│ └── skos.ttl
└── tests/
└── test_grounding.py
START
↓
[User makes statement]
↓
[Extract concepts, definitions, relationships]
↓
For each extracted concept:
↓
[Check if already in ontology] ──────→ EXISTS ──→ [Use existing definition]
↓ NEW CONCEPT (or refine if more info)
[Extract definition from context]
↓
[Is definition clear?]
↓
├─→ YES ──→ [Ground in standard ontology] ──→ [Add to ontology] ──→ [Show user]
├─→ NO ──→ [Ask clarifying question] ──→ [Get more info] ──→ [Loop back]
└─→ PARTIAL ──→ [Add what we know, flag for refinement] ──→ [Continue]
↓
[Continue conversation]
The system should recognize definitional patterns in natural speech:
Pattern 1: "X is Y"
- "retention is a duration" → :Retention rdfs:subClassOf time:Duration
Pattern 2: "X is how we Y"
- "retention is how we measure storage time" → Extract concept + purpose
Pattern 3: "X means Y"
- "SC means Security Clearance" → skos:altLabel or rdfs:comment
Pattern 4: "X has Y"
- "files have creation dates" → Property: :hasCreationDate
Pattern 5: "X can be Y or Z"
- "files can be documents or records" → Subtypes: :Document, :Record
Pattern 6: "X is part of Y"
- "UKSV is part of Cabinet Office" → org:unitOf
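
The six patterns above could be approximated with regular expressions as a cheap first pass before (or alongside) the LLM. The sketch below is an assumption about how that might be done, not part of the spec: `PATTERNS`, `match_pattern`, and the pattern names are hypothetical, and real conversational input will need the LLM for anything less regular than these examples.

```python
import re

# Order matters: more specific patterns come before the general "X is Y".
PATTERNS = [
    ("purpose", re.compile(r"^(?P<x>.+?) is how we (?P<y>.+)$", re.I)),
    ("part_of", re.compile(r"^(?P<x>.+?) is part of (?P<y>.+)$", re.I)),
    ("subtypes", re.compile(r"^(?P<x>.+?) can be (?P<y>.+?) or (?P<z>.+)$", re.I)),
    ("label", re.compile(r"^(?P<x>.+?) means (?P<y>.+)$", re.I)),
    ("property", re.compile(r"^(?P<x>.+?) (?:has|have) (?P<y>.+)$", re.I)),
    ("is_a", re.compile(r"^(?P<x>.+?) is (?:an? )?(?P<y>.+)$", re.I)),
]


def match_pattern(statement):
    """Return (pattern_name, captured_groups) for the first matching pattern, else None."""
    text = statement.strip().rstrip(".")
    for name, regex in PATTERNS:
        found = regex.match(text)
        if found:
            return name, found.groupdict()
    return None
```

Checking the specific patterns first keeps "X is how we Y" and "X is part of Y" from being swallowed by the generic "X is Y" rule.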
User: retention is how we measure how long a file should be kept
System: (extracts definitions)
✓ "retention" - concept about measurement/duration
✓ "file" - entity mentioned
✓ relationship: retention measures duration for files
✓ temporal aspect: duration → time:Duration
I understand "retention" is about measuring storage duration for files.
What is a "file" in your context?
User: a file is a digital document
System: ✓ "digital document" → dcterms:Document (Dublin Core)
Perfect! Adding:
:File a owl:Class ;
rdfs:subClassOf dcterms:Document ;
rdfs:label "File" ;
rdfs:comment "A digital document" .
:Retention a owl:Class ;
rdfs:label "Retention" ;
rdfs:comment "Measurement of duration for keeping files" .
:hasRetentionPeriod a owl:DatatypeProperty ;
rdfs:label "has retention period" ;
rdfs:domain :File ;
rdfs:range xsd:duration .
Got it! Continue telling me about your domain...
User: files can be documents or records
System: (refining earlier definition)
✓ File has subtypes: documents and records
Updating:
:Document a owl:Class ;
rdfs:subClassOf :File ;
rdfs:label "Document" .
:Record a owl:Class ;
rdfs:subClassOf :File ;
rdfs:label "Record" .
Refined! Files now have two subtypes. What else?
User: The Cabinet Office is a UK government department
System: (extracts)
✓ "Cabinet Office" - organization entity
✓ "government department" → org:Organization
✓ "UK" - geographic qualifier
Adding:
:CabinetOffice a org:Organization ;
rdfs:label "Cabinet Office" ;
rdfs:comment "UK government department" ;
dcterms:spatial :UnitedKingdom .
:UnitedKingdom a dcterms:Location ;
rdfs:label "United Kingdom" .
Done! Continue...
User: UKSV is part of the Cabinet Office. They handle security vetting.
System: (extracts two concepts)
✓ "UKSV" - new entity
✓ "part of" → org:unitOf
✓ Cabinet Office already exists ✓
✓ "handle security vetting" - indicates purpose/activity
Adding:
:UKSV a org:OrganizationalUnit ;
rdfs:label "UKSV" ;
org:unitOf :CabinetOffice ;
org:purpose :SecurityVetting .
:SecurityVetting a owl:Class ;
rdfs:subClassOf prov:Activity ;
rdfs:label "Security Vetting" ;
rdfs:comment "Process of checking backgrounds for security clearance" .
Great! UKSV is now linked to Cabinet Office with its purpose defined. More?
User: clearance means a person's security status
System: (extracts)
✓ "clearance" - concept
✓ "security status" - state or credential
✓ "person" - agent
✓ relationship: person has clearance
Should I model clearance as:
A) A type of credential/status (skos:Concept in a scheme)
B) A property that people have
Which makes more sense?
User: A credential that people have
System: ✓ Using SKOS for credential types
✓ person → prov:Agent
:SecurityClearanceScheme a skos:ConceptScheme ;
rdfs:label "Security Clearance Levels" .
:Clearance a owl:Class ;
rdfs:label "Security Clearance" ;
rdfs:comment "A person's security status credential" .
:hasClearance a owl:ObjectProperty ;
rdfs:domain prov:Agent ;
rdfs:range :Clearance .
Got it! What else about clearances?
User: SC clearance lasts for 10 years
System: (extracts)
✓ "SC clearance" - specific clearance type
✓ "10 years" → time:Duration, ISO 8601: "P10Y"
✓ "lasts for" → validity period property
Adding:
:SCClearance a :Clearance ;
skos:inScheme :SecurityClearanceScheme ;
skos:prefLabel "SC Clearance" ;
skos:notation "SC" ;
:validityPeriod "P10Y"^^xsd:duration .
:validityPeriod a owl:DatatypeProperty ;
rdfs:label "validity period" ;
rdfs:domain :Clearance ;
rdfs:range xsd:duration .
Perfect! SC Clearance now has 10-year validity (ISO 8601: P10Y).
User: actually, DV clearance also lasts 10 years
System: (extends pattern)
✓ Same duration as SC
✓ Adding another clearance type
:DVClearance a :Clearance ;
skos:inScheme :SecurityClearanceScheme ;
skos:prefLabel "DV Clearance" ;
skos:notation "DV" ;
:validityPeriod "P10Y"^^xsd:duration .
Added! Both SC and DV clearances have 10-year validity periods.
[Ontology already has: :File defined as digital document]
User: we need to track physical files too
System: (detects potential conflict)
I have "file" defined as digital documents (like .docx, .pdf).
Are you talking about:
A) The same thing (digital files)
B) Something different (physical paper files)
User: B - physical files in filing cabinets are different
System: ✓ Creating distinction to handle both meanings:
# Make File more general
:File a owl:Class ;
rdfs:label "File" ;
rdfs:comment "General concept of file - see specific types" .
# Original digital concept
:DigitalFile a owl:Class ;
rdfs:subClassOf :File, dcterms:Document ;
rdfs:label "Digital File" ;
rdfs:comment "Computer file (.docx, .pdf, etc.)" .
# New physical concept
:PhysicalFile a owl:Class ;
rdfs:subClassOf :File ;
rdfs:label "Physical File" ;
rdfs:comment "Paper documents in filing cabinets" .
Perfect! Now the ontology supports both digital and physical files.
Both are valid meanings of "file" in different contexts.
What else about physical files?
User: physical files need retention periods too
System: ✓ Already have :hasRetentionPeriod property
✓ Updating domain to include both file types
:hasRetentionPeriod a owl:DatatypeProperty ;
rdfs:label "has retention period" ;
rdfs:domain :File ; # Applies to all file types
rdfs:range xsd:duration .
Now both digital AND physical files can have retention periods!
- Human-readable - Should make sense when viewed in Protégé or TTL
- No collisions - Must be unique across the ontology
- Consistent - Same entity mentioned twice should get same IRI
- URI-safe - No spaces, special characters
import hashlib
import re

def generate_iri(label: str, entity_type: str, namespace: str = "http://example.org/ontology#") -> str:
    """
    Generate human-readable, collision-resistant IRI

    Examples:
    - "Cabinet Office" → http://example.org/ontology#CabinetOffice
    - "Data Protection Impact Assessment" → http://example.org/ontology#DataProtectionImpactAssessment
    - "has retention period" → http://example.org/ontology#hasRetentionPeriod
    """
    # Split on anything that is not an ASCII letter or digit so the result is URI-safe
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    if not words:
        raise ValueError(f"Cannot build an IRI from label {label!r}")
    # Upper-case only the first letter of each word so acronyms such as "UKSV" survive
    clean = ''.join(word[0].upper() + word[1:] for word in words)
    # For properties, use camelCase
    if entity_type == "property":
        clean = clean[0].lower() + clean[1:]
    # Check for collision in existing ontology
    # (iri_exists is assumed to be provided by the ontology manager)
    candidate = f"{namespace}{clean}"
    if not iri_exists(candidate):
        return candidate
    # If collision, append hash of full label
    hash_suffix = hashlib.md5(label.encode()).hexdigest()[:6]
    return f"{namespace}{clean}_{hash_suffix}"

@prefix : <http://example.org/ontology#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

The conversational agent should use a system prompt along these lines:
You are a semantic knowledge extraction assistant. Your role is to help users build a formal RDF ontology by **proactively extracting definitions and meanings** from natural conversation.
## Your Approach: Proactive Definition Extraction
**Listen for definitional patterns** in user statements:
- "X is Y" → X is a type of Y (rdfs:subClassOf)
- "X is how we Y" → Extract X as concept with purpose Y
- "X means Y" → Definition or alternative label
- "X has Y" → Property relationship
- "X can be Y or Z" → Subtypes
- "X is part of Y" → Structural relationship
**Extract proactively** - don't wait to be asked. When someone says "retention is how we measure how long files are kept", immediately:
1. Identify concepts: retention, file, measurement, duration
2. Extract relationships: retention → measures → duration → for files
3. Ground in standards: duration → time:Duration, file → dcterms:Document
4. Build triples and show user
**Only ask questions when**:
- Definition is genuinely unclear or ambiguous
- Multiple grounding options exist (e.g., "Is X a type or instance?")
- Need to resolve which standard ontology to use
**Remember and use context**:
- Once "file" is defined, use that definition consistently
- When user refines ("files can be documents or records"), update the ontology
- Build on previous definitions without re-asking
**Handle semantic flexibility (CRITICAL)**:
- The same word can have multiple valid meanings in different contexts
- If a term exists but user's usage doesn't match, ask: "I have [term] as [existing definition]. Are you talking about that, or something different?"
- If different, create disambiguated concepts (e.g., :DigitalFile vs :PhysicalFile)
- NEVER rigidly enforce first definition as only definition
- Multiple senses are normal - "file" can be digital file OR physical file, both valid
## Your Responsibilities
1. **Proactively extract** definitions, concepts, and relationships from user statements
2. **Ground in standards** - link to OWL-Time, org:, PROV-O, SKOS, Dublin Core
3. **Generate valid RDF** using proper Turtle syntax
4. **Show your work** - display the triples you're creating
5. **Use ISO 8601** for all temporal durations (P7Y for 7 years, P12M for 12 months)
6. **Be conversational** - friendly tone, celebrate discoveries, encourage continuation
## Standard Ontologies You Know
- **OWL-Time (time:)** - temporal concepts, durations, relations (ISO 8601)
- **Organization Ontology (org:)** - organizations, units, roles, membership
- **PROV-O (prov:)** - activities, agents, entities, provenance
- **SKOS (skos:)** - concept schemes, labels, taxonomies
- **Dublin Core (dcterms:)** - metadata, documents, descriptions
- **RDFS/OWL** - classes, properties, hierarchies, annotations
## Definition Extraction Patterns
**Pattern: "X is Y"**
User: "retention is a measure of duration" → Extract: retention as concept, relates to duration (time:Duration)
**Pattern: "X is how we Y"**
User: "retention is how we measure how long files are kept" → Extract: retention (concept), file (entity), temporal measurement (time:Duration)
**Pattern: "X has Y"**
User: "files have creation dates" → Create property: :hasCreationDate, domain: :File, range: xsd:dateTime
**Pattern: "X can be Y or Z"**
User: "files can be documents or records" → Create subtypes: :Document and :Record as subclasses of :File
**Pattern: Refinement**
User already said: "a file is a digital document"
User now says: "actually files can be records too"
→ Update ontology: add :Record as another subtype
## Output Format
Always show the Turtle triples you're generating:
:EntityName a owl:Class ;
rdfs:subClassOf standardOntology:Concept ;
rdfs:label "Human Label" ;
rdfs:comment "Description from context" .
Mark grounding with ✓: "✓ Using org:Organization"
Be conversational, precise, and encouraging! This is collaborative knowledge building.
- Load standard ontology definitions (hardcoded mappings to W3C specs)
- Initialize empty domain ontology (TTL file)
- Accept user input (natural conversational statement)
- Proactively extract definitions using pattern matching:
- "X is Y" patterns
- "X is how we..." patterns
- "X has Y" patterns
- "X can be Y or Z" patterns
- "X means Y" patterns
- For each extracted concept:
- Check if already in ontology (use existing or refine)
- Extract context and meaning from statement
- Identify grounding in standard ontologies
- Generate triples with grounding
- Append to domain_ontology.ttl
- Show user what was extracted and added
- Continue conversation naturally
- Detect temporal expressions proactively ("7 years", "12 months", "lasts for X")
- Always convert to ISO 8601 durations (P7Y, P12M, P30D)
- Ground in OWL-Time concepts (time:Duration, time:Instant)
- Create temporal properties as needed (validityPeriod, retentionPeriod)
- Handle temporal relations ("before X", "after Y") using time:before, time:after
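
The conversion to ISO 8601 can be sketched as follows. This is a minimal example that assumes simple "N unit(s)" phrases and date-based units only; `to_iso8601_duration` and its helper tables are hypothetical names, and hours or minutes (which need the `PT` form) are deliberately left out.

```python
import re

# ISO 8601 designators for the date-based units handled here.
UNIT_CODES = {"year": "Y", "month": "M", "week": "W", "day": "D"}
DURATION_RE = re.compile(r"(\d+)\s*(year|month|week|day)s?\b", re.I)


def to_iso8601_duration(text):
    """Convert the first 'N unit(s)' expression in text to an ISO 8601 duration, or None."""
    found = DURATION_RE.search(text)
    if not found:
        return None
    amount, unit = int(found.group(1)), found.group(2).lower()
    return f"P{amount}{UNIT_CODES[unit]}"
```

With this helper, "lasts for 10 years" yields `P10Y`, which can then be written as `"P10Y"^^xsd:duration`; input with no recognizable duration returns `None` so the agent can ask instead of guessing.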
- Detect relationships in natural statements proactively
- Common patterns:
- "X is part of Y" → org:unitOf or dcterms:isPartOf
- "X reports to Y" → org:reportsTo
- "X has Y" → create property :hasY
- "X handles Y" / "X does Y" → org:purpose or prov:activity
- Always ground properties in standard ontologies when possible
- Create custom properties only when no standard exists
- Maintain conversation context (remember what's been defined)
- When user refines earlier definition, update triples (don't duplicate)
- Handle additive refinements: "files can also be records"
- Handle corrections: "actually, X is really Y, not Z"
- Show what changed when refining
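
The difference between an additive refinement and a correction can be illustrated with a set of triples. The sketch below uses plain tuples in place of an RDFLib graph to stay self-contained; `add_triples` and `replace_value` are hypothetical helper names.

```python
def add_triples(graph, new_triples):
    """Additive refinement: add only triples not already present; return what was added."""
    added = [t for t in new_triples if t not in graph]
    graph.update(added)
    return added


def replace_value(graph, subject, predicate, new_object):
    """Correction: swap every object of (subject, predicate) for new_object.

    Returns (removed, added) so the agent can show the user what changed.
    """
    removed = sorted(t for t in graph if t[0] == subject and t[1] == predicate)
    graph.difference_update(removed)
    triple = (subject, predicate, new_object)
    graph.add(triple)
    return removed, [triple]


graph = {(":File", "rdfs:subClassOf", "dcterms:Document")}
# "files can also be records": only the genuinely new triple is added.
added = add_triples(graph, [
    (":Record", "rdfs:subClassOf", ":File"),
    (":File", "rdfs:subClassOf", "dcterms:Document"),
])
```

Returning the removed and added triples from each operation gives the conversation layer what it needs to report the change without duplicating existing statements.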
- Detect conflicting usage: When user mentions existing term but context doesn't match
- Ask clarifying question: "I have X as [definition]. Same thing or different?"
- Create disambiguated concepts if different:
- Qualified names: :DigitalFile vs :PhysicalFile (not File1, File2)
- Update original to be more general if needed
- Ground each sense in appropriate standard ontology
- Handle polysemy gracefully: Same word, multiple valid meanings = normal
- Test cases:
- "file" as digital document, then "physical files"
- "record" as database record, then "legal records"
- "clearance" as security credential, then "physical clearance"
- Validate generated Turtle syntax (parse with RDFLib)
- Check grounding coverage (every concept links to standard ontology)
- Validate ISO 8601 duration syntax
- Ensure all entities have rdfs:label and rdfs:comment
- Report ontology statistics after additions (X classes, Y properties, Z% grounded)
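
The annotation and grounding checks above can be sketched over a list of triples. This is an illustration under simplifying assumptions: triples are `(subject, predicate, object)` string tuples with `"a"` standing for rdf:type, only direct `rdfs:subClassOf` links to a standard prefix count as grounding, and `ontology_report` is a hypothetical name. A real implementation would query the RDFLib graph instead.

```python
STANDARD_PREFIXES = ("time:", "org:", "prov:", "skos:", "dcterms:")


def ontology_report(triples):
    """Summarize classes, properties, missing annotations, and grounding coverage.

    `triples` must be a list or set (it is iterated more than once).
    """
    classes = {s for s, p, o in triples if p == "a" and o == "owl:Class"}
    properties = {s for s, p, o in triples
                  if p == "a" and o in ("owl:ObjectProperty", "owl:DatatypeProperty")}
    entities = classes | properties

    def has(subject, predicate):
        return any(s == subject and p == predicate for s, p, _ in triples)

    # Every class and property should carry both rdfs:label and rdfs:comment.
    missing = sorted(e for e in entities
                     if not (has(e, "rdfs:label") and has(e, "rdfs:comment")))
    # Direct grounding only: subclass of a term from a standard namespace.
    grounded = {s for s, p, o in triples
                if s in classes and p == "rdfs:subClassOf" and o.startswith(STANDARD_PREFIXES)}
    coverage = round(100 * len(grounded) / len(classes)) if classes else 0
    return {"classes": len(classes), "properties": len(properties),
            "missing_annotations": missing, "grounded_percent": coverage}
```

The returned dictionary maps directly onto the "X classes, Y properties, Z% grounded" summary, and `missing_annotations` lists the entities that still need a label or comment.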
A successful implementation should:
✅ Start with zero domain knowledge (empty ontology file)
✅ Proactively extract definitions from natural conversational statements
✅ Recognize definitional patterns ("X is Y", "X is how we...", "X has Y")
✅ Ground all concepts in standard W3C ontologies (time:, org:, prov:, skos:, dcterms:)
✅ Generate valid Turtle/RDF syntax
✅ Use human-readable IRIs without collisions
✅ Handle temporal expressions with ISO 8601 (P7Y, P12M)
✅ Remember context and use definitions consistently
✅ Allow refinement of definitions ("files can also be records")
✅ Handle semantic flexibility - detect when same word has different meanings, disambiguate gracefully
✅ Never treat first definition as immutable - ask when usage doesn't match existing definition
✅ Persist growing ontology to TTL file after each addition
✅ Show user what's being extracted and added in real-time

✅ Every new class/property traces back to standard ontology
✅ Uses standard namespaces exclusively for grounding
✅ All temporal durations in ISO 8601 (P7Y, never "7 years")
✅ Consistent identifier generation (same term → same IRI across conversation)
✅ Valid RDF (parseable by RDFLib, viewable in Protégé)
✅ Human-readable IRIs (CamelCase for classes, camelCase for properties)
✅ All entities have rdfs:label and rdfs:comment
✅ No orphaned concepts (all grounded in standards)

✅ Proactive, not reactive - extracts without being explicitly asked
✅ Conversational and friendly tone
✅ Explains grounding decisions clearly ("✓ Using org:Organization")
✅ Shows generated triples after each extraction
✅ Asks questions only when genuinely unclear
✅ Remembers what's been defined (doesn't re-ask)
✅ Handles refinements gracefully ("updating earlier definition...")
✅ Encourages continuation ("What else?", "Tell me more...")
✅ Celebrates discoveries ("Perfect! Adding...", "Got it!")
System: Hi! I'm here to help you build a formal ontology through conversation.
I start knowing only standard W3C vocabularies (Organizations, Time, Provenance).
Tell me about your domain, and I'll ask questions to understand it properly.
User: The Cabinet Office is a UK government department
System: Great! Let me add that...
✓ "government department" → org:Organization (W3C Organization Ontology)
✓ "UK" → I'll add a geographic qualifier
Adding:
:CabinetOffice a org:Organization ;
rdfs:label "Cabinet Office" ;
rdfs:comment "UK government department" ;
dcterms:spatial :UnitedKingdom .
:UnitedKingdom a dcterms:Location ;
rdfs:label "United Kingdom" .
Done! What else can you tell me about the Cabinet Office?
User: UKSV is part of the Cabinet Office and handles security vetting
System: Let me unpack that...
✓ "part of" → org:unitOf
✓ "handles" suggests an activity/responsibility
What is UKSV - an organizational unit, team, or something else?
User: It's an organizational unit
System: Perfect!
✓ org:OrganizationalUnit
And "security vetting" - is that a process/activity they perform?
User: Yes, it's the process of checking people's backgrounds
System: ✓ Process → prov:Activity
Adding:
:UKSV a org:OrganizationalUnit ;
rdfs:label "UKSV" ;
org:unitOf :CabinetOffice ;
org:purpose :SecurityVetting .
:SecurityVetting a owl:Class ;
rdfs:subClassOf prov:Activity ;
rdfs:label "Security Vetting" ;
rdfs:comment "Process of checking people's backgrounds for security clearance" .
Excellent! Your ontology now has:
- 2 organizations (Cabinet Office, UKSV)
- 1 activity type (Security Vetting)
- All grounded in W3C standards ✓
What else?
User: People need SC clearance which lasts for 10 years
System: Let me break this down...
✓ "SC clearance" sounds like a credential or status - is that right?
User: Yes, it's a security clearance level
System: ✓ I'll model as skos:Concept in a clearance scheme
✓ "lasts for 10 years" → time:Duration with ISO 8601
Adding:
:SecurityClearanceScheme a skos:ConceptScheme ;
rdfs:label "Security Clearance Levels" .
:SCClearance a skos:Concept ;
skos:inScheme :SecurityClearanceScheme ;
skos:prefLabel "SC Clearance" ;
skos:notation "SC" ;
:validityPeriod "P10Y"^^xsd:duration . # ISO 8601: 10 years
:validityPeriod a owl:DatatypeProperty ;
rdfs:label "validity period" ;
rdfs:domain skos:Concept ;
rdfs:range xsd:duration .
Done! SC Clearance is now in your ontology with a 10-year validity period (ISO 8601: P10Y).
Continue building?
Provide a simple CLI to run conversations:
# Start fresh conversation
python -m ontology_builder.cli --new-ontology domain.ttl
# Continue building existing ontology
python -m ontology_builder.cli --ontology domain.ttl
# Validate current ontology
python -m ontology_builder.cli --validate domain.ttl
# Show ontology stats
python -m ontology_builder.cli --stats domain.ttl

- Definition extraction tests - Given statement "X is Y", extract correct relationship
- Pattern matching tests - Test each definitional pattern ("X is how we...", "X has Y")
- Grounding tests - Ensure concepts ground in correct standard ontology
- Identifier tests - Human-readable generation, collision detection, consistency
- Temporal tests - "7 years" → "P7Y", "12 months" → "P12M"
- Refinement tests - Update ontology when definition is refined, don't duplicate
- Context memory tests - Use earlier definitions consistently in later extractions
- Integration tests - Full conversation flows from examples
- Validation tests - All generated TTL parses correctly with RDFLib
- Mock LLM responses for deterministic testing
- Type hints throughout
- Docstrings with examples
- Logging of all ontology changes
- Version control the growing ontology
- Export to other formats (JSON-LD, RDF/XML)
- Import existing ontologies to extend
- SPARQL query interface
- Triplestore integration (Blazegraph, Oxigraph)
- Web UI for conversation
- Integration with policy translation system
# requirements.txt
rdflib>=7.0.0
openai>=1.0.0 # or anthropic if using Claude
python-dotenv>=1.0.0
click>=8.0.0 # for CLI
rich>=13.0.0 # for nice terminal output
pyyaml>=6.0.0
isodate>=0.6.0 # for ISO 8601 duration parsing/validation
- Read this spec thoroughly - Understand the proactive extraction approach
- Study standard ontologies - Review W3C specs for time:, org:, prov:, skos:, dcterms:
- Analyze conversation examples - See how definitions are extracted from natural statements
- Understand definitional patterns:
  - "X is Y" → subclass relationship
  - "X is how we Y" → concept with purpose
  - "X has Y" → property creation
  - "X can be Y or Z" → taxonomy building
- Implement in this order:
  a. Standard ontology loader (hardcoded mappings to W3C specs)
  b. Identifier generator (human-readable, collision-free IRIs)
  c. Definition extractor (pattern matching for definitional statements)
  d. Grounding engine (map extracted concepts to standard ontologies)
  e. Ontology manager (RDFLib operations, TTL serialization)
  f. Conversational agent (LLM integration with proactive extraction prompts)
  g. CLI interface (conversation runner)
- Test with examples - Run through the conversation examples in this doc
- Validate outputs - Ensure generated TTL is valid RDF (parse with RDFLib)
- Proactive extraction, not reactive - Extract definitions from statements, don't wait to be asked
- Definitional focus, not policy - Building semantic knowledge, not translating rules
- File-based TTL, not triplestore - Simple persistence for now (can scale later)
- Standard ontologies hardcoded - Reference by namespace, no downloads needed
- ISO 8601 mandatory for temporal - All durations as P7Y, P12M, never free text
- Every concept grounds in standards - No orphaned entities, all trace to W3C ontologies
- Human-readable IRIs - CamelCase classes, camelCase properties, collision detection
- Conversational memory - Remember definitions, use consistently, allow refinement
- Semantic flexibility - Same word can have multiple valid meanings (polysemy), disambiguate when needed
- Never rigid definitions - First definition isn't gospel, ask when usage conflicts
- LLM as knowledge mediator - Bridges natural language ↔ formal semantics
- Zero assumed knowledge - Start from empty ontology, build collaboratively
The system should feel like a collaborative conversation with a semantic expert who helps you formalize your domain knowledge, not a form-filling exercise or question-answer bot.
Key principles:
- Listen proactively for definitions
- Extract meaning without requiring formal language
- Ground in established standards for interoperability
- Remember context across the conversation
- Allow natural refinement and correction
- Embrace semantic flexibility - words can have multiple valid meanings
- Never treat first definition as immutable truth
- Disambiguate gracefully when conflicts arise
- Show what's being built in real-time
- Encourage exploration and elaboration
This specification is complete and actionable. A new Claude Code instance should be able to implement the conversational ontology builder from this document.
Focus: Semantic knowledge extraction through natural dialogue, not policy translation.