Controlled Natural Language to Structured YAML Translator
Part of the Assurance-as-Code platform for government policy automation.
The CNL Translator converts natural language policy statements into structured YAML format that can be compiled into executable rules. This enables policy professionals to author machine-readable policies without coding knowledge.
Natural Language Policy
↓
OpenAI GPT-4 (Translation Assistant)
↓
Structured YAML (Human-Readable)
↓
Validation & Human Review
↓
Deterministic Compilation to Rules (Rego/SHACL)
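The flow above can be sketched as a small orchestration function. This is a hypothetical outline, not the project's code. `run_pipeline`, `PipelineResult` and the four injected callables are illustrative names. The LLM, review and compile steps are passed in as plain functions, so the deterministic gating (validate, then review, then compile) can be exercised without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    draft: dict                                      # structured policy from the LLM step
    errors: List[str] = field(default_factory=list)  # validation failures, if any
    approved: bool = False                           # set by the human reviewer
    compiled: Optional[str] = None                   # rule text, only set after approval


def run_pipeline(
    policy_text: str,
    translate: Callable[[str], dict],        # LLM translation (non-deterministic)
    validate: Callable[[dict], List[str]],   # schema checks, returns error messages
    review: Callable[[dict], bool],          # human sign-off
    compile_rules: Callable[[dict], str],    # deterministic YAML -> Rego/SHACL
) -> PipelineResult:
    draft = translate(policy_text)
    result = PipelineResult(draft=draft, errors=validate(draft))
    if result.errors:
        return result  # invalid drafts never reach a reviewer
    result.approved = review(draft)
    if result.approved:
        result.compiled = compile_rules(draft)  # only approved policies are compiled
    return result


# Usage with stub steps standing in for GPT-4, the reviewer and the compiler:
result = run_pipeline(
    "Services must have a named owner.",
    translate=lambda text: {"policy_id": "service_owner", "rules": [{"rule_id": "r1"}]},
    validate=lambda d: [] if d.get("rules") else ["rules must not be empty"],
    review=lambda d: True,
    compile_rules=lambda d: f"package {d['policy_id']}",
)
print(result.compiled)  # package service_owner
```

Keeping the LLM step injectable reflects the design in the diagram: only translation is probabilistic, while everything after it is deterministic and gated by validation and human approval.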
- ✅ Natural Language Input: Write policies in plain English
- ✅ Structured YAML Output: Human-readable, machine-parsable format
- ✅ Schema Validation: Automatic validation against defined schema
- ✅ CLI Interface: Simple command-line tool
- ✅ OpenAI Integration: Uses GPT-4 for intelligent translation
- ✅ Rich Output: Beautiful terminal formatting with syntax highlighting
- ✅ Examples: Built-in example policies for learning
cd "/Users/kelcey.swan/Documents/src/assurance-as-code translator"
pip install -r requirements.txt
Or install in development mode:
pip install -e .
Create a .env file:
cp .env.example .env
Edit .env and add your API key:
OPENAI_API_KEY=sk-your-actual-key-here
OPENAI_MODEL=gpt-4-turbo-preview
OPENAI_TEMPERATURE=0.1
Or set it in your shell:
export OPENAI_API_KEY='sk-your-actual-key-here'
Translate a policy file:
cnl-translator translate examples/example_policies.txt
Translate policy text directly:
cnl-translator translate --text "Services must have a named owner and must not store data beyond the retention period."
Write the output to a file:
cnl-translator translate policy.txt -o output.yaml
Validate a YAML policy file:
cnl-translator validate output.yaml
Show the built-in examples:
cnl-translator examples
Show the current configuration:
cnl-translator config
Example input:
If a service processes personal data for more than 1000 data subjects,
then a Data Protection Impact Assessment is required and must be approved
by the Data Protection Officer before the service goes live.
Example output:
policy_id: dpia_threshold_requirement
version: 1.0.0
status: draft
last_updated: 2026-02-16
metadata:
  title: DPIA Threshold Requirement
  description: Determines when a DPIA is mandatory based on data subject count
entities:
  service:
    attributes:
      - name: processes_personal_data
        type: boolean
      - name: data_subjects_count
        type: integer
      - name: go_live_date
        type: date
rules:
  - rule_id: dpia_required_threshold
    when:
      all_of:
        - service.processes_personal_data == true
        - service.data_subjects_count > 1000
    then:
      obligations:
        - action: complete_assessment
          assessment_type: Data Protection Impact Assessment
          approved_by: Data Protection Officer
          deadline: before service.go_live_date
The translator outputs YAML conforming to this structure:
- policy_id: Unique identifier (snake_case)
- version: Semantic version (X.Y.Z)
- status: draft | active | deprecated
- metadata: Human-readable title and description
- entities: Entity definitions with typed attributes
- rules: List of conditional rules with when/then logic
  - when: Conditions (all_of, any_of)
  - then: Consequences (obligations, permissions, prohibitions)
- references: Links to standards, legislation, guidance
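The `when` conditions in the example are plain strings such as `service.data_subjects_count > 1000`. Assuming every condition has the form `entity.attribute OPERATOR literal` (an assumption; the real grammar may be richer, and nested `all_of`/`any_of` blocks are not handled here), a deterministic evaluator for flat blocks could look like the sketch below. `eval_when` and `eval_condition` are hypothetical names; in the described pipeline this job belongs to the compiled Rego/SHACL rules rather than to Python.

```python
import operator
import re

# Comparison operators assumed to appear in `when` conditions.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
# entity.attribute OP literal; two-character operators are tried first.
CONDITION = re.compile(r"(\w+)\.(\w+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*")


def parse_literal(text):
    """Turn the right-hand side of a condition into a Python value."""
    if text in ("true", "false"):
        return text == "true"
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    return text.strip("'\"")  # fall back to a (possibly quoted) string


def eval_condition(expr, facts):
    match = CONDITION.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    entity, attr, op, rhs = match.groups()
    return OPS[op](facts[entity][attr], parse_literal(rhs))


def eval_when(when, facts):
    """all_of needs every condition true; any_of needs at least one."""
    if "all_of" not in when and "any_of" not in when:
        raise ValueError("when block needs all_of or any_of")
    results = []
    if "all_of" in when:
        results.append(all(eval_condition(c, facts) for c in when["all_of"]))
    if "any_of" in when:
        results.append(any(eval_condition(c, facts) for c in when["any_of"]))
    return all(results)  # if both blocks are present, both must hold


DPIA_WHEN = {"all_of": ["service.processes_personal_data == true",
                        "service.data_subjects_count > 1000"]}
print(eval_when(DPIA_WHEN, {"service": {"processes_personal_data": True,
                                        "data_subjects_count": 1500}}))  # True
```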
The translator automatically validates:
- ✅ Required fields (policy_id, version, rules)
- ✅ Type checking (boolean, integer, string, date)
- ✅ Structure conformance
- ✅ Business rules (non-empty rules, valid conditions)
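The checks listed above can be approximated with the standard library. The project's own validation uses Pydantic models (schema.py), so this is only a sketch: `validate_policy` is a hypothetical helper that takes an already-parsed policy (for example the result of PyYAML's `yaml.safe_load`) and returns error messages, with an empty list meaning the policy passed.

```python
import re

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
SEMVER = re.compile(r"\d+\.\d+\.\d+")
STATUSES = {"draft", "active", "deprecated"}
ATTRIBUTE_TYPES = {"boolean", "integer", "string", "date"}


def validate_policy(policy):
    """Return a list of error messages; an empty list means the policy passed."""
    errors = []
    # Required fields
    for key in ("policy_id", "version", "rules"):
        if key not in policy:
            errors.append(f"missing required field: {key}")
    # Format checks on top-level fields
    if "policy_id" in policy and not SNAKE_CASE.fullmatch(str(policy["policy_id"])):
        errors.append("policy_id must be snake_case")
    if "version" in policy and not SEMVER.fullmatch(str(policy["version"])):
        errors.append("version must be X.Y.Z")
    if policy.get("status", "draft") not in STATUSES:
        errors.append(f"unknown status: {policy['status']}")
    # Type checking on entity attributes
    for entity, spec in (policy.get("entities") or {}).items():
        for attr in spec.get("attributes", []):
            if attr.get("type") not in ATTRIBUTE_TYPES:
                errors.append(f"{entity}.{attr.get('name')}: unsupported type {attr.get('type')!r}")
    # Business rules: at least one rule, and every rule has conditions
    rules = policy.get("rules")
    if "rules" in policy and not rules:
        errors.append("rules must not be empty")
    for rule in rules or []:
        when = rule.get("when") or {}
        if not (when.get("all_of") or when.get("any_of")):
            errors.append(f"{rule.get('rule_id', '?')}: when needs all_of or any_of")
    return errors


print(validate_policy({"policy_id": "DPIA-Rule", "version": "1.0", "rules": []}))
```

Collecting every error instead of stopping at the first one gives a reviewer the full list of problems in a single pass.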
The validator also reports warnings for issues that do not fail validation.
Supported policy language patterns:
- Conditional Logic: IF/THEN, AND/OR conditions
- Deontic Modalities: MUST (obligations), MAY (permissions), MUST NOT (prohibitions)
- Quantification: ALL, ANY, MORE THAN, LESS THAN
- Temporal Expressions: BEFORE, AFTER, WITHIN N DAYS/MONTHS/YEARS
- Entity Relationships: Service HAS owner, Team INCLUDES members
- Cross-References: Links to legislation, standards, guidance
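The deontic keywords above can be spotted with a simple pattern match. This is not how the translator works (it delegates interpretation to GPT-4). It only illustrates why MUST NOT has to be matched as a unit before MUST. `find_modalities` is a hypothetical helper.

```python
import re

# Keyword -> modality, following the MUST / MAY / MUST NOT convention above.
# "must\s+not" is listed first so "must not" is never read as a bare "must".
DEONTIC = re.compile(r"\b(must\s+not|must|may)\b", re.IGNORECASE)
MODALITY = {"must not": "prohibition", "must": "obligation", "may": "permission"}


def find_modalities(sentence):
    """Return the modalities in the order their keywords appear."""
    found = []
    for match in DEONTIC.finditer(sentence):
        keyword = " ".join(match.group(1).lower().split())  # normalise "MUST  NOT"
        found.append(MODALITY[keyword])
    return found


print(find_modalities(
    "Services must have a named owner and must not store data "
    "beyond the retention period."
))
# ['obligation', 'prohibition']
```

A phrase such as "is required" in the DPIA example is not caught by these keywords. This illustrates why explicit must/may/must not phrasing helps.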
Translate natural language policy to structured YAML
cnl-translator translate [INPUT_FILE] [OPTIONS]
Options:
--text, -t TEXT Policy text to translate (alternative to file)
--output, -o PATH Output file path (default: stdout)
--model, -m TEXT OpenAI model to use
--validate/--no-validate Validate output structure (default: validate)
--show-tokens/--no-show-tokens Show token usage (default: show)
Validate a structured YAML policy file:
cnl-translator validate YAML_FILE
Show example policy translations:
cnl-translator examples
Show current configuration:
cnl-translator config
Project structure:
cnl-translator/
├── cnl_translator/
│ ├── __init__.py
│ ├── cli.py # CLI interface
│ ├── translator.py # OpenAI API integration
│ ├── validator.py # Schema validation
│ ├── schema.py # Pydantic models
│ └── prompts.py # LLM system prompts
├── examples/
│ ├── example_policies.txt
│ └── example_outputs/
├── tests/
│ └── test_translator.py
├── requirements.txt
├── setup.py
├── .env.example
└── README.md
Run the test suite:
pytest tests/
Estimated cost using GPT-4 Turbo ($0.01 per 1K input tokens, $0.03 per 1K output tokens):
- Input: ~1,000 tokens (system prompt + policy text) ≈ $0.01
- Output: ~800 tokens (structured YAML) ≈ $0.024
- Total: ≈ $0.03–0.04 per translation (typical policy)
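The per-translation estimate above is simple arithmetic on token counts. The sketch below reproduces it. The prices are assumed GPT-4 Turbo list prices and should be checked against current OpenAI pricing; `estimate_cost` is a hypothetical helper.

```python
# Assumed GPT-4 Turbo list prices in USD per 1,000 tokens; check current pricing.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


def estimate_cost(input_tokens, output_tokens,
                  input_price=INPUT_PRICE_PER_1K, output_price=OUTPUT_PRICE_PER_1K):
    """Estimated USD cost of one translation call."""
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


print(f"${estimate_cost(1000, 800):.3f}")  # $0.034
```

Passing the prices as parameters makes it easy to compare against a cheaper or self-hosted model.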
For production use, consider:
- Self-hosted smaller models (Phi-4, Llama 3.2) for ~$0.001 per translation
- Fine-tuning for improved accuracy on domain-specific terminology
Next steps:
- Test with real policies: Translate 20+ policies from your organization
- Evaluate accuracy: Track approval rate, edit rate, rejection rate
- Fine-tune prompts: Iterate based on common error patterns
- Add compilation step: Build YAML → Rego compiler
- Integrate with case management: Connect to your assurance platform
Roadmap:
- Add YAML → Rego compilation
- Support for SHACL output format
- Web UI for side-by-side review
- Test scenario builder
- Git integration for version control
- Fine-tuned model for government policies
- Batch processing mode
- Interactive refinement mode
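The accuracy metrics suggested above (approval, edit and rejection rates) can be computed from a log of review outcomes. This sketch assumes each reviewed translation is recorded as one of three hypothetical labels. `review_rates` is an illustrative helper, and the sample data is made up.

```python
from collections import Counter

# Hypothetical review outcomes: approved as-is, approved after edits, or rejected.
OUTCOMES = ("approved", "edited", "rejected")


def review_rates(outcomes):
    """Share of reviewed translations falling into each outcome."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reviews recorded")
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


# Made-up outcomes for 20 translated policies:
sample = ["approved"] * 12 + ["edited"] * 6 + ["rejected"] * 2
print(review_rates(sample))
# {'approved': 0.6, 'edited': 0.3, 'rejected': 0.1}
```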
If the API key is not found, make sure you've set it:
export OPENAI_API_KEY='sk-your-key-here'
Or create a .env file with your key.
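A minimal sketch of how the settings could be resolved, assuming they are read from environment variables and that the .env file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`). `load_settings` is a hypothetical helper, and its defaults are copied from the .env example above, not from the tool's source.

```python
import os

# Defaults mirror the .env example; the real tool's defaults may differ.
DEFAULTS = {"OPENAI_MODEL": "gpt-4-turbo-preview", "OPENAI_TEMPERATURE": "0.1"}


def load_settings(env=None):
    """Read API settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with an actionable message instead of a later API error.
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    return {
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", DEFAULTS["OPENAI_MODEL"]),
        "temperature": float(env.get("OPENAI_TEMPERATURE", DEFAULTS["OPENAI_TEMPERATURE"])),
    }
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function testable without touching the real environment.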
If you hit rate limits:
- Wait a few seconds and retry
- Use --model gpt-3.5-turbo for lower-tier access
- Consider upgrading your OpenAI plan
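The "wait and retry" advice can be automated with exponential backoff. In this sketch `RateLimited` is a stand-in exception and `with_backoff` is a hypothetical helper. With the official `openai` Python package (v1 and later), the exception to catch would be `openai.RateLimitError`. The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for the rate-limit error raised by the API client."""


def with_backoff(call, retries=5, base_delay=2.0, sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the wait (plus jitter) each time.

    `retries` is the total number of attempts and should be at least 1.
    """
    for attempt in range(retries):
        try:
            return call()
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** attempt  # 2s, 4s, 8s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

The random jitter keeps several clients that hit the limit at the same moment from retrying in lockstep.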
If translations are inaccurate:
- Try adding more context in your policy statement
- Use explicit deontic keywords (must, may, must not)
- Specify entities clearly (service, team, assessment)
- Check the examples for guidance on phrasing
This project is part of UK Government digital services. Contact the author for usage terms.
Author: Dr Kelcey Swain
Project: Assurance-as-Code CNL Translator
Version: 0.1.0 (Prototype)