cabinetoffice/assurance-as-code-translator

CNL Translator

Controlled Natural Language to Structured YAML Translator

Part of the Assurance-as-Code platform for government policy automation.

Overview

The CNL Translator converts natural language policy statements into a structured YAML format designed to be compiled into executable rules (the compiler itself is still on the roadmap). This lets policy professionals author machine-readable policies without coding knowledge.

Architecture

Natural Language Policy
    ↓
OpenAI GPT-4 (Translation Assistant)
    ↓
Structured YAML (Human-Readable)
    ↓
Validation & Human Review
    ↓
Deterministic Compilation to Rules (Rego/SHACL)

Features

  • Natural Language Input: Write policies in plain English
  • Structured YAML Output: Human-readable, machine-parsable format
  • Schema Validation: Automatic validation against defined schema
  • CLI Interface: Simple command-line tool
  • OpenAI Integration: Uses GPT-4 for intelligent translation
  • Rich Output: Beautiful terminal formatting with syntax highlighting
  • Examples: Built-in example policies for learning

Installation

1. Clone or download the project, then change into its directory

cd /path/to/assurance-as-code-translator

2. Install dependencies

pip install -r requirements.txt

Or install in development mode:

pip install -e .

3. Configure your OpenAI API key

Create a .env file:

cp .env.example .env

Edit .env and add your API key:

OPENAI_API_KEY=sk-your-actual-key-here
OPENAI_MODEL=gpt-4-turbo-preview
OPENAI_TEMPERATURE=0.1

Or set it in your shell:

export OPENAI_API_KEY='sk-your-actual-key-here'
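
As a rough illustration of how these three settings could be read at startup, here is a minimal sketch using only the standard library. The `TranslatorConfig` class and `load_config` function are hypothetical names, not the project's actual code, and the defaults are simply the values from the .env example above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatorConfig:
    """Hypothetical settings holder; not the project's actual class."""
    api_key: str
    model: str
    temperature: float


def load_config(env=None):
    """Build a config from environment-style variables.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Same wording as the error listed under Troubleshooting.
        raise ValueError("OpenAI API key must be provided")
    return TranslatorConfig(
        api_key=api_key,
        # Assumed defaults, copied from the .env example above.
        model=env.get("OPENAI_MODEL", "gpt-4-turbo-preview"),
        temperature=float(env.get("OPENAI_TEMPERATURE", "0.1")),
    )
```

A low temperature such as 0.1 keeps the model's output close to deterministic, which suits a translation step whose result is reviewed and then compiled.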

Usage

Translate a policy from file

cnl-translator translate examples/example_policies.txt

Translate direct text

cnl-translator translate --text "Services must have a named owner and must not store data beyond the retention period."

Save output to file

cnl-translator translate policy.txt -o output.yaml

Validate an existing YAML file

cnl-translator validate output.yaml

View example policies

cnl-translator examples

Check configuration

cnl-translator config

Example

Input (Natural Language)

If a service processes personal data for more than 1000 data subjects,
then a Data Protection Impact Assessment is required and must be approved
by the Data Protection Officer before the service goes live.

Output (Structured YAML)

policy_id: dpia_threshold_requirement
version: 1.0.0
status: draft
last_updated: 2026-02-16

metadata:
  title: DPIA Threshold Requirement
  description: Determines when a DPIA is mandatory based on data subject count

entities:
  service:
    attributes:
      - name: processes_personal_data
        type: boolean
      - name: data_subjects_count
        type: integer
      - name: go_live_date
        type: date

rules:
  - rule_id: dpia_required_threshold
    when:
      all_of:
        - service.processes_personal_data == true
        - service.data_subjects_count > 1000
    then:
      obligations:
        - action: complete_assessment
          assessment_type: Data Protection Impact Assessment
          approved_by: Data Protection Officer
          deadline: before service.go_live_date
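
To make the meaning of this rule concrete, the following sketch evaluates it by hand against a sample service. It is an illustration of the rule's logic only, not output from the (not yet built) compiler; the function names and the `deadline_before` key are hypothetical.

```python
from datetime import date

# Sample facts about a service; field names follow the YAML entity above.
service = {
    "processes_personal_data": True,
    "data_subjects_count": 2500,
    "go_live_date": date(2026, 6, 1),
}


def dpia_required(svc):
    """all_of: both conditions from the `when` block must hold."""
    return svc["processes_personal_data"] is True and svc["data_subjects_count"] > 1000


def obligations_for(svc):
    """Return the obligations triggered by rule dpia_required_threshold."""
    if not dpia_required(svc):
        return []
    return [{
        "action": "complete_assessment",
        "assessment_type": "Data Protection Impact Assessment",
        "approved_by": "Data Protection Officer",
        # Hypothetical key: the approval must land before this date.
        "deadline_before": svc["go_live_date"],
    }]
```

Note that the threshold is strict: a service with exactly 1000 data subjects does not trigger the obligation, matching "more than 1000" in the input text.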

Structured Format Schema

The translator outputs YAML conforming to this structure:

  • policy_id: Unique identifier (snake_case)
  • version: Semantic version (X.Y.Z)
  • status: draft | active | deprecated
  • metadata: Human-readable title and description
  • entities: Entity definitions with typed attributes
  • rules: List of conditional rules with when/then logic
    • when: Conditions (all_of, any_of)
    • then: Consequences (obligations, permissions, prohibitions)
  • references: Links to standards, legislation, guidance
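
The structure above could be modelled roughly as follows. The project's schema.py is described as using Pydantic; this sketch uses standard-library dataclasses instead so that it runs without dependencies, and the class names, field types, and the `draft` default are assumptions rather than the project's actual models.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Attribute:
    name: str
    type: str  # boolean | integer | string | date


@dataclass
class Rule:
    rule_id: str
    when: Dict[str, List[str]]   # keys: all_of / any_of
    then: Dict[str, List[dict]]  # keys: obligations / permissions / prohibitions


@dataclass
class Policy:
    policy_id: str
    version: str
    rules: List[Rule]
    status: str = "draft"  # assumed default; draft | active | deprecated
    metadata: Dict[str, str] = field(default_factory=dict)
    entities: Dict[str, List[Attribute]] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)


# Usage: the DPIA example expressed with these hypothetical classes.
policy = Policy(
    policy_id="dpia_threshold_requirement",
    version="1.0.0",
    rules=[Rule(
        rule_id="dpia_required_threshold",
        when={"all_of": ["service.data_subjects_count > 1000"]},
        then={"obligations": [{"action": "complete_assessment"}]},
    )],
    entities={"service": [Attribute("data_subjects_count", "integer")]},
)
```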

Validation

The translator automatically validates:

  ✅ Required fields (policy_id, version, rules)
  ✅ Type checking (boolean, integer, string, date)
  ✅ Structure conformance
  ✅ Business rules (non-empty rules, valid conditions)

Warnings are provided for:

  ⚠️ Non-standard naming conventions
  ⚠️ Missing metadata
  ⚠️ Incorrect version format
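
The split between hard errors and softer warnings might look like the sketch below, which works on a policy already parsed into a dict. `check_policy` and its messages are hypothetical and cover only a few of the checks listed; the real validator.py may behave differently.

```python
import re

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
REQUIRED = ("policy_id", "version", "rules")


def check_policy(policy):
    """Return (errors, warnings) for a policy already parsed into a dict."""
    errors, warnings = [], []

    # Errors: required fields and non-empty rules.
    for key in REQUIRED:
        if key not in policy:
            errors.append(f"missing required field: {key}")
    if "rules" in policy and not policy["rules"]:
        errors.append("rules must not be empty")

    # Warnings: naming, version format, metadata.
    policy_id = policy.get("policy_id")
    if isinstance(policy_id, str) and not SNAKE_CASE.match(policy_id):
        warnings.append("policy_id is not snake_case")
    version = policy.get("version")
    if version is not None and not SEMVER.match(str(version)):
        warnings.append("version is not in X.Y.Z format")
    if not policy.get("metadata"):
        warnings.append("metadata is missing")

    return errors, warnings
```

Returning both lists, rather than raising on the first problem, lets a reviewer see every issue in one pass.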

Supported Policy Patterns

  1. Conditional Logic: IF/THEN, AND/OR conditions
  2. Deontic Modalities: MUST (obligations), MAY (permissions), MUST NOT (prohibitions)
  3. Quantification: ALL, ANY, MORE THAN, LESS THAN
  4. Temporal Expressions: BEFORE, AFTER, WITHIN N DAYS/MONTHS/YEARS
  5. Entity Relationships: Service HAS owner, Team INCLUDES members
  6. Cross-References: Links to legislation, standards, guidance
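
The deontic mapping in item 2 can be illustrated with a small keyword matcher. This is not how the translator works (it delegates the translation to the language model); it is only a sketch of which `then` section each keyword corresponds to, and `classify_modality` is a hypothetical helper.

```python
import re

# Order matters: "must not" has to be tested before the bare "must".
MODALITY_PATTERNS = [
    (re.compile(r"\bmust\s+not\b", re.IGNORECASE), "prohibitions"),
    (re.compile(r"\bmust\b", re.IGNORECASE), "obligations"),
    (re.compile(r"\bmay\b", re.IGNORECASE), "permissions"),
]


def classify_modality(clause):
    """Return the `then` section a clause belongs to, or None if no keyword matches."""
    for pattern, section in MODALITY_PATTERNS:
        if pattern.search(clause):
            return section
    return None
```

Writing policies with these explicit keywords gives the model less room to misread a prohibition as an obligation.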

Command Reference

translate

Translate natural language policy to structured YAML

cnl-translator translate [INPUT_FILE] [OPTIONS]

Options:
  --text, -t TEXT                 Policy text to translate (alternative to file)
  --output, -o PATH               Output file path (default: stdout)
  --model, -m TEXT                OpenAI model to use
  --validate/--no-validate        Validate output structure (default: validate)
  --show-tokens/--no-show-tokens  Show token usage (default: show)

validate

Validate a structured YAML policy file

cnl-translator validate YAML_FILE

examples

Show example policy translations

cnl-translator examples

config

Show current configuration

cnl-translator config

Development

Project Structure

cnl-translator/
├── cnl_translator/
│   ├── __init__.py
│   ├── cli.py              # CLI interface
│   ├── translator.py       # OpenAI API integration
│   ├── validator.py        # Schema validation
│   ├── schema.py           # Pydantic models
│   └── prompts.py          # LLM system prompts
├── examples/
│   ├── example_policies.txt
│   └── example_outputs/
├── tests/
│   └── test_translator.py
├── requirements.txt
├── setup.py
├── .env.example
└── README.md

Running Tests

pytest tests/

Cost Estimation

Using GPT-4 Turbo:

  • ~$0.04 per translation (typical policy)
  • Input: ~1000 tokens (system prompt + policy text)
  • Output: ~800 tokens (structured YAML)
  • Cost: ~$0.01 (input) + ~$0.03 (output) ≈ $0.04 per translation
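
The arithmetic behind this estimate can be reproduced as follows. The per-token prices are assumptions (USD 0.01 per 1,000 input tokens and USD 0.03 per 1,000 output tokens); check current pricing before relying on them.

```python
# Assumed prices in USD per 1,000 tokens; verify against current pricing.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD of one translation call."""
    input_cost = input_tokens / 1000 * INPUT_PRICE_PER_1K
    output_cost = output_tokens / 1000 * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Typical policy from the figures above: ~1000 tokens in, ~800 tokens out.
typical = estimate_cost(1000, 800)
```

With these assumed prices the typical case comes to about $0.034, which rounds to the ~$0.04 total quoted above.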

For production use, consider:

  • Self-hosted smaller models (Phi-4, Llama 3.2) for ~$0.001 per translation
  • Fine-tuning for improved accuracy on domain-specific terminology

Next Steps

  1. Test with real policies: Translate 20+ policies from your organization
  2. Evaluate accuracy: Track approval rate, edit rate, rejection rate
  3. Fine-tune prompts: Iterate based on common error patterns
  4. Add compilation step: Build YAML → Rego compiler
  5. Integrate with case management: Connect to your assurance platform
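
For step 2, the three rates can be computed from a simple log of review outcomes. The outcome labels and the `review_rates` function are hypothetical; this is a sketch of the bookkeeping only.

```python
from collections import Counter

OUTCOMES = ("approved", "edited", "rejected")


def review_rates(outcomes):
    """Return the share of translations per review outcome (empty dict if no data)."""
    total = len(outcomes)
    if total == 0:
        return {}
    counts = Counter(outcomes)
    return {label: counts[label] / total for label in OUTCOMES}


# Usage: 20 reviewed translations.
rates = review_rates(["approved"] * 14 + ["edited"] * 4 + ["rejected"] * 2)
```

A high edit rate with a low rejection rate would suggest the prompts are close and worth iterating on, which feeds directly into step 3.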

Roadmap

  • Add YAML → Rego compilation
  • Support for SHACL output format
  • Web UI for side-by-side review
  • Test scenario builder
  • Git integration for version control
  • Fine-tuned model for government policies
  • Batch processing mode
  • Interactive refinement mode

Troubleshooting

"OpenAI API key must be provided"

Make sure you've set your API key:

export OPENAI_API_KEY='sk-your-key-here'

Or create a .env file with your key.

Rate limiting errors

If you hit rate limits:

  • Wait a few seconds and retry
  • Use --model gpt-3.5-turbo for lower-tier access
  • Consider upgrading your OpenAI plan
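
"Wait a few seconds and retry" can be automated with exponential backoff. The sketch below is generic: `RateLimited` is a stand-in for whatever rate-limit exception your API client raises, and `call_with_backoff` is a hypothetical helper, not part of the CLI.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for the rate-limit error raised by the API client."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait 1s, 2s, 4s, ... plus up to 0.5s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

The jitter spreads retries out so that several clients hitting the limit at once do not all retry at the same moment.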

Translation quality issues

If translations are inaccurate:

  • Try adding more context in your policy statement
  • Use explicit deontic keywords (must, may, must not)
  • Specify entities clearly (service, team, assessment)
  • Check the examples for guidance on phrasing

License

This project is part of UK Government digital services. Contact the author for usage terms.

Contact

Author: Dr Kelcey Swain
Project: Assurance-as-Code CNL Translator
Version: 0.1.0 (Prototype)
