DRL Generation Tests

A comprehensive testing framework for AI-generated Drools rules. This project validates that local AI models (primarily Ollama) can generate valid DRL (Drools Rule Language) from natural language requirements.

Overview

This framework:

Tests AI models' ability to generate valid DRL rules from business requirements
Compares different models (Qwen, Granite, DeepSeek, etc.) on identical scenarios
Supports multiple generation approaches: Direct DRL, YAML-to-DRL, and Plain English
Validates generated rules through compilation and execution
Provides a web UI for interactive rule generation

Project Structure

drools-drl-generation-tests/
├── drlgen-core/           # Reusable DRL generation library
├── drlgen-tests/          # Test framework with scenarios and comparison tools
├── drlgen-ui/             # Quarkus web UI for interactive generation
├── models.yaml            # Model configurations (temperatures, context sizes, etc.)
└── pom.xml                # Parent Maven POM

Modules

Module	Description
`drlgen-core`	Reusable library: DRL generation agent, validation, execution
`drlgen-tests`	Test scenarios, comparison runners, integration tests
`drlgen-ui`	Quarkus REST API and web UI for interactive DRL generation

Prerequisites

Java 17+
Maven 3.8+
Ollama running locally with desired models
Drools (uses snapshot version 999-SNAPSHOT)

Quick Start

1. Run unit tests (no Ollama required)

mvn test -pl drlgen-tests

2. Run integration tests (requires Ollama)

# Start Ollama with a model
ollama run qwen3-coder-next

# Run integration tests
mvn test -pl drlgen-tests -Pintegration

# Run with a specific model
mvn test -pl drlgen-tests -Pintegration -Dtest.ollama.model=granite4:small-h

3. Start the Web UI

cd drlgen-ui
mvn quarkus:dev

Access the UI at http://localhost:8080

Configuration

models.yaml

Configure AI models in the project root models.yaml:

defaultModel: qwen3-coder-next

ollama:
  baseUrl: http://localhost:11434
  timeoutMinutes: 5

models:
  - name: qwen3-coder-next
    temperature: 0.1
    topP: 0.9
    numPredict: 2048
    numCtx: 16384
    repeatPenalty: 1.1

  # Multiple configs for same model with different parameters
  - id: granite4-temp0
    name: granite4:small-h
    temperature: 0.0

  # Specify agent type per model
  - id: qwen2.5-coder-simple
    name: qwen2.5-coder:14b-instruct-q4_K_M
    agentType: SIMPLE

Agent Types

Models can use different agent types, configured via agentType in models.yaml:

Agent Type	Description	Parameters	Best For
`GUIDED` (default)	Concise system prompt with external DRL reference guide	drlGuide, requirement, factTypes, exampleInput	Most models
`SIMPLE`	Self-contained detailed system prompt, no external guide	requirement, factTypes, testScenario	Models that perform better with all instructions inline

When to use SIMPLE:

Model struggles with long context (guide + requirement)
Model performs better with self-contained prompts
Testing showed qwen2.5-coder achieves 93%+ with SIMPLE vs 69% with GUIDED

Example configuration:

models:
  - id: qwen2.5-coder-simple
    name: qwen2.5-coder:14b-instruct-q4_K_M
    agentType: SIMPLE    # Use self-contained prompt

  - name: qwen3-coder-next
    # agentType defaults to GUIDED

Environment Variables

Variable	Description	Default
`TEST_OLLAMA_MODEL`	Ollama model name/id	`qwen3-coder-next`
`TEST_OLLAMA_BASE_URL`	Ollama server URL	`http://localhost:11434`

System Properties

-Dtest.ollama.model=granite4:small-h
-Dtest.ollama.baseUrl=http://localhost:11434

Test Scenarios

Test scenarios are defined in YAML files under drlgen-tests/src/main/resources/scenarios/:

name: Order Discount - Basic
description: Apply 10% discount for orders over $100

requirement: |
  Create a rule that:
  1. Checks if Order total is greater than 100.0
  2. If yes, sets the 'discountPercent' field to 10
  3. Sets the 'discountApplied' field to true

factTypes:
  - name: Order
    fields:
      orderId: String
      total: double
      discountPercent: int
      discountApplied: boolean

testCases:
  - name: Qualifying Order
    input:
      - _type: Order
        orderId: ORD001
        total: 150.0
        discountPercent: 0
        discountApplied: false
    expectedRulesFired: 1
    expectedFields:
      discountApplied: true
      discountPercent: 10

Available Scenarios

Category	Scenarios
Validation	adult-validation, age-classification, age-validation-simple, email-validation, password-strength
Discounts	basic-discount, bulk-order-discount, order-discount-customer-type, tiered-discount
Financial	credit-card-approval, insurance-premium, insurance-risk-assessment, loan-approval-complex, loan-eligibility, tax-bracket, tax-calculation-chain
Orders	cart-validation, inventory-alert, order-fraud-detection, shipping-cost, shipping-method-priority
Misc	grade-calculation, loyalty-points, missing-document-alert, priority-assignment, senior-citizen, simple-calculation, status-transition, subscription-renewal

Generation Approaches

1. Direct DRL Generation

AI generates DRL syntax directly from requirements.

DRLGenerationService service = new DRLGenerationService();
ChatModel model = ModelConfiguration.createModel("qwen3-coder-next");
GenerationResult result = service.generateAndValidate(model, requirement, factTypes);

2. YAML-to-DRL Generation

AI generates rules in YAML format, then converts to DRL.

YAMLRuleGenerationService service = new YAMLRuleGenerationService();
YAMLGenerationResult result = service.generateAndValidate(model, requirement, factTypes);

3. Plain English Generation

AI describes rules in structured English, then converts to DRL.

PlainEnglishGenerationService service = new PlainEnglishGenerationService();
PlainEnglishGenerationResult result = service.generateAndValidate(model, requirement, factTypes);

Supported Models

Model	ID	Parameters	Best For
Qwen 3 Coder Next	`qwen3-coder-next`	79.7B	DRL generation (100% success)
Granite 4 Small-H	`granite4:small-h`	32.2B	DRL generation (100% success)
Qwen 2.5 Coder 14B	`qwen2.5-coder:14b-instruct-q4_K_M`	14.8B	DRL generation (100% success)
Qwen 3	`qwen3`	8.2B	DRL generation (88% success)
DeepSeek Coder V2	`deepseek-coder-v2:16b`	16B	Code generation
Phi 4	`phi4`	14B	General purpose
CodeLlama 13B	`codellama:13b`	13B	Code generation
Granite 4	`granite4`	3.4B	Plain English approach

Web UI API

Endpoints

Method	Path	Description
GET	`/api/generate/models`	List available models
POST	`/api/generate/drl`	Generate DRL from requirements

Example Request

curl -X POST http://localhost:8080/api/generate/drl \
  -H "Content-Type: application/json" \
  -d '{
    "modelName": "qwen3-coder-next",
    "requirement": "If person age >= 18, set adult to true",
    "factTypes": [
      {
        "name": "Person",
        "fields": {
          "name": "String",
          "age": "int",
          "adult": "boolean"
        }
      }
    ]
  }'

Running Tests

Unit Tests (Fast, No LLM Required)

# Run unit tests only (~13 seconds)
mvn test -pl drlgen-tests

Integration Tests (Requires Ollama)

# Run LLM integration tests
mvn test -pl drlgen-tests -Pintegration

# Run specific integration test class
mvn test -pl drlgen-tests -Pintegration -Dtest=DRLGenerationTest

# With specific model
mvn test -pl drlgen-tests -Pintegration -Dtest.ollama.model=granite4:small-h

Model Comparison

# Compare all configured models
mvn test -pl drlgen-tests -Pmodel-comparison -Dtest=ModelComparisonTest

# Compare DRL vs YAML formats
mvn test -pl drlgen-tests -Pall-tests -Dtest=FormatComparisonTest

Test Profiles

Profile	Description	Speed
(default)	Unit tests only (no LLM required)	~13 seconds
`integration`	LLM integration tests (requires Ollama)	Minutes
`model-comparison`	Multi-model comparison tests	Very slow
`all-tests`	All tests including slow/integration	Slow

# Fast unit tests (default)
mvn test -pl drlgen-tests

# Integration tests (requires Ollama running)
mvn test -pl drlgen-tests -Pintegration

# All tests
mvn test -pl drlgen-tests -Pall-tests

Running the Comparison Runner

The ComparisonRunner is a standalone tool that runs all test scenarios against multiple models and generates a CSV report. This is the main entry point for batch testing.

Running as Standalone JAR (Recommended for Distribution)

Build and run as a self-contained executable JAR with all dependencies included:

# Build the executable JAR
mvn package -pl drlgen-tests -DskipTests

# Show help
java -jar drlgen-tests/target/drlgen-tests-1.0.0-SNAPSHOT.jar --help

# Run all scenarios with all models
java -jar drlgen-tests/target/drlgen-tests-1.0.0-SNAPSHOT.jar

# Run specific models
java -jar drlgen-tests/target/drlgen-tests-1.0.0-SNAPSHOT.jar --models qwen3-coder-next,granite4:small-h

# Filter scenarios and output to custom file
java -jar drlgen-tests/target/drlgen-tests-1.0.0-SNAPSHOT.jar --scenarios discount --output discount-results.csv

# Use custom scenarios directory with domain instructions
java -jar drlgen-tests/target/drlgen-tests-1.0.0-SNAPSHOT.jar \
    --scenarios-dir ./my-scenarios \
    --instructions ./domain-guide.md \
    --output-dir ./my-results

Prerequisites for JAR usage:

Java 17+ installed
Ollama running locally (ollama serve)
Required models pulled (ollama pull qwen3-coder-next)

Using the Shell Script

cd drlgen-tests

# Run all models on all scenarios (DRL + YAML formats)
./run-comparison.sh

# Test specific models
./run-comparison.sh --models qwen3-coder-next,granite4:small-h

# Filter scenarios by name
./run-comparison.sh --scenarios adult,discount

# Output to custom CSV file
./run-comparison.sh --output my-results.csv

# Test only specific formats
./run-comparison.sh --drl-only
./run-comparison.sh --yaml-only
./run-comparison.sh --english-only

# Test multiple formats
./run-comparison.sh --formats DRL,YAML,ENGLISH

Using Maven Directly

# Run with default settings (all models, all scenarios)
mvn -pl drlgen-tests exec:java \
    -Dexec.mainClass="com.github.rikkola.drlgen.generation.runner.ComparisonRunner"

# With arguments
mvn -pl drlgen-tests exec:java \
    -Dexec.mainClass="com.github.rikkola.drlgen.generation.runner.ComparisonRunner" \
    -Dexec.args="--models qwen3-coder-next --formats DRL,YAML --output results.csv"

Command Line Options

Option	Description	Default
`--models <list>`	Comma-separated model names/ids	All from models.yaml
`--scenarios <list>`	Filter scenarios by name substring	All scenarios
`--output <file>`	CSV output filename	`comparison-results.csv`
`--formats <list>`	Formats to test: DRL, YAML, ENGLISH	DRL, YAML
`--max-turns <n>`	Retry count (2+ enables retry with guard instruction on loop/multi-fire failures)	1
`--drl-only`	Only test DRL generation	-
`--yaml-only`	Only test YAML generation	-
`--english-only`	Only test Plain English pipeline	-

Output

The runner produces:

Console output - Real-time progress with pass/fail status
CSV report - Detailed results with model, scenario, format, success status, rules fired, and timing

Example console output:

Starting Drools Rule Generation Comparison
Models: 3
Scenarios: 29
Formats: DRL, YAML

=== Testing model: qwen3-coder-next ===
[1/174] qwen3-coder-next - Adult Validation (DRL)... PASS (1 rules)
[2/174] qwen3-coder-next - Adult Validation (YAML)... PASS (1 rules)
...

Adding New Scenarios

Create a YAML file in drlgen-tests/src/main/resources/scenarios/:

name: My New Scenario
description: Brief description

requirement: |
  Natural language description of the rule logic...

factTypes:
  - name: MyFact
    fields:
      field1: String
      field2: int

testCases:
  - name: Test Case 1
    input:
      - _type: MyFact
        field1: "value"
        field2: 42
    expectedRulesFired: 1
    expectedFields:
      field1: "expected"

The scenario will be automatically picked up by tests using YAMLScenarioLoader.

Dependencies

Drools - Rule engine (compilation and execution)
LangChain4j - AI model integration
Quarkus - Web UI framework
SnakeYAML - YAML parsing
JUnit 5 - Testing framework
AssertJ - Fluent assertions

License

This project is for testing and evaluation purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
drlgen-core		drlgen-core
drlgen-tests		drlgen-tests
drlgen-ui		drlgen-ui
examples		examples
test-runs		test-runs
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LIBRARY_MANUAL.md		LIBRARY_MANUAL.md
LICENSE		LICENSE
NOTES.md		NOTES.md
QWEN25_FAILURE_REPORT.md		QWEN25_FAILURE_REPORT.md
README.md		README.md
models.yaml		models.yaml
pom.xml		pom.xml

Folders and files

Latest commit

History

Repository files navigation

DRL Generation Tests

Overview

Project Structure

Modules

Prerequisites

Quick Start

1. Run unit tests (no Ollama required)

2. Run integration tests (requires Ollama)

3. Start the Web UI

Configuration

models.yaml

Agent Types

Environment Variables

System Properties

Test Scenarios

Available Scenarios

Generation Approaches

1. Direct DRL Generation

2. YAML-to-DRL Generation

3. Plain English Generation

Supported Models

Web UI API

Endpoints

Example Request

Running Tests

Unit Tests (Fast, No LLM Required)

Integration Tests (Requires Ollama)

Model Comparison

Test Profiles

Running the Comparison Runner

Running as Standalone JAR (Recommended for Distribution)

Using the Shell Script

Using Maven Directly

Command Line Options

Output

Adding New Scenarios

Dependencies

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages