AATest is a comprehensive testing framework designed specifically for AALang products. It provides a structured approach to testing AALang agents, actors, and workflows through message-based test execution.
AATest is an AALang-based testing tool that follows the 4-mode-13-actor pattern, similar to GAB. It evaluates test needs, generates test files, executes tests, and reports results for AALang products.
- Message-Based Testing: All tests are message-based - tests send AALang messages to actors and observe resulting messages, state changes, and behaviors
- Three Test Types: Supports comprehensive testing at different levels of granularity
- Automatic Test Generation: Analyzes products and generates appropriate test files
- LLM-Native Execution: Tests execute within the LLM context, leveraging AALang's execution model
- Comprehensive Reporting: Detailed test results with pass/fail status, execution logs, and summary statistics
AATest supports three distinct test types, each designed for different levels of testing:
Purpose: Tests how individual actors respond to messages - tests individual actor responsibilities in isolation
Use Cases:
- Verify actor behavior when receiving specific messages
- Test actor responsibilities in isolation
- Validate actor message handling logic
- Check actor state management for individual messages
File Pattern: {product-name}-message-response-tests.jsonld
Purpose: Tests message flow between actors, mode transitions, and state management - tests actor interactions
Use Cases:
- Verify communication between actors
- Test mode transitions triggered by messages
- Validate state management across actor interactions
- Check message routing and delivery
File Pattern: {product-name}-message-flow-tests.jsonld
Purpose: Tests complete agent workflows from user perspective - tests end-to-end workflows and full agent execution
Use Cases:
- Test complete user workflows
- Verify end-to-end agent behavior
- Validate full agent execution paths
- Test complex multi-actor, multi-mode scenarios
File Pattern: {product-name}-agent-workflow-tests.jsonld
All AATest tests follow a consistent structure based on the AALang test specification:
- name: Test name identifier
- description: Test description explaining what messages are sent, what actor/mode/workflow is being tested, and what results are expected
- type: Test type classification (MessageResponseTest, MessageFlowTest, or AgentWorkflowTest)
- priority: Optional test priority for execution order
- messages: Array of AALang messages to send during test execution
- messageSequence: Optional ordered sequence of messages if order matters
- testContext: Optional additional context (e.g., initial state, preconditions)
- observedMessages: Expected messages observed in response
- observedStateChanges: Expected state changes observed
- observedBehavior: Expected behavioral observations (actor actions, mode transitions, message acceptance/rejection)
AATest supports a comprehensive set of assertion types, all designed to be LLM-friendly:
- contains: Checks if output contains specific text/pattern
- isLike: Semantic similarity check using LLM understanding
- boundedDeviation: Allows variance within user-defined bounds for non-deterministic checks
- matchesPattern: Pattern/regex matching
- hasStructure: Verifies structural elements (JSON-LD nodes, graph structure)
- followsSequence: Verifies actions/events occur in expected order
- satisfiesConstraint: Logical constraint checks
- withinRange: Numeric range checks
- hasProperty: Presence of specific properties/attributes
- excludes: Ensures something is not present
- semanticEquivalence: Semantic equivalence check
- completeness: All required elements present
- consistency: Behavior consistency across runs
- modeTransition: AALang-specific: mode transitions occur correctly
- actorBehavior: AALang-specific: actor behaves according to responsibilities
- messageFormat: AALang-specific: messages follow AALang message format
- stateConsistency: State consistency before/after operations
AATest supports test fixtures and mocks:
- Mock Actors: Mock actors defined in test files must have
_aamockin their id property - MockManagerActor_aamock: Manages mocks during test execution
- Auto-Generation: Basic mocks are auto-generated, with manual override available in test files
AATest supports parameterized tests for testing multiple scenarios with different parameter values.
Tests can be organized into suites/groups for better organization and selective execution.
AATest operates in four modes:
- Analyzes target product structure
- Identifies test gaps and requirements
- Determines which test types are needed
- Prioritizes test generation
- Generates test files based on product analysis
- Creates MessageResponseTest, MessageFlowTest, and AgentWorkflowTest files as needed
- Validates test specifications
- Organizes tests into appropriate files
- Executes tests by loading products as agents
- Sends test messages and observes results
- Manages mock actors during execution
- Tracks test execution state
- Aggregates test results
- Generates comprehensive test reports
- Provides detailed execution logs
- Reports summary statistics
Test files are organized in a tests/ subdirectory by default (user-configurable):
{product-directory}/
├── {product-name}.jsonld
└── tests/
├── {product-name}-message-response-tests.jsonld
├── {product-name}-message-flow-tests.jsonld
├── {product-name}-agent-workflow-tests.jsonld
└── {product-name}-test-results.md
- Sequential: Tests run one at a time in order (by priority, then alphabetically by test name)
- Filter tests by test type
- Execute single tests for debugging
Verbose mode provides detailed execution logs for debugging. To enable verbose mode, simply request it when running tests. For example:
- "Run tests in verbose mode"
- "Execute tests with verbose output"
- "Run [test name] with verbose logging"
When verbose mode is enabled, AATest includes detailed execution logs for each test step:
- For MessageResponseTests: Step-by-step logs with input, output, LLM reasoning, and mock interactions
- For MessageFlowTests: Interaction logs, transition logs, state operation logs, and mock interactions
- For AgentWorkflowTests: Workflow execution logs, user interaction logs, agent response logs, and mode transition logs
For test suites with large numbers of tests (over ~50 tests), it is recommended to execute each group of tests one at a time rather than running all tests together. This best practice applies to:
- MessageResponseTest groups: Execute all message-response tests together, then move to the next group
- MessageFlowTest groups: Execute all message-flow tests together, then move to the next group
- AgentWorkflowTest groups: Execute all agent-workflow tests together, then move to the next group
Executing test groups separately helps manage execution complexity, improves reliability, and makes it easier to identify and debug issues within specific test categories.
You can then ask for a testing summary if you want one that combines the results of the three testing reports.
- Location:
tests/{product-name}-test-results.md - Includes: Pass/fail status, detailed execution logs, summary statistics
- Output: Both console and file output
- An LLM that can execute JSON-LD based prompts (e.g., Claude, GPT-4)
- Access to the AATest specification files (
AATest.jsonldandAATest_spec.jsonld) - An AALang product to test
- Load AATest: Add the
AATest.jsonldandAATest_spec.jsonldfiles into your LLM environment - Provide Product Path: AATest will request the path to your AALang product file
- Follow the Workflow: AATest will guide you through:
- Test Need Evaluation → Test Generation → Test Execution → Test Result Reporting
- Review Results: Check the generated test results file for detailed execution logs
You: [Load AATest.jsonld]
AATest: "Welcome to AATest! Please provide the path to your AALang product file."
You: "my-product.jsonld"
AATest: [Test Need Evaluation Mode]
"Analyzing my-product.jsonld...
Found 3 actors, 2 modes, 5 personas.
Recommended tests:
- 5 MessageResponseTests (one per persona)
- 3 MessageFlowTests (actor interactions)
- 2 AgentWorkflowTests (end-to-end workflows)"
AATest: [Test Generation Mode]
"Generating test files...
Created: tests/my-product-message-response-tests.jsonld
Created: tests/my-product-message-flow-tests.jsonld
Created: tests/my-product-agent-workflow-tests.jsonld"
AATest: [Test Execution Mode]
"Executing tests...
MessageResponseTest: test_actor_1_basic_response... PASSED
MessageFlowTest: test_actor_communication... PASSED
..."
You: "Run tests in verbose mode"
AATest: [Test Execution Mode]
"Executing tests with verbose output...
MessageResponseTest: test_actor_1_basic_response
Step: Loading test file
Step: Adopting actor definition
Input: {routingGraph: {...}, payload: {...}}
Output: {response: {...}}
Reasoning: [LLM reasoning used]
Mock Interactions: [...]
... PASSED"
AATest: [Test Result Reporting Mode]
"Test execution complete!
Results saved to: tests/my-product-test-results.md"
All AATest tests are message-based. This means:
- Inputs are messages: Tests send AALang messages to actors
- Outputs are observations: Tests observe resulting messages, state changes, and behaviors
- No direct code execution: Tests work within AALang's message-passing architecture
In AALang, actors adopt definitions - they do not simulate them. During test execution:
- The actor becomes the entity under test through definition adoption
- There is no simulation layer
- The actor IS the entity under test during execution
AATest supports bounded non-determinism through:
- boundedDeviation assertions
- consistency assertions
- User-defined deviation bounds (e.g., semantic similarity thresholds)
- Start with MessageResponseTests: Test individual actor behavior before testing interactions
- Use MessageFlowTests for Integration: Verify actor communication and mode transitions
- Use AgentWorkflowTests for End-to-End: Test complete user workflows
- Leverage Semantic Assertions: Use
isLikeandsemanticEquivalencefor LLM-friendly testing - Organize Tests into Suites: Group related tests for better organization
- Use Parameterized Tests: Test multiple scenarios efficiently
- Review Test Results: Check detailed logs to understand test failures
AATest executes within LLM contexts, which means semantic evaluation can naturally vary between runs. To achieve more consistent and reproducible test results, follow these guidelines:
Note: AATest provides the framework and assertion types, but achieving consistent results requires you to configure your tests appropriately. The practices below indicate what AATest does automatically versus what requires your input.
AATest provides: All assertion types (contains, excludes, isLike, hasProperty, followsSequence, modeTransition, matchesPattern, etc.)
Requires user input: You must choose which assertion types to use in your test definitions. Prefer objective, measurable assertions over subjective semantic checks:
- Replace
isLikewithcontainsorexcludes: When checking for specific content, usecontainsto verify exact text or patterns are present, orexcludesto ensure unwanted content is absent - Replace
actorBehaviorwith explicit checks: Instead of relying on semantic evaluation of actor behavior, use specific assertions likehasProperty,followsSequence, ormodeTransitionto verify concrete behaviors - Use
matchesPatternfor structured validation: When you need to verify message formats or structured data,matchesPatternprovides more deterministic results than semantic checks
Example of user input:
{
"observedMessages": [{
"assertion": "contains",
"value": "status: success"
}]
}Instead of:
{
"observedMessages": [{
"assertion": "isLike",
"value": "actor responded successfully"
}]
}AATest provides: Test execution framework that evaluates assertions and reports results
Requires user input: You must write clear test descriptions and define precise expected outputs. Make your test expectations clear and measurable:
- Document explicit criteria in test descriptions: Clearly state what constitutes a pass or fail in the test description
- Specify expected outputs precisely: Define exactly what messages, state changes, or behaviors should occur
- Include verification steps: Add explicit verification steps for state isolation, access control, and other critical behaviors
Example of user input:
{
"name": "test_user_authentication",
"description": "Test verifies that actor accepts valid credentials and rejects invalid ones. Pass criteria: (1) Valid credentials return message with 'authenticated: true', (2) Invalid credentials return message with 'authenticated: false', (3) No state changes occur for invalid credentials",
"observedMessages": [{
"assertion": "contains",
"value": "authenticated: true"
}]
}AATest provides: Support for boundedDeviation assertion type
Requires user input: You must configure explicit thresholds and bounds when using semantic checks. When semantic evaluation is necessary, use bounded deviation to control variance:
- Use
boundedDeviationwith fixed thresholds: Set explicit similarity thresholds or deviation bounds for semantic checks - Document acceptable variance: Clearly define what level of variation is acceptable in test descriptions
- Combine with objective checks: Use semantic checks alongside objective assertions to balance flexibility with determinism
Example of user input:
{
"observedMessages": [{
"assertion": "boundedDeviation",
"value": "user greeting message",
"threshold": 0.85,
"baseAssertion": "isLike"
}]
}Or with explicit bounds:
{
"observedMessages": [{
"assertion": "boundedDeviation",
"value": "response time",
"min": 100,
"max": 500,
"unit": "milliseconds"
}]
}AATest provides: testContext field support and test execution framework
Requires user input: You must specify initial state, preconditions, and execution requirements. Help ensure consistent execution across runs:
- Specify test context requirements: Document any initial state, preconditions, or setup needed in
testContext - Clarify execution capabilities: Note any specific LLM capabilities or configurations required for consistent results
- Include state isolation verification: Add explicit checks to verify state isolation and access control when relevant
Example of user input:
{
"name": "test_state_isolation",
"testContext": {
"initialState": {
"userCount": 0,
"activeSessions": []
},
"preconditions": [
"No existing user data",
"Actor in 'ready' mode"
],
"executionRequirements": {
"requiresStateIsolation": true,
"requiresAccessControl": true
}
},
"observedStateChanges": [{
"assertion": "hasProperty",
"property": "userCount",
"value": 1
}]
}LLM-based semantic evaluation is inherently variable because language models interpret meaning contextually. To get deterministic results, move from subjective semantic checks to objective, measurable criteria. This doesn't mean avoiding semantic assertions entirely—rather, use them strategically with appropriate bounds and combine them with objective checks for maximum reliability.
AATest works seamlessly with products created by GAB:
- GAB generates AALang products
- AATest tests those products
- Together, they provide a complete development and testing workflow
For complete technical specifications, see:
AATest_spec.jsonld: Complete test specification and structure definitionsAATest.jsonld: AATest agent implementation and execution instructions
AATest is part of the AALang/GAB project. See the main LICENSE file for license information.
Ready to test your AALang products? Load AATest.jsonld and provide your product file path!