
Evaluation

AAE includes a reproducible adversarial evaluation suite. The goal is to measure robustness, statistical restraint, semantic-model validity, recovery behavior, and latency rather than rely on static screenshots.

Commands

python benchmark_aae.py --include-chunked
python elite_eval.py
python elite_eval.py --large-mb 1024

Latest Benchmark Summary

Last run: 2026-05-03T13:59:31

| Check | Result |
| --- | --- |
| Dataset benchmark cases | 11/11 passed |
| Readiness edge cases | 5/5 passed |
| Benchmark pass rate | 100.0% |
| Average Assurance | 98.5/100 |
| Minimum Assurance | 96/100 |
| DAX validation | All passed |
| Required artifacts | All present |
| Senior analyst reviews | All present |

Latest Adversarial Eval Summary

Last run: 2026-05-03T13:53:17

| Check | Result |
| --- | --- |
| Adversarial perturbations | 24/24 passed |
| Perturbation categories | 9 |
| Self-healing probes | 4/4 passed |
| Peer-review checks | 24/24 passed |
| TMDL exports | 24/24 passed |
| Latency gate | Passed |
| Average Assurance | 99/100 |
| Minimum Assurance | 99/100 |

Covered perturbation categories (a minimal example of one perturbation follows the list):

  • encoding
  • delimiter
  • schema
  • type
  • locale
  • semantic
  • sparsity
  • cardinality
  • anti-hallucination
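As an illustration, the sketch below builds one perturbed input by swapping the CSV delimiter and writing a legacy encoding. The sample data and the `write_perturbed_csv` helper are hypothetical; the real perturbations are generated inside `elite_eval.py`.

```python
# Hypothetical sketch of a delimiter + encoding perturbation; the helper
# name and sample data are illustrative, not the API of elite_eval.py.
import csv

ROWS = [
    {"region": "North", "revenue": "1200.50"},
    {"region": "Süd", "revenue": "980.10"},  # non-ASCII value for the encoding case
]

def write_perturbed_csv(path, delimiter=";", encoding="latin-1"):
    """Write the sample rows with a swapped delimiter and a legacy encoding."""
    with open(path, "w", newline="", encoding=encoding) as handle:
        writer = csv.DictWriter(handle, fieldnames=["region", "revenue"], delimiter=delimiter)
        writer.writeheader()
        writer.writerows(ROWS)

write_perturbed_csv("perturbed_delimiter_encoding.csv")
```

A robust loader is then expected to sniff the delimiter and retry the encoding rather than fail outright, which is what the adversarial cases exercise.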

Large-File Eval

The 1 GB chunked eval was run separately to avoid making every evaluation expensive; a sketch of chunked reading follows the results table.

| Metric | Result |
| --- | --- |
| Generated CSV size | 1,025.94 MB |
| Rows processed | 18,400,000 |
| Columns processed | 7 |
| Processing mode | chunked_full_file |
| Time to insight | 184.31 seconds |
| Assurance | 99/100 |
| Result | PASS |
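The `chunked_full_file` mode amounts to streaming aggregation over fixed-size chunks, so the whole file never has to fit in memory. A minimal sketch of the idea with pandas is below; the chunk size and the `revenue` column name are assumptions, not values taken from `benchmark_aae.py`.

```python
# Minimal sketch of chunked full-file processing with pandas.
# Chunk size and the "revenue" column are assumptions for illustration.
import pandas as pd

def chunked_summary(path, chunksize=1_000_000):
    """Stream a large CSV in chunks and accumulate simple aggregates."""
    total_rows = 0
    revenue_sum = 0.0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total_rows += len(chunk)
        revenue_sum += chunk["revenue"].sum()
    return {"rows": total_rows, "revenue_sum": revenue_sum}
```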

Anti-Hallucination Control

The randomized control dataset passed with the following results (a minimal version of this check is sketched after the list):

  • 0 predictive false-positive insight signals
  • 0 strong significant random correlations promoted
  • no significant random trend promoted
  • cautious/no-dominant-pattern language present
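One way to view this control, sketched below under assumed thresholds, is as a significance gate over purely random data: no pairwise correlation should be both strong and significant, so nothing should be promoted to an insight. The 0.5 and 0.01 cut-offs are illustrative, not the suite's actual thresholds.

```python
# Sketch of an anti-hallucination control: purely random data should not
# produce "strong, significant" correlations. Thresholds are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
data = rng.normal(size=(1_000, 5))  # randomized control dataset

promoted = []
for i in range(data.shape[1]):
    for j in range(i + 1, data.shape[1]):
        r, p = stats.pearsonr(data[:, i], data[:, j])
        if abs(r) > 0.5 and p < 0.01:  # illustrative "strong and significant" gate
            promoted.append((i, j, r, p))

assert not promoted, "random control produced a promotable correlation"
```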

Self-Healing Probes

The recovery trace covers the following failure modes; a minimal sketch of the first two recovery patterns follows the list:

  • optional SciPy unavailable, recovered through NumPy fallback
  • bad encoding, recovered through encoding retry
  • parser width mismatch, recovered through resilient CSV parsing
  • simulated memory pressure, recovered through chunked processing
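As a hedged illustration of the first two patterns, the sketch below shows an optional-dependency fallback and an encoding retry. It is not the project's actual implementation; the function names and retry order are assumptions.

```python
# Illustrative recovery patterns; not the project's actual implementation.
import numpy as np
import pandas as pd

# 1) Optional SciPy with a NumPy fallback for a correlation statistic.
try:
    from scipy.stats import pearsonr

    def correlation(x, y):
        return pearsonr(x, y)[0]
except ImportError:
    def correlation(x, y):
        return float(np.corrcoef(x, y)[0, 1])  # NumPy fallback, no p-value

# 2) Encoding retry: try UTF-8 first, then fall back to a legacy codec.
def read_csv_with_retry(path):
    for encoding in ("utf-8", "latin-1"):
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with a known encoding")
```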

Notes

Generated eval outputs are intentionally ignored by Git. Re-run the commands above to reproduce fresh local reports under output_benchmark/ and output_elite_audit/.