AAE includes a reproducible adversarial evaluation suite. The goal is to measure robustness, statistical restraint, semantic-model validity, recovery behavior, and latency rather than relying on static screenshots.
```shell
python benchmark_aae.py --include-chunked
python elite_eval.py
python elite_eval.py --large-mb 1024
```

Last run: 2026-05-03T13:59:31
| Check | Result |
|---|---|
| Dataset benchmark cases | 11/11 passed |
| Readiness edge cases | 5/5 passed |
| Benchmark pass rate | 100.0% |
| Average Assurance | 98.5/100 |
| Minimum Assurance | 96/100 |
| DAX validation | All passed |
| Required artifacts | All present |
| Senior analyst reviews | All present |
Last run: 2026-05-03T13:53:17
| Check | Result |
|---|---|
| Adversarial perturbations | 24/24 passed |
| Perturbation categories | 9 |
| Self-healing probes | 4/4 passed |
| Peer-review checks | 24/24 passed |
| TMDL exports | 24/24 passed |
| Latency gate | Passed |
| Average Assurance | 99/100 |
| Minimum Assurance | 99/100 |
Covered perturbation categories:
- encoding
- delimiter
- schema
- type
- locale
- semantic
- sparsity
- cardinality
- anti-hallucination
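As an illustration of what a perturbation in these categories looks like, the helper below re-serializes tabular data with a non-default delimiter and encoding to stress the ingestion path. The function name and parameters are hypothetical; the suite's actual generators live in `elite_eval.py`.

```python
import csv
import io

def perturb_csv(rows, delimiter=";", encoding="latin-1"):
    """Hypothetical helper: emit the rows as CSV bytes using a
    non-default delimiter and a legacy encoding, exercising the
    delimiter- and encoding-perturbation categories."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    writer.writerows(rows)
    return buf.getvalue().encode(encoding)

payload = perturb_csv([["id", "name"], ["1", "München"]])
```

A robust pipeline should detect and recover from both perturbations rather than misparse the file or crash on the decode.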
The 1GB chunked eval was run separately to avoid making every evaluation expensive.
| Metric | Result |
|---|---|
| Generated CSV size | 1,025.94 MB |
| Rows processed | 18,400,000 |
| Columns processed | 7 |
| Processing mode | chunked_full_file |
| Time to insight | 184.31 seconds |
| Assurance | 99/100 |
| Result | PASS |
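The `chunked_full_file` mode boils down to streaming aggregation: process the file in bounded batches instead of loading it whole. A minimal stand-in (the function name and chunk size are illustrative, not the suite's actual implementation):

```python
import csv

def chunked_row_count(path, chunk_size=1_000_000):
    """Stream a CSV and count rows without holding the whole file
    in memory; a minimal sketch of chunked full-file processing."""
    total = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        batch = 0
        for _ in reader:
            batch += 1
            if batch == chunk_size:
                # Flush the batch; a real pipeline would aggregate
                # per-chunk statistics here before discarding rows.
                total += batch
                batch = 0
        total += batch
    return header, total
```

Memory stays bounded by `chunk_size`, which is what lets a ~1 GB, 18.4M-row file finish in a single pass.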
The randomized control dataset passed with:
- no false-positive predictive insight signals
- no strong random correlations promoted as significant
- no random trend promoted as significant
- cautious, no-dominant-pattern language present
The recovery trace covers:
- optional SciPy unavailable, recovered through NumPy fallback
- bad encoding, recovered through encoding retry
- parser width mismatch, recovered through resilient CSV parsing
- simulated memory pressure, recovered through chunked processing
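The first two recoveries follow a standard try/fallback pattern. A minimal sketch, assuming a SciPy-backed statistic with a pure-Python fallback and an encoding-retry reader; none of these helpers are the suite's actual code:

```python
def trimmed_mean(values, proportion=0.1):
    """Prefer scipy.stats.trim_mean; fall back to an equivalent
    plain-Python computation when SciPy is not installed."""
    try:
        from scipy import stats
        return float(stats.trim_mean(values, proportion))
    except ImportError:
        xs = sorted(values)
        k = int(len(xs) * proportion)  # rows cut from each end
        trimmed = xs[k:len(xs) - k] if k else xs
        return sum(trimmed) / len(trimmed)

def read_text_with_retry(path, encodings=("utf-8", "latin-1")):
    """Try encodings in order, recovering from bad-encoding errors."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError(f"no encoding in {encodings} matched {path}")
```

The parser-width and memory-pressure recoveries follow the same shape: catch the specific failure, switch to a more resilient strategy, and record the fallback in the recovery trace.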
Generated eval outputs are intentionally ignored by Git. Re-run the commands above to reproduce fresh local reports under `output_benchmark/` and `output_elite_audit/`.