
Evaluation

AAE includes a reproducible adversarial evaluation suite. The goal is to measure robustness, statistical restraint, semantic-model validity, recovery behavior, and latency rather than rely on static screenshots.

Commands

python benchmark_aae.py --include-chunked
python elite_eval.py
python elite_eval.py --large-mb 1024

Latest Benchmark Summary

Last run: 2026-05-03T13:59:31

| Check | Result |
| --- | --- |
| Dataset benchmark cases | 11/11 passed |
| Readiness edge cases | 5/5 passed |
| Benchmark pass rate | 100.0% |
| Average Assurance | 98.5/100 |
| Minimum Assurance | 96/100 |
| DAX validation | All passed |
| Required artifacts | All present |
| Senior analyst reviews | All present |

Latest Adversarial Eval Summary

Last run: 2026-05-03T13:53:17

| Check | Result |
| --- | --- |
| Adversarial perturbations | 24/24 passed |
| Perturbation categories | 9 |
| Self-healing probes | 4/4 passed |
| Peer-review checks | 24/24 passed |
| TMDL exports | 24/24 passed |
| Latency gate | Passed |
| Average Assurance | 99/100 |
| Minimum Assurance | 99/100 |

Covered perturbation categories (a minimal example of one perturbation follows the list):

  • encoding
  • delimiter
  • schema
  • type
  • locale
  • semantic
  • sparsity
  • cardinality
  • anti-hallucination
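As an illustration, the sketch below builds one perturbed input by swapping the CSV delimiter and writing a legacy encoding. The sample data and the `write_perturbed_csv` helper are hypothetical; the real perturbations are generated inside `elite_eval.py`.

```python
# Hypothetical sketch of a delimiter + encoding perturbation; the helper
# name and sample data are illustrative, not the API of elite_eval.py.
import csv

ROWS = [
    {"region": "North", "revenue": "1200.50"},
    {"region": "Süd", "revenue": "980.10"},  # non-ASCII value for the encoding case
]

def write_perturbed_csv(path, delimiter=";", encoding="latin-1"):
    """Write the sample rows with a swapped delimiter and a legacy encoding."""
    with open(path, "w", newline="", encoding=encoding) as handle:
        writer = csv.DictWriter(handle, fieldnames=["region", "revenue"], delimiter=delimiter)
        writer.writeheader()
        writer.writerows(ROWS)

write_perturbed_csv("perturbed_delimiter_encoding.csv")
```

A robust loader is then expected to sniff the delimiter and retry the encoding rather than fail outright, which is what the adversarial cases exercise.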

Large-File Eval

The 1 GB chunked eval was run separately to avoid making every evaluation expensive; a sketch of chunked reading follows the results table.

| Metric | Result |
| --- | --- |
| Generated CSV size | 1,025.94 MB |
| Rows processed | 18,400,000 |
| Columns processed | 7 |
| Processing mode | chunked_full_file |
| Time to insight | 184.31 seconds |
| Assurance | 99/100 |
| Result | PASS |
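The `chunked_full_file` mode amounts to streaming aggregation over fixed-size chunks, so the whole file never has to fit in memory. A minimal sketch of the idea with pandas is below; the chunk size and the `revenue` column name are assumptions, not values taken from `benchmark_aae.py`.

```python
# Minimal sketch of chunked full-file processing with pandas.
# Chunk size and the "revenue" column are assumptions for illustration.
import pandas as pd

def chunked_summary(path, chunksize=1_000_000):
    """Stream a large CSV in chunks and accumulate simple aggregates."""
    total_rows = 0
    revenue_sum = 0.0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total_rows += len(chunk)
        revenue_sum += chunk["revenue"].sum()
    return {"rows": total_rows, "revenue_sum": revenue_sum}
```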

Anti-Hallucination Control

The randomized control dataset passed with the following results (a minimal version of this check is sketched after the list):

  • 0 predictive false-positive insight signals
  • 0 strong significant random correlations promoted
  • no significant random trend promoted
  • cautious/no-dominant-pattern language present
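One way to view this control, sketched below under assumed thresholds, is as a significance gate over purely random data: no pairwise correlation should be both strong and significant, so nothing should be promoted to an insight. The 0.5 and 0.01 cut-offs are illustrative, not the suite's actual thresholds.

```python
# Sketch of an anti-hallucination control: purely random data should not
# produce "strong, significant" correlations. Thresholds are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
data = rng.normal(size=(1_000, 5))  # randomized control dataset

promoted = []
for i in range(data.shape[1]):
    for j in range(i + 1, data.shape[1]):
        r, p = stats.pearsonr(data[:, i], data[:, j])
        if abs(r) > 0.5 and p < 0.01:  # illustrative "strong and significant" gate
            promoted.append((i, j, r, p))

assert not promoted, "random control produced a promotable correlation"
```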

Self-Healing Probes

The recovery trace covers the following failure modes; a minimal sketch of the first two recovery patterns follows the list:

  • optional SciPy unavailable, recovered through NumPy fallback
  • bad encoding, recovered through encoding retry
  • parser width mismatch, recovered through resilient CSV parsing
  • simulated memory pressure, recovered through chunked processing
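As a hedged illustration of the first two patterns, the sketch below shows an optional-dependency fallback and an encoding retry. It is not the project's actual implementation; the function names and retry order are assumptions.

```python
# Illustrative recovery patterns; not the project's actual implementation.
import numpy as np
import pandas as pd

# 1) Optional SciPy with a NumPy fallback for a correlation statistic.
try:
    from scipy.stats import pearsonr

    def correlation(x, y):
        return pearsonr(x, y)[0]
except ImportError:
    def correlation(x, y):
        return float(np.corrcoef(x, y)[0, 1])  # NumPy fallback, no p-value

# 2) Encoding retry: try UTF-8 first, then fall back to a legacy codec.
def read_csv_with_retry(path):
    for encoding in ("utf-8", "latin-1"):
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with a known encoding")
```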

Notes

Generated eval outputs are intentionally ignored by Git. Re-run the commands above to reproduce fresh local reports under output_benchmark/ and output_elite_audit/.