prithvinairr/AAE---Autonomous-Analytics-Engineer

AAE - Autonomous Analytics Engineer

A local-first analytics engine that turns arbitrary tabular files into audited, Power BI-ready analytical packages.

Quick Start . Pipeline . Outputs . Evaluation . Architecture


Overview

AAE is a multi-agent analytics system for CSV, TSV, Excel, and Parquet files. It profiles the dataset, cleans it, performs statistically guarded EDA, builds Power BI-style semantic artifacts, generates dashboard stories, and produces an assurance report that checks output quality.

It is designed for realistic data work: malformed CSVs, mixed types, schema drift, sparse columns, high-cardinality IDs, randomized no-signal datasets, and large CSV/TSV files that should be processed in chunks rather than loaded fully into memory.

What You Get

| Capability | What AAE Produces |
| --- | --- |
| Data audit | schema inference, date detection, cardinality, nulls, outliers, type recommendations |
| Cleaning | cleaned Parquet output, cleaning decision trace, dtype enforcement, risk notes |
| EDA | distributions, correlations, trends, movers, KPI candidates, significance-aware insights |
| Modeling | fact table, dimension tables, data dictionary, TMDL semantic model files |
| DAX | validated SUM, AVG, COUNT, time intelligence, contribution, and rolling measures where applicable |
| Storytelling | dashboard page recommendations, chart suggestions, executive headlines, caveats |
| Assurance | consistency checks, DAX validity, story evidence support, cleaning risk, model health |

Pipeline

flowchart LR
    A["Upload Dataset"] --> B["AUDIT<br/>Data Health Inspector"]
    B --> C["CLEAN<br/>Data Surgeon"]
    C --> D["ANALYZE<br/>Insight Miner"]
    D --> E["ARCHITECT<br/>Schema Engineer"]
    E --> F["STORY<br/>Dashboard Storyteller"]
    F --> G["ASSURANCE<br/>Output QA Lead"]
    G --> H["Power BI-ready Package"]

| Phase | Agent | Primary Output |
| --- | --- | --- |
| AUDIT | Data Health Inspector | data_health_report.json |
| CLEAN | Data Surgeon | cleaned_data.parquet, cleaning_report.json |
| ANALYZE | Insight Miner | statistical_analysis_report.json |
| ARCHITECT | Schema Engineer | star schema tables, DAX, dictionary, TMDL |
| STORY | Dashboard Storyteller | dashboard_stories.json |
| ASSURANCE | Output QA Lead | assurance_report.json |
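
The phase order above can be sketched as a simple sequential runner. This is a minimal illustration, not AAE's orchestration code: `Phase`, `run_pipeline`, and the placeholder handlers are hypothetical names. Only the phase names, agent names, and output names are taken from this README.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]


@dataclass(frozen=True)
class Phase:
    name: str                          # phase name from the table
    agent: str                         # agent name from the table
    outputs: Tuple[str, ...]           # artifact names listed in this README
    run: Callable[[Context], Context]  # hypothetical handler signature


def _placeholder(phase_name: str) -> Callable[[Context], Context]:
    """Hypothetical stand-in for a real agent: it only records that it ran."""
    def handler(ctx: Context) -> Context:
        return {**ctx, "completed": [*ctx.get("completed", []), phase_name]}
    return handler


PHASES: List[Phase] = [
    Phase("AUDIT", "Data Health Inspector", ("data_health_report.json",), _placeholder("AUDIT")),
    Phase("CLEAN", "Data Surgeon", ("cleaned_data.parquet", "cleaning_report.json"), _placeholder("CLEAN")),
    Phase("ANALYZE", "Insight Miner", ("statistical_analysis_report.json",), _placeholder("ANALYZE")),
    Phase("ARCHITECT", "Schema Engineer", ("star_schema/", "dax_measures.json", "data_dictionary.json"), _placeholder("ARCHITECT")),
    Phase("STORY", "Dashboard Storyteller", ("dashboard_stories.json",), _placeholder("STORY")),
    Phase("ASSURANCE", "Output QA Lead", ("assurance_report.json",), _placeholder("ASSURANCE")),
]


def run_pipeline(ctx: Context, phases: List[Phase] = PHASES) -> Context:
    """Run each phase in order, passing the accumulated context forward."""
    for phase in phases:
        ctx = phase.run(ctx)
    return ctx


result = run_pipeline({"dataset": "example.csv"})
print(result["completed"])
```

Each handler receives the context produced by the previous phase, which mirrors the left-to-right flow in the diagram.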

Large CSV/TSV files are routed through agents/warehouse.py, which scans, cleans, aggregates, and writes chunked outputs without requiring a full in-memory load.
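
The idea behind chunked processing can be sketched with the standard library alone. This is not the code in agents/warehouse.py, whose internals are not shown in this README. `iter_chunks`, `chunked_sum`, and the tiny chunk size are hypothetical and chosen only to keep the example short.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield lists of at most chunk_size rows without materializing the whole input."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_sum(handle, column: str, chunk_size: int = 2) -> Dict[str, float]:
    """Scan a CSV handle chunk by chunk, keeping only running aggregates in memory."""
    reader = csv.DictReader(handle)
    stats = {"rows": 0, "unparsed": 0, "total": 0.0}
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            stats["rows"] += 1
            raw = (row.get(column) or "").strip()
            try:
                stats["total"] += float(raw)
            except ValueError:
                # Empty or malformed cell: count it instead of failing the scan.
                stats["unparsed"] += 1
    return stats


sample = io.StringIO("id,amount\n1,10.5\n2,\n3,abc\n4,4.5\n5,5\n")
print(chunked_sum(sample, "amount"))
# → {'rows': 5, 'unparsed': 2, 'total': 20.0}
```

Only one chunk and the running totals are held in memory at a time, so memory use does not grow with file size.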

Quick Start

1. Clone

git clone https://github.com/prithvinairr/AAE---Autonomous-Analytics-Engineer.git
cd AAE---Autonomous-Analytics-Engineer

2. Create an Environment

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

macOS/Linux:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

3. Run

python main.py

Open:

http://localhost:8000

Upload a dataset, review the inferred KPI/date/role selections, optionally answer the analyst interview, and run the pipeline.

Supported Inputs

| Format | Extensions |
| --- | --- |
| CSV | .csv |
| TSV | .tsv |
| Excel | .xlsx, .xls |
| Parquet | .parquet, .pq |

Uploads are capped at 5 GB. Large CSV/TSV files automatically switch to chunked full-file mode (reported as chunked_full_file in the evaluation results).
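
A pre-upload check based on the supported extensions and the size cap might look like the sketch below. `classify_upload` and `MAX_UPLOAD_BYTES` are hypothetical names, not part of AAE's API. The README does not say whether the cap is binary or decimal, so the byte value here is an assumption.

```python
from pathlib import Path

# Extension-to-format map taken from the Supported Inputs table.
SUPPORTED = {
    ".csv": "CSV",
    ".tsv": "TSV",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".parquet": "Parquet",
    ".pq": "Parquet",
}

# Assumption: the 5 GB cap is read as 5 * 1024**3 bytes.
MAX_UPLOAD_BYTES = 5 * 1024**3


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return the format label for a supported file, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the upload cap")
    return SUPPORTED[suffix]


print(classify_upload("Sales_2024.CSV", 1_000_000))
```

The extension is lowercased before lookup so that names such as Sales_2024.CSV are accepted.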

Outputs

Each run can create a local package like this:

output/
  data_health_report.json
  cleaning_report.json
  cleaned_data.parquet
  statistical_analysis_report.json
  dashboard_stories.json
  dax_measures.json
  data_dictionary.json
  data_dictionary.csv
  assurance_report.json
  star_schema/
    fact_data.parquet
    dim_*.parquet
    tmdl/

Generated data and reports stay local and are ignored by Git.
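
To confirm that a run produced the full package, a small check along these lines can help. It is a sketch under assumptions: `missing_artifacts` is a hypothetical helper, and because the listing describes what a run can create, some modes may legitimately omit files.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the output/ listing above.
EXPECTED_FILES = [
    "data_health_report.json",
    "cleaning_report.json",
    "cleaned_data.parquet",
    "statistical_analysis_report.json",
    "dashboard_stories.json",
    "dax_measures.json",
    "data_dictionary.json",
    "data_dictionary.csv",
    "assurance_report.json",
    "star_schema/fact_data.parquet",
]


def missing_artifacts(output_dir: Path) -> List[str]:
    """Return the expected artifacts that are absent from output_dir."""
    missing = [name for name in EXPECTED_FILES if not (output_dir / name).is_file()]
    star = output_dir / "star_schema"
    # dim_*.parquet is a pattern, so require at least one matching file.
    dims = list(star.glob("dim_*.parquet")) if star.is_dir() else []
    if not dims:
        missing.append("star_schema/dim_*.parquet")
    if not (star / "tmdl").is_dir():
        missing.append("star_schema/tmdl/")
    return missing


# Demo against a throwaway directory that contains a single report.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "assurance_report.json").write_text("{}")
    print(len(missing_artifacts(out)), "artifacts missing")
```

Point it at a real run with `missing_artifacts(Path("output"))`.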

Evaluation

AAE includes reproducible evaluation harnesses for data resilience, statistical restraint, semantic-model validity, recovery behavior, and large-file processing.

python benchmark_aae.py --include-chunked
python elite_eval.py

To reproduce the 1 GB chunked-processing run:

python elite_eval.py --large-mb 1024

Latest Benchmark Summary

Last benchmark run: 2026-05-03T13:59:31

| Metric | Result |
| --- | --- |
| Dataset benchmark cases | 11/11 passed |
| Readiness edge cases | 5/5 passed |
| Benchmark pass rate | 100.0% |
| Average Assurance | 98.5/100 |
| Minimum Assurance | 96/100 |
| DAX validation | All passed |
| Required artifacts | All present |
| Senior analyst reviews | All present |

Latest Adversarial Eval Summary

Last adversarial eval run: 2026-05-03T13:53:17

| Metric | Result |
| --- | --- |
| Adversarial perturbations | 24/24 passed |
| Perturbation categories | 9 |
| Self-healing probes | 4/4 passed |
| Peer-review checks | 24/24 passed |
| TMDL exports | 24/24 passed |
| Latency gate | Passed |
| Average Assurance | 99/100 |
| Minimum Assurance | 99/100 |

Large-File Eval

| Metric | Result |
| --- | --- |
| Generated CSV size | 1,025.94 MB |
| Rows processed | 18,400,000 |
| Columns processed | 7 |
| Processing mode | chunked_full_file |
| Time to insight | 184.31 seconds |
| Assurance | 99/100 |
| Result | PASS |

Anti-Hallucination Control

The randomized-control eval passed with:

  • 0 predictive false-positive insight signals
  • 0 strong significant random correlations promoted
  • no significant random trend promoted
  • cautious/no-dominant-pattern language present
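
One way to express this kind of guard is a permutation test: a correlation is promoted only if it is strong and rarely matched by shuffled data. The README does not describe AAE's actual significance checks, so the sketch below is a stand-in technique. The function names and the `alpha` and `min_abs_r` thresholds are hypothetical.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm: int = 200, seed: int = 0) -> float:
    """Share of shuffled pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)


def should_promote(x, y, alpha: float = 0.05, min_abs_r: float = 0.5) -> bool:
    """Promote a correlation only if it is strong and unlikely under shuffling."""
    return abs(pearson_r(x, y)) >= min_abs_r and permutation_p_value(x, y) <= alpha


rng = random.Random(42)
noise_a = [rng.random() for _ in range(50)]
noise_b = [rng.random() for _ in range(50)]
print("random columns promoted:", should_promote(noise_a, noise_b))

signal = list(range(30))
print("linear columns promoted:", should_promote(signal, [2 * v + 1 for v in signal]))
```

Requiring both an effect-size floor and a shuffle-based p-value is what keeps weak but nominally significant correlations out of the story.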

Read the full evaluation notes in docs/EVALUATION.md.

Project Structure

agents/                 pipeline agents and QA modules
evals/                  perturbation and latency evaluation utilities
static/                 browser UI
docs/                   architecture and evaluation notes
main.py                 FastAPI app and WebSocket pipeline runner
benchmark_aae.py        dataset benchmark harness
elite_eval.py           adversarial evaluation harness
generate_data.py        synthetic local data generator
requirements.txt        Python dependencies

Local Development

Run syntax checks:

python -m py_compile main.py benchmark_aae.py elite_eval.py agents/*.py evals/*.py
node --check static/app.js
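
The py_compile command relies on the shell to expand agents/*.py and evals/*.py. On shells that do not expand wildcards, such as Windows cmd.exe, a small script can do the globbing itself. The sketch below is not part of the repository, and `syntax_errors` is a hypothetical helper. The demo compiles two throwaway files rather than the project sources.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Iterable, List


def syntax_errors(paths: Iterable[Path]) -> List[str]:
    """Byte-compile each file and collect error messages instead of stopping at the first."""
    errors = []
    for path in paths:
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(exc.msg)
    return errors


# Demo: one valid file and one with a deliberate syntax error.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.py").write_text("x = 1\n")
    (root / "bad.py").write_text("def (:\n")
    problems = syntax_errors(sorted(root.glob("*.py")))
    print(len(problems), "file(s) failed to compile")
```

For the project itself, pass main.py, benchmark_aae.py, elite_eval.py, and the results of `Path("agents").glob("*.py")` and `Path("evals").glob("*.py")`.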

Run the app:

python main.py

Important generated folders are ignored by Git:

data/
output/
output_benchmark/
output_elite_audit/
output_test*/

Documentation

Architecture and evaluation notes are in docs/, including docs/EVALUATION.md.

Tech Stack

| Layer | Tools |
| --- | --- |
| Backend | FastAPI, WebSocket |
| Data | Pandas, NumPy, PyArrow; SciPy optional |
| Frontend | HTML, CSS, JavaScript |
| BI handoff | Parquet star schema, DAX, data dictionary, TMDL |

License

MIT License. See LICENSE.
