Aaron Horvitz AaronNHorvitz

Featured Portfolio Projects

Project	Focus	Technical Highlights
Agent Gatsby	Local-first GenAI pipeline	Deterministic state machine, quote verification, bounded expansion loops, hallucination control, multilingual output
Labels On Tap	AI verification & compliance	Local-first ML compliance engine using OCR/NLP, deterministic matching, FastAPI, Docker, and human-in-the-loop review
LexiChess	LLM/agent benchmarking	Reproducible chess-agent evaluation with legal-move validation, SQLite storage, ratings, PGN exports, and live broadcast tooling
TopicMiner	Deterministic NLP classification	Classical NLP and ML toolkit for high-volume, auditable email classification and topic discovery
AdaptiveLASSO	Statistical computing	Python package implementing adaptive LASSO with a Statsmodels-style workflow, tests, documentation, and demo notebook
iSignify	Bioinformatics & private AI	Local/on-device AI interpretation with custom k-mer comparison for genomic signature discovery

Project Portfolio Narrative

Agent Gatsby

A local-first AI pipeline built around deterministic workflow control. The project demonstrates how GenAI systems can be constrained, verified, and structured to reduce hallucination risk while still producing useful multilingual output.

Key themes: GenAI orchestration, deterministic verification, quote validation, local-first AI, bounded generation loops.
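This is a minimal sketch of quote verification inside a bounded retry loop. It assumes a draft counts as verified only when every double-quoted span appears verbatim in the source text after normalization. It is not taken from the Agent Gatsby codebase. The function names (`normalize`, `verify_quotes`, `bounded_generate`) and the `generate` callable standing in for an LLM call are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Callable, Optional

# Hypothetical helpers for illustration; not the Agent Gatsby implementation.


def normalize(text: str) -> str:
    """Normalize Unicode forms, curly quotes, and whitespace before exact matching."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def verify_quotes(draft: str, source: str) -> dict[str, bool]:
    """Map each double-quoted span in the draft to whether it appears verbatim in the source."""
    norm_source = normalize(source)
    quotes = re.findall(r'"([^"]+)"', normalize(draft))
    return {quote: quote in norm_source for quote in quotes}


def bounded_generate(
    generate: Callable[[int], str], source: str, max_attempts: int = 3
) -> Optional[str]:
    """Call a (hypothetical) generator at most max_attempts times; keep the first fully verified draft."""
    for attempt in range(max_attempts):
        draft = generate(attempt)
        # A draft with no quotes passes trivially: there is nothing to verify.
        if all(verify_quotes(draft, source).values()):
            return draft
    return None  # Bounded: give up rather than loop until the model "gets it right".
```

The verification step is plain string containment, so a pass or fail can be reproduced and audited without calling a model again.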

Labels On Tap

A local-first AI verification and compliance system that combines deterministic string matching, OCR/NLP, and human-in-the-loop review. The project demonstrates how AI can assist regulatory triage while preserving auditability.

Key themes: compliance automation, OCR/NLP, FastAPI, Docker, deterministic matching, human review.
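This minimal sketch shows how deterministic matching and human review can fit together for a single label field. It assumes three outcomes: an exact match after normalization passes automatically, a near miss (for example, OCR noise) goes to a person, and anything else is a mismatch. The function name, the 0.85 threshold, and the example field values are hypothetical, not the rules Labels On Tap applies. The similarity score is Python's standard-library `difflib.SequenceMatcher.ratio()`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical triage rule for illustration; threshold and names are assumptions.


def normalize(text: str) -> str:
    """Collapse whitespace and case so formatting noise does not block an exact match."""
    return re.sub(r"\s+", " ", text).strip().upper()


def triage_field(expected: str, extracted: str, review_threshold: float = 0.85) -> str:
    """Return 'match', 'human_review', or 'mismatch' for one label field."""
    exp, got = normalize(expected), normalize(extracted)
    if exp == got:
        return "match"  # Deterministic pass: no model judgment involved.
    similarity = SequenceMatcher(None, exp, got).ratio()
    if similarity >= review_threshold:
        return "human_review"  # Near miss, e.g. an OCR "0" for "O"; a person decides.
    return "mismatch"
```

Because the automatic decision is limited to exact matches, every pass is explainable, and only the ambiguous cases use reviewer time.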

LexiChess

A chess tournament and evaluation platform for benchmarking LLMs, chess agents, and custom engines. The project emphasizes reproducibility, legal-move validation, rating updates, persistent storage, and live interaction.

Key themes: LLM evaluation, agent benchmarking, SQLite, FastAPI, PGN exports, live broadcast tooling.
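The rating system LexiChess uses is not specified here. As an illustration, this is a minimal sketch of the standard Elo update after one game, assuming the common logistic form with a 400-point scale and a K-factor of 32. The function names and the default K are assumptions. A platform like this might use a different K or another system, such as Glicko.

```python
from __future__ import annotations

# Standard Elo update for illustration; not necessarily LexiChess's rating code.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32
) -> tuple[float, float]:
    """Return updated (A, B) ratings; score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta  # Zero-sum: B loses exactly what A gains.
```

For example, two 1500-rated agents where A wins gives `(1516.0, 1484.0)`. The update is a pure function of the stored ratings and result, so a tournament's rating history can be replayed exactly from its game records.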

TopicMiner

A modular Python toolkit for deterministic, high-volume email classification using classical NLP and machine learning methods. It is designed as an auditable alternative to LLM-based classification for narrow-taxonomy tasks.

Key themes: LDA, Word2Vec, Doc2Vec, K-Means, Random Forest, XGBoost, deterministic NLP.

AdaptiveLASSO

A statistical computing project implementing adaptive LASSO in Python with a clean workflow inspired by Statsmodels. The project shows applied statistical modeling, packaging, testing, documentation, and reproducibility.

Key themes: regression, shrinkage, statistical computing, Python packaging, reproducible modeling.
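For reference, this is the standard adaptive LASSO objective. It replaces the single L1 penalty with per-coefficient weights built from an initial estimate, commonly OLS, or ridge when p > n. The notation (y, X, λ, γ, and the initial estimate) is illustrative and may differ from the package's own documentation. Some implementations also scale the squared-error term by 1/(2n).

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert,
\qquad \hat{w}_j = \frac{1}{\lvert \hat{\beta}^{\mathrm{init}}_j \rvert^{\gamma}}, \quad \gamma > 0

% Equivalent reweighted form: rescale columns, fit an ordinary lasso, then undo the scaling.
\tilde{x}_j = \frac{x_j}{\hat{w}_j}, \qquad
\hat{\theta} = \arg\min_{\theta} \; \lVert y - \tilde{X}\theta \rVert_2^2
  + \lambda \sum_{j=1}^{p} \lvert \theta_j \rvert, \qquad
\hat{\beta}_j = \frac{\hat{\theta}_j}{\hat{w}_j}
```

The second form follows from substituting θ_j = ŵ_j β_j, so that Xβ = X̃θ and Σ ŵ_j|β_j| = Σ|θ_j|. This means adaptive LASSO can be fit with any ordinary lasso solver.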

iSignify

A bioinformatics and AI pipeline focused on local/on-device interpretation and custom k-mer comparison for identifying unique genomic signatures across large datasets.

Key themes: private AI, bioinformatics, k-mer algorithms, local inference, computational efficiency.
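iSignify's custom k-mer comparison is not described in detail here. As a generic illustration, this is a minimal sketch of the basic building blocks: counting overlapping k-mers, finding k-mers present in a target but absent from a background set (a simple "signature"), and Jaccard similarity between k-mer sets. The function names and the rule of skipping windows containing `N` are assumptions.

```python
from __future__ import annotations

from collections import Counter

# Generic k-mer utilities for illustration; not iSignify's algorithm.


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain an ambiguous base (N)."""
    seq = seq.upper()
    windows = (seq[i : i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if "N" not in w)


def unique_kmers(target: str, background: list[str], k: int) -> set[str]:
    """k-mers present in the target but absent from every background sequence."""
    seen: set[str] = set()
    for seq in background:
        seen.update(kmer_counts(seq, k))
    return set(kmer_counts(target, k)) - seen


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard similarity of the two sequences' k-mer sets (0.0 if both are empty)."""
    set_a, set_b = set(kmer_counts(a, k)), set(kmer_counts(b, k))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

Exact sets are fine for small inputs. Across large datasets, memory usually pushes this toward hashed or sketched k-mer representations (for example, MinHash estimates of Jaccard similarity).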

Professional Background Snapshot

Gen AI Python Systems Engineer — PwC Advisory Services

Contributing to production-grade Generative AI systems and building modular Python pipeline components for enterprise AI delivery.

Recent work includes object-oriented pipeline architecture, complex data ingestion, Pydantic validation, production Python development, unit testing, and code delivery through peer review.

Statistician / Data Scientist, GS-14 — IRS Research, Analytics, and Applied Statistics

Led and supported projects involving LLM/RAG interfaces, email classification, audit automation, executive briefings, forecasting, and Python ETL systems.

This work combined statistical modeling, automation, technical leadership, and communication with senior federal stakeholders.

Data Scientist — McDermott International

Built forecasting systems, workforce planning models, Python statistical tooling, and automated executive reporting workflows.

Work included time-series forecasting, statistical learning, internal Python tool development, and applied statistics training.

Data Scientist — Avangard Innovative

Developed computer vision models, real-time predictive modeling systems, API normalization workflows, and statistical process analysis tools.

Work included CNN image classification, sensor-data modeling, API-driven normalization, ANOVA/Tukey testing, SQL, and time-series forecasting.

Technical Stack

Languages & Core Tools

Python · SQL · Linux · GitHub · GitLab · VS Code · JupyterLab

Data Science & Machine Learning

Pandas · NumPy · Scikit-Learn · Statsmodels · PyTorch · XGBoost · Random Forests · LDA · Word2Vec · Doc2Vec

GenAI & Engineering

LLMs · RAG · Agentic Workflows · FastAPI · Docker · Pydantic · SQLite · Local-First AI · Human-in-the-Loop Review

Statistical Methods

Time Series · SARIMAX · LOWESS · Regression · Classification · Multivariate Analysis · Forecasting · Shrinkage Methods

Engineering Practices

Unit Testing · Code Review · Modular Architecture · ETL Automation · Data Validation · Reproducible Workflows · Documentation

Portfolio Principles

I am especially interested in AI and data systems that are:

Useful in real operational settings
Reviewable and explainable
Reliable under messy real-world data
Statistically defensible
Testable and maintainable
Production-ready rather than demo-only

My preferred approach is pragmatic: use GenAI where it adds value, use classical ML where it is stronger, and use deterministic validation wherever reliability matters.

Connect

Please use the contact and social links in my GitHub profile sidebar.

GitHub: github.com/AaronNHorvitz

LinkedIn: linkedin.com/in/aaron-horvitz-a666215

Resume: Download Resume
