AaronNHorvitz/README.md

Aaron Horvitz GitHub Portfolio Banner

Featured Portfolio Projects

| Project | Focus | Technical Highlights |
| --- | --- | --- |
| Agent Gatsby | Local-first GenAI pipeline | Deterministic state machine, quote verification, bounded expansion loops, hallucination control, multilingual output |
| Labels On Tap | AI verification & compliance | Local-first ML compliance engine using OCR/NLP, deterministic matching, FastAPI, Docker, and human-in-the-loop review |
| LexiChess | LLM/agent benchmarking | Reproducible chess-agent evaluation with legal-move validation, SQLite storage, ratings, PGN exports, and live broadcast tooling |
| TopicMiner | Deterministic NLP classification | Classical NLP and ML toolkit for high-volume, auditable email classification and topic discovery |
| AdaptiveLASSO | Statistical computing | Python package implementing adaptive LASSO with a Statsmodels-style workflow, tests, documentation, and demo notebook |
| iSignify | Bioinformatics & private AI | Local/on-device AI interpretation with custom k-mer comparison for genomic signature discovery |

Project Portfolio Narrative

Agent Gatsby

A local-first AI pipeline built around deterministic workflow control. The project demonstrates how GenAI systems can be constrained, verified, and structured to reduce hallucination risk while still producing useful multilingual output.

Key themes: GenAI orchestration, deterministic verification, quote validation, local-first AI, bounded generation loops.
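
As a rough illustration of what deterministic quote verification can look like, the sketch below checks whether each candidate quote appears verbatim in a source text after whitespace and quote-mark normalization. The function names, the normalization rules, and the sample strings are assumptions made for this example, not Agent Gatsby's actual implementation.

```python
import re


def normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace runs, and lowercase the text."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(quotes: list[str], source: str) -> dict[str, bool]:
    """Map each candidate quote to True if it occurs verbatim in the source."""
    haystack = normalize(source)
    return {quote: normalize(quote) in haystack for quote in quotes}


# Hypothetical source text and model-produced quotes, for illustration only.
source = "The pipeline checks every quote.\nUnverified quotes are dropped   before output."
quotes = [
    "Unverified quotes are dropped before output.",
    "All quotes are kept.",
]

results = verify_quotes(quotes, source)
for quote, found in results.items():
    print(f"{'OK  ' if found else 'FAIL'} {quote}")
```

Because the check is a plain substring test, the same inputs always give the same verdict, which is the property that makes this kind of gate auditable.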

Labels On Tap

A local-first AI verification and compliance system that combines deterministic string matching, OCR/NLP, and human-in-the-loop review. The project demonstrates how AI can assist regulatory triage while preserving auditability.

Key themes: compliance automation, OCR/NLP, FastAPI, Docker, deterministic matching, human review.

LexiChess

A chess tournament and evaluation platform for benchmarking LLMs, chess agents, and custom engines. The project emphasizes reproducibility, legal-move validation, rating updates, persistent storage, and live interaction.

Key themes: LLM evaluation, agent benchmarking, SQLite, FastAPI, PGN exports, live broadcast tooling.

TopicMiner

A modular Python toolkit for deterministic, high-volume email classification using classical NLP and machine learning methods. It is designed as an auditable alternative to LLM-based classification for narrow-taxonomy tasks.

Key themes: LDA, Word2Vec, Doc2Vec, K-Means, Random Forest, XGBoost, deterministic NLP.

AdaptiveLASSO

A statistical computing project implementing adaptive LASSO in Python with a clean workflow inspired by Statsmodels. The project shows applied statistical modeling, packaging, testing, documentation, and reproducibility.

Key themes: regression, shrinkage, statistical computing, Python packaging, reproducible modeling.
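
For readers unfamiliar with the method, the standard adaptive LASSO estimator (Zou, 2006) can be written as below. This is the textbook formulation; the exact parameterization used by the package, such as the choice of initial estimator or scaling of the loss, is not stated here and may differ.

```latex
\hat{\beta}^{\text{adapt}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \hat{w}_j \,\lvert \beta_j \rvert,
\qquad
\hat{w}_j = \frac{1}{\lvert \hat{\beta}_j^{\text{init}} \rvert^{\gamma}},
\quad \gamma > 0
```

Here $\hat{\beta}^{\text{init}}$ is an initial consistent estimate (for example, ordinary least squares), so coefficients that start out large receive smaller penalties than coefficients that start out near zero.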

iSignify

A bioinformatics and AI pipeline focused on local/on-device interpretation and custom k-mer comparison for identifying unique genomic signatures across large datasets.

Key themes: private AI, bioinformatics, k-mer algorithms, local inference, computational efficiency.
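
A minimal sketch of the general idea behind k-mer comparison follows: collect every length-k substring of a target sequence and keep only those absent from all comparison sequences. The function names and the toy sequences are hypothetical, and iSignify's custom algorithm is presumably far more memory- and time-efficient than this set-based version.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target: str, others: list[str], k: int) -> set[str]:
    """Return k-mers found in the target but in none of the other sequences."""
    background: set[str] = set()
    for seq in others:
        background |= kmers(seq, k)
    return kmers(target, k) - background


# Toy sequences for illustration only.
target = "ACGTAC"
others = ["ACGTTT"]

signature = sorted(unique_kmers(target, others, k=3))
print(signature)  # ['GTA', 'TAC']
```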


Professional Background Snapshot

Gen AI Python Systems Engineer — PwC Advisory Services

Contributing to production-grade Generative AI systems and building modular Python pipeline components for enterprise AI delivery.

Recent work includes object-oriented pipeline architecture, complex data ingestion, Pydantic validation, production Python development, unit testing, and peer-reviewed code delivery.

Statistician / Data Scientist, GS-14 — IRS Research, Analytics, and Applied Statistics

Led and supported projects involving LLM/RAG interfaces, email classification, audit automation, executive briefings, forecasting, and Python ETL systems.

This work combined statistical modeling, automation, technical leadership, and communication with senior federal stakeholders.

Data Scientist — McDermott International

Built forecasting systems, workforce planning models, Python statistical tooling, and automated executive reporting workflows.

Work included time-series forecasting, statistical learning, internal Python tool development, and applied statistics training.

Data Scientist — Avangard Innovative

Developed computer vision models, real-time predictive modeling systems, API normalization workflows, and statistical process analysis tools.

Work included CNN image classification, sensor-data modeling, API-driven normalization, ANOVA/Tukey testing, SQL, and time-series forecasting.


Technical Stack

Languages & Core Tools

Python · SQL · Linux · GitHub · GitLab · VS Code · JupyterLab

Data Science & Machine Learning

Pandas · NumPy · Scikit-Learn · Statsmodels · PyTorch · XGBoost · Random Forests · LDA · Word2Vec · Doc2Vec

GenAI & Engineering

LLMs · RAG · Agentic Workflows · FastAPI · Docker · Pydantic · SQLite · Local-First AI · Human-in-the-Loop Review

Statistical Methods

Time Series · SARIMAX · LOWESS · Regression · Classification · Multivariate Analysis · Forecasting · Shrinkage Methods

Engineering Practices

Unit Testing · Code Review · Modular Architecture · ETL Automation · Data Validation · Reproducible Workflows · Documentation


Portfolio Principles

I am especially interested in AI and data systems that are:

  • Useful in real operational settings
  • Reviewable and explainable
  • Reliable under messy real-world data
  • Statistically defensible
  • Testable and maintainable
  • Production-ready rather than demo-only

My preferred approach is pragmatic: use GenAI where it adds value, use classical ML where it is stronger, and use deterministic validation wherever reliability matters.


Connect

Please use the contact and social links in my GitHub profile sidebar.

GitHub: github.com/AaronNHorvitz

LinkedIn: linkedin.com/in/aaron-horvitz-a666215

Resume: Download Resume

Pinned

  1. TopicMiner (Public)

    A Python Library for Advanced Email Topic Modeling and Text Preprocessing

    Jupyter Notebook

  2. AdaptiveLASSO (Public)

    Adaptive LASSO for linear regression with a Statsmodels-style API.

    Python 11 2

  3. iSignify (Public)

    A microbial DNA signature program for identifying unique DNA signatures in microbial species.

    Python 2 2