AI Engineer | Software Engineer | LLM & MLOps Researcher
Research Assistant @ Micro Electronics Research Lab (MERL)
LFX'25 Mentee @ RISC-V International
Open-source contributor focused on production-ready AI systems, LLM evaluation, and reproducible ML pipelines.
I'm an AI Engineer and Researcher working at the intersection of LLMs, MLOps, and open-source systems.
I build reliable, testable, and deployment-ready AI pipelines rather than experimental-only models.
- LLM Evaluation & Benchmarking: functional, syntactic, adversarial
- Hallucination Mitigation: GAN-based approaches for private LLMs
- Reproducible ML Pipelines: CI/CD, logging, SLA-aware validation
- RISC-V Tooling & Data: machine-readable specifications and verification
Making AI systems trustworthy in production is my passion.
- Research Assistant, MERL: LLM evaluation pipelines, benchmarking frameworks, RISC-V tooling
- LFX'25 Mentee, RISC-V International: machine-readable RISC-V specifications, schemas, and CI validation
Languages: Python · Scala · Verilog · Java · Shell · JavaScript · HTML · CSS
AI / ML: PyTorch · TensorFlow · Hugging Face Transformers · GANs · LLM Evaluation · NumPy · Pandas · Scikit-learn
MLOps & Engineering: CI/CD · Docker · REST/gRPC · Logging & Monitoring · Reproducible Pipelines · Git · GitHub Actions · Linux · pytest
Data & Config: JSON · YAML · MySQL
AI4org: GAN-based Hallucination Mitigation for Private LLMs
GitHub Repository
- Built a privacy-first ML pipeline to detect and mitigate hallucinations in private LLMs
- Designed a GAN-style generator/discriminator for hallucination detection
- End-to-end pipeline: ingestion → validation → reproducible training → containerized inference
- Integrated CI/CD, automated testing, and monitoring for production readiness
Designed for enterprise and on-prem LLM deployments where reliability matters.
ArcheV: LLM Benchmark Suite
GitHub Repository
- Engineered a reproducible LLM benchmarking framework
- Standardized JSON I/O and CI-driven evaluation pipelines
- Validates functional and syntactic correctness for deployment decisions
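As a rough illustration of what checking syntactic and functional correctness with JSON output could look like, here is a minimal sketch. It is not ArcheV's actual code: the `evaluate_candidate` function, the `solve` entry point, and the result field names are all assumptions made for this example, and it only handles Python candidates.

```python
import ast
import json


def evaluate_candidate(source: str, cases: list[tuple[tuple, object]]) -> dict:
    """Score one model-generated Python snippet (hypothetical helper, not ArcheV's API)."""
    result = {"syntactic": False, "functional": False, "passed": 0, "total": len(cases)}

    # Syntactic check: does the snippet parse at all?
    try:
        ast.parse(source)
    except SyntaxError:
        return result
    result["syntactic"] = True

    # Functional check: run the snippet, then call its `solve` function on each case.
    # NOTE: exec() on untrusted model output is unsafe; a real harness would sandbox it.
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception:
        return result
    solve = namespace.get("solve")
    if not callable(solve):
        return result

    for args, expected in cases:
        try:
            if solve(*args) == expected:
                result["passed"] += 1
        except Exception:
            pass  # a crashing case counts as a failure
    result["functional"] = result["total"] > 0 and result["passed"] == result["total"]
    return result


good = evaluate_candidate("def solve(a, b):\n    return a + b\n", [((1, 2), 3), ((0, 0), 0)])
broken = evaluate_candidate("def solve(a, b) return a + b", [((1, 2), 3)])

# Emit one JSON document so a CI job can archive or diff the results.
print(json.dumps({"good": good, "broken": broken}, indent=2))
```

Keeping the result a plain dictionary means the same record can be written to disk, compared across runs, or checked against a threshold in CI.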
RISC-V Unified Database
GitHub Repository
- Maintained versioned YAML/JSON schemas for RISC-V tooling
- Implemented CI validation to ensure data integrity and observability
- Improved downstream reliability for tooling and ML pipelines
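A CI check of this kind can be sketched in a few lines. The example below is illustrative only: the `name`, `long_name`, and `ratified` fields, the `validate_record` helper, and the sample records are hypothetical, not the actual schema or data of the RISC-V Unified Database. It uses a hand-rolled check rather than a schema library so that it runs with the standard library alone.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
EXTENSION_SCHEMA = {"name": str, "long_name": str, "ratified": bool}


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the record is valid."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Reject unknown keys so typos in field names do not pass silently.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# Made-up sample records for illustration.
valid = json.loads('{"name": "Xdemo", "long_name": "Demo extension", "ratified": true}')
invalid = json.loads('{"name": "Xdraft", "ratified": "yes", "notes": "draft"}')

valid_errors = validate_record(valid, EXTENSION_SCHEMA)
invalid_errors = validate_record(invalid, EXTENSION_SCHEMA)
print(valid_errors)
print(invalid_errors)
```

In a CI job, a non-empty error list would typically be turned into a non-zero exit code so that the run fails before bad data reaches downstream tools.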
- Linux Foundation Mentorship Program (LFX) 2025
- Research Assistant at MERL
- Improved LLM benchmarking reliability by ~25%
- Hands-on experience with LLMs, GANs, MLOps, and CI/CD
- Contributor to open-source and research-grade tooling
- LinkedIn: Shehroz Kashif
- Email: sharooz57@gmail.com
If you find my work useful, feel free to star a repository.
Open to collaborations in AI, LLMs, MLOps, and open-source systems.


