This project explores how to evaluate the performance of GenAI models on financial summarization tasks using metrics like ROUGE. It includes a hands-on pipeline built with Hugging Face Transformers, Python, and Streamlit, designed to support a transition from compliance roles to AI engineering.
- Build an end-to-end evaluation pipeline for financial text summarization
- Compare GenAI-generated summaries against reference texts using ROUGE (see the sketch after this list)
- Visualize evaluation results with Streamlit dashboards
- Document learnings and progress toward AI engineering readiness
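
The core comparison step fits in a few lines of Python. The sketch below assumes the `transformers` and `evaluate` packages are installed; the model checkpoint, sample text, and reference summary are illustrative placeholders, not the project's actual configuration.

```python
# Minimal sketch: generate a summary and score it against a reference with ROUGE.
import evaluate
from transformers import pipeline

# Illustrative checkpoint; any seq2seq summarization model works here.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

source_text = (
    "The company reported quarterly revenue of $4.2 billion, up 8% "
    "year over year, driven by growth in its asset-management division."
)
reference_summary = "Quarterly revenue rose 8% to $4.2B on asset-management growth."

# Generate a candidate summary with the model.
candidate = summarizer(source_text, max_length=40, min_length=10)[0]["summary_text"]

# Compare the candidate against the reference using ROUGE.
rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=[candidate], references=[reference_summary])
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum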
| Tool/Library | Purpose |
|---|---|
| Python | Core scripting and data handling |
| Hugging Face Transformers | GenAI model loading and inference |
| evaluate | ROUGE metric computation |
| Streamlit | Dashboard visualization (sketch below) |
| Git + GitHub | Version control and project tracking |
| VS Code + Jupyter | Development environment |
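
For the dashboard layer, a minimal Streamlit sketch might look like the following. The model names and score values are hypothetical placeholders; the real app would load results produced by the evaluation scripts rather than hard-coding them.

```python
# Minimal Streamlit dashboard sketch for browsing ROUGE results.
import pandas as pd
import streamlit as st

st.title("Financial Summarization: ROUGE Scores")

# Placeholder results; in the real pipeline these would come from the
# evaluation scripts under Mini_Project/scripts/.
scores = pd.DataFrame(
    {
        "model": ["distilbart-cnn-12-6", "t5-small"],
        "rouge1": [0.47, 0.41],
        "rouge2": [0.23, 0.18],
        "rougeL": [0.39, 0.34],
    }
)

st.dataframe(scores)
st.bar_chart(scores.set_index("model"))
```

Saved as, say, `app.py`, this runs with `streamlit run app.py`.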
```
LLM-Evaluation-Prototype/
├── Mini_Project/
│   ├── scripts/          # Evaluation scripts
│   └── data/             # Sample inputs and outputs
├── Learning_Notes/       # Daily reflections and learnings
└── README.md             # Project overview
```