This project explores how to evaluate the performance of GenAI models on financial summarization tasks using metrics like ROUGE. It includes a hands-on pipeline built with Hugging Face Transformers, Python, and Streamlit, designed to support a transition from compliance roles to AI engineering.
- Build an end-to-end evaluation pipeline for financial text summarization
- Compare GenAI-generated summaries against reference texts using ROUGE (see the sketch after this list)
- Visualize evaluation results with Streamlit dashboards
- Document learnings and progress toward AI engineering readiness
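
The core comparison step fits in a few lines of Python. The sketch below assumes the `transformers` and `evaluate` packages are installed; the model checkpoint, sample text, and reference summary are illustrative placeholders, not the project's actual configuration.

```python
# Minimal sketch: generate a summary and score it against a reference with ROUGE.
import evaluate
from transformers import pipeline

# Illustrative checkpoint; any seq2seq summarization model works here.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

source_text = (
    "The company reported quarterly revenue of $4.2 billion, up 8% "
    "year over year, driven by growth in its asset-management division."
)
reference_summary = "Quarterly revenue rose 8% to $4.2B on asset-management growth."

# Generate a candidate summary with the model.
candidate = summarizer(source_text, max_length=40, min_length=10)[0]["summary_text"]

# Compare the candidate against the reference using ROUGE.
rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=[candidate], references=[reference_summary])
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum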
| Tool/Library | Purpose |
|---|---|
| Python | Core scripting and data handling |
| Hugging Face Transformers | GenAI model loading and inference |
| evaluate | ROUGE metric computation |
| Streamlit | Dashboard visualization (sketch below) |
| Git + GitHub | Version control and project tracking |
| VS Code + Jupyter | Development environment |
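
For the dashboard layer, a minimal Streamlit sketch might look like the following. The model names and score values are hypothetical placeholders; the real app would load results produced by the evaluation scripts rather than hard-coding them.

```python
# Minimal Streamlit dashboard sketch for browsing ROUGE results.
import pandas as pd
import streamlit as st

st.title("Financial Summarization: ROUGE Scores")

# Placeholder results; in the real pipeline these would come from the
# evaluation scripts under Mini_Project/scripts/.
scores = pd.DataFrame(
    {
        "model": ["distilbart-cnn-12-6", "t5-small"],
        "rouge1": [0.47, 0.41],
        "rouge2": [0.23, 0.18],
        "rougeL": [0.39, 0.34],
    }
)

st.dataframe(scores)
st.bar_chart(scores.set_index("model"))
```

Saved as, say, `app.py`, this runs with `streamlit run app.py`.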
```
LLM-Evaluation-Prototype/
├── Mini_Project/
│   ├── scripts/          # Evaluation scripts
│   └── data/             # Sample inputs and outputs
├── Learning_Notes/       # Daily reflections and learnings
└── README.md             # Project overview
```