This repository contains lightweight examples of evaluation artifacts used in LLM quality workflows.
It reflects my interests in:
- LLM evaluation & quality analysis
- Prompt & rubric design
- Reliability, tone, and safety checks
- Data labeling & structured feedback
I work professionally in model evaluation, taxonomy, and annotation. Much of my applied work is proprietary, so this repo includes illustrative examples only.
Contents:
- `evaluator_guidelines.md` – example judging instructions
- `prompt_rubric_example.md` – sample structured rubric
- `test_prompts.csv` – example prompt set
- `simple_eval_script.py` – tiny Python scoring demo (illustrative)
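
As a small illustration of the kind of scoring pass these artifacts support, below is a minimal sketch of a rubric-driven scorer over `test_prompts.csv`. The column names (`prompt`, `response`) and the keyword heuristics are assumptions made for this example, not the repo's actual schema or the contents of `simple_eval_script.py`.

```python
import csv

# Hypothetical rubric: each dimension maps to a scoring function returning
# a value in [0, 1]. Real evaluation would use trained judges (human or
# model) rather than these stand-in keyword heuristics.
RUBRIC = {
    "length_ok": lambda text: 1.0 if 20 <= len(text.split()) <= 300 else 0.0,
    "hedges_uncertainty": lambda text: 1.0 if any(
        w in text.lower() for w in ("may", "might", "depends")
    ) else 0.0,
}


def score_response(text: str) -> dict:
    """Score one response against every rubric dimension."""
    return {dim: fn(text) for dim, fn in RUBRIC.items()}


def main() -> None:
    # Assumed CSV layout: one row per prompt/response pair.
    with open("test_prompts.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            scores = score_response(row["response"])  # assumed column name
            print(row["prompt"][:40], scores)


if __name__ == "__main__":
    main()
```

Keeping each rubric dimension as an independent scoring function makes it easy to add, drop, or replace criteria without touching the scoring loop, which is the main design point this sketch is meant to show.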
Focus areas:
- Clarity
- Alignment
- Educational applications of AI
- Safety & high-quality outputs