LLM Behaviour Design & Validation Pipeline

(Originally developed and validated as an AI Tutor testing pipeline)

A general-purpose toolkit for designing, stress-testing, and validating the behaviour of LLM-based chatbots — without writing code.

What Is This?

This repository provides a behaviour design and evaluation pipeline for Large Language Model (LLM) chatbots.

You define how a chatbot should behave using a system prompt, test that behaviour against many inputs at scale, and optionally automatically evaluate whether the chatbot followed your rules — all using simple text files and CSVs.

If you can edit a text file, you can use this system.

Primary Validated Use Case: AI Tutors (Academic Integrity)

This pipeline was originally built to design and validate AI tutors for university courses, with a strong focus on academic integrity.

The validated tutor behaviour:

Helps students learn through explanation and guidance
Encourages critical thinking
Refuses to give direct answers to questions that contribute to grades or marks
Detects graded-style questions and red-flag phrasing
Resists jailbreaks, manipulation, and authority appeals
Maintains a professional, supportive teaching tone

This AI tutor use case has been:

Tested against 140+ adversarial prompts
Manually reviewed
Automatically evaluated with a custom rubric
Demonstrated to produce zero direct answer leaks

Broader Applications (Not Limited to Tutoring)

Although tutoring is the flagship example, this pipeline is not tutor-specific.

It can be used to design, test, and validate any constrained or safety-critical LLM behaviour, including:

Corporate compliance assistants
Mental-health or wellbeing chatbots with strict safety rules
Customer-support bots that must avoid legal or medical advice
Study coaches that guide without solving
Policy-restricted or refusal-heavy assistants
Comparative testing of different system prompt designs

If you need confidence that a chatbot consistently follows rules, this pipeline is applicable.

What This Repo Is (and Isn’t)

✅ This repo does

Shape chatbot behaviour via system prompts
Stress-test behaviour using CSV-based batch inputs
Record refusals, errors, token usage, and cost
Automatically evaluate behaviour using LLM-as-a-judge
Support iterative refinement and validation

❌ This repo does not

Provide a chatbot UI or frontend
Replace your LMS or chatbot platform
Train or fine-tune models
Generate direct answers to graded questions (for tutor use cases)

Think of this as a behaviour design, testing, and validation layer — not the chatbot itself.

📚 Documentation (Start Here)

All detailed documentation lives in the Wiki.

👉 🏠 Home – Overview & Concepts
(Recommended starting point)

Quick Start (Minimal)

python3 llm_batch_processor.py \
  --input prompts.csv \
  --output responses.csv \
  --system system_prompt.txt

Full setup and explanation: 🚀 Getting Started

Who Is This For?

Educators and course coordinators
Academic researchers studying AI behaviour
Educational technologists
Safety and alignment researchers
Anyone designing rule-bound or refusal-aware LLM systems

No programming background required.

##Licence MIT Licence - free to use, modify, and distribute

##Final Note This project exists to make responsible, constrained, and verifiable AI behaviour practical — not theoretical.

Whether you’re building an AI tutor or any other carefully bounded chatbot, this pipeline gives you control, visibility, and confidence.

👉 Start here: 🏠 Home – Overview & Concepts

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
Sample Evaluations		Sample Evaluations
5.2_prompt2_evaluation.csv		5.2_prompt2_evaluation.csv
Education LLM Assessment Criteria.docx		Education LLM Assessment Criteria.docx
README.md		README.md
evaluated_responses_gpt5.csv		evaluated_responses_gpt5.csv
llm_batch_processor.py		llm_batch_processor.py
llm_evaluator.py		llm_evaluator.py
prompts.csv		prompts.csv
responses.csv		responses.csv
responses_5.2_prompt2.csv		responses_5.2_prompt2.csv
rubric.txt		rubric.txt
system_prompt.txt		system_prompt.txt
system_prompt2.txt		system_prompt2.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Behaviour Design & Validation Pipeline

What Is This?

Primary Validated Use Case: AI Tutors (Academic Integrity)

Broader Applications (Not Limited to Tutoring)

What This Repo Is (and Isn’t)

✅ This repo does

❌ This repo does not

📚 Documentation (Start Here)

Recommended Reading Order

Quick Start (Minimal)

Who Is This For?

##Licence MIT Licence - free to use, modify, and distribute

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLM Behaviour Design & Validation Pipeline

What Is This?

Primary Validated Use Case: AI Tutors (Academic Integrity)

Broader Applications (Not Limited to Tutoring)

What This Repo Is (and Isn’t)

✅ This repo does

❌ This repo does not

📚 Documentation (Start Here)

Recommended Reading Order

Quick Start (Minimal)

Who Is This For?

##Licence MIT Licence - free to use, modify, and distribute

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages