feat/mcq-generation #11
Description
Implement the MCQ question generation service. For each slide in a module, this service constructs a generation prompt combining the slide's enriched markdown with the module's style guide fingerprint, sends it to Claude Sonnet 4.6, and parses the structured JSON response into Question and QuestionOption records in the database. Both standard mode (single correct answer, 4 options) and hard mode (multiple correct answers, 4–6 options) questions are generated. Generation is submitted via the Anthropic Batch API for cost efficiency.
Context
Refer to the project context document for the full tech stack, architecture, and design principles. This issue depends on feat/enriched-markdown-store (slides must have enriched_markdown) and feat/past-exam-ingestion (module must have a style_guide) being completed. The batch_jobs table introduced in feat/image-description is reused here with job_type == mcq_generation.
Key design decisions:
- Generation is per slide — one batch request item per slide. This is the natural chunk boundary and ensures questions are correctly attributed to their source slide and lecture.
- The style guide JSON from `feat/past-exam-ingestion` is injected as a prefix in every generation prompt — it is not re-extracted per call.
- The LLM must return only a JSON array of question objects — no preamble, no markdown fences. The prompt enforces this strictly.
- Each slide generates 3–5 questions by default (configurable): a mix of standard and hard mode questions. The exact split is determined by the LLM based on slide content richness.
- Questions are only generated once per slide. Regeneration is supported but requires explicit user action (covered in `feat/regeneration`, Milestone 6).
- Generated questions must be validated before storage — malformed responses are retried once, then marked as failed.
Todos
Backend
- Add to `core/config.py`:
  - `MCQ_GENERATION_MODEL` (default: `"claude-sonnet-4-6"`)
  - `MCQ_GENERATION_MAX_TOKENS` (default: `2000`)
  - `MCQ_QUESTIONS_PER_SLIDE` (default: `4`)
- Create `/backend/app/services/mcq_generation_service.py` with:
  - `build_generation_prompt(slide: Slide, style_guide: StyleGuide, mode: str) -> str`
    - Constructs the full prompt with three sections:
      - Style guide prefix — pastes the style guide's `style_summary` JSON and instructs the model to match this style
      - Slide content — pastes the slide's `enriched_markdown`
      - Generation instructions — instructs the model to generate exactly `MCQ_QUESTIONS_PER_SLIDE` questions
    - The generation instructions must specify:
      - Return only a JSON array — no text before or after, no markdown code fences
      - Each question object must follow this exact schema:

        ```json
        {
          "question_text": "string",
          "question_type": "single" | "multi",
          "options": ["string", "string", "string", "string"],
          "correct_indices": [0],
          "difficulty": "easy" | "medium" | "hard",
          "topic_tags": ["string"]
        }
        ```

      - For standard mode (`mode == "standard"`):
        - `question_type` must be `"single"`
        - `options` must have exactly 4 elements
        - `correct_indices` must have exactly 1 element
      - For hard mode (`mode == "hard"`):
        - Mix `"single"` and `"multi"` types — at least 40% should be `"multi"`
        - `options` array length must vary between 4 and 6 per question — do not make all questions the same length
        - `"multi"` questions must have between 2 and `n_options - 1` correct answers
        - Never reveal the number of correct answers in the `question_text`
        - Distractors must be plausible and closely related to correct answers — not obviously wrong
      - For both modes:
        - Questions must be based only on the provided slide content
        - Difficulty should reflect the `difficulty_distribution` from the style guide
        - `topic_tags` should be 1–3 short lowercase strings (e.g. `["sorting", "time-complexity"]`)
        - Do not repeat the same question concept across questions for the same slide
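The three prompt sections could be assembled roughly like this. This is a sketch, not the final implementation: it takes the slide's markdown and the style summary directly as arguments (rather than ORM objects), and the instruction wording is illustrative.

```python
import json


def build_generation_prompt(enriched_markdown: str, style_summary: dict,
                            mode: str, questions_per_slide: int = 4) -> str:
    """Concatenate style-guide prefix, slide content, and generation instructions."""
    if mode == "standard":
        mode_rules = (
            'Every question must have question_type "single", exactly 4 options, '
            'and exactly 1 correct index.'
        )
    else:
        mode_rules = (
            'Mix "single" and "multi" question types; at least 40% should be "multi". '
            'Vary the options length between 4 and 6 per question. "multi" questions '
            'must have between 2 and n_options - 1 correct indices. Never reveal the '
            'number of correct answers in the question text.'
        )
    return (
        "## Style guide\n"
        "Match the style described by this JSON:\n"
        f"{json.dumps(style_summary, indent=2)}\n\n"
        "## Slide content\n"
        f"{enriched_markdown}\n\n"
        "## Instructions\n"
        f"Generate exactly {questions_per_slide} multiple-choice questions "
        "based only on the slide content above. "
        f"{mode_rules} "
        "Return ONLY a JSON array of question objects - no preamble, "
        "no markdown code fences."
    )
```

The three-section order matters: putting the style prefix first lets every per-slide request share an identical prompt head, which also helps prompt caching.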
  - `validate_question_json(raw_json: str) -> list[dict]`
    - Parses the JSON string — raises `ValidationError` if it is not valid JSON
    - Validates that each question object has all required fields with correct types
    - Validates that all `correct_indices` are valid indices into `options`
    - Validates that `question_type == "single"` has exactly 1 correct index
    - Validates that `question_type == "multi"` has 2 or more correct indices
    - Validates that `n_options` (the length of `options`) is 4, 5, or 6
    - Returns the validated list of question dicts
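A minimal sketch of those validation rules, using a plain `ValueError` in place of the service's `ValidationError` (assumed to be defined elsewhere in the backend):

```python
import json

# Required fields and their expected Python types.
REQUIRED_FIELDS = {
    "question_text": str, "question_type": str, "options": list,
    "correct_indices": list, "difficulty": str, "topic_tags": list,
}


def validate_question_json(raw_json: str) -> list[dict]:
    """Parse and validate the model's JSON array of question objects."""
    try:
        questions = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(questions, list):
        raise ValueError("response must be a JSON array")
    for i, q in enumerate(questions):
        for field, ftype in REQUIRED_FIELDS.items():
            if not isinstance(q.get(field), ftype):
                raise ValueError(f"question {i}: missing or mistyped field {field!r}")
        n_options = len(q["options"])
        if n_options not in (4, 5, 6):
            raise ValueError(f"question {i}: n_options must be 4, 5, or 6")
        if not all(isinstance(c, int) and 0 <= c < n_options
                   for c in q["correct_indices"]):
            raise ValueError(f"question {i}: correct_indices out of range")
        if q["question_type"] == "single" and len(q["correct_indices"]) != 1:
            raise ValueError(f"question {i}: 'single' needs exactly 1 correct index")
        if q["question_type"] == "multi" and len(q["correct_indices"]) < 2:
            raise ValueError(f"question {i}: 'multi' needs 2+ correct indices")
    return questions
```

Raising on the first bad question (rather than dropping it) keeps the retry semantics simple: any defect triggers the single corrective retry described below.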
  - `store_questions(slide_id: UUID, questions: list[dict], db: AsyncSession) -> list[Question]`
    - For each validated question dict:
      - Creates a `Question` record with all fields populated
      - Sets `n_options = len(options)`
      - Creates one `QuestionOption` record per option with the correct `index` value
    - Returns the list of created `Question` records
  - `build_batch_requests(module_id: UUID, mode: str, db: AsyncSession) -> list[dict]`
    - Fetches all slides for the module where `enriched_markdown` is not null
    - Fetches the module's `StyleGuide`
    - For each slide, calls `build_generation_prompt()` and constructs an Anthropic batch request dict with `custom_id = str(slide.id)`
    - Returns the full list of batch request dicts, ready for submission
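Each request item would take roughly this shape, following the Anthropic Message Batches request format (a sketch; the defaults shown are the config values named above):

```python
def build_batch_request(slide_id: str, prompt: str,
                        model: str = "claude-sonnet-4-6",
                        max_tokens: int = 2000) -> dict:
    """One Message Batches request item; custom_id ties the result back to its slide."""
    return {
        "custom_id": slide_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Because `custom_id` is the slide UUID, result processing needs no extra bookkeeping table to attribute questions to slides.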
  - `submit_generation_batch(module_id: UUID, mode: str, db: AsyncSession) -> str`
    - Calls `build_batch_requests()` to get all request dicts
    - Submits via `client.messages.batches.create(requests=[...])`
    - Creates a `BatchJob` record with `job_type == mcq_generation` and stores `mode` in a `metadata` JSONB field on the `batch_jobs` table
    - Returns the Anthropic batch ID
  - `process_generation_results(batch_job_id: UUID, db: AsyncSession) -> GenerationResult`
    - Retrieves batch results from Anthropic
    - For each result:
      - Extracts `slide_id` from `custom_id`
      - Calls `validate_question_json()` on the response text
      - If valid: calls `store_questions()` and sets the slide's generation status to `complete`
      - If invalid: attempts one retry by calling the API synchronously with an explicit correction instruction
      - If the retry also fails: logs the raw response and marks the slide's generation status as `failed`
    - Updates the `BatchJob` to `complete`
    - Returns a `GenerationResult` dataclass: `{total_slides, succeeded, failed, total_questions_created}`
- Add `generation_status` and `generation_mode` fields to the `slides` table (new Alembic migration):
  - `generation_status` (enum: `not_started` / `processing` / `complete` / `failed`, default: `not_started`)
  - `generation_mode` (enum: `standard` / `hard`, nullable)
- Add `metadata` (JSONB, nullable) to the `batch_jobs` table (new Alembic migration) to store arbitrary job context (e.g. the generation mode)
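The two migrations could look roughly like this (a sketch, not runnable standalone — revision identifiers, the `downgrade()`, and enum naming follow whatever conventions the project's existing Alembic migrations use):

```python
# Alembic migration sketch; assumes a PostgreSQL backend.
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import JSONB


def upgrade() -> None:
    op.add_column("slides", sa.Column(
        "generation_status",
        sa.Enum("not_started", "processing", "complete", "failed",
                name="slide_generation_status"),
        nullable=False, server_default="not_started"))
    op.add_column("slides", sa.Column(
        "generation_mode",
        sa.Enum("standard", "hard", name="slide_generation_mode"),
        nullable=True))
    op.add_column("batch_jobs", sa.Column("metadata", JSONB(), nullable=True))
```

One caveat worth noting: `metadata` is a reserved attribute name on SQLAlchemy declarative models, so the ORM mapping will need a different Python attribute name mapped to the `metadata` column (e.g. `job_metadata`).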
- Create API endpoints in `/backend/app/api/routes/questions.py` (all protected):
  - `POST /api/modules/{module_id}/generate-questions` — accepts a `{"mode": "standard" | "hard"}` body, calls `submit_generation_batch()` as a background task, returns `{"batch_job_id": "...", "total_slides": n}`
  - `GET /api/modules/{module_id}/questions` — returns all questions for the module; supports the query params `?lecture_id=`, `?difficulty=`, `?question_type=`, `?limit=`, and `?offset=`
  - `GET /api/modules/{module_id}/generation-status` — returns a per-slide generation status summary and overall progress
  - `GET /api/questions/{question_id}` — returns a single question with all its options
- Extend the background polling task from `feat/image-description` to also poll `mcq_generation` batch jobs and call `process_generation_results()` when they complete
Frontend
- On the module detail page, add a "Generate Questions" section that:
  - Is enabled only when `module.processing_status == complete` (all slides enriched) and a style guide exists
  - Shows a mode selector: "Standard" / "Hard Mode" toggle
  - Shows a "Generate" button that calls `POST /api/modules/{module_id}/generate-questions`
  - Switches to a progress display after clicking: `X / Y slides processed`, polling `GET /api/modules/{module_id}/generation-status` every 10 seconds
  - Shows a completion summary: `N questions generated across L lectures`
- Create a `QuestionCard` component (`/frontend/components/questions/QuestionCard.tsx`) that:
  - Displays the question text
  - Renders options as radio buttons (standard) or checkboxes (hard mode multi)
  - Shows a difficulty badge (easy / medium / hard) with colour coding (green / yellow / red)
  - Shows topic tags as small pill labels
  - Is used in both the question browser and the study session (Milestone 5)
- Create a question browser page at `/modules/[id]/questions` that:
  - Lists all generated questions for the module
  - Supports filtering by lecture, difficulty, and question type
  - Renders each question using `QuestionCard` in read-only/preview mode (answers not revealed)
Acceptance Criteria
- `POST /api/modules/{module_id}/generate-questions` with `mode: "standard"` creates questions where every question has `question_type == "single"`, exactly 4 options, and exactly 1 correct index
- `POST /api/modules/{module_id}/generate-questions` with `mode: "hard"` creates a mix that includes `"multi"` questions with 4–6 options and 2+ correct indices
- Hard mode questions never reveal the number of correct answers in the question text (spot-check 10 questions)
- Every generated question has `correct_indices` that are valid indices into its `options` array
- Every generated question is correctly attributed to its source `slide_id`, `lecture_id`, and `module_id`
- `topic_tags` and `difficulty` are populated for every question
- A slide with a generation failure does not block other slides from completing — failed slides are marked individually
- `GET /api/modules/{module_id}/questions?lecture_id=X` returns only questions from that lecture
- `GET /api/modules/{module_id}/generation-status` correctly reflects per-slide progress
- The question browser page renders questions correctly, with difficulty badges and topic tags
- Calling `POST /api/modules/{module_id}/generate-questions` when no style guide exists returns HTTP 400
- Calling `POST /api/modules/{module_id}/generate-questions` when not all slides are enriched returns HTTP 400 with a count of how many slides are not yet ready