feat/mcq-generation #11

@alretum

Description

Implement the MCQ question generation service. For each slide in a module, this service constructs a generation prompt combining the slide's enriched markdown with the module's style guide fingerprint, sends it to Claude Sonnet 4.6, and parses the structured JSON response into Question and QuestionOption records in the database. Both standard mode (single correct answer, 4 options) and hard mode (multiple correct answers, 4–6 options) questions are generated. Generation is submitted via the Anthropic Batch API for cost efficiency.

Context

Refer to the project context document for the full tech stack, architecture, and design principles. This issue depends on feat/enriched-markdown-store (slides must have enriched_markdown) and feat/past-exam-ingestion (module must have a style_guide) being completed. The batch_jobs table introduced in feat/image-description is reused here with job_type == mcq_generation.

Key design decisions:

  • Generation is per slide — one batch request item per slide. This is the natural chunk boundary and ensures questions are correctly attributed to their source slide and lecture.
  • The style guide JSON from feat/past-exam-ingestion is injected as a prefix in every generation prompt — it is not re-extracted per call.
  • The LLM must return only a JSON array of question objects — no preamble, no markdown fences. The prompt enforces this strictly.
  • Each slide generates 3–5 questions by default (configurable): a mix of standard and hard mode questions. The exact split is determined by the LLM based on slide content richness.
  • Questions are only generated once per slide. Regeneration is supported but requires explicit user action (covered in feat/regeneration, Milestone 6).
  • Generated questions must be validated before storage — malformed responses are retried once, then marked as failed.

Todos

Backend

  • Add to core/config.py:

    • MCQ_GENERATION_MODEL (default: "claude-sonnet-4-6")
    • MCQ_GENERATION_MAX_TOKENS (default: 2000)
    • MCQ_QUESTIONS_PER_SLIDE (default: 4)
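The three settings above could look like the following. This is a minimal self-contained sketch using a plain dataclass; the real `core/config.py` may use Pydantic settings instead, and the `MCQGenerationSettings` class name is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MCQGenerationSettings:
    """New MCQ generation settings, with the defaults from this issue."""
    MCQ_GENERATION_MODEL: str = "claude-sonnet-4-6"
    MCQ_GENERATION_MAX_TOKENS: int = 2000
    MCQ_QUESTIONS_PER_SLIDE: int = 4
```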
  • Create /backend/app/services/mcq_generation_service.py with:

    build_generation_prompt(slide: Slide, style_guide: StyleGuide, mode: str) -> str

    • Constructs the full prompt with three sections:
      1. Style guide prefix — pastes the style guide's style_summary JSON and instructs the model to match this style
      2. Slide content — pastes the slide's enriched_markdown
      3. Generation instructions — instructs the model to generate exactly MCQ_QUESTIONS_PER_SLIDE questions
    • The generation instructions must specify:
      • Return only a JSON array — no text before or after, no markdown code fences
      • Each question object must follow this exact schema:
        {
          "question_text": "string",
          "question_type": "single" | "multi",
          "options": ["string", "string", "string", "string"],
          "correct_indices": [0],
          "difficulty": "easy" | "medium" | "hard",
          "topic_tags": ["string"]
        }
      • For standard mode (mode == "standard"):
        • question_type must be "single"
        • options must have exactly 4 elements
        • correct_indices must have exactly 1 element
      • For hard mode (mode == "hard"):
        • Mix "single" and "multi" types — at least 40% should be "multi"
        • options array length must vary between 4 and 6 per question — do not make all questions the same length
        • "multi" questions must have between 2 and n_options - 1 correct answers
        • Never reveal the number of correct answers in the question_text
        • Distractors must be plausible and closely related to correct answers — not obviously wrong
      • For both modes:
        • Questions must be based only on the provided slide content
        • Difficulty should reflect the difficulty_distribution from the style guide
        • topic_tags should be 1–3 short lowercase strings (e.g. ["sorting", "time-complexity"])
        • Do not repeat the same question concept across questions for the same slide
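A sketch of how `build_generation_prompt` could assemble the three sections. It takes plain values rather than the `Slide`/`StyleGuide` ORM objects so the example is self-contained; the section headers and exact wording are illustrative, not the final prompt text.

```python
import json

def build_generation_prompt(enriched_markdown: str, style_summary: dict,
                            mode: str, n_questions: int = 4) -> str:
    """Assemble style prefix, slide content, and generation instructions."""
    # Mode-specific rules, condensed from the spec above.
    mode_rules = (
        '- question_type must be "single"\n'
        "- options must have exactly 4 elements\n"
        "- correct_indices must have exactly 1 element"
        if mode == "standard" else
        '- mix "single" and "multi" types; at least 40% should be "multi"\n'
        "- vary options length between 4 and 6 per question\n"
        '- "multi" questions need between 2 and n_options - 1 correct answers\n'
        "- never reveal the number of correct answers in question_text"
    )
    return (
        "## Style guide\n"
        "Match the question style described by this JSON:\n"
        f"{json.dumps(style_summary, indent=2)}\n\n"
        "## Slide content\n"
        f"{enriched_markdown}\n\n"
        "## Instructions\n"
        f"Generate exactly {n_questions} questions based only on the slide "
        "content above. Return ONLY a JSON array of question objects - no "
        "text before or after, no markdown code fences.\n"
        f"{mode_rules}"
    )
```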

    validate_question_json(raw_json: str) -> list[dict]

    • Parses the JSON string — raises ValidationError if not valid JSON
    • Validates each question object has all required fields with correct types
    • Validates correct_indices are all valid indices into options
    • Validates question_type == "single" has exactly 1 correct index
    • Validates question_type == "multi" has 2 or more correct indices
    • Validates n_options (len of options) is 4, 5, or 6
    • Returns the validated list of question dicts
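The validation rules above translate fairly directly into code. A sketch, with `ValidationError` defined locally as a `ValueError` subclass (the project may already have its own exception type):

```python
import json

class ValidationError(ValueError):
    pass

# Required fields and their expected JSON types.
REQUIRED_FIELDS = {
    "question_text": str, "question_type": str, "options": list,
    "correct_indices": list, "difficulty": str, "topic_tags": list,
}

def validate_question_json(raw_json: str) -> list[dict]:
    """Parse and validate the model's response per the rules above."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as e:
        raise ValidationError(f"not valid JSON: {e}") from e
    if not isinstance(data, list):
        raise ValidationError("top level must be a JSON array")
    for i, q in enumerate(data):
        if not isinstance(q, dict):
            raise ValidationError(f"question {i}: not an object")
        for field, typ in REQUIRED_FIELDS.items():
            if not isinstance(q.get(field), typ):
                raise ValidationError(f"question {i}: bad or missing {field!r}")
        n = len(q["options"])
        if n not in (4, 5, 6):
            raise ValidationError(f"question {i}: n_options must be 4-6, got {n}")
        idxs = q["correct_indices"]
        if not all(isinstance(j, int) and 0 <= j < n for j in idxs):
            raise ValidationError(f"question {i}: correct_indices out of range")
        if q["question_type"] == "single" and len(idxs) != 1:
            raise ValidationError(f"question {i}: 'single' needs exactly 1 correct index")
        if q["question_type"] == "multi" and len(idxs) < 2:
            raise ValidationError(f"question {i}: 'multi' needs 2+ correct indices")
    return data
```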

    store_questions(slide_id: UUID, questions: list[dict], db: AsyncSession) -> list[Question]

    • For each validated question dict:
      • Creates a Question record with all fields populated
      • Sets n_options = len(options)
      • Creates one QuestionOption record per option with correct index value
    • Returns the list of created Question records
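The mapping from a validated question dict to rows could be sketched as below. The real `store_questions` would construct SQLAlchemy `Question`/`QuestionOption` instances on the `AsyncSession`; plain dicts stand in here so the shape of the persisted data is visible, and the column names are assumptions based on this issue.

```python
import uuid

def build_question_rows(slide_id, questions):
    """Map validated question dicts to (question_row, option_rows) pairs,
    mirroring what store_questions would persist via the ORM."""
    out = []
    for q in questions:
        question_id = uuid.uuid4()
        question_row = {
            "id": question_id,
            "slide_id": slide_id,
            "question_text": q["question_text"],
            "question_type": q["question_type"],
            "n_options": len(q["options"]),  # derived, per the spec
            "difficulty": q["difficulty"],
            "topic_tags": q["topic_tags"],
        }
        option_rows = [
            {
                "question_id": question_id,
                "option_index": i,
                "option_text": text,
                "is_correct": i in q["correct_indices"],
            }
            for i, text in enumerate(q["options"])
        ]
        out.append((question_row, option_rows))
    return out
```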

    build_batch_requests(module_id: UUID, mode: str, db: AsyncSession) -> list[dict]

    • Fetches all slides for the module where enriched_markdown is not null
    • Fetches the module's StyleGuide
    • For each slide, calls build_generation_prompt() and constructs an Anthropic batch request dict with custom_id = str(slide.id)
    • Returns the full list of batch request dicts ready for submission
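The request items follow the Anthropic Message Batches shape (`custom_id` plus `params` with `model`, `max_tokens`, `messages`). A sketch, with `(slide_id, enriched_markdown)` pairs standing in for the DB query and a trivially inlined prompt where the real code would call `build_generation_prompt()`:

```python
import json

def build_batch_requests(slides, style_summary, mode,
                         model="claude-sonnet-4-6", max_tokens=2000):
    """Build one Message Batches request item per enriched slide."""
    requests = []
    for slide_id, enriched_markdown in slides:
        # Placeholder prompt; the real service uses build_generation_prompt().
        prompt = (f"Style guide:\n{json.dumps(style_summary)}\n\n"
                  f"Slide:\n{enriched_markdown}\n\n"
                  f"Generate {mode}-mode MCQs as a JSON array only.")
        requests.append({
            "custom_id": str(slide_id),  # lets results map back to the slide
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return requests
```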

    submit_generation_batch(module_id: UUID, mode: str, db: AsyncSession) -> str

    • Calls build_batch_requests() to get all request dicts
    • Submits via client.messages.batches.create(requests=[...])
    • Creates a BatchJob record with job_type == mcq_generation, stores mode in a metadata JSONB field on the batch_jobs table
    • Returns the Anthropic batch ID
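Submission itself is a single SDK call. In this sketch the Anthropic client is passed in, the request items are prebuilt, and the `BatchJob` DB insert is reduced to the metadata dict that would land in the `batch_jobs.metadata` JSONB column; persistence is omitted.

```python
def submit_generation_batch(client, module_id, requests, mode):
    """Submit prebuilt request items via the Message Batches API and
    return the Anthropic batch id plus the BatchJob metadata payload."""
    batch = client.messages.batches.create(requests=requests)
    # Would be stored on a BatchJob row with job_type == "mcq_generation".
    batch_job_metadata = {"mode": mode, "module_id": str(module_id)}
    return batch.id, batch_job_metadata
```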

    process_generation_results(batch_job_id: UUID, db: AsyncSession) -> GenerationResult

    • Retrieves batch results from Anthropic
    • For each result:
      • Extracts slide_id from custom_id
      • Calls validate_question_json() on the response text
      • If valid: calls store_questions() — sets slide's question generation status to complete
      • If invalid: attempts one retry by calling the API synchronously with an explicit correction instruction
      • If retry also fails: logs the raw response, marks slide generation status as failed
    • Updates BatchJob to complete
    • Returns GenerationResult dataclass: {total_slides, succeeded, failed, total_questions_created}
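The per-result validate/retry/fail flow can be sketched with the API, validation, and storage steps injected as callables so the control flow stays visible and testable. `results` is an iterable of `(slide_id, response_text)` pairs; `validate` is assumed to raise `ValueError` (or a subclass) on malformed output, and `store` returns the number of questions created.

```python
from dataclasses import dataclass

@dataclass
class GenerationResult:
    total_slides: int = 0
    succeeded: int = 0
    failed: int = 0
    total_questions_created: int = 0

def process_results(results, validate, retry_once, store):
    """Validate each batch result; on failure, retry once synchronously,
    then mark the slide failed so other slides are unaffected."""
    summary = GenerationResult()
    for slide_id, text in results:
        summary.total_slides += 1
        for attempt_text in (text, None):
            try:
                # None signals the retry attempt: re-call the API with a
                # correction instruction (represented by retry_once here).
                candidate = attempt_text if attempt_text is not None else retry_once(slide_id)
                questions = validate(candidate)
                summary.succeeded += 1
                summary.total_questions_created += store(slide_id, questions)
                break
            except ValueError:
                continue
        else:
            summary.failed += 1  # both attempts invalid: mark slide failed
    return summary
```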
  • Add generation_status and generation_mode fields to the slides table (new Alembic migration):

    generation_status (enum: not_started / processing / complete / failed, default: not_started)
    generation_mode (enum: standard / hard, nullable — null until generation has run)
    
  • Add metadata (JSONB, nullable) to the batch_jobs table (new Alembic migration) to store arbitrary job context (e.g. generation mode)

  • Create API endpoints in /backend/app/api/routes/questions.py (all protected):

    • POST /api/modules/{module_id}/generate-questions — accepts {"mode": "standard" | "hard"} body, calls submit_generation_batch() as a background task, returns {"batch_job_id": "...", "total_slides": n}
    • GET /api/modules/{module_id}/questions — returns all questions for the module, supports query params: ?lecture_id=, ?difficulty=, ?question_type=, ?limit=, ?offset=
    • GET /api/modules/{module_id}/generation-status — returns per-slide generation status summary and overall progress
    • GET /api/questions/{question_id} — returns a single question with all options
  • Extend the background polling task from feat/image-description to also poll mcq_generation batch jobs and call process_generation_results() when complete
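The two HTTP 400 preconditions from the acceptance criteria could be factored into a plain helper behind the POST endpoint. Sketched here without FastAPI so it is self-contained; the real route would raise `HTTPException(status_code=400, ...)` instead of returning a tuple, and the dict-shaped slides are stand-ins for ORM rows.

```python
def check_generation_preconditions(style_guide, slides):
    """Return (status_code, detail) for POST .../generate-questions."""
    if style_guide is None:
        return (400, "no style guide for this module")
    # Slides still missing enriched_markdown are not ready for generation.
    not_ready = sum(1 for s in slides if s.get("enriched_markdown") is None)
    if not_ready:
        return (400, f"{not_ready} slides not yet enriched")
    return (202, None)  # OK to submit the batch
```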

Frontend

  • On the module detail page, add a "Generate Questions" section that:
    • Is enabled only when module.processing_status == complete (all slides enriched) and a style guide exists
    • Shows a mode selector: "Standard" / "Hard Mode" toggle
    • Shows a "Generate" button that calls POST /api/modules/{module_id}/generate-questions
    • Switches to a progress display after clicking, showing "X / Y slides processed" and polling GET /api/modules/{module_id}/generation-status every 10 seconds
    • Shows a completion summary: N questions generated across L lectures
  • Create a QuestionCard component (/frontend/components/questions/QuestionCard.tsx) that:
    • Displays the question text
    • Renders options as radio buttons (standard) or checkboxes (hard mode multi)
    • Shows a difficulty badge (easy / medium / hard) with colour coding (green / yellow / red)
    • Shows topic tags as small pill labels
    • Is used in both the question browser and the study session (Milestone 5)
  • Create a question browser page at /modules/[id]/questions that:
    • Lists all generated questions for the module
    • Supports filtering by lecture, difficulty, and question type
    • Renders each question using QuestionCard in read-only/preview mode (answers not revealed)

Acceptance Criteria

  • POST /api/modules/{module_id}/generate-questions with mode: "standard" creates questions where every question has question_type == "single", exactly 4 options, and exactly 1 correct index
  • POST /api/modules/{module_id}/generate-questions with mode: "hard" creates a mix including "multi" questions with 4–6 options and 2+ correct indices
  • Hard mode questions never reveal the number of correct answers in the question text (spot-check 10 questions)
  • Every generated question has correct_indices that are valid indices into its options array
  • Every generated question is correctly attributed to its source slide_id, lecture_id, and module_id
  • topic_tags and difficulty are populated for every question
  • A slide with generation failure does not block other slides from completing — failed slides are marked individually
  • GET /api/modules/{module_id}/questions?lecture_id=X returns only questions from that lecture
  • GET /api/modules/{module_id}/generation-status correctly reflects per-slide progress
  • The question browser page renders questions correctly with difficulty badges and topic tags
  • Calling POST /api/modules/{module_id}/generate-questions when no style guide exists returns HTTP 400
  • Calling POST /api/modules/{module_id}/generate-questions when not all slides are enriched returns HTTP 400 with a count of how many slides are not yet ready
