
Iris: Add MCQ carousel, response persistence, and content-aware quiz generation#462

Open
alretum wants to merge 35 commits into main from feature/iris/quiz-questions-v2

Conversation

Contributor

@alretum alretum commented Mar 23, 2026

Summary

This PR continues the server-side implementation that enables students to request AI-generated multiple-choice quiz questions from Iris. It is now possible to generate several questions at once; the questions are grounded in lecture content, and citations are included.

Note: This PR works best together with the corresponding Artemis changes from the branch of the same name (feature/iris/quiz-questions); both are needed for MCQ generation to work end-to-end. That branch adds the MCQ generation UI to Artemis, as well as the communication between the Artemis and Iris servers.

Motivation and Context

Previously, Iris responded with plain-text messages only, and a first iteration added single-MCQ generation. To support active learning and self-assessment, students should be able to request multiple quiz-style questions directly within the chat. This enables Iris to serve as a more interactive tutoring tool by generating multiple-choice questions with instant feedback.

Description

Architecture: Tool-based MCQ generation with dedicated subpipeline

  • Introduced a dedicated McqGenerationPipeline subpipeline (mcq_generation_pipeline.py) that handles MCQ generation with its own prompt template and a fast model (gpt-4.1-nano).
  • Added an MCQ generation tool (tools/mcq_generation.py) that the agent can invoke, registered in both the course chat and lecture chat pipelines.
  • Replaced the previous inline prompt-block approach (v1 embedded MCQ instructions in system prompts) with a cleaner tool-usage pattern — the agent calls the MCQ tool, which delegates to the subpipeline.
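The tool-usage pattern can be sketched as a closure factory in the spirit of tools/mcq_generation.py. The parameter list mirrors excerpts quoted later in the review, but treat it as illustrative rather than the exact production signature:

```python
def create_tool_generate_mcq_questions(mcq_pipeline, mcq_result_storage,
                                       chat_history, user_language,
                                       callback, lecture_content):
    """Factory: build the MCQ tool closure the agent can invoke (sketch)."""

    def generate_mcq_questions(command: str) -> str:
        # Delegate to the dedicated subpipeline and stash the JSON so the
        # chat pipeline can splice it into the final response afterwards.
        result_json = mcq_pipeline(
            command=command,
            chat_history=chat_history,
            user_language=user_language,
            callback=callback,
            lecture_content=lecture_content,
        )
        mcq_result_storage["mcq_json"] = result_json
        # The agent's text output only carries a placeholder marker.
        return "[MCQ_RESULT]"

    return generate_mcq_questions
```

The placeholder keeps the agent's text output small; the chat pipeline later replaces [MCQ_RESULT] with the stored JSON payload.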

Multi-question generation with parallel workers

  • Added intent detection (_detect_mcq_intent) that parses the user's message for question count (e.g., "5 questions", "quiz me") before tool selection.
  • For multiple questions: extracts N distinct subtopics via a fast LLM call, then spawns N parallel worker threads (each with its own pipeline instance and contextvars context) to generate questions concurrently.
  • Includes sequential fallback with deduplication if subtopic extraction fails, and retry logic for any individual worker failures.
  • Results are collected via a Queue and bundled into an mcq-set JSON payload for the Artemis carousel UI.

Lecture content grounding and citations

  • MCQ questions are now grounded in actual lecture page content fetched directly from Weaviate (not summaries), preventing hallucinated questions about topics not in the course material.
  • Citations are added to MCQ explanations in the same [cite:L:...] format used by the regular citation pipeline, matching the LLM-generated source field to lecture unit metadata.
  • Citations are stripped from question text and options to keep them clean.
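Stripping the markers can be done with a single regular expression over the [cite:L:...] format mentioned above; this helper is a sketch, not the production code:

```python
import re

# Markers follow the regular citation pipeline's [cite:L:...] format.
_CITE_RE = re.compile(r"\s*\[cite:[^\]]*\]")


def strip_citation_markers(text: str) -> str:
    """Drop citation markers so question text and options stay clean."""
    return _CITE_RE.sub("", text).strip()
```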

UX improvements

  • Added chatMessage field to StageDTO so in_progress() can send dynamic loading messages to the user (e.g., "Getting your quiz ready...", "Writing the question and answer options...").
  • Graceful fallback messages are shown to the user if MCQ generation fails.
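A minimal sketch of how the new field can look on the DTO, assuming Pydantic v2 and the populate_by_name configuration the review recommends (the real StageDTO carries additional stage fields):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class StageDTO(BaseModel):
    """Trimmed-down sketch; the production DTO has more fields."""

    # Allow construction via the Python name (chat_message=...) as well
    # as the JSON alias (chatMessage=...).
    model_config = ConfigDict(populate_by_name=True)

    name: str
    # Serialized as "chatMessage" for the Artemis client.
    chat_message: Optional[str] = Field(default=None, alias="chatMessage")
```

With this configuration, model_dump(by_alias=True) emits a chatMessage key for the client while pipeline code can keep using the snake_case name.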

Prompt engineering

  • Dedicated Jinja2 prompt template (mcq_generation_prompt.j2) with strict JSON output format, answer validation instructions ("mentally solve before outputting"), option-length uniformity rules, and source attribution requirements.
  • System prompts for both course and lecture chat updated with MCQ tool usage instructions (conditionally rendered based on whether parallel generation is active).
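As an illustration of the conditional rendering, a hypothetical fragment of such a system prompt template might branch like this (the wording is invented; only the mcq_parallel flag and the generate_mcq_questions tool name come from this PR):

```jinja
{% if mcq_parallel %}
When the user asks for quiz questions, reply with a short introduction only;
the quiz widget is attached to the response automatically.
{% else %}
When the user asks for quiz questions, call the generate_mcq_questions tool.
Do not author multiple-choice question text yourself.
{% endif %}
```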

Testing

  • Added comprehensive tests: test_mcq_generation_pipeline.py (parallel generation, thread safety, queue delivery, subtopic extraction), test_mcq_generation_tool.py (tool creation and invocation), test_mcq_prompt_rendering.py (template rendering with various contexts).

Other changes included in the branch (merged from main):

  • Global lecture search endpoint and AI-powered search answer pipeline
  • Memiris: allow users to delete all memory data
  • Graceful fallback for Cohere rerank failures
  • CI/CD: bumped tj-actions/changed-files, actions/cache, actions/upload-artifact

Summary by CodeRabbit

  • New Features

    • Interactive MCQ generation: single & multi-question modes, parallel/background generation, language-aware prompts, integrated quiz widget, and automatic citation attachment.
    • System prompts updated to drive tool-based MCQ creation and parallel-mode behavior.
    • Status updates can embed a chat message for richer progress feedback.
  • Bug Fixes

    • Reranking now gracefully falls back if the external rerank service fails.
  • Tests

    • Added end-to-end and unit tests for MCQ generation, tooling, prompts, and threaded execution.
  • Chores

    • Added project metadata and ignore entries.

alretum added 27 commits March 8, 2026 09:15
@alretum alretum self-assigned this Mar 23, 2026
@github-actions github-actions bot added the iris label Mar 23, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds MCQ generation: new McqGenerationPipeline, MCQ tool and templates, integration into course and lecture chat pipelines (parallel and tool modes), tests, rerank fallback handling, status callback/chat_message changes, a pipeline enum entry, ignore entries, and a pyproject.toml.

Changes

  • Config & Metadata (iris/.claudeignore, pyproject.toml): Added ignore entries for local/example YAMLs and added pyproject.toml with project metadata and a Poetry build-system.
  • Pipeline Enum (iris/src/iris/common/pipeline_enum.py): Added the IRIS_MCQ_GENERATION_PIPELINE enum member.
  • MCQ Pipeline & Template (iris/src/iris/pipeline/shared/mcq_generation_pipeline.py, iris/src/iris/pipeline/prompts/templates/mcq_generation_prompt.j2): New McqGenerationPipeline with prompt building, JSON validation/repair, token tracking, threaded/parallel generation (subtopic extraction, worker threads), queueing, and a Jinja2 MCQ generation template.
  • Chat Pipeline Integration (iris/src/iris/pipeline/chat/course_chat_pipeline.py, iris/src/iris/pipeline/chat/lecture_chat_pipeline.py): Wired MCQ intent detection, conditional tool vs. parallel-threaded generation, lecture content retrieval for MCQ context, thread lifecycle handling, citation injection, and merging of MCQ outputs into conversation state.
  • System Prompt Templates (iris/src/iris/pipeline/prompts/templates/course_chat_system_prompt.j2, iris/src/iris/pipeline/prompts/templates/lecture_chat_system_prompt.j2): Reworked MCQ instructions to branch on mcq_parallel: parallel mode emits only a short intro (the widget is attached automatically); non-parallel mode instructs use of the generate_mcq_questions tool and forbids authored MCQ text.
  • Tool Implementation & Exports (iris/src/iris/tools/mcq_generation.py, iris/src/iris/tools/__init__.py): Added a create_tool_generate_mcq_questions factory that runs the MCQ pipeline, stores JSON in shared storage, and returns [MCQ_RESULT]; exported the tool.
  • Rerank Handler (iris/src/iris/llm/request_handler/rerank_request_handler.py): Added a module logger and a _rerank_available flag, and wrapped the external rerank call in try/except to log failures, disable reranking on error, and fall back to returning the top-N documents.
  • Status DTO & Callback (iris/src/iris/domain/status/stage_dto.py, iris/src/iris/web/status/status_update.py): Added a chat_message field and model_config = ConfigDict(populate_by_name=True) to StageDTO; changed on_status_update() to return bool, extended in_progress() to accept chat_message, and made done() clear transient fields only when the POST succeeds.
  • Tests (iris/tests/test_mcq_generation_pipeline.py, iris/tests/test_mcq_generation_tool.py, iris/tests/test_mcq_prompt_rendering.py): Added tests covering pipeline parsing/validation/repair, threaded queue behavior and error handling, tool factory behavior, and prompt rendering for parallel vs. non-parallel modes.
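The rerank fallback described in the walkthrough can be sketched as a simple wrapper (names hypothetical; the production handler additionally flips a _rerank_available flag so subsequent calls skip the external service):

```python
import logging

logger = logging.getLogger(__name__)


def rerank_with_fallback(rerank_fn, query, documents, top_n):
    """Try the external reranker; on failure, return the first top_n documents."""
    try:
        return rerank_fn(query=query, documents=documents, top_n=top_n)
    except Exception as exc:
        # Degrade gracefully instead of failing the whole chat request.
        logger.warning("Rerank failed, falling back to top-%d documents: %s",
                       top_n, exc)
        return documents[:top_n]
```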

Sequence Diagram

sequenceDiagram
    participant Client
    participant ChatPipeline
    participant MCQ_Pipeline
    participant LLM
    participant Weaviate
    participant Queue
    participant Thread

    alt mcq_parallel = true
        Client->>ChatPipeline: start chat
        ChatPipeline->>ChatPipeline: detect MCQ intent (set mcq_parallel)
        ChatPipeline->>Thread: pre_agent_hook() spawn MCQ thread
        Thread->>Weaviate: fetch lecture content & metadata
        Weaviate-->>Thread: content + metadata
        Thread->>MCQ_Pipeline: run_in_thread(command, lecture_content)
        MCQ_Pipeline->>LLM: generate MCQ JSON
        LLM-->>MCQ_Pipeline: MCQ response
        MCQ_Pipeline->>MCQ_Pipeline: validate/repair JSON
        MCQ_Pipeline->>Queue: enqueue result
        ChatPipeline->>ChatPipeline: post_agent_hook() join thread, collect queue
        ChatPipeline->>ChatPipeline: add citations & merge MCQ into result
        ChatPipeline-->>Client: final response
    else mcq_parallel = false
        Client->>ChatPipeline: start chat
        ChatPipeline->>ChatPipeline: detect MCQ intent (set mcq_parallel=false)
        ChatPipeline->>ChatPipeline: add generate_mcq_questions tool
        Client->>ChatPipeline: agent invokes tool
        ChatPipeline->>MCQ_Pipeline: __call__(command)
        MCQ_Pipeline->>LLM: generate MCQ JSON
        LLM-->>MCQ_Pipeline: MCQ response
        MCQ_Pipeline->>MCQ_Pipeline: validate/repair JSON
        MCQ_Pipeline-->>ChatPipeline: return MCQ JSON (stored in shared storage)
        ChatPipeline->>ChatPipeline: replace [MCQ_RESULT], add citations
        ChatPipeline-->>Client: final response
    end
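The parallel branch in the diagram boils down to a thread-per-subtopic pattern with a Queue collecting results. The helper below is illustrative only; the real pipeline also retries failed workers and gives each worker its own pipeline instance:

```python
import contextvars
import threading
from queue import Queue


def generate_in_parallel(subtopics, generate_one, timeout=180):
    """Spawn one worker per subtopic; collect whatever finishes in time."""
    results: Queue = Queue()

    def worker(topic: str) -> None:
        try:
            results.put(generate_one(topic))
        except Exception:
            pass  # the real pipeline retries failed workers

    threads = []
    for topic in subtopics:
        # Run each worker in a copy of the current contextvars context so
        # request-scoped state is visible inside the thread.
        ctx = contextvars.copy_context()
        thread = threading.Thread(target=ctx.run, args=(worker, topic))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join(timeout=timeout)

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```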

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"I hopped through prompts, templates, and thread,
I fetched some facts, then counted ahead.
Four options, one true — I tidied the rest,
Queued up the questions and signed them with zest.
🐇✨"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 58.97%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly and accurately describes the primary changes: MCQ carousel integration, response persistence (chatMessage in StageDTO), and content-aware quiz generation via the new McqGenerationPipeline.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@alretum alretum changed the title Feature/iris/quiz questions v2 Iris: Add MCQ carousel, response persistence, and content-aware quiz generation Mar 23, 2026
@alretum alretum marked this pull request as ready for review March 23, 2026 19:17
@alretum alretum requested review from a team as code owners March 23, 2026 19:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (6)
iris/src/iris/llm/request_handler/rerank_request_handler.py (1)

90-98: Consider narrowing the exception type.

Catching bare Exception can mask programming errors (e.g., TypeError, AttributeError from bugs in result indexing on line 87-88). Consider catching more specific exceptions from the Cohere client (e.g., network/API errors) while letting programming errors propagate.

♻️ Suggested refinement
-        except Exception as e:
+        except (cohere.CohereAPIError, cohere.CohereConnectionError, TimeoutError) as e:

If the Cohere client's specific exception types aren't accessible, at minimum exclude TypeError, AttributeError, KeyError to catch bugs in the result-handling logic on lines 87-88.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/llm/request_handler/rerank_request_handler.py` around lines 90
- 98, The current broad except Exception in RerankRequestHandler around the
Cohere call masks programming errors; narrow it to catch the
Cohere/client-specific exceptions (or network/API errors) instead and let bugs
propagate; if those exception classes aren't available, change the handler to
except Exception as e: but re-raise on TypeError, AttributeError, KeyError
(i.e., check the exception type and reraise those) so that logger.warning,
setting RerankRequestHandler._rerank_available = False, and returning
valid_documents[:top_n] only happen for expected client/network failures while
index/result-handling bugs in the rerank logic surface normally.
iris/src/iris/tools/mcq_generation.py (1)

48-56: Consider error handling for pipeline failures.

If mcq_pipeline(...) raises an exception, it propagates uncaught. Depending on how the agent handles tool errors, you may want to catch exceptions and return a user-friendly error message instead of the placeholder.

♻️ Optional: wrap in try/except for graceful degradation
+    from ..common.logging_config import get_logger
+    logger = get_logger(__name__)
+
     def generate_mcq_questions(command: str) -> str:
         ...
-        result_json = mcq_pipeline(
-            command=command,
-            chat_history=chat_history,
-            user_language=user_language,
-            callback=callback,
-            lecture_content=lecture_content,
-        )
-        mcq_result_storage["mcq_json"] = result_json
-        return "[MCQ_RESULT]"
+        try:
+            result_json = mcq_pipeline(
+                command=command,
+                chat_history=chat_history,
+                user_language=user_language,
+                callback=callback,
+                lecture_content=lecture_content,
+            )
+            mcq_result_storage["mcq_json"] = result_json
+            return "[MCQ_RESULT]"
+        except Exception as e:
+            logger.warning("MCQ generation failed: %s", str(e))
+            return "I wasn't able to generate a quiz question right now. Please try again."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/tools/mcq_generation.py` around lines 48 - 56, The call to
mcq_pipeline can raise and currently will propagate uncaught; wrap the
mcq_pipeline(...) invocation in a try/except that catches Exception, logs or
stores the exception details into mcq_result_storage (e.g., set a key like
"error" with the exception message/traceback), and return a user-friendly marker
or message (e.g., "[MCQ_ERROR]" or a short descriptive string) instead of
letting the exception bubble; keep the original behavior of setting
mcq_result_storage["mcq_json"] on success and reference the mcq_pipeline call
and mcq_result_storage keys so you modify the correct block.
iris/src/iris/pipeline/chat/lecture_chat_pipeline.py (2)

287-328: Consider extracting _detect_mcq_intent to a shared utility.

This method is identical to the one in course_chat_pipeline.py (lines 423-464). Consider extracting it to a shared module (e.g., pipeline/shared/utils.py) to eliminate duplication and ensure consistent behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py` around lines 287 - 328,
The _detect_mcq_intent function in lecture_chat_pipeline.py is duplicated in
course_chat_pipeline.py; extract this function into a shared utility (e.g.,
pipeline.shared.utils) with the same signature def
_detect_mcq_intent(user_message: str) -> tuple[bool, int], move the logic
(count_patterns, mcq_keywords, regex search) into that module, then replace the
local definitions in both LectureChatPipeline and CourseChatPipeline with
imports from the new shared util and call the shared _detect_mcq_intent; ensure
both modules update their imports, preserve return semantics (True/False and
count defaulting to 1/0), and run existing pipeline tests to confirm no
behavioral change.

621-679: _add_mcq_citations is duplicated across pipelines.

This static method is identical to the one in course_chat_pipeline.py (lines 768-826). Extract to a shared module to maintain consistency and reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py` around lines 621 - 679,
The _add_mcq_citations staticmethod is duplicated between lecture_chat_pipeline
and course_chat_pipeline; extract the logic into a single shared helper (e.g.,
create a function named add_mcq_citations in a new/shared module like
iris.pipeline.chat._mcq_utils or similar), move the current implementation there
(preserve signature: mcq_json_str, lecture_units_meta), replace both
staticmethods with imports that call the shared add_mcq_citations (or keep a
thin staticmethod wrapper that delegates), and remove the duplicated code from
lecture_chat_pipeline and course_chat_pipeline so both modules import and reuse
the single implementation.
iris/src/iris/pipeline/shared/mcq_generation_pipeline.py (2)

322-330: F-string with conditional expression is valid in Python 3.12+.

The static analysis tool (Flake8) reports a syntax error on line 324, but this is a false positive. Per retrieved learnings, Python 3.12+ (PEP 701) supports nested f-strings and complex expressions within f-string braces. The syntax f'{"from the lecture material above " if lecture_content else ""}' is valid.

However, consider simplifying for readability:

♻️ Optional: Simplify f-string for clarity
+        material_clause = "from the lecture material above " if lecture_content else ""
         prompt += (
             f"\nGenerate exactly {count} distinct subtopics or aspects "
-            f'{"from the lecture material above " if lecture_content else ""}'
+            f"{material_clause}"
             f"that would each make a good multiple-choice question. "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py` around lines 322 -
330, The f-string uses a nested expression that triggers linters; to fix,
replace the nested f-string expression with a simple conditional string or
variable and include it in the main prompt assembly: compute a small helper like
lecture_clause = "from the lecture material above " if lecture_content else ""
(or use inline ('from the lecture material above ' if lecture_content else '')),
then use f"...Generate exactly {count} distinct subtopics or aspects
{lecture_clause}that..." when building the prompt variable in
mcq_generation_pipeline.py (look for the prompt variable construction
referencing lecture_content and count).

276-289: Rename unused loop variable i to _.

The loop variable i is not used within the loop body. Convention is to use _ for intentionally unused variables.

♻️ Proposed fix
-        for i in range(missing):
+        for _ in range(missing):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py` around lines 276 -
289, In the retry loop inside MCQ generation (the for loop that currently reads
"for i in range(missing):" where the loop variable i is not used), rename the
unused loop variable to "_" (i.e., "for _ in range(missing):") to follow the
unused-variable convention; update the loop header only (no other logic change)
and ensure no reference to "i" remains in the surrounding method (the block that
calls self(...), appends to successful_results, and logs errors).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iris/src/iris/domain/status/stage_dto.py`:
- Line 16: StageDTO currently defines chat_message with alias="chatMessage"
which prevents instantiation by the Python name; add a model_config to StageDTO
to enable population by Python field names (i.e., set populate_by_name=True via
ConfigDict) so instances created with chat_message=... in
course_chat_pipeline.py, lecture_chat_pipeline.py, and
mcq_generation_pipeline.py validate correctly.

In `@iris/src/iris/llm/request_handler/rerank_request_handler.py`:
- Line 23: The class-level flag _rerank_available on RerankRequestHandler
disables reranking for all instances after one failure; change this to an
instance attribute (initialize self._rerank_available in
RerankRequestHandler.__init__) or implement a time-to-live/reset (store a
timestamp like self._rerank_backoff_until and check it before reranking) so
failures are localized or temporary; also replace the broad except Exception
around the call to cohere_client.rerank() with targeted exception handling
(catch the specific API/timeout/network exceptions you expect and only set the
instance-level backoff/disable for those), and add a periodic or TTL-based retry
path to re-enable reranking when appropriate.

In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 249-264: The non-parallel MCQ tool block in
course_chat_pipeline.py fails to pass lecture context; call the existing
_retrieve_lecture_content_for_mcq(self, state) (same helper used for parallel
flow) to obtain lecture_content and then pass that lecture_content into
create_tool_generate_mcq_questions in the non-parallel branch (the same
position/argument where the parallel branch supplies it), ensuring you still use
getattr(state, "mcq_result_storage", {}) and user_language as before.

In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py`:
- Around line 546-548: The MCQ thread join in lecture_chat_pipeline.py currently
uses timeout=120 which is inconsistent with course_chat_pipeline.py's
timeout=180; update the mcq_thread.join call in the lecture chat flow (look for
mcq_thread.join(...) and getattr(state, "mcq_result_storage", {})) to use a
shared constant (e.g., MCQ_THREAD_TIMEOUT) or match the 180s value used in
course_chat_pipeline so both pipelines behave the same; define the constant at
module scope (or import it from the module where it's defined) and reference it
in the join call instead of the hardcoded 120.

In `@iris/tests/test_mcq_generation_tool.py`:
- Around line 31-46: The test helper _create_tool is missing the lecture_content
parameter compared to the production factory create_tool_generate_mcq_questions,
so update _create_tool to accept a lecture_content argument and pass it through
to mcq_pipeline (i.e., include lecture_content=lecture_content in the
mcq_pipeline call inside generate_mcq_questions) so the test
test_tool_calls_pipeline_with_correct_args verifies the same argument flow as
production; keep storing result in mcq_result_storage["mcq_json"] and return
"[MCQ_RESULT]" as before.
- Around line 85-96: Update the test test_tool_calls_pipeline_with_correct_args
to assert that lecture_content is passed to the pipeline: modify the call to
mock_pipeline.assert_called_once_with(...) to include
lecture_content=<expected_value> and adjust the _setup_tool helper to accept and
forward a lecture_content argument (so tool, mock_pipeline, mock_callback,
chat_history, lecture_content = _setup_tool() or have _setup_tool return the
lecture_content) ensuring the tool(command) invocation uses that
lecture_content; reference the test function
test_tool_calls_pipeline_with_correct_args, the helper _setup_tool, and
mock_pipeline when making the change.

In `@pyproject.toml`:
- Line 9: The project metadata's requires-python value uses Poetry's caret
syntax ("^3.12") which is invalid for PEP 621/PEP 440; update the
requires-python field to a PEP 440 range such as ">=3.12,<4" so tools that
validate PEP 440 (pip/setuptools/build) accept it — locate the requires-python
entry in pyproject.toml and replace the caret specifier with a proper PEP 440
specifier.
- Around line 9-11: The root pyproject declares requires-python = "^3.12" which
conflicts with package constraints (logos and atlas require >=3.13 while nebula
caps at <3.13); unify Python constraints across the monorepo by choosing a
single supported Python range (e.g., requires-python = ">=3.12,<3.14" or bump
everything to ">=3.13,<4.0"), then update the following package pyproject.toml
files to match that chosen range: root (requires-python in the diff),
logos/pyproject.toml, nebula/pyproject.toml, atlas/AtlasMl/pyproject.toml and
any other packages (shared, iris, memiris, athena*, evaluation) that declare
incompatible ranges; ensure nebula's upper bound is relaxed or logos/atlas are
lowered consistently, run dependency tests/CI to confirm no runtime/install
conflicts, and commit the synchronized requires-python constraint.

---

Nitpick comments:
In `@iris/src/iris/llm/request_handler/rerank_request_handler.py`:
- Around line 90-98: The current broad except Exception in RerankRequestHandler
around the Cohere call masks programming errors; narrow it to catch the
Cohere/client-specific exceptions (or network/API errors) instead and let bugs
propagate; if those exception classes aren't available, change the handler to
except Exception as e: but re-raise on TypeError, AttributeError, KeyError
(i.e., check the exception type and reraise those) so that logger.warning,
setting RerankRequestHandler._rerank_available = False, and returning
valid_documents[:top_n] only happen for expected client/network failures while
index/result-handling bugs in the rerank logic surface normally.

In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py`:
- Around line 287-328: The _detect_mcq_intent function in
lecture_chat_pipeline.py is duplicated in course_chat_pipeline.py; extract this
function into a shared utility (e.g., pipeline.shared.utils) with the same
signature def _detect_mcq_intent(user_message: str) -> tuple[bool, int], move
the logic (count_patterns, mcq_keywords, regex search) into that module, then
replace the local definitions in both LectureChatPipeline and CourseChatPipeline
with imports from the new shared util and call the shared _detect_mcq_intent;
ensure both modules update their imports, preserve return semantics (True/False
and count defaulting to 1/0), and run existing pipeline tests to confirm no
behavioral change.
- Around line 621-679: The _add_mcq_citations staticmethod is duplicated between
lecture_chat_pipeline and course_chat_pipeline; extract the logic into a single
shared helper (e.g., create a function named add_mcq_citations in a new/shared
module like iris.pipeline.chat._mcq_utils or similar), move the current
implementation there (preserve signature: mcq_json_str, lecture_units_meta),
replace both staticmethods with imports that call the shared add_mcq_citations
(or keep a thin staticmethod wrapper that delegates), and remove the duplicated
code from lecture_chat_pipeline and course_chat_pipeline so both modules import
and reuse the single implementation.

In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py`:
- Around line 322-330: The f-string uses a nested expression that triggers
linters; to fix, replace the nested f-string expression with a simple
conditional string or variable and include it in the main prompt assembly:
compute a small helper like lecture_clause = "from the lecture material above "
if lecture_content else "" (or use inline ('from the lecture material above ' if
lecture_content else '')), then use f"...Generate exactly {count} distinct
subtopics or aspects {lecture_clause}that..." when building the prompt variable
in mcq_generation_pipeline.py (look for the prompt variable construction
referencing lecture_content and count).
- Around line 276-289: In the retry loop inside MCQ generation (the for loop
that currently reads "for i in range(missing):" where the loop variable i is not
used), rename the unused loop variable to "_" (i.e., "for _ in range(missing):")
to follow the unused-variable convention; update the loop header only (no other
logic change) and ensure no reference to "i" remains in the surrounding method
(the block that calls self(...), appends to successful_results, and logs
errors).

In `@iris/src/iris/tools/mcq_generation.py`:
- Around line 48-56: The call to mcq_pipeline can raise and currently will
propagate uncaught; wrap the mcq_pipeline(...) invocation in a try/except that
catches Exception, logs or stores the exception details into mcq_result_storage
(e.g., set a key like "error" with the exception message/traceback), and return
a user-friendly marker or message (e.g., "[MCQ_ERROR]" or a short descriptive
string) instead of letting the exception bubble; keep the original behavior of
setting mcq_result_storage["mcq_json"] on success and reference the mcq_pipeline
call and mcq_result_storage keys so you modify the correct block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8641539c-ed5f-439d-99d9-1176b6b67a33

📥 Commits

Reviewing files that changed from the base of the PR and between 1b358e6 and 409e5ab.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • iris/.claudeignore
  • iris/src/iris/common/pipeline_enum.py
  • iris/src/iris/domain/status/stage_dto.py
  • iris/src/iris/llm/request_handler/rerank_request_handler.py
  • iris/src/iris/pipeline/chat/course_chat_pipeline.py
  • iris/src/iris/pipeline/chat/lecture_chat_pipeline.py
  • iris/src/iris/pipeline/prompts/templates/course_chat_system_prompt.j2
  • iris/src/iris/pipeline/prompts/templates/lecture_chat_system_prompt.j2
  • iris/src/iris/pipeline/prompts/templates/mcq_generation_prompt.j2
  • iris/src/iris/pipeline/shared/mcq_generation_pipeline.py
  • iris/src/iris/tools/__init__.py
  • iris/src/iris/tools/mcq_generation.py
  • iris/src/iris/web/status/status_update.py
  • iris/tests/test_mcq_generation_pipeline.py
  • iris/tests/test_mcq_generation_tool.py
  • iris/tests/test_mcq_prompt_rendering.py
  • pyproject.toml

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
iris/src/iris/pipeline/chat/course_chat_pipeline.py (2)

425-466: Consider edge cases in intent detection.

The keyword-based detection could trigger false positives in certain contexts:

  • "quiz" matches questions like "When is the quiz due?" or "What is a quiz?"
  • "another 3" could match "explain another 3 concepts"

While the LLM can likely handle these gracefully, consider adding negative patterns or requiring multiple signals for more confident detection if this becomes problematic in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 425 - 466,
The _detect_mcq_intent function is producing false positives from broad
keywords; tighten detection by adding negative patterns and stronger positive
signals: update the count_patterns and mcq_keywords checks to (1) prefer
explicit count patterns (keep current behavior), (2) require the word
"question(s)" or phrases like "quiz me", "test me", "ask me" for keyword matches
(avoid matching lone "quiz"), and (3) add negative pattern checks (e.g., words
like "due", "when", "what is", "explain", "definition") that, if present,
suppress MCQ intent even when a keyword matches; implement this logic in
_detect_mcq_intent so it first checks negatives, then explicit counts, then
confirms keywords only when accompanied by question-related terms or absent
negative patterns.
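The suggested tightening can be sketched as a standalone function. The pattern lists and the function name below are illustrative stand-ins, not the actual pipeline code:

```python
import re

# Assumed, simplified pattern lists; the real detector would tune these.
NEGATIVE_PATTERNS = re.compile(r"\b(due|when is|what is|explain|definition)\b")
COUNT_PATTERN = re.compile(r"\b(\d+)\s+(?:questions?|mcqs?)\b")
KEYWORD_PATTERN = re.compile(r"\b(?:quiz me|test me|ask me|mcq)\b")

def detect_mcq_intent(message: str) -> tuple[bool, int]:
    """Return (is_mcq_request, question_count)."""
    text = message.lower()
    # 1. Negative signals suppress MCQ intent outright.
    if NEGATIVE_PATTERNS.search(text):
        return False, 0
    # 2. Explicit counts tied to question wording are the strongest signal.
    match = COUNT_PATTERN.search(text)
    if match:
        return True, int(match.group(1))
    # 3. Keywords alone only count when they are unambiguous quiz requests.
    if KEYWORD_PATTERN.search(text):
        return True, 1
    return False, 0
```

With this ordering, "When is the quiz due?" and "explain another 3 concepts" are suppressed by the negative check before any keyword or count pattern can fire.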

793-801: Substring matching for unit lookup may produce false matches.

The _find_matching_unit function uses bidirectional substring matching (source in name or name in source). This could produce incorrect matches for common or short names:

  • Unit "Introduction" would match source "Introduction to Advanced Topics"
  • Unit "Data" would match unrelated sources containing "Data"

Consider using more specific matching (e.g., word boundaries, similarity scoring) if citation accuracy becomes important. For now, this is acceptable as a best-effort approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 793 - 801,
The current _find_matching_unit uses bidirectional substring checks on
source_lower vs unit_name/lecture_name which yields false positives for
short/common tokens; update the matching to require word-boundary or token-based
matches instead of raw substring checks: for each meta in lecture_units_meta,
normalize and tokenize source_lower and unit_name/lecture_name and then check
for exact token overlap or apply a simple regex word-boundary match (e.g.,
whole-word match) or a lightweight similarity threshold (e.g., fuzzy ratio)
before returning meta; update the checks that currently reference source_lower,
unit_name, and lecture_name to use the new word-boundary/token-match logic to
reduce accidental matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 682-690: The MCQ thread is left running after join(timeout=180)
which can leak resources; introduce a cancellation Event and use it to signal
the MCQ pipeline to stop: when creating the thread (where mcq_thread is spawned)
create a threading.Event (e.g., mcq_stop_event), pass it into the MCQ
worker/function so the worker periodically checks mcq_stop_event.is_set() and
exits cleanly (also stop putting to mcq_result_storage/queue if set). In this
block, after mcq_thread.join(timeout=180) returns and mcq_thread.is_alive() is
True, call mcq_stop_event.set(), then attempt a short join (e.g.,
join(timeout=5)) and if still alive log a stronger error; also store the event
on state (e.g., state.mcq_stop_event) alongside mcq_result_storage so the
timeout handler can access it.
- Around line 490-498: The fetch_objects call for chunks
(state.db.lectures.query.fetch_objects) relies on Weaviate's implicit default
limit of 10; make the intent explicit by either adding a limit parameter (e.g.,
limit=10 with a comment "intentional cap to 10 chunks" or limit=500 if you need
more chunks for MCQ quality) or, if you need all course chunks, replace
fetch_objects with the paginated iterator API
(state.db.lectures.query.iterator(...)) to retrieve all pages; update the call
site where chunks is assigned and keep the same return_properties
(LectureUnitPageChunkSchema.*) when changing to iterator.
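The cancellation pattern from the first inline comment can be shown in isolation. The worker body and timeouts below are illustrative (the real pipeline uses `join(timeout=180)`); the key point is that the worker polls the event and exits cleanly instead of being abandoned:

```python
import threading
import time

def mcq_worker(stop_event: threading.Event, results: list) -> None:
    """Cooperatively cancellable worker; the loop stands in for generation steps."""
    for _ in range(100):
        if stop_event.is_set():
            return  # exit cleanly without writing partial results
        time.sleep(0.01)  # stand-in for one unit of generation work
    results.append("mcq-json")

stop_event = threading.Event()
results: list = []
thread = threading.Thread(target=mcq_worker, args=(stop_event, results), daemon=True)
thread.start()

thread.join(timeout=0.05)  # analogous to join(timeout=180) in the pipeline
if thread.is_alive():
    stop_event.set()        # signal cancellation instead of leaking the thread
    thread.join(timeout=1)  # short grace period; escalate logging if still alive
```

Storing the event on state (e.g. `state.mcq_stop_event`), as the comment suggests, lets the timeout handler reach it from elsewhere.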

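The paginated-retrieval suggestion follows a generic pattern that can be shown without a live Weaviate instance; `fetch_page(offset, limit)` below is a stand-in for a `fetch_objects` call with explicit `offset`/`limit` parameters:

```python
from typing import Callable, Iterator

def fetch_all(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator:
    """Keep fetching pages until a short page signals the end, instead of
    silently relying on a client's implicit default limit."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size

# Fake data source standing in for the lecture-chunk collection.
chunks = [f"chunk-{i}" for i in range(250)]

def fake_fetch(offset: int, limit: int) -> list:
    return chunks[offset:offset + limit]

all_chunks = list(fetch_all(fake_fetch, page_size=100))
```

The Weaviate v4 client's iterator API wraps this same loop, which is why the comment recommends it when all course chunks are genuinely needed.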

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ba68e9f-d2ef-4cdf-a12a-f36b6227da86

📥 Commits

Reviewing files that changed from the base of the PR and between 409e5ab and 1a1a45f.

📒 Files selected for processing (2)
  • iris/src/iris/domain/status/stage_dto.py
  • iris/src/iris/pipeline/chat/course_chat_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • iris/src/iris/domain/status/stage_dto.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Line 247: The fallback tool-calling path currently inserts mcq_json raw and
discards lecture_units_meta, so update the fallback in course_chat_pipeline to
persist lecture_units_meta on state (set state.lecture_units_meta =
lecture_units_meta returned by _retrieve_lecture_content_for_mcq) and call
_add_mcq_citations(mcq_json, state) before replacing the [MCQ_RESULT]
placeholder; do the same normalization flow you use in the parallel path (ensure
any cleaning of questions/options and enrichment of explanations with
[cite:L:...] happens via _add_mcq_citations) and apply the same fix to the other
fallback block around lines 615-626.
- Line 247: The code currently calls _retrieve_lecture_content_for_mcq during
get_tools(), causing expensive Weaviate fetches for every turn; instead, defer
that call until the MCQ tool is actually invoked. Move the lecture_content, _ =
self._retrieve_lecture_content_for_mcq(state) out of get_tools() and into the
MCQ tool's callback (the function used to implement generate_mcq_questions), or
implement lazy caching so the first time generate_mcq_questions runs it calls
_retrieve_lecture_content_for_mcq(state), stores the result on the instance
(e.g., self._mcq_lecture_content), and subsequent calls reuse the cache.
- Around line 395-425: The count detection logic in the block that defines
count_patterns and mcq_keywords is too permissive: update count_patterns so
numeric captures are only accepted when tied to explicit quiz terms (e.g.,
require question|mcq|quiz in the same pattern) instead of matching bare “give me
3”; clamp the derived mcq_count to a small safe range (e.g., 1..5) before
returning; only treat phrases like "another question" or "one more" as mcq
intent when the previous assistant turn was an MCQ (check whatever state you
have for the last assistant response, e.g., last_bot_was_mcq or similar) and
otherwise ignore them; and keep the fallback keyword check (mcq_keywords against
message_lower) but ensure generic verbs like "give me" or "generate" without
quiz-specific words do not trigger MCQ mode.
- Around line 2-10: Add an import for the json module at the top of
course_chat_pipeline.py so json.loads/json.dumps calls succeed; update the
import block near the existing imports (with
Environment/FileSystemLoader/select_autoescape and pytz) to include "import
json" so functions that call json.loads and json.dumps (used in MCQ
post-processing) will not raise NameError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5844f611-1a10-43df-85ac-099adb9f0d4c

📥 Commits

Reviewing files that changed from the base of the PR and between 1a1a45f and c071ab9.

📒 Files selected for processing (2)
  • iris/src/iris/pipeline/chat/course_chat_pipeline.py
  • iris/src/iris/pipeline/prompts/templates/course_chat_system_prompt.j2

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
iris/src/iris/pipeline/chat/course_chat_pipeline.py (2)

248-260: ⚠️ Potential issue | 🟠 Major

Make the MCQ lecture lookup lazy.

get_tools() is now doing the full course-wide lecture fetch while merely registering generate_mcq_questions, so ordinary chat turns pay the Weaviate cost even when the agent never calls that tool.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 248 - 260,
get_tools() is eagerly calling _retrieve_lecture_content_for_mcq and paying the
Weaviate cost when merely registering the tool; change this to lazy lookup by
removing the immediate call in get_tools() and instead have
create_tool_generate_mcq_questions accept a lazy loader (e.g., a callable or
None) or fetch function and perform _retrieve_lecture_content_for_mcq inside the
tool's execution path; update create_tool_generate_mcq_questions and the
generated tool's run/handler to call
self._retrieve_lecture_content_for_mcq(state) (or the injected callable) only
when the tool is actually invoked, and keep setting mcq_lecture_units_meta on
state at that time.

399-428: ⚠️ Potential issue | 🔴 Critical

Tie numeric matches to quiz wording and cap the count.

The broad `(?:another|more|give me|generate|create)\s+(\d+)` branch still turns requests like "give me 3 study tips" into MCQ mode, and that raw count later drives parallel generation. That makes the reply easy to hijack and leaves worker fan-out user-controlled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 399 - 428,
The numeric-capture regex in count_patterns (especially r"(?:another|more|give
me|generate|create)\s+(\d+)") is too permissive and lets counts from non-quiz
requests trigger MCQ mode and uncontrolled parallel generation; update logic so
numeric matches are only accepted when accompanied by quiz-related wording (tie
them to mcq_keywords) or by tightening the regex to require quiz words (e.g.,
require "(?:question|quiz|mcq|ask)" near the number), and add a hard cap (define
MAX_MCQ_COUNT, e.g., 10) and clamp the returned count to that max before
returning from the function (references: count_patterns, mcq_keywords, and the
code path that returns True, int(match.group(1))).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 747-795: The code assumes json.loads(mcq_json_str) yields a dict
and calls parsed.get(...) and _process_question on it; if json.loads returns a
non-object (list, string, number) those calls will raise and abort processing.
After parsing, guard that parsed is a dict (e.g., if not isinstance(parsed,
dict): return mcq_json_str) before any parsed.get or _process_question usage so
non-object LLM outputs simply fall back to the original string; keep
_process_question and _find_matching_unit unchanged and still operate only on
dicts, and ensure the existing return json.dumps(parsed) only runs for valid
dicts you mutated.
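The guard described above fits in a few lines; the function name and the placeholder enrichment step are illustrative, not the real citation logic:

```python
import json

def add_citations(mcq_json_str: str) -> str:
    """Post-process an MCQ payload only when it parses to a JSON object;
    anything else falls back to the original string."""
    try:
        parsed = json.loads(mcq_json_str)
    except json.JSONDecodeError:
        return mcq_json_str
    if not isinstance(parsed, dict):
        # Lists, strings, and numbers are valid JSON but not MCQ payloads.
        return mcq_json_str
    for question in parsed.get("questions", []):
        if isinstance(question, dict):
            question.setdefault("citations", [])  # stand-in enrichment step
    return json.dumps(parsed)
```

Non-object LLM outputs such as `[1, 2]` now pass through untouched instead of raising `AttributeError` on `parsed.get`.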


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f4c45af-030d-4eed-b340-b6e4097afc8c

📥 Commits

Reviewing files that changed from the base of the PR and between c071ab9 and e7d0526.

📒 Files selected for processing (1)
  • iris/src/iris/pipeline/chat/course_chat_pipeline.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 791-796: Wrap the MCQ processing logic for both "mcq-set" and
"mcq" branches in a broad try/except around the loop/call that invokes
_process_question (and around accessing parsed.get("questions")). If any
exception (e.g., AttributeError, TypeError, ValueError) occurs, catch it, log or
record the exception context, and fall back to returning the original unparsed
string (so Artemis can handle it) instead of letting the exception propagate;
ensure the handler covers malformed inner structures like non-dict questions or
non-list options and references parsed, parsed.get("questions"), and
_process_question when locating the change.
- Around line 715-717: The loop iterating and clearing self.mcq_pipeline.tokens
can race with the MCQ worker thread appending tokens; guard access by first
confirming the MCQ thread has terminated (use the MCQ thread object’s is_alive()
or join(timeout) result) and only then call self._track_tokens(...) and
self.mcq_pipeline.tokens.clear(), or switch the token buffer to a thread-safe
structure (e.g., queue.Queue) and drain it atomically; if the thread timed out
and is still alive, skip tracking/clearing to avoid concurrent mutation.
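The `queue.Queue` variant suggested above can be shown in isolation; the worker body and token values are stand-ins for the MCQ pipeline's token-usage records:

```python
import queue
import threading

token_queue: "queue.Queue[int]" = queue.Queue()

def mcq_worker() -> None:
    """Stand-in for the MCQ pipeline thread emitting token-usage records."""
    for tokens in (120, 80, 95):
        token_queue.put(tokens)  # Queue.put is thread-safe, unlike list.append

thread = threading.Thread(target=mcq_worker)
thread.start()
thread.join(timeout=5)

tracked = []
# Drain atomically: get_nowait() on a Queue is safe even if a producer were
# still alive, unlike iterating and then clearing a shared list.
while True:
    try:
        tracked.append(token_queue.get_nowait())
    except queue.Empty:
        break
```

If the worker timed out and is still alive, the drain simply collects whatever was already enqueued; nothing is lost or double-counted.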

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: deed161d-14a9-41f2-b7b9-8a1e62d1152c

📥 Commits

Reviewing files that changed from the base of the PR and between e7d0526 and b4fe988.

📒 Files selected for processing (1)
  • iris/src/iris/pipeline/chat/course_chat_pipeline.py
