Iris: Add MCQ carousel, response persistence, and content-aware quiz generation #462
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds MCQ generation: a new McqGenerationPipeline, MCQ tool and templates, integration into course and lecture chat pipelines (parallel and tool modes), tests, rerank fallback handling, status callback/chat_message changes, a pipeline enum entry, ignore entries, and a pyproject.toml.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ChatPipeline
    participant MCQ_Pipeline
    participant LLM
    participant Weaviate
    participant Queue
    participant Thread
    alt mcq_parallel = true
        Client->>ChatPipeline: start chat
        ChatPipeline->>ChatPipeline: detect MCQ intent (set mcq_parallel)
        ChatPipeline->>Thread: pre_agent_hook() spawn MCQ thread
        Thread->>Weaviate: fetch lecture content & metadata
        Weaviate-->>Thread: content + metadata
        Thread->>MCQ_Pipeline: run_in_thread(command, lecture_content)
        MCQ_Pipeline->>LLM: generate MCQ JSON
        LLM-->>MCQ_Pipeline: MCQ response
        MCQ_Pipeline->>MCQ_Pipeline: validate/repair JSON
        MCQ_Pipeline->>Queue: enqueue result
        ChatPipeline->>ChatPipeline: post_agent_hook() join thread, collect queue
        ChatPipeline->>ChatPipeline: add citations & merge MCQ into result
        ChatPipeline-->>Client: final response
    else mcq_parallel = false
        Client->>ChatPipeline: start chat
        ChatPipeline->>ChatPipeline: detect MCQ intent (set mcq_parallel=false)
        ChatPipeline->>ChatPipeline: add generate_mcq_questions tool
        Client->>ChatPipeline: agent invokes tool
        ChatPipeline->>MCQ_Pipeline: __call__(command)
        MCQ_Pipeline->>LLM: generate MCQ JSON
        LLM-->>MCQ_Pipeline: MCQ response
        MCQ_Pipeline->>MCQ_Pipeline: validate/repair JSON
        MCQ_Pipeline-->>ChatPipeline: return MCQ JSON (stored in shared storage)
        ChatPipeline->>ChatPipeline: replace [MCQ_RESULT], add citations
        ChatPipeline-->>Client: final response
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 8
🧹 Nitpick comments (6)
iris/src/iris/llm/request_handler/rerank_request_handler.py (1)
90-98: Consider narrowing the exception type.

Catching bare `Exception` can mask programming errors (e.g., `TypeError`, `AttributeError` from bugs in result indexing on lines 87-88). Consider catching more specific exceptions from the Cohere client (e.g., network/API errors) while letting programming errors propagate.

♻️ Suggested refinement

```diff
-        except Exception as e:
+        except (cohere.CohereAPIError, cohere.CohereConnectionError, TimeoutError) as e:
```

If the Cohere client's specific exception types aren't accessible, at minimum exclude `TypeError`, `AttributeError`, and `KeyError` to catch bugs in the result-handling logic on lines 87-88.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/llm/request_handler/rerank_request_handler.py` around lines 90 - 98, The current broad except Exception in RerankRequestHandler around the Cohere call masks programming errors; narrow it to catch the Cohere/client-specific exceptions (or network/API errors) instead and let bugs propagate; if those exception classes aren't available, change the handler to except Exception as e: but re-raise on TypeError, AttributeError, KeyError (i.e., check the exception type and reraise those) so that logger.warning, setting RerankRequestHandler._rerank_available = False, and returning valid_documents[:top_n] only happen for expected client/network failures while index/result-handling bugs in the rerank logic surface normally.

iris/src/iris/tools/mcq_generation.py (1)
48-56: Consider error handling for pipeline failures.

If `mcq_pipeline(...)` raises an exception, it propagates uncaught. Depending on how the agent handles tool errors, you may want to catch exceptions and return a user-friendly error message instead of the placeholder.

♻️ Optional: wrap in try/except for graceful degradation

```diff
+from ..common.logging_config import get_logger
+
+logger = get_logger(__name__)
+
 def generate_mcq_questions(command: str) -> str:
     ...
-    result_json = mcq_pipeline(
-        command=command,
-        chat_history=chat_history,
-        user_language=user_language,
-        callback=callback,
-        lecture_content=lecture_content,
-    )
-    mcq_result_storage["mcq_json"] = result_json
-    return "[MCQ_RESULT]"
+    try:
+        result_json = mcq_pipeline(
+            command=command,
+            chat_history=chat_history,
+            user_language=user_language,
+            callback=callback,
+            lecture_content=lecture_content,
+        )
+        mcq_result_storage["mcq_json"] = result_json
+        return "[MCQ_RESULT]"
+    except Exception as e:
+        logger.warning("MCQ generation failed: %s", str(e))
+        return "I wasn't able to generate a quiz question right now. Please try again."
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/tools/mcq_generation.py` around lines 48 - 56, The call to mcq_pipeline can raise and currently will propagate uncaught; wrap the mcq_pipeline(...) invocation in a try/except that catches Exception, logs or stores the exception details into mcq_result_storage (e.g., set a key like "error" with the exception message/traceback), and return a user-friendly marker or message (e.g., "[MCQ_ERROR]" or a short descriptive string) instead of letting the exception bubble; keep the original behavior of setting mcq_result_storage["mcq_json"] on success and reference the mcq_pipeline call and mcq_result_storage keys so you modify the correct block.

iris/src/iris/pipeline/chat/lecture_chat_pipeline.py (2)
287-328: Consider extracting `_detect_mcq_intent` to a shared utility.

This method is identical to the one in `course_chat_pipeline.py` (lines 423-464). Consider extracting it to a shared module (e.g., `pipeline/shared/utils.py`) to eliminate duplication and ensure consistent behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py` around lines 287 - 328, The _detect_mcq_intent function in lecture_chat_pipeline.py is duplicated in course_chat_pipeline.py; extract this function into a shared utility (e.g., pipeline.shared.utils) with the same signature def _detect_mcq_intent(user_message: str) -> tuple[bool, int], move the logic (count_patterns, mcq_keywords, regex search) into that module, then replace the local definitions in both LectureChatPipeline and CourseChatPipeline with imports from the new shared util and call the shared _detect_mcq_intent; ensure both modules update their imports, preserve return semantics (True/False and count defaulting to 1/0), and run existing pipeline tests to confirm no behavioral change.
621-679: `_add_mcq_citations` is duplicated across pipelines.

This static method is identical to the one in `course_chat_pipeline.py` (lines 768-826). Extract it to a shared module to maintain consistency and reduce duplication.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py` around lines 621 - 679, The _add_mcq_citations staticmethod is duplicated between lecture_chat_pipeline and course_chat_pipeline; extract the logic into a single shared helper (e.g., create a function named add_mcq_citations in a new/shared module like iris.pipeline.chat._mcq_utils or similar), move the current implementation there (preserve signature: mcq_json_str, lecture_units_meta), replace both staticmethods with imports that call the shared add_mcq_citations (or keep a thin staticmethod wrapper that delegates), and remove the duplicated code from lecture_chat_pipeline and course_chat_pipeline so both modules import and reuse the single implementation.

iris/src/iris/pipeline/shared/mcq_generation_pipeline.py (2)
322-330: F-string with conditional expression is valid in Python 3.12+.

The static analysis tool (Flake8) reports a syntax error on line 324, but this is a false positive. Per retrieved learnings, Python 3.12+ (PEP 701) supports nested f-strings and complex expressions within f-string braces. The syntax `f'{"from the lecture material above " if lecture_content else ""}'` is valid.

However, consider simplifying for readability:

♻️ Optional: Simplify f-string for clarity

```diff
+material_clause = "from the lecture material above " if lecture_content else ""
 prompt += (
     f"\nGenerate exactly {count} distinct subtopics or aspects "
-    f'{"from the lecture material above " if lecture_content else ""}'
+    f"{material_clause}"
     f"that would each make a good multiple-choice question. "
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py` around lines 322 - 330, The f-string uses a nested expression that triggers linters; to fix, replace the nested f-string expression with a simple conditional string or variable and include it in the main prompt assembly: compute a small helper like lecture_clause = "from the lecture material above " if lecture_content else "" (or use inline ('from the lecture material above ' if lecture_content else '')), then use f"...Generate exactly {count} distinct subtopics or aspects {lecture_clause}that..." when building the prompt variable in mcq_generation_pipeline.py (look for the prompt variable construction referencing lecture_content and count).
276-289: Rename unused loop variable `i` to `_`.

The loop variable `i` is not used within the loop body. Convention is to use `_` for intentionally unused variables.

♻️ Proposed fix

```diff
-        for i in range(missing):
+        for _ in range(missing):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py` around lines 276 - 289, In the retry loop inside MCQ generation (the for loop that currently reads "for i in range(missing):" where the loop variable i is not used), rename the unused loop variable to "_" (i.e., "for _ in range(missing):") to follow the unused-variable convention; update the loop header only (no other logic change) and ensure no reference to "i" remains in the surrounding method (the block that calls self(...), appends to successful_results, and logs errors).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/domain/status/stage_dto.py`:
- Line 16: StageDTO currently defines chat_message with alias="chatMessage"
which prevents instantiation by the Python name; add a model_config to StageDTO
to enable population by Python field names (i.e., set populate_by_name=True via
ConfigDict) so instances created with chat_message=... in
course_chat_pipeline.py, lecture_chat_pipeline.py, and
mcq_generation_pipeline.py validate correctly.
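A minimal sketch of that fix, assuming pydantic v2; only the field named in the comment is shown, other StageDTO fields are omitted:

```python
# Sketch only: enable population by both the Python name and the JSON alias.
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class StageDTO(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    chat_message: Optional[str] = Field(default=None, alias="chatMessage")


# Both spellings now validate:
by_python_name = StageDTO(chat_message="Generating a quiz...")
by_alias = StageDTO(chatMessage="Generating a quiz...")
```

Serialization with `model_dump(by_alias=True)` still emits `chatMessage`, so the wire format is unchanged.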
In `@iris/src/iris/llm/request_handler/rerank_request_handler.py`:
- Line 23: The class-level flag _rerank_available on RerankRequestHandler
disables reranking for all instances after one failure; change this to an
instance attribute (initialize self._rerank_available in
RerankRequestHandler.__init__) or implement a time-to-live/reset (store a
timestamp like self._rerank_backoff_until and check it before reranking) so
failures are localized or temporary; also replace the broad except Exception
around the call to cohere_client.rerank() with targeted exception handling
(catch the specific API/timeout/network exceptions you expect and only set the
instance-level backoff/disable for those), and add a periodic or TTL-based retry
path to re-enable reranking when appropriate.
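The instance-level, time-limited backoff described above can be sketched as follows; the class shape, the Cohere response fields (`results[i].index`), and the caught exception types are placeholders rather than the PR's code:

```python
# Sketch of per-instance TTL backoff for reranking; names mirror the comment
# above but are assumptions, and the client call/response shape is stubbed.
import time


class RerankRequestHandler:
    def __init__(self, cohere_client, backoff_seconds: float = 300.0):
        self._client = cohere_client
        self._backoff_seconds = backoff_seconds
        self._rerank_backoff_until = 0.0  # instance attribute, not class-level

    def rerank(self, query: str, documents: list[str], top_n: int) -> list[str]:
        if time.monotonic() < self._rerank_backoff_until:
            return documents[:top_n]  # degraded path while backing off
        try:
            response = self._client.rerank(query=query, documents=documents, top_n=top_n)
            return [documents[r.index] for r in response.results]
        except (ConnectionError, TimeoutError):  # expected transient failures only
            # Disable reranking for this instance only, and only temporarily.
            self._rerank_backoff_until = time.monotonic() + self._backoff_seconds
            return documents[:top_n]
```

Once `backoff_seconds` elapses, the next call retries reranking automatically, so a single transient failure no longer disables the feature process-wide.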
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 249-264: The non-parallel MCQ tool block in
course_chat_pipeline.py fails to pass lecture context; call the existing
_retrieve_lecture_content_for_mcq(self, state) (same helper used for parallel
flow) to obtain lecture_content and then pass that lecture_content into
create_tool_generate_mcq_questions in the non-parallel branch (the same
position/argument where the parallel branch supplies it), ensuring you still use
getattr(state, "mcq_result_storage", {}) and user_language as before.
In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py`:
- Around line 546-548: The MCQ thread join in lecture_chat_pipeline.py currently
uses timeout=120 which is inconsistent with course_chat_pipeline.py's
timeout=180; update the mcq_thread.join call in the lecture chat flow (look for
mcq_thread.join(...) and getattr(state, "mcq_result_storage", {})) to use a
shared constant (e.g., MCQ_THREAD_TIMEOUT) or match the 180s value used in
course_chat_pipeline so both pipelines behave the same; define the constant at
module scope (or import it from the module where it's defined) and reference it
in the join call instead of the hardcoded 120.
In `@iris/tests/test_mcq_generation_tool.py`:
- Around line 31-46: The test helper _create_tool is missing the lecture_content
parameter compared to the production factory create_tool_generate_mcq_questions,
so update _create_tool to accept a lecture_content argument and pass it through
to mcq_pipeline (i.e., include lecture_content=lecture_content in the
mcq_pipeline call inside generate_mcq_questions) so the test
test_tool_calls_pipeline_with_correct_args verifies the same argument flow as
production; keep storing result in mcq_result_storage["mcq_json"] and return
"[MCQ_RESULT]" as before.
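A possible shape for the updated helper and assertion; the factory's argument order and names are assumed rather than copied from the PR:

```python
# Hypothetical test helper mirroring the described factory signature.
def _create_tool(mcq_pipeline, mcq_result_storage, chat_history, user_language,
                 callback, lecture_content):
    def generate_mcq_questions(command: str) -> str:
        result_json = mcq_pipeline(
            command=command,
            chat_history=chat_history,
            user_language=user_language,
            callback=callback,
            lecture_content=lecture_content,  # now forwarded, matching production
        )
        mcq_result_storage["mcq_json"] = result_json
        return "[MCQ_RESULT]"

    return generate_mcq_questions


# The test can then verify the pass-through end to end:
captured = {}


def _fake_pipeline(**kwargs):
    captured.update(kwargs)
    return '{"questions": []}'


storage = {}
tool = _create_tool(_fake_pipeline, storage, ["hi"], "en", None, "lecture text")
assert tool("make a quiz") == "[MCQ_RESULT]"
assert captured["lecture_content"] == "lecture text"
assert storage["mcq_json"] == '{"questions": []}'
```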
- Around line 85-96: Update the test test_tool_calls_pipeline_with_correct_args
to assert that lecture_content is passed to the pipeline: modify the call to
mock_pipeline.assert_called_once_with(...) to include
lecture_content=<expected_value> and adjust the _setup_tool helper to accept and
forward a lecture_content argument (so tool, mock_pipeline, mock_callback,
chat_history, lecture_content = _setup_tool() or have _setup_tool return the
lecture_content) ensuring the tool(command) invocation uses that
lecture_content; reference the test function
test_tool_calls_pipeline_with_correct_args, the helper _setup_tool, and
mock_pipeline when making the change.
In `@pyproject.toml`:
- Line 9: The project metadata's requires-python value uses Poetry's caret
syntax ("^3.12") which is invalid for PEP 621/PEP 440; update the
requires-python field to a PEP 440 range such as ">=3.12,<4" so tools that
validate PEP 440 (pip/setuptools/build) accept it — locate the requires-python
entry in pyproject.toml and replace the caret specifier with a proper PEP 440
specifier.
- Around line 9-11: The root pyproject declares requires-python = "^3.12" which
conflicts with package constraints (logos and atlas require >=3.13 while nebula
caps at <3.13); unify Python constraints across the monorepo by choosing a
single supported Python range (e.g., requires-python = ">=3.12,<3.14" or bump
everything to ">=3.13,<4.0"), then update the following package pyproject.toml
files to match that chosen range: root (requires-python in the diff),
logos/pyproject.toml, nebula/pyproject.toml, atlas/AtlasMl/pyproject.toml and
any other packages (shared, iris, memiris, athena*, evaluation) that declare
incompatible ranges; ensure nebula's upper bound is relaxed or logos/atlas are
lowered consistently, run dependency tests/CI to confirm no runtime/install
conflicts, and commit the synchronized requires-python constraint.
---
Nitpick comments:
In `@iris/src/iris/llm/request_handler/rerank_request_handler.py`:
- Around line 90-98: The current broad except Exception in RerankRequestHandler
around the Cohere call masks programming errors; narrow it to catch the
Cohere/client-specific exceptions (or network/API errors) instead and let bugs
propagate; if those exception classes aren't available, change the handler to
except Exception as e: but re-raise on TypeError, AttributeError, KeyError
(i.e., check the exception type and reraise those) so that logger.warning,
setting RerankRequestHandler._rerank_available = False, and returning
valid_documents[:top_n] only happen for expected client/network failures while
index/result-handling bugs in the rerank logic surface normally.
In `@iris/src/iris/pipeline/chat/lecture_chat_pipeline.py`:
- Around line 287-328: The _detect_mcq_intent function in
lecture_chat_pipeline.py is duplicated in course_chat_pipeline.py; extract this
function into a shared utility (e.g., pipeline.shared.utils) with the same
signature def _detect_mcq_intent(user_message: str) -> tuple[bool, int], move
the logic (count_patterns, mcq_keywords, regex search) into that module, then
replace the local definitions in both LectureChatPipeline and CourseChatPipeline
with imports from the new shared util and call the shared _detect_mcq_intent;
ensure both modules update their imports, preserve return semantics (True/False
and count defaulting to 1/0), and run existing pipeline tests to confirm no
behavioral change.
- Around line 621-679: The _add_mcq_citations staticmethod is duplicated between
lecture_chat_pipeline and course_chat_pipeline; extract the logic into a single
shared helper (e.g., create a function named add_mcq_citations in a new/shared
module like iris.pipeline.chat._mcq_utils or similar), move the current
implementation there (preserve signature: mcq_json_str, lecture_units_meta),
replace both staticmethods with imports that call the shared add_mcq_citations
(or keep a thin staticmethod wrapper that delegates), and remove the duplicated
code from lecture_chat_pipeline and course_chat_pipeline so both modules import
and reuse the single implementation.
In `@iris/src/iris/pipeline/shared/mcq_generation_pipeline.py`:
- Around line 322-330: The f-string uses a nested expression that triggers
linters; to fix, replace the nested f-string expression with a simple
conditional string or variable and include it in the main prompt assembly:
compute a small helper like lecture_clause = "from the lecture material above "
if lecture_content else "" (or use inline ('from the lecture material above ' if
lecture_content else '')), then use f"...Generate exactly {count} distinct
subtopics or aspects {lecture_clause}that..." when building the prompt variable
in mcq_generation_pipeline.py (look for the prompt variable construction
referencing lecture_content and count).
- Around line 276-289: In the retry loop inside MCQ generation (the for loop
that currently reads "for i in range(missing):" where the loop variable i is not
used), rename the unused loop variable to "_" (i.e., "for _ in range(missing):")
to follow the unused-variable convention; update the loop header only (no other
logic change) and ensure no reference to "i" remains in the surrounding method
(the block that calls self(...), appends to successful_results, and logs
errors).
In `@iris/src/iris/tools/mcq_generation.py`:
- Around line 48-56: The call to mcq_pipeline can raise and currently will
propagate uncaught; wrap the mcq_pipeline(...) invocation in a try/except that
catches Exception, logs or stores the exception details into mcq_result_storage
(e.g., set a key like "error" with the exception message/traceback), and return
a user-friendly marker or message (e.g., "[MCQ_ERROR]" or a short descriptive
string) instead of letting the exception bubble; keep the original behavior of
setting mcq_result_storage["mcq_json"] on success and reference the mcq_pipeline
call and mcq_result_storage keys so you modify the correct block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8641539c-ed5f-439d-99d9-1176b6b67a33
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (17)
iris/.claudeignore
iris/src/iris/common/pipeline_enum.py
iris/src/iris/domain/status/stage_dto.py
iris/src/iris/llm/request_handler/rerank_request_handler.py
iris/src/iris/pipeline/chat/course_chat_pipeline.py
iris/src/iris/pipeline/chat/lecture_chat_pipeline.py
iris/src/iris/pipeline/prompts/templates/course_chat_system_prompt.j2
iris/src/iris/pipeline/prompts/templates/lecture_chat_system_prompt.j2
iris/src/iris/pipeline/prompts/templates/mcq_generation_prompt.j2
iris/src/iris/pipeline/shared/mcq_generation_pipeline.py
iris/src/iris/tools/__init__.py
iris/src/iris/tools/mcq_generation.py
iris/src/iris/web/status/status_update.py
iris/tests/test_mcq_generation_pipeline.py
iris/tests/test_mcq_generation_tool.py
iris/tests/test_mcq_prompt_rendering.py
pyproject.toml
Actionable comments posted: 2
🧹 Nitpick comments (2)
iris/src/iris/pipeline/chat/course_chat_pipeline.py (2)
425-466: Consider edge cases in intent detection.

The keyword-based detection could trigger false positives in certain contexts:
- "quiz" matches questions like "When is the quiz due?" or "What is a quiz?"
- "another 3" could match "explain another 3 concepts"
While the LLM can likely handle these gracefully, consider adding negative patterns or requiring multiple signals for more confident detection if this becomes problematic in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 425 - 466, The _detect_mcq_intent function is producing false positives from broad keywords; tighten detection by adding negative patterns and stronger positive signals: update the count_patterns and mcq_keywords checks to (1) prefer explicit count patterns (keep current behavior), (2) require the word "question(s)" or phrases like "quiz me", "test me", "ask me" for keyword matches (avoid matching lone "quiz"), and (3) add negative pattern checks (e.g., words like "due", "when", "what is", "explain", "definition") that, if present, suppress MCQ intent even when a keyword matches; implement this logic in _detect_mcq_intent so it first checks negatives, then explicit counts, then confirms keywords only when accompanied by question-related terms or absent negative patterns.
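One way to combine negative patterns with stronger positive signals, as an illustrative sketch (the keyword lists here are made up for the example, not the PR's):

```python
# Illustrative hardening only: suppress MCQ intent on scheduling/definition
# questions, and require an explicit "quiz me"-style positive signal.
import re

_NEGATIVE = re.compile(r"\b(?:due|deadline|what is a|explain|definition)\b")
_POSITIVE = re.compile(r"\b(?:quiz me|test me|ask me|multiple.choice|mcq)\b")


def detect_mcq_intent(message: str) -> bool:
    text = message.lower()
    if _NEGATIVE.search(text):   # "When is the quiz due?" is not a quiz request
        return False
    return bool(_POSITIVE.search(text))
```

Checking negatives first means a lone "quiz" in a logistics question never reaches the positive-keyword branch.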
793-801: Substring matching for unit lookup may produce false matches.

The `_find_matching_unit` function uses bidirectional substring matching (source in name or name in source). This could produce incorrect matches for common or short names:
- Unit "Introduction" would match source "Introduction to Advanced Topics"
- Unit "Data" would match unrelated sources containing "Data"
Consider using more specific matching (e.g., word boundaries, similarity scoring) if citation accuracy becomes important. For now, this is acceptable as a best-effort approach.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 793 - 801, The current _find_matching_unit uses bidirectional substring checks on source_lower vs unit_name/lecture_name which yields false positives for short/common tokens; update the matching to require word-boundary or token-based matches instead of raw substring checks: for each meta in lecture_units_meta, normalize and tokenize source_lower and unit_name/lecture_name and then check for exact token overlap or apply a simple regex word-boundary match (e.g., whole-word match) or a lightweight similarity threshold (e.g., fuzzy ratio) before returning meta; update the checks that currently reference source_lower, unit_name, and lecture_name to use the new word-boundary/token-match logic to reduce accidental matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 682-690: The MCQ thread is left running after join(timeout=180)
which can leak resources; introduce a cancellation Event and use it to signal
the MCQ pipeline to stop: when creating the thread (where mcq_thread is spawned)
create a threading.Event (e.g., mcq_stop_event), pass it into the MCQ
worker/function so the worker periodically checks mcq_stop_event.is_set() and
exits cleanly (also stop putting to mcq_result_storage/queue if set). In this
block, after mcq_thread.join(timeout=180) returns and mcq_thread.is_alive() is
True, call mcq_stop_event.set(), then attempt a short join (e.g.,
join(timeout=5)) and if still alive log a stronger error; also store the event
on state (e.g., state.mcq_stop_event) alongside mcq_result_storage so the
timeout handler can access it.
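The cancellation pattern described above, reduced to a runnable sketch; the worker body and queue payload stand in for the real MCQ pipeline call:

```python
# Cooperative cancellation: the worker checks a stop flag between chunks of
# work, and the parent signals it after the join timeout expires.
import queue
import threading


def mcq_worker(stop_event: threading.Event, out: queue.Queue) -> None:
    # Event.wait doubles as an interruptible sleep between work chunks.
    for _ in range(5):
        if stop_event.wait(timeout=0.05):
            return                      # cancelled: exit cleanly, enqueue nothing
    out.put("mcq_json")


stop_event = threading.Event()
results: queue.Queue = queue.Queue()
worker = threading.Thread(target=mcq_worker, args=(stop_event, results))
worker.start()
worker.join(timeout=180)                # normal completion path
if worker.is_alive():
    stop_event.set()                    # signal the worker to stop
    worker.join(timeout=5)              # short grace period before logging an error
```

Storing `stop_event` on the pipeline state alongside `mcq_result_storage`, as the comment suggests, lets the timeout handler reach it later.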
- Around line 490-498: The fetch_objects call for chunks
(state.db.lectures.query.fetch_objects) relies on Weaviate's implicit default
limit of 10; make the intent explicit by either adding a limit parameter (e.g.,
limit=10 with a comment "intentional cap to 10 chunks" or limit=500 if you need
more chunks for MCQ quality) or, if you need all course chunks, replace
fetch_objects with the paginated iterator API
(state.db.lectures.query.iterator(...)) to retrieve all pages; update the call
site where chunks is assigned and keep the same return_properties
(LectureUnitPageChunkSchema.*) when changing to iterator.
---
Nitpick comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 425-466: The _detect_mcq_intent function is producing false
positives from broad keywords; tighten detection by adding negative patterns and
stronger positive signals: update the count_patterns and mcq_keywords checks to
(1) prefer explicit count patterns (keep current behavior), (2) require the word
"question(s)" or phrases like "quiz me", "test me", "ask me" for keyword matches
(avoid matching lone "quiz"), and (3) add negative pattern checks (e.g., words
like "due", "when", "what is", "explain", "definition") that, if present,
suppress MCQ intent even when a keyword matches; implement this logic in
_detect_mcq_intent so it first checks negatives, then explicit counts, then
confirms keywords only when accompanied by question-related terms or absent
negative patterns.
- Around line 793-801: The current _find_matching_unit uses bidirectional
substring checks on source_lower vs unit_name/lecture_name which yields false
positives for short/common tokens; update the matching to require word-boundary
or token-based matches instead of raw substring checks: for each meta in
lecture_units_meta, normalize and tokenize source_lower and
unit_name/lecture_name and then check for exact token overlap or apply a simple
regex word-boundary match (e.g., whole-word match) or a lightweight similarity
threshold (e.g., fuzzy ratio) before returning meta; update the checks that
currently reference source_lower, unit_name, and lecture_name to use the new
word-boundary/token-match logic to reduce accidental matches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ba68e9f-d2ef-4cdf-a12a-f36b6227da86
📒 Files selected for processing (2)
iris/src/iris/domain/status/stage_dto.py
iris/src/iris/pipeline/chat/course_chat_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
- iris/src/iris/domain/status/stage_dto.py
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Line 247: The fallback tool-calling path currently inserts mcq_json raw and
discards lecture_units_meta, so update the fallback in course_chat_pipeline to
persist lecture_units_meta on state (set state.lecture_units_meta =
lecture_units_meta returned by _retrieve_lecture_content_for_mcq) and call
_add_mcq_citations(mcq_json, state) before replacing the [MCQ_RESULT]
placeholder; do the same normalization flow you use in the parallel path (ensure
any cleaning of questions/options and enrichment of explanations with
[cite:L:...] happens via _add_mcq_citations) and apply the same fix to the other
fallback block around lines 615-626.
- Line 247: The code currently calls _retrieve_lecture_content_for_mcq during
get_tools(), causing expensive Weaviate fetches for every turn; instead, defer
that call until the MCQ tool is actually invoked. Move the lecture_content, _ =
self._retrieve_lecture_content_for_mcq(state) out of get_tools() and into the
MCQ tool's callback (the function used to implement generate_mcq_questions), or
implement lazy caching so the first time generate_mcq_questions runs it calls
_retrieve_lecture_content_for_mcq(state), stores the result on the instance
(e.g., self._mcq_lecture_content), and subsequent calls reuse the cache.
- Around line 395-425: The count detection logic in the block that defines
count_patterns and mcq_keywords is too permissive: update count_patterns so
numeric captures are only accepted when tied to explicit quiz terms (e.g.,
require question|mcq|quiz in the same pattern) instead of matching bare “give me
3”; clamp the derived mcq_count to a small safe range (e.g., 1..5) before
returning; only treat phrases like "another question" or "one more" as mcq
intent when the previous assistant turn was an MCQ (check whatever state you
have for the last assistant response, e.g., last_bot_was_mcq or similar) and
otherwise ignore them; and keep the fallback keyword check (mcq_keywords against
message_lower) but ensure generic verbs like "give me" or "generate" without
quiz-specific words do not trigger MCQ mode.
- Around line 2-10: Add an import for the json module at the top of
course_chat_pipeline.py so json.loads/json.dumps calls succeed; update the
import block near the existing imports (with
Environment/FileSystemLoader/select_autoescape and pytz) to include "import
json" so functions that call json.loads and json.dumps (used in MCQ
post-processing) will not raise NameError.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5844f611-1a10-43df-85ac-099adb9f0d4c
📒 Files selected for processing (2)
iris/src/iris/pipeline/chat/course_chat_pipeline.py
iris/src/iris/pipeline/prompts/templates/course_chat_system_prompt.j2
Actionable comments posted: 1
♻️ Duplicate comments (2)
iris/src/iris/pipeline/chat/course_chat_pipeline.py (2)
248-260: ⚠️ Potential issue | 🟠 Major

Make the MCQ lecture lookup lazy.

`get_tools()` is now doing the full course-wide lecture fetch while merely registering `generate_mcq_questions`, so ordinary chat turns pay the Weaviate cost even when the agent never calls that tool.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 248 - 260, get_tools() is eagerly calling _retrieve_lecture_content_for_mcq and paying the Weaviate cost when merely registering the tool; change this to lazy lookup by removing the immediate call in get_tools() and instead have create_tool_generate_mcq_questions accept a lazy loader (e.g., a callable or None) or fetch function and perform _retrieve_lecture_content_for_mcq inside the tool's execution path; update create_tool_generate_mcq_questions and the generated tool's run/handler to call self._retrieve_lecture_content_for_mcq(state) (or the injected callable) only when the tool is actually invoked, and keep setting mcq_lecture_units_meta on state at that time.
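The lazy/cached variant could be structured like this; the factory class and method names are simplified stand-ins for the real pipeline:

```python
# Skeleton of the lazy lookup: the retrieval hook runs on the first tool
# invocation, not at registration time, and its result is cached.
class McqToolFactory:
    def __init__(self, retrieve_lecture_content):
        self._retrieve = retrieve_lecture_content   # e.g. _retrieve_lecture_content_for_mcq
        self._cached_content = None

    def _lecture_content(self, state):
        if self._cached_content is None:            # first tool call pays the Weaviate cost
            self._cached_content = self._retrieve(state)
        return self._cached_content

    def get_tools(self, state):
        def generate_mcq_questions(command: str) -> str:
            content = self._lecture_content(state)  # fetched only when invoked
            return f"[MCQ_RESULT:{len(content)}]"

        return [generate_mcq_questions]
```

Registering the tool is then free, and repeated tool calls within a turn reuse the cached content.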
399-428: ⚠️ Potential issue | 🔴 Critical — Tie numeric matches to quiz wording and cap the count.
The broad `(?:another|more|give me|generate|create)\s+(\d+)` branch still turns requests like "give me 3 study tips" into MCQ mode, and that raw count later drives parallel generation. That makes the reply easy to hijack and leaves worker fan-out user-controlled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py` around lines 399 - 428, The numeric-capture regex in count_patterns (especially r"(?:another|more|give me|generate|create)\s+(\d+)") is too permissive and lets counts from non-quiz requests trigger MCQ mode and uncontrolled parallel generation; update logic so numeric matches are only accepted when accompanied by quiz-related wording (tie them to mcq_keywords) or by tightening the regex to require quiz words (e.g., require "(?:question|quiz|mcq|ask)" near the number), and add a hard cap (define MAX_MCQ_COUNT, e.g., 10) and clamp the returned count to that max before returning from the function (references: count_patterns, mcq_keywords, and the code path that returns True, int(match.group(1))).
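A tightened detector along the lines the prompt suggests could look like the sketch below. The pattern, `MAX_MCQ_COUNT`, and the `detect_mcq_count` signature are illustrative assumptions, not the pipeline's actual names:

```python
import re

MAX_MCQ_COUNT = 10  # hard cap on user-requested question fan-out

MCQ_COUNT_PATTERN = re.compile(
    # A number is accepted only when quiz wording sits right next to it,
    # so "give me 3 study tips" no longer triggers MCQ mode.
    r"(?:give me|generate|create|ask(?: me)?|another|more)\s+(\d+)\s+"
    r"(?:quiz |mcq |multiple[- ]choice )?questions?",
    re.IGNORECASE,
)

def detect_mcq_count(message: str):
    """Return (is_mcq_request, clamped_count) for a user message."""
    match = MCQ_COUNT_PATTERN.search(message)
    if not match:
        return False, 0
    # Clamp before the count ever reaches the parallel workers.
    return True, min(int(match.group(1)), MAX_MCQ_COUNT)
```

With this shape, "generate 50 questions" yields a count of 10 rather than spawning 50 workers, and non-quiz requests fall through to the normal chat path.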
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 747-795: The code assumes json.loads(mcq_json_str) yields a dict
and calls parsed.get(...) and _process_question on it; if json.loads returns a
non-object (list, string, number) those calls will raise and abort processing.
After parsing, guard that parsed is a dict (e.g., if not isinstance(parsed,
dict): return mcq_json_str) before any parsed.get or _process_question usage so
non-object LLM outputs simply fall back to the original string; keep
_process_question and _find_matching_unit unchanged and still operate only on
dicts, and ensure the existing return json.dumps(parsed) only runs for valid
dicts you mutated.
---
Duplicate comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 248-260: get_tools() is eagerly calling
_retrieve_lecture_content_for_mcq and paying the Weaviate cost when merely
registering the tool; change this to lazy lookup by removing the immediate call
in get_tools() and instead have create_tool_generate_mcq_questions accept a lazy
loader (e.g., a callable or None) or fetch function and perform
_retrieve_lecture_content_for_mcq inside the tool's execution path; update
create_tool_generate_mcq_questions and the generated tool's run/handler to call
self._retrieve_lecture_content_for_mcq(state) (or the injected callable) only
when the tool is actually invoked, and keep setting mcq_lecture_units_meta on
state at that time.
- Around line 399-428: The numeric-capture regex in count_patterns (especially
r"(?:another|more|give me|generate|create)\s+(\d+)") is too permissive and lets
counts from non-quiz requests trigger MCQ mode and uncontrolled parallel
generation; update logic so numeric matches are only accepted when accompanied
by quiz-related wording (tie them to mcq_keywords) or by tightening the regex to
require quiz words (e.g., require "(?:question|quiz|mcq|ask)" near the number),
and add a hard cap (define MAX_MCQ_COUNT, e.g., 10) and clamp the returned count
to that max before returning from the function (references: count_patterns,
mcq_keywords, and the code path that returns True, int(match.group(1))).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f4c45af-030d-4eed-b340-b6e4097afc8c
📒 Files selected for processing (1)
iris/src/iris/pipeline/chat/course_chat_pipeline.py
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/pipeline/chat/course_chat_pipeline.py`:
- Around line 791-796: Wrap the MCQ processing logic for both "mcq-set" and
"mcq" branches in a broad try/except around the loop/call that invokes
_process_question (and around accessing parsed.get("questions")). If any
exception (e.g., AttributeError, TypeError, ValueError) occurs, catch it, log or
record the exception context, and fall back to returning the original unparsed
string (so Artemis can handle it) instead of letting the exception propagate;
ensure the handler covers malformed inner structures like non-dict questions or
non-list options and references parsed, parsed.get("questions"), and
_process_question when locating the change.
- Around line 715-717: The loop iterating and clearing self.mcq_pipeline.tokens
can race with the MCQ worker thread appending tokens; guard access by first
confirming the MCQ thread has terminated (use the MCQ thread object’s is_alive()
or join(timeout) result) and only then call self._track_tokens(...) and
self.mcq_pipeline.tokens.clear(), or switch the token buffer to a thread-safe
structure (e.g., queue.Queue) and drain it atomically; if the thread timed out
and is still alive, skip tracking/clearing to avoid concurrent mutation.
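Both remedies the prompt mentions can be combined: keep the token buffer in a thread-safe `queue.Queue` and drain it only after `join()` confirms the worker has ended. Class and method names below are illustrative, not the pipeline's actual API:

```python
import queue
import threading

class McqTokenBuffer:
    """Thread-safe token-usage buffer, drained only after the MCQ
    worker thread has terminated (illustrative sketch)."""

    def __init__(self) -> None:
        self.tokens: "queue.Queue" = queue.Queue()

    def drain(self, worker: threading.Thread, track, timeout: float = 5.0) -> bool:
        worker.join(timeout)
        if worker.is_alive():
            # Worker timed out and may still be appending: skip draining
            # to avoid racing its puts; report that nothing was tracked.
            return False
        while True:
            try:
                track(self.tokens.get_nowait())
            except queue.Empty:
                return True
```

`get_nowait()` drains atomically item by item, so no separate `clear()` call on a shared list is needed.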
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: deed161d-14a9-41f2-b7b9-8a1e62d1152c
📒 Files selected for processing (1)
iris/src/iris/pipeline/chat/course_chat_pipeline.py
Summary
This PR continues the server-side implementation that enables students to request AI-generated multiple-choice quiz questions from Iris. It is now possible to generate multiple questions at once; the questions are grounded in lecture content, and citations are included.
Note: This PR works best together with the corresponding Artemis changes from the branch with the same name (`feature/iris/quiz-questions`) for MCQ generation to work end-to-end. That branch adds the MCQ generation UI to Artemis, as well as communication between the Artemis and Iris servers.
Motivation and Context
Previously, Iris responded with plain text messages only. Building on single MCQ generation, students should be able to request multiple quiz-style questions directly within the chat to support active learning and self-assessment. This enables Iris to serve as a more interactive tutoring tool by generating multiple-choice questions with instant feedback.
Description
Architecture: Tool-based MCQ generation with dedicated subpipeline
- New `McqGenerationPipeline` subpipeline (`mcq_generation_pipeline.py`) that handles MCQ generation with its own prompt template and a fast model (gpt-4.1-nano).
- New tool (`tools/mcq_generation.py`) that the agent can invoke, registered in both the course chat and lecture chat pipelines.
Multi-question generation with parallel workers
- Intent detection (`_detect_mcq_intent`) that parses the user's message for a question count (e.g., "5 questions", "quiz me") before tool selection.
- Parallel worker threads (each with its own `contextvars` context) generate questions concurrently.
- Results are collected via a `Queue` and bundled into an `mcq-set` JSON payload for the Artemis carousel UI.
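The fan-out-and-bundle flow above can be sketched with standard-library threads and a queue. `generate_mcq_set` and `make_question` are illustrative stand-ins for the pipeline's worker entry points, and the payload shows only the `mcq-set` envelope:

```python
import json
import queue
import threading

def generate_mcq_set(count, make_question):
    """Fan out one worker per question and bundle the results into an
    mcq-set JSON payload (illustrative sketch)."""
    results = queue.Queue()  # thread-safe collection point
    threads = [
        # Default-arg binding (i=i) captures each index at creation time.
        threading.Thread(target=lambda i=i: results.put(make_question(i)))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    questions = []
    while not results.empty():
        questions.append(results.get())
    return json.dumps({"type": "mcq-set", "questions": questions})
```

Because workers finish in arbitrary order, any consumer (here, the carousel UI) should not rely on question ordering unless the bundler sorts explicitly.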
Lecture content grounding and citations
- Citations use the `[cite:L:...]` format from the regular citation pipeline, matching the LLM-generated `source` field to lecture unit metadata.
UX improvements
- Added a `chatMessage` field to `StageDTO` so `in_progress()` can send dynamic loading messages to the user (e.g., "Getting your quiz ready...", "Writing the question and answer options...").
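The DTO change can be pictured with a minimal dataclass; the field names mirror the PR summary, but the actual `StageDTO` definition and `in_progress()` signature in the Iris codebase may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageDTO:
    """Hypothetical shape of the stage DTO with the new chat message."""
    name: str
    state: str = "not_started"
    chat_message: Optional[str] = None  # dynamic loading message for the user

def in_progress(stage: StageDTO, message: Optional[str] = None) -> StageDTO:
    """Mark a stage in progress and optionally attach a loading message."""
    stage.state = "in_progress"
    stage.chat_message = message
    return stage
```

This lets each pipeline step surface a human-readable status like "Getting your quiz ready..." without a separate messaging channel.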
Prompt engineering
- New prompt template (`mcq_generation_prompt.j2`) with a strict JSON output format, answer-validation instructions ("mentally solve before outputting"), option-length uniformity rules, and source attribution requirements.
Testing
- New tests: `test_mcq_generation_pipeline.py` (parallel generation, thread safety, queue delivery, subtopic extraction), `test_mcq_generation_tool.py` (tool creation and invocation), `test_mcq_prompt_rendering.py` (template rendering with various contexts).
Other changes included in the branch (merged from main):
- Updates to GitHub Actions: `tj-actions/changed-files`, `actions/cache`, `actions/upload-artifact`
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores