Update question extraction #32

wkukka1 · 2025-08-25T18:51:13Z

Updates

Improved text extraction: now detects Markdown headings that match the --question argument.
Added PDF parsing with fitz (PyMuPDF) to extract the table of contents and locate headings/questions.
Added image extraction support for QMD files, currently limited to Python code blocks.
Replaced question_num with question, since the value is no longer required to be a number.
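
As an illustration of the heading-based extraction described above, here is a minimal sketch. It is not the PR's actual code: `extract_section` and `HEADING_RE` are hypothetical names, and the sketch assumes the `--question` value is compared case-insensitively against the full heading title.

```python
import re
from typing import Optional

# Hypothetical pattern for ATX-style Markdown headings ("#" through "######").
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def extract_section(markdown: str, question: str) -> Optional[str]:
    """Return the text under the first heading whose title equals `question`."""
    lines = markdown.splitlines()
    target = question.strip().lower()
    start = None  # index of the first line after the matching heading
    level = 0     # depth of the matching heading
    for i, line in enumerate(lines):
        match = HEADING_RE.match(line)
        if not match:
            continue
        depth = len(match.group(1))
        title = match.group(2).strip().lower()
        if start is None:
            if title == target:
                start, level = i + 1, depth
        elif depth <= level:
            # A heading at the same or a higher level ends the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None
    return "\n".join(lines[start:]).strip()
```

Deeper subheadings stay inside the section, so a "### Part a" under "## Question 1" is returned with it. This sketch does not skip fenced code blocks, so a Python comment line starting with "#" inside a fence would be misread as a heading.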

Rolland-He

Hi @wkukka1 , nice work!

ai_feedback/helpers/image_extractor.py

ai_feedback/helpers/template_utils.py

david-yz-liu · 2025-08-26T17:15:29Z

ai_feedback/helpers/template_utils.py


    if not task_found:
-        print(f"Task {question_num} not found in any assignment file.")
+        print(f"Task {question} not found in any assignment file.")


since this is no longer a number, put quotes around {question}
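
A small sketch of what the requested change could look like. The helper name `not_found_message` is hypothetical, and the choice of double quotes is an assumption; the review does not show which quote style was finally used.

```python
def not_found_message(question: str) -> str:
    """Build the 'not found' message with the question title quoted."""
    # Quoting makes the boundaries of a free-text title such as
    # "Question 1" visible in the output.
    return f'Task "{question}" not found in any assignment file.'


print(not_found_message("Question 1"))
```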

david-yz-liu · 2025-08-26T18:57:34Z

ai_feedback/helpers/image_extractor.py

+from typing import Any, Dict, List, Optional


 def extract_images(input_notebook_path: os.PathLike, output_directory: os.PathLike, output_name: str):


add a return type annotation here (just noticed it was missing, but now it's changed)
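
For illustration only, the signature with a return annotation might look like the sketch below. The actual return type is not shown in this thread; `List[str]` (paths of the images written) is an assumption, and the body is a placeholder.

```python
import os
from typing import List


def extract_images(
    input_notebook_path: os.PathLike,
    output_directory: os.PathLike,
    output_name: str,
) -> List[str]:
    """Placeholder body; the real extraction logic is not shown in the review.

    Assumed contract (hypothetical): return the paths of the saved images.
    """
    saved_paths: List[str] = []
    # ... image extraction from input_notebook_path would go here ...
    return saved_paths
```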

david-yz-liu · 2025-08-26T19:00:15Z

ai_feedback/helpers/image_extractor.py

+
+    qp = Path(qmd_path)
+    if not qp.exists():
+        raise FileNotFoundError(f"QMD/RMD file not found: {qmd_path}")


remove the "/RMD" part

david-yz-liu · 2025-08-26T19:29:58Z

ai_feedback/helpers/image_extractor.py

+    start_line = 0
+    fence_kind = None  # "```" or "~~~"
+
+    while i < len(lines):


Since i should be incremented on every iteration, it's better to use for i, raw in enumerate(lines).
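
A minimal sketch of the suggested loop shape, assuming the surrounding code collects fenced code blocks. `find_fenced_blocks` and the returned tuple layout are hypothetical; only `start_line`, `fence_kind`, `cur`, and `raw` come from the diff and the comment.

```python
from typing import List, Optional, Tuple

BACKTICK_FENCE = "`" * 3  # three backticks
TILDE_FENCE = "~" * 3     # three tildes


def find_fenced_blocks(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start_line, end_line, body) for each closed fenced block."""
    blocks: List[Tuple[int, int, str]] = []
    cur: List[str] = []
    start_line = 0
    fence_kind: Optional[str] = None  # set while inside a block

    # enumerate() advances i on every iteration, so no manual increment
    # is needed in any branch.
    for i, raw in enumerate(lines):
        line = raw.strip()
        if fence_kind is None:
            if line.startswith(BACKTICK_FENCE) or line.startswith(TILDE_FENCE):
                cur = []
                start_line = i
                fence_kind = TILDE_FENCE if line.startswith(TILDE_FENCE) else BACKTICK_FENCE
        elif line.startswith(fence_kind):
            blocks.append((start_line, i, "\n".join(cur)))
            fence_kind = None
        else:
            cur.append(raw)
    return blocks
```

Line indices are 0-based, and a block that is never closed is dropped in this sketch.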

david-yz-liu · 2025-08-26T20:02:41Z

pyproject.toml

    "pillow",
    "PyPDF2",
    "requests",
+    "matplotlib"


keep the dependencies in alphabetical order
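
A quick way to check the ordering is sketched below. Case-insensitive comparison is an assumption, inferred from "pillow" preceding "PyPDF2" in the snippet; `is_alphabetical` is a hypothetical helper.

```python
from typing import List


def is_alphabetical(dependencies: List[str]) -> bool:
    """Check that names are sorted, ignoring case."""
    keys = [name.lower() for name in dependencies]
    return keys == sorted(keys)


deps = ["pillow", "PyPDF2", "requests", "matplotlib"]
print(is_alphabetical(deps))          # the order from the diff
print(sorted(deps, key=str.lower))    # the order the reviewer asks for
```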

david-yz-liu · 2025-08-26T20:04:19Z

ai_feedback/helpers/template_utils.py

-from typing import List, Optional
+from typing import Any, Dict, List, Optional, Tuple

+import fitz


Seems like fitz is an old name for this module; the PyMuPDF tutorial (https://pymupdf.readthedocs.io/en/latest/tutorial.html) now uses import pymupdf. The package should also be added to the project dependencies.
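
A rough sketch of reading the table of contents under the newer import name. `load_toc` and `find_toc_pages` are hypothetical helpers, not the PR's code; the sketch assumes a PyMuPDF release recent enough to provide the `pymupdf` module, and relies on `Document.get_toc()` returning entries of the form [level, title, page].

```python
from typing import List, Sequence, Tuple


def find_toc_pages(toc: Sequence[Sequence], question: str) -> List[Tuple[str, int]]:
    """Return (title, page) for TOC entries whose title contains `question`."""
    target = question.strip().lower()
    return [
        (title, page)
        for _level, title, page, *_rest in toc
        if target in title.lower()
    ]


def load_toc(pdf_path: str) -> list:
    """Read the table of contents from a PDF file."""
    # Imported lazily so find_toc_pages can be used without PyMuPDF installed.
    import pymupdf

    with pymupdf.open(pdf_path) as doc:
        return doc.get_toc()
```

Substring matching means "Question 1" also matches "Question 10"; an exact or prefix comparison may be more appropriate depending on how the assignment headings are named.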

david-yz-liu · 2025-08-27T19:58:42Z

ai_feedback/helpers/image_extractor.py

+            cur = []
+            start_line = i
+            fence_kind = "~~~" if line.strip().startswith("~~~") else "```"
+            i += 1


As a result of switching to a for loop, the i += 1 statements are no longer necessary

david-yz-liu

Thank you, @wkukka1!

Will Kukkamalla and others added 14 commits July 29, 2025 15:22

updated args to use the hardcoded paths

e320a00

Revert "updated args to use the hardcoded paths"

d726439

This reverts commit e320a00.

added extractor logic similar to R repo

28db70d

updated packages

bf0fac1

Merge branch 'main' into update-question-extraction

e17914b

[pre-commit.ci] auto fixes from pre-commit.com hooks

56c11d4

for more information, see https://pre-commit.ci

replaced question_num with question

793b18b

remove debug statements

1727a34

changed question_num to question

2147e1e

changed docs

ea04c65

changed docs

d7aec2f

remove old extractor

43b0c55

remove debug statements

89caa53

[pre-commit.ci] auto fixes from pre-commit.com hooks

3369e40

for more information, see https://pre-commit.ci

wkukka1 requested a review from Rolland-He August 25, 2025 19:55

Rolland-He reviewed Aug 26, 2025

Will Kukkamalla and others added 3 commits August 26, 2025 09:58

removed print statements

bdf5fc7

Merge branch 'update-question-extraction' of github.com:wkukka1/ai-au…

9c85383

…tograding-feedback into update-question-extraction

[pre-commit.ci] auto fixes from pre-commit.com hooks

b541f81

for more information, see https://pre-commit.ci

wkukka1 requested review from Rolland-He and david-yz-liu August 26, 2025 14:10

david-yz-liu reviewed Aug 26, 2025

Will Kukkamalla added 4 commits August 26, 2025 16:22

changed the import from fitz to pymupdf and misc

3ec06d6

Merge branch 'update-question-extraction' of github.com:wkukka1/ai-au…

10a3190

…tograding-feedback into update-question-extraction

removed rmd from error message

21b90a4

made dependencies alphabetical

efb1627

wkukka1 requested a review from david-yz-liu August 26, 2025 20:32

david-yz-liu reviewed Aug 27, 2025

removed i increments

45fefec

david-yz-liu approved these changes Aug 28, 2025

david-yz-liu merged commit 1689e25 into MarkUsProject:main Aug 28, 2025
2 checks passed
