Collaborator

@wkukka1 wkukka1 commented Aug 25, 2025

Updates

  • Improved text extraction: now detects Markdown headings that match the --question argument.
  • Added PDF parsing with fitz (PyMuPDF) to extract the table of contents and locate headings/questions.
  • Added image extraction support for QMD files, currently limited to Python code blocks.
  • Replaced question_num with question.
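
For illustration, the heading detection described in the first bullet could look roughly like the sketch below. The PR's actual matching rule is not shown in this thread, so the function name `find_heading_line`, the regular expression, and the case-insensitive exact comparison are all assumptions.

```python
import re
from typing import Optional

# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the text.
# Optional trailing '#' characters are dropped from the captured text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def find_heading_line(lines: list[str], question: str) -> Optional[int]:
    """Return the index of the first heading whose text equals `question`.

    Hypothetical helper: the comparison ignores case and surrounding
    whitespace, which may differ from what the PR actually does.
    """
    target = question.strip().lower()
    for i, raw in enumerate(lines):
        match = HEADING_RE.match(raw)
        if match and match.group(2).strip().lower() == target:
            return i
    return None


doc = ["# Assignment", "Intro text", "## Question 1", "body", "## Question 2"]
found = find_heading_line(doc, "question 2")
missing = find_heading_line(doc, "Question 3")
```

Returning `None` for a missing heading lets the caller decide how to report it, for example with the "not found" message discussed further down.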

@wkukka1 wkukka1 requested a review from Rolland-He August 25, 2025 19:55
Collaborator

@Rolland-He Rolland-He left a comment

Hi @wkukka1 , nice work!


  if not task_found:
-     print(f"Task {question_num} not found in any assignment file.")
+     print(f"Task {question} not found in any assignment file.")
Contributor

Since this is no longer a number, put quotes around {question}.
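
Two ways of quoting the value are sketched below. The sample value of `question` is hypothetical, and which form the PR adopted is not shown in this thread.

```python
question = "Question 1"  # hypothetical value passed via --question

# Option 1: explicit double quotes around the placeholder.
message = f'Task "{question}" not found in any assignment file.'
print(message)

# Option 2: the !r conversion inserts repr(question), which quotes strings.
message_repr = f"Task {question!r} not found in any assignment file."
print(message_repr)
```

The first form always prints double quotes. The second follows `repr`, so a plain string like this one is shown in single quotes.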

from typing import Any, Dict, List, Optional


def extract_images(input_notebook_path: os.PathLike, output_directory: os.PathLike, output_name: str):
Contributor

add a return type annotation here (just noticed it was missing, but now it's changed)


qp = Path(qmd_path)
if not qp.exists():
    raise FileNotFoundError(f"QMD/RMD file not found: {qmd_path}")
Contributor

remove the "/RMD" part

start_line = 0
fence_kind = None # "```" or "~~~"

while i < len(lines):
Contributor

Since i should be incremented on every iteration, it's better to use for i, raw in enumerate(lines).
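
A minimal sketch of the suggested loop shape follows. The surrounding function in the PR is not shown in full, so `find_fenced_blocks` and its return value are assumptions; only `start_line`, `fence_kind`, `lines`, `i`, and `raw` come from the excerpt.

```python
def find_fenced_blocks(lines: list[str]) -> list[tuple[int, int, str]]:
    """Return (start_line, end_line, fence_kind) for each closed fenced block.

    Hypothetical helper illustrating the enumerate-based loop; the index
    is supplied by enumerate, so there is no manual counter to update.
    """
    blocks = []
    start_line = 0
    fence_kind = None  # "```" or "~~~" while inside a block, otherwise None

    for i, raw in enumerate(lines):
        stripped = raw.strip()
        if fence_kind is None:
            # Not inside a block: look for an opening fence.
            if stripped.startswith("~~~") or stripped.startswith("```"):
                fence_kind = "~~~" if stripped.startswith("~~~") else "```"
                start_line = i
        elif stripped.startswith(fence_kind):
            # Inside a block: the same kind of fence closes it.
            blocks.append((start_line, i, fence_kind))
            fence_kind = None
    return blocks
```

Because `enumerate` advances the index itself, no branch can forget to increment `i`, which is the failure mode a manual `while` loop invites.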

pyproject.toml Outdated
"pillow",
"PyPDF2",
"requests",
"matplotlib"
Contributor

keep the dependencies in alphabetical order
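
As a quick check of the intended ordering, the four names visible in the excerpt can be sorted case-insensitively. The full dependency list is not shown here, and comparing case-insensitively is an assumption about what "alphabetical" means for this project.

```python
# Only the entries visible in the diff excerpt; the real list may be longer.
dependencies = ["pillow", "PyPDF2", "requests", "matplotlib"]

# Case-insensitive sort so "PyPDF2" is ordered alongside lowercase names.
ordered = sorted(dependencies, key=str.lower)
print(ordered)  # ['matplotlib', 'pillow', 'PyPDF2', 'requests']

# True only if the list was already in that order.
already_sorted = dependencies == ordered
```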

- from typing import List, Optional
+ from typing import Any, Dict, List, Optional, Tuple

import fitz
Contributor

Seems like fitz is an old name for this module (https://pymupdf.readthedocs.io/en/latest/tutorial.html). The package should also be added to the project dependencies.
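
One way to act on this is to prefer the newer import name and fall back to the legacy one. This is a sketch only: `load_pymupdf` is a hypothetical helper, and it assumes that recent PyMuPDF releases expose the module as `pymupdf` while older ones only provide `fitz`.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_pymupdf() -> Optional[ModuleType]:
    """Return the PyMuPDF module, trying the newer import name first.

    Returns None when neither name can be imported, so callers can
    report a missing dependency themselves.
    """
    for name in ("pymupdf", "fitz"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


pdf_module = load_pymupdf()
if pdf_module is None:
    print("PyMuPDF is not installed; add it to the project dependencies.")
```

If the project can require a recent PyMuPDF version, a plain `import pymupdf` is simpler and the fallback is unnecessary.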

@wkukka1 wkukka1 requested a review from david-yz-liu August 26, 2025 20:32
cur = []
start_line = i
fence_kind = "~~~" if line.strip().startswith("~~~") else "```"
i += 1
Contributor

As a result of switching to a for loop, the i += 1 statements are no longer necessary

Contributor

@david-yz-liu david-yz-liu left a comment

Thank you, @wkukka1!

@david-yz-liu david-yz-liu merged commit 1689e25 into MarkUsProject:main Aug 28, 2025
2 checks passed

3 participants