-
Notifications
You must be signed in to change notification settings - Fork 3
Wrap file contents in XML tags #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
wkukka1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job, @Rolland-He! I’ve added a couple of minor suggestions around code style.
| lines = text_content.split('\n') | ||
| for i, line in enumerate(lines, start=1): | ||
| stripped_line = line.rstrip('\n').rstrip() | ||
| stripped_line = line.rstrip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated logic for wrapping lines with XML tags, both for PDFs and regular text files. Consider refactoring the repeated line formatting into a helper function—e.g., _wrap_lines_with_xml(lines, tag_name, filename)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was actually part of the original code, I’ve moved it into a helper function already. I’ll work on simplifying the logic further. Thanks for pointing that out!
| Args: | ||
| assignment_files (list[str]): List of file paths to process | ||
| submission (Path): Student's submission file path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be Optional[Path] for all these args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
wkukka1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good! @Rolland-He
| Returns: | ||
| str: Formatted content with XML tags and line numbers | ||
| """ | ||
| content = f"<{tag_name} file=\"{filename}\">\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the attribute name filename instead of file, here and throughout all XML tags that have to do with files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| content = f"<{tag_name} file=\"{filename}\">\n" | ||
|
|
||
| for i, line in enumerate(lines, start=1): | ||
| if is_pdf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay looking at this more carefully, I'm realizing that actually for text extracted from PDFs I don't think we need to do the line numbering (which was really to support precise annotations for code). I'm not sure we need to call this function at all when the given text is from a PDF, and instead we can just pass the raw text directly as the file contents to the LLM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
helpers/template_utils.py:=== {filename} ===-><submission file="correct_submission.py">Example:
test_outputis also included here.integration_test.py: Update tests.I have also cleaned the commit history.