fix: memory enhancement in import courses #380
🔍 Existing Issues For Review
Your pull request is modifying functions with the following pre-existing issues:
📄 File: ferry/transform/import_courses.py
Pull Request Overview
This PR reduces memory usage during the data transformation pipeline. The changes focus on shrinking intermediate data structures, adding explicit garbage collection, and adopting more efficient data processing patterns.
Key changes:
- Implemented explicit garbage collection at strategic points in the transformation pipeline
- Optimized professor and course rating computations to avoid large intermediate DataFrames
- Added immediate cleanup of temporary data structures with explicit `del` statements
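
A minimal sketch of the `del` plus `gc.collect()` pattern described above, in plain Python. The loader, the row shape, and the step logic are hypothetical stand-ins rather than ferry's actual functions, and the real pipeline works with pandas DataFrames instead of lists of dicts.

```python
import gc
from typing import Any

Row = dict[str, Any]


def load_raw_courses() -> list[Row]:
    # Hypothetical loader standing in for the real source data.
    rows: list[Row] = [{"course_id": i, "title": f"Course {i}"} for i in range(5)]
    rows.append({"course_id": None, "title": "orphan row"})
    return rows


def run_pipeline() -> dict[int, Row]:
    raw = load_raw_courses()

    # Step 1 (hypothetical): drop rows without a course_id.
    cleaned = [row for row in raw if row["course_id"] is not None]
    # `raw` is not needed after this step. Deleting the last reference lets
    # CPython free the list right away; gc.collect() additionally reclaims
    # any reference cycles that reference counting alone cannot.
    del raw
    gc.collect()

    # Step 2 (hypothetical): index the cleaned rows by course_id.
    indexed = {row["course_id"]: row for row in cleaned}
    del cleaned
    gc.collect()

    return indexed


courses = run_pipeline()
print(len(courses))  # 5
```

In CPython, `del` on the last reference already frees an object through reference counting, so the explicit `gc.collect()` calls mainly help when intermediates form reference cycles; whether they pay off depends on the data structures involved.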
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `ferry/transform/__init__.py` | Added garbage collection calls after major transformation steps |
| ferry/transform/transform_compute.py | Refactored rating computations to use dictionaries and direct aggregation instead of lambda functions and intermediate lists |
| ferry/transform/import_courses.py | Added cleanup of source data immediately after concatenation |
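
The `transform_compute.py` change replaces per-row lambdas and intermediate lists with dictionaries and direct aggregation. A rough sketch of that idea in plain Python, assuming each row is a `(course_id, same_course_id, rating)` tuple; `aggregate_by_group` is a hypothetical name, and the real code operates on pandas DataFrames.

```python
from __future__ import annotations


def aggregate_by_group(
    rows: list[tuple[int, int, float | None]],
) -> dict[int, tuple[float | None, int]]:
    """Return {same_course_id: (mean rating or None, number of rated courses)}.

    Hypothetical helper. Ratings are folded into running sums as rows are
    read, so no per-group list of ratings is ever materialized.
    """
    sums: dict[int, float] = {}
    counts: dict[int, int] = {}
    for _course_id, group_id, rating in rows:
        # Register every group, even one whose courses have no ratings.
        sums.setdefault(group_id, 0.0)
        counts.setdefault(group_id, 0)
        if rating is not None:
            sums[group_id] += rating
            counts[group_id] += 1
    return {
        group_id: (sums[group_id] / n if n else None, n)
        for group_id, n in counts.items()
    }


rows = [(1, 10, 4.0), (2, 10, 3.0), (3, 20, None), (4, 30, 5.0)]
group_stats = aggregate_by_group(rows)
print(group_stats[10])  # (3.5, 2)

# Each course looks up its group's precomputed result instead of
# rebuilding a list of sibling ratings.
per_course = {course_id: group_stats[group_id] for course_id, group_id, _ in rows}
```

Computing each group's aggregate once and then looking it up per course is the same trade the PR's "pre-compute aggregated ratings for each same_course group" comment describes: one pass over the data and one small dictionary, instead of a fresh list per row.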
ferry/transform/transform_compute.py (outdated)
```python
# Pre-compute aggregated ratings for each same_course group to avoid repeated list creation
logging.debug("Pre-computing same-course rating aggregates")


def compute_aggregate_rating(course_ids: list[int], rating_dict: dict) -> tuple[float | None, int]:
```
Copilot (AI) commented on Oct 10, 2025:
The `rating_dict` parameter should have a more specific type annotation. Consider using `dict[int, float]` or `dict[int, float | None]` to better document the expected key-value types.
Suggested change:

```diff
-def compute_aggregate_rating(course_ids: list[int], rating_dict: dict) -> tuple[float | None, int]:
+def compute_aggregate_rating(course_ids: list[int], rating_dict: dict[int, float | None]) -> tuple[float | None, int]:
```
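
The diff excerpt shows only the signature of `compute_aggregate_rating`. Assuming it averages the known ratings for the given course IDs and returns that average together with the number of ratings found (a guess from the return type, not the PR's actual body), a sketch using the suggested `dict[int, float | None]` annotation could look like this:

```python
from __future__ import annotations


def compute_aggregate_rating(
    course_ids: list[int], rating_dict: dict[int, float | None]
) -> tuple[float | None, int]:
    """Hypothetical body: mean of the known ratings for course_ids, plus their count.

    Course IDs missing from rating_dict, or mapped to None, are skipped.
    Returns (None, 0) when no rating is found.
    """
    total = 0.0
    count = 0
    for course_id in course_ids:
        rating = rating_dict.get(course_id)
        if rating is not None:
            total += rating
            count += 1
    return (total / count if count else None, count)


ratings: dict[int, float | None] = {101: 4.5, 102: None, 103: 3.5}
print(compute_aggregate_rating([101, 102, 103, 104], ratings))  # (4.0, 2)
```

The `float | None` value type is what makes the annotation useful here: it tells readers and type checkers that a course can appear in the dictionary without a rating, which the `is not None` check then handles.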
lfg
More efficient data processing and loading, specifically in the course-import step of the sync.