Skip to content

Future-proofing the global code#49

Merged
sergeyf merged 1 commit intomainfrom
global_refactor
Jul 7, 2025
Merged

Future-proofing the global code#49
sergeyf merged 1 commit intomainfrom
global_refactor

Conversation

@sergeyf
Copy link
Copy Markdown
Collaborator

@sergeyf sergeyf commented Jul 7, 2025

Frees us from Python 3.8...

@sergeyf sergeyf requested a review from Copilot July 7, 2025 20:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances multiprocessing support by ensuring global variables are correctly initialized in worker processes, standardizing pool creation with the spawn context, and refactoring ternary and comprehension formatting. It also adds tests to verify consistency and fallback behavior when using multiple processes.

  • Introduce _init_pool helpers and switch to multiprocessing.get_context("spawn") in both featurizer and data modules
  • Refactor multiline ternary expressions and list comprehensions for readability
  • Add comprehensive tests to confirm single‐ vs. multi‐process output consistency and fallback logic

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
tests/test_featurizer.py New tests for featurization consistency, global dataset init, and fallbacks
tests/test_data.py New tests for preprocessing consistency, global var init, and fallbacks
s2and/featurizer.py Use spawn context, add _init_pool, refactor feature‐tuple formatting
s2and/data.py Use spawn context, add _init_pool, refactor comprehension formatting
Comments suppressed due to low confidence (1)

tests/test_data.py:230

  • [nitpick] The test verifies key fields but omits other derived attributes (e.g., title_ngrams_chars, venue_ngrams); consider extending assertions to cover all preprocessing outputs for full consistency checks.
            assert paper_single.title == paper_multi.title, f"Title mismatch for paper {paper_id}"

Comment thread s2and/featurizer.py
Comment thread s2and/data.py
@allenai allenai deleted a comment from Copilot AI Jul 7, 2025
@sergeyf sergeyf merged commit 52136a5 into main Jul 7, 2025
2 checks passed
@sergeyf sergeyf deleted the global_refactor branch July 7, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants