
Fix correctness issues in Arabic normalization and prompt loading #3589

Open
RinZ27 wants to merge 1 commit into EleutherAI:main from RinZ27:fix-correctness-and-leaks

RinZ27 commented Feb 15, 2026

Several correctness issues were identified during a deep dive into the codebase, specifically affecting Arabic normalization, prompt loading, and logging hygiene.

Key changes:

  • Corrected the Arabic definite article removal regex in mlqa/utils.py. The previous regex had a misplaced caret and was overly aggressive, which could lead to corrupted word forms.
  • Added an else block in lm_eval/prompts/__init__.py to provide a clearer error message when an unknown prompt category is used, preventing a potential UnboundLocalError.
  • Removed a debug print(prompt) statement in med_prescriptions/utils.py to keep evaluation logs clean and protect potential PII in medical datasets.
  • Cleaned up redundant variable assignments (e.g., x = x) in the ruler task module to improve code clarity.
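To illustrate the first item, here is a minimal sketch of why an unanchored article pattern corrupts words while an anchored one does not. The pattern and function names are hypothetical and are not the code in mlqa/utils.py; they only show the general technique of matching the article at a word start. Note also that in Python's `re`, a `^` placed after literal characters can never match, so an alternative written that way is dead.

```python
import re

# Arabic definite article "al-": alef (U+0627) followed by lam (U+0644).
ARTICLE = "\u0627\u0644"

# Illustrative only: matches the two letters anywhere, including mid-word.
AGGRESSIVE = re.compile(ARTICLE)

# Illustrative only: matches the article at the start of the string
# or directly after whitespace, and keeps that whitespace.
ANCHORED = re.compile(r"(^|\s)" + ARTICLE)


def strip_article_aggressive(text):
    """Replace every alef-lam pair with a space, even inside a word."""
    return AGGRESSIVE.sub(" ", text)


def strip_article_anchored(text):
    """Remove alef-lam only when it begins a whitespace-delimited word."""
    return ANCHORED.sub(r"\1", text)


if __name__ == "__main__":
    name = "\u062e\u0627\u0644\u062f"  # the name Khalid; alef-lam sits mid-word
    print(repr(strip_article_aggressive(name)))  # word is split in two
    print(repr(strip_article_anchored(name)))    # word is left intact
```

The anchored sketch still removes alef-lam from words whose root happens to begin with those letters, so it is a heuristic, not a morphological analysis.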

Verified the fixes by running the test_utils.py suite; all tests pass. The regex and prompt-loading fixes affect evaluation correctness, while the logging and cleanup changes are hygiene improvements.
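For the prompt-loading item in the list of key changes, the following sketch shows how an `if`/`elif` chain without an `else` leads to an `UnboundLocalError`, and how an explicit `else` turns that into a readable error. The function names and category strings are hypothetical and do not reflect the actual code in lm_eval/prompts/__init__.py.

```python
def load_prompt_without_else(category, name):
    """Hypothetical loader: no branch handles an unknown category."""
    if category == "builtin":
        prompt = f"builtin::{name}"
    elif category == "file":
        prompt = f"file::{name}"
    # For any other category, 'prompt' was never assigned, so the
    # return below raises UnboundLocalError with an unhelpful message.
    return prompt


def load_prompt_with_else(category, name):
    """Hypothetical loader: unknown categories fail with a clear message."""
    if category == "builtin":
        prompt = f"builtin::{name}"
    elif category == "file":
        prompt = f"file::{name}"
    else:
        raise ValueError(
            f"Unknown prompt category {category!r}; expected 'builtin' or 'file'"
        )
    return prompt


if __name__ == "__main__":
    for loader in (load_prompt_without_else, load_prompt_with_else):
        try:
            loader("typo", "qa")
        except Exception as exc:
            print(f"{loader.__name__}: {type(exc).__name__}: {exc}")
```

Which exception type the real change raises is not stated here; `ValueError` is used only as a reasonable choice for the sketch.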

@RinZ27 RinZ27 requested a review from baberabb as a code owner February 15, 2026 07:17

CLAassistant commented Feb 15, 2026

All committers have signed the CLA.
