Fix access violation for MarianTokenizer #1016

Merged: apsonawane merged 11 commits into main from asonawane/marian on Dec 10, 2025

@apsonawane (Contributor) commented Dec 8, 2025

This pull request updates the CI/CD pipeline configuration and improves the robustness of the tokenizer codebase. The main theme is upgrading macOS build agents from version 13 to 15 across all pipeline YAML files, ensuring compatibility with newer macOS features and Xcode versions. Additionally, there are targeted improvements to tokenization logic, including better state management and input validation, and a minor test expectation update for JPEG decoding.

Pipeline and Build Environment Updates

  • Updated all pipeline YAML files (.pipelines/*.yml, tools/ci_build/github/azure-pipeline/*.yml) to use macOS-15 build agents instead of macOS-13, ensuring builds run on the latest macOS infrastructure.
  • Updated the default Xcode version in the use-xcode-version.yml template from 14.3 to 16.1 to support the latest toolchain and macOS features.

Tokenizer Robustness Improvements

  • Added a Reset method to CaseEncoder and ensured it is called before each normalization, preventing state leakage between tokenization calls.
  • Added bounds checking and iterator validation to CaseEncoder::Normalize to prevent out-of-bounds errors during normalization.
  • Added UTF-8 input validation in SpmUgmTokenizer::ComputeNoOp, returning an error if invalid input is detected.
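
The two ideas above (resetting encoder state before each call, and rejecting malformed UTF-8 up front) can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `ToyCaseEncoder`, `IsValidUtf8`, and `NormalizeChecked` are hypothetical names invented for illustration, and the toy normalization rule (lowercase ASCII and mark the start of each uppercase run with `^`) is an assumption chosen only to make the state leak visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool IsValidUtf8(std::string_view s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static constexpr std::uint32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < s.size()) {
    const auto b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (b0 < 0x80) { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return false;                      // stray continuation or invalid lead byte
    if (len > s.size() - i) return false;   // bounds check: truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const auto b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (len > 1 && cp < kMin[len]) return false;     // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += len;
  }
  return true;
}

// Hypothetical stateful encoder: lowercases ASCII letters and emits '^'
// at the start of each run of uppercase letters.
class ToyCaseEncoder {
 public:
  // Clears state carried over from a previous Normalize call.
  void Reset() { in_upper_run_ = false; }

  std::string Normalize(std::string_view in) {
    std::string out;
    for (char c : in) {
      if (c >= 'A' && c <= 'Z') {
        if (!in_upper_run_) out += '^';
        in_upper_run_ = true;
        out += static_cast<char>(c - 'A' + 'a');
      } else {
        in_upper_run_ = false;
        out += c;
      }
    }
    return out;  // note: in_upper_run_ may still be true here
  }

 private:
  bool in_upper_run_ = false;
};

// Hypothetical wrapper: validate the input, reset the encoder, then normalize.
// Returns false and leaves `out` untouched if the input is not valid UTF-8.
bool NormalizeChecked(ToyCaseEncoder& enc, std::string_view in, std::string& out) {
  if (!IsValidUtf8(in)) return false;
  enc.Reset();
  out = enc.Normalize(in);
  return true;
}
```

In this sketch, calling `Normalize("AB")` followed by `Normalize("Cd")` on the same encoder without a `Reset` drops the `^` marker from the second result, because the uppercase-run flag survives the first call. Routing calls through `NormalizeChecked` avoids that and also refuses malformed bytes before any iteration over the input begins.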

Testing and CMake Fixes

  • Updated JPEG decoder test expectations for Apple platforms to reflect correct output values.
  • Fixed a CMake configuration issue in ext_tests.cmake to avoid a duplicate PlugIns directory by clearing the LIBRARY_OUTPUT_DIRECTORY property.

@apsonawane requested a review from a team as a code owner December 8, 2025 21:23
@apsonawane force-pushed the asonawane/marian branch 5 times, most recently from 14cbc58 to 9e90066 December 8, 2025 23:44
@apsonawane enabled auto-merge (squash) December 8, 2025 23:57
@apsonawane force-pushed the asonawane/marian branch 2 times, most recently from ff5eaf8 to 29570f2 December 9, 2025 18:50
@apsonawane merged commit 2fbe0eb into main Dec 10, 2025
37 checks passed
@apsonawane deleted the asonawane/marian branch December 10, 2025 05:31

2 participants