Fix access violation for MarianTokenizer #1016
Merged
apsonawane merged 11 commits into main on Dec 10, 2025
hanbitmyths approved these changes on Dec 10, 2025
This pull request updates the CI/CD pipeline configuration and improves the robustness of the tokenizer codebase. The main theme is upgrading macOS build agents from version 13 to 15 across all pipeline YAML files, ensuring compatibility with newer macOS features and Xcode versions. Additionally, there are targeted improvements to tokenization logic, including better state management and input validation, and a minor test expectation update for JPEG decoding.
Pipeline and Build Environment Updates
- Updated the pipeline YAML files (`.pipelines/*.yml`, `tools/ci_build/github/azure-pipeline/*.yml`) to use `macOS-15` build agents instead of `macOS-13`, so builds run on the latest macOS infrastructure.
- Updated the Xcode version in the `use-xcode-version.yml` template from `14.3` to `16.1` to support the latest toolchain and macOS features.

Tokenizer Robustness Improvements
- Added a `Reset` method to `CaseEncoder` and ensured it is called before each normalization, preventing state leakage between tokenization calls.
- Added checks in `CaseEncoder::Normalize` to prevent out-of-bounds errors during normalization.
- Added input validation to `SpmUgmTokenizer::ComputeNoOp`, returning an error if invalid input is detected.

Testing and CMake Fixes
- Updated `ext_tests.cmake` to avoid a duplicate PlugIns directory by clearing the `LIBRARY_OUTPUT_DIRECTORY` property.
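
For reviewers who want a concrete picture of the tokenizer changes, here is a minimal sketch of the reset-before-normalize pattern and a bounds-checked lookahead. It is not the code from this PR: `ToyCaseEncoder`, `NormalizeFresh`, the `^` and `~` markers, and the `in_upper_run_` flag are all hypothetical names invented for illustration, and the real `CaseEncoder` may track very different state.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a stateful normalizer. NOT the real CaseEncoder.
class ToyCaseEncoder {
 public:
  // Clears per-call state so one input cannot influence the next.
  void Reset() { in_upper_run_ = false; }

  // Lowercases ASCII letters and prefixes each uppercase run with a marker:
  // '^' for a single capital, '~' for a run of two or more capitals.
  std::string Normalize(std::string_view input) {
    std::string out;
    out.reserve(input.size() + 4);
    for (std::size_t i = 0; i < input.size(); ++i) {
      const unsigned char c = static_cast<unsigned char>(input[i]);
      if (std::isupper(c) != 0) {
        if (!in_upper_run_) {
          // Bounds check before the lookahead: never read input[i + 1]
          // when i is the last valid index.
          const bool next_upper =
              (i + 1 < input.size()) &&
              std::isupper(static_cast<unsigned char>(input[i + 1])) != 0;
          out.push_back(next_upper ? '~' : '^');
          in_upper_run_ = true;
        }
        out.push_back(static_cast<char>(std::tolower(c)));
      } else {
        in_upper_run_ = false;
        out.push_back(static_cast<char>(c));
      }
    }
    // Every input character produces at least one output character.
    assert(out.size() >= input.size());
    return out;
  }

 private:
  bool in_upper_run_ = false;  // Carried across characters within one call.
};

// Hypothetical call-site helper: always reset before normalizing.
inline std::string NormalizeFresh(ToyCaseEncoder& encoder,
                                  std::string_view input) {
  encoder.Reset();
  return encoder.Normalize(input);
}
```

In this toy version, calling `Normalize("AB")` and then `Normalize("Cd")` on the same object without `Reset` drops the marker on the second input, because the uppercase-run flag is still set from the first call. Routing every call through `NormalizeFresh` removes that dependence on call history.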