Merged
2 changes: 1 addition & 1 deletion .pipelines/android_packaging.yml
@@ -3,7 +3,7 @@
jobs:
- job: AndroidPackaging
pool:
- vmImage: "macOS-13"
+ vmImage: "macOS-14"
timeoutInMinutes: 150
variables:
buildConfig: Release
16 changes: 8 additions & 8 deletions .pipelines/ci.yml
@@ -205,7 +205,7 @@ stages:
###########
- job: MacOSX
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'

strategy:
matrix:
@@ -256,7 +256,7 @@ stages:
##############################
- job: MacOS_PPApiBuild
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'

steps:
# compiled as only one operator selected.
@@ -272,7 +272,7 @@ stages:
#############
- job: MacOSPython
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'

strategy:
matrix:
@@ -722,7 +722,7 @@ stages:
#############
- job: AndroidPackage_BuildOnly
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 120
steps:
- task: UsePythonVersion@0
@@ -757,7 +757,7 @@ stages:

- job: AndroidCpp_BuildOnly
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 45
steps:
- task: UsePythonVersion@0
@@ -796,7 +796,7 @@ stages:
#############
- job: IosPackage
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 120
steps:
- template: templates/use-xcode-version.yml
@@ -822,7 +822,7 @@ stages:
--output_dir $(Build.BinariesDirectory)/xcframework_out \
--platform_arch iphonesimulator x86_64 \
--config RelWithDebInfo \
- --ios_deployment_target 13.0 \
+ --ios_deployment_target 15.0 \
-- \
--enable_cxx_tests
displayName: "Build xcframework for iphonesimulator x86_64"
@@ -847,7 +847,7 @@ stages:
- script: |
set -e

- SIMULATOR_DEVICE_INFO=$(python ./tools/ios/get_simulator_device_info.py)
+ SIMULATOR_DEVICE_INFO=$(python ./tools/ios/get_simulator_device_info.py --requested-runtime-version 18.2)

echo "Simulator device info:"
echo "${SIMULATOR_DEVICE_INFO}"
4 changes: 2 additions & 2 deletions .pipelines/ci_optional.yml
@@ -29,7 +29,7 @@ stages:
#############
- job: AndroidPackage
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 120
steps:
- task: UsePythonVersion@0
@@ -78,7 +78,7 @@ stages:

- job: AndroidCpp
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 30
steps:
- task: UsePythonVersion@0
2 changes: 1 addition & 1 deletion .pipelines/ios_packaging.yml
@@ -11,7 +11,7 @@ jobs:
displayName: "iOS Packaging"

pool:
- vmImage: "macOS-13"
+ vmImage: "macOS-14"

timeoutInMinutes: 180

2 changes: 1 addition & 1 deletion .pipelines/java_packaging.yml
@@ -107,7 +107,7 @@ stages:
workspace:
clean: all
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'

steps:
- task: PowerShell@2
2 changes: 1 addition & 1 deletion .pipelines/templates/android-shared-lib-build.yml
@@ -21,7 +21,7 @@ parameters:
jobs:
- job: Android_C_API_Packaging_${{ parameters.JobSuffix }}
pool:
- vmImage: "macOS-13"
+ vmImage: "macOS-14"
timeoutInMinutes: 120
variables:
buildConfig: Release
2 changes: 1 addition & 1 deletion .pipelines/templates/build-package-for-android-aar.yml
@@ -40,7 +40,7 @@ stages:
- Android_C_API_Packaging_armeabi_v7a
- Android_C_API_Packaging_arm64_v8a
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 120
variables:
buildConfig: Release
12 changes: 6 additions & 6 deletions .pipelines/templates/build-package-for-ios-cocoapods.yml
@@ -16,39 +16,39 @@ jobs:
parameters:
Platform: 'iphoneos'
IosArch: 'arm64'
- IosVersion: '12.0'
+ IosVersion: '15.0'
IsReleaseBuild: ${{parameters.IsReleaseBuild}}
AdditionalBuildFlags: ${{parameters.AdditionalBuildFlags}}

- template: ios-framework-build.yml
parameters:
Platform: 'iphonesimulator'
IosArch: 'x86_64'
- IosVersion: '12.0'
+ IosVersion: '15.0'
IsReleaseBuild: ${{parameters.IsReleaseBuild}}
AdditionalBuildFlags: ${{parameters.AdditionalBuildFlags}}

- template: ios-framework-build.yml
parameters:
Platform: 'iphonesimulator'
IosArch: 'arm64'
- IosVersion: '12.0'
+ IosVersion: '15.0'
IsReleaseBuild: ${{parameters.IsReleaseBuild}}
AdditionalBuildFlags: ${{parameters.AdditionalBuildFlags}}

- template: ios-framework-build.yml
parameters:
Platform: 'maccatalyst'
IosArch: 'arm64'
- IosVersion: '14.0'
+ IosVersion: '15.0'
IsReleaseBuild: ${{parameters.IsReleaseBuild}}
AdditionalBuildFlags: ${{parameters.AdditionalBuildFlags}}

- template: ios-framework-build.yml
parameters:
Platform: 'maccatalyst'
IosArch: 'x86_64'
- IosVersion: '14.0'
+ IosVersion: '15.0'
IsReleaseBuild: ${{parameters.IsReleaseBuild}}
AdditionalBuildFlags: ${{parameters.AdditionalBuildFlags}}

@@ -62,7 +62,7 @@ jobs:
- IOS_C_API_Packaging_maccatalyst_x86_64
- IOS_C_API_Packaging_maccatalyst_arm64
pool:
- vmImage: "macOS-13"
+ vmImage: "macOS-14"

timeoutInMinutes: 120

2 changes: 1 addition & 1 deletion .pipelines/templates/build-package-for-macosx.yml
@@ -31,7 +31,7 @@ stages:
- MacOS_C_API_Packaging_CPU_arm64
- MacOS_C_API_Packaging_CPU_universal2
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
steps:
- task: DownloadPipelineArtifact@2
inputs:
2 changes: 1 addition & 1 deletion .pipelines/templates/ios-framework-build.yml
@@ -25,7 +25,7 @@ jobs:
- job: IOS_C_API_Packaging_${{ parameters.Platform }}_${{ parameters.IosArch }}
condition: succeeded()
pool:
- vmImage: "macOS-13"
+ vmImage: "macOS-14"
timeoutInMinutes: 120
variables:
buildConfig: Release
2 changes: 1 addition & 1 deletion .pipelines/templates/mac-shared-lib-build.yml
@@ -19,7 +19,7 @@ jobs:
MACOSX_DEPLOYMENT_TARGET: '11.0'
TODAY: $[format('{0:dd}{0:MM}{0:yyyy}', pipeline.startTime)]
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
timeoutInMinutes: 300

steps:
2 changes: 1 addition & 1 deletion .pipelines/templates/use-xcode-version.yml
@@ -3,7 +3,7 @@
parameters:
- name: xcodeVersion
type: string
- default: "14.3"
+ default: "16.2"

steps:
- bash: |
2 changes: 1 addition & 1 deletion .pipelines/wheels_macos.yml
@@ -2,7 +2,7 @@ jobs:
- job: macos
timeoutInMinutes: 180
pool:
- vmImage: 'macOS-13'
+ vmImage: 'macOS-14'
variables:
CIBW_BUILD: "cp3{8,9,10,11,12}-*"
CIBW_ARCHS_MACOS: "x86_64 universal2 arm64"
4 changes: 3 additions & 1 deletion cmake/ext_tests.cmake
@@ -86,7 +86,9 @@ function(add_test_target)

xctest_add_bundle(${ARG_TARGET} ${dummy_testee_target})

xctest_add_test("${ARG_TARGET}_xctest" ${ARG_TARGET})
# Clear LIBRARY_OUTPUT_DIRECTORY to avoid duplicate PlugIns directory issue
# See: https://gitlab.kitware.com/cmake/cmake/-/issues/26301
set_property(TARGET ${ARG_TARGET} PROPERTY LIBRARY_OUTPUT_DIRECTORY)

target_sources(${ARG_TARGET} PRIVATE
${ARG_TEST_SOURCES}
37 changes: 33 additions & 4 deletions operators/tokenizer/case_encoder.h
@@ -33,6 +33,18 @@ class CaseEncoder {
virtual ~CaseEncoder() {}
void SetNormalizer(Normalizer normalizer) { normalizer_ = normalizer; }

// Reset all state for a new tokenization call - call this before starting new input
void Reset() {
buffer_.clear();
buffer_queue_.clear();
signature_.clear();
offset_ = 0;
dump_buffer_from_ = -1;
state_ = 0;
spans_ = 0;
seen_three_spans_ = false;
}

public:
CaseEncoder(bool remove_extra_white_space) : remove_extra_white_space_(remove_extra_white_space) {}

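The new `Reset()` returns every member that `CaseEncoder` accumulates while it runs (`buffer_`, `buffer_queue_`, `signature_`, `offset_`, `dump_buffer_from_`, `state_`, `spans_`, `seen_three_spans_`) to its initial value, and `NmtNormalize` now calls it before each input. A minimal sketch of why that matters, using hypothetical `StatefulEncoder` and `Tokenizer` classes in place of `CaseEncoder` and `SpmUgmTokenizer`, and assuming the encoder is held through a `std::unique_ptr` (the diff does not show the member's type). Because `const` on a member function does not propagate through a pointer member, a `const` method such as `NmtNormalize` can still call the non-`const` `Reset()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for CaseEncoder: it accumulates state while it runs,
// so leftover state from one input must be cleared before the next.
class StatefulEncoder {
 public:
  // Return every member to its initial value, like CaseEncoder::Reset().
  void Reset() {
    buffer_.clear();
    offset_ = 0;
  }

  // Append a chunk and return the total number of bytes seen so far.
  std::size_t Feed(const std::string& chunk) {
    buffer_ += chunk;
    offset_ += chunk.size();
    return offset_;
  }

 private:
  std::string buffer_;
  std::size_t offset_ = 0;
};

// Hypothetical stand-in for SpmUgmTokenizer.
class Tokenizer {
 public:
  Tokenizer() : encoder_(std::make_unique<StatefulEncoder>()) {}

  // A const member function may call non-const methods through a unique_ptr
  // member: const applies to the pointer itself, not to the object it owns.
  std::size_t Normalize(const std::string& input) const {
    encoder_->Reset();  // drop state left over from the previous call
    return encoder_->Feed(input);
  }

 private:
  std::unique_ptr<StatefulEncoder> encoder_;
};
```

With the `Reset()` call, normalizing `"Hello"` and then `"Hi"` on the same object yields 5 and then 2; without it the second call would report 7. If the `std::unique_ptr` assumption holds, the same shallow-`const` pattern also means that two threads calling one tokenizer instance concurrently would share and mutate the encoder state.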
@@ -177,42 +189,59 @@ class CaseEncoder {
norm_to_orig_temp.reserve(norm_to_orig->size());

const char* sig_it = signature_.data();
const char* sig_end = signature_.data() + signature_.length();

auto nrm_it = normalized->cbegin();
auto nrm_end = normalized->cend();
auto n2o_it = norm_to_orig->cbegin();
auto n2o_end = norm_to_orig->cend();

for (const auto& span : Search(signature_)) {
size_t len = std::distance(sig_it, span.first);

// Bounds check before advancing iterators
if (std::distance(nrm_it, nrm_end) < static_cast<ptrdiff_t>(len) ||
std::distance(n2o_it, n2o_end) < static_cast<ptrdiff_t>(len)) {
break;
}

normalized_temp.insert(normalized_temp.end(), nrm_it, nrm_it + len);
norm_to_orig_temp.insert(norm_to_orig_temp.end(), n2o_it, n2o_it + len);

sig_it += len;
nrm_it += len;
n2o_it += len;

// Bounds check before dereferencing
if (n2o_it == n2o_end) break;

normalized_temp.push_back(cAllUppercase);
norm_to_orig_temp.push_back(*n2o_it);

while (sig_it != span.second) {
if (sig_it >= sig_end || nrm_it >= nrm_end || n2o_it >= n2o_end) break;

if (*sig_it == cUppercase) {
sig_it++;
nrm_it++;
n2o_it++;
}
if (sig_it >= sig_end || nrm_it >= nrm_end || n2o_it >= n2o_end) break;

sig_it++;
normalized_temp.push_back(*nrm_it++);
norm_to_orig_temp.push_back(*n2o_it++);
}
- if (sig_it != signature_.data() + signature_.length()) {
- if (*sig_it != cUppercase) {
+ if (sig_it != sig_end) {
+ if (*sig_it != cUppercase && n2o_it != n2o_end) {
normalized_temp.push_back(cLowercase);
norm_to_orig_temp.push_back(*n2o_it);
}
}
}

- if (nrm_it != normalized->cend()) normalized_temp.insert(normalized_temp.end(), nrm_it, normalized->cend());
- if (n2o_it != norm_to_orig->cend()) norm_to_orig_temp.insert(norm_to_orig_temp.end(), n2o_it, norm_to_orig->cend());
+ if (nrm_it != nrm_end) normalized_temp.insert(normalized_temp.end(), nrm_it, nrm_end);
+ if (n2o_it != n2o_end) norm_to_orig_temp.insert(norm_to_orig_temp.end(), n2o_it, n2o_end);

normalized->swap(normalized_temp);
norm_to_orig->swap(norm_to_orig_temp);
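The second hunk guards every iterator step: it measures the room left with `std::distance` before forming `nrm_it + len`, and it checks `n2o_it != n2o_end` before dereferencing. That matters because advancing a string or vector iterator past `end()` is undefined behavior even if the result is never dereferenced. A minimal sketch of the same guard, assuming two parallel sequences (a string and a per-byte offset table) and a plain list of segment lengths standing in for the spans returned by `Search(signature_)`; the name `CopySegmentsChecked` and the `marker` parameter are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Copies successive segments of `lengths` from two parallel sequences (text
// and its offset table), emitting `marker` after each full segment. Stops
// early instead of advancing an iterator past end(), then appends whatever
// was not consumed.
std::pair<std::string, std::vector<std::size_t>> CopySegmentsChecked(
    const std::string& text, const std::vector<std::size_t>& offsets,
    const std::vector<std::size_t>& lengths, char marker) {
  std::string out_text;
  std::vector<std::size_t> out_offsets;

  auto txt_it = text.cbegin();
  const auto txt_end = text.cend();
  auto off_it = offsets.cbegin();
  const auto off_end = offsets.cend();

  for (std::size_t len : lengths) {
    const auto n = static_cast<std::ptrdiff_t>(len);
    // Check remaining room first: forming `it + n` beyond end() is UB.
    if (std::distance(txt_it, txt_end) < n ||
        std::distance(off_it, off_end) < n) {
      break;
    }

    out_text.append(txt_it, txt_it + n);
    out_offsets.insert(out_offsets.end(), off_it, off_it + n);
    txt_it += n;
    off_it += n;

    // Dereference only when an element is actually left.
    if (off_it == off_end) break;
    out_text.push_back(marker);
    out_offsets.push_back(*off_it);
  }

  // Keep the unconsumed tail rather than dropping it.
  out_text.append(txt_it, txt_end);
  out_offsets.insert(out_offsets.end(), off_it, off_end);
  return {out_text, out_offsets};
}
```

For example, copying `"abcdef"` with offsets `0` to `5`, segment lengths `{2, 10}` and marker `'U'` stops at the oversized second segment and returns `"abUcdef"` with offsets `{0, 1, 2, 2, 3, 4, 5}`; as in the diff, the tail after the last safe segment is appended unchanged.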
11 changes: 11 additions & 0 deletions operators/tokenizer/ugm_kernels.hpp
@@ -288,6 +288,14 @@ struct SpmUgmTokenizer {

OrtxStatus ComputeNoOp(const std::string& input, std::vector<extTokenId_t>& output,
bool add_special_tokens = true) const {
// Validate UTF-8 encoding before processing
ptrdiff_t validation_result = ustring::ValidateUTF8(input);
if (validation_result < 0) {
return OrtxStatus(extError_t::kOrtxErrorInvalidArgument,
"Invalid UTF-8 encoding detected in input string at position " +
std::to_string(-validation_result));
}

std::string normalized;
if (case_encoder_) {
normalized = NmtNormalize(input);
@@ -535,6 +543,9 @@ struct SpmUgmTokenizer {
}

std::string NmtNormalize(const std::string& input) const {
// Reset the case encoder state before starting new normalization
case_encoder_->Reset();

std::string normalized;
normalized.reserve(input.size() * 3);
std::vector<size_t> norm_to_orig(input.size() * 3);
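`ComputeNoOp` now rejects ill-formed UTF-8 before normalizing and reports the failing position as `-validation_result`. The exact contract of `ustring::ValidateUTF8` is not visible in this diff, so the following is a hypothetical validator (`ValidateUtf8Sketch` and `DescribeUtf8Error` are illustrative names) that checks the well-formed byte ranges from the Unicode Standard and returns `-(i + 1)` for an error starting at byte `i`. The `+ 1` keeps an error at byte 0 distinguishable from success: under a plain `-i` convention that error would come back as `0` and pass a `< 0` check, so it may be worth confirming which convention `ValidateUTF8` uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical validator (not ustring::ValidateUTF8). Returns the input length
// if `s` is well-formed UTF-8; otherwise returns -(i + 1), where i is the byte
// index at which the first ill-formed sequence starts.
std::ptrdiff_t ValidateUtf8Sketch(const std::string& s) {
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = p[i];
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (lead < 0x80) {
      len = 1;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      len = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      len = 3;
      if (lead == 0xE0) lo = 0xA0;  // reject overlong 3-byte forms
      if (lead == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      len = 4;
      if (lead == 0xF0) lo = 0x90;  // reject overlong 4-byte forms
      if (lead == 0xF4) hi = 0x8F;  // reject code points above U+10FFFF
    } else {
      return -static_cast<std::ptrdiff_t>(i + 1);  // invalid lead byte
    }
    if (len > n - i) return -static_cast<std::ptrdiff_t>(i + 1);  // truncated
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = p[i + k];
      const unsigned char min_b = (k == 1) ? lo : 0x80;
      const unsigned char max_b = (k == 1) ? hi : 0xBF;
      if (b < min_b || b > max_b) return -static_cast<std::ptrdiff_t>(i + 1);
    }
    i += len;
  }
  return static_cast<std::ptrdiff_t>(n);
}

// Builds an error message in the style of the diff, with a 0-based position.
std::string DescribeUtf8Error(const std::string& s) {
  const std::ptrdiff_t r = ValidateUtf8Sketch(s);
  if (r >= 0) return "";
  return "Invalid UTF-8 encoding detected in input string at position " +
         std::to_string(-r - 1);
}
```

For instance, `"ab\xE2\x28\xA1"` is rejected at byte 2 (the lead byte `0xE2` is followed by `0x28`, which is not a continuation byte), while `"\xE2\x82\xAC"` (the euro sign) validates to its length, 3.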
2 changes: 1 addition & 1 deletion test/ios/OrtExtensionsUsage/Podfile
@@ -10,7 +10,7 @@ target 'OrtExtensionsUsage' do
# Comment the next line if you don't want to use dynamic frameworks
use_frameworks!

- platform :ios, '13.0'
+ platform :ios, '15.0'

# Pods for OrtExtensionsUsage

4 changes: 2 additions & 2 deletions test/pp_api_test/test_imgcodec.cc
@@ -108,11 +108,11 @@ TEST(ImageDecoderTest, TestJpegEncoderDecoder) {
#elif __APPLE__
out_range = out_tensor.Data() + 1296 * 3;
ASSERT_EQ(std::vector<uint8_t>(out_range, out_range + 12),
- std::vector<uint8_t>({225, 236, 222, 228, 235, 219, 218, 220, 199, 203, 201, 178}));
+ std::vector<uint8_t>({225, 236, 222, 225, 236, 222, 221, 219, 196, 203, 201, 178}));

out_range = out_tensor.Data() + 438 * width * 3;
ASSERT_EQ(std::vector<uint8_t>(out_range, out_range + 12),
- std::vector<uint8_t>({84, 68, 53, 86, 70, 55, 92, 76, 59, 101, 86, 65}));
+ std::vector<uint8_t>({84, 68, 53, 86, 70, 55, 92, 77, 58, 101, 86, 67}));

out_range = out_tensor.Data() + 875 * width * 3 + 1296 * 3;
ASSERT_EQ(std::vector<uint8_t>(out_range, out_range + 12),
@@ -14,7 +14,7 @@ extends:
parameters:
pool:
name: "Azure Pipelines"
- image: "macOS-13"
+ image: "macOS-14"
os: macOS
sdl:
sourceAnalysisPool:
2 changes: 1 addition & 1 deletion tools/ci_build/github/azure-pipeline/ios_packaging.yml
@@ -20,7 +20,7 @@ extends:
parameters:
pool:
name: "Azure Pipelines"
- image: "macOS-13"
+ image: "macOS-14"
os: macOS
sdl:
sourceAnalysisPool: