5 changes: 5 additions & 0 deletions .github/workflows/linux-cpu-x64-build.yml
@@ -39,6 +39,11 @@ jobs:
with:
gradle-version: '8.6'

- uses: actions/setup-python@v6
with:
python-version: '3.11.x'
architecture: 'x64'

- uses: microsoft/onnxruntime-github-actions/setup-build-tools@v0.0.8
with:
vcpkg-version: '2025.03.19'
2 changes: 1 addition & 1 deletion .github/workflows/linux-gpu-x64-build.yml
@@ -20,7 +20,7 @@ env:
jobs:
linux-cuda-x64-build:
env:
- PYTHON_EXECUTABLE: "/opt/python/cp310-cp310/bin/python3.10"
+ PYTHON_EXECUTABLE: "/opt/python/cp311-cp311/bin/python3.11"
runs-on: ["self-hosted", "1ES.Pool=onnxruntime-genai-Ubuntu2204-A10"]
steps:
- name: Checkout OnnxRuntime GenAI repo
2 changes: 1 addition & 1 deletion .github/workflows/win-cpu-arm64-build.yml
@@ -94,7 +94,7 @@ jobs:
# Uninstalling LLVM/Clang as it is no longer required and causes issues with numpy installation
choco uninstall llvm --yes
python -m pip install "numpy<2" coloredlogs flatbuffers packaging protobuf sympy pytest
- python -m pip install onnxruntime-qnn
+ python -m pip install onnxruntime-qnn==1.25.0.dev20260126001 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
Reviewer comment (Contributor): Do we need to specify a version here, or would omitting the version and using the latest nightly package be sufficient?
python -m pip install (Get-ChildItem ("$env:binaryDir\wheel\*.whl")) --no-deps
- name: Run the Python Tests
2 changes: 1 addition & 1 deletion .github/workflows/win-directml-x64-build.yml
@@ -55,7 +55,7 @@ jobs:
run: |
$resp = Invoke-RestMethod "${{ env.ORT_NIGHTLY_REST_API }}"
# $ORT_NIGHTLY_VERSION = $resp.value[0].versions[0].normalizedVersion
- $ORT_NIGHTLY_VERSION = "1.23.0"
+ $ORT_NIGHTLY_VERSION = "1.25.0-dev-20260125-0556-727db0d3dc"
Write-Host "$ORT_NIGHTLY_VERSION"
"ORT_NIGHTLY_VERSION=$ORT_NIGHTLY_VERSION" | Out-File -FilePath $env:GITHUB_ENV -Append
6 changes: 3 additions & 3 deletions .pipelines/nuget-publishing.yml
@@ -61,7 +61,7 @@ parameters:
- name: ort_version
displayName: 'OnnxRuntime version'
type: string
- default: '1.23.0'
+ default: '1.25.0-dev-20260125-1205-727db0d3dc'
Reviewer comment (Contributor): These are default values for publishing the official packages for ORT GenAI. I think we should keep the defaults as a stable version of ORT; we can always override which version of ORT to use when the packages are built.
- name: ort_winml_version
displayName: 'Microsoft.WindowsAppSDK.ML Version (should match CMakeList.txt)'
@@ -71,12 +71,12 @@ parameters:
- name: ort_cuda_version
displayName: 'OnnxRuntime GPU version'
type: string
- default: '1.23.0'
+ default: '1.25.0-dev-20260125-0617-727db0d3dc'

- name: ort_dml_version
displayName: 'OnnxRuntime DML version'
type: string
- default: '1.23.0'
+ default: '1.25.0-dev-20260125-0556-727db0d3dc'

- name: cuda_version
displayName: 'CUDA version'
8 changes: 4 additions & 4 deletions cmake/ortlib.cmake
@@ -81,16 +81,16 @@ if(ORT_HOME)
endif()
else()
# If ORT_HOME is not specified, download the onnxruntime headers and libraries from the nightly feed
- set(ORT_VERSION "1.23.0")
+ set(ORT_VERSION "1.25.0-dev-20260125-1205-727db0d3dc")
set(ORT_FEED_ORG_NAME "aiinfra")
set(ORT_FEED_PROJECT "2692857e-05ef-43b4-ba9c-ccf1c22c437c")
set(ORT_NIGHTLY_FEED_ID "7982ae20-ed19-4a35-a362-a96ac99897b7")

if (USE_DML)
- set(ORT_VERSION "1.23.0")
+ set(ORT_VERSION "1.25.0-dev-20260125-0556-727db0d3dc")
set(ORT_PACKAGE_NAME "Microsoft.ML.OnnxRuntime.DirectML")
elseif(USE_CUDA)
- set(ORT_VERSION "1.23.0")
+ set(ORT_VERSION "1.25.0-dev-20260125-0617-727db0d3dc")
if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
set(ORT_PACKAGE_NAME "Microsoft.ML.OnnxRuntime.Gpu.Linux")
elseif(WIN32)
@@ -99,7 +99,7 @@ else()
message(FATAL_ERROR "Unsupported platform for CUDA")
endif()
elseif(USE_ROCM)
- set(ORT_VERSION "1.23.0")
+ set(ORT_VERSION "1.25.0-dev-20260125-0617-727db0d3dc")
set(ORT_PACKAGE_NAME "Microsoft.ML.OnnxRuntime.Rocm")
else()
set(ORT_PACKAGE_NAME "Microsoft.ML.OnnxRuntime")
23 changes: 23 additions & 0 deletions documents/Runtime_option.md
@@ -13,3 +13,26 @@ To recover from a terminated state, use this key value pair: ("terminate_session
Key: "terminate_session"

Accepted values: ("0", "1")

## Enable Profiling

Enable Profiling is a runtime option that dynamically enables or disables ONNX Runtime profiling during generation. Once enabled, each subsequent token-generation step saves its profiling data to a separate JSON file. Profiling can be stopped at any time.

To enable profiling with the default file prefix "onnxruntime_run_profile", use this key value pair: ("enable_profiling", "1")

To disable profiling, use this key value pair: ("enable_profiling", "0")

To enable profiling with a custom file prefix, use this key value pair: ("enable_profiling", "<your_custom_prefix>")

Key: "enable_profiling"

Accepted values: ("0", "1", or a custom profile file prefix string)

Note: Difference from SessionOptions `enable_profiling` in genai_config.json

The `enable_profiling` option in `genai_config.json` under `SessionOptions` is a session-level configuration. When enabled, it collects all profiling data from session creation to session end and aggregates them into a single JSON file. This configuration cannot be started or stopped dynamically during inference.

In contrast, `enable_profiling` as a runtime option provides dynamic control:
- Can be enabled or disabled at any point during generation
- Each token generation produces its own profiling file when enabled
- Useful for profiling specific portions of the generation process
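The mapping between the accepted values above and the resulting behavior can be sketched from the Python side. The `profiling_value` helper below is illustrative, not part of the onnxruntime-genai API, and the commented model path and generation calls are assumptions for context:

```python
# Sketch: mapping a desired profiling state to the "enable_profiling"
# value string documented above. profiling_value is an illustrative
# helper, not part of the onnxruntime-genai API.

def profiling_value(enabled, prefix=None):
    """Return the runtime-option value: "0" to disable, "1" for the
    default file prefix, or a custom prefix string."""
    if not enabled:
        return "0"
    return prefix if prefix else "1"

# Hypothetical usage via the set_runtime_option binding (the model
# path and generation loop are assumptions, not verified here):
#
#   import onnxruntime_genai as og
#   model = og.Model("path/to/model")
#   params = og.GeneratorParams(model)
#   generator = og.Generator(model, params)
#   # Profile only the early tokens, with a custom file prefix:
#   generator.set_runtime_option("enable_profiling", profiling_value(True, "warmup"))
#   # ... generate a few tokens ...
#   generator.set_runtime_option("enable_profiling", profiling_value(False))

print(profiling_value(True))            # prints: 1
print(profiling_value(True, "warmup"))  # prints: warmup
print(profiling_value(False))           # prints: 0
```

Because each enabled token step writes its own JSON file, toggling the option around a specific region of generation keeps the profiling output focused on that region.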
6 changes: 3 additions & 3 deletions examples/slm_engine/build_scripts/build_deps.py
@@ -577,9 +577,9 @@ def main():
ort_home = None
if args.build_ort_from_source:
if args.ort_version_to_use is None:
- # If not Windows then use 1.23.0
+ # If not Windows then use 1.25.0-dev-20260125-1205-727db0d3dc
if platform.system() != "Windows":
- args.ort_version_to_use = "v1.23.0"
+ args.ort_version_to_use = "v1.25.0-dev-20260125-1205-727db0d3dc"
else:
args.ort_version_to_use = "main"
ort_home = build_ort(args, dep_src_dir, artifacts_dir)
@@ -590,7 +590,7 @@ def main():
# The ORT binaries are available as they were downloaded during the GenAI build
# This is the supported version for most platforms
if args.ort_version_to_use is None:
- ORT_VERSION = "1.23.0"
+ ORT_VERSION = "1.25.0-dev-20260125-1205-727db0d3dc"
else:
ORT_VERSION = args.ort_version_to_use
# Copy the ORT artifacts to the artifacts directory.
13 changes: 13 additions & 0 deletions src/models/model.cpp
@@ -173,6 +173,19 @@ void State::SetRunOption(const char* key, const char* value) {
throw std::runtime_error(std::string("terminate_session key value unexpected: ") + value);
}
return;
} else if (strcmp(key, "enable_profiling") == 0) {
if (strcmp(value, "0") == 0) {
run_options_->DisableProfiling();
} else if (strcmp(value, "1") == 0) {
run_options_->EnableProfiling(ORT_TSTR("onnxruntime_run_profile"));
} else {
auto ToProfileString = [](const char* s) -> std::basic_string<ORTCHAR_T> {
Reviewer comment (Contributor):
1. Why is converting a char* to basic_string<ORTCHAR_T> and then back to a char* needed?
2. Can you add a comment here to explain that this else condition is for a custom prefix for the log file?
3. Can the else if and else conditions be merged, since they both enable profiling?
std::string str(s);
return std::basic_string<ORTCHAR_T>(str.begin(), str.end());
};
run_options_->EnableProfiling(ToProfileString(value).c_str());
}
return;
}
run_options_->AddConfigEntry(key, value);
}
4 changes: 3 additions & 1 deletion src/models/onnxruntime_api.h
@@ -555,7 +555,9 @@ struct OrtRunOptions {
*/
OrtRunOptions& UnsetTerminate();

OrtRunOptions& AddActiveLoraAdapter(const OrtLoraAdapter& adapter);  ///< Wraps OrtApi::RunOptionsSetActiveLoraAdapter
OrtRunOptions& EnableProfiling(const ORTCHAR_T* profile_file_prefix); ///< Wraps OrtApi::RunOptionsEnableProfiling
OrtRunOptions& DisableProfiling(); ///< Wraps OrtApi::RunOptionsDisableProfiling

static void operator delete(void* p) { Ort::api->ReleaseRunOptions(reinterpret_cast<OrtRunOptions*>(p)); }
Ort::Abstract make_abstract;
10 changes: 10 additions & 0 deletions src/models/onnxruntime_inline.h
@@ -532,6 +532,16 @@ inline OrtRunOptions& OrtRunOptions::AddActiveLoraAdapter(const OrtLoraAdapter&
return *this;
}

inline OrtRunOptions& OrtRunOptions::EnableProfiling(const ORTCHAR_T* profile_file_prefix) {
Ort::ThrowOnError(Ort::api->RunOptionsEnableProfiling(this, profile_file_prefix));
return *this;
}

inline OrtRunOptions& OrtRunOptions::DisableProfiling() {
Ort::ThrowOnError(Ort::api->RunOptionsDisableProfiling(this));
return *this;
}

inline std::unique_ptr<OrtCUDAProviderOptionsV2> OrtCUDAProviderOptionsV2::Create() {
OrtCUDAProviderOptionsV2* p;
Ort::ThrowOnError(Ort::api->CreateCUDAProviderOptions(&p));
7 changes: 6 additions & 1 deletion src/python/python.cpp
@@ -264,6 +264,10 @@ struct PyGenerator {
generator_->SetActiveAdapter(adapters, adapter_name.c_str());
}

void SetRuntimeOption(const std::string& key, const std::string& value) {
generator_->SetRuntimeOption(key.c_str(), value.c_str());
}

private:
std::unique_ptr<OgaGenerator> generator_;
};
@@ -467,7 +471,8 @@ PYBIND11_MODULE(onnxruntime_genai, m) {
.def("rewind_to", &PyGenerator::RewindTo)
.def("get_next_tokens", &PyGenerator::GetNextTokens)
.def("get_sequence", &PyGenerator::GetSequence)
- .def("set_active_adapter", &PyGenerator::SetActiveAdapter);
+ .def("set_active_adapter", &PyGenerator::SetActiveAdapter)
+ .def("set_runtime_option", &PyGenerator::SetRuntimeOption);

pybind11::class_<OgaImages>(m, "Images")
.def_static("open", [](pybind11::args image_paths) {
3 changes: 2 additions & 1 deletion test/python/cpu/ort/requirements.txt
@@ -1 +1,2 @@
- onnxruntime==1.23.0
+ -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
+ onnxruntime==1.25.0.dev20260126001
3 changes: 2 additions & 1 deletion test/python/cuda/ort/requirements.txt
@@ -1 +1,2 @@
- onnxruntime-gpu==1.23.0
+ -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
+ onnxruntime-gpu==1.25.0.dev20260123001
3 changes: 2 additions & 1 deletion test/python/directml/ort/requirements.txt
@@ -1 +1,2 @@
- onnxruntime-directml==1.23.0
+ -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
+ onnxruntime-directml==1.25.0.dev20260125001
3 changes: 2 additions & 1 deletion test/python/macos/ort/requirements.txt
@@ -1 +1,2 @@
- onnxruntime==1.23.0
+ -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
+ onnxruntime==1.25.0.dev20260126001
2 changes: 1 addition & 1 deletion test/python/requirements.txt
@@ -7,5 +7,5 @@ sympy
pytest
onnx
onnx_ir>=0.1.3
- transformers
+ transformers<5.0.0
huggingface_hub[cli]