xiaofeihan1 (Contributor) commented Jan 19, 2026

Background

Previously, developers could only enable profiling through genai_config.json, which profiles the entire lifetime of a session—from session creation to session destruction. When the number of runs is large (for example, generating ~3000 tokens during inference), the profiling output can become excessively large, causing profiling to fail due to oversized files.

2025-12-17 11:17:36.496 Python[9861:2179219] 2025-12-17 11:17:36.493043 [E:onnxruntime:onnxruntime-genai, profiler.cc:93 EndTimeAndRecordEvent] Maximum number of events reached, could not record profile event.

Description

This PR addresses this limitation by introducing run-level profiling. We expose enable_profiling in RuntimeOptions. Developers can now enable profiling for specific runs only.

With the following code, two profiling JSON files will be generated using the run_profiler_file prefix:
• one containing profiling data for the 100th run
• another containing profiling data for the 101st run

while not generator.is_done():
    # Enable run-level profiling only for the runs of interest;
    # pass "0" to keep profiling disabled for all other runs.
    if len(new_tokens) in (100, 101):
        generator.set_runtime_option("enable_profiling", "run_profiler_file")
    else:
        generator.set_runtime_option("enable_profiling", "0")
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    new_tokens.append(new_token)
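The per-run files can then be examined offline. As a rough sketch (assuming the per-run output follows the Chrome trace event format that ONNX Runtime session profiling emits, i.e. a JSON array of events with "name" and "dur" fields; `summarize_profile` is a hypothetical helper, not part of this PR):

```python
import json
from collections import defaultdict

def summarize_profile(path):
    """Sum the recorded duration (microseconds) per event name in a
    profile file written in the Chrome trace event format: a JSON
    array of event objects carrying "name" and "dur" fields."""
    with open(path) as f:
        events = json.load(f)
    totals = defaultdict(int)
    for event in events:
        if "dur" in event:  # metadata events may lack a duration
            totals[event["name"]] += event["dur"]
    # Return names ordered from most to least time spent.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Comparing the summaries of the two files would show, for example, whether a particular operator dominates one run but not the other.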

@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review January 27, 2026 15:01
choco uninstall llvm --yes
python -m pip install "numpy<2" coloredlogs flatbuffers packaging protobuf sympy pytest
python -m pip install onnxruntime-qnn
python -m pip install onnxruntime-qnn==1.25.0.dev20260126001 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

Do we need to specify a version here or would not listing a version and using the latest nightly package be sufficient?

  displayName: 'OnnxRuntime version'
  type: string
- default: '1.23.0'
+ default: '1.25.0-dev-20260125-1205-727db0d3dc'

These are default values for publishing the official packages for ORT GenAI. I think we should keep the defaults as a stable version of ORT. We can always override what version of ORT to use when the packages are built.

} else if (strcmp(value, "1") == 0) {
  run_options_->EnableProfiling(ORT_TSTR("onnxruntime_run_profile"));
} else {
  auto ToProfileString = [](const char* s) -> std::basic_string<ORTCHAR_T> {

  1. Why is converting a char* to basic_string<ORTCHAR_T> before going back to a char* needed?
  2. Can you add a comment here to explain that this else condition handles a custom prefix for the profile file?
  3. Can the else if and else conditions be merged, since they both enable profiling?
