-
Notifications
You must be signed in to change notification settings - Fork 461
feat(openai): add parse hooks for chat completions and responses #14824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 241 ± 2 ms. The average import time from base is: 242 ± 2 ms. The import time difference between this PR and base is: -1.69 ± 0.07 ms. Import time breakdownThe following import paths have shrunk:
|
ba86c1d
to
439cc4a
Compare
Performance SLOsComparing candidate alex/feat/openai-parse (5520b5c) with baseline main (646d5ab) 🟡 Near SLO Breach (5 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.486ms (SLO: <22.300ms -8.1%) vs baseline: ~same Memory: ✅ 65.493MB (SLO: <67.000MB -2.2%) vs baseline: +5.0% ✅ exception-replay-enabledTime: ✅ 1.344ms (SLO: <1.450ms -7.3%) vs baseline: +0.3% Memory: ✅ 64.541MB (SLO: <67.000MB -3.7%) vs baseline: +4.9% ✅ iastTime: ✅ 20.477ms (SLO: <22.250ms -8.0%) vs baseline: -0.3% Memory: ✅ 65.434MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ profilerTime: ✅ 15.383ms (SLO: <16.550ms -7.1%) vs baseline: +1.0% Memory: ✅ 53.733MB (SLO: <54.500MB 🟡 -1.4%) vs baseline: +4.8% ✅ resource-renamingTime: ✅ 20.487ms (SLO: <21.750ms -5.8%) vs baseline: -0.3% Memory: ✅ 65.423MB (SLO: <67.000MB -2.4%) vs baseline: +4.7% ✅ span-code-originTime: ✅ 26.150ms (SLO: <28.200ms -7.3%) vs baseline: -0.4% Memory: ✅ 67.962MB (SLO: <69.500MB -2.2%) vs baseline: +4.8% ✅ tracerTime: ✅ 20.440ms (SLO: <21.750ms -6.0%) vs baseline: -0.4% Memory: ✅ 65.445MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ tracer-and-profilerTime: ✅ 22.069ms (SLO: <23.500ms -6.1%) vs baseline: +0.3% Memory: ✅ 66.653MB (SLO: <67.500MB 🟡 -1.3%) vs baseline: +5.0% ✅ tracer-dont-create-db-spansTime: ✅ 19.244ms (SLO: <21.500ms 📉 -10.5%) vs baseline: -0.7% Memory: ✅ 65.424MB (SLO: <66.000MB 🟡 -0.9%) vs baseline: +4.9% ✅ tracer-minimalTime: ✅ 16.600ms (SLO: <17.500ms -5.1%) vs baseline: ~same Memory: ✅ 65.406MB (SLO: <66.000MB 🟡 -0.9%) vs baseline: +4.8% ✅ tracer-nativeTime: ✅ 20.503ms (SLO: <21.750ms -5.7%) vs baseline: ~same Memory: ✅ 71.433MB (SLO: <72.500MB 🟡 -1.5%) vs baseline: +4.9% ✅ tracer-no-cachesTime: ✅ 18.371ms (SLO: <19.650ms -6.5%) vs baseline: -0.5% Memory: ✅ 65.433MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ tracer-no-databasesTime: ✅ 18.804ms (SLO: <20.100ms -6.4%) vs baseline: ~same Memory: ✅ 65.447MB (SLO: <67.000MB -2.3%) vs baseline: +4.9% ✅ tracer-no-middlewareTime: ✅ 20.159ms (SLO: <21.500ms -6.2%) vs baseline: -0.2% Memory: ✅ 65.405MB (SLO: <67.000MB -2.4%) vs baseline: +4.9% ✅ tracer-no-templatesTime: ✅ 20.340ms (SLO: <22.000ms -7.5%) vs baseline: +0.4% Memory: ✅ 65.496MB (SLO: <67.000MB -2.2%) vs baseline: +4.9% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 17.996ms (SLO: <19.850ms -9.3%) vs baseline: -0.4% Memory: ✅ 65.136MB (SLO: <66.500MB -2.1%) vs baseline: +4.7% ✅ errortracking-enabled-userTime: ✅ 18.014ms (SLO: <19.400ms -7.1%) vs baseline: -0.5% Memory: ✅ 65.294MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ tracer-enabledTime: ✅ 18.174ms (SLO: <19.450ms -6.6%) vs baseline: +0.6% Memory: ✅ 65.274MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +5.0% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.573ms (SLO: <4.750ms -3.7%) vs baseline: -0.4% Memory: ✅ 61.971MB (SLO: <65.000MB -4.7%) vs baseline: +5.0% ✅ appsec-postTime: ✅ 6.600ms (SLO: <6.750ms -2.2%) vs baseline: ~same Memory: ✅ 61.912MB (SLO: <65.000MB -4.8%) vs baseline: +4.7% ✅ appsec-telemetryTime: ✅ 4.577ms (SLO: <4.750ms -3.6%) vs baseline: -0.3% Memory: ✅ 61.932MB (SLO: <65.000MB -4.7%) vs baseline: +4.9% ✅ debuggerTime: ✅ 1.854ms (SLO: <2.000ms -7.3%) vs baseline: -0.1% Memory: ✅ 45.456MB (SLO: <47.000MB -3.3%) vs baseline: +4.9% ✅ iast-getTime: ✅ 1.882ms (SLO: <2.000ms -5.9%) vs baseline: +0.9% Memory: ✅ 42.408MB (SLO: <49.000MB 📉 -13.5%) vs baseline: +4.9% ✅ profilerTime: ✅ 1.913ms (SLO: <2.100ms -8.9%) vs baseline: ~same Memory: ✅ 46.439MB (SLO: <47.000MB 🟡 -1.2%) vs baseline: +4.6% ✅ resource-renamingTime: ✅ 3.379ms (SLO: <3.650ms -7.4%) vs baseline: +0.2% Memory: ✅ 52.199MB (SLO: <53.500MB -2.4%) vs baseline: +4.8% ✅ tracerTime: ✅ 3.372ms (SLO: <3.650ms -7.6%) vs baseline: -0.4% Memory: ✅ 52.239MB (SLO: <53.500MB -2.4%) vs baseline: +5.0% ✅ tracer-nativeTime: ✅ 3.370ms (SLO: <3.650ms -7.7%) vs baseline: -0.1% Memory: ✅ 58.170MB (SLO: <60.000MB -3.0%) vs baseline: +4.6% 🟡 otelspan - 22/22✅ add-eventTime: ✅ 42.397ms (SLO: <47.150ms 📉 -10.1%) vs baseline: -0.1% Memory: ✅ 44.422MB (SLO: <47.000MB -5.5%) vs baseline: +4.7% ✅ add-metricsTime: ✅ 319.386ms (SLO: <344.800ms -7.4%) vs baseline: +0.8% Memory: ✅ 595.145MB (SLO: <600.000MB 🟡 -0.8%) vs baseline: +4.8% ✅ add-tagsTime: ✅ 286.310ms (SLO: <314.000ms -8.8%) vs baseline: -1.1% Memory: ✅ 596.902MB (SLO: <600.000MB 🟡 -0.5%) vs baseline: +4.8% ✅ get-contextTime: ✅ 80.661ms (SLO: <92.350ms 📉 -12.7%) vs baseline: +0.1% Memory: ✅ 40.036MB (SLO: <46.500MB 📉 -13.9%) vs baseline: +5.3% ✅ is-recordingTime: ✅ 38.873ms (SLO: <44.500ms 📉 -12.6%) vs baseline: -0.3% Memory: ✅ 43.973MB (SLO: <47.500MB -7.4%) vs baseline: +5.0% ✅ record-exceptionTime: ✅ 58.465ms (SLO: <67.650ms 📉 -13.6%) vs baseline: -0.3% Memory: ✅ 40.260MB (SLO: <47.000MB 📉 -14.3%) vs baseline: +4.9% ✅ set-statusTime: ✅ 44.886ms (SLO: <50.400ms 📉 -10.9%) vs baseline: -0.1% Memory: ✅ 43.907MB (SLO: <47.000MB -6.6%) vs baseline: +4.6% ✅ startTime: ✅ 38.099ms (SLO: <43.450ms 📉 -12.3%) vs baseline: ~same Memory: ✅ 43.993MB (SLO: <47.000MB -6.4%) vs baseline: +4.9% ✅ start-finishTime: ✅ 82.578ms (SLO: <88.000ms -6.2%) vs baseline: ~same Memory: ✅ 34.564MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +4.9% ✅ start-finish-telemetryTime: ✅ 84.104ms (SLO: <89.000ms -5.5%) vs baseline: +0.5% Memory: ✅ 34.564MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +5.0% ✅ update-nameTime: ✅ 40.325ms (SLO: <45.150ms 📉 -10.7%) vs baseline: +0.3% Memory: ✅ 44.195MB (SLO: <47.000MB -6.0%) vs baseline: +4.9% 🟡 span - 26/26✅ add-eventTime: ✅ 20.653ms (SLO: <22.500ms -8.2%) vs baseline: -0.3% Memory: ✅ 50.353MB (SLO: <53.000MB -5.0%) vs baseline: +5.0% ✅ add-metricsTime: ✅ 90.865ms (SLO: <93.500ms -2.8%) vs baseline: ~same Memory: ✅ 661.197MB (SLO: <961.000MB 📉 -31.2%) vs baseline: +4.8% ✅ add-tagsTime: ✅ 148.181ms (SLO: <155.000ms -4.4%) vs baseline: ~same Memory: ✅ 661.271MB (SLO: <962.500MB 📉 -31.3%) vs baseline: +4.8% ✅ get-contextTime: ✅ 19.418ms (SLO: <20.500ms -5.3%) vs baseline: ~same Memory: ✅ 49.168MB (SLO: <53.000MB -7.2%) vs baseline: +4.7% ✅ is-recordingTime: ✅ 19.668ms (SLO: <20.500ms -4.1%) vs baseline: +0.2% Memory: ✅ 49.109MB (SLO: <53.000MB -7.3%) vs baseline: +4.6% ✅ record-exceptionTime: ✅ 38.480ms (SLO: <40.000ms -3.8%) vs baseline: +0.3% Memory: ✅ 42.749MB (SLO: <53.000MB 📉 -19.3%) vs baseline: +4.7% ✅ set-statusTime: ✅ 21.295ms (SLO: <22.000ms -3.2%) vs baseline: +0.1% Memory: ✅ 49.168MB (SLO: <53.000MB -7.2%) vs baseline: +4.9% ✅ startTime: ✅ 19.271ms (SLO: <20.500ms -6.0%) vs baseline: +0.2% Memory: ✅ 49.162MB (SLO: <53.000MB -7.2%) vs baseline: +4.8% ✅ start-finishTime: ✅ 51.442ms (SLO: <52.500ms -2.0%) vs baseline: +0.1% Memory: ✅ 32.047MB (SLO: <34.000MB -5.7%) vs baseline: +4.7% ✅ start-finish-telemetryTime: ✅ 52.766ms (SLO: <54.500ms -3.2%) vs baseline: ~same Memory: ✅ 32.126MB (SLO: <34.000MB -5.5%) vs baseline: +4.8% ✅ start-finish-traceid128Time: ✅ 55.029ms (SLO: <56.000ms 🟡 -1.7%) vs baseline: +0.6% Memory: ✅ 32.145MB (SLO: <34.000MB -5.5%) vs baseline: +4.8% ✅ start-traceid128Time: ✅ 19.653ms (SLO: <22.500ms 📉 -12.7%) vs baseline: +0.1% Memory: ✅ 49.091MB (SLO: <53.000MB -7.4%) vs baseline: +4.8% ✅ update-nameTime: ✅ 20.230ms (SLO: <22.000ms -8.0%) vs baseline: ~same Memory: ✅ 49.761MB (SLO: <53.000MB -6.1%) vs baseline: +4.7%
|
87ca2e5
to
caa8f9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @PROFeNoM! Just a couple clarifying questions but otherwise should be good to stamp once addressed
733a49d
to
5b5197c
Compare
…perations + generate cassettes/snapshots
5b5197c
to
580336a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @PROFeNoM! 🚢
Venv( | ||
pys=select_pys(min_version="3.8", max_version="3.13"), | ||
pkgs={ | ||
"openai": [latest, "~=1.76.2", "==1.66.0"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentional to drop testing latest
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is correct, the CI was currently testing v1.91.0 (see requirement files). However, I needed 1.92.0+ for the parse methods to be available.
Given that v2 of the openai client got released, I deemed it OK to separate things in two PRs. There might have been breaking changes that make the v1 test fail, which could require changes beyond the initial scope of this PR.
If you prefer, I can try to look at it in this PR though 👍
Description
Adds instrumentation support for OpenAI's
parse()
methods for chat completions and responses, out of beta since version 1.92.0.Implementation
Creates two new hook classes that inherit from existing ones:
_ChatCompletionParseHook
extends_ChatCompletionHook
_ResponseParseHook
extends_ResponseHook
Each only overrides the
OPERATION_ID
to create distinct span names (parseChatCompletion
andparseResponse
) in APM traces while inheriting all other functionality (streaming, error handling, token counting, metadata, etc.).Also, the patching logic now checks for endpoint existence before wrapping by using deep_getattr on the appropriate dotted paths for sync/async resources. This ensures we only wrap parse when it exists (>=1.92.0) and avoids import-time errors on older SDKs.
Test Version Changes
Updated OpenAI test requirements from
==1.91.0
(See requirements files) to<2.0.0
(riotfile). This is necessary because parse methods were released in 1.92.0 (moved out of beta). The upper bound excludes v2.0 which was recently released. I believe it would be out of scope is out of scope for this PR to handle any v2 breaking changes. Considering this is ~my first PR on this repo, if you believe there is a better course of action lmk.Metadata Fix
Added
exclude_none=True
tomodel_dump()
inload_data_value()
to fix failing LLMObs tests (e.g., failing job). Without this, Pydantic models serialize withNone
values (e.g.,verbosity: None
) that don't match test expectations.Testing
Snapshot tests for sync/async variants with VCR cassettes and version gating for openai >= 1.92.
Risks
Low - minimal code changes.
Additional Notes