feat(amd): port tunable params and postpone-termination tool from python #1368
Open · toubatbrian wants to merge 5 commits into `main`
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`, `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is already available after a short greeting.
- Add a `postpone_termination` LLM tool (capped at 3 extensions × 10 s) alongside `save_prediction`; fall back to JSON-content parsing when the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence; see PR description): the dedicated AMD STT pipeline, the track-subscription wait, and the `start()` / `start_timers()` lifecycle split.
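The JSON-content fallback mentioned above can be sketched roughly as follows. This is illustrative only: `parseDetectionFallback` and the extraction logic are assumptions, not the real `amd.ts` code.

```typescript
// Illustrative sketch of a JSON-content fallback: when the LLM emits no
// tool calls, try to recover a { label } verdict from its raw text.
// parseDetectionFallback is a hypothetical name, not the real amd.ts code.
function parseDetectionFallback(content: string): string | null {
  const match = content.match(/\{[\s\S]*\}/); // first {...} span in the reply
  if (!match) return null;
  try {
    const parsed: unknown = JSON.parse(match[0]);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { label?: unknown }).label === "string"
    ) {
      return (parsed as { label: string }).label;
    }
  } catch {
    // not valid JSON: fall through to null
  }
  return null;
}
```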
🦋 Changeset detected (latest commit: 5e34d10). The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages.
- Gate `save_prediction` and `postpone_termination` tool side effects on the current `detectGeneration`. Stale in-flight classifications now no-op instead of mutating timers, budget, or capturing a verdict that belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory` before storing, so an off-enum value from a misbehaving LLM (or our manual JSON path that bypasses Zod) is treated as UNCERTAIN rather than producing an `AMDResult` with an invalid category string.
- Fix the `warnIfNotEvaluated` substring check to also handle date-suffixed model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
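A minimal sketch of the three fixes, under the assumption that the identifiers (`detectGeneration`, `parseCategory`, the contents of `EVALUATED_LLM_MODELS`) work as described in the text; none of this is copied from `amd.ts`:

```typescript
// Hedged sketch of the three fixes above; names are modeled on the PR text.
type Category = "HUMAN" | "MACHINE" | "UNCERTAIN";

// Fix 2: off-enum labels collapse to UNCERTAIN instead of leaking through.
function parseCategory(label: string): Category {
  const upper = label.toUpperCase();
  if (upper === "HUMAN" || upper === "MACHINE") return upper;
  return "UNCERTAIN";
}

// Fix 3: substring check that also matches date-suffixed model names.
const EVALUATED_LLM_MODELS = ["openai/gpt-4.1-mini"]; // illustrative subset
function isEvaluated(model: string): boolean {
  return EVALUATED_LLM_MODELS.some(
    (id) => model === id || model.startsWith(`${id}-`),
  );
}

// Fix 1: tool side effects are gated on the generation captured when the
// classification started; stale completions no-op.
class GenerationGate {
  private detectGeneration = 0;
  result: Category | null = null;

  startClassification(): number {
    return ++this.detectGeneration; // new transcript window supersedes old ones
  }

  savePrediction(generation: number, label: string): void {
    if (generation !== this.detectGeneration) return; // stale: no-op
    this.result = parseCategory(label);
  }
}
```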
Without this, a `postpone_termination` tool call that resolves after a `close()` would still see `isStale() === false` (`settled` was never flipped) and install a fresh `silenceTimer` that survives cleanup, eventually firing `scheduleLLMClassification` + `tryEmitResult` and potentially triggering `session.interrupt` on a closed AMD.
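The lifecycle fix can be sketched as follows; `settled`, `isStale`, and `silenceTimer` are modeled on the names in the text above, not copied from `amd.ts`:

```typescript
// Hedged sketch: flipping `settled` in close() makes late tool completions
// observe isStale() === true and refuse to re-arm the silence timer.
class AmdLifecycleSketch {
  private settled = false;
  private silenceTimer: ReturnType<typeof setTimeout> | null = null;

  isStale(): boolean {
    return this.settled;
  }

  // The fix: close() marks the detection as settled and clears any timer,
  // so nothing installed afterwards can survive cleanup.
  close(): void {
    this.settled = true;
    if (this.silenceTimer !== null) clearTimeout(this.silenceTimer);
    this.silenceTimer = null;
  }

  onPostponeResolved(ms: number, onExpire: () => void): void {
    if (this.isStale()) return; // closed AMD: do not re-arm
    this.silenceTimer = setTimeout(onExpire, ms);
  }
}
```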
Without a lower bound and a NaN guard, a misbehaving LLM passing a negative or non-numeric `seconds` argument would compute a `clampedMs` of NaN or a negative number, which `setTimeout` treats as 0 and fires immediately. The manual tool-execution path here bypasses the Zod schema, so this defense lives in `execute()`.
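A sketch of the guard described above. The exact bounds and the fallback for invalid input are assumptions (only `MAX_EXTENSION_MS = 10_000` is stated in this PR), not the real `execute()` body:

```typescript
// Hedged sketch: clamp a tool-provided `seconds` argument into a safe
// millisecond range before it ever reaches setTimeout.
const MAX_EXTENSION_MS = 10_000;
const MIN_EXTENSION_MS = 1_000; // assumed lower bound, not from the PR

function clampExtensionMs(seconds: unknown): number {
  const n = typeof seconds === "number" ? seconds : Number(seconds);
  // NaN, Infinity, zero, or negative input must never reach setTimeout,
  // which would coerce it to 0 and fire immediately.
  if (!Number.isFinite(n) || n <= 0) return MIN_EXTENSION_MS;
  return Math.min(Math.max(n * 1000, MIN_EXTENSION_MS), MAX_EXTENSION_MS);
}
```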
## Summary
Automated port of livekit/agents#5584 (`fix(amd): amd improvement (AGT-2777)`) into `agents-js`.

**Note**
This is an automated Claude Code Routine created by @toubatbrian. It is currently in the experimentation stage.
cc @toubatbrian @livekit/agent-devs for review.
## Ported features
All of the features listed below land in `agents/src/voice/amd.ts` and are wired through the existing two-gate (verdict + silence) AMD architecture.

### 1. Expose all tunable parameters
New optional fields on `AMDOptions`:

| Field | Default |
| --- | --- |
| `humanSpeechThresholdMs` | `2_500` |
| `humanSilenceThresholdMs` | `500` (silence before an automatic `HUMAN` verdict) |
| `machineSilenceThresholdMs` | `1_500` |
| `prompt` | `AMD_PROMPT` |
| `participantIdentity` | `undefined` |
| `suppressCompatibilityWarning` | `false` |

`noSpeechTimeoutMs`, `detectionTimeoutMs`, and `maxTranscriptTurns` were already exposed.

### 2. Use LLM when a transcript is available
Mirrors the python change in `classifier.py::on_user_speech_ended`. If the user just spoke for ≤ `humanSpeechThresholdMs` and a transcript is already on the record, AMD now waits `machineSilenceThresholdMs` (instead of the shorter `humanSilenceThresholdMs` + automatic `HUMAN` verdict) so the LLM gets the final word.

### 3. `save_prediction` + `postpone_termination` tools

`detect()` now exposes two tools to the LLM via `toolCtx` and `toolChoice: 'required'`:

- `save_prediction({ label })`: commits the verdict (mirrors python `save_prediction`).
- `postpone_termination({ seconds })`: extends the silence window, capped at `MAX_EXTENSIONS = 3` × `MAX_EXTENSION_MS = 10_000`. On expiration, it opens the silence gate and re-runs classification with the latest transcript; once extensions are exhausted, the tool is no longer offered, forcing the LLM to commit.

If the LLM doesn't emit tool calls (for example, the in-tree `StaticLLM` test mock or providers that ignore `toolChoice: 'required'`), AMD falls back to the previous JSON-content parsing path, so the existing 4 unit tests remain green.

### 4. Compatibility warning for evaluated LLM models
`EVALUATED_LLM_MODELS` (the same 12 inference IDs as python) is checked against `LLM.model` once at construction; a warning is logged when the resolved model isn't in the list, suppressible via `suppressCompatibilityWarning: true`.

## What was intentionally not ported
These pieces of agents#5584 are tightly coupled to the python `AudioRecognition`/`RoomIO` pipeline and don't have direct counterparts in agents-js today. Skipping them avoids a much larger architectural change and keeps the JS AMD compatible with its current session-event model.

- `stt` parameter on `AMD`: the JS AMD consumes `AgentSession` `UserInputTranscribed` events; it has no audio-frame channel comparable to python's `audio_recognition.push_audio` → AMD path. Adding a parallel STT pipeline is a larger redesign and out of scope for this porting PR.
- `wait_for_track_publication(wait_for_subscription=True)` + the `start()`/`start_timers()` split: JS keeps everything in `execute()`, which the user calls after `session.start({ agent, room })`. The "start before the SIP participant joins, then wait for subscription" pattern requires async lifecycle changes to `AMD` that don't have an analogue in JS.
- `EVALUATED_STT_MODELS` warning.
- `examples/telephony/amd.py` rewrite: `examples/src/telephony_amd.ts` gets a comment block showing the new tunable options; the SIP-participant-creation choreography is not duplicated, since JS doesn't have the same `room_io.set_participant` API surface.
- `NO_SPEECH_THRESHOLD = 10.0` / `TIMEOUT = 20.0` defaults: already `10_000` / `20_000` ms.

If a follow-up needs the dedicated AMD STT pipeline or the participant-track lifecycle, that can be tracked as a separate issue; please flag in review if you'd like me to file one.
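For a sense of what the example's comment block of tunables might contain, here is a hedged sketch. Defaults follow the values listed in this PR; the object shape and `participantIdentity` value are assumptions about `AMDOptions`, not copied from `examples/src/telephony_amd.ts`:

```typescript
// Hedged sketch of an AMDOptions configuration using the new tunables.
const amdOptions = {
  humanSpeechThresholdMs: 2_500, // default; greeting-length threshold
  humanSilenceThresholdMs: 500, // default; silence before an automatic HUMAN verdict
  machineSilenceThresholdMs: 1_500, // default; longer window so the LLM decides
  participantIdentity: "sip-caller", // hypothetical identity to watch
  suppressCompatibilityWarning: false, // default; keep the evaluated-models warning
};
```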
## Implementation nuances
- Where python pushes `""` into a channel, JS uses `scheduleLLMClassification()` to re-trigger classification with the joined transcript.
- `tool_choice='required'` is passed through; if a provider ignores it, the JSON-content fallback in `parseDetection()` keeps behavior reasonable.
- Per `CLAUDE.md`, all new fields are milliseconds. `MAX_EXTENSION_MS` is the JS analogue of `MAX_EXTENSION_SECS` (10 s → 10_000 ms).
- `// Ref: python <path> - <line range>` comments per `CLAUDE.md` guidance.

## Test plan
- `pnpm --filter @livekit/agents build`: passes
- `pnpm --filter @livekit/agents lint`: `amd.ts` / `amd.test.ts` clean (0 errors, 0 warnings)
- `pnpm exec prettier --check` on changed files: passes
- `pnpm exec vitest run agents/src/voice/amd.test.ts`: 6/6 pass (4 existing + 2 new)

## Changeset
A `patch` changeset for `@livekit/agents` (per the routine's standing instructions).

Generated by Claude Code