feat(amd): port tunable params and postpone-termination tool from python #1368
Open · toubatbrian wants to merge 5 commits into `main`
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`, `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is already available after a short greeting.
- Add a `postpone_termination` LLM tool (capped at 3 extensions × 10 s) alongside `save_prediction`; fall back to JSON-content parsing when the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence; see PR description): the dedicated AMD STT pipeline, the track-subscription wait, and the `start()` / `start_timers()` lifecycle split.
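The JSON-content fallback mentioned above can be sketched roughly as follows. This is illustrative only: `parseDetectionFallback` and the extraction logic are assumptions, not the real `amd.ts` code.

```typescript
// Illustrative sketch of a JSON-content fallback: when the LLM emits no
// tool calls, try to recover a { label } verdict from its raw text.
// parseDetectionFallback is a hypothetical name, not the real amd.ts code.
function parseDetectionFallback(content: string): string | null {
  const match = content.match(/\{[\s\S]*\}/); // first {...} span in the reply
  if (!match) return null;
  try {
    const parsed: unknown = JSON.parse(match[0]);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { label?: unknown }).label === "string"
    ) {
      return (parsed as { label: string }).label;
    }
  } catch {
    // not valid JSON: fall through to null
  }
  return null;
}
```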
🦋 Changeset detected (latest commit: 5e34d10). The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages.
- Gate `save_prediction` and `postpone_termination` tool side effects on the current `detectGeneration`. Stale in-flight classifications now no-op instead of mutating timers, budget, or capturing a verdict that belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory` before storing, so an off-enum value from a misbehaving LLM (or our manual JSON path that bypasses Zod) is treated as UNCERTAIN rather than producing an `AMDResult` with an invalid category string.
- Fix the `warnIfNotEvaluated` substring check to also handle date-suffixed model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
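A minimal sketch of the three fixes, under the assumption that the identifiers (`detectGeneration`, `parseCategory`, the contents of `EVALUATED_LLM_MODELS`) work as described in the text; none of this is copied from `amd.ts`:

```typescript
// Hedged sketch of the three fixes above; names are modeled on the PR text.
type Category = "HUMAN" | "MACHINE" | "UNCERTAIN";

// Fix 2: off-enum labels collapse to UNCERTAIN instead of leaking through.
function parseCategory(label: string): Category {
  const upper = label.toUpperCase();
  if (upper === "HUMAN" || upper === "MACHINE") return upper;
  return "UNCERTAIN";
}

// Fix 3: substring check that also matches date-suffixed model names.
const EVALUATED_LLM_MODELS = ["openai/gpt-4.1-mini"]; // illustrative subset
function isEvaluated(model: string): boolean {
  return EVALUATED_LLM_MODELS.some(
    (id) => model === id || model.startsWith(`${id}-`),
  );
}

// Fix 1: tool side effects are gated on the generation captured when the
// classification started; stale completions no-op.
class GenerationGate {
  private detectGeneration = 0;
  result: Category | null = null;

  startClassification(): number {
    return ++this.detectGeneration; // new transcript window supersedes old ones
  }

  savePrediction(generation: number, label: string): void {
    if (generation !== this.detectGeneration) return; // stale: no-op
    this.result = parseCategory(label);
  }
}
```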
Without this, a `postpone_termination` tool call that resolves after a `close()` would still see `isStale() === false` (`settled` was never flipped) and install a fresh `silenceTimer` that survives cleanup, eventually firing `scheduleLLMClassification` + `tryEmitResult` and potentially triggering `session.interrupt` on a closed AMD.
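The lifecycle fix can be sketched as follows; `settled`, `isStale`, and `silenceTimer` are modeled on the names in the text above, not copied from `amd.ts`:

```typescript
// Hedged sketch: flipping `settled` in close() makes late tool completions
// observe isStale() === true and refuse to re-arm the silence timer.
class AmdLifecycleSketch {
  private settled = false;
  private silenceTimer: ReturnType<typeof setTimeout> | null = null;

  isStale(): boolean {
    return this.settled;
  }

  // The fix: close() marks the detection as settled and clears any timer,
  // so nothing installed afterwards can survive cleanup.
  close(): void {
    this.settled = true;
    if (this.silenceTimer !== null) clearTimeout(this.silenceTimer);
    this.silenceTimer = null;
  }

  onPostponeResolved(ms: number, onExpire: () => void): void {
    if (this.isStale()) return; // closed AMD: do not re-arm
    this.silenceTimer = setTimeout(onExpire, ms);
  }
}
```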
Without a lower bound and a NaN guard, a misbehaving LLM passing a negative or non-numeric `seconds` argument would compute a `clampedMs` of NaN or a negative number, which `setTimeout` treats as 0 and fires immediately. The manual tool-execution path here bypasses the Zod schema, so this defense lives in `execute()`.
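A sketch of the guard described above. The exact bounds and the fallback for invalid input are assumptions (only `MAX_EXTENSION_MS = 10_000` is stated in this PR), not the real `execute()` body:

```typescript
// Hedged sketch: clamp a tool-provided `seconds` argument into a safe
// millisecond range before it ever reaches setTimeout.
const MAX_EXTENSION_MS = 10_000;
const MIN_EXTENSION_MS = 1_000; // assumed lower bound, not from the PR

function clampExtensionMs(seconds: unknown): number {
  const n = typeof seconds === "number" ? seconds : Number(seconds);
  // NaN, Infinity, zero, or negative input must never reach setTimeout,
  // which would coerce it to 0 and fire immediately.
  if (!Number.isFinite(n) || n <= 0) return MIN_EXTENSION_MS;
  return Math.min(Math.max(n * 1000, MIN_EXTENSION_MS), MAX_EXTENSION_MS);
}
```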
## Summary
Automated port of livekit/agents#5584 (`fix(amd): amd improvement (AGT-2777)`) into `agents-js`.

**Note**
This is an automated Claude Code Routine created by @toubatbrian. It is currently in the experimentation stage.
cc @toubatbrian @livekit/agent-devs for review.
## Ported features
All of the features listed below land in `agents/src/voice/amd.ts` and are wired through the existing two-gate (verdict + silence) AMD architecture.

### 1. Expose all tunable parameters
New optional fields on `AMDOptions`:

| Field | Default |
| --- | --- |
| `humanSpeechThresholdMs` | `2_500` |
| `humanSilenceThresholdMs` | `500` (silence before an automatic `HUMAN` verdict) |
| `machineSilenceThresholdMs` | `1_500` |
| `prompt` | `AMD_PROMPT` |
| `participantIdentity` | `undefined` |
| `suppressCompatibilityWarning` | `false` |

`noSpeechTimeoutMs`, `detectionTimeoutMs`, and `maxTranscriptTurns` were already exposed.

### 2. Use LLM when a transcript is available
Mirrors the python change in `classifier.py::on_user_speech_ended`. If the user just spoke for ≤ `humanSpeechThresholdMs` and a transcript is already on the record, AMD now waits `machineSilenceThresholdMs` (instead of the shorter `humanSilenceThresholdMs` + automatic `HUMAN` verdict) so the LLM gets the final word.

### 3. `save_prediction` + `postpone_termination` tools

`detect()` now exposes two tools to the LLM via `toolCtx` and `toolChoice: 'required'`:

- `save_prediction({ label })`: commits the verdict (mirrors python `save_prediction`).
- `postpone_termination({ seconds })`: extends the silence window, capped at `MAX_EXTENSIONS = 3` × `MAX_EXTENSION_MS = 10_000`. On expiration, it opens the silence gate and re-runs classification with the latest transcript; once extensions are exhausted, the tool is no longer offered, forcing the LLM to commit.

If the LLM doesn't emit tool calls (for example, the in-tree `StaticLLM` test mock or providers that ignore `toolChoice: 'required'`), AMD falls back to the previous JSON-content parsing path, so the existing 4 unit tests remain green.

### 4. Compatibility warning for evaluated LLM models
`EVALUATED_LLM_MODELS` (the same 12 inference IDs as python) is checked against `LLM.model` once at construction; a warning is logged when the resolved model isn't in the list, suppressible via `suppressCompatibilityWarning: true`.

## What was intentionally not ported
These pieces of agents#5584 are tightly coupled to the python `AudioRecognition`/`RoomIO` pipeline and don't have direct counterparts in agents-js today. Skipping them avoids a much larger architectural change and keeps the JS AMD compatible with its current session-event model.

- `stt` parameter on `AMD`: the JS AMD consumes `AgentSession` `UserInputTranscribed` events; it has no audio-frame channel comparable to python's `audio_recognition.push_audio` → AMD path. Adding a parallel STT pipeline is a larger redesign and out of scope for this porting PR.
- `wait_for_track_publication(wait_for_subscription=True)` + the `start()`/`start_timers()` split: JS keeps everything in `execute()`, which the user calls after `session.start({ agent, room })`. The "start before the SIP participant joins, then wait for subscription" pattern requires async lifecycle changes to `AMD` that don't have an analogue in JS.
- `EVALUATED_STT_MODELS` warning.
- `examples/telephony/amd.py` rewrite: `examples/src/telephony_amd.ts` gets a comment block showing the new tunable options; the SIP-participant-creation choreography is not duplicated, since JS doesn't have the same `room_io.set_participant` API surface.
- `NO_SPEECH_THRESHOLD = 10.0` / `TIMEOUT = 20.0` defaults: already `10_000` / `20_000` ms.

If a follow-up needs the dedicated AMD STT pipeline or the participant-track lifecycle, that can be tracked as a separate issue; please flag in review if you'd like me to file one.
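For a sense of what the example's comment block of tunables might contain, here is a hedged sketch. Defaults follow the values listed in this PR; the object shape and `participantIdentity` value are assumptions about `AMDOptions`, not copied from `examples/src/telephony_amd.ts`:

```typescript
// Hedged sketch of an AMDOptions configuration using the new tunables.
const amdOptions = {
  humanSpeechThresholdMs: 2_500, // default; greeting-length threshold
  humanSilenceThresholdMs: 500, // default; silence before an automatic HUMAN verdict
  machineSilenceThresholdMs: 1_500, // default; longer window so the LLM decides
  participantIdentity: "sip-caller", // hypothetical identity to watch
  suppressCompatibilityWarning: false, // default; keep the evaluated-models warning
};
```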
## Implementation nuances
- Where python pushes `""` into a channel, JS uses `scheduleLLMClassification()` to re-trigger classification with the joined transcript.
- `tool_choice='required'` is passed through; if a provider ignores it, the JSON-content fallback in `parseDetection()` keeps behavior reasonable.
- Per `CLAUDE.md`, all new fields are milliseconds. `MAX_EXTENSION_MS` is the JS analogue of `MAX_EXTENSION_SECS` (10 s → 10_000 ms).
- `// Ref: python <path> - <line range>` comments per `CLAUDE.md` guidance.

## Test plan
- `pnpm --filter @livekit/agents build`: passes
- `pnpm --filter @livekit/agents lint`: `amd.ts` / `amd.test.ts` clean (0 errors, 0 warnings)
- `pnpm exec prettier --check` on changed files: passes
- `pnpm exec vitest run agents/src/voice/amd.test.ts`: 6/6 pass (4 existing + 2 new)

## Changeset
A `patch` changeset for `@livekit/agents` (per the routine's standing instructions).

Generated by Claude Code