---
title: Real-time Voice Services
description: Bidirectional realtime voice via OpenAI Realtime and Gemini Live, accessed through the RealtimeProcessor static class and the neurolink serve voice command.
---

NeuroLink integrates the two major realtime voice APIs behind a single, provider-agnostic interface: OpenAI Realtime (`openai-realtime`) and Google Gemini Live (`gemini-live`). These let you build full-duplex voice agents where audio streams in and out simultaneously, with the model responding mid-utterance and calling tools in-flight.

Realtime voice is exposed through the `RealtimeProcessor` static class (not a method on the `NeuroLink` instance). For non-realtime synthesis and transcription, see the TTS Guide and STT Guide.


## Overview

| Capability | `openai-realtime` | `gemini-live` |
| --- | --- | --- |
| Provider value | `"openai-realtime"` | `"gemini-live"` |
| Transport | WebSocket | WebSocket / WebRTC |
| Modalities | audio in/out, text in/out | audio in/out, text, video |
| Tool calls | Yes (via `onFunctionCall`) | Yes (via `onFunctionCall`) |
| Interruption | Server-side VAD + manual cancel | Native barge-in + manual cancel |

Both APIs support concurrent audio input and output streams, so the user can interrupt the model mid-response and the model can stream audio while still listening for new input.


## Quick Start (SDK)

The `RealtimeProcessor` is a static class — there is no `new RealtimeProcessor()` and no `neurolink.openRealtimeSession(...)` method. Connect with `RealtimeProcessor.connect(provider, config, handlers)`:

```typescript
import { RealtimeProcessor } from "@juspay/neurolink";

// OpenAI Realtime
const session = await RealtimeProcessor.connect(
  "openai-realtime",
  {
    provider: "openai-realtime",
    model: "gpt-4o-realtime-preview",
    voice: "alloy",
    instructions: "You are a helpful voice assistant.",
  },
  {
    onAudio: (chunk) => speaker.write(chunk.audio), // speaker: your audio output sink
    onTranscript: (text, isFinal) => {
      if (isFinal) console.log("User said:", text);
    },
    onError: (err) => console.error(err),
  },
);

// Send audio chunks (PCM16 mono 24kHz, raw Buffer or RealtimeAudioChunk)
await RealtimeProcessor.sendAudio("openai-realtime", audioChunk);

// Send text input alongside audio
await RealtimeProcessor.sendText("openai-realtime", "What's the weather?");

// Manually request a model response (for manual turn detection)
await RealtimeProcessor.triggerResponse("openai-realtime");

// Cancel an in-progress response (barge-in)
await RealtimeProcessor.cancelResponse("openai-realtime");

// Close the session
await RealtimeProcessor.disconnect("openai-realtime");
```

```typescript
// Gemini Live — same handler shape, just a different provider value
const session = await RealtimeProcessor.connect(
  "gemini-live",
  {
    provider: "gemini-live",
    model: "gemini-2.0-flash-live",
    instructions: "Speak naturally and ask follow-up questions.",
  },
  { onAudio, onTranscript, onError },
);
```

The handler shape is provider-agnostic: the same `RealtimeEventHandlers` object works across both providers, so you can switch with a single string change.
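For example, one handler object can drive either backend — `useGemini` and the `VOICE_BACKEND` env var below are your own toggles, and the model names match the quick-start snippets above:

```typescript
const handlers = { onAudio, onTranscript, onError }; // one object for both backends

const useGemini = process.env.VOICE_BACKEND === "gemini"; // your own switch
const provider = useGemini ? "gemini-live" : "openai-realtime";

const session = await RealtimeProcessor.connect(
  provider,
  {
    provider,
    model: useGemini ? "gemini-2.0-flash-live" : "gpt-4o-realtime-preview",
  },
  handlers,
);
```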

### Event handler reference

```typescript
type RealtimeEventHandlers = {
  onAudio?: (chunk: RealtimeAudioChunk) => void;
  onTranscript?: (text: string, isFinal: boolean) => void;
  onText?: (text: string, isFinal: boolean) => void;
  onFunctionCall?: (
    name: string,
    args: Record<string, unknown>,
  ) => Promise<unknown>;
  onStateChange?: (state: RealtimeSessionState) => void;
  onError?: (error: Error) => void;
  onTurnStart?: () => void;
  onTurnEnd?: () => void;
};
```

## Quick Start (CLI)

NeuroLink does not ship a `neurolink voice realtime` interactive CLI. Instead, the realtime voice server is exposed via:

```bash
# Canonical: start the realtime voice WebSocket server
npx @juspay/neurolink serve voice --port 8081

# Deprecated alias (still works, prints a deprecation notice)
npx @juspay/neurolink voice-server --port 8081
```

Connect a browser/mobile client to `ws://localhost:8081/voice` to drive the session. The server bridges the client to the chosen provider (configured via env vars and per-session messages) and forwards events bidirectionally.
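A minimal browser-side sketch — the framing shown here (binary audio frames plus JSON control messages) is an assumption, and `playbackQueue` stands in for your own audio path; consult the voice server protocol for the exact message shapes:

```typescript
// Browser-side sketch. Assumes the server accepts binary PCM16 frames and
// emits binary audio plus JSON control/transcript messages; verify against
// the actual voice-server protocol.
const ws = new WebSocket("ws://localhost:8081/voice");
ws.binaryType = "arraybuffer";

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Model audio out: feed your playback path (Web Audio, <audio>, etc.)
    playbackQueue.push(event.data);
  } else {
    // Control / transcript events
    console.log("server event:", JSON.parse(event.data));
  }
};

// Upstream: send captured PCM16 frames as they arrive (capture code omitted)
function sendFrame(pcm16: ArrayBuffer) {
  if (ws.readyState === WebSocket.OPEN) ws.send(pcm16);
}
```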

The TTS and STT flags on `generate` / `stream` (e.g. `--tts`, `--stt`, `--input-audio`) are for non-realtime synthesis and transcription — see TTS and STT.


## Self-hosted Realtime Voice Server

For multi-tenant deployments — voice bots, IVR-style applications, in-app voice features — NeuroLink ships a real-time voice agent server. It bridges browser/mobile clients to provider realtime APIs with session management, observability, and tool routing.

```typescript
// startVoiceServer is the canonical export
import { startVoiceServer } from "@juspay/neurolink/dist/lib/server/voice/voiceServerApp.js";

await startVoiceServer(8081);
```

Note: the server is a function export (`startVoiceServer`), not a `NeuroLinkVoiceServer` class. To run it from the CLI, prefer `npx @juspay/neurolink serve voice --port 8081`.

The server emits OTEL spans + Langfuse traces per session, supports HITL approvals on tool calls, and can be deployed standalone or behind your own gateway.


## Provider Selection

| Use case | Recommended provider |
| --- | --- |
| English-first, broad voice catalog, GPT-4o reasoning | `openai-realtime` |
| Multilingual, video input, lowest latency in many regions | `gemini-live` |
| Customer support voice bots with structured tool calls | `openai-realtime` (more deterministic function calls) |
| In-app voice search / multimodal queries | `gemini-live` |

Either can be wrapped behind `providerFallback` so a model-access denial automatically falls through to the alternate model. See Provider Fallback — note that the orchestrator only triggers on access-denied errors, not on rate limits or generic failures.
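If you need to fall back on errors the orchestrator ignores, you can hand-roll it. The sketch below is not the built-in `providerFallback` API; `handlers` is the event-handler object from the quick start, and the error handling is illustrative:

```typescript
// Hand-rolled fallback (not the built-in providerFallback orchestrator).
// Match on your provider's actual error shapes rather than catching everything.
let session;
try {
  session = await RealtimeProcessor.connect(
    "openai-realtime",
    { provider: "openai-realtime", model: "gpt-4o-realtime-preview" },
    handlers,
  );
} catch (err) {
  console.warn("openai-realtime unavailable, falling back to gemini-live:", err);
  session = await RealtimeProcessor.connect(
    "gemini-live",
    { provider: "gemini-live", model: "gemini-2.0-flash-live" },
    handlers,
  );
}
```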


## Tool Calls Inside Realtime Sessions

Both providers can call functions registered with the realtime session. Use the `onFunctionCall` handler (not `onToolCall` — that name is reserved for the streaming-text API):

```typescript
const session = await RealtimeProcessor.connect(
  "openai-realtime",
  {
    provider: "openai-realtime",
    model: "gpt-4o-realtime-preview",
    tools: [
      {
        name: "lookupOrderStatus",
        description: "Look up the status of an order",
        parameters: {
          type: "object",
          properties: { id: { type: "string" } },
          required: ["id"],
        },
      },
    ],
  },
  {
    onFunctionCall: async (name, args) => {
      if (name === "lookupOrderStatus") {
        const order = await db.findOrder(args.id as string);
        return { status: order.status, eta: order.eta };
      }
    },
  },
);
```

When HITL middleware is wired in front of the function-call handler, sensitive operations (e.g. `cancelOrder`, `chargeCard`) pause for human approval before responding back into the realtime stream.
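A sketch of that gate at the handler level — `requestHumanApproval` and `executeTool` are hypothetical stand-ins for your approval flow and tool dispatcher, not NeuroLink exports:

```typescript
const SENSITIVE = new Set(["cancelOrder", "chargeCard"]);

const handlers = {
  onFunctionCall: async (name: string, args: Record<string, unknown>) => {
    if (SENSITIVE.has(name)) {
      // requestHumanApproval is hypothetical: page an operator, await a decision.
      const approved = await requestHumanApproval(name, args);
      if (!approved) return { error: "Operation declined by a human reviewer." };
    }
    return executeTool(name, args); // executeTool: your own tool dispatcher
  },
};
```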


## Observability

Realtime sessions emit:

- `session:start`, `session:end` events with duration + token usage
- Per-utterance `transcript:user`, `transcript:assistant` events
- `tool:call`, `tool:result` events
- `audio:in:bytes`, `audio:out:bytes` for bandwidth tracking

These flow into the same OTEL/Langfuse pipeline as text generation. See the Observability Guide.
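If you also want rough client-side counters in addition to the built-in pipeline, the documented handlers are enough. A sketch, assuming `chunk.audio` is a `Buffer` (as in the quick start):

```typescript
let audioOutBytes = 0;
let turnStartedAt = 0;

const handlers = {
  onAudio: (chunk) => {
    audioOutBytes += chunk.audio.byteLength; // assumes chunk.audio is a Buffer
  },
  onTurnStart: () => {
    turnStartedAt = Date.now();
  },
  onTurnEnd: () => {
    console.log(`turn: ${Date.now() - turnStartedAt} ms, ${audioOutBytes} bytes out`);
    audioOutBytes = 0;
  },
};
```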


## Status & Inspection

```typescript
RealtimeProcessor.isConnected("openai-realtime"); // boolean
RealtimeProcessor.getProviders(); // string[] of registered providers
RealtimeProcessor.supports("openai-realtime"); // boolean
RealtimeProcessor.getSession("openai-realtime"); // RealtimeSession | null
RealtimeProcessor.getSupportedFormats("openai-realtime"); // TTSAudioFormat[]
```
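A small guard built from these inspectors is handy in a long-lived capture loop; a sketch:

```typescript
// Drop frames rather than throwing once the session has gone away.
async function safeSendAudio(provider: string, chunk: Buffer) {
  if (!RealtimeProcessor.supports(provider)) {
    throw new Error(`Unknown realtime provider: ${provider}`);
  }
  if (!RealtimeProcessor.isConnected(provider)) return; // session closed; skip frame
  await RealtimeProcessor.sendAudio(provider, chunk);
}
```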

## Related