Purpose: Concept demonstration and learning platform - NOT for production use
✅ Perfect for:
- Learning Conversation Relay concepts
- Testing and experimentation
- Understanding core architecture patterns
- Educational and tutorial purposes
⚠️ Key Limitations:
- Manual Configuration: CRelay parameters hardcoded directly in TwilioService.ts
- Event Emitter Architecture: Looser coupling with event-driven patterns
- Incomplete TypeScript Typing: Basic type coverage throughout codebase
- Tighter Service Coupling: Services more interdependent
- File-based Configuration Only: No centralized cloud configuration
- Basic Error Handling: Limited recovery and resilience features
- Simple Tool Implementations: Minimal tool functionality and validation
🏠 v4.0 - Production-Ready Version → v4.0 Branch
Purpose: Extended implementation for production deployments
🎯 Production Features:
- Dependency Injection Architecture: Clean service separation with proper DI patterns
- Handler-Based Design: Structured handlers replace event emitter patterns
- Complete TypeScript Interfaces: Full type safety and comprehensive typing
- Flexible Asset Loading: Choose file-based OR Twilio Sync-based configuration
- Automatic Sync Infrastructure: Creates Twilio Sync services automatically from config
- Centralized Configuration Management: Cloud-based config via Twilio Sync Maps
- Advanced Features: Listen mode, configurable silence detection, dynamic language switching
- Runtime Configuration Updates: Change settings without server restarts
- Enhanced Tool System: Robust implementations with comprehensive error handling
- Performance Optimizations: In-memory caching and efficient processing
- Deployment Ready: Includes utilities, guides, and production configurations
💡 Recommendation: Use v3.0 for learning → Upgrade to v4.0 for production deployments
This is a reference implementation that introduces the key concepts of Conversation Relay. The goal is to provide a workable environment for understanding the basics. It is intentionally simple, and only the minimum has been done so that attention stays focused on the core concepts. As an overview, here is how the project is put together:
This release fixes a critical timing issue where OpenAI messages were not played before terminal tool actions (like end-call) were executed. The fix ensures proper message sequencing while maintaining v3.0's event-driven architecture. See the CHANGELOG.md for detailed release history.
Configure your Conversation Relay parameters in server/src/services/TwilioService.ts
```typescript
// Generate the TwiML we will need once the call is connected. Note, this could be done
// in two steps via the server, where we set a url: instead of twiml:, but this just
// seemed overly complicated.
const response = new twilio.twiml.VoiceResponse();
const connect = response.connect();
const conversationRelay = connect.conversationRelay({
    url: `wss://${serverBaseUrl}/conversation-relay`,
    transcriptionProvider: "deepgram",
    speechModel: "nova-3-general",
    interruptible: "any",
    ttsProvider: "Elevenlabs",
    voice: "Charlie-flash_v2_5",
    dtmfDetection: true,
} as any);

conversationRelay.parameter({
    name: 'callReference',
    value: callReference
});
```

Prerequisites:
- Node.js v18
- pnpm
- ngrok
- TypeScript
```
.
├── server/ # WebSocket server for conversation relay
│ ├── .env.example # Example environment configuration
│ ├── package.json # Server dependencies and scripts
│ ├── tsconfig.json # TypeScript configuration
│ ├── server.ts # Main server implementation
│ ├── assets/ # Configuration assets
│ │ ├── defaultContext.md # Default GPT conversation context
│ │ ├── defaultToolManifest.json # Default available tools configuration
│ │ ├── MyContext.md # Specific context
│ │ └── MyToolManifest.json # Specific tools
│ ├── src/ # Source code directory
│ │ ├── server.ts # Main server implementation
│ │ ├── services/ # Core service implementations
│ │ │ ├── ConversationRelayService.ts
│ │ │ ├── DeepSeekService.ts
│ │ │ ├── OpenAIService.ts
│ │ │ ├── ResponseService.ts
│ │ │ ├── SilenceHandler.ts
│ │ │ └── TwilioService.ts
│ │ ├── tools/ # Tool implementations
│ │ │ ├── end-call.ts
│ │ │ ├── live-agent-handoff.ts
│ │ │ ├── send-dtmf.ts
│ │ │ └── send-sms.ts
│ │ └── utils/ # Utility functions
│   │       └── logger.ts
```
The server handles WebSocket connections and manages conversation relay functionality. It includes GPT service integration for natural language processing and Twilio integration for voice call handling.
- Navigate to the server directory:
```bash
cd server
```

- Install dependencies:

```bash
# Using pnpm (recommended)
pnpm install

# Or using npm
npm install
```

- For development, start the development server:

```bash
# Using pnpm
pnpm dev

# Or using npm
npm run dev
```

For production, build and start the server:

```bash
# Using pnpm
pnpm build
pnpm start

# Or using npm
npm run build
npm start
```

- Ensure the server is running on port 3001 (or the configured port in .env).

- Optionally, expose the server using ngrok:

```bash
ngrok http --domain server-yourdomain.ngrok.dev 3001
```
Silence handling works as follows:
- Initialization: Silence monitoring starts after the initial setup message, ensuring the system is ready for conversation.
- Message Tracking:
  - The system tracks the time since the last meaningful message
  - Info-type messages are intentionally ignored to prevent false resets
  - Valid messages (prompt, interrupt, dtmf) reset both the timer and retry counter
- Response Sequence:
  - After 5 seconds of silence: Sends a reminder message ("I'm sorry, I didn't catch that...")
  - Each reminder increments a retry counter
  - After 3 unsuccessful attempts: Ends the call with an "unresponsive" reason code
- Cleanup: The system properly cleans up monitoring resources when the call ends or disconnects.
The silence handling is modular and follows separation of concerns:
- The SilenceHandler class manages the logic independently
- Messages are passed back to the server via callbacks
- The server maintains control of WebSocket communication
- Thresholds are configurable through constants in server.ts
This design ensures reliable conversation flow while preventing indefinite silence periods, improving the overall user experience.
The system includes a robust silence detection mechanism to manage periods of inactivity during conversations. This functionality is implemented in the SilenceHandler class and operates based on two key thresholds:
- SILENCE_SECONDS_THRESHOLD (5 seconds): The duration of silence before triggering a reminder
- SILENCE_RETRY_THRESHOLD (3 attempts): Maximum number of reminders before ending the call
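To make the flow above concrete, here is a minimal sketch of the pattern, assuming a simple interval-based timer and an illustrative callback shape; it is not the project's actual SilenceHandler implementation.

```typescript
// Minimal sketch of the silence-monitoring pattern described above.
// Class and callback names are illustrative, not the project's actual API.
const SILENCE_SECONDS_THRESHOLD = 5;  // seconds of silence before a reminder
const SILENCE_RETRY_THRESHOLD = 3;    // reminders before ending the call

type SilenceCallback = (message: { type: 'text' | 'end'; text?: string; reason?: string }) => void;

class SilenceHandlerSketch {
    private lastMessageTime = Date.now();
    private retryCount = 0;
    private timer?: ReturnType<typeof setInterval>;

    startMonitoring(onSilence: SilenceCallback): void {
        this.timer = setInterval(() => {
            const silentSeconds = (Date.now() - this.lastMessageTime) / 1000;
            if (silentSeconds < SILENCE_SECONDS_THRESHOLD) return;

            this.retryCount += 1;
            if (this.retryCount >= SILENCE_RETRY_THRESHOLD) {
                onSilence({ type: 'end', reason: 'unresponsive' });
                this.cleanup();
            } else {
                onSilence({ type: 'text', text: "I'm sorry, I didn't catch that..." });
                this.lastMessageTime = Date.now(); // restart the window after a reminder
            }
        }, 1000);
    }

    // Info-type messages are ignored; prompt/interrupt/dtmf reset the timer and retries
    messageReceived(type: 'prompt' | 'interrupt' | 'dtmf' | 'info'): void {
        if (type === 'info') return;
        this.lastMessageTime = Date.now();
        this.retryCount = 0;
    }

    cleanup(): void {
        if (this.timer) clearInterval(this.timer);
    }
}
```

As described above, the real implementation hands these messages back to the server via a callback so the server keeps ownership of the WebSocket.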
- Configure your Twilio phone number to point to the "connectConversationRelay" endpoint:
- Go to your Twilio Console > Phone Numbers > Active Numbers
- Select your phone number
- Under "Voice & Fax" > "A Call Comes In"
- Set it to "Webhook" and enter:
https://server-yourdomain.ngrok.dev/connectConversationRelay
  - Method: HTTP POST
This endpoint will handle incoming calls and establish the WebSocket connection for conversation relay.
- When a call is received, Twilio initiates a WebSocket connection to wss://server-yourdomain.ngrok.dev/conversation-relay
- The server receives a 'setup' message containing call details (an example payload is sketched after this list):
  - Caller's phone number (from)
  - Called number (to)
  - Call SID
  - Other call metadata
  - Custom parameters (including callReference)
- The server then:
- Stores the call parameters for the session in a wsSessionsMap
- Retrieves any parameter data associated with the callReference
- Initializes the ResponseService with the specified context and tool manifest files
- Creates a ConversationRelayService instance with:
  - ResponseService for LLM interactions
  - Session data containing setup information and parameters
  - Silence handler for managing inactivity
- Sets up event listeners for WebSocket communication
- Begins processing incoming messages
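For orientation, the 'setup' message might look roughly like the following. Only from, to, the Call SID and customParameters are taken from this document; the exact field names and any additional metadata are assumptions, not the full Conversation Relay schema.

```typescript
// Rough shape of the 'setup' message as described above (not an exhaustive schema).
interface SetupMessageSketch {
    type: 'setup';
    callSid: string;                       // Call SID
    from: string;                          // caller's phone number
    to: string;                            // called number
    customParameters?: {
        callReference?: string;            // set via conversationRelay.parameter()
        [key: string]: string | undefined;
    };
    [key: string]: unknown;                // other call metadata
}
```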
- Session Management (a sketch follows this list):
  - Each WebSocket connection maintains its own isolated session
  - Sessions are stored in a wsSessionsMap keyed by Call SID
  - This enables multiple concurrent calls to be handled independently
  - Each session has its own ResponseService and ConversationRelayService instances
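A minimal sketch of that session map, with simplified session contents and helper names invented for illustration:

```typescript
// Illustrative per-call session isolation; field and function names are assumptions.
interface CallSessionSketch {
    setup: Record<string, unknown>;           // payload of the 'setup' message
    parameterData?: Record<string, unknown>;  // data looked up via callReference
}

const wsSessionsMap = new Map<string, CallSessionSketch>(); // keyed by Call SID

function registerSession(
    callSid: string,
    setup: Record<string, unknown>,
    parameterData?: Record<string, unknown>
): CallSessionSketch {
    const session: CallSessionSketch = { setup, parameterData };
    wsSessionsMap.set(callSid, session);
    return session;
}

function closeSession(callSid: string): void {
    wsSessionsMap.delete(callSid); // each call cleans up only its own state
}
```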
- WebSocket connections are synchronous by nature
- Using await in the main connection handler could cause you to miss messages
- Example of correct implementation:
```typescript
// INCORRECT - Don't do this
app.ws('/conversation-relay', async (ws, req) => {
    await someAsyncOperation(); // This could cause missed messages
    ws.on('message', (msg) => {
        // Handle message
    });
});

// CORRECT - Do this instead
app.ws('/conversation-relay', (ws, req) => {
    ws.on('message', async (msg) => {
        await someAsyncOperation(); // Safe to use await here
        // Handle message
    });
});
```

The server uses two key files to configure the GPT conversation context:
Located in server/assets/defaultContext.md (or the context file specified via LLM_CONTEXT), this file defines:
- The AI assistant's persona
- Conversation style guidelines
- Response formatting rules
- Authentication process steps
- Customer validation requirements
Key sections to configure:
- Objective - Define the AI's role and primary tasks
- Style Guardrails - Set conversation tone and behavior rules
- Response Guidelines - Specify formatting and delivery rules
- Instructions - Detail specific process steps
Located in server/assets/defaultToolManifest.json (or the manifest specified via LLM_MANIFEST), this file defines the tools available to the OpenAI service. The service implements a dynamic tool loading system where tools are loaded based on their names in the manifest. Each tool's filename in the /tools directory must exactly match its name in the manifest.
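As a rough sketch of that dynamic loading: only the name-to-filename matching rule comes from this document, while the manifest shape and helper name below are assumptions.

```typescript
// Sketch of manifest-driven tool loading: each tool name in the manifest must match
// a filename under src/tools. The manifest shape shown here is an assumption.
import path from 'path';

type ToolFunction = (args: Record<string, unknown>, toolEvent?: unknown) => unknown;

interface ManifestEntry {
    type: 'function';
    function: { name: string; description?: string; parameters?: Record<string, unknown> };
}

async function loadTools(manifest: { tools: ManifestEntry[] }): Promise<Map<string, ToolFunction>> {
    const tools = new Map<string, ToolFunction>();
    for (const entry of manifest.tools) {
        const name = entry.function.name;                        // e.g. "end-call"
        const modulePath = path.join(__dirname, 'tools', name);  // -> src/tools/end-call
        const mod = await import(modulePath);                    // dynamic import by name
        tools.set(name, mod.default as ToolFunction);
    }
    return tools;
}
```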
Available tools:
- end-call - Gracefully terminates the current call
  - Used for normal call completion or error scenarios
  - Returns a "crelay" type response that bypasses LLM processing
- live-agent-handoff - Transfers the call to a human agent
  - Required parameter: callSid
  - Returns a "crelay" type response that bypasses LLM processing
- send-dtmf - Sends DTMF tones during the call
  - Useful for automated menu navigation
  - Returns a "crelay" type response that bypasses LLM processing
- send-sms - Sends SMS messages during the call
  - Used for verification codes or follow-up information
  - Returns a "tool" type response for LLM processing, or an "error" type if sending fails
Each tool now uses the ToolEvent system to emit events and return simple responses for conversation context:
```typescript
// Tools receive a ToolEvent object for event emission
export default function (functionArguments: ToolArguments, toolEvent?: ToolEvent): ToolResponse {
    // Tool logic here

    if (toolEvent) {
        // Emit events for WebSocket transmission using the ToolEvent interface
        toolEvent.emit('crelay', {
            type: "action",
            data: actionData
        });
        toolEvent.log(`Action completed: ${JSON.stringify(actionData)}`);
    }

    // Return simple response for conversation context
    return {
        success: true,
        message: "Action completed successfully"
    };
}
```

The ResponseService creates ToolEvent objects that provide tools with controlled access to:
- Event Emission: toolEvent.emit(eventType, data) for sending events to ConversationRelayService
- Logging: toolEvent.log(message) for standard logging
- Error Logging: toolEvent.logError(message) for error reporting
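Put together, the surface a tool sees might be typed roughly as follows; this is an inferred sketch based on the description above, not the project's exact declarations.

```typescript
// Inferred shape of the ToolEvent surface and the simple tool response described above.
interface ToolEvent {
    emit(eventType: string, data: unknown): void;  // e.g. emit('crelay', { type: 'action', ... })
    log(message: string): void;                    // standard logging
    logError(message: string): void;               // error reporting
}

interface ToolResponse {
    success: boolean;
    message: string;
}
```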
The ResponseService loads these tools during initialization and makes them available for use in conversations through OpenAI's function calling feature.
Create a .env file in the server directory with the following variables:
```
PORT=3001                           # Server port number
SERVER_BASE_URL=your_server_url     # Base URL for your server (e.g., ngrok URL)
OPENAI_API_KEY=your_openai_api_key  # OpenAI API key for GPT integration
OPENAI_MODEL=gpt-4-1106-preview     # OpenAI model to use for conversations

# Dynamic Context Configuration
LLM_CONTEXT=MyContext.md            # Specify which context file to use (defaults to defaultContext.md)
LLM_MANIFEST=MyToolManifest.json    # Specify which tool manifest to use (defaults to defaultToolManifest.json)
```

These variables are used by the server for:
- Configuring the server port
- Setting the server's base URL for Twilio integration
- Authenticating with OpenAI's API
- Specifying the OpenAI model for conversations
- Loading specific context and tool configurations
The system supports dynamic context loading through environment variables, allowing different conversation contexts and tool configurations based on your needs. This feature enables the system to adapt its behavior and capabilities for different use cases.
The dynamic context system is organized in the server/assets directory with multiple context and tool manifest files:
- defaultContext.md and defaultToolManifest.json - Used when no specific context is configured
- MyContext.md and MyToolManifest.json - Specialized context and tools for Bill of Quantities calls
To use a specific context:
- Add the context and tool manifest files to the server/assets directory
- Configure the environment variables in your .env file:

```
LLM_CONTEXT=YourContext.md
LLM_MANIFEST=YourToolManifest.json
```

If these variables are not set, the system defaults to:
- defaultContext.md
- defaultToolManifest.json
This approach allows you to:
- Support multiple use cases with different requirements
- Maintain separation of concerns between different contexts
- Easily add new contexts and tool sets
- Switch contexts by updating environment variables
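A minimal sketch of how such env-driven loading can work, assuming the assets live under server/assets (the path handling here is illustrative):

```typescript
// Sketch of selecting context and tool manifest files from environment variables,
// falling back to the documented defaults. Paths are assumptions for illustration.
import { readFileSync } from 'fs';
import path from 'path';

const ASSETS_DIR = path.join(__dirname, '..', 'assets');

const contextFile = process.env.LLM_CONTEXT ?? 'defaultContext.md';
const manifestFile = process.env.LLM_MANIFEST ?? 'defaultToolManifest.json';

const context = readFileSync(path.join(ASSETS_DIR, contextFile), 'utf-8');
const toolManifest = JSON.parse(readFileSync(path.join(ASSETS_DIR, manifestFile), 'utf-8'));
```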
To deploy the server to Fly.io, follow these steps:
- Navigate to the server directory:
```bash
cd server
```

- For new deployments, use the --no-deploy option to create a new Fly.io app without deploying it immediately. This allows you to configure your app before the first deployment:

```bash
fly launch --no-deploy
```

Note: Make sure to update your SERVER_BASE_URL in the .env file to use your Fly.io app's hostname without the "https://" prefix.

- Ensure your fly.toml file has the correct port configuration:

```toml
[http]
internal_port = 3001 # Make sure this matches your application port
```

- Add the volume mount configuration to your fly.toml file:

```toml
[mounts]
source = "assets"
destination = "/assets"
```

- Import your environment variables as secrets:

```bash
fly secrets import < .env
```

- Now deploy your application and check the logs to make sure it is up and running:

```bash
fly deploy
```

- Verify your context and manifest files are in the mount by logging into the machine:

```bash
fly ssh console
cd assets
ls
```

This will show your context and manifest files in the mounted volume.

- Finally, check that the server is reachable by opening the SERVER_BASE_URL you set above (your Fly.io app's hostname) in a browser. You should get "WebSocket Server Running" as a response.
The system supports initiating outbound calls via an API endpoint. This allows external systems to trigger calls that connect to the Conversation Relay service.
POST /outboundCall
```typescript
interface RequestData {
    properties: {
        phoneNumber: string;      // [REQUIRED] Destination phone number in E.164 format
        callReference?: string;   // [OPTIONAL] Unique reference to associate with the call
        firstname?: string;       // [OPTIONAL] Additional parameter data
        lastname?: string;        // [OPTIONAL] Additional parameter data
        [key: string]: any;       // Other optional parameters
    }
}
```

Example request:

```bash
curl -X POST \
'https://server-yourdomain.ngrok.dev/outboundCall' \
--header 'Content-Type: application/json' \
--data-raw '{
"properties": {
"phoneNumber": "+1234567890",
"callReference": "abc123",
"firstname": "Bob",
"lastname": "Jones"
}
}'
```

The system uses a reference mechanism to maintain context and pass parameters throughout the call lifecycle:
- Initial Storage: When the outbound call endpoint is hit, all provided parameter data is stored in a parameterDataMap using the reference as the key: parameterDataMap.set(requestData.callReference, { requestData });
- Conversation Relay Parameter: The reference is passed to the Conversation Relay service as a parameter: conversationRelay.parameter({ name: 'callReference', value: callReference });
- WebSocket Session: When the Conversation Relay establishes the WebSocket connection:
  - The initial setup message contains the reference in customParameters
  - The server retrieves the stored parameter data using this reference
  - The parameter data is attached to the session for use throughout the call
This mechanism allows you to:
- Pass arbitrary parameters to the call session without size limitations
- Access all parameter data throughout the call lifecycle
- Maintain session-specific parameter storage
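The hand-off described above can be sketched as follows; the map name comes from the snippets above, while the wrapper functions are invented for illustration.

```typescript
// Sketch of the callReference hand-off between /outboundCall and the WebSocket session.
const parameterDataMap = new Map<string, Record<string, unknown>>();

// 1. /outboundCall stores the request data under the supplied reference
function storeOutboundParameters(callReference: string, requestData: Record<string, unknown>): void {
    parameterDataMap.set(callReference, { requestData });
}

// 2. The later 'setup' message echoes the reference back in customParameters,
//    so the session can retrieve the stored data
function resolveParameters(customParameters: { callReference?: string }): Record<string, unknown> | undefined {
    return customParameters.callReference
        ? parameterDataMap.get(customParameters.callReference)
        : undefined;
}
```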
- The endpoint stores the provided parameter data in a session map using the reference as the key
- It then initiates an outbound call via Twilio using the provided phone number
- The call is connected to the Conversation Relay service once established
- The callReference is passed as a parameter to the Conversation Relay, allowing access to the stored parameter data during the call
Success:
```json
{
  "success": true,
  "response": "CA1234..." // Twilio Call SID
}
```

Error:

```json
{
  "success": false,
  "error": "Error message"
}
```

Dependencies:
- express - Web application framework
- express-ws - WebSocket support for Express
- openai - OpenAI API client for GPT integration
- dotenv - Environment configuration
- winston - Logging framework
- uuid - Unique identifier generation
The server includes several built-in tools for call management, each implementing the new ToolEvent-based system:
- end-call - Gracefully terminates the current call
  - Used for normal call completion or error scenarios
  - Uses toolEvent.emit('crelay', endCallData) to send termination events directly to the WebSocket
- live-agent-handoff - Transfers the call to a human agent
  - Handles escalation scenarios
  - Uses toolEvent.emit('crelay', handoffData) to send handoff events directly to the WebSocket
- send-dtmf - Sends DTMF tones during the call
  - Useful for automated menu navigation
  - Uses toolEvent.emit('crelay', dtmfData) to send DTMF events directly to the WebSocket
- send-sms - Sends SMS messages during the call
  - Used for verification codes or follow-up information
  - Returns a simple response for LLM conversation context
  - Uses standard logging through the ToolEvent interface
The system implements a sophisticated tool handling mechanism that categorizes tool responses by type to determine appropriate processing:
Tools now return a structured response with toolType and toolData properties:
```typescript
interface ToolResult {
    toolType: string;
    toolData: any;
}
```

The system supports four distinct tool types:
- tool - Standard tools that return results to be consumed by the LLM:
  - Results are added to conversation history
  - LLM generates a response based on the tool result
  - Example: send-sms returns a confirmation message
- crelay - Conversation Relay specific tools:
  - Results are emitted directly to the WebSocket
  - Bypasses LLM processing
  - Used for direct control actions like send-dtmf and end-call
  - ConversationRelayService listens for these events and forwards them
- error - Error handling responses:
  - Error messages are added to conversation history as system messages
  - LLM can acknowledge and respond to the error
  - Provides graceful error handling in conversations
- llm - LLM controller responses (not currently implemented):
  - Would allow tools to modify LLM behavior
  - Reserved for future expansion
The ResponseService processes tool results based on their type using the OpenAI Responses API architecture:
```typescript
switch (toolResult.toolType) {
case "tool":
// Add function call to input messages
this.inputMessages.push({
type: 'function_call',
id: currentToolCall.id,
call_id: currentToolCall.call_id,
name: currentToolCall.name,
arguments: currentToolCall.arguments
});
// Add function result to input messages
this.inputMessages.push({
type: 'function_call_output',
call_id: currentToolCall.call_id,
output: JSON.stringify(toolResult.toolData)
});
// Create follow-up response with tool results
const followUpStream = await this.openai.responses.create({
model: this.model,
input: this.inputMessages,
tools: this.toolDefinitions.length > 0 ? this.toolDefinitions : undefined,
previous_response_id: this.currentResponseId,
stream: true,
store: true
});
break;
case "crelay":
// Emit directly to ConversationRelayService
this.emit('responseService.toolResult', toolResult);
break;
case "error":
// Log error - API will handle the error context
logError('ResponseService', `Tool error: ${toolResult.toolData}`);
break;
}
```

The ConversationRelayService listens for tool results and processes them:

```typescript
this.responseService.on('responseService.toolResult', (toolResult: ToolResult) => {
// Check if the tool result is for the conversation relay
if (toolResult.toolType === "crelay") {
// Send the tool result to the WebSocket server
this.emit('conversationRelay.outgoingMessage', toolResult.toolData);
}
});
```

This architecture enables:
- Clear separation between conversation flow and direct actions
- Proper handling of Conversation Relay specific commands
- Flexible error handling within the conversation
- Future extensibility for new tool types
The server is organized into modular services:
- ConversationRelayService - Manages the core conversation flow
  - Handles WebSocket communication
  - Coordinates between different services
- OpenAIService - Manages GPT integration
  - Handles prompt construction and response processing
  - Implements retry logic and error handling
- SilenceHandler - Manages silence detection and response
  - Implements configurable thresholds
  - Handles conversation flow control
- TwilioService - Manages Twilio-specific functionality
  - Handles call control operations
  - Implements SMS and DTMF features
The ResponseService supports interrupting ongoing AI responses to enable natural conversation flow. This reference implementation uses a simple boolean flag approach (this.isInterrupted) for simplicity and ease of understanding. However, for production systems, an AbortController-based approach offers significant advantages.
The current implementation uses a simple boolean flag to manage interrupts:
```typescript
class ResponseService extends EventEmitter {
    protected isInterrupted: boolean;

    interrupt(): void {
        this.isInterrupted = true;
    }

    resetInterrupt(): void {
        this.isInterrupted = false;
    }

    private async processStream(stream: any): Promise<void> {
        for await (const event of stream) {
            if (this.isInterrupted) {
                break; // Exit the loop when interrupted
            }
            // Process events...
        }
    }
}
```

Advantages of Boolean Flag Approach:
- Simplicity: Easy to understand and implement
- Minimal Dependencies: No additional APIs or concepts required
- Educational Value: Clear demonstration of interrupt logic
- Low Overhead: Minimal performance impact
- Synchronous: No async complications in interrupt handling
Limitations of Boolean Flag Approach:
- Manual Management: Requires explicit reset before each operation
- No Native Integration: Cannot integrate with native APIs that support AbortSignal
- Resource Cleanup: No automatic cleanup of related resources
- Limited Scope: Only works within the application's polling loop
- Race Conditions: Potential timing issues in concurrent scenarios
The AbortController approach provides a more robust and standards-compliant interrupt mechanism:
```typescript
class ResponseService extends EventEmitter {
    protected abortController: AbortController | null;

    interrupt(): void {
        if (this.abortController) {
            this.abortController.abort();
        }
    }

    private async processStream(stream: any): Promise<void> {
        // Create new AbortController for this operation
        this.abortController = new AbortController();

        // Pass abort signal to OpenAI API
        const followUpStream = await this.openai.responses.create(
            completionParams,
            {
                signal: this.abortController.signal // Native API integration
            }
        );

        // No manual checking required - API handles abortion internally
        for await (const event of followUpStream) {
            // Events automatically stop when aborted
            // Process events...
        }
    }
}
```

Advantages of AbortController Approach:
- Native API Integration: Directly supported by fetch(), OpenAI client, and other modern APIs
- Automatic Resource Cleanup: APIs handle cleanup when signal is aborted
- Standards Compliant: Web standard supported across browsers and Node.js
- Better Performance: No manual polling required - APIs check signal internally
- Immediate Cancellation: Operations can stop immediately rather than waiting for next loop iteration
- Built-in Race Condition Protection: AbortSignal state is managed atomically
- Event-Based: Can listen for abort events for custom cleanup logic
Implementation Differences:
| Aspect | Boolean Flag | AbortController |
|---|---|---|
| API Integration | Manual checking in loops | Native support in fetch/HTTP APIs |
| Resource Cleanup | Manual cleanup required | Automatic when signal aborted |
| Performance | Polling overhead | Event-driven, no polling |
| Timing | Checked at loop iterations | Immediate cancellation possible |
| Standards | Custom implementation | Web/Node.js standard |
| Error Handling | Manual error states | Built-in AbortError handling |
Boolean Flag:
- Minimal CPU overhead for simple boolean checks
- Requires polling on each iteration of processing loops
- Cannot interrupt long-running operations between checks
- Memory usage: single boolean per service instance
AbortController:
- Higher initial overhead (object creation, event system)
- More efficient for long-running operations (no polling)
- Can interrupt operations immediately when APIs support it
- Memory usage: AbortController object + potential listeners
Use Boolean Flag When:
- Building educational/reference implementations
- System simplicity is the primary goal
- Working with APIs that don't support AbortSignal
- Performance overhead of AbortController is a concern
- Team is unfamiliar with AbortController concepts
Use AbortController When:
- Building production systems with reliability requirements
- Integrating with modern APIs that support AbortSignal
- Need immediate cancellation of network operations
- Want standards-compliant interrupt handling
- Require automatic resource cleanup
- Building systems that may scale to handle many concurrent operations
To upgrade from the boolean flag to AbortController approach:
- Replace the boolean flag with an AbortController:
  // Replace: protected isInterrupted: boolean;
  protected abortController: AbortController | null;
- Update the interrupt method:
  interrupt(): void { if (this.abortController) { this.abortController.abort(); } }
- Remove manual reset calls:
  // Remove: this.resetInterrupt(); // AbortController creates a new instance for each operation
- Add the abort signal to API calls:
  const stream = await this.openai.responses.create(params, { signal: this.abortController.signal });
- Remove manual interrupt checks:
  // Remove: if (this.isInterrupted) break; // API handles interruption automatically
This reference implementation uses the boolean flag approach to maintain simplicity and educational clarity. The concept of interrupting AI responses is more important than the specific implementation mechanism. For production systems requiring robust interrupt handling, consider migrating to the AbortController approach to leverage native API support, automatic resource cleanup, and improved performance characteristics.