twilio-calls

Minimal backend for a turn-based translated PSTN call POC.

This service keeps the call logic on the backend:

the mobile app captures microphone input and shows the translated transcript UI
this backend controls the Twilio PSTN call
this backend translates the text sent by the app
this backend can synthesize phone-side speech with Azure TTS and feed it to Twilio <Play>
this backend downloads the phone-side recording from Twilio
this backend runs Azure STT
this backend translates the recognized phone reply back to the app language
the app polls session state and plays back the already translated reply

The code is split into focused modules for config, logging, Twilio integration, TTS asset serving, session storage, and direct-conversation orchestration. This makes it easier to move to Redis or a more production-grade deployment later.

This is not a full real-time duplex interpreter. It is:

text in -> translated PSTN speech out -> recording in -> translated text out
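
As a rough illustration of that pipeline, one turn can be written as a single async function. This is only a sketch, not the code in this repository: runTurn and the injected functions (translate, speakToCallee, recordReply, transcribe) are hypothetical names, chosen so the example does not depend on any Twilio or Azure SDK details.

```javascript
// One conversational turn. Every provider call is injected, so this sketch
// makes no assumptions about the real Twilio or Azure integrations.
// All names here are hypothetical.
async function runTurn(sourceText, deps) {
  const { translate, speakToCallee, recordReply, transcribe } = deps;

  // text in -> translated speech out on the PSTN leg
  const spokenText = await translate(sourceText, "toCallee");
  await speakToCallee(spokenText);

  // recording in -> recognized text -> translated text out
  const recording = await recordReply();
  const recognizedText = await transcribe(recording);
  const translatedReply = await translate(recognizedText, "toApp");

  return { sourceText, spokenText, recognizedText, translatedReply };
}
```

The steps run strictly in sequence, which is why the call is turn-based rather than duplex.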

Endpoints

GET /health
GET /provider-status
GET /media/tts/:assetName
POST /api/direct-call-session
GET /api/sessions/:id
POST /api/sessions/:id/direct-speak

Request Flow

App starts a session with POST /api/direct-call-session.
Backend creates an outbound Twilio PSTN call with inline TwiML.
App sends sourceText with POST /api/sessions/:id/direct-speak.
Backend translates the text.
Backend prefers Azure TTS -> Twilio <Play> when it has a public HTTPS base URL, and falls back to Twilio <Say> if TTS asset generation is unavailable.
Backend records the callee response, downloads the new recording, runs Azure STT, translates the recognized phone-side text, and stores both texts in session state.
App polls GET /api/sessions/:id and plays back the already translated reply.
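
The <Play>-or-<Say> decision in the flow above could look roughly like the following. This is a minimal sketch under assumptions: buildSpeakTwiml, escapeXml, and the parameter names are hypothetical and are not taken from this repository. Only the TwiML verbs <Play> and <Say> (with its language attribute) are Twilio's. The <Record> step that follows in the real flow is left out.

```javascript
// Hypothetical helper: escape a value for safe inclusion in XML.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Returns TwiML that plays a pre-rendered TTS asset when a public HTTPS URL
// is available, and otherwise asks Twilio to speak the text with <Say>.
function buildSpeakTwiml({ ttsAssetUrl, translatedText, language }) {
  const canPlay =
    typeof ttsAssetUrl === "string" && ttsAssetUrl.startsWith("https://");
  const verb = canPlay
    ? `<Play>${escapeXml(ttsAssetUrl)}</Play>`
    : `<Say language="${escapeXml(language)}">${escapeXml(translatedText)}</Say>`;
  return `<?xml version="1.0" encoding="UTF-8"?><Response>${verb}</Response>`;
}
```

Escaping matters mostly on the <Say> path, where translated text may contain characters such as & or <.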

Run

cp .env.example .env
npm install
npm start

Environment variables are loaded with dotenv from:

.env
.env.local with override enabled, so its values replace any that are already set

Render

This service can be deployed as a Render web service.

Render gives the app a public onrender.com URL and injects RENDER_EXTERNAL_URL. The backend uses that value as the default PUBLIC_BASE_URL, which lets Twilio fetch audio generated by Azure TTS from:

https://<your-service>.onrender.com/media/tts/<asset>.mp3
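
The base URL lookup described above might be sketched as follows. The function names resolvePublicBaseUrl and ttsAssetUrl are hypothetical. So is the exact precedence: an explicit PUBLIC_BASE_URL winning over RENDER_EXTERNAL_URL is inferred from the word "default" above. The HTTPS check is likewise an assumption about how the backend decides whether <Play> is usable.

```javascript
// Resolve the public base URL. An explicit PUBLIC_BASE_URL wins; otherwise
// fall back to the URL Render injects. Returns null when neither value is
// an HTTPS URL (assumed requirement for serving audio to Twilio).
function resolvePublicBaseUrl(env) {
  const raw = env.PUBLIC_BASE_URL || env.RENDER_EXTERNAL_URL || "";
  const trimmed = raw.trim().replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.startsWith("https://") ? trimmed : null;
}

// Build the URL Twilio would fetch for a given TTS asset file name.
function ttsAssetUrl(baseUrl, assetName) {
  return `${baseUrl}/media/tts/${encodeURIComponent(assetName)}`;
}

const base = resolvePublicBaseUrl(process.env);
console.log(
  base
    ? ttsAssetUrl(base, "example.mp3")
    : "No public HTTPS base URL configured"
);
```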

Recommended Render setup:

Service type: Web Service
Build command: npm install
Start command: npm start
Health check path: /health
Environment variables: set all TWILIO_* and AZURE_* values in the Render dashboard

Example

Start a call:

curl -X POST http://localhost:8787/api/direct-call-session \
  -H 'Content-Type: application/json' \
  -d '{
    "to": "+15551234567",
    "sourceLanguage": "uk-UA",
    "targetLanguage": "en-US",
    "notes": "poc"
  }'

Send a phrase into the live call:

curl -X POST http://localhost:8787/api/sessions/<session-id>/direct-speak \
  -H 'Content-Type: application/json' \
  -d '{
    "sourceText": "Привіт, як справи?"
  }'

Fetch session state:

curl http://localhost:8787/api/sessions/<session-id>
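
On the app side, the same polling can be scripted. The sketch below assumes Node 18+ (for the global fetch) and leaves the readiness check to the caller, because the session JSON shape is not documented here. pollSession and its options are hypothetical names.

```javascript
// Poll GET /api/sessions/:id until isReady(session) returns true or the
// attempts run out. fetchImpl can be swapped for a stub in tests.
async function pollSession(baseUrl, sessionId, isReady, options = {}) {
  const { intervalMs = 1000, maxAttempts = 30, fetchImpl = fetch } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const res = await fetchImpl(
      `${baseUrl}/api/sessions/${encodeURIComponent(sessionId)}`
    );
    if (!res.ok) {
      throw new Error(`Session fetch failed with HTTP ${res.status}`);
    }
    const session = await res.json();
    if (isReady(session)) return session;

    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for session state");
}
```

A call might look like pollSession("http://localhost:8787", "<session-id>", (s) => Boolean(s.translatedReply)), where translatedReply is a placeholder for whatever field the session response actually uses.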

Limitations

session storage is in-memory
no horizontal scaling yet
no Twilio webhook signature validation yet
no streaming of app audio to the server (the app sends text only)
PSTN side is turn-based because it uses Twilio <Record>
TTS asset files are ephemeral and best suited to short-lived POC hosting
