Minimal backend for a turn-based translated PSTN call POC.
This service keeps the call logic on the backend:
- the mobile app captures microphone input and shows the translated transcript UI
- this backend controls the Twilio PSTN call
- this backend translates app text on the server
- this backend can synthesize phone-side speech with Azure TTS and feed it to Twilio <Play>
- this backend downloads the phone-side recording from Twilio
- this backend runs Azure STT
- this backend translates the recognized phone reply back to the app language
- the app polls session state and plays back the already translated reply
The code is split into focused modules for config, logging, Twilio integration, TTS asset serving, session storage, and direct-conversation orchestration so it is easier to evolve toward Redis or a more production-grade deployment later.
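To illustrate why a narrow session-storage module makes a later move to Redis easier, here is a minimal sketch of an in-memory store. The function name `createMemorySessionStore` and the field names are hypothetical and not taken from this codebase; the point is only that callers depend on `create`, `get`, and `update`, not on the `Map` behind them.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory session store; all names and fields are illustrative.
function createMemorySessionStore() {
  const sessions = new Map();

  return {
    // Create a session with a generated id and timestamps.
    create(fields) {
      const now = new Date().toISOString();
      const session = { ...fields, id: randomUUID(), createdAt: now, updatedAt: now };
      sessions.set(session.id, session);
      return session;
    },

    // Return the session, or null when the id is unknown.
    get(id) {
      return sessions.get(id) ?? null;
    },

    // Shallow-merge a patch into an existing session; the id cannot be changed.
    update(id, patch) {
      const current = sessions.get(id);
      if (!current) return null;
      const next = { ...current, ...patch, id, updatedAt: new Date().toISOString() };
      sessions.set(id, next);
      return next;
    },
  };
}
```

A Redis-backed variant could expose the same three methods (most likely as async functions), so the orchestration code would not need to know which store it is talking to.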
This is not a full real-time duplex interpreter. It is:
text-in -> translated-PSTN out -> recording-in -> translated-text-out
- GET /health
- GET /provider-status
- GET /media/tts/:assetName
- POST /api/direct-call-session
- GET /api/sessions/:id
- POST /api/sessions/:id/direct-speak
- App starts a session with POST /api/direct-call-session.
- Backend creates an outbound Twilio PSTN call with inline TwiML.
- App sends sourceText with POST /api/sessions/:id/direct-speak.
- Backend translates the text.
- Backend prefers Azure TTS -> Twilio <Play> when it has a public HTTPS base URL, and falls back to Twilio <Say> if TTS asset generation is unavailable.
- Backend records the callee response, downloads the new recording, runs Azure STT, translates the recognized phone-side text, and stores both texts in session state.
- App polls GET /api/sessions/:id and plays back the already translated reply.
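The <Play> versus <Say> fallback in the flow above can be sketched as a small TwiML builder. This is an assumption-laden illustration, not the service's actual code: the function name `buildTurnTwiml`, its parameters, and the `maxLength` value are hypothetical. Only the TwiML verbs and attributes themselves (`<Play>`, `<Say language>`, `<Record maxLength playBeep>`) come from Twilio's markup.

```javascript
// Escape text for safe use in XML element content and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: speak one translated turn, then record the callee's reply.
// Uses <Play> when a TTS asset URL is available, otherwise falls back to <Say>.
function buildTurnTwiml({ translatedText, language, ttsAssetUrl }) {
  const speech = ttsAssetUrl
    ? `<Play>${escapeXml(ttsAssetUrl)}</Play>`
    : `<Say language="${escapeXml(language)}">${escapeXml(translatedText)}</Say>`;

  // maxLength of 30 seconds is an assumed value for a short spoken reply.
  const record = '<Record maxLength="30" playBeep="false"/>';

  return `<?xml version="1.0" encoding="UTF-8"?><Response>${speech}${record}</Response>`;
}
```

Passing `ttsAssetUrl: null` exercises the <Say> path, which is what the backend would need when no public HTTPS base URL is configured or TTS generation fails.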
cp .env.example .env
npm install
npm start
Environment variables are loaded with dotenv from:
- .env
- .env.local with override enabled
This service is ready for a Render web service deployment.
Render gives the app a public onrender.com URL and injects RENDER_EXTERNAL_URL. The backend uses that value as the default PUBLIC_BASE_URL, which lets Twilio fetch audio generated by Azure TTS from:
https://<your-service>.onrender.com/media/tts/<asset>.mp3
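A minimal sketch of that defaulting rule, assuming an explicit PUBLIC_BASE_URL takes precedence over RENDER_EXTERNAL_URL and that only HTTPS URLs are usable for Twilio playback. The function names `resolvePublicBaseUrl` and `buildTtsAssetUrl` are hypothetical; the environment variable names and the /media/tts/ path are the ones described above.

```javascript
// Hypothetical helper: pick the public base URL, or null if none is usable.
function resolvePublicBaseUrl(env) {
  const raw = (env.PUBLIC_BASE_URL || env.RENDER_EXTERNAL_URL || '').trim();
  if (!raw) return null;

  try {
    // Twilio needs a publicly reachable HTTPS URL to fetch <Play> audio.
    if (new URL(raw).protocol !== 'https:') return null;
  } catch {
    return null; // Not a parseable absolute URL.
  }

  // Drop trailing slashes so path joining stays predictable.
  return raw.replace(/\/+$/, '');
}

// Hypothetical helper: build the URL Twilio would fetch for one TTS asset.
function buildTtsAssetUrl(baseUrl, assetName) {
  return `${baseUrl}/media/tts/${encodeURIComponent(assetName)}`;
}
```

For example, `resolvePublicBaseUrl(process.env)` returning `null` would be the signal to fall back to Twilio <Say> instead of <Play>.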
Recommended Render setup:
- Instance type: Web Service
- Build command: npm install
- Start command: npm start
- Health check path: /health
- Environment variables: set all TWILIO_* and AZURE_* values in the Render dashboard
Start a call:
curl -X POST http://localhost:8787/api/direct-call-session \
-H 'Content-Type: application/json' \
-d '{
"to": "+15551234567",
"sourceLanguage": "uk-UA",
"targetLanguage": "en-US",
"notes": "poc"
}'
Send a phrase into the live call:
curl -X POST http://localhost:8787/api/sessions/<session-id>/direct-speak \
-H 'Content-Type: application/json' \
-d '{
"sourceText": "Привіт, як справи?"
}'
Fetch session state:
curl http://localhost:8787/api/sessions/<session-id>
Known limitations:
- session storage is in-memory
- no horizontal scaling yet
- no Twilio webhook signature validation yet
- no server-side app audio streaming
- PSTN side is turn-based because it uses Twilio <Record>
- TTS asset files are ephemeral and best suited to short-lived POC hosting
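For the missing webhook signature validation, the following is a sketch of Twilio's documented scheme for form-encoded requests: append the sorted POST parameters (key immediately followed by value) to the full request URL, compute HMAC-SHA1 with the account auth token, Base64-encode it, and compare against the X-Twilio-Signature header. The function names are hypothetical and this is not part of the current service. The official Twilio helper library also ships a request validator, which would normally be preferable to hand-rolled code.

```javascript
const nodeCrypto = require('node:crypto');

// Compute the expected signature for a form-encoded webhook request.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return nodeCrypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Hypothetical check: compare in constant time to avoid leaking match position.
function isValidTwilioSignature(authToken, headerSignature, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(headerSignature || ''));
  return expected.length === received.length && nodeCrypto.timingSafeEqual(expected, received);
}
```

The `url` argument must be the exact public URL Twilio called, including any query string, which on Render means the onrender.com URL rather than an internal host.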