maksympron/tranlsation-call

twilio-calls

Minimal backend for a turn-based translated PSTN call POC.

This service keeps the call logic on the backend:

  • the mobile app captures microphone input and shows the translated transcript UI
  • this backend controls the Twilio PSTN call
  • this backend translates app text on the server
  • this backend can synthesize phone-side speech with Azure TTS and feed it to Twilio <Play>
  • this backend downloads the phone-side recording from Twilio
  • this backend runs Azure STT
  • this backend translates the recognized phone reply back to the app language
  • the app polls session state and plays back the already translated reply

The code is split into focused modules: config, logging, Twilio integration, TTS asset serving, session storage, and direct-conversation orchestration. This separation should make it easier to move to Redis or a more production-grade deployment later.

This is not a full real-time duplex interpreter. It is:

text-in -> translated-PSTN-out -> recording-in -> translated-text-out

Endpoints

  • GET /health
  • GET /provider-status
  • GET /media/tts/:assetName
  • POST /api/direct-call-session
  • GET /api/sessions/:id
  • POST /api/sessions/:id/direct-speak

Request Flow

  1. App starts a session with POST /api/direct-call-session.
  2. Backend creates an outbound Twilio PSTN call with inline TwiML.
  3. App sends sourceText with POST /api/sessions/:id/direct-speak.
  4. Backend translates the text.
  5. Backend prefers Azure TTS audio played through Twilio <Play> when it has a public HTTPS base URL, and falls back to Twilio <Say> if TTS asset generation is unavailable.
  6. Backend records the callee response, downloads the new recording, runs Azure STT, translates the recognized phone-side text, and stores both texts in session state.
  7. App polls GET /api/sessions/:id and plays back the already translated reply.
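
The <Play> versus <Say> choice in step 5 could be sketched roughly as below. This is an illustrative sketch, not the repository's actual code: `buildTurnTwiml`, its parameter names, and the `maxLength` value are assumptions made for the example. Only the TwiML verbs (<Play>, <Say>, <Record>) come from the flow above.

```javascript
// Escape text so it is safe inside XML element content and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds the TwiML for one conversation turn.
// ttsUrl is null when no public HTTPS base URL or TTS asset is available.
function buildTurnTwiml({ translatedText, targetLanguage, ttsUrl, recordAction }) {
  const speech = ttsUrl
    ? `<Play>${escapeXml(ttsUrl)}</Play>` // preferred: pre-synthesized TTS audio
    : `<Say language="${escapeXml(targetLanguage)}">${escapeXml(translatedText)}</Say>`; // fallback
  // After speaking, record the callee's reply; maxLength is an assumed value.
  const record = `<Record action="${escapeXml(recordAction)}" maxLength="30" playBeep="false"/>`;
  return `<?xml version="1.0" encoding="UTF-8"?><Response>${speech}${record}</Response>`;
}
```

Passing `ttsUrl: null` yields the <Say> variant, so the caller only needs to decide whether a TTS asset URL exists.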

Run

cp .env.example .env
npm install
npm start

Environment variables are loaded with dotenv from:

  1. .env
  2. .env.local with override enabled
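
The load order above implies a precedence that can be surprising. By default, dotenv does not overwrite variables that are already set in the process environment, while a load with override enabled does. The sketch below emulates that layering with plain functions so the result is easy to reason about. `parseEnv` and `layerEnv` are hypothetical helpers written for this example, and the parser handles only simple KEY=VALUE lines.

```javascript
// Parse simple KEY=VALUE lines; blank lines and # comments are skipped.
function parseEnv(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

// Emulate the two-step load: .env fills gaps only, .env.local overrides.
function layerEnv(processEnv, dotEnvText, dotEnvLocalText) {
  const env = { ...processEnv };
  for (const [key, value] of Object.entries(parseEnv(dotEnvText))) {
    if (!(key in env)) env[key] = value; // existing variables win over .env
  }
  Object.assign(env, parseEnv(dotEnvLocalText)); // .env.local wins over everything
  return env;
}
```

Under these assumptions, a value in .env.local replaces both the .env value and one exported in the shell, which is worth keeping in mind when debugging local configuration.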

Render

This service is ready for a Render web service deployment.

Render gives the app a public onrender.com URL and injects RENDER_EXTERNAL_URL. The backend uses that value as the default PUBLIC_BASE_URL, which lets Twilio fetch audio generated by Azure TTS from:

https://<your-service>.onrender.com/media/tts/<asset>.mp3
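
A minimal sketch of how that default could be resolved, assuming an explicit PUBLIC_BASE_URL takes priority over RENDER_EXTERNAL_URL. The function names are hypothetical, and the HTTPS check reflects the request flow's requirement of a public HTTPS base URL rather than the repository's actual validation.

```javascript
// Hypothetical helper: pick the public base URL, or null if none is usable.
function resolvePublicBaseUrl(env) {
  const raw = env.PUBLIC_BASE_URL || env.RENDER_EXTERNAL_URL || "";
  const trimmed = raw.trim().replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.startsWith("https://") ? trimmed : null;
}

// Build the URL that Twilio would fetch for a generated TTS asset.
function ttsAssetUrl(baseUrl, assetName) {
  return `${baseUrl}/media/tts/${encodeURIComponent(assetName)}`;
}
```

When `resolvePublicBaseUrl` returns null, a caller would skip TTS asset generation and use the Twilio <Say> fallback instead.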

Recommended Render setup:

  • Service type: Web Service
  • Build command: npm install
  • Start command: npm start
  • Health check path: /health
  • Environment variables: set all TWILIO_* and AZURE_* values in the Render dashboard

Example

Start a call:

curl -X POST http://localhost:8787/api/direct-call-session \
  -H 'Content-Type: application/json' \
  -d '{
    "to": "+15551234567",
    "sourceLanguage": "uk-UA",
    "targetLanguage": "en-US",
    "notes": "poc"
  }'

Send a phrase into the live call:

curl -X POST http://localhost:8787/api/sessions/<session-id>/direct-speak \
  -H 'Content-Type: application/json' \
  -d '{
    "sourceText": "Привіт, як справи?"
  }'

Fetch session state:

curl http://localhost:8787/api/sessions/<session-id>
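
On the app side, the polling step could look roughly like the sketch below. It assumes a runtime with a global `fetch` (or an injected `fetchImpl`). The session response shape is not documented here, so the caller supplies an `isReady` predicate. `pollSession`, its options, and any field names used with it are hypothetical.

```javascript
// Poll a session until isReady(session) returns true or attempts run out.
// Returns the ready session object, or null if it never became ready.
async function pollSession(baseUrl, sessionId, isReady, options = {}) {
  const { intervalMs = 1000, maxAttempts = 30, fetchImpl = fetch } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(`${baseUrl}/api/sessions/${encodeURIComponent(sessionId)}`);
    if (!res.ok) throw new Error(`Session request failed with status ${res.status}`);
    const session = await res.json();
    if (isReady(session)) return session;
    // Wait before the next attempt, but not after the last one.
    if (attempt < maxAttempts) await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null;
}
```

A caller might pass a predicate that checks whichever field carries the translated reply, then play it back once `pollSession` resolves.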

Limitations

  • session storage is in-memory
  • no horizontal scaling yet
  • no Twilio webhook signature validation yet
  • no server-side app audio streaming
  • PSTN side is turn-based because it uses Twilio <Record>
  • TTS asset files are ephemeral and best suited to short-lived POC hosting
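
For the missing webhook signature validation, the check Twilio documents is an HMAC-SHA1 over the full request URL followed by the POST parameters sorted by name, with each name and value appended, Base64-encoded and compared with the X-Twilio-Signature header. The sketch below illustrates that scheme with Node's built-in crypto module. The function names are hypothetical and this is not code from this repository. The official Twilio helper library ships its own request validator, which would normally be the better choice.

```javascript
const crypto = require("node:crypto");

// Compute the expected signature: URL + sorted key/value pairs, HMAC-SHA1, Base64.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac("sha1", authToken).update(Buffer.from(data, "utf-8")).digest("base64");
}

// Compare against the header value without leaking timing information.
function isValidTwilioSignature(authToken, url, params, headerSignature) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(headerSignature || ""));
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

The URL passed in must match what Twilio requested, including scheme and query string, which matters behind a proxy such as Render's.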
