aiOla TypeScript SDK

This repository contains the official JavaScript/TypeScript SDKs for aiOla's voice services.

TL;DR - Demo

Want to try out the playground? Just clone and run:

npm run install:all
npm run build
npm run serve

Then open the example you want in your browser:

For STT examples: http://localhost:3000/examples/stt
For TTS examples: http://localhost:3000/examples/tts

STT example code | TTS example code

SDKs

aiola-tts: Text-to-Speech SDK
aiola-stt: Speech-to-Text SDK

Installation

npm i @aiola/web-sdk-stt
npm i @aiola/web-sdk-tts

Quick Examples

Text-to-Speech

import AiolaTTSClient from "@aiola/web-sdk-tts";

const client = new AiolaTTSClient({
  baseUrl: "https://api.aiola.ai",
  bearer: "<API-KEY>",
});

// List the available voices, then synthesize speech with one of them
const voices = client.getVoices();
const audioBlob = await client.synthesizeSpeech(
  "Hello, world!",
  voices.Bella
);

Speech-to-Text

import {
  AiolaStreamingClient,
  AiolaSocketNamespace,
} from "@aiola/web-sdk-stt";

const client = new AiolaStreamingClient({
  baseUrl: "https://api.aiola.ai", // enterprise customers: use your custom endpoint
  namespace: AiolaSocketNamespace.EVENTS,
  bearer: "<API-KEY>",
  queryParams: {
    flow_id: "TBD", // enterprise customers: use your custom flow ID
    execution_id: "<unique-execution-id>",
    lang_code: "en_US",
    time_zone: "UTC",
  },
  events: {
    onTranscript: (data) => {
      console.log("Transcript:", data);
    },
    onEvents: (data) => {
      console.log("Event:", data);
    },
    onError: (error) => {
      console.error("Error:", error);
    },
    onStartRecord: () => {
      console.log("Recording started");
    },
    onStopRecord: () => {
      console.log("Recording stopped");
    },
  },
});

// Connect to the service
client.connect();

// Alternatively, connect and start recording automatically:
// client.connect(true);

// Set the keywords to spot
client.setKeywords(["aiola", "api", "testing"]);

// Start recording (not needed if you called connect(true))
client.startRecording();
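
If the keywords come from user input, it may help to tidy the list before passing it to `setKeywords`. The sketch below is a hypothetical helper and not part of the SDK: the name `normalizeKeywords` is chosen for illustration, and it assumes `setKeywords` accepts a plain array of strings, as in the example above.

```typescript
// Hypothetical helper (not part of the aiOla SDK): trims whitespace,
// drops empty entries, and removes exact duplicates while keeping order.
function normalizeKeywords(raw: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const entry of raw) {
    const keyword = entry.trim();
    if (keyword.length === 0 || seen.has(keyword)) {
      continue; // skip blanks and repeats
    }
    seen.add(keyword);
    result.push(keyword);
  }
  return result;
}

// Example input with stray whitespace, a duplicate, and an empty entry
const cleanedKeywords = normalizeKeywords([" aiola ", "api", "api", "", "testing"]);
console.log(cleanedKeywords);
```

With the `client` from the example above, the result could then be passed on as `client.setKeywords(cleanedKeywords)`.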

Features

| Speech-to-Text (STT) | Text-to-Speech (TTS) |
| --- | --- |
| Real-time speech transcription | Convert text to speech and save as WAV files |
| Keyword spotting | Real-time streaming of synthesized speech |
| Event-driven architecture | Multiple voice options available |
| Multiple language support (en_US, de_DE, fr_FR, zh_ZH, es_ES, pt_PT) | Support for different audio formats (LINEAR16, PCM) |
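
The language codes listed above appear to correspond to the `lang_code` field of the STT `queryParams`. As a minimal sketch, the helper below rejects unsupported codes before a client is created. `buildQueryParams`, `isSupportedLangCode`, and the `SttQueryParams` type are hypothetical names for illustration and are not exported by the SDK; the field names are taken from the Speech-to-Text example in this README.

```typescript
// Language codes listed in the Features section of this README.
const SUPPORTED_LANG_CODES = ["en_US", "de_DE", "fr_FR", "zh_ZH", "es_ES", "pt_PT"] as const;
type LangCode = (typeof SUPPORTED_LANG_CODES)[number];

// Assumed shape, mirroring the `queryParams` object in the STT example.
interface SttQueryParams {
  flow_id: string;
  execution_id: string;
  lang_code: LangCode;
  time_zone: string;
}

// Type guard: narrows an arbitrary string to a supported language code.
function isSupportedLangCode(code: string): code is LangCode {
  return SUPPORTED_LANG_CODES.some((supported) => supported === code);
}

// Hypothetical helper (not part of the SDK): builds the query parameters
// and throws early if the language code is not in the list above.
function buildQueryParams(
  flowId: string,
  executionId: string,
  langCode: string,
  timeZone: string = "UTC"
): SttQueryParams {
  if (!isSupportedLangCode(langCode)) {
    throw new Error(
      `Unsupported lang_code "${langCode}". Expected one of: ${SUPPORTED_LANG_CODES.join(", ")}`
    );
  }
  return {
    flow_id: flowId,
    execution_id: executionId,
    lang_code: langCode,
    time_zone: timeZone,
  };
}

// Placeholder values as in the STT example; replace them with your own.
const exampleParams = buildQueryParams("TBD", "<unique-execution-id>", "en_US");
console.log(exampleParams);
```

The returned object could then be passed as `queryParams` when constructing the streaming client.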

License

MIT License
