Edge TTS is a Text-to-Speech (TTS) package built on the speech synthesis service used by Microsoft Edge. It lets you synthesize speech from text and manage voice options, either from code or through a command-line interface (CLI).
- Text-to-Speech: Convert text into natural-sounding speech using Microsoft Edge's TTS capabilities.
- TypeScript Support: Full TypeScript support with comprehensive type definitions included.
- Multiple Audio Formats: Support for 36+ audio formats (MP3, WebM, OGG, WAV, PCM, and more).
- Multiple Voices: Access a variety of voices to suit your project's needs.
- Voice Filtering: Filter voices by language and gender for better selection.
- Audio Information: Get detailed information about generated audio (size, duration, format).
- Audio Export Options: Export synthesized audio in different formats (raw, base64, or directly to a file).
- Streaming Support: Stream audio data in real-time for better performance.
- Word Boundaries Metadata: Get word boundary information with precise timestamps.
- Command-Line Interface: Use a simple CLI for easy access to functionality.
- Easy Integration: Modular structure allows for easy inclusion in existing projects.
You can install Edge TTS via npm or bun:
bun add @andresaya/edge-tts
npm install @andresaya/edge-tts
Edge TTS is written in TypeScript and includes full type definitions. No additional @types packages are needed.
import {
EdgeTTS,
Constants,
Voice,
SynthesisOptions,
WordBoundary
} from '@andresaya/edge-tts';
// Voice interface
interface Voice {
Name: string;
ShortName: string;
Gender: 'Male' | 'Female';
Locale: string;
FriendlyName: string;
LocalName: string;
}
// Synthesis options
interface SynthesisOptions {
pitch?: string | number; // e.g., '+20Hz' or 20
rate?: string | number; // e.g., '50%' or 50
volume?: string | number; // e.g., '90%' or 90
outputFormat?: string; // e.g., Constants.OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3
}
// Word boundary metadata
interface WordBoundary {
type: "WordBoundary";
offset: number;
duration: number;
text: string;
}
import { EdgeTTS, SynthesisOptions, Constants } from '@andresaya/edge-tts';
const tts = new EdgeTTS();
const options: SynthesisOptions = {
pitch: '+10Hz',
rate: '100%',
volume: '90%',
outputFormat: Constants.OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS
};
await tts.synthesize("TypeScript example", 'en-US-AriaNeural', options);
const info = tts.getAudioInfo(); // Returns: { size: number; format: string; estimatedDuration: number }
const boundaries = tts.getWordBoundaries(); // Returns: WordBoundary[]
Install globally to use the CLI:
npm install -g @andresaya/edge-tts
To synthesize speech from text:
edge-tts synthesize -t "Hello, world!" -o hello_world_audio
To list available voices:
edge-tts voice-list
import { EdgeTTS } from '@andresaya/edge-tts';
// Initialize the EdgeTTS service
const tts = new EdgeTTS();
const voices = await tts.getVoices();
console.log(`Found ${voices.length} voices`);
// Get all English voices
const englishVoices = await tts.getVoicesByLanguage('en');
// Get specific locale voices
const usEnglishVoices = await tts.getVoicesByLanguage('en-US');
// Get all female voices
const femaleVoices = await tts.getVoicesByGender('Female');
// Get all male voices
const maleVoices = await tts.getVoicesByGender('Male');
// Simple synthesis with default voice
await tts.synthesize("Hello, world!");
// Synthesis with specific voice
await tts.synthesize("Hello, world!", 'en-US-AriaNeural');
await tts.synthesize("Hello, world!", 'en-US-AriaNeural', {
rate: '50%', // Speech rate: -100% to +200% (or number)
volume: '90%', // Speech volume: -100% to +100% (or number)
pitch: '+20Hz', // Voice pitch: -100Hz to +100Hz (or number)
outputFormat: 'audio-24khz-96kbitrate-mono-mp3' // Audio output format
});
Edge TTS supports multiple audio formats. You can specify the format using the outputFormat option:
import { EdgeTTS, Constants } from '@andresaya/edge-tts';
const tts = new EdgeTTS();
// High quality MP3
await tts.synthesize("Hello!", 'en-US-AriaNeural', {
outputFormat: Constants.OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3
});
await tts.toFile('./output/audio'); // Automatically saved as .mp3
// WebM/Opus for web
await tts.synthesize("Hello!", 'en-US-AriaNeural', {
outputFormat: Constants.OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS
});
await tts.toFile('./output/audio'); // Automatically saved as .webm
// Lossless WAV
await tts.synthesize("Hello!", 'en-US-AriaNeural', {
outputFormat: Constants.OUTPUT_FORMAT.RIFF_24KHZ_16BIT_MONO_PCM
});
await tts.toFile('./output/audio'); // Automatically saved as .wav
Available formats (all 36 tested and compatible):
- MP3 Formats (Streaming): 16kHz, 24kHz, 48kHz with various bitrates (32-192 kbps)
- Opus Formats (Streaming): Audio, WebM, and OGG containers
- WAV/PCM Formats (Non-streaming): RIFF (8-48 kHz) and RAW variants
- Specialized Codecs: AMR-WB, G.722, TrueSilk, A-law, μ-law
See Constants.OUTPUT_FORMAT for the complete list. The file extension is automatically detected based on the format.
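How the package maps a format to an extension is not shown here. The following is a minimal sketch of one way such detection could work, assuming format strings follow the pattern of `'audio-24khz-96kbitrate-mono-mp3'`. The helper name `guessExtension` and the container prefixes it checks (`webm-`, `ogg-`, `riff-`, `raw-`) are assumptions for illustration, not the package's API.

```typescript
// Hypothetical helper, not part of @andresaya/edge-tts.
// Guesses a file extension from an output format string.
function guessExtension(format: string): string {
  const f = format.toLowerCase();
  if (f.startsWith('webm-')) return 'webm'; // WebM container
  if (f.startsWith('ogg-')) return 'ogg';   // OGG container
  if (f.startsWith('riff-')) return 'wav';  // RIFF header, i.e. a WAV file
  if (f.startsWith('raw-')) return 'pcm';   // headerless samples
  if (f.endsWith('-mp3')) return 'mp3';
  return 'bin'; // unknown format: fall back to a neutral extension
}

console.log(guessExtension('audio-24khz-96kbitrate-mono-mp3')); // mp3
```

Checking the container prefix before the codec suffix matters because a WebM or OGG file can carry the same Opus codec.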
Format recommendations:
- 🌐 Web streaming: WEBM_24KHZ_16BIT_MONO_OPUS or AUDIO_24KHZ_96KBITRATE_MONO_MP3
- 📱 Mobile apps: AUDIO_24KHZ_48KBITRATE_MONO_MP3
- 💾 High quality: AUDIO_48KHZ_192KBITRATE_MONO_MP3 or RIFF_48KHZ_16BIT_MONO_PCM
- ⚡ Low bandwidth: AUDIO_16KHZ_32KBITRATE_MONO_MP3
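The recommendations above can be kept in one lookup table so the choice lives in a single place. This is only a sketch: the `UseCase` type and `recommendedFormatKey` function are hypothetical names, and where two formats are recommended the first one is picked arbitrarily.

```typescript
// Hypothetical names; only the format keys come from the recommendations above.
type UseCase = 'webStreaming' | 'mobile' | 'highQuality' | 'lowBandwidth';

// Each value is a key of Constants.OUTPUT_FORMAT.
const RECOMMENDED_FORMAT: Record<UseCase, string> = {
  webStreaming: 'WEBM_24KHZ_16BIT_MONO_OPUS',
  mobile: 'AUDIO_24KHZ_48KBITRATE_MONO_MP3',
  highQuality: 'AUDIO_48KHZ_192KBITRATE_MONO_MP3',
  lowBandwidth: 'AUDIO_16KHZ_32KBITRATE_MONO_MP3',
};

// Returns the recommended format key for a use case.
function recommendedFormatKey(useCase: UseCase): string {
  return RECOMMENDED_FORMAT[useCase];
}

console.log(recommendedFormatKey('mobile')); // AUDIO_24KHZ_48KBITRATE_MONO_MP3
```

The returned key would then be looked up on `Constants.OUTPUT_FORMAT` and passed as `outputFormat`.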
// Stream audio data in real-time
for await (const chunk of tts.synthesizeStream("Long text to stream...", 'en-US-AriaNeural')) {
// Process each audio chunk as it arrives
console.log(`Received chunk: ${chunk.length} bytes`);
}
await tts.synthesize("Hello, world!");
const audioInfo = tts.getAudioInfo();
console.log(`Size: ${audioInfo.size} bytes`);
console.log(`Format: ${audioInfo.format}`);
console.log(`Duration: ${audioInfo.estimatedDuration} seconds`);
const duration = tts.getDuration();
console.log(`Audio duration: ${duration} seconds`);
await tts.synthesize("Hello, world!");
const base64Audio = tts.toBase64();
console.log(`Base64 length: ${base64Audio.length}`);
const rawAudio = tts.toRaw(); // Alias for toBase64()
const buffer = tts.toBuffer(); // Get as Buffer object
const filePath = await tts.toFile("output_audio");
console.log(`Audio saved to: ${filePath}`);
// Creates: output_audio.mp3
// Get word boundaries with timestamps
const boundaries = tts.getWordBoundaries();
// Save metadata to file
tts.saveMetadata('metadata.json');
import { EdgeTTS } from '@andresaya/edge-tts';
async function textToSpeechExample() {
const tts = new EdgeTTS();
// Get available English voices
const englishVoices = await tts.getVoicesByLanguage('en-US');
console.log(`Available English voices: ${englishVoices.length}`);
// Use the first available voice
const voice = englishVoices[0];
console.log(`Using voice: ${voice.FriendlyName}`);
// Synthesize with custom options
await tts.synthesize(
"This is a test of the Edge TTS system with custom voice parameters.",
voice.ShortName,
{
pitch: '+10Hz',
rate: '-10%',
volume: '90%'
}
);
// Get audio information
const info = tts.getAudioInfo();
console.log(`Generated audio: ${info.size} bytes, ${info.estimatedDuration.toFixed(2)}s`);
// Save to file
const outputPath = await tts.toFile('./output/speech');
console.log(`Audio saved to: ${outputPath}`);
}
textToSpeechExample().catch(console.error);
import { EdgeTTS } from '@andresaya/edge-tts';
import { createWriteStream } from 'fs';
async function streamingExample() {
const tts = new EdgeTTS();
const writeStream = createWriteStream('streaming_output.mp3');
const longText = "This is a very long text that will be streamed...";
for await (const chunk of tts.synthesizeStream(longText, 'en-US-AriaNeural')) {
writeStream.write(chunk);
console.log(`Streamed ${chunk.length} bytes`);
}
writeStream.end();
console.log('Streaming completed!');
}
streamingExample().catch(console.error);
import { EdgeTTS } from '@andresaya/edge-tts';
async function exploreVoices() {
const tts = new EdgeTTS();
// Get all voices
const allVoices = await tts.getVoices();
console.log(`Total voices available: ${allVoices.length}`);
// Group by language
const languages = [...new Set(allVoices.map(v => v.Locale.split('-')[0]))];
console.log(`Languages available: ${languages.join(', ')}`);
// Get Spanish voices
const spanishVoices = await tts.getVoicesByLanguage('es');
console.log(`Spanish voices: ${spanishVoices.length}`);
// Get female voices
const femaleVoices = await tts.getVoicesByGender('Female');
console.log(`Female voices: ${femaleVoices.length}`);
// Test different voices
const testText = "Hola, este es un ejemplo de síntesis de voz.";
for (const voice of spanishVoices.slice(0, 3)) {
console.log(`Testing voice: ${voice.FriendlyName}`);
await tts.synthesize(testText, voice.ShortName);
const filePath = await tts.toFile(`./voices/${voice.ShortName}`);
console.log(`Saved: ${filePath}`);
}
}
exploreVoices().catch(console.error);
This library can be used directly in web browsers via CDN or ES modules.
<!-- Load from CDN -->
<script src="https://unpkg.com/@andresaya/edge-tts@latest/dist/browser/edge-tts.umd.min.js"></script>
<script>
const tts = new EdgeTTS();
// Get available voices
tts.getVoices().then(voices => {
console.log('Available voices:', voices.length);
});
// Synthesize speech
async function speak() {
await tts.synthesize("Hello from the browser!", 'en-US-AriaNeural');
const audioData = tts.getAudioData();
// Play audio
const audioBlob = new Blob([audioData], { type: 'audio/mpeg' });
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();
}
</script>
<script type="module">
import { EdgeTTS } from 'https://unpkg.com/@andresaya/edge-tts@latest/dist/browser/edge-tts.esm.min.js';
const tts = new EdgeTTS();
// Use the library
const voices = await tts.getVoices();
console.log(voices);
</script>
<script type="module">
import { EdgeTTS } from 'https://unpkg.com/@andresaya/edge-tts@latest/dist/browser/edge-tts.esm.min.js';
const tts = new EdgeTTS();
const chunks = [];
// Stream audio chunks in real-time
for await (const chunk of tts.synthesizeStream("Long text to stream...", 'en-US-AriaNeural')) {
chunks.push(chunk);
console.log(`Received chunk: ${chunk.length} bytes`);
}
// Combine and play all chunks
const totalLength = chunks.reduce((acc, chunk) => acc + chunk.length, 0);
const audioData = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
audioData.set(chunk, offset);
offset += chunk.length;
}
const blob = new Blob([audioData], { type: 'audio/mpeg' });
const audio = new Audio(URL.createObjectURL(blob));
audio.play();
</script>
For a full working example with voice selection and synthesis, see examples/browser-standalone.html.
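The streaming example above collects chunks and then copies them into one buffer inline. That step can be factored into a small reusable function. This is a sketch that assumes each chunk is a `Uint8Array` (a Node.js `Buffer` also qualifies); `concatChunks` is a hypothetical name, not part of the package.

```typescript
// Hypothetical helper: joins streamed chunks into one contiguous Uint8Array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  // First pass: total size, so the output is allocated once.
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  // Second pass: copy each chunk at its running offset.
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

const joined = concatChunks([new Uint8Array([1, 2]), new Uint8Array([3])]);
console.log(joined.length); // 3
```

When the only goal is playback, the `Blob` constructor also accepts the array of chunks directly, so the manual copy is needed only when a single contiguous buffer is required.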
| Parameter | Type | Range | Description |
|---|---|---|---|
| pitch | string \| number | -100Hz to +100Hz | Voice pitch adjustment |
| rate | string \| number | -100% to +200% | Speech rate adjustment |
| volume | string \| number | -100% to +100% | Volume adjustment |
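To validate values before calling the library, the ranges in the table can be checked up front. The sketch below accepts either a number or a string and returns a signed string such as `+20Hz`. The function name `toProsodyString` and the always-signed output are assumptions for illustration; the package may normalize values differently.

```typescript
// Hypothetical helper, not part of @andresaya/edge-tts.
function toProsodyString(
  value: string | number,
  unit: 'Hz' | '%',
  min: number,
  max: number
): string {
  // parseFloat reads the leading numeric part: '+20Hz' -> 20, '-10%' -> -10.
  const n = typeof value === 'number' ? value : parseFloat(value);
  if (Number.isNaN(n) || n < min || n > max) {
    throw new RangeError(`Expected ${min}${unit} to ${max}${unit}, got ${value}`);
  }
  // Negative numbers already carry their sign; add '+' for the rest.
  const sign = n >= 0 ? '+' : '';
  return `${sign}${n}${unit}`;
}

console.log(toProsodyString(20, 'Hz', -100, 100));    // +20Hz
console.log(toProsodyString('-10%', '%', -100, 200)); // -10%
```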
// Using numbers (recommended)
{ pitch: 20, rate: -10, volume: 90 }
// Using strings
{ pitch: '+20Hz', rate: '-10%', volume: '90%' }
// Mixed usage
{ pitch: 15, rate: '25%', volume: 85 }
import { EdgeTTS } from '@andresaya/edge-tts';
async function handleErrors() {
const tts = new EdgeTTS();
try {
await tts.synthesize("Test text", 'invalid-voice-name');
} catch (error) {
console.error('Synthesis failed:', error.message);
}
try {
// This will throw an error - no audio data
const duration = tts.getDuration();
} catch (error) {
console.error('No audio data available:', error.message);
}
try {
// Invalid volume range
await tts.synthesize("Test", 'en-US-AriaNeural', { volume: -150 });
} catch (error) {
console.error('Invalid parameter:', error.message);
}
}
If you want to use Edge TTS with PHP, you can check out the PHP version of this package: Edge TTS PHP
This project is licensed under the GNU General Public License v3 (GPLv3).
We would like to extend our gratitude to the developers and contributors of the following projects for their inspiration and groundwork: