"I make friends. They're toys. My friends are toys. I make them. It's a hobby." — J.F. Sebastian, Blade Runner
An AI conversation system that brings vintage animatronic toys to life. Built on a modular device architecture, it supports multiple output devices, including the 1985 Teddy Ruxpin and Squawkers McCaw. It features real-time voice conversations with ChatGPT and a modular personality system with unique wake words, voices, and conversational styles.
Includes six distinct personalities: a tiki bartender, Abraham Lincoln (a homage to Disney's pioneering animatronics), an eccentric conspiracy theorist, Mister Rogers, K.I.T.T. from Knight Rider, and the classic Teddy Ruxpin character. Add your own personalities using simple YAML files - no programming required!
- Features
- Quick Start
- System Requirements
- Hardware Setup
- Installation
- Personalities
- Usage
- Configuration Options
- Architecture
- Troubleshooting
- Debug Mode
- Development
- Performance Metrics
- Cost Estimates
- About the Name
- License
- Credits
- Contributing
- Support
- Modular Device Architecture: Supports multiple output devices with simple configuration
- Teddy Ruxpin: Full animatronic control with PPM signals for mouth and eyes
- Squawkers McCaw: Simple stereo audio output without PPM
- Easy to extend for additional devices
- Modular Personality System: Switch between different AI personalities with unique voices and behaviors
- Johnny: Tiki bartender with deep knowledge of tiki culture ("Hey, Johnny")
- Mr. Lincoln: Abraham Lincoln, 16th President - homage to Disney's animatronics ("Hey, Mr. Lincoln")
- Leopold: Eccentric conspiracy theorist with a wild backstory ("Hey, Leopold")
- Fred: Mister Rogers with gentle warmth and simple wisdom ("Hey, Fred")
- K.I.T.T.: Knight Industries Two Thousand AI from Knight Rider ("Hey, Kitt")
- Teddy Ruxpin: The classic storytelling bear from Grundo ("Hey, Teddy Ruxpin")
- Wake Word Activation: Custom wake words per personality using OpenWakeWord (free & open source)
- Low-Latency Fillers: Pre-generated personality-specific phrases play immediately while processing
- Speech Recognition: OpenAI Whisper API for accurate speech-to-text transcription
- AI Conversation: GPT-4o-mini powers personality-driven responses with conversation context
- Streaming Response Pipeline: Word-based sentence chunking enables parallel TTS/RVC processing while LLM generates
- Natural Voice: OpenAI TTS (gpt-4o-mini-tts) generates speech with personality-specific voices, speeds, and tones
- RVC Voice Conversion (Optional): Transform TTS output with custom trained voice models for unique character voices beyond OpenAI TTS
- Animatronic Control (Teddy Ruxpin): Generates PPM control signals for mouth (syllable-based lip sync) and eyes (sentiment-based)
- Flexible Output: Device-specific audio processing (stereo with PPM for Teddy, simple stereo for Squawkers)
- Proactive Scheduler (Optional): Per-personality `scheduled_events.yaml` for morning greetings, bedtime stories, and holiday surprises; never interrupts an in-progress conversation
New to J.F. Sebastian? See the Quick Start Guide to get your animatronic talking in 5 minutes!
- Python 3.10.x (specifically - RVC dependencies are not compatible with 3.11+)
- macOS (currently configured for Mac audio devices)
- Internet connection for OpenAI APIs
- Supported Devices:
- 1985 Teddy Ruxpin doll (cassette-based model) - full animatronic control
- Squawkers McCaw - simple audio output
- Other animatronics can be added via the modular device architecture
- Bluetooth cassette adapter (for Teddy Ruxpin, recommended: Arsvita Car Audio Bluetooth Wireless Cassette Receiver)
- Microphone for voice input
This system is designed to work with an original 1985 cassette-based Teddy Ruxpin doll. The cassette mechanism provides both audio playback and motor control through a stereo audio signal.
Arsvita Car Audio Bluetooth Wireless Cassette Receiver
- Designed for car cassette players but works perfectly with Teddy Ruxpin
- Reliable Bluetooth 5.0 connection
- Good audio quality for both voice and control signals
- Rechargeable battery (charges via USB-C)
- Insert Bluetooth Cassette Adapter: Place the adapter into Teddy's cassette deck
- Audio Routing:
- LEFT channel → Teddy's speaker (voice)
- RIGHT channel → Control track (mouth/eye motors)
- Pairing: Pair the Bluetooth adapter with your Mac
- Device Selection: Note the device name (see Configuration section)
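The channel routing above means every stereo frame carries one voice sample (LEFT) and one control sample (RIGHT). A minimal sketch of that interleaving, using only the standard library; the name `interleave_stereo` and the zero-padding of the shorter track are assumptions made for illustration, not the project's actual code:

```python
from array import array


def interleave_stereo(voice, control):
    """Interleave two mono int16 sequences into L/R stereo frames.

    LEFT carries the voice, RIGHT carries the control track. The shorter
    input is padded with silence (an assumption made for this sketch).
    """
    frames = max(len(voice), len(control))
    stereo = array("h")  # signed 16-bit samples
    for i in range(frames):
        stereo.append(voice[i] if i < len(voice) else 0)      # LEFT
        stereo.append(control[i] if i < len(control) else 0)  # RIGHT
    return stereo


stereo = interleave_stereo([100, 200], [-5000, -5000, 0])
print(list(stereo))
```

Samples alternate LEFT, RIGHT, LEFT, RIGHT, which is the frame order a 2-channel 16-bit WAV file or audio stream expects.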
You can install J.F. Sebastian using either the automated setup script (recommended) or manual step-by-step installation.
The easiest way to get started is using the provided setup script:
# Clone the repository
git clone https://github.com/pjdoland/jf-sebastian.git
cd jf-sebastian
# Run automated setup
./setup.sh
The setup script will automatically:
- Check Python version (3.10.x specifically - required for RVC compatibility)
- Create virtual environment with Python 3.10
- Upgrade pip to latest version
- Install Python dependencies from requirements.txt
- Optionally install RVC voice conversion dependencies (asks for confirmation)
- Download OpenWakeWord preprocessing models
- Install system dependencies (PortAudio, FFmpeg via Homebrew)
- Create required directories
- Create `.env` configuration file from template
- List available audio devices
- Optionally generate filler audio for all personalities (asks for confirmation)
- Check for wake word models
After running setup.sh:
- Edit `.env` and add your OpenAI API key: `OPENAI_API_KEY=sk-your-openai-api-key`
- Configure audio devices in `.env` (see device list from setup): `INPUT_DEVICE_NAME=MacBook Air Microphone` and `OUTPUT_DEVICE_NAME=Arsvita`
- You're ready to run: `./run.sh`
If you prefer more control or need to troubleshoot, you can install manually:
git clone https://github.com/pjdoland/jf-sebastian.git
cd jf-sebastian
Important: RVC voice conversion requires Python 3.10.x specifically (not 3.11+). If you plan to use RVC, ensure you're using Python 3.10:
# Check your Python version
python3 --version # Should show 3.10.x
# If you need Python 3.10, install via:
# - pyenv: pyenv install 3.10.13 && pyenv local 3.10.13
# - Homebrew: brew install python@3.10 (then use python3.10 command)
# Create virtual environment with Python 3.10
python3 -m venv venv
# Or if using python3.10 from Homebrew:
# python3.10 -m venv venv
source venv/bin/activate # On macOS/Linux
# Upgrade pip first
pip install --upgrade pip
# Install Python dependencies
pip install -r requirements.txt
OpenWakeWord requires preprocessing models that must be downloaded separately:
python3 -c "from openwakeword import utils; utils.download_models(['alexa'])"
This downloads the required melspectrogram.onnx and embedding_model.onnx files to the openwakeword package directory.
For audio processing, you may need additional system libraries:
# macOS
brew install portaudio ffmpeg
# The application uses:
# - PortAudio (for PyAudio)
# - FFmpeg (for MP3 to PCM conversion)
Create a .env file from the example:
cp .env.example .env
Edit .env and add your API keys:
# Personality Selection
PERSONALITY=johnny # Options: fred, jarvis, johnny, kitt, leopold, mr_lincoln, teddy_ruxpin
# Required API Keys
OPENAI_API_KEY=sk-your-openai-api-key
# Audio device names (see "Finding Audio Devices" below)
INPUT_DEVICE_NAME=MacBook Air Microphone
OUTPUT_DEVICE_NAME=Arsvita
OpenAI API Key:
- Go to https://platform.openai.com/api-keys
- Create a new API key
- Add to `.env` as `OPENAI_API_KEY`
Wake Word Models (OpenWakeWord):
No API key required! OpenWakeWord is completely free and open source.
Each personality includes its own wake word model file at personalities/{name}/hey_{name}.onnx. Run ls personalities/*/hey_*.onnx to see which models are currently installed.
Some personalities (e.g. Jarvis) use OpenWakeWord's bundled pre-trained models (hey_jarvis_v0.1); others ship custom-trained models.
To create a custom wake word for a new personality:
- Follow the guide in `docs/TRAIN_WAKE_WORDS.md`
- Train a model for your desired wake phrase
- Place the `.onnx` model file in your personality's directory
To use a bundled OpenWakeWord model instead of training one, download with python3 -c "from openwakeword import utils; utils.download_models(['hey_jarvis'])" and copy from venv/lib/python3.10/site-packages/openwakeword/resources/models/.
Run the audio output utility to list all devices:
python -m jf_sebastian.modules.audio_output
This will display:
Available Audio Devices:
--------------------------------------------------------------------------------
[0] MacBook Pro Microphone
Type: INPUT
Channels: In=1, Out=0
Sample Rate: 48000.0 Hz
[1] MacBook Pro Speakers
Type: OUTPUT
Channels: In=0, Out=2
Sample Rate: 48000.0 Hz
[2] Bluetooth Cassette Adapter
Type: OUTPUT
Channels: In=0, Out=2
Sample Rate: 44100.0 Hz
Update .env with the appropriate device names:
INPUT_DEVICE_NAME=MacBook Air Microphone
OUTPUT_DEVICE_NAME=Arsvita
RVC (Retrieval-based Voice Conversion) is optional. The system works perfectly with OpenAI TTS voices alone. Install RVC only if you want to use custom trained voice models for unique character voices.
Requirements:
- Python 3.10.x specifically (RVC is not compatible with Python 3.11+)
- Requires temporarily downgrading pip for compatibility
Installation:
# Method 1: Use the automated script (recommended)
./scripts/install_rvc.sh
# Method 2: Manual installation
# Step 1: Downgrade pip for RVC compatibility
pip install pip==24.0
# Step 2: Install RVC dependencies
pip install -r requirements-rvc.txt
# Step 3: Upgrade pip back to latest
pip install --upgrade pip
The requirements-rvc.txt file includes: torch, torchaudio, fairseq, librosa, and rvc-python.
Troubleshooting RVC Installation:
- If you get dependency conflicts, try `pip install rvc-python --no-deps`, then install dependencies manually
- For Apple Silicon (M1/M2/M3): Ensure you have the MPS-enabled torch version
- For CUDA: Install CUDA-enabled torch first: `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
To skip RVC: The system will automatically detect RVC availability and work without it.
See docs/CREATING_PERSONALITIES.md for RVC configuration and usage.
Filler phrases are pre-generated audio clips that play as soon as you finish speaking, creating a natural conversational feel while the system processes your question in the background. The system works without them, but conversations feel more responsive with filler audio.
Generate the filler audio for all personalities:
python scripts/generate_fillers.py
This creates device-specific filler audio for each registered output device. For each personality, it generates:
- `filler_audio/teddy_ruxpin/` - Filler audio with PPM control signals for mouth and eyes
- `filler_audio/squawkers_mccaw/` - Filler audio with simple stereo (no PPM)
- `filler_audio/headless/` - Filler audio with simple stereo for computer playback
Each device-specific directory contains 30 WAV files with:
- Voice audio synthesized with the personality's configured voice, speed, and tone
- Device-appropriate audio processing (PPM signals for Teddy Ruxpin, simple stereo for others)
Note: The filler audio files are generated per output device type, so each personality will have device-specific versions automatically created based on the registered output devices.
When to regenerate filler audio:
- After creating a new personality
- After switching to a different personality (if fillers don't exist yet for your output device)
- After modifying a personality's `tts_voice`, `tts_speed`, or `tts_style` settings
- After editing the `filler_phrases` list in the personality YAML file
- After adding support for a new output device type
Personality-specific generation: By default, the script generates fillers for all personalities. To generate for just one personality:
python scripts/generate_fillers.py --personality johnny
Device-specific storage: Each personality stores filler audio in device-specific subdirectories:
personalities/johnny/filler_audio/
├── teddy_ruxpin/  # 30 WAV files with PPM control signals
│   ├── filler_01.wav
│   └── ...
├── squawkers_mccaw/  # 30 WAV files with simple stereo
│   ├── filler_01.wav
│   └── ...
└── headless/  # 30 WAV files with simple stereo
    ├── filler_01.wav
    └── ...
The system includes a modular personality framework. Each personality has:
- Unique wake word for activation
- Custom system prompt defining character and knowledge
- Specific TTS voice from OpenAI
- Filler phrases that play immediately for low-latency feel
- Pre-generated filler audio files with motor control signals
- Wake word: "Hey, Johnny"
- Voice: Onyx (casual male)
- Character: Laid-back beatnik bartender with deep tiki culture knowledge
- Topics: Cocktails, surf music, Polynesian pop, tiki history
- Wake word: "Hey, Mr. Lincoln"
- Voice: Echo (dignified male)
- Character: 16th President of the United States - a homage to Disney's Great Moments with Mr. Lincoln
- Topics: Liberty, equality, union, Constitution, leadership, moral conviction
- Wake word: "Hey, Leopold"
- Voice: Onyx (conspiratorial)
- Character: Eccentric truth-seeker with an insane backstory (Turkish prison, UFO abductions, intelligence work)
- Topics: Conspiracies, surveillance, government secrets, paranoid theories
- Wake word: "Hey, Fred"
- Voice: Echo with RVC (gentle, warm)
- Character: Fred Rogers from Mister Rogers' Neighborhood - speaks with gentle warmth and simple wisdom
- Topics: Feelings, kindness, being special just as you are, taking time, the neighborhood
- Wake word: "Hey, Kitt"
- Voice: Onyx with RVC (sophisticated AI)
- Character: Advanced AI from the Knight Rider Trans Am - intelligent with dry wit
- Topics: Advanced technology, crime fighting, surveillance mode, turbo boost, molecular bonded shell
- Wake word: "Hey, Teddy Ruxpin"
- Voice: Echo with RVC (friendly, enthusiastic)
- Character: Adventurous teddy bear from the magical land of Grundo
- Topics: Adventures, friendship, Grubby, ancient treasures, magical crystals, storytelling
Edit .env to change personalities:
# Switch to Johnny (Tiki Bartender)
PERSONALITY=johnny
# Switch to Mr. Lincoln (Abraham Lincoln)
PERSONALITY=mr_lincoln
# Switch to Leopold (Conspiracy Theorist)
PERSONALITY=leopold
# Switch to Fred (Mister Rogers)
PERSONALITY=fred
# Switch to K.I.T.T. (Knight Rider AI)
PERSONALITY=kitt
# Switch to Teddy Ruxpin (Storytelling Bear)
PERSONALITY=teddy_ruxpin
Important: After switching personalities, regenerate the filler audio if it doesn't exist yet:
python scripts/generate_fillers.py
See personalities/README.md for detailed instructions on creating new personalities.
Personalities are simple YAML files - no coding required! Just copy an existing personality folder and modify the personality.yaml file.
python -m jf_sebastian.main
You should see (example with Johnny personality):
================================================================================
J.F. Sebastian - Animatronic AI Conversation System
"I make friends. They're toys. My friends are toys."
================================================================================
Personality: Johnny
Wake word: Hey Johnny
...
System ready! Say 'Hey, Johnny' to start talking.
Press Ctrl+C to exit.
================================================================================
For unattended deployments — museum exhibits, eldercare companions, kids' rooms, anywhere the toy needs to keep running across PortAudio/RVC crashes — wrap the app in scripts/supervisor.py and let launchd (macOS) or systemd (Linux) keep the supervisor itself alive.
The supervisor:
- Restarts the child process on unexpected exit, with exponential backoff (1s → 60s → 10-min permanent-failure mode if it keeps crashing)
- Detects hung children via heartbeat-file staleness; SIGTERMs (then SIGKILLs) the whole process group
- Writes enriched crash reports (PID, personality, ran_for, log tail) to `crash_reports/` and prunes to the most recent N
- Forwards SIGTERM/SIGINT cleanly so `launchctl unload` / `systemctl stop` does the right thing
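The restart policy in the first bullet can be sketched as a delay generator. This is an illustration only: the doubling factor and the name `backoff_delays` are assumptions, and the 10-minute permanent-failure mode is left out. The defaults mirror `RESTART_BACKOFF_INITIAL` and `RESTART_BACKOFF_MAX`.

```python
from itertools import islice


def backoff_delays(initial=1.0, maximum=60.0, factor=2.0):
    """Yield restart delays that grow geometrically up to a cap.

    The growth factor of 2 is an assumption for this sketch; the source
    only states the 1 s starting point and the 60 s cap.
    """
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


# First eight delays a repeatedly crashing child would see.
print(list(islice(backoff_delays(), 8)))
```

A real supervisor would also reset the delay once the child has stayed up for a while; that policy is not described here, so it is omitted.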
Quick start (foreground, for testing):
HEARTBEAT_FILE=/tmp/jf_sebastian.heartbeat python scripts/supervisor.py
macOS launchd: edit the __EDIT_ME_*__ placeholders in scripts/jf-sebastian.plist, copy to ~/Library/LaunchAgents/, then:
launchctl bootstrap gui/$UID ~/Library/LaunchAgents/com.jf-sebastian.supervisor.plist
Linux systemd: edit scripts/jf-sebastian.service, copy to ~/.config/systemd/user/, then:
systemctl --user daemon-reload
systemctl --user enable --now jf-sebastian.service
All supervisor settings (HEARTBEAT_INTERVAL, WATCHDOG_TIMEOUT, RESTART_BACKOFF_*, CRASH_REPORT_DIR, etc.) are documented in .env.example under "SUPERVISOR / WATCHDOG". Set HEARTBEAT_FILE in .env to opt the main app into liveness reporting; otherwise the supervisor can only catch crashes, not hangs.
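Hang detection by heartbeat-file staleness could look roughly like the sketch below. The helper names `touch_heartbeat` and `is_hung` are hypothetical, and treating a missing file as "not hung yet" is an assumption; the 60-second default mirrors `WATCHDOG_TIMEOUT`.

```python
import os
import tempfile
import time


def touch_heartbeat(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def is_hung(path, timeout=60.0, now=None):
    """Return True when the heartbeat file is older than `timeout` seconds.

    A missing file counts as "not hung yet" (an assumption: the child may
    simply not have written its first heartbeat).
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return False
    return age > timeout


heartbeat = os.path.join(tempfile.mkdtemp(), "jf_sebastian.heartbeat")
touch_heartbeat(heartbeat)
print(is_hung(heartbeat))                         # freshly touched file
print(is_hung(heartbeat, now=time.time() + 120))  # same file, two minutes on
```

The main process would call something like `touch_heartbeat` every `HEARTBEAT_INTERVAL` seconds, while the supervisor polls the file's age.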
- Wake the character: Say the wake word ("Hey, Johnny", "Hey, Mr. Lincoln", "Hey, Leopold", "Hey, Fred", "Hey, Kitt", or "Hey, Teddy Ruxpin")
- Speak: Once detected, speak your message
- Listen: Character responds with personality-appropriate answer
- Repeat: Continue the conversation
The system will:
- Listen for your speech
- Auto-detect when you stop talking
- Play a filler phrase immediately (0.5-second natural pause)
- Process in background: transcribe, generate response, synthesize speech
- Seamlessly transition from filler to real response
- Animate mouth and eyes during speech
With Johnny (Tiki Bartender):
You: "Hey, Johnny"
Johnny: [Acknowledges]
You: "What's your favorite rum?"
Johnny: "Hold on, I'm checking the rum barrel... Alright, so I gotta say, a good aged Jamaican rum is hard to beat. The funk and complexity are just incredible in a Mai Tai."
You: "Tell me about tiki culture"
Johnny: "Just grabbing some fresh mint... So tiki culture started in the 1930s with Don the Beachcomber and Trader Vic. They created this whole Polynesian fantasy..."
With Mr. Lincoln (Abraham Lincoln):
You: "Hey, Mr. Lincoln"
Mr. Lincoln: [Acknowledges]
You: "What does freedom mean to you?"
Mr. Lincoln: "Let me consult my notes on this... Freedom is not the liberty to do as you please, but the opportunity to do what is right. True liberty requires both law and moral conviction."
You: "How did you handle the pressure of the Civil War?"
Mr. Lincoln: "Reviewing my correspondence from that time... The burden was immense, but I was guided by one principle - preserving the Union and ending slavery. When the cause is just, you endure."
With Leopold (Conspiracy Theorist):
You: "Hey, Leopold"
Leopold: [Acknowledges]
You: "What do you think about surveillance?"
Leopold: "Hold on, checking my bug detector... Okay, we're clean. Listen, they're watching everything. I've found three devices in my apartment last month. The surveillance is real, and it's everywhere."
You: "Have you really been abducted by aliens?"
Leopold: "Just reviewing my notes from the second abduction... Twice, actually. 1997. The first time was definitely more invasive. After they installed the implant, communication improved. You probably think I'm crazy, but the evidence is all there."
Settings are loaded with three layers, highest precedence first:
- `personalities/{PERSONALITY}/.env` - per-personality overrides
- `device_overrides/{OUTPUT_DEVICE_TYPE}/.env` - per-device-type overrides
- `.env` - base configuration
Use the overlays for things like VOICE_GAIN that differ by speaker hardware or by personality (some RVC models output quieter than others). PERSONALITY and OUTPUT_DEVICE_TYPE must come from the base — they're the selection keys, so setting them inside an overlay has no effect on overlay loading. Overlay files are git-ignored automatically by the existing .env rule. Loaded overlay paths are logged at startup.
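As a rough illustration of the three-layer precedence, the sketch below parses `KEY=VALUE` text and merges the layers lowest-precedence first. The function names (`parse_env`, `overlay_paths`, `merge_layers`) are hypothetical and the parser is deliberately simplistic; the project's real loader in `settings.py` may work differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "johnny  # tiki bartender".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def overlay_paths(base):
    """Overlay locations are chosen from the base file's selection keys."""
    return [
        f"personalities/{base['PERSONALITY']}/.env",
        f"device_overrides/{base['OUTPUT_DEVICE_TYPE']}/.env",
    ]


def merge_layers(base, device_overlay, personality_overlay):
    """Apply lowest precedence first so later updates win."""
    merged = dict(base)
    merged.update(device_overlay)
    merged.update(personality_overlay)
    return merged


base = parse_env(
    "PERSONALITY=johnny  # tiki bartender\n"
    "OUTPUT_DEVICE_TYPE=teddy_ruxpin\n"
    "VOICE_GAIN=1.05\n"
)
print(overlay_paths(base))
print(merge_layers(base, {"VOICE_GAIN": "1.2"}, {"VOICE_GAIN": "1.4"})["VOICE_GAIN"])
```

Because the overlay paths are computed from the base values before any overlay is read, changing `PERSONALITY` inside an overlay cannot change which overlays get loaded.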
| Setting | Description | Default |
|---|---|---|
| `PERSONALITY` | Active personality ('johnny', 'mr_lincoln', 'leopold', 'fred', 'kitt', 'teddy_ruxpin') | johnny |
| `OPENAI_API_KEY` | OpenAI API key (required) | - |
| Setting | Description | Default |
|---|---|---|
| `INPUT_DEVICE_NAME` | Microphone device name | - |
| `OUTPUT_DEVICE_NAME` | Speaker device name | - |
| `OUTPUT_DEVICE_TYPE` | Output device type ('teddy_ruxpin', 'squawkers_mccaw', 'headless') | teddy_ruxpin |
| `SAMPLE_RATE` | Audio capture sample rate (Hz). Use 16000 in practice, since Silero VAD requires it. The validator still accepts 22050/44100/48000, but VAD warns and disables itself at those rates. | 16000 |
| `CHUNK_SIZE` | Audio chunk size for processing | 1024 |
| Setting | Description | Default |
|---|---|---|
| `VAD_THRESHOLD` | Silero VAD per-window speech probability cutoff (0.0-1.0, higher = stricter) | 0.5 |
| `SILENCE_TIMEOUT` | Max silence before timeout (seconds) | 5.0 |
| `SPEECH_END_SILENCE_SECONDS` | Silence required to end speech after talking (seconds) | 0.5 |
| `MIN_LISTEN_SECONDS` | Minimum listen window after wake word (seconds) | 1.0 |
| `MIN_AUDIO_RMS` | Min peak RMS amplitude to send audio to Whisper (filters silence) | 60 |
| `MIN_SPEECH_RATIO` | Min ratio of speech-bearing frames (0.0–1.0) before transcribing | 0.3 |
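The last two settings act as a gate in front of Whisper. A minimal sketch of such a gate, assuming audio arrives as frames of int16 samples with one speech/non-speech flag per frame; the function names and the exact comparison rules are assumptions, not the project's implementation:

```python
import math


def frame_rms(samples):
    """Root-mean-square amplitude of one frame of int16 samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(frames, speech_flags, min_rms=60, min_speech_ratio=0.3):
    """Return True when the capture is loud enough and speech-dense enough.

    `speech_flags` holds one boolean per frame (for example from a VAD).
    Defaults mirror MIN_AUDIO_RMS and MIN_SPEECH_RATIO.
    """
    if not frames or not speech_flags:
        return False
    peak_rms = max(frame_rms(frame) for frame in frames)
    speech_ratio = sum(speech_flags) / len(speech_flags)
    return peak_rms >= min_rms and speech_ratio >= min_speech_ratio


quiet = [[1, -1, 1, -1]]
spoken = [[0, 0, 0, 0], [100, -100, 100, -100]]
print(should_transcribe(quiet, [False]))
print(should_transcribe(spoken, [False, True]))
```

Skipping the API call for near-silent captures avoids paying for transcriptions of background noise.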
| Setting | Description | Default |
|---|---|---|
| `WEATHER_PROVIDER` | `wttr` / `homeassistant` / `manual` / `none` / `auto` (unset) | unset → auto |
| `ZIPCODE` | US zipcode for the wttr.in provider | - |
| `HOME_ASSISTANT_URL` | HA URL for the homeassistant provider | - |
| `HOME_ASSISTANT_TOKEN` | Long-lived HA access token | - |
| `HOME_ASSISTANT_WEATHER_ENTITY` | HA entity_id (e.g., `weather.home`) | - |
| `MANUAL_WEATHER` | Free-form description for the manual provider (no network egress) | - |
| Setting | Description | Default |
|---|---|---|
| `NEWS_PROVIDER` | `rss` / `hackernews` / `manual` / `none` / `auto` | unset → auto |
| `NEWS_RSS_URL` | Any RSS or Atom feed URL | NPR Topics: News |
| `MANUAL_NEWS` | Newline-separated headlines (no network egress) | - |
| `NEWS_HEADLINE_LIMIT` | Max headlines injected per turn | 5 |
| `NEWS_CACHE_TTL_MINUTES` | Headline cache duration (minimum 60 seconds) | 30 |
| Setting | Description | Default |
|---|---|---|
| `SCHEDULER_ENABLED` | Run per-personality `scheduled_events.yaml` (when present) | true |
| `QUIET_HOURS_START` | Global quiet-hours start (HH:MM, overrides personality YAML) | - |
| `QUIET_HOURS_END` | Global quiet-hours end (HH:MM) | - |
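A quiet-hours window such as 22:00 to 07:00 crosses midnight, so a plain `start <= now < end` test is not enough. The sketch below shows one way to handle both cases; the function names, the wrap-around semantics, and the inclusive-start/exclusive-end choice are assumptions rather than documented scheduler behavior.

```python
from datetime import time


def parse_hhmm(text):
    """Turn an 'HH:MM' string into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_quiet_hours(now, start, end):
    """Return True when `now` falls inside [start, end).

    If start is later than end, the window is assumed to wrap past midnight.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


start, end = parse_hhmm("22:00"), parse_hhmm("07:00")
print(in_quiet_hours(time(23, 30), start, end))
print(in_quiet_hours(time(12, 0), start, end))
```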
| Setting | Description | Default |
|---|---|---|
| `CONVERSATION_TIMEOUT` | Clear history after idle (seconds) | 120.0 |
| `MAX_HISTORY_LENGTH` | Maximum conversation history messages to maintain | 20 |
| `MIN_CHUNK_WORDS` | Minimum word count per streaming chunk (word-based sentence chunking) | 15 |
| `MAX_TOKENS` | Maximum tokens for non-streaming GPT responses | 300 |
| `MAX_TOKENS_STREAMING` | Maximum tokens for streaming GPT responses | 200 |
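To make `MIN_CHUNK_WORDS` concrete, here is a minimal sketch of word-based sentence chunking over a stream of text fragments. It assumes a chunk is released at the first sentence-ending punctuation once the buffer holds at least `min_words` words; the name `chunk_stream` and that exact rule are illustrative guesses, not the project's code.

```python
def chunk_stream(fragments, min_words=15):
    """Group streamed text fragments into sentence-aligned chunks.

    A chunk is emitted when the buffer ends a sentence and contains at
    least `min_words` words; any leftover text is flushed at the end.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        text = buffer.strip()
        if text.endswith((".", "!", "?")) and len(text.split()) >= min_words:
            yield text
            buffer = ""
    if buffer.strip():
        yield buffer.strip()


fragments = ["Hello ", "there. ", "How ", "are ", "you? ", "Fine."]
for chunk in chunk_stream(fragments, min_words=3):
    print(chunk)
```

Each emitted chunk can be handed to TTS (and RVC, if enabled) while the model is still generating the rest of the reply.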
| Setting | Description | Default |
|---|---|---|
| `WHISPER_MODEL` | OpenAI Whisper speech-to-text model | whisper-1 |
| `GPT_MODEL` | OpenAI GPT model for conversation | gpt-4o-mini |
| `TTS_MODEL` | OpenAI text-to-speech model | gpt-4o-mini-tts |
Note: TTS voice, speed, and style are defined per personality in personalities/ (not in .env). The gpt-4o-mini-tts model supports prompting for tone, emotional range, intonation, and speaking style.
| Setting | Description | Default |
|---|---|---|
| `PLAYBACK_PREROLL_MS` | Audio playback preroll (milliseconds) to prevent clipped starts | 240 |
| `PLAYBACK_TAIL_GUARD_MS` | Wait after playback ends before re-opening mic; covers speaker drain so the bot doesn't capture its own tail | 500 |
| `VOICE_GAIN` | Voice audio volume level (0.0 to 2.0; applied to both RVC and non-RVC paths) | 1.05 |
| `CONTROL_GAIN` | Control track volume level (0.0 to 1.0) | 0.52 |
| `SENTIMENT_POSITIVE_THRESHOLD` | Sentiment threshold for positive eye expressions | 0.3 |
| `SENTIMENT_NEGATIVE_THRESHOLD` | Sentiment threshold for negative eye expressions | -0.3 |
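The two sentiment thresholds split a score into three bands. A small sketch, assuming the score is a VADER-style compound value in [-1, 1] and that the comparisons are inclusive; the function name `eye_mood` and the band labels are hypothetical:

```python
def eye_mood(compound, positive=0.3, negative=-0.3):
    """Map a sentiment score in [-1, 1] to one of three expression bands.

    Defaults mirror SENTIMENT_POSITIVE_THRESHOLD and
    SENTIMENT_NEGATIVE_THRESHOLD; inclusive comparisons are an assumption.
    """
    if compound >= positive:
        return "positive"
    if compound <= negative:
        return "negative"
    return "neutral"


for score in (0.6, 0.1, -0.5):
    print(score, eye_mood(score))
```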
| Setting | Description | Default |
|---|---|---|
| `WAKE_WORD_THRESHOLD` | Wake word detection threshold (0.0 to 1.0, higher = more strict) | 0.99 |
| Setting | Description | Default |
|---|---|---|
| `RVC_ENABLED` | Global enable/disable for RVC voice conversion | true |
| `RVC_DEVICE` | Device for RVC inference (cpu/mps/cuda) - mps for Apple Silicon GPU | mps |
Note: RVC is configured per-personality in personality.yaml. See docs/CREATING_PERSONALITIES.md for RVC setup details. When enabled, RVC transforms TTS output to create unique character voices that go beyond what OpenAI TTS can provide alone.
| Setting | Description | Default |
|---|---|---|
| `DEBUG_MODE` | Enable debug features | false |
| `LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| `SAVE_DEBUG_AUDIO` | Save audio files for debugging | false |
| `DEBUG_AUDIO_PATH` | Directory for debug audio files | ./debug_audio/ |
These settings only apply when running under scripts/supervisor.py. See Running Unattended.
| Setting | Description | Default |
|---|---|---|
| `HEARTBEAT_FILE` | Path to liveness file (set to opt the main process into reporting) | - |
| `HEARTBEAT_INTERVAL` | Seconds between liveness touches | 10.0 |
| `WATCHDOG_TIMEOUT` | Seconds without heartbeat before child is considered hung | 60.0 |
| `RESTART_BACKOFF_INITIAL` | Initial restart delay (seconds) | 1.0 |
| `RESTART_BACKOFF_MAX` | Cap on restart delay (seconds) | 60.0 |
| `CRASH_REPORT_DIR` | Directory for per-crash reports | ./crash_reports/ |
| `CRASH_REPORT_KEEP` | Max crash reports retained (older are pruned) | 200 |
Creating a new personality is easy - just create a folder and a YAML file!
📖 Full Guide: See docs/CREATING_PERSONALITIES.md for a comprehensive step-by-step guide.
Quick Reference: See personalities/README.md for technical details.
Each personality is defined in a simple personality.yaml file:
name: YourName
tts_voice: onyx  # or echo, fable, nova, shimmer, alloy
tts_speed: 1.0  # Optional: 0.25 to 4.0 (slower for dignified, faster for manic)
tts_style: "Speak warmly and conversationally"  # Optional: tone/style instruction
wake_word_model: hey_yourname.onnx
system_prompt: |
  You are YourName, a character description...
  Keep responses conversational and concise.
filler_phrases:
  - "Your filler phrase 1..."
  - "Your filler phrase 2..."
No programming required - just copy an existing personality folder and edit the YAML file!
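Before generating fillers for a new personality, it can help to sanity-check the parsed YAML. The sketch below validates an already-loaded dictionary; the set of required keys is an assumption based on the example file, and `validate_personality` is a hypothetical helper, not part of the project.

```python
# Keys assumed to be required, based on the example personality.yaml.
REQUIRED_KEYS = ("name", "tts_voice", "wake_word_model", "system_prompt", "filler_phrases")


def validate_personality(config):
    """Return a list of problems found in a parsed personality config.

    An empty list means the config passed these basic checks.
    """
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]
    speed = config.get("tts_speed", 1.0)  # optional, documented range 0.25 to 4.0
    if not 0.25 <= speed <= 4.0:
        problems.append(f"tts_speed {speed} is outside 0.25-4.0")
    return problems


config = {
    "name": "YourName",
    "tts_voice": "onyx",
    "tts_speed": 5.0,
    "wake_word_model": "hey_yourname.onnx",
    "filler_phrases": ["Your filler phrase 1..."],
}
for problem in validate_personality(config):
    print(problem)
```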
See ARCHITECTURE.md for detailed system design, component descriptions, and technical specifications.
Deploying on an NVIDIA Jetson Orin Nano? See JETSON_DEPLOYMENT.md for system packages, GPU power tuning, USB-mic AGC, and other host-level setup that isn't covered by setup.sh.
- State Machine: Manages conversation flow (IDLE → LISTENING → PROCESSING → SPEAKING). Atomic `try_transition` CAS lets the scheduler enter SPEAKING without racing the wake-word detector.
- Wake Word Detector: Always-on personality-specific wake word detection (OpenWakeWord)
- Audio Input Pipeline: Microphone capture with voice activity detection
- Speech-to-Text: OpenAI Whisper transcription
- Conversation Engine: GPT-4o-mini integration with word-based streaming chunking (MIN_CHUNK_WORDS configurable)
- Text-to-Speech: OpenAI TTS synthesis with personality-specific voices and styles
- RVC Voice Converter (Optional): Transforms TTS output with trained voice models for unique character voices
- PPM Generator: Creates precise PPM control signals (60Hz, 400µs pulses, 630-1590µs gaps)
- Device Output Processors: Device-specific audio processing (Teddy Ruxpin with PPM, Squawkers McCaw simple stereo)
- Audio Output Pipeline: Stereo playback with parallel chunk processing for minimal latency
- Real-World Context Provider: Injects current date/time, weather (pluggable: wttr / Home Assistant / manual), and top news headlines (pluggable: RSS / Hacker News / manual) into the LLM context every turn
- Proactive Scheduler: Per-personality `scheduled_events.yaml` for proactive utterances (greetings, bedtime stories); fires only when state is IDLE, never interrupts a conversation
- Process Supervisor (Optional): `scripts/supervisor.py` keeps the app alive across crashes with exponential-backoff restart, watchdog kill of hung children, and crash reports, for unattended deployments via launchd / systemd
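The compare-and-set idea behind `try_transition` can be sketched with a lock: the transition succeeds only if the machine is still in the state the caller expects. The class below is an illustrative stand-in under that assumption, not the contents of `state_machine.py`.

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


class StateMachine:
    """Minimal state holder with an atomic compare-and-set transition."""

    def __init__(self):
        self._state = State.IDLE
        self._lock = threading.Lock()

    @property
    def state(self):
        with self._lock:
            return self._state

    def try_transition(self, expected, new):
        """Move to `new` only if the current state is `expected`."""
        with self._lock:
            if self._state is not expected:
                return False
            self._state = new
            return True


machine = StateMachine()
# The scheduler claims IDLE -> SPEAKING first...
print(machine.try_transition(State.IDLE, State.SPEAKING))
# ...so a wake-word detection arriving a moment later loses the race cleanly.
print(machine.try_transition(State.IDLE, State.LISTENING))
```

Checking and updating under one lock is what removes the race: two threads can both observe IDLE, but only one of them can win the transition.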
- Issue: Wake word not recognized
- Solutions:
- Check microphone is working and selected correctly
- Speak clearly and slightly louder
- Ensure wake word model files exist in `models/` directory
- See `docs/TRAIN_WAKE_WORDS.md` for training custom wake words
- Issue: No audio output or "Device not found"
- Solutions:
- Run `python -m jf_sebastian.modules.audio_output` to list devices
- Verify device indices in `.env`
- Check Bluetooth connection to cassette adapter
- Try `-1` for default device first
- Issue: "OpenAI API error" or rate limit
- Solutions:
- Verify `OPENAI_API_KEY` is correct and has credits
- Check internet connection
- Wait if rate limited (free tier limits)
- Review logs in `jf_sebastian.log`
- Issue: Audio plays but Teddy doesn't move
- Solutions:
- Verify stereo output is working (both channels)
- Check Bluetooth adapter is properly inserted
- Test with original cassette tape first
- Verify PPM control signal generation (enable debug audio)
- Check motor batteries in Teddy
- Inspect output WAV file in Audacity: RIGHT channel should show regular 60Hz pulses
- Issue: Slow response times
- Solutions:
- Use faster models: `TTS_MODEL=tts-1` (not `tts-1-hd`)
- Check internet connection speed
- Reduce `MAX_HISTORY_LENGTH` for shorter context
- Consider local Whisper (whisper.cpp) instead of API
Enable detailed logging and audio file saving:
# In .env
DEBUG_MODE=true
SAVE_DEBUG_AUDIO=true
DEBUG_AUDIO_PATH=./debug_audio/
This will:
- Save input audio as `input_YYYYMMDD_HHMMSS.wav`
- Save stereo output as `output_YYYYMMDD_HHMMSS.wav`
- Enable verbose logging to console and `jf_sebastian.log`
Inspect stereo output in Audacity:
- Open output WAV file
- Split stereo to mono tracks
- LEFT = voice audio
- RIGHT = PPM control signal (should show regular 60Hz negative pulses with varying gap widths)
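When reading the RIGHT channel in Audacity, it helps to know roughly what one frame should look like. The sketch below works through the timing using the figures quoted in this README (60Hz frames, 400µs pulses, 630-1590µs gaps, 8 channels). The linear value-to-gap mapping, the closing pulse, and the function names are assumptions for illustration, not a description of `ppm_generator.py`.

```python
PULSE_US = 400                # width of each pulse
GAP_MIN_US = 630              # shortest gap between pulses
GAP_MAX_US = 1590             # longest gap between pulses
FRAME_US = 1_000_000 / 60     # one frame at 60 Hz, about 16667 µs


def gap_us(value):
    """Map a channel value in [0, 1] to a gap width (linear, assumed)."""
    value = min(max(value, 0.0), 1.0)
    return GAP_MIN_US + value * (GAP_MAX_US - GAP_MIN_US)


def frame_timing(channels):
    """Return per-channel (pulse, gap) widths and the idle time left over.

    One extra closing pulse after the last gap is assumed.
    """
    segments = [(PULSE_US, gap_us(value)) for value in channels]
    used = sum(pulse + gap for pulse, gap in segments) + PULSE_US
    return segments, FRAME_US - used


segments, idle = frame_timing([0.5] * 8)
print(segments[0])
print(round(idle, 1))
```

Under these assumptions, even eight channels at the maximum gap fit inside a single 60Hz frame, so the leftover time shows up as a longer pause between pulse trains.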
jf-sebastian/
├── jf_sebastian/ # Main application package
│ ├── __init__.py
│ ├── main.py # Main application
│ ├── config/
│ │ └── settings.py # Configuration management (.env loader + validation)
│ ├── devices/ # Modular output device architecture
│ │ ├── base.py # OutputDevice abstract class
│ │ ├── factory.py # Device registry and factory
│   │   ├── teddy_ruxpin.py # Teddy Ruxpin device (with PPM)
│   │   ├── headless.py # Headless device (computer playback)
│   │   ├── squawkers_mccaw.py # Squawkers McCaw (subclass of HeadlessDevice)
│   │   └── shared/
│   │       ├── audio_processor.py # MP3→PCM conversion (FFmpeg)
│   │       └── sentiment_analyzer.py # Sentiment analysis (VADER)
│   ├── modules/
│   │   ├── state_machine.py # State management (IDLE / LISTENING / PROCESSING / SPEAKING)
│   │   ├── wake_word.py # Wake word detection (OpenWakeWord)
│   │   ├── audio_input.py # Microphone + VAD
│   │   ├── speech_to_text.py # Whisper API
│   │   ├── conversation.py # GPT-4o-mini streaming + word-chunked sentences
│   │   ├── text_to_speech.py # OpenAI TTS
│   │   ├── filler_phrases.py # Pre-recorded filler audio for low-latency feel
│   │   ├── ppm_generator.py # PPM signal generation (60Hz, 8-channel)
│   │   ├── rvc_processor.py # Optional RVC voice conversion
│   │   ├── audio_output.py # Stereo playback
│   │   └── scheduler.py # Proactive scheduler (per-personality scheduled_events.yaml)
│   └── utils/
│       ├── audio_utils.py # RMS, VAD-based speech detection
│       ├── audio_device_utils.py # PyAudio device-name lookup
│       ├── async_file_utils.py # Non-blocking file writes
│       ├── gpu_utils.py # MPS/CUDA detection for RVC
│       ├── context_provider.py # Date/time + weather + news context for LLM
│       ├── weather.py # Pluggable weather providers (wttr / HA / manual)
│       ├── news.py # Pluggable news providers (RSS / HN / manual)
│       └── heartbeat.py # Liveness file for the supervisor
├── personalities/ # Device-agnostic personality system (auto-discovered)
│   ├── README.md # Personality creation guide
│   ├── base.py # Personality loader (reads YAML)
│   └── <personality>/ # One folder per personality (drop-in)
│       ├── personality.yaml # Required: name, voice, prompt, fillers, etc.
│       ├── hey_<name>.onnx # Required: wake word model
│       ├── scheduled_events.yaml # Optional: proactive utterances on a schedule
│       ├── *.pth, *.index # Optional: RVC voice conversion models
│       └── filler_audio/ # Pre-generated filler audio per device type
│           ├── teddy_ruxpin/ # PPM-control flavor
│           ├── headless/ # Simple stereo
│           └── squawkers_mccaw/
├── scripts/
│   ├── generate_fillers.py # Generate personality filler audio
│   ├── test_microphone.py # Microphone testing utility
│   ├── test_channels.py # PPM channel inspection
│   ├── benchmark_rvc.py # RVC inference timing
│   ├── install_rvc.sh # One-shot RVC install for Python 3.10
│   ├── generate_toc.py # Regenerate the table of contents in this README
│   ├── supervisor.py # Process supervisor for unattended deployments
│   ├── jf-sebastian.plist # launchd template (macOS)
│   └── jf-sebastian.service # systemd user-unit template (Linux)
├── tests/
│   ├── config/ # Settings tests
│   ├── devices/ # Device factory + per-device tests
│   ├── modules/ # State machine, scheduler, etc.
│   ├── personalities/ # Personality loading + validation
│   ├── scripts/ # Supervisor unit + integration tests
│   └── utils/ # Weather, news, heartbeat, audio utilities
├── docs/
│   ├── ARCHITECTURE.md
│   ├── CREATING_PERSONALITIES.md
│   ├── JETSON_DEPLOYMENT.md
│   ├── QUICKSTART.md
│   └── TRAIN_WAKE_WORDS.md
├── CLAUDE.md # Guidance for AI coding tools working in this repo
├── ROADMAP.md # Tier 1-3 prioritization from the 7-persona codebase review
├── requirements.txt
├── requirements-rvc.txt # Optional RVC dependencies (Python 3.10 only)
├── .env.example
├── run.sh # Convenience launcher
├── setup.sh # Automated installer
├── LICENSE
└── README.md
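The layout lists four conversation states for state_machine.py: IDLE, LISTENING, PROCESSING, and SPEAKING. As a rough illustration of how such a state machine could be organized, here is a minimal sketch. Only the four state names come from this README; the class name `ConversationStateMachine`, the `ALLOWED` transition table, and the `transition` method are assumptions made for the example and may not match the real module.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()
    SPEAKING = auto()


# Assumed transition table (illustrative only): which states may follow which.
ALLOWED = {
    State.IDLE: {State.LISTENING},                    # wake word heard
    State.LISTENING: {State.PROCESSING, State.IDLE},  # speech captured, or timeout
    State.PROCESSING: {State.SPEAKING, State.IDLE},   # reply ready, or error
    State.SPEAKING: {State.LISTENING, State.IDLE},    # follow-up turn, or done
}


class ConversationStateMachine:
    """Hypothetical state holder that rejects transitions not in ALLOWED."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"Illegal transition: {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```

Keeping the legal transitions in one table makes it easier to see, and to test, which paths through a conversation are possible.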
# Install development dependencies
pip install -r requirements.txt
# Run tests
pytest tests/
The modular architecture makes it easy to extend:
- New personalities: See docs/CREATING_PERSONALITIES.md for a complete guide
- Custom wake words: Train using OpenWakeWord (see docs/TRAIN_WAKE_WORDS.md)
- Different filler phrases: Edit your personality's filler_phrases in the YAML file
- Improved lip sync: Adjust syllable detection in ppm_generator.py
- Alternative PPM timing: Modify timing parameters in PPMGenerator.__init__()
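If you want to experiment with syllable detection before touching ppm_generator.py, a simple vowel-group heuristic is one place to start. The sketch below is not the project's implementation; the names `estimate_syllables` and `syllables_per_word` and the heuristic itself are assumptions for illustration, and the heuristic will miscount some English words.

```python
import re

# Hypothetical helper, not taken from ppm_generator.py: runs of vowels
# (including "y") are treated as one syllable nucleus each.
_VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Estimate syllables in one word; returns 0 if it contains no letters."""
    cleaned = re.sub(r"[^a-z]", "", word.lower())
    if not cleaned:
        return 0
    count = len(_VOWEL_GROUPS.findall(cleaned))
    # A trailing silent "e" (as in "make") usually adds no syllable,
    # while "-le" endings (as in "table") usually do.
    if cleaned.endswith("e") and not cleaned.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def syllables_per_word(sentence: str) -> list[int]:
    """Return one syllable estimate per whitespace-separated word."""
    return [estimate_syllables(word) for word in sentence.split()]
```

A per-word count like this could be used to decide how many mouth movements to schedule across a word's duration. Phoneme-based approaches would be more precise.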
Target latencies (typical):
- Wake word detection: <500ms
- Filler phrase playback: Immediate (0.5-second natural pause + 8-15 second filler)
- Speech transcription: 1-2 seconds (during filler)
- GPT-4o-mini response: 1-2 seconds (during filler)
- TTS synthesis: 1-2 seconds (during filler)
- Total response time: Feels nearly instant because the filler masks the wait; actual processing takes 4-6 seconds
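To see why the filler hides the processing delay, you can add up the per-stage targets and compare the worst case with the shortest filler window. The sketch below does that arithmetic. It assumes the three stages run one after another, which is a simplification, and the names `STAGES`, `FILLER_WINDOW`, `processing_range`, and `hidden_by_filler` are invented for the example.

```python
# Stage latencies in seconds (best case, worst case), copied from the targets above.
STAGES = {
    "speech_to_text": (1.0, 2.0),
    "llm_response": (1.0, 2.0),
    "tts_synthesis": (1.0, 2.0),
}

# Natural pause plus filler clip (shortest, longest), also from the targets above.
FILLER_WINDOW = (0.5 + 8.0, 0.5 + 15.0)


def processing_range(stages):
    """Sum best-case and worst-case latencies, assuming sequential stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


def hidden_by_filler(stages, filler_window):
    """True if worst-case processing fits inside the shortest filler window."""
    _, worst = processing_range(stages)
    return worst <= filler_window[0]
```

With these figures, `processing_range(STAGES)` gives `(3.0, 6.0)` and `hidden_by_filler(STAGES, FILLER_WINDOW)` is `True`: even the 6-second worst case fits inside the shortest 8.5-second window. The straight sum starts at 3 seconds rather than 4, so the quoted 4-6 second range may include overhead that is not itemized in the list.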
OpenAI API usage (approximate):
- Whisper: $0.006 per minute of audio
- GPT-4o-mini: $0.001-0.005 per conversation turn (much cheaper than GPT-4o)
- TTS: $0.015 per 1000 characters
Typical conversation (10 exchanges): ~$0.20-0.40
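To estimate costs for your own usage pattern, you can plug the rates above into a small calculator. This is a rough sketch: the rates are the approximate figures listed here and may be out of date, and the `Usage` fields and `estimate_cost` function are hypothetical names for the example. The result depends heavily on how much audio is transcribed and how long the spoken replies are, so it will not necessarily match the typical figure quoted above.

```python
from dataclasses import dataclass

# Approximate rates in USD, copied from the list above.
WHISPER_PER_MINUTE = 0.006
TTS_PER_1000_CHARS = 0.015
LLM_PER_TURN = (0.001, 0.005)  # (low, high) per conversation turn


@dataclass
class Usage:
    """Hypothetical usage totals for one conversation; adjust to your own logs."""
    turns: int
    audio_minutes: float  # total microphone audio sent to Whisper
    tts_characters: int   # total characters synthesized to speech


def estimate_cost(usage: Usage) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD for the given usage."""
    fixed = (
        usage.audio_minutes * WHISPER_PER_MINUTE
        + usage.tts_characters / 1000 * TTS_PER_1000_CHARS
    )
    low = fixed + usage.turns * LLM_PER_TURN[0]
    high = fixed + usage.turns * LLM_PER_TURN[1]
    return low, high


if __name__ == "__main__":
    # Placeholder numbers, not measurements from this project.
    low, high = estimate_cost(Usage(turns=10, audio_minutes=2.0, tts_characters=5000))
    print(f"Estimated cost: ${low:.3f} to ${high:.3f}")
```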
This project is named after J.F. Sebastian, the genetic designer from Blade Runner (1982) who creates synthetic companions in his lonely apartment. Like Sebastian, this project is about bringing personality and life to inanimate friends.
"I think, Sebastian, therefore I am." — Pris
This project is licensed under the MIT License - see the LICENSE file for details.
Teddy Ruxpin is a trademark of Wicked Cool Toys. This project is not affiliated with or endorsed by Wicked Cool Toys.
- Named after J.F. Sebastian from Blade Runner (1982)
- Built with Python, OpenAI APIs, and OpenWakeWord
- Inspired by the classic 1985 Teddy Ruxpin animatronic
- Uses VADER sentiment analysis, syllable-based lip sync, and Silero VAD
- PPM format based on analysis of Svengali and original Teddy Ruxpin tapes
Contributions welcome! Areas for improvement:
- Additional personalities (scientists, artists, historians, etc.)
- New output device types (Cricket, Grubby, other animatronics)
- Local wake word detection (OpenWakeWord)
- Local Whisper (whisper.cpp)
- Phoneme-based lip sync (more precise than syllables)
- Web interface for monitoring conversations
- Real-time PPM waveform visualization
- Multi-language support
For issues or questions:
- Check the Troubleshooting section
- Review logs in jf_sebastian.log
- Enable debug mode for detailed diagnostics
- Open an issue on GitHub
"It's not an easy thing to meet your maker." — Roy Batty
Make friends. Make them talk. 🤖✨