Skip to content

Refactor: migrate core stack from OpenAI to Groq and Edge-TTS#875

Closed
tmvalijib24 wants to merge 1 commit into
Shubhamsaboo:mainfrom
tmvalijib24:refactor/migrate-to-groq-edgetts
Closed

Refactor: migrate core stack from OpenAI to Groq and Edge-TTS#875
tmvalijib24 wants to merge 1 commit into
Shubhamsaboo:mainfrom
tmvalijib24:refactor/migrate-to-groq-edgetts

Conversation

@tmvalijib24

Copy link
Copy Markdown

📝 Summary

This PR completely refactors the Voice RAG Agent to remove all dependencies on the paid OpenAI API, transitioning the project to a 100% free and lightning-fast stack.

The text generation "brain" is now powered by Groq (using the llama-3.1-8b-instant model), and the audio generation "voice" is now handled by edge-tts (Microsoft Edge's free Neural TTS API). Additionally, this refactor simplifies the application architecture by removing the reliance on the custom local agents module, handling the API logic directly within the main script for better stability and readability.

🛠️ Key Changes

  • LLM Migration: Replaced the openai SDK with the groq Python client.
  • Model Upgrade: Configured the agent to use llama-3.1-8b-instant for high-speed, large-context RAG generation.
  • TTS Migration: Replaced OpenAI's TTS with edge-tts for asynchronous, high-quality audio generation without requiring API keys.
  • Architecture Simplification: Removed imports and dependencies on the local agents.py (Agent/Runner classes), integrating the prompting logic directly into process_query().
  • UI Updates: Updated the Streamlit sidebar configuration fields to prompt for a Groq API Key instead of an OpenAI key. Voice selection options have been updated to match edge-tts formats (e.g., en-US-ChristopherNeural).
  • Documentation: Completely rewrote the README.md to accurately reflect the zero-cost architecture, updated environment variables, and new setup instructions.

⚠️ Breaking Changes

  • Environment Variables: OPENAI_API_KEY is no longer used. Users must generate a free API key from the Groq Console and update their .env file to use GROQ_API_KEY.
  • Local Modules: The local agents module is deprecated in this branch.

✅ Testing / Verification

  1. Pulled the branch and installed new dependencies (pip install groq edge-tts).
  2. Passed the Groq API key via the Streamlit sidebar.
  3. Successfully uploaded and processed a PDF into Qdrant.
  4. Queried the document, verified that Groq generated an accurate text response based on the context.
  5. Verified that edge-tts successfully generated and streamed the .mp3 audio response in the Streamlit UI.

Images

image image image
rag_voice.webm

- Replace OpenAI API with Groq client using 'llama-3.1-8b-instant'.
- Replace paid OpenAI TTS with 'edge-tts' for free audio synthesis.
- Remove reliance on custom local 'agents' module for stability.
@0xreconlion

Copy link
Copy Markdown

excellent work

Copy link
Copy Markdown
Owner

Thanks for the thorough work here, the migration is clearly well executed.

That said, we're going to pass on this one. The Voice RAG Agent is an existing tutorial built deliberately around the OpenAI stack, and this PR swaps out the entire core (LLM, TTS, and the local agents module) to a different stack. That changes the intent of a shipped example rather than adding something new, and we'd rather keep the original tutorial as-is for people following it.

If you'd like to contribute a Groq + edge-tts voice agent, a brand new self-contained example in its own folder would be very welcome instead of replacing this one. Closing this for now.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants