This project is a playground for building a Multimodal AI Assistant with a Flask backend and a React frontend. It currently supports real-time messaging, PDF file uploads for additional context, and speech input and output (speech-to-text and text-to-speech).
- Real-time messaging
- PDF file upload for additional context
- Speech support (speech-to-text and text-to-speech)
- System prompt that defines the AI assistant's behavior
- Conversation context stored in a session cookie
- Visual feedback ("thinking" dots) while the AI generates a response
- AI responses typed out progressively, giving the conversation a more human feel
- Backend: Python, Flask, Redis, Azure OpenAI GPT-4 Omni
- Frontend: React, Tailwind CSS, TypeScript
- Speech: Microsoft Cognitive Services Speech SDK
Create a .env file with the following variables:
FLASK_ENV=development
CHOKIDAR_USEPOLLING=true
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=your_azure_speech_region
REDIS_HOST=your_redis_host (default: localhost)
REDIS_PORT=your_redis_port (default: 6379)
REACT_APP_BACKEND_URL=http://localhost:5000
REACT_APP_AZURE_SPEECH_KEY=your_azure_speech_key
REACT_APP_AZURE_SPEECH_REGION=your_azure_speech_region
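The backend can read these variables at startup. A minimal sketch of that pattern is below; the `Config` class name and structure are illustrative (see `backend/app/config.py` for the real implementation), but the variable names and defaults mirror the `.env` keys above:

```python
import os

# Sketch of a settings object built from the .env variables listed above.
# The Config class itself is an assumption; only the key names and the
# documented Redis defaults come from this README.
class Config:
    AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", "")
    AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "")
    AZURE_SPEECH_KEY = os.getenv("AZURE_SPEECH_KEY", "")
    AZURE_SPEECH_REGION = os.getenv("AZURE_SPEECH_REGION", "")
    REDIS_HOST = os.getenv("REDIS_HOST", "localhost")  # default: localhost
    REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))  # default: 6379
```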
multimodal-ai-assistant
├── assets
│ └── ai-assistant-screenshot.png
├── backend
│ ├── app
│ │ ├── __init__.py
│ │ ├── chat.py
│ │ ├── config.py
│ │ ├── file.py
│ │ └── speech.py
│ └── requirements.txt
├── frontend
│ ├── public
│ │ └── index.html
│ └── src
│ ├── components
│ │ ├── Chat.tsx
│ │ ├── ChatInput.tsx
│ │ ├── Message.tsx
│ │ ├── FileUpload.tsx
│ │ └── Chat.css
│ ├── hooks
│ │ └── useSpeech.ts
│ ├── services
│ │ └── api.ts
│ ├── App.tsx
│ ├── index.tsx
│ ├── index.css
│ └── setupProxy.js
├── docker-compose.yml
├── .env
└── README.md
- Navigate to the backend directory.
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- Run the Flask application:
python run.py
- Navigate to the frontend directory.
- Install the required dependencies:
npm install
- Start the React application:
npm start
- Ensure Docker is installed and running on your machine.
- Navigate to the root directory of the project.
- Build and start the services using Docker Compose:
docker-compose up --build
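For orientation, a Compose file for this layout would typically declare three services. The sketch below is an assumption based on the directory tree and the default ports mentioned in this README, not the project's actual `docker-compose.yml`:

```yaml
# Sketch only: service names, images, and ports are assumptions inferred
# from the directory layout and the .env defaults above.
services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
  backend:
    build: ./backend
    ports:
      - "5000:5000"
    env_file: .env
    depends_on:
      - redis
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    env_file: .env
```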
The AI Assistant is initialized with a system prompt that defines its behavior and response style. This prompt ensures that the assistant provides concise and actionable insights during conversations.
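In practice this usually means prepending a `system` message to every request sent to Azure OpenAI. The sketch below illustrates the pattern; the prompt wording and the helper name are hypothetical, not the project's actual code:

```python
# Hypothetical prompt text; only the leading-system-message pattern
# comes from the description above.
SYSTEM_PROMPT = (
    "You are a helpful AI assistant. Keep answers concise and actionable."
)

def build_messages(history, user_message):
    """Prepend the system prompt, replay the stored history, then add the new turn."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_message}]
    )
```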
The AI Assistant maintains the conversation context by storing the conversation history in a session cookie. This allows the assistant to provide coherent and contextually relevant responses throughout the interaction.
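One consequence of cookie-backed storage is size: browsers cap a cookie at roughly 4 KB, so long conversations must be trimmed before they are written back. The helper below is a hypothetical sketch of that concern (the byte budget is an assumption, not the project's real limit):

```python
import json

MAX_COOKIE_BYTES = 3500  # assumed headroom under the ~4 KB browser cookie limit

def trim_history(history):
    """Drop the oldest turns until the serialized history fits in the cookie.

    Sketch only: the real session handling lives in the Flask backend.
    """
    while history and len(json.dumps(history).encode()) > MAX_COOKIE_BYTES:
        history = history[1:]
    return history
```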
sequenceDiagram
participant User
participant Frontend
participant Backend
participant AzureSpeech
participant AzureOpenAI
User->>Frontend: Voice Input
Frontend->>AzureSpeech: Recognize Speech
AzureSpeech->>Frontend: Speech to Text
Frontend->>Backend: Send Text Message
Backend->>AzureOpenAI: Send Message to OpenAI
AzureOpenAI->>Backend: OpenAI Response
Backend->>Frontend: Send Text Response
Frontend->>AzureSpeech: Synthesize Speech
AzureSpeech->>Frontend: Text to Speech
Frontend->>User: Audio Output
sequenceDiagram
participant User
participant Frontend
participant Backend
participant AzureOpenAI
User->>Frontend: Upload PDF
Frontend->>Backend: Send PDF File
Backend->>Backend: Extract Text from PDF
Backend->>AzureOpenAI: Send Text to OpenAI
AzureOpenAI->>Backend: OpenAI Response
Backend->>Frontend: Send Response
Frontend->>User: Display Response
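After the backend extracts the text, it has to fold that text into the prompt as context before calling Azure OpenAI. The helper below is a hypothetical sketch of that step (the character cap and the prompt wording are assumptions; the extraction itself is not shown):

```python
def build_pdf_context(pdf_text, question, max_chars=8000):
    """Fold extracted PDF text into the request as context.

    Sketch only: the 8000-character cap is an assumed safeguard against
    oversized prompts, not the project's actual limit.
    """
    context = pdf_text[:max_chars]
    return [
        {"role": "system", "content": f"Use this document as context:\n{context}"},
        {"role": "user", "content": question},
    ]
```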
This project is open-source and available under the MIT License.
