A Discord bot powered by Google's Gemini AI, capable of engaging in conversations, processing various media types, generating spoken responses, and more. This bot is currently under development.
- Multimodal: Understands text, images, audio, videos, and documents.
  - Enhanced video understanding for comprehensive analysis.
  - Web page analysis for real-time information access.
- Function Calling: Uses Gemini's function calling feature for robust tool usage.
- Memory: Long-term memory per user.
  - Only accessible by the respective user, in any server.
  - Saved locally.
  - Reset by prompting the bot to forget everything.
- Text-to-Speech: Generate native Discord voice messages. Supports different speech styles.
  - Request using natural language.
- Google Search & URL Context: Access Google Search or web URLs using native tools.
- Code Execution: Generate and run Python code to aid responses.
- Discord Event Management: Create and manage scheduled events directly within Discord.
- Image Generation: Generate new images based on textual descriptions.
- Project Diagnosis: Inspect the bot's own project structure and file contents.
- Context-Aware: Understands message reply chains along with any attachments.
- Dynamic Interaction: Adapts its responses if the user edits or deletes their messages.
- Re-evaluates messages if edited or deleted.
- Injects Discord environment context into prompts for grounded responses.
- Processes attachments from replied messages for complete understanding.
- Automatically generates concise, relevant titles for threads created from long responses.
- Direct Messages (DMs): Responds to every message sent in a direct message channel.
- Server Channels: Responds when mentioned (`@<BotName>`) or replied to with pinging enabled.
- React to the bot's message with the retry emoji 🔄 to regenerate its last response.
- React to your own message with the cancel emoji 🚫 to cancel a response that is currently being generated.
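
The trigger rules above can be sketched as two small decision functions. This is a minimal illustration and not the bot's actual code: `IncomingMessage`, `should_respond`, and `reaction_action` are hypothetical names, and a real handler would fill these flags from the Discord library's message and reaction objects.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_EMOJI = "🔄"
CANCEL_EMOJI = "🚫"


@dataclass
class IncomingMessage:
    """Hypothetical, library-agnostic view of a Discord message."""
    is_dm: bool
    mentions_bot: bool = False
    is_reply_with_ping: bool = False


def should_respond(msg: IncomingMessage) -> bool:
    """DMs always get a response; server messages need a mention or a pinging reply."""
    if msg.is_dm:
        return True
    return msg.mentions_bot or msg.is_reply_with_ping


def reaction_action(emoji: str, on_bot_message: bool, on_own_message: bool) -> Optional[str]:
    """Map a reaction to 'retry', 'cancel', or None when it should be ignored."""
    if emoji == RETRY_EMOJI and on_bot_message:
        return "retry"   # regenerate the bot's last response
    if emoji == CANCEL_EMOJI and on_own_message:
        return "cancel"  # stop a response that is still being generated
    return None
```

Keeping the rules in pure functions like these makes them easy to test without a live Discord connection.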
This section outlines the critical processes for setting up, configuring, and running the application.
- Python 3.10+
- FFmpeg
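
Both prerequisites can be checked before installing anything. The following is a minimal sketch using only the standard library; `missing_prerequisites` is a hypothetical helper and is not part of the project.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)


def missing_prerequisites(version_info=None, which=shutil.which) -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    current = tuple(version_info or sys.version_info)[:2]
    if current < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {current[0]}.{current[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the local machine.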
```bash
git clone https://github.com/KuroZantetsuken/Bard.git
cd Bard
# Set up a Virtual Environment (Recommended):
python3 -m venv .venv
source .venv/bin/activate
# Install Python Dependencies:
pip install -r requirements.txt
```
- Copy `example.env` to a new file named `.env`.
  ```bash
  cp example.env .env
  ```
- Open the `.env` file and fill in the required environment variables:
  - `DISCORD_BOT_TOKEN`: Your Discord bot token.
  - `GEMINI_API_KEY`: Your Google Gemini API key.
- Edit `personality.prompt.md` to define the bot's personality.
- `capabilities.prompt.md` is highly optimized for the bot's capabilities; take care when editing it.
- Discord Privileged Intents: Enable Presence Intent and Server Members Intent in the Discord Developer Portal.
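
A quick way to confirm that the two required variables are filled in is to parse the `.env` file and report anything missing. The sketch below assumes a plain `KEY=VALUE` file format; `parse_env_text` and `missing_vars` are hypothetical helpers, and the bot itself may load its configuration differently.

```python
from typing import Mapping

REQUIRED_VARS = ("DISCORD_BOT_TOKEN", "GEMINI_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: Mapping[str, str]) -> list:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_vars(parse_env_text(open(".env").read()))` returns an empty list once both values are set.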
This optional step is for pre-configuring the browser with custom settings, extensions, or other preferences. If you skip this, the project will automatically set up its own browser instance, but without any custom configurations.
```bash
python setup_browser.py
```
This script launches a Chromium browser, allowing you to manually configure extensions or settings. When you close the browser, a `data/browser` directory is created, preserving your custom setup.
To run the bot, execute the main application script.
```bash
python3 src/main.py
```
To streamline development, the bot supports hot-reloading, which automatically restarts the application when changes are detected in `.py`, `.env`, and `.prompt.md` files.
To run with hot-reloading enabled:
```bash
python3 src/hotload.py
```
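
To illustrate the idea behind hot-reloading, change detection for the watched file types can be sketched with simple polling of modification times. This is an assumption-laden sketch, not the contents of `src/hotload.py`: the names `snapshot` and `changed_files` are hypothetical, and the real script may rely on a file-watching library instead.

```python
import os

# File types named in this README as triggering a restart.
WATCHED_SUFFIXES = (".py", ".env", ".prompt.md")


def snapshot(root: str) -> dict:
    """Map each watched file under root to its modification time in nanoseconds."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .venv and .git.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(WATCHED_SUFFIXES):
                path = os.path.join(dirpath, name)
                state[path] = os.stat(path).st_mtime_ns
    return state


def changed_files(before: dict, after: dict) -> list:
    """Return paths that were added, removed, or modified between two snapshots."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

A reloader built on this would take a snapshot at a fixed interval and restart the bot process whenever `changed_files` returns a non-empty list.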