Welcome to the Jarvis AI Assistant project! This AI-powered assistant can perform various tasks such as providing weather reports, summarizing news, sending emails, CAG, and more, all through voice commands. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant.
- Voice Activation: Activates listening mode.
- Speech Recognition: Recognizes and processes user commands via speech input.
- AI Responses: Provides responses using AI-generated text-to-speech output.
- Task Execution: Handles multiple tasks, including:
  - Sending emails
  - Summarizing weather reports
  - Data analysis using CSV files
  - Personalized chat
  - Reading news headlines
  - Image generation
  - Database functions
  - Phone call automation using ADB
  - AI-based task execution
  - Automating websites and applications
  - Image processing using Gemini (image source: upload, URL, or camera; actions: basic detection, object detection, segmentation, resize)
  - Retrieval-Augmented Generation (RAG) for knowledge-based interactions on various topics
- Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity.
- Automatic Input Processing: If no "stop" command is detected within 60 seconds, input is finalized and sent to the AI model for processing.
- Multiple Function Calls: Call multiple functions simultaneously, even if their inputs and outputs are unrelated.
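The two timeouts in the feature list (5 minutes of inactivity, 60 seconds without a "stop" command) can be sketched as a small state tracker. This is a minimal illustration, not the project's actual code: the class `ListeningSession`, its methods, and the injectable `clock` parameter are all hypothetical names chosen for this example.

```python
import time


class ListeningSession:
    """Tracks the idle and input timeouts (hypothetical helper, not project code)."""

    IDLE_TIMEOUT = 5 * 60   # seconds of inactivity before listening mode turns off
    INPUT_TIMEOUT = 60      # seconds to wait for "stop" before finalizing input

    def __init__(self, clock=time.monotonic):
        self._clock = clock              # injectable so the logic can be tested
        self.active = True
        self._last_activity = clock()
        self._input_started = None       # time the current utterance began
        self._chunks = []

    def hear(self, text):
        """Record recognized speech; return the finalized input on "stop", else None."""
        now = self._clock()
        self._last_activity = now
        if self._input_started is None:
            self._input_started = now
        if text.strip().lower() == "stop":
            return self._finalize()
        self._chunks.append(text)
        return None

    def tick(self):
        """Call periodically; returns finalized input if the 60 s window elapsed."""
        now = self._clock()
        if self._input_started is not None and now - self._input_started >= self.INPUT_TIMEOUT:
            return self._finalize()
        if now - self._last_activity >= self.IDLE_TIMEOUT:
            self.active = False
        return None

    def _finalize(self):
        # Join the buffered chunks and reset for the next utterance.
        text = " ".join(self._chunks)
        self._chunks = []
        self._input_started = None
        return text
```

A main loop would call `hear()` for every recognized phrase and `tick()` on a timer, passing any returned text to the AI model and leaving listening mode once `active` becomes `False`.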
 
Before running the project, ensure you have the following installed:
- Python 3.9 or later
- Required libraries (listed in requirements.txt)
1. Create a `.env` file in the root directory of the project. Add your API keys and other configuration variables to it:
  author_name="[email protected]"
  weather_link="https://rapidapi.com/weatherapi/api/weatherapi-com"
  news_link="https://newsapi.org"
  name="ganeshnikhil"
  Rag_model="granite3.1-dense:2b"
  Chat_model="granite3.1-dense:2b"
  Function_call_model="gemma3:4b"
  Text_to_info_model="gemma3:4b"
  Image_to_text="llava:7b"
  Embedding_model="nomic-embed-text"
  genai_key=""
  Sender_email="[email protected]"
  Receiver_email=""
  Password_email=""
  Weather_api=""
  News_api=""
  Country="in"
  DEVICE_IP=""
  CSV_PATH="./DATA/business-employment-data-dec-2024-quarter.csv"
  UI="on"
  Yt_path="./DATA/youtube_video/"

2. Install system requirements:
  bash ./intialize.sh
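To show how the `KEY="value"` entries in the `.env` file above can be read and checked before launch, here is a standard-library-only sketch. The project may well load its configuration differently (for example through a dotenv library); `load_env` and `missing_keys` are hypothetical helper names introduced for this example.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY="value" lines into a dict, skipping blank lines and comments."""
    config = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def missing_keys(config, required):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not config.get(key)]
```

Calling `missing_keys(load_env(), ["Weather_api", "News_api", "genai_key"])` before starting the assistant would list any of those keys that are still blank.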
3. Set up API keys and passwords:
- WEATHER API: Get weather data.
- NEWS API: Fetch latest news headlines.
- GMAIL PASSWORD: Generate an app password for sending emails.
- OLLAMA: Download models from Ollama (manual setup). Install the models:
  - `ollama run gemma3:4b`
  - `ollama run granite3.1-dense:2b`
  - `ollama pull nomic-embed-text`
- PORTAUDIO: Download PortAudio to work with sound.
- GEMINI AI: API access for function execution.
 
 
Details for gemma3:4b:
  Model
    architecture        gemma3    
    parameters          4.3B      
    context length      8192      
    embedding length    2560      
    quantization        Q4_K_M    
  Parameters
    stop           "<end_of_turn>"    
    temperature    0.1                
  License
    Gemma Terms of Use                  
    Last modified: February 21, 2024

Details for granite3.1-dense:2b:
  Model
    architecture        granite    
    parameters          2.5B       
    context length      131072     
    embedding length    2048       
    quantization        Q4_K_M     
  System
    Knowledge Cutoff Date: April 2024.    
    You are Granite, developed by IBM.    
  License
    Apache License               
    Version 2.0, January 2004

| Model | Inputs | Outputs | Optimized for |
|-------|--------|---------|---------------|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Our most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |

Clone the repository, install the dependencies, and start the UI:
  git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
  cd J.A.R.V.I.S.2.0
  pip install -r requirements.txt
  streamlit run ui.py

Transitioned to Gemini AI-powered function calling, allowing multiple function calls simultaneously for better efficiency! If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.
AI Model Used: Gemini AI
- Higher accuracy
- Structured data processing
- Reliable AI-driven interactions
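The fallback behaviour and the simultaneous function calls described above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `primary` and `fallback` stand in for the Gemini and Ollama planners, each assumed to return a list of `{"name": ..., "args": {...}}` dictionaries, and `registry` is a hypothetical mapping from function names to Python callables.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_calls(prompt, primary, fallback):
    """Ask the primary planner for calls; use the fallback if it fails or returns none."""
    try:
        calls = primary(prompt)
    except Exception:
        calls = None
    if not calls:
        calls = fallback(prompt)
    return calls or []


def run_calls(calls, registry):
    """Run independent calls concurrently; results come back in request order."""
    def run_one(call):
        func = registry[call["name"]]
        return func(**call.get("args", {}))

    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run_one, calls))
```

Threads suit this case because the calls are unrelated and mostly wait on network or device I/O, so none needs another's output.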
Retrieval-Augmented Generation (RAG) dynamically loads relevant markdown-based knowledge files based on the queried topic, reducing hallucinations and improving response accuracy.
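A rough sketch of topic-based loading is shown below. It uses plain keyword overlap as a stand-in for whatever matching the project really performs (the `Embedding_model` setting suggests embeddings are involved). The function names and the idea of a single directory of `.md` files are assumptions made for this example.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lowercase word set used for a crude topic match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_knowledge_files(query, knowledge_dir, top_k=1):
    """Rank markdown files by how many query words appear in their name or text."""
    query_words = tokenize(query)
    scored = []
    for path in sorted(Path(knowledge_dir).glob("*.md")):
        words = tokenize(path.stem) | tokenize(path.read_text(encoding="utf-8"))
        score = len(query_words & words)
        if score:
            scored.append((score, path.name))
    # Highest overlap first; ties are broken alphabetically by file name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


def build_prompt(query, knowledge_dir, top_k=1):
    """Prepend only the selected files to the question sent to the chat model."""
    names = select_knowledge_files(query, knowledge_dir, top_k)
    context = "\n\n".join(
        (Path(knowledge_dir) / name).read_text(encoding="utf-8") for name in names
    )
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Loading only the matching files keeps the prompt small, which matters for the 2B to 4B parameter models listed in the configuration.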
Integrated Android Debug Bridge (ADB) to enable voice-controlled phone automation!
- Make phone calls
- Open apps & toggle settings
- Access phone data & remote operations
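As an illustration of the phone actions listed above, the sketch below builds the `adb` command lines without running them. The helper names are hypothetical, and port 5555 is assumed because it is the conventional port for ADB over TCP/IP. The project's own wrappers may differ.

```python
def adb_connect_command(device_ip, port=5555):
    """Command to attach to a phone over Wi-Fi (port 5555 is an assumed default)."""
    return ["adb", "connect", f"{device_ip}:{port}"]


def adb_call_command(phone_number):
    """Command that starts a phone call through Android's CALL intent."""
    digits = "".join(ch for ch in phone_number if ch.isdigit() or ch == "+")
    if not digits:
        raise ValueError("phone number contains no digits")
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.CALL", "-d", f"tel:{digits}"]


def adb_open_app_command(package):
    """Command that launches an installed app by its package name."""
    return ["adb", "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"]
```

Passing one of these lists to `subprocess.run(cmd, check=True)` would execute it, provided `adb` is on the PATH and the phone has authorized debugging from this computer.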
Install ADB:
- Windows: `winget install --id=Google.AndroidSDKPlatformTools -e`
- Linux: `sudo apt install adb`
- Mac: `brew install android-platform-tools`

Planned improvements:
- Deeper mobile integration
- Advanced AI-driven automation
- Improved NLP-based command execution
- Multi-modal interactions (text + voice + image)

Stay tuned for future updates!

## Gemini Model Comparison
The following table provides a comparison of various Gemini models with respect to their rate limits:
| Model                                      | RPM  |    TPM    |  RPD  |
|--------------------------------------------|-----:|----------:|------:|
| **Gemini 2.0 Flash**                       |  15  | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview**          |  30  | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05**      |   2  | 1,000,000 |   50  |
| **Gemini 2.0 Flash Thinking Experimental** |  10  | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash**                       |  15  | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B**                    |  15  | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro**                         |   2  |   32,000  |   50  |
| **Imagen 3**                               |  --  |    --     |  --   |

- RPM: Requests per minute
- TPM: Tokens per minute
- RPD: Requests per day
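To stay within RPM and RPD limits like those in the table, a client can throttle itself before each request. The sketch below is one possible approach using a sliding window. `RequestLimiter` is a hypothetical class. It does not track TPM, and the service may count a "day" differently from the rolling 24 hours assumed here.

```python
import time
from collections import deque


class RequestLimiter:
    """Sliding-window limiter for requests per minute (RPM) and per day (RPD)."""

    def __init__(self, rpm, rpd, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self._clock = clock        # injectable so the logic can be tested
        self._stamps = deque()     # timestamps of requests in the last 24 hours

    def try_acquire(self):
        """Record a request and return True if both limits allow it, else False."""
        now = self._clock()
        # Forget requests older than 24 hours (assumed rolling day).
        while self._stamps and now - self._stamps[0] >= 86400:
            self._stamps.popleft()
        last_minute = sum(1 for stamp in self._stamps if now - stamp < 60)
        if last_minute >= self.rpm or len(self._stamps) >= self.rpd:
            return False
        self._stamps.append(now)
        return True
```

For example, `RequestLimiter(rpm=15, rpd=1500)` mirrors the Gemini 2.0 Flash row. When `try_acquire()` returns `False`, the caller can wait or route the request to the local Ollama model instead.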
 
The project focuses mostly on using small models and free API models to get accurate agentic behaviour, so that it can also run on low-spec systems.

