JARVIS

License: MIT

JARVIS is an AI-powered personal assistant designed to bridge the gap between intelligence and execution. Unlike conventional assistants that stop at conversation, JARVIS extends its capabilities into your local operating system: opening applications, managing system controls, navigating the web, and responding visually to the world around it.

At its core, JARVIS is a modular, locally aware AI system that combines cloud-level reasoning with on-device control. It listens, understands, decides, and acts, bringing the concept of a truly functional AI assistant closer to reality.



Tech Stack

| Category              | Technologies               |
|-----------------------|----------------------------|
| Programming Languages | Python                     |
| AI / LLM              | Groq                       |
| Vision Models         | BLIP                       |
| Memory / Vector DB    | ChromaDB                   |
| Web Search            | Serper                     |
| Backend Framework     | FastAPI                    |
| Frontend Framework    | React, TypeScript          |
| Local Automation      | os, subprocess, PyAutoGUI  |
| Tools                 | Git, VS Code               |
| Deployment / Runtime  | Uvicorn                    |

πŸ—οΈ System Architecture

JARVIS operates on a modular architecture designed for speed, privacy, and extensibility.
The system is divided into three core components: the AI Brain Layer, Vision Module, and Local Agent.

graph TD
    User([πŸ‘€ User Input]) --> Brain
    Brain[🧠 AI Brain Layer] -->|Text Query| Chat[πŸ’¬ Response]
    Brain -->|Visual Data| Vision[πŸ‘οΈ Vision Module]
    Brain -->|System Command| Agent[βš™οΈ Local Agent]
    
    Vision --Description--> Brain
    Agent --Action--> OS[πŸ’» System / OS]

🧩 Core Modules

JARVIS is built on three primary components that work together to understand, analyze, and execute user commands.


🧠 1. AI Brain Layer

The intelligence core that thinks, decides, and orchestrates actions.

Powered by: Groq LLM

The AI Brain acts as the central decision engine. It interprets user input, understands intent, and determines the appropriate course of action.

Responsibilities:

  1. Natural Language Understanding: Interprets complex and conversational queries.
  2. Decision Engine: Chooses whether to respond, analyze visuals, or execute a command.
  3. Task Routing: Directs requests to the Vision Module or Local Agent.
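
As a rough illustration of this routing step, here is a minimal sketch using Groq's official Python SDK. The model name, route labels, and the route() helper are assumptions for illustration, not the repository's actual code (which lives in backend/brain/llm_services.py):

import os
from groq import Groq  # Groq's official Python SDK

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def route(user_input: str) -> str:
    """Ask the LLM to classify a request into one of three routes."""
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumed model; the repo may pin another
        messages=[
            {"role": "system",
             "content": "Classify the user request as exactly one word: "
                        "chat, vision, or command."},
            {"role": "user", "content": user_input},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

# e.g. route("open spotify") -> "command"; the result decides which module acts next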

πŸ‘οΈ 2. Vision Module

The perception system that allows JARVIS to β€œsee” and understand images.

Powered by: BLIP (Image Captioning)

This module converts visual input into meaningful textual descriptions that the AI Brain can process and reason about.

Capabilities:

  1. Image-to-Text Conversion: Generates accurate image descriptions.
  2. Visual Question Answering (VQA): Answers questions based on image content.
  3. Scene Analysis: Identifies objects, context, and relationships.
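
BLIP is available through Hugging Face Transformers, so a minimal captioning sketch looks like the following. The checkpoint name is the standard public one and is an assumption here; the repository's backend/brain/local_multimodal.py may load a different one:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a pretrained BLIP captioning checkpoint (assumed; the repo may pin another)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image_path: str) -> str:
    """Turn an image into a short description the AI Brain can reason over."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)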

βš™οΈ 3. Local Agent

The action layer that interacts directly with the operating system.

Runs on: localhost (low latency and secure)

The Local Agent transforms decisions into real system actions, enabling JARVIS to control applications and system settings.

Functions:

  1. Application Control: Open, close, and manage programs.
  2. System Adjustments: Modify volume, brightness, and power settings.
  3. Web Automation: Launch websites and perform browser tasks.
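
Conceptually, these three functions map onto standard-library and PyAutoGUI calls such as the ones below. This is a hedged sketch, not the repository's os_controller.py; the app commands and key names are illustrative:

import subprocess
import webbrowser
import pyautogui

def open_app(command: str) -> None:
    subprocess.Popen(command, shell=True)  # launch without blocking the agent loop

def volume_up(steps: int = 5) -> None:
    for _ in range(steps):
        pyautogui.press("volumeup")  # PyAutoGUI exposes media keys like 'volumeup'

def open_site(url: str) -> None:
    webbrowser.open(url)  # hand the URL to the default browser

# e.g. open_app("notepad"), volume_up(), open_site("https://github.com")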

πŸ“‚ Project Structure

JARVIS/
β”œβ”€β”€ backend/                        # Main Server & Logic
β”‚   β”œβ”€β”€ brain/                      # AI Intelligence Modules
β”‚   β”‚   β”œβ”€β”€ llm_services.py         # Connects to LLM (Groq/Ollama)
β”‚   β”‚   β”œβ”€β”€ local_multimodal.py     # Image recognition logic
β”‚   β”‚   β”œβ”€β”€ memory_manager.py       # Handles chat history & context
β”‚   β”‚   β”œβ”€β”€ speech_services.py      # STT and TTS handlers
β”‚   β”‚   └── web_search.py           # Google Search integration
β”‚   β”œβ”€β”€ chroma_db/                  # Vector Database for Long-term memory
β”‚   β”œβ”€β”€ main.py                     # FastAPI Entry Point (Run this to start)
β”‚   β”œβ”€β”€ auth.py                     # User Authentication & Security
β”‚   β”œβ”€β”€ agent.exe                   # Compiled Local Agent executable
β”‚   └── users.db                    # User database
β”‚
β”œβ”€β”€ frontend/                       # User Interface (React + Vite)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ ChatInterface.tsx   # Main chat window
β”‚   β”‚   β”‚   β”œβ”€β”€ Login.tsx           # Authentication screen
β”‚   β”‚   β”‚   └── Sidebar.tsx         # Chat history navigation
β”‚   β”‚   β”œβ”€β”€ api.ts                  # Connection to Backend
β”‚   β”‚   β”œβ”€β”€ App.tsx                 # Main Application Layout
β”‚   β”‚   └── main.tsx                # Frontend Entry Point
β”‚   β”œβ”€β”€ package.json
β”‚   └── vite.config.ts
β”‚
β”œβ”€β”€ local_agent/                    # OS Control Source Code
β”‚   β”œβ”€β”€ agent.py                    # Websocket client for OS commands
β”‚   └── os_controller.py            # Logic to open apps/control system
β”‚
β”œβ”€β”€ voices/                         # Audio Assets
β”‚   └── jarvis_voice.wav            # Reference audio for voice cloning
β”‚
β”œβ”€β”€ requirements.txt                # Python Dependencies
└── README.md                       # Project Documentation
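
Since agent.py is described above as a websocket client for OS commands, a minimal sketch of that loop might look like the following, using the websockets package; the endpoint URL, message schema, and handler are assumptions for illustration only:

import asyncio
import json
import websockets  # third-party 'websockets' package

def handle_open_app(target: str) -> None:
    # stand-in for the real logic in os_controller.py
    print(f"would open: {target}")

async def listen(url: str = "ws://localhost:8000/ws/agent"):  # assumed endpoint
    async with websockets.connect(url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("action") == "open_app":  # hypothetical message schema
                handle_open_app(msg["target"])

asyncio.run(listen())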

πŸš€ How to Use JARVIS

JARVIS can be accessed via the hosted link. However, voice input and image captioning cannot be used on the hosted site due to memory limits on the hosting backend. Follow the steps below to run the Local Agent for local device tasks on Windows.

1️⃣ Download the Agent.exe file

Download the agent.exe file (it is located in the backend/ folder of this repository).

2️⃣ Run the Agent.exe file

Run it either by double-clicking the downloaded .exe file or from a command terminal:

.\agent.exe  # or the path to the downloaded file

πŸš€ How to Run JARVIS locally

Follow the steps below to run the project locally and use all its features.

1️⃣ Clone the Repository

git clone https://github.com/your-username/JARVIS.git
cd JARVIS

2️⃣ Move to the Backend Directory

cd backend

3️⃣ Create & Activate Virtual Environment

python -m venv venv
venv\Scripts\activate

4️⃣ Install Dependencies

pip install -r requirements.txt

5️⃣ Set Environment Variables

Create a .env file in the root directory and add:

GROQ_API_KEY=your_groq_api_key
SERPER_API_KEY=your_serper_api_key
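
Assuming the backend uses python-dotenv (a common choice; check requirements.txt), the server would load these keys at startup like so:

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file into the process environment
groq_key = os.getenv("GROQ_API_KEY")
serper_key = os.getenv("SERPER_API_KEY")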

6️⃣ Run the Backend Server

python main.py
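
Since the stack lists Uvicorn as the runtime and main.py is the FastAPI entry point, its launcher presumably resembles the following sketch; the host and port are assumptions:

import uvicorn
from fastapi import FastAPI

app = FastAPI()  # the real main.py defines the full JARVIS API on this app

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)  # assumed host/port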

7️⃣ Start the Frontend

Open a new terminal and move to the frontend directory:

cd frontend

Install the dependencies and start the dev server:

npm install
npm run dev

🀝 Contributors
