JARVIS is an AI-powered personal assistant designed to bridge the gap between intelligence and execution. Unlike conventional assistants that stop at conversation, JARVIS extends its capabilities into your local operating system - opening applications, managing system controls, navigating the web, and responding visually to the world around it.
At its core, JARVIS is a modular, locally-aware AI system that combines cloud-level reasoning with on-device control. It listens, understands, decides, and acts, bringing the concept of a truly functional AI assistant closer to reality.
| Category | Technologies |
|---|---|
| Programming Languages | |
| AI / LLM | |
| Vision Models | |
| Memory / Vector DB | |
| Web Search | |
| Backend Framework | |
| Frontend Framework | |
| Local Automation | |
| Tools | |
| Deployment / Runtime | |
JARVIS operates on a modular architecture designed for speed, privacy, and extensibility.
The system is divided into three core components: the AI Brain Layer, Vision Module, and Local Agent.
graph TD
User([👤 User Input]) --> Brain
Brain[🧠 AI Brain Layer] -->|Text Query| Chat[💬 Response]
Brain -->|Visual Data| Vision[👁️ Vision Module]
Brain -->|System Command| Agent[⚙️ Local Agent]
Vision --Description--> Brain
Agent --Action--> OS[💻 System / OS]
JARVIS is built on three primary components that work together to understand, analyze, and execute user commands.
The intelligence core that thinks, decides, and orchestrates actions.
Powered by: Groq LLM
The AI Brain acts as the central decision engine. It interprets user input, understands intent, and determines the appropriate course of action.
Responsibilities:
- Natural Language Understanding: Interprets complex and conversational queries.
- Decision Engine: Chooses whether to respond, analyze visuals, or execute a command.
- Task Routing: Directs requests to the Vision Module or Local Agent.
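The decision and routing responsibilities above can be sketched as a small function. This is only an illustration under stated assumptions: the `Route`, `Request`, and `route` names and the keyword list are hypothetical, and the actual project presumably lets the Groq LLM infer intent rather than matching prefixes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    CHAT = "chat"      # answer directly with text
    VISION = "vision"  # hand the image to the Vision Module
    AGENT = "agent"    # forward a command to the Local Agent


@dataclass
class Request:
    text: str
    image_path: Optional[str] = None  # set when the user attaches an image


# Hypothetical trigger phrases, used here in place of LLM-based intent detection.
COMMAND_PREFIXES = ("open ", "close ", "launch ", "set volume", "set brightness")


def route(request: Request) -> Route:
    """Pick a destination for a request using simple, illustrative rules."""
    if request.image_path is not None:
        return Route.VISION
    if request.text.strip().lower().startswith(COMMAND_PREFIXES):
        return Route.AGENT
    return Route.CHAT
```

For example, `route(Request("Open Notepad"))` yields `Route.AGENT`, while a plain question with no image falls through to `Route.CHAT`.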
The perception system that allows JARVIS to "see" and understand images.
Powered by: BLIP (Image Captioning)
This module converts visual input into meaningful textual descriptions that the AI Brain can process and reason about.
Capabilities:
- Image-to-Text Conversion: Generates accurate image descriptions.
- Visual Question Answering (VQA): Answers questions based on image content.
- Scene Analysis: Identifies objects, context, and relationships.
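A minimal sketch of the image-to-text step, assuming the Hugging Face `transformers` implementation of BLIP and the `Salesforce/blip-image-captioning-base` checkpoint. The source only says "BLIP", so the checkpoint, the function names, and the prompt format passed to the AI Brain are assumptions.

```python
def caption_image(
    image_path: str,
    model_name: str = "Salesforce/blip-image-captioning-base",  # assumed checkpoint
) -> str:
    """Return a short caption for the image at image_path."""
    # Imported lazily so this module still loads where the packages are absent.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def build_brain_prompt(caption: str, question: str) -> str:
    """Combine the caption with the user's question (hypothetical format)."""
    return f"Image description: {caption}\nUser question: {question}"
```

In a long-running server the processor and model would normally be loaded once and reused, since loading them on every call is slow.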
The action layer that interacts directly with the operating system.
Runs on: Local Host (Low latency & secure)
The Local Agent transforms decisions into real system actions, enabling JARVIS to control applications and system settings.
Functions:
- Application Control: Open, close, and manage programs.
- System Adjustments: Modify volume, brightness, and power settings.
- Web Automation: Launch websites and perform browser tasks.
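The functions above could be wired together with a small command dispatcher. The sketch below uses only the Python standard library. The command shape (`{"action": ..., "target": ...}`) and the handler names are assumptions for illustration and do not describe the project's actual `os_controller.py`.

```python
import subprocess
import webbrowser
from typing import Callable, Dict


def open_website(url: str) -> str:
    """Open a URL in the default browser."""
    webbrowser.open(url)
    return f"opened {url}"


def open_application(executable: str) -> str:
    """Start a program without waiting for it to exit."""
    subprocess.Popen([executable])
    return f"started {executable}"


# Maps an action name to the function that performs it (names are illustrative).
HANDLERS: Dict[str, Callable[[str], str]] = {
    "open_website": open_website,
    "open_app": open_application,
}


def execute(command: dict, handlers: Dict[str, Callable[[str], str]] = HANDLERS) -> dict:
    """Run one {"action": ..., "target": ...} command and report the result."""
    handler = handlers.get(command.get("action", ""))
    if handler is None:
        return {"ok": False, "error": f"unknown action: {command.get('action')}"}
    try:
        return {"ok": True, "result": handler(command.get("target", ""))}
    except OSError as exc:  # e.g. the executable was not found
        return {"ok": False, "error": str(exc)}
```

Restricting execution to a fixed table of handlers, rather than passing arbitrary strings to a shell, keeps the set of actions the agent can take explicit.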
JARVIS/
├── backend/                      # Main Server & Logic
│   ├── brain/                    # AI Intelligence Modules
│   │   ├── llm_services.py       # Connects to LLM (Groq/Ollama)
│   │   ├── local_multimodal.py   # Image recognition logic
│   │   ├── memory_manager.py     # Handles chat history & context
│   │   ├── speech_services.py    # STT and TTS handlers
│   │   └── web_search.py         # Google Search integration
│   ├── chroma_db/                # Vector Database for Long-term memory
│   ├── main.py                   # FastAPI Entry Point (Run this to start)
│   ├── auth.py                   # User Authentication & Security
│   ├── agent.exe                 # Compiled Local Agent executable
│   └── users.db                  # User database
│
├── frontend/                     # User Interface (React + Vite)
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatInterface.tsx # Main chat window
│   │   │   ├── Login.tsx         # Authentication screen
│   │   │   └── Sidebar.tsx       # Chat history navigation
│   │   ├── api.ts                # Connection to Backend
│   │   ├── App.tsx               # Main Application Layout
│   │   └── main.tsx              # Frontend Entry Point
│   ├── package.json
│   └── vite.config.ts
│
├── local_agent/                  # OS Control Source Code
│   ├── agent.py                  # Websocket client for OS commands
│   └── os_controller.py          # Logic to open apps/control system
│
├── voices/                       # Audio Assets
│   └── jarvis_voice.wav          # Reference audio for voice cloning
│
├── requirements.txt              # Python Dependencies
└── README.md                     # Project Documentation
JARVIS can be accessed by the link. However, voice input and image captioning are not available on the hosted site because of memory limits on the hosting backend. Follow the steps below to run the Local Agent for local device tasks on Windows.
Download the agent.exe file.
Run it by double-clicking the downloaded exe file or from a command terminal:
.\agent.exe #or the location of the downloaded file
Follow the steps below to run the project locally and use all its features.
git clone https://github.com/your-username/JARVIS.git
cd JARVIS
cd backend
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root directory and add:
GROQ_API_KEY=your_groq_api_key
SERPER_API_KEY=your_serper_api_key
python main.py
Start a new terminal and move to the frontend directory:
cd frontend
Install dependencies and run:
npm install
npm run dev
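If the backend fails to start, a common cause is a missing API key. The helper below is a hypothetical sketch, not part of the project. It checks that the two keys from the .env step are present, assuming the .env values have already been loaded into the process environment (for example by a dotenv loader, which is itself an assumption about how the backend reads the file).

```python
import os
from typing import List, Mapping

# The two keys named in the .env step of the setup instructions.
REQUIRED_KEYS = ("GROQ_API_KEY", "SERPER_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` before starting the server and stopping with a clear message when the list is non-empty makes a misconfigured environment easier to diagnose than a failed API call later on.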