This project runs a local Large Language Model (LLM) using llama-cpp-python and serves as a foundation for building an intelligent agent (e.g., for database CRUD tasks) in a fully offline environment.
```
llm-agent/
├── .venv/                                    # Python virtual environment (created with `venv`)
├── models/
│   └── mistral-7b-instruct-v0.1.Q4_K_M.gguf  # Quantized Mistral model
├── app/
│   └── main.py                               # LLM inference interface (console)
└── requirements.txt                          # Project dependencies
```
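For orientation, `app/main.py` can be as small as the sketch below (the context size, `n_gpu_layers` value, and prompt handling are illustrative assumptions, not the repository's exact code):

```python
# app/main.py -- minimal console chat loop (illustrative sketch)
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,        # context window; lower it if memory is tight
    n_gpu_layers=32,   # layers offloaded to the GPU; tune for your card
    verbose=False,
)

def main() -> None:
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("exit", "quit"):
            break
        # Mistral-Instruct expects the [INST] ... [/INST] prompt format
        prompt = f"[INST] {user_input} [/INST]"
        output = llm(prompt, max_tokens=256, stop=["</s>"])
        print("LLM:", output["choices"][0]["text"].strip())

if __name__ == "__main__":
    main()
```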
Download the quantized model file from:
👉 [TheBloke/Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF)

Recommended model: `mistral-7b-instruct-v0.1.Q4_K_M.gguf`

Place it inside the `models/` directory.
Activate your virtual environment and run the chat script:
```powershell
.\.venv\Scripts\Activate
python app\main.py
```
Example interaction:
```
You: What is the capital of Japan?
LLM: The capital of Japan is Tokyo.
```
Type `exit` or `quit` to end the session.
The required packages are listed in `requirements.txt`. To install them:

```powershell
pip install -r requirements.txt
```
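For reference, a minimal `requirements.txt` for this setup might contain little more than the inference library (the version pin below is an assumption; use whichever release you have tested):

```
llama-cpp-python>=0.2.0
```

Note that the default PyPI wheel is CPU-only. For GPU acceleration, llama-cpp-python generally has to be built with CUDA enabled, for example (the exact CMake flag varies across versions, so check the llama-cpp-python docs for yours):

```powershell
$env:CMAKE_ARGS = "-DGGML_CUDA=on"
pip install llama-cpp-python --force-reinstall --no-cache-dir
```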
- Integrate FastAPI to expose an API for chat or database operations (a possible starting point is sketched after this list)
- Convert LLM into an autonomous agent (e.g., using a reasoning loop or planner)
- Add support for database CRUD task execution via natural language
- Explore LangChain and other agentic frameworks for advanced capabilities
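As a rough illustration of the FastAPI item above, a minimal wrapper around the local model could look like the sketch below (the module name `app/api.py`, the `/chat` endpoint, and the request schema are all hypothetical, not existing code):

```python
# app/api.py -- hypothetical FastAPI wrapper around the local model
from fastapi import FastAPI
from llama_cpp import Llama
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup and reuse it across requests
llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=32,  # tune for your GPU
)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # Wrap the user message in Mistral-Instruct's [INST] prompt format
    prompt = f"[INST] {req.message} [/INST]"
    output = llm(prompt, max_tokens=256, stop=["</s>"])
    return {"reply": output["choices"][0]["text"].strip()}
```

With `fastapi` and `uvicorn` installed, this could be served locally via `uvicorn app.api:app`.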
- Model: Mistral 7B Instruct (Quantized: Q4_K_M)
- Inference: Handled via `llama-cpp-python` with GPU acceleration (RTX 2060 compatible)
- Ensure your GPU has at least 6 GB of free VRAM to run the model comfortably
- Adjust `n_gpu_layers` in `main.py` based on your GPU specs
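As a rough guide to that last point, offloading is controlled by a single constructor argument (the layer counts below are illustrative starting points, not benchmarked values):

```python
from llama_cpp import Llama

# n_gpu_layers sets how many transformer layers run on the GPU:
# -1 offloads all of them, while smaller values keep more layers on the
# CPU and lower VRAM usage at the cost of speed. On a 6 GB card like the
# RTX 2060, a partial offload (roughly 24-32 layers for a Q4_K_M 7B model)
# is a plausible starting point -- measure on your own hardware.
llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_gpu_layers=24,  # lower this if you hit out-of-memory errors
)
```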