A collection of scripts and examples for running LLM-based pipelines and Retrieval-Augmented Generation (RAG) experiments on local hardware. This repository contains utilities for downloading models, creating embeddings, building local retrieval stores, and running simple RAG demos for text, SQL, images, audio, and video.
TL;DR
- Try the quickstart to run a local RAG demo.
- Explore the example scripts to adapt pipelines for your models and storage backend.
Repository contents:
- localrag.py — Minimal local RAG demo / coordinator script.
- llmforsql.py — Example integration for LLMs and SQL.
- sql-and-rag/ — SQL + RAG examples and helpers.
- videorag.py — Example pipeline for video → embeddings → RAG.
- qwenvisionlanguagemodel.py — Vision + language example for Qwen-like models.
- hfdownloader/ — Utilities for downloading models from Hugging Face using your HF token.
- milvusdb/ — Example / helpers for Milvus vector DB integration.
- vision rag/ — Image/vision RAG examples.
- docs&imagestovoiceast.py — Image and document RAG with voice output and a reranker model.
- qwen3multimediaembeddings.ipynb — Notebook for multimedia embeddings (a single embedding model for text, image, and video, rather than a separate pipeline for each).
- graphrag-langextract-vllm/ — An implementation of GraphRAG using the langextract library and vLLM.
Note: The repository contains multiple example scripts. Read the top of each script for its required dependencies and configurable options; running the scripts in Colab is advisable.

Requirements
- Python 3.9+
- Typical Python dependencies (install per-script or project requirements). Common packages used in this ecosystem:
- torch
- transformers
- sentence-transformers or other embedding libs
- faiss-cpu or a vector DB client (Milvus client if using Milvus)
- numpy, pandas, torchvision (for vision examples)
- langchain
- langgraph
- GPU recommended for larger models
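
For convenience, these could be consolidated into a requirements.txt along the following lines (a sketch only: versions are unpinned, and individual scripts may need extras noted at their top):

```text
torch
transformers
sentence-transformers
faiss-cpu
numpy
pandas
torchvision
langchain
langgraph
```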
Quickstart
- Clone the repo:
  ```bash
  git clone https://github.com/Dhanush-sai-reddy/llm-runtime-local.git
  cd llm-runtime-local
  ```
- Install dependencies (example):
  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -U pip
  pip install torch transformers sentence-transformers faiss-cpu numpy
  ```
- Download a model or weights
  - Use the scripts under hfdownloader/ or your preferred method to fetch model weights, then point the example scripts at the local model paths (locally or in Colab). A minimal alternative sketch follows.
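
  If you want a quick alternative to the hfdownloader/ utilities, here is a minimal sketch using huggingface_hub (the repo id and target directory are placeholders, and this is not how hfdownloader/ works internally — that one uses Docker):

  ```python
  # Minimal model download sketch using huggingface_hub.
  # Assumes: pip install huggingface_hub; repo id and local dir are placeholders.
  from huggingface_hub import snapshot_download

  local_path = snapshot_download(
      repo_id="Qwen/Qwen2-0.5B-Instruct",  # placeholder model; swap for yours
      local_dir="./models/qwen2-0.5b",     # where the weights land
      token=None,                          # set your HF token for gated/private models
  )
  print(f"Model files downloaded to: {local_path}")
  ```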
- Prepare data & embeddings
  - Run a script or notebook (e.g., qwen3multimediaembeddings.ipynb) to generate embeddings and store them in a vector index (FAISS, Milvus, etc.). A text-only sketch of this step follows.
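
  As a rough illustration, here is a minimal text-only embed-and-index sketch using sentence-transformers and faiss-cpu (the model name, documents, and index path are all placeholders; the repo's notebooks cover the multimedia case):

  ```python
  # Embed a few documents and store them in a FAISS index (text-only sketch).
  import faiss
  import numpy as np
  from sentence_transformers import SentenceTransformer

  docs = ["Milvus is a vector database.", "FAISS indexes dense vectors."]  # sample corpus
  model = SentenceTransformer("all-MiniLM-L6-v2")  # small placeholder embedding model

  embeddings = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)
  index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine after normalization
  index.add(np.asarray(embeddings, dtype=np.float32))
  faiss.write_index(index, "docs.faiss")  # persist for the RAG demo to load
  ```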
- Run a local RAG demo:
  ```bash
  python localrag.py
  ```
  Check the top of the script for available flags (model path, index path, etc.).
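
For orientation, here is a hedged sketch of the retrieve-then-generate loop a demo like this typically performs (the index and corpus come from the previous step; the generator model and prompt format are placeholders, not necessarily what localrag.py actually does):

```python
# Retrieve top-k context from the FAISS index, then ask a local LLM to answer.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

docs = ["Milvus is a vector database.", "FAISS indexes dense vectors."]  # same corpus as above
embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("docs.faiss")

query = "What does FAISS do?"
q_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(q_vec, dtype=np.float32), 1)  # top-1 neighbor
context = "\n".join(docs[i] for i in ids[0])

generator = pipeline("text-generation", model="Qwen/Qwen2-0.5B-Instruct")  # placeholder LLM
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(generator(prompt, max_new_tokens=100)[0]["generated_text"])
```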
Notes:
- For Milvus usage, see the milvusdb/ helper files and ensure the Milvus server is running before connecting. A minimal connectivity check follows.
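
For a quick smoke test of the connection, here is a minimal pymilvus sketch (the URI is the default standalone address, and the collection name and dimension are placeholders — 384 matches the all-MiniLM-L6-v2 embeddings in the earlier sketch; see milvusdb/ for the repo's own helpers):

```python
# Verify a Milvus server is reachable and create a throwaway collection.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # default standalone address
client.create_collection(collection_name="rag_demo", dimension=384)  # dim must match your embeddings
print(client.list_collections())  # should include "rag_demo" if the server is up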
Pipeline overview:
- hfdownloader/ → download model weights (uses Docker)
- Embedding scripts / notebooks → create dense vectors for documents or multimedia
- Vector DB (FAISS / Milvus) → store and index embeddings
- localrag.py / videorag.py → query embeddings, fetch context, and run the local LLM to synthesize answers
Contributions welcome. Suggested workflow:
- Fork the repo
- Create a branch, e.g., feat/readme-improvements
- Make changes and submit a PR with a clear description and examples
Troubleshooting:
- Model download errors: check authentication for private Hugging Face models and watch for timeouts on large files.
- OOM on large models: use smaller weights, enable CPU offload, or apply quantization (e.g., bitsandbytes); a minimal 4-bit load sketch follows this list.
- Vector DB connection problems: confirm the DB server is running and the client versions are compatible.
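
For the OOM case, here is a minimal 4-bit load sketch with transformers and bitsandbytes (the model name is a placeholder; this assumes a CUDA GPU and `pip install bitsandbytes accelerate`):

```python
# Load a causal LM in 4-bit to cut memory use (NVIDIA GPU required).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2-7B-Instruct"  # placeholder; use whatever you downloaded
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 while weights stay 4-bit
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across GPU/CPU as needed
)
```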
- Repo owner: Dhanush-sai-reddy — open an issue for questions or feature requests.