Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
Updated May 23, 2026 (C++)
An Android inference engine that runs 20B+ parameter LLMs on devices with 4-8 GB of RAM. It features proprietary Layer-by-Layer (LBL) streaming, zero-copy mmap loading, and a native C++/Kotlin architecture.
Convert and quantize LLMs.
Splinter is an AI research project that co-locates inference and semantic governance in L3 cache and memory lanes, while also attempting to provide standardized, POSIX-friendly local tooling as building blocks on top of its library.
Build your own digital brain on your computer.
A simple Gradio app for local translation using the GGUF versions of MADLAD-400
Privacy-first local RAG server: chat with PDF and DOCX files using GGUF models via llama.cpp and Qdrant. A lightweight, standalone FastAPI server with a clean HTML UI for fully offline document intelligence. No Ollama, no cloud, no API keys.
Nectar-X-Studio is a local AI inference application that lets users download, create, and run agents and large language models on their own machine. Because no internet connection is required, Nectar keeps inference private while using open-source models from Hugging Face, Ollama, and other sources.
A containerized LLM for any use case, big or small.
AI tool to help users research using local LLMs and automated web search.
GGUF file format support for .NET.
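Every project listed here starts by reading GGUF files, so a small illustration of what "GGUF support" begins with may help. The sketch below parses only the fixed header at the start of a GGUF file: the 4-byte magic `GGUF`, a little-endian `uint32` version, and two little-endian `uint64` counts (tensors and metadata key/value pairs). It assumes the version 2+ layout described in the ggml project's GGUF notes and deliberately rejects older files. `GgufHeader` and `parse_gguf_header` are hypothetical names for illustration, not the API of any repository above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Fixed-size header at the start of a GGUF file (assumed v2+ layout).
struct GgufHeader {
    uint32_t version;
    uint64_t tensor_count;       // number of tensor descriptors that follow
    uint64_t metadata_kv_count;  // number of metadata key/value pairs that follow
};

// Reads a little-endian unsigned integer of type T starting at `offset`.
// The caller must ensure buf holds at least offset + sizeof(T) bytes.
template <typename T>
T read_le(const std::vector<uint8_t>& buf, std::size_t offset) {
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value |= static_cast<T>(buf[offset + i]) << (8 * i);
    }
    return value;
}

// Returns the parsed header, or std::nullopt if the buffer is too short,
// the magic bytes are not "GGUF", or the version is older than 2
// (this sketch only handles the 64-bit count layout).
std::optional<GgufHeader> parse_gguf_header(const std::vector<uint8_t>& buf) {
    constexpr std::size_t kHeaderSize = 4 + 4 + 8 + 8;
    if (buf.size() < kHeaderSize) return std::nullopt;
    if (std::memcmp(buf.data(), "GGUF", 4) != 0) return std::nullopt;

    GgufHeader h{};
    h.version = read_le<uint32_t>(buf, 4);
    if (h.version < 2) return std::nullopt;
    h.tensor_count = read_le<uint64_t>(buf, 8);
    h.metadata_kv_count = read_le<uint64_t>(buf, 16);
    return h;
}
```

A real loader would go on to read the metadata key/value pairs and tensor descriptors that follow, and projects advertising zero-copy loading would typically run this over an mmap-ed region rather than a copied `std::vector`.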