This chatbot is powered by the LLaMA 3 model and runs entirely on CPU using ctransformers. It is designed to be lightweight, efficient, and responsive, providing an interactive Gradio UI for seamless conversations.
- Runs on CPU – no GPU required, making it accessible on standard hardware
- Optimized with ctransformers – faster inference on CPUs
- Concise & direct responses – avoids unnecessary small talk
- Interactive Gradio UI – easy-to-use web interface
- Maintains chat history – context-aware responses
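To make context-aware responses concrete: each new request folds the prior turns into the prompt. A minimal sketch of how that can be done with the LLaMA 3 instruct template (the function name and system prompt below are illustrative, not taken from this project's `app.py`):

```python
# Sketch: fold prior (user, assistant) turns plus the new message into one
# LLaMA-3-style prompt string so the model sees the full conversation.
# Names and the default system prompt are illustrative assumptions.

def build_prompt(history, user_message,
                 system_prompt="You are a concise, helpful assistant."):
    parts = ["<|begin_of_text|>"]
    parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>")
    for user_turn, assistant_turn in history:
        parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_turn}<|eot_id|>")
        parts.append(f"<|start_header_id|>assistant<|end_header_id|>\n\n{assistant_turn}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # Cue the model to answer as the assistant.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Because the whole history is re-sent on every turn, the model can refer back to earlier questions without any server-side state beyond the chat log.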
To run this chatbot locally, follow these steps:
Ensure you have Python 3.8+ installed, then run:
```bash
pip install gradio ctransformers
```
You need a LLaMA 3 model in GGUF format. Download it from TheBloke's Hugging Face repository, then move the `.gguf` model file into your project directory.
Then launch the chatbot:

```bash
python app.py
```
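For orientation, an `app.py` along these lines ties the pieces together: load the quantized GGUF model with ctransformers, then expose it through a Gradio chat UI. The model filename, prompt format, and generation settings below are assumptions to adapt to your setup, not this project's exact code:

```python
# Minimal sketch: CPU inference with ctransformers behind a Gradio chat UI.
# The .gguf filename and generation settings are placeholders.
from ctransformers import AutoModelForCausalLM
import gradio as gr

llm = AutoModelForCausalLM.from_pretrained(
    "llama-3-8b-instruct.Q4_K_M.gguf",  # path to your downloaded .gguf file
    model_type="llama",
    context_length=4096,
)

def respond(message, history):
    # Gradio passes the running chat history, keeping replies context-aware.
    prompt = ""
    for user_turn, assistant_turn in history:
        prompt += f"User: {user_turn}\nAssistant: {assistant_turn}\n"
    prompt += f"User: {message}\nAssistant:"
    return llm(prompt, max_new_tokens=256, temperature=0.7, stop=["User:"])

gr.ChatInterface(respond).launch()
```

This block needs the `.gguf` file on disk, so it is a template to adapt rather than something to run as-is.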
- Model Used: LLaMA 3 (8B), quantized (Q4_K_M)
- Why CPU?: This chatbot is optimized to run without a GPU, making it accessible to more users.
- Optimization: Adjusted temperature, response length, and stop tokens for more accurate answers.
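To illustrate what the stop-token setting buys: generation is cut off at the first stop sequence, so the model cannot ramble past its own answer. ctransformers does this internally when you pass `stop=[...]` at generation time; a plain-Python sketch of the behavior (the helper name is illustrative):

```python
# Illustration of stop-token cutoff: return text up to (excluding) the
# earliest stop sequence found, mirroring what stop=[...] does during
# generation inside ctransformers.

def truncate_at_stop(text, stop_sequences):
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

Lower temperature and a capped response length work toward the same goal as the stop tokens: shorter, more focused answers on modest CPU hardware.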