Llama-2-7b-Chat (Quantized),
Particular model used is "llama-2-7b-chat.ggmlv3.q4_0.bin"
This is a specific version of Meta's Llama 2, optimized to run efficiently on consumer hardware (like a laptop CPU) rather than requiring expensive cloud GPUs.
Breakdown of the filename:
-
7b-chat: The 7-billion parameter version, fine-tuned specifically for dialogue and conversational interactions.
-
gmlv3: A legacy format designed for high-performance CPU inference (requires compatible library versions, specific ctransformers == 0.2.5).
-
q4_0: "4-bit Quantization." This compresses the model's memory footprint significantly (making it ~4GB instead of ~14GB) while maintaining nearly all of its reasoning capabilities.
Downloaded from : Model installed from Hugging Face - TheBloke/Llama-2-7B-Chat-GGML