Skip to content

Latest commit

 

History

History
21 lines (13 loc) · 928 Bytes

File metadata and controls

21 lines (13 loc) · 928 Bytes

Model Information and Setup

Llama-2-7b-Chat (Quantized),

Particular model used is "llama-2-7b-chat.ggmlv3.q4_0.bin"

This is a specific version of Meta's Llama 2, optimized to run efficiently on consumer hardware (like a laptop CPU) rather than requiring expensive cloud GPUs.

Breakdown of the filename:

  • 7b-chat: The 7-billion parameter version, fine-tuned specifically for dialogue and conversational interactions.

  • gmlv3: A legacy format designed for high-performance CPU inference (requires compatible library versions, specific ctransformers == 0.2.5).

  • q4_0: "4-bit Quantization." This compresses the model's memory footprint significantly (making it ~4GB instead of ~14GB) while maintaining nearly all of its reasoning capabilities.

Downloaded from : Model installed from Hugging Face - TheBloke/Llama-2-7B-Chat-GGML