---
title: Exllama
emoji: 😽
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
header: mini
fullWidth: true
license: apache-2.0
short_description: "Chat: exllama v2"
---

# Exllama Chat 😽

A Gradio-based chat interface for ExLlamaV2, serving the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. It delivers high-performance inference on consumer GPUs, with Flash Attention support.

## 🌟 Features

- 🚀 Powered by the ExLlamaV2 inference library
- 💨 Flash Attention support for optimized performance
- 🎯 Supports multiple instruction-tuned models:
  - Mistral-7B-Instruct-v0.3
  - Meta's Llama-3-70B-Instruct
- ⚡ Dynamic text generation with adjustable parameters
- 🎨 Clean, modern UI with dark mode support
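The Space's own prompt-assembly code isn't shown here, but the general idea for an instruct model can be sketched in plain Python. The sketch below builds a Mistral-Instruct style `[INST] … [/INST]` prompt from a system message and Gradio-style `(user, assistant)` history pairs. The function name and history shape are illustrative assumptions, and note that Llama-3 uses a different chat template:

```python
def build_mistral_prompt(system_message, history, user_message):
    """Assemble a Mistral-Instruct style prompt.

    `history` is a list of (user, assistant) pairs, the shape Gradio's
    chat components commonly provide. The system message is folded into
    the first user turn, as Mistral-Instruct has no system role.
    """
    prompt = "<s>"
    first_user = True
    for user_turn, assistant_turn in history:
        if first_user and system_message:
            user_turn = f"{system_message}\n\n{user_turn}"
            first_user = False
        prompt += f"[INST] {user_turn} [/INST] {assistant_turn}</s>"
    if first_user and system_message:
        user_message = f"{system_message}\n\n{user_message}"
    prompt += f"[INST] {user_message} [/INST]"
    return prompt
```

In practice you would pass the resulting string to the ExLlamaV2 generator; newer tokenizers can also apply this template for you.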

## 🎮 Parameters

Customize your chat experience with these adjustable parameters:

- **System Message**: Set the AI assistant's behavior and context
- **Max Tokens**: Control response length (1-4096)
- **Temperature**: Adjust response creativity (0.1-4.0)
- **Top-p**: Fine-tune response diversity (0.1-1.0)
- **Top-k**: Control vocabulary sampling (0-100)
- **Repetition Penalty**: Prevent repetitive text (0.0-2.0)
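As an illustration of what these knobs do (not the app's actual sampling code, which runs inside ExLlamaV2), the following pure-Python sketch applies repetition penalty, temperature, top-k, and top-p to a toy logit vector and returns the surviving token probabilities:

```python
import math

def apply_sampling(logits, temperature=0.7, top_k=40, top_p=0.95,
                   repetition_penalty=1.1, previous_ids=()):
    """Return a {token: probability} map after applying the four knobs."""
    logits = list(logits)

    # Repetition penalty: dampen logits of tokens already generated.
    for tok in set(previous_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = [l / temperature for l in logits]

    # Softmax over all tokens (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = {tok: e / total for tok, e in enumerate(exps)}

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break

    # Renormalize the surviving tokens.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}
```

For example, with `top_k=2` only the two highest-probability tokens survive, and a repetition penalty above 1.0 pushes already-seen tokens down the ranking.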

๐Ÿ› ๏ธ Technical Details

- **Framework**: Gradio 5.29.0 (matching the Space's `sdk_version`)
- **Models**: ExLlamaV2-compatible models
- **UI**: Custom-themed interface built on Gradio's Soft theme
- **Optimization**: Flash Attention for improved performance

## 🔗 Links

๐Ÿ“ License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments


Made with ❤️ using ExLlamaV2 and Gradio
