Inferless
Popular repositories
- triton-co-pilot Public
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments.
- qwq-32b-preview Public template
A 32B experimental reasoning model for advanced text generation and robust instruction following (see the vLLM sketch after this list). <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- whisper-large-v3 Public template
State-of-the-art speech recognition model for English, delivering high transcription accuracy across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
- deepseek-r1-distill-qwen-32b Public template
A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
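The two vLLM-backed templates above (qwq-32b-preview and deepseek-r1-distill-qwen-32b) are served through vLLM. A minimal offline-inference sketch, assuming the upstream Hugging Face model ID Qwen/QwQ-32B-Preview and illustrative sampling settings rather than the template's exact configuration:

```python
from vllm import LLM, SamplingParams

# Model ID and sampling values are illustrative assumptions, not the template's exact config.
llm = LLM(model="Qwen/QwQ-32B-Preview", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=512)

# Batch of one prompt; vLLM returns one RequestOutput per prompt.
outputs = llm.generate(["Walk through a plan for proving the Pythagorean theorem."], params)
print(outputs[0].outputs[0].text)
```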
Repositories
- qwen3-14b Public
A 14B model with a hybrid approach to problem-solving, offering two distinct modes: a "thinking mode" that enables step-by-step reasoning and a "non-thinking mode" designed for rapid, general-purpose responses. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- qwen2.5-omni-7b Public template
An advanced end-to-end multimodal model that processes text, image, audio, and video inputs and generates real-time text and natural speech responses. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- qwen3-8b Public template
Qwen3-8B is a language model that supports seamless switching between a "thinking" mode, for advanced math, coding, and logical inference, and a "non-thinking" mode for fast, natural conversation. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- MCP-Google-Map-Agent Public
- phi-4-multimodal-instruct Public template
A state-of-the-art multimodal foundation model developed by Microsoft Research that seamlessly fuses robust language understanding with advanced visual and audio analysis. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- stable-diffusion-3.5-large Public
An 8B model that excels at producing high-quality, detailed images up to 1 megapixel in resolution. <metadata> gpu: A100 | collections: ["Diffusers"] </metadata>
- phi-4-GGUF Public template
A 14B model packaged in GGUF format for efficient inference, designed to excel at complex reasoning tasks (see the llama.cpp sketch after this list). <metadata> gpu: A100 | collections: ["llama.cpp","GGUF"] </metadata>
- tinyllama-1-1b-chat-v1-0 Public template
A chat model fine-tuned from TinyLlama, a compact 1.1B Llama model pretrained on 3 trillion tokens. <metadata> gpu: T4 | collections: ["vLLM"] </metadata>
- llama-2-13b-chat-hf Public template
A 13B model fine-tuned with reinforcement learning from human feedback, part of Meta’s Llama 2 family for dialogue tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
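The phi-4-GGUF template above runs on llama.cpp. A minimal local-inference sketch using llama-cpp-python, where the GGUF file name, context length, and generation settings are assumptions for illustration:

```python
from llama_cpp import Llama

# The GGUF file name and the context/offload settings below are illustrative assumptions.
llm = Llama(model_path="./phi-4-Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain why GGUF quantization reduces memory use."}],
    max_tokens=200,
)
print(result["choices"][0]["message"]["content"])
```

Passing n_gpu_layers=-1 offloads all layers to the GPU when one is available; on CPU-only hosts the argument can simply be dropped.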