Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, yielding ~2x speedup on consumer devices.
d3LLM: Ultra-Fast Diffusion LLM 🚀
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
🔥 Blazingly fast ML inference server powered by Rust and Burn framework
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
AI-powered legal assistant for Brazilian lawyers, built with Groq to deliver fast, accurate insights and document support.
AudioMuse-AI-DCLAP is a lightweight, high-speed distilled version of LAION CLAP, designed for fast and efficient text-to-music search
Verification of the effectiveness of speculative decoding on Japanese text.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Multilabel fast-inference classifiers (Ridge Regression and MLP) for NLP, with a sentence embedder, K-Fold cross-validation, bootstrap, and boosting. NOTE: since the MLP (fully connected NN) classifier was too heavy to be loaded, you can compile it with the provided script.
Fastest Text-to-Image Generator using fal ai.
An image captioning model using a DETR-inspired architecture.
A simple toxicity detector.
High-performance TUI dashboard to benchmark LLM latencies across free-tier providers and instantly hot-swap models for OpenCode agents.
⚡ Groq API client with ultra-fast LLM inference — LLaMA, Mixtral, Gemma support