Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, yielding ~2x speedup on consumer devices.
d3LLM: Ultra-Fast Diffusion LLM 🚀
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
🔥 Blazingly fast ML inference server powered by Rust and Burn framework
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
AI-powered legal assistant for Brazilian lawyers, built with Groq to deliver fast, accurate insights and document support.
AudioMuse-AI-DCLAP is a lightweight, high-speed distilled version of LAION CLAP, designed for fast and efficient text-to-music search
Verification of the effectiveness of speculative decoding on Japanese text.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Multilabel fast-inference classifiers (Ridge Regression and MLP) for NLP, with a sentence embedder, K-Fold cross-validation, bootstrap, and boosting. NOTE: since the MLP (fully connected NN) classifier was too heavy to be loaded, you can compile it with the provided script.
Fastest Text-to-Image Generator using fal ai.
An image captioning model using a DETR-inspired architecture.
A simple toxicity detector.
High-performance TUI dashboard to benchmark LLM latencies across free-tier providers and instantly hot-swap models for OpenCode agents.
⚡ Groq API client with ultra-fast LLM inference — LLaMA, Mixtral, Gemma support