
Hello, we're Minish!

About us

We're an open-source lab with a focus on Natural Language Processing. Minish is currently maintained by @pringled. The lab was originally founded by @pringled and @stephantul.

We believe that if you make models fast enough, you unlock new possibilities.

Using our models and packages, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on a CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data
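To make the deduplication idea concrete, here is a minimal sketch of embedding-based near-duplicate removal in plain Python. It is not semhash's implementation. It assumes each record has already been turned into an embedding vector. The function names `cosine` and `deduplicate` and the default `threshold=0.9` are hypothetical choices made for illustration. A record is dropped when its cosine similarity to an already-kept record reaches the threshold.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Return the indices of records to keep (hypothetical helper).

    Greedy single pass: a record is kept only if its similarity to every
    already-kept record is below `threshold`. This brute-force comparison
    is O(n^2). It is meant to show the idea, not to run at scale.
    """
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

For example, `deduplicate([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])` keeps indices `[0, 2]`, because the second vector is nearly parallel to the first. The pairwise search here is quadratic. On a large dataset, an approximate nearest-neighbor index would typically replace the inner loop.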

Our projects:

  • model2vec: tiny static embedding models with state-of-the-art performance.
  • potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate semantic deduplication and filtering for your text datasets.
  • model2vec-rs: a Rust port of model2vec.
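Static embedding models such as model2vec's replace a transformer forward pass with a table lookup. The sketch below illustrates that general idea under stated assumptions. It uses a hypothetical three-word vocabulary (`VOCAB`), a whitespace tokenizer, and mean pooling followed by L2 normalization. Real model2vec models use a subword tokenizer and vectors distilled from a sentence transformer, so treat this as a conceptual toy rather than the library's code.

```python
import math

# Hypothetical toy vocabulary: each token maps to a fixed ("static") vector.
# Real model2vec models use a subword tokenizer and distilled vectors.
VOCAB: dict[str, list[float]] = {
    "fast": [1.0, 0.0, 0.0],
    "models": [0.0, 1.0, 0.0],
    "embed": [0.0, 0.0, 1.0],
}


def embed(sentence: str, vocab: dict[str, list[float]] = VOCAB) -> list[float]:
    """Mean-pool the static vectors of known tokens, then L2-normalize.

    Unknown tokens are skipped. The normalization step is a choice made
    in this sketch. If no token is known, a zero vector is returned.
    """
    tokens = [t for t in sentence.lower().split() if t in vocab]
    dim = len(next(iter(vocab.values())))
    if not tokens:
        return [0.0] * dim
    # Average the token vectors dimension by dimension.
    mean = [sum(vocab[t][d] for t in tokens) / len(tokens) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean
```

`embed("Fast models")` averages two orthogonal unit vectors and normalizes the result to roughly `[0.707, 0.707, 0.0]`. At encode time, this approach does only lookups and an average, with no neural network. That is where much of the speed of static models comes from. In the actual model2vec package, pretrained models are loaded with `StaticModel.from_pretrained`, and sentences are embedded with `encode`.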


Pinned

  1. model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python · 1.9k stars · 111 forks

  2. semhash Public

    Fast Semantic Text Deduplication & Filtering

    Python · 846 stars · 51 forks

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 316 stars · 10 forks

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python · 91 stars · 8 forks

  5. model2vec-rs Public

    Official Rust Implementation of Model2Vec

    Rust · 141 stars · 12 forks

Repositories

Showing 10 of 10 repositories
  • docs Public
    MDX · 0 stars · 2 forks · 0 open issues · 0 open pull requests · Updated Nov 24, 2025
  • model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python · 1,915 stars · MIT license · 111 forks · 3 open issues · 0 open pull requests · Updated Nov 14, 2025
  • semhash Public

    Fast Semantic Text Deduplication & Filtering

    Python · 846 stars · MIT license · 51 forks · 0 open issues · 0 open pull requests · Updated Oct 27, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 316 stars · MIT license · 10 forks · 1 open issue · 1 open pull request · Updated Oct 5, 2025
  • model2vec-rs Public

    Official Rust Implementation of Model2Vec

    Rust · 141 stars · MIT license · 12 forks · 1 open issue · 0 open pull requests · Updated Sep 29, 2025
  • evaluation Public

    Code to evaluate performance for embeddings

    Python · 12 stars · MIT license · 0 forks · 0 open issues · 0 open pull requests · Updated Sep 20, 2025
  • .github Public

    Readme

    0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Sep 14, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python · 91 stars · MIT license · 8 forks · 1 open issue · 0 open pull requests · Updated Sep 9, 2025
  • minishlab.github.io Public

    SCSS · 0 stars · MIT license · 1 fork · 0 open issues · 0 open pull requests · Updated Jun 1, 2025
  • watertemplate Public template

    Template

    Makefile · 4 stars · MIT license · 3 forks · 0 open issues · 1 open pull request · Updated Dec 9, 2024
