    Repositories list

    • docs

      Public
      Documentation for all Unstructured products and libraries
      MDX
      257010 Updated Jan 29, 2026
    • notebooks

      Public
      Jupyter Notebook
      0200 Updated Jan 29, 2026
    • Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats fo…
      HTML
      1.1k14k18049 Updated Jan 29, 2026
    • HTML
      531065732 Updated Jan 29, 2026
    • A JavaScript/TypeScript client for the Unstructured Platform API
      TypeScript
      155871 Updated Jan 29, 2026
    • A Python client for the Unstructured Platform API
      Python
      20112141 Updated Jan 29, 2026
    • Python
      1838683411 Updated Jan 28, 2026
    • Python
      742012315 Updated Jan 28, 2026
    • Python
      3604 Updated Jan 27, 2026
    • Store Dockerfiles and Packer configs for images to use as a base to build upon
      Shell
      3512 Updated Jan 14, 2026
    • Python
      3500 Updated Dec 2, 2025
    • Jupyter Notebook
      0000 Updated Oct 6, 2025
    • Two sources (S3, ElasticSearch) to RAG DB pipeline.
      Jupyter Notebook
      1101 Updated Sep 15, 2025
    • UNS-MCP

      Public
      Jupyter Notebook
      204001 Updated Sep 9, 2025
    • .github

      Public
      2021 Updated Aug 20, 2025
    • HTML
      1800 Updated Jul 23, 2025
    • Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and …
      Python
      9.7k4100 Updated Mar 17, 2025
    • A Python wrapper for Google Tesseract
      Python
      749400 Updated Mar 5, 2025
    • Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units …
      Bicep
      122100 Updated Nov 22, 2024
    • Script to accompany the AWS blog post on unstructured data ETL with Unstructured Ingest library
      Python
      0000 Updated Oct 16, 2024
    • Pairing Technical Challenge
      TypeScript
      0000 Updated Sep 4, 2024
    • FedRAMP formatted model cards
      0100 Updated Aug 29, 2024
    • danswer

      Public
      Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
      Python
      2.3k1101 Updated Aug 23, 2024
    • JS Client Batch Processing
      JavaScript
      0000 Updated Jul 31, 2024
    • Main package repository for production Wolfi images
      C
      418000 Updated Jul 10, 2024
    • pipeline-sec-filings

      Public archive
      Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
      Jupyter Notebook
      3514857 Updated Jan 1, 2024
    • Python
      8804 Updated Oct 2, 2023
    • Pipeline for extracting information from Army OERs
      Jupyter Notebook
      5816 Updated Oct 1, 2023
    • Pipeline for converting PDFs to raw text with PaddleOCR
      Jupyter Notebook
      72315 Updated Aug 21, 2023
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      21k800 Updated Aug 18, 2023