    Repositories list

    • unstructured-api

      Public
      Python
      1788433110 Updated Nov 24, 2025
    • unstructured-python-client

      Public
      A Python client for the Unstructured Platform API
      Python
      19108143 Updated Nov 24, 2025
    • unstructured

      Public
      Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
      HTML
      1.1k13k18350 Updated Nov 21, 2025
    • docs

      Public
      Documentation for all Unstructured products and libraries
      MDX
      257010 Updated Nov 21, 2025
    • unstructured-inference

      Public
      Python
      721992313 Updated Nov 21, 2025
    • notebooks

      Public
      Jupyter Notebook
      0201 Updated Nov 20, 2025
    • unstructured-platform-plugins

      Public
      Python
      3603 Updated Nov 19, 2025
    • unstructured-ingest

      Public
      HTML
      511035727 Updated Nov 18, 2025
    • unstructured-eval-metrics

      Public
      Python
      0000 Updated Nov 15, 2025
    • A JavaScript/TypeScript client for the Unstructured Platform API
      TypeScript
      175861 Updated Nov 14, 2025
    • rag-over-evolving-enterprise-knowledge

      Public
      Jupyter Notebook
      0000 Updated Oct 6, 2025
    • rag-over-hybrid-data-sources

      Public
      Pipeline from two sources (S3, Elasticsearch) to a RAG DB.
      Jupyter Notebook
      1101 Updated Sep 15, 2025
    • UNS-MCP

      Public
      Jupyter Notebook
      203701 Updated Sep 9, 2025
    • base-images

      Public
      Store Dockerfiles and Packer configs for images to use as a base to build upon
      Shell
      3511 Updated Sep 8, 2025
    • .github

      Public
      2021 Updated Aug 20, 2025
    • unstructured-mlk-archive-public

      Public
      HTML
      1700 Updated Jul 23, 2025
    • unstructured.PaddleOCR

      Public
      Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
      Python
      9.4k4000 Updated Mar 17, 2025
    • unstructured.pytesseract

      Public
      A Python wrapper for Google Tesseract
      Python
      747400 Updated Mar 5, 2025
    • azure-ai-hub-gateway-solution-accelerator

      Public
      Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services
      Bicep
      106100 Updated Nov 22, 2024
    • aws-blog-post-example

      Public
      Script to accompany the AWS blog post on unstructured data ETL with Unstructured Ingest library
      Python
      0000 Updated Oct 16, 2024
    • Pairing Technical Challenge
      TypeScript
      0000 Updated Sep 4, 2024
    • FedRAMP formatted model cards
      0100 Updated Aug 29, 2024
    • danswer

      Public
      Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
      Python
      2.2k1101 Updated Aug 23, 2024
    • JS Client Batch Processing
      JavaScript
      0000 Updated Jul 31, 2024
    • Main package repository for production Wolfi images
      C
      407000 Updated Jul 10, 2024
    • pipeline-sec-filings

      Public archive
      Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
      Jupyter Notebook
      3614857 Updated Jan 1, 2024
    • Python
      8804 Updated Oct 2, 2023
    • Pipeline for extracting information from Army OERs
      Jupyter Notebook
      5816 Updated Oct 1, 2023
    • Pipeline for converting PDFs to raw text with PaddleOCR
      Jupyter Notebook
      72315 Updated Aug 21, 2023
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      20k800 Updated Aug 18, 2023