Intelligent Documents
A curated list of resources on Document Understanding (DU)
Convert PDF to markdown + JSON quickly with high accuracy
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Knowledge Agents and Management in the Cloud
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
There can be more than Notion and Miro. AFFiNE (pronounced [ə‘fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable an…
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Data processing with ML, LLM and Vision LLM
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
A realtime serving engine for Data-Intensive Generative AI Applications
#1 Locally hosted web application that allows you to perform various operations on PDF files
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
A system for agentic LLM-powered data processing and ETL
Knowledge Table is an open-source package designed to simplify extracting and exploring structured data from unstructured documents.
OCR, layout analysis, reading order, table recognition in 90+ languages
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker/Zotero
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured …
Open-source platform for extracting structured data from documents using AI.
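Natural language processing experiments (sougou dataset): TF-IDF, text classification, clustering, word vectors, sentiment recognition, relation extraction, and more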
pingcap/autoflow is a conversational knowledge base tool based on Graph RAG and built with TiDB Serverless Vector Storage. Demo: https://tidb.ai
This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation,…
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation