HURIDOCS

37 repositories

uwazi
Public
Uwazi is a web-based, open-source solution for building and sharing document collections
open-source pdf data-science database ai documents non-profit
TypeScript
•
MIT License
•87•292•435•20•Updated Dec 18, 2025
pdf-document-layout-analysis
Public
A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
Python
•
Apache License 2.0
•116•1k•4•8•Updated Dec 15, 2025
python_uwazi_API
Public
Python API to interact with Uwazi
Python
•0•2•2•0•Updated Dec 12, 2025
ML-Benchmarks
Public
Repository to store all the ML benchmarks
0•0•0•0•Updated Dec 12, 2025
NER-in-docker
Public
NER-in-docker
Python
•0•4•0•7•Updated Nov 24, 2025
preserve
Public
Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media, and communication platforms, and archives it with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.
TypeScript
•
MIT License
•1•6•12•11•Updated Nov 18, 2025
ml-cloud-connector
Public
ml-cloud-connector
Python
•
Apache License 2.0
•0•0•0•0•Updated Oct 21, 2025
NER-in-uwazi
Public
NER-in-uwazi
Python
•
MIT License
•0•0•0•0•Updated Oct 20, 2025
pdf-features
Public
pdf-features
Python
•0•1•0•0•Updated Oct 20, 2025
pdf_metadata_extraction
Public
pdf_information_extraction
Python
•1•5•0•8•Updated Oct 10, 2025
trainable-entity-extractor
Public
Trainable Entity Extractor
Python
•
Apache License 2.0
•0•4•0•7•Updated Oct 10, 2025
queue-processor
Public
queue-processor
Python
•
Apache License 2.0
•0•0•0•0•Updated Oct 2, 2025
dummy_extractor_services
Public
Python
•0•0•0•0•Updated Aug 29, 2025
pdf-document-layout-analysis-async
Public
pdf-document-layout-analysis-async
Python
•0•1•0•5•Updated Aug 18, 2025
uwazi-documentation
Public
HTML
•
MIT License
•3•2•6•0•Updated Jun 24, 2025
docker-translation-service
Public
docker-translation-service
Python
•
Apache License 2.0
•0•0•0•6•Updated May 2, 2025
pdf-labeled-data
Public
TypeScript
•
Apache License 2.0
•1•3•0•0•Updated Mar 18, 2025
rison
Public
JavaScript
•
Apache License 2.0
•5•0•0•1•Updated Mar 11, 2025
pdf-text-extraction
Public
This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of text extraction from PDF files.
Makefile
•
Apache License 2.0
•4•36•2•0•Updated Feb 3, 2025
pdf-table-of-contents-extractor
Public
This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of identifying and structuring the document's TOC.
Makefile
•
Apache License 2.0
•4•18•2•0•Updated Feb 3, 2025
pdf_ocr_service
Public
An HTTP service to OCR PDFs based on a Redis queue.
Python
•
MIT License
•0•1•3•0•Updated Dec 13, 2024
react-text-selection-handler
Public
Text selection handling and highlighting
TypeScript
•
Apache License 2.0
•0•1•6•0•Updated Nov 14, 2024
convert-to-pdf-service
Public
An HTTP service to convert documents to PDF based on a Redis queue.
Python
•
MIT License
•0•0•3•7•Updated Sep 19, 2024
pdf-tokens-type-labeler
Public
Python
•3•3•1•6•Updated Jul 4, 2024
pdf_paragraphs_extraction
Public
Python
•
MIT License
•6•51•1•4•Updated Jul 4, 2024
pdf-reading-order
Public
Python
•2•15•0•0•Updated Apr 26, 2024
uwazi-design
Public
0•4•0•0•Updated Jul 3, 2023
topic-classification
Public
Python
•
MIT License
•4•5•10•4•Updated May 25, 2023
twitter_crawler
Public
Twitter crawler
Python
•0•1•0•1•Updated Apr 3, 2023
semantic-search
Public
Python
•4•3•1•3•Updated Dec 27, 2022