Data Tools
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
The developer-first cloud governance platform
A curated list of awesome ETL frameworks, libraries, and software.
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such …
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
SQL tools (Dialect, Pagination, DDL dump, UrlParser, SqlStatementParser, WallFilter, BatchExecutor for tests) written in Java. Easy to integrate into any ORM framework.
Extensible SQL Lexer and Parser for Rust
[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrates with arXiv and Papers with Code to provide better code/research plans 🧰 OpenAI, Anthropic, Ollama, etc s…
The Data Change Processing platform
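Big data computing platform based on Spark <至轻云 (Zhiqingyun): ultra-lightweight big data computing platform / data middle platform>
A next-generation real-time computing foundation with computing performance 100x that of Flink/Spark. XL-LightHouse is a general-purpose streaming big data statistics system that supports very large data volumes and very high concurrency (a standalone version is also supported). Common use cases include: PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call volume, error count, and latency statistics; server operations monitoring; and more. The system supports multi-dimensional statistics and a wide range of complex filter conditions and logical checks, with one-click deployment and one-line-of-code integration, easily enabling business…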
🔎 Open source distributed and RESTful search engine.
pingcap/autoflow is a conversational knowledge base tool based on Graph RAG, built with TiDB Serverless Vector Storage. Demo: https://tidb.ai
Ape Data Transfer Suite, written in Rust. Provides ultra-fast data replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka and ClickHouse, ideal for disaster recovery (DR) and migration scenar…
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust
Rust-powered, Dependency-Free DataFrame Library for Node.js
Database diagrams editor that allows you to visualize and design your DB with a single query.
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.