Releases: DataLabTechTV/datalab

v0.7.0

28 Aug 15:40

v0.7.0 (2025-08-28)

Bug Fixes

  • Lakehouse is now a singleton, to avoid initialization when running the help command (ca5a7ea)

  • Normalize loggers to use loguru via an intercept handler (d18f572)

  • Shift should be drift, and count plot should be stacked (62aefb2)
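
The singleton fix listed above can be pictured with a minimal sketch. It assumes a hypothetical `Lakehouse` class whose constructor is expensive; `get_lakehouse`, `show_help`, and the `instances_created` counter are illustrative names, not the project's actual API.

```python
from functools import lru_cache


class Lakehouse:
    """Hypothetical stand-in for an object that is expensive to initialize."""

    instances_created = 0  # counts constructor calls, for illustration only

    def __init__(self) -> None:
        Lakehouse.instances_created += 1
        self.connected = True  # placeholder for real connection setup


@lru_cache(maxsize=1)
def get_lakehouse() -> Lakehouse:
    """Create the Lakehouse on first use and reuse it afterwards."""
    return Lakehouse()


def show_help() -> str:
    # The help path never calls get_lakehouse(), so nothing is initialized.
    return "usage: tool [OPTIONS] COMMAND"
```

Because the instance is only built inside `get_lakehouse()`, a command that merely prints help text never pays the initialization cost.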

Chores

  • Add a default task that lists all just tasks (897c520)

  • Add missing help message and fix the one for ml monitor plot (7b17e14)

Features

  • Improve performance of REST API by moving Kafka payload queueing to the background (2fe859d)
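
One way to read the feature above: the request handler only enqueues the payload, and a background task does the slow send. The sketch below uses only `asyncio` from the standard library; the `sent` list stands in for a Kafka producer, and `worker`, `handle_request`, and `main` are hypothetical names rather than the project's code.

```python
import asyncio


async def worker(queue: asyncio.Queue, sent: list) -> None:
    """Drain the queue in the background; `sent` stands in for a Kafka producer."""
    while True:
        payload = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for the real (slow) send
            sent.append(payload)
        finally:
            queue.task_done()


async def handle_request(queue: asyncio.Queue, payload: dict) -> dict:
    """Request handler: enqueue and return at once, without waiting for the send."""
    queue.put_nowait(payload)
    return {"status": "accepted"}


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    sent: list = []
    task = asyncio.create_task(worker(queue, sent), name="payload-worker")
    for i in range(3):
        await handle_request(queue, {"id": i})
    await queue.join()  # wait until everything queued has been processed
    task.cancel()       # stop the worker on shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass            # expected: the worker was cancelled on purpose
    return sent
```

Running `asyncio.run(main())` returns the payloads in the order they were queued; the handler itself never waits on the send.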

Refactoring

  • Drop unused matplotlib imports (48b9e31)

  • Remove unused imports (f1f8a1f)


Detailed Changes: v0.6.0...v0.7.0

v0.6.0

25 Aug 15:30

v0.6.0 (2025-08-25)

Bug Fixes

  • Attempt to solve group coordinator errors (25e9cf1)

  • Capture asyncio cancel exception (5f5f07f)

  • Consumer task was meant to be awaited from inside the loop (0bede44)

  • Correct model uri scheme (a6c589b)

  • Dataframe was being forced through the model loaded using mlflow.pyfunc.load, so now we handle multiple input types (4d9541e)

  • Handle failed runs and drop unrequired columns from logged inputs (29f9259)

  • Kafka now runs and initializes properly (b94dfc4)

  • Mlflow healthcheck, switch to kafka's official image (cea3edf)

  • Model needs to be initialized every time, otherwise there is a memory leak (e53c85f)

  • Move mlflow.db to root since db directory didn't exist (3fed7f1)

  • Ollama will now default to CPU when GPU is not available (a13fd72)

This will most likely make Ollama unusable, but at least it won't stop the other services from starting and working as expected.

  • Positive label probability selection (00d738e)

  • Queue logic incompatible with list logic, always flush in the end (9d59f90)

  • Requests cache was causing memory overload (b734e08)

  • Schema name, remove unused tasks (e1944fa)

  • Train/test split now separate from cross-validation (train only) (11b448e)

  • Transform failed when other datasets were not ingested (029e8c4)

  • Update to new lakehouse schema (1240dc9)

  • Update to new ml types and lakehouse schema (063d28b)
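
The train/test fix listed above (cross-validation on the training portion only) can be sketched with the standard library. This is an illustration under assumptions, not the project's pipeline: `train_test_split` and `k_folds` are hypothetical helpers, and the 20% test fraction is an arbitrary choice.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle once and hold out a test set that cross-validation never sees."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(train_rows, k=3):
    """Yield (fit, validation) pairs built from the training rows only."""
    for i in range(k):
        validation = train_rows[i::k]
        fit = [row for j, row in enumerate(train_rows) if j % k != i]
        yield fit, validation
```

Keeping the held-out rows out of every fold means the final test score is not influenced by choices made during cross-validation.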

Chores

  • Add a second topic for updating inference results with user feedback (6abb221)

  • Add config for new stage catalog with secure storage (1009792)

  • Add config for pairs of topic and expected consumer group (6224c6c)

  • Add config for stage catalog with secure storage (7d3fbb9)

  • Add kafka config section (175474f)

  • Add name to each asyncio task (dad0fbe)

  • Create justfile with tasks from previous and upcoming videos (9289e70)

  • Delete unused test module (04964ef)

  • Reduce sample fraction (a9ab9a7)

  • Rename insert/update to result/feedback to match new event topics (4ae3ef1)

  • Setup mlflow service with sqlite and s3 (3a0f1ca)

  • deps: Add anyascii and inflection for a more robust sanitization, add just for task running, add xgboost for ML project (17cbf48)

  • deps: Add faker to create random dates (0814e54)

  • deps: Add fastapi and uvicorn for the ml server (5907a38)

  • deps: Add joblib to use Memory for caching (1676db1)

  • deps: Add kafka official library (2de36e8)

  • deps: Add mlflow (5d0b94a)

  • deps: Add pip so its version is properly detected by mlflow during model logging (63edc5a)

  • deps: Add scikit-learn (204411b)

  • deps: Add sentence transformers for text embedding (0a852d8)

  • deps: Downgrade from 3.0.3 to 3.0.2 due to mlflow compatibility (8e2d1bd)

  • deps: Replace confluent-kafka with aiokafka (a052aa8)

Features

  • Add 3-folds (9c58bfb)

  • Add created_at timestamp that defaults to the current date (6badaf2)

  • Add custom MLflow user for tracking (4818df8)

  • Add model logging to mlflow_end_run (54027f2)

  • Add monitor compute task (36d2b91)

  • Add monitor dataset to mlops etl pipeline (c227f62)

  • Add monitor plot task (27bd6bb)

  • Add reload option to use during development (5e175ad)

  • Add sample fraction parameter (813f031)

  • Add train and test tasks, update ETL task with transformation (54e1adb)

  • Apache Kafka server (b2acb5d)

  • Basic training pipelines and CLI command (e27cc5d)

  • Check for curl and add -f to ensure the task fails when status code is >= 400 (b02c5a1)

  • Default to 3-folds, since it is now supported (daf3db9)

  • Drop tasks for mlflow model server (images are too bloated) (c20ec10)

  • Enable artifacts proxy and install boto3 as a dependency (8e844a6)

  • End-to-end kafka producer/consumer implementation (c1fbbc6)

  • Endpoint to flush inference log, refactor inference request to handle A/B/n testing (8ba2a7e)

  • Feature pipelines for TF-IDF and sentence transformers (70c4182)

  • Feedback is now an array and created_at keeps track of time (19d0527)

  • Generic ml dataset loader function (f364327)

  • Health check endpoint, and refactor insert/update to results/feedback for clarity ([d00f612](https://github.com...

v0.5.0

05 Aug 16:07

v0.5.0 (2025-08-05)

Bug Fixes

  • Any positive ESI is now considered competition, and is separate from intensity (25844f1)

  • Log file relative path to cwd failed when not directly contained using Path (e4f5b62)

Chores

  • Commit notebook generated during video recording (454d0dd)

  • deps: Add adjustText to optionally fix rendering of overlapping node labels (36cbc33)

  • deps: Add geopandas to plot maps (62d5ef1)

  • deps: Add jupyterlab, matplotlib, and networkx for graph data science (e29c08f)

  • deps: Remove unneeded adjustText and add scipy back as a requirement for networkx layout computation (76ef5d4)

Features

  • Add CLI support for computing the CON score (8c94f6e)

  • Add edge arrows and node colors per label (ed56184)

  • Add graph analytics module, starting with a CON score (ff1f926)

  • Add graph transparency and improve labels (02dc859)

  • Add scale to arrow placement, add optional visualization weight (9190d2c)

  • Compare communities and components, study economic pressure (afceea8)

  • Competition network analysis, including community and weak component analysis (62e54fd)

  • Create a basic graph theme matching DLT (3210fa5)

  • Dominating and weaker economy individual analysis (986a2d6)

  • Edge direction now based on common exports, from highest to lowest total amount (77325bd)

  • Improve graph plotting and add map plotting (266dfca)

  • Networkx graph plot helper to use with notebooks (a36b6c9)

  • Revisited the whole notebook, restructuring and adding depth where needed (6a3dcb1)

  • Script to easily convert Jupyter Notebooks to markdown (4b0c792)

  • Set label with property per node type and render labels without overlapping (8c0b6fb)

  • Setup notebook for graph data science (1d96e63)

  • Support for loading Parquet into DuckLake from Python (4035f63)

  • Trade alignment analysis (80d5ef1)

  • Trade alignment analysis (cont) (da6e848)

Refactoring

  • Different score reset strategy (d4d7d9d)

  • No longer setting flags for dominating and weaker (d8013c4)

  • Remove unused import (65defb1)

  • Replace os.path ops with Path ops (84c73a9)

  • Use kuzu extension instead of kz (d815cef)

  • Use ref instead of hardcoded FQN (ba6de1a)


Detailed Changes: v0.4.0...v0.5.0

v0.4.0

16 Jul 19:03

v0.4.0 (2025-07-16)

Bug Fixes

  • Add missing schema configs for new econ comp models (c4daafb)

  • Edges needed to be defined based on node_id, which required these changes (398ba70)

  • Remove nonexistent property (918f23a)

  • Remove not null tests where they were not required (43efc61)

  • Remove product parent relationship, as there is no multi-level data here (2d26651)

  • Remove repeated country pairs in reverse order (1f2f867)

  • Required aggregation per country and product, disregarding partner (635dc72)

  • Types and missing null strings (40a79d7)

Chores

  • Add cypher script to compute music_taste graph stats (7a0a48d)

  • Add env var for econ comp graph db (3e34e80)

  • Configs for analytics mart (40dee56)

  • Re-enable requests-cache with streaming (62c7dff)

  • Rename KuzuDBs to match new single-file format (0e797ae)

  • Simplify music taste graph stats script (5b964fb)

  • Upgrade explorer script to work with kuzu 0.11.0 (36f6cf7)

  • deps: Add humanize to print byte sizes in human-readable format (6238484)

  • deps: Add requests cache dep (b7c5fd5)

  • deps: Add tqdm dep for tracking download progress (5e2ba51)

  • deps: Bump up kuzu to 0.11.0 (74f2f4f)

  • deps: Bump up version inside uv.lock (7124ff4)

Documentation

  • Fill in the missing schema models for analytics, and econ_comp nodes and edges (aa65fcd)

Features

  • Add model selection CLI option to test cmd (499bac0)

  • Aggregated view for 2020-2023 trade covering recent years (c579742)

  • Cli command to expunge/clean cache (f412b51)

  • Complete dataset template for The Atlas of Economic Complexity (6e2cb9c)

  • Country and product nodes, product-country export and import edges, and product parent edges (cca6d5c)

  • Country-country ESI calculation (0ca0346)

  • Datacite working downloader (bf09fb1)

  • Ingest country classification data (09c3ac7)

  • Logic changed to account for the last 3 years in data instead of a fixed range (8599498)

  • Move cache to shared level and add expunge function and requests cache (805511f)

  • Rename 2020-2023 to latest 3y and add schema for country-country metrics (af044f8)

  • Select top 5% ESI country-country relations for edges (3356e4f)

  • Skip cache for downloads and display progress bar (039e08a)

  • Split ingestion into multiple modules and add dataset templates (8e3c6b8)

  • Stage transformations for TAoEC (6e082e3)

  • Support for cache usage statistic printing (436391b)

  • Support for loading econ_comp graph (93396df)

Performance Improvements

  • Increase chunk size and make sure temp files are cleaned even when the script is stopped (39943df)

Refactoring

  • Log debug message containing produced context (5917a15)

  • Rename context to entities when referring to entity nodes (ff6e0df)

Testing

  • Ensure ESI is within a 0..1 range (d1ef5ce)
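
The 0..1 range in the test above follows directly if ESI is read as the Finger-Kreinin export similarity index, which sums, per product, the smaller of the two countries' export shares. That reading is an assumption here, as are the function name and the dictionary-based inputs; the project's own calculation may differ.

```python
def export_similarity(exports_a: dict, exports_b: dict) -> float:
    """Finger-Kreinin style index: sum of per-product minimum export shares."""
    total_a = sum(exports_a.values())
    total_b = sum(exports_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0  # no exports on one side means no overlap
    products = set(exports_a) | set(exports_b)
    return sum(
        min(exports_a.get(p, 0) / total_a, exports_b.get(p, 0) / total_b)
        for p in products
    )
```

Each country's shares sum to 1, so the sum of minimums cannot exceed 1, and it is 0 when the two export baskets share no product.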

Detailed Changes: v0.3.0...v0.4.0

v0.3.0

08 Jul 17:05

v0.3.0 (2025-07-08)

Bug Fixes

  • Add error control to the GraphRAG chain (4f015ca)

Chores

  • deps: Add colorama to color error messages (389a8a1)

Features

  • Graph rag CLI options for interactive and direct querying (8f54d81)


Detailed Changes: v0.2.0...v0.3.0

v0.2.0

04 Jul 14:27

v0.2.0 (2025-07-04)

Bug Fixes

  • Correct logic for deleting vector index if exists (516b677)

Chores

  • Add missing word in prompt (2001d8d)

  • Container names will now use the default naming schema (6d267b8)

  • Ensure predictable table indexing order (4547bd3)

  • Graph retriever and context assembler class scaffolds (eae806d)

  • Make sure kuzudb-explorer is using a fixed image version (0.10.0 currently) (80c8aca)

  • Path combination and scaffolding for hydrating (1c7db62)

  • Prefix log message is now debug-level (de7d708)

  • Print version from pyproject.toml via CLI argument (2fa5b86)

  • Remove unused semantic-release config (1692e14)

This option was set in the wrong location, so it did nothing. We don't need it.

  • Replace default nomic-embed-text ollama model with phi4:latest (ee324f1)

  • Setup ollama service and add env var for default model install (4af078b)

  • deps: Add ollama dependency (4d1608d)

  • deps: Add pytest to dev deps and configure default CLI options (baabcd5)

  • deps: Langchain with ollama support, and a prompt helper library (4565ec9)

  • deps: Langchain-kuzu (eed603d)

  • deps: More-itertools (ecb7f9c)

Continuous Integration

  • Add missing version to semantic-version command (c6facd1)

  • Fix call to semantic release using a function (d577a45)

  • Fix changelog_file config location (b5bb8d7)

  • Fix pyproject.toml version setting for semantic release (db96d22)

  • Remove redundant build option, already set on pyproject.toml (e8f6d6b)

Documentation

  • Add knn method info to clarify the max_distance param (0fdf01f)

Features

  • Add file logging by default (and option to disable) (2f9a36e)

  • Add final answer pipeline and improve interactive mode (58bff5a)

  • Basic prompt for graph RAG and langchain scaffolding (50173de)

  • Combined knn step for context assembler (33b20ab)

  • Context assembly based on ANN, paths to neighbors, and random walks from neighbors (9323352)

  • Cypher friendly schema format (87f8171)

  • First working NER implementation based on langchain-kuzu (a743062)

  • Graphrag is now a LangChain Runnable and components became methods (cd04d33)

  • Knn query support (2bca4a0)

  • Knn, shortest paths sampler and random walk computation for context assembler (22d4f0a)

  • Kuzudb-explorer launcher script now handles different paths (4dc65a9)

  • Lazy singleton S3 resource and bucket connection (63388a1)

  • Ollama service with gemma3 and nomic-embed-text (83b68dd)

  • Path hydration and bulk description (97ea465)

  • Return paths as interleavings of node_id and rel label (17b790a)

  • Support for indexing embeddings (c687f81)

  • graph.ops: Automatically add a custom embeddings column to all node tables (1900f21)

Closes #2

  • graph.ops: Produce node schema with properties names and types (291d42f)
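
The "paths as interleavings of node_id and rel label" entry above describes a simple list shape: node, relationship, node, and so on. A minimal sketch follows, assuming node IDs and relationship labels arrive as two separate lists; `interleave_path` is a hypothetical helper, not the project's function.

```python
def interleave_path(node_ids: list, rel_labels: list) -> list:
    """Return [n0, r0, n1, r1, ..., nk] for a path with k relationships."""
    if len(node_ids) != len(rel_labels) + 1:
        raise ValueError("a path with n relationships must have n + 1 nodes")
    path = [node_ids[0]]
    for label, node_id in zip(rel_labels, node_ids[1:]):
        path.extend([label, node_id])
    return path
```

A flat list like this keeps traversal order explicit, which makes it straightforward to turn a path into a textual description later.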

Performance Improvements

  • Migrated from KuzuQAChain to a custom strategy still based on langchain-kuzu (ebce585)

Refactoring

  • Change property match to WHERE cond and lower the temperature (f0f9198)

Testing

  • Correct paths_df fixture and add missing exclude_props (c167b0c)

  • Invoke test for GraphRAG runnable (f724224)

  • Move graph db check to global fixtures (d2963e3)

  • Print final chain output (40f2d14)

  • Setup ops and paths_df to test path_descriptions() (3f3c160)

  • Tests will only print logs to stderr and always use debug level (fafb3bf)


Detailed Changes: v0.1.0...v0.2.0

v0.1.0

25 Jun 09:35

v0.1.0 (2025-06-25)

Bug Fixes

  • Add node_id to all nodes (f927dcd)

  • Batch should be column, not parameters (73eeb9e)

  • Condition for ignoring files during deletion (9da0e0f)

The manifest.json was being deleted by mistake.

  • Correct name for placeholder models (fa07609)

feat: implement all missing edge models

  • Ducklake integration using dev version for upcoming dbt-duckdb 1.4.1 (effe0d7)

  • Duplicate alias for source_id and target_id columns (d6b6790)

  • Ensure tags are checked out (d833c89)

  • Generate sequential node ID globally for all nodes (8d019ac)

  • Genre loading queries (abc6833)

refactor: reorganize models into stage and marts

feat: support for edge loading (untested)

  • Genre nodes become a single table to ensure uniqueness (7ec7f03)

  • Incorrect S3 secret variable (6fa7394)

  • Missing description for playcount (c505c9c)

  • Missing node ID dataset-based prefix (bfecf9f)

  • Missing nodes prefix on ref table (58b04f5)

  • Missing underscore after prefix (3a0d6d9)

  • No longer defaulting to upstream dependencies (6d2d68a)

  • Regression introduced by removing key_parts (668a31c)

  • Removed extra bracket in log message (782bcc9)

  • Should be alias, not name (d03e079)

  • Should be list of list, not list of tuple (c3b7419)

  • Sqlite prefix missing (78bae87)

  • Switch to single table for genre nodes (6d5dd1f)

  • Update graph loading process based on new config schema (eebc677)

  • Update prune to use class prefix (49ca20f)

  • Using map instead of list per node embedding (e6f1caf)

fix: add missing schema alter to add embedding property to all nodes

  • Wrapper to copy from data mart via a temporary file (3cd0268)

  • Wrong column name in schema (af2a693)

  • Wrong filename case, should be RO, not ro (905a303)

  • Wrong model name in schema (588f3bc)

  • Wrong reference, missing schema prefix (701fb1d)

  • Wrong variable order in log message (40cb055)

Chores

  • Add description and pandas dep (b7c40d0)

  • Add DUCKLAKE_PATH to .env (3ab2bee)

  • Add kuzu as a dependency (f1e2a5c)

  • Add S3 prefix for exports (88ee16c)

  • Add solid background to diagram (7499f46)

  • Add torch, torch-sparse, and torch-geometric deps (5ddd0f8)

  • Better schema name organization for graphs (948759a)

  • Click and minio deps (c6e450f)

  • Default to eu-west-1, as MinIO also defaulted to it (6afa282)

  • Delete example models (7012b5b)

  • Fix version for python-semantic-release to match deps (3ff93d9)

  • Github dark mode background color (ca97d44)

  • Gitignore vscode directory (a2fcabd)

  • Initial log message for export (1207e93)

  • Initial log string is now a welcome string (0edefc6)

  • Make sure we start from 0.1.0, not 1.0.0 (cb2b7c5)

  • Remove unused dep (ba09c90)

  • Remove unused deps and update docs referring to them (7da0d24)

  • Replace with official GHA for python-semantic-release (7301029)

  • Script to launch temporary docker container with KuzuDB Explorer for a database (52b617a)

  • Setup dltctl CLI tool (replaces Makefile) (405d800)

  • Simplify node and edge schemas, using Gremlin-like notation (887dddc)

  • Solid background in individual rectangles (8f68204)

  • Switch to a multi-database marts config (1cf615c)

  • Temporarily removed (bd26438)

Schema was outdated and was blocking dbt run.

  • Update config to match multi-database marts (d47f96f)

  • Won't use the extra command in favor of one entry point (698a3d1)

  • deps: Move python-semantic-release to dev deps (9df3ed6)

Documentation

  • Add graph and shared (866b37c)

  • Add specification for exports pruning (e42acbf)

  • Dependency management development instructions (2f9393f)

  • Duckdb init script description (94b1959)

  • End-to-end documentation (190ef1b)

  • Fix section links (e00a537)

  • Latest.json is now manifest.json (6cdb057)

  • Remove suffix from info boxes ...
