Releases: DataLabTechTV/datalab
v0.7.0
v0.7.0 (2025-08-28)
Bug Fixes
- Lakehouse is now a singleton, to avoid initialization when running the help command (ca5a7ea)
- Normalize loggers to use loguru via an intercept handler (d18f572)
- Shift should be drift, and count plot should be stacked (62aefb2)
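A minimal sketch of why a lazily created singleton keeps the help command cheap. It assumes a cached factory function; the `Lakehouse` body, `get_lakehouse`, and `main` below are hypothetical stand-ins, not the project's actual code.

```python
from functools import cache


class Lakehouse:
    """Hypothetical stand-in for a resource that is expensive to create."""

    init_count = 0  # counts how many times the constructor actually ran

    def __init__(self) -> None:
        # Imagine opening database connections here.
        Lakehouse.init_count += 1


@cache
def get_lakehouse() -> Lakehouse:
    """Create the instance on first use and return the same one afterwards."""
    return Lakehouse()


def main(argv: list[str]) -> str:
    if "--help" in argv:
        # The help path never asks for the lakehouse, so nothing is initialized.
        return "usage: tool [--help] [run]"
    lakehouse = get_lakehouse()
    return f"running with {type(lakehouse).__name__}"
```

Calling `main(["--help"])` returns the usage text without constructing `Lakehouse`, while repeated `main(["run"])` calls share one instance.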
Chores
- Add a default task that lists all just tasks (897c520)
- Add missing help message and fix the one for ml monitor plot (7b17e14)
Features
- Improve performance of REST API by moving Kafka payload queueing to the background (2fe859d)
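The general idea of background queueing can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: the `sent` list stands in for a Kafka producer, and `publisher`, `handle_request`, and `demo` are hypothetical names.

```python
import asyncio


async def publisher(queue: asyncio.Queue, sent: list) -> None:
    """Drain the queue in the background; `sent` stands in for a Kafka producer."""
    while True:
        payload = await queue.get()
        await asyncio.sleep(0.01)  # simulated network latency
        sent.append(payload)
        queue.task_done()


async def handle_request(queue: asyncio.Queue, payload: dict) -> dict:
    """Request handler: enqueue without waiting for delivery, then respond."""
    queue.put_nowait(payload)
    return {"status": "accepted"}


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    sent: list = []
    worker = asyncio.create_task(publisher(queue, sent), name="publisher")

    # Each request returns immediately; delivery happens in the worker task.
    responses = [await handle_request(queue, {"id": i}) for i in range(3)]
    assert all(r["status"] == "accepted" for r in responses)

    await queue.join()  # wait until the worker has processed every payload
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return sent
```

The handler's latency no longer depends on the simulated send, since only the background task waits on it.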
Refactoring
Detailed Changes: v0.6.0...v0.7.0
v0.6.0
v0.6.0 (2025-08-25)
Bug Fixes
- Attempt to solve group coordinator errors (25e9cf1)
- Capture asyncio cancel exception (5f5f07f)
- Consumer task was meant to be awaited from inside the loop (0bede44)
- Correct model uri scheme (a6c589b)
- Dataframe was being forced through the model loaded using mlflow.pyfunc.load, so now we handle multiple input types (4d9541e)
- Handle failed runs and drop unrequired columns from logged inputs (29f9259)
- Kafka now runs and initializes properly (b94dfc4)
- Mlflow healthcheck, switch to kafka's official image (cea3edf)
- Model needs to be initialized every time, otherwise there is a memory leak (e53c85f)
- Move mlflow.db to root since db directory didn't exist (3fed7f1)
- Ollama will now default to CPU when GPU is not available (a13fd72)
  This will, most likely, make it unusable, but at least it won't stop the other services from starting and working as expected.
- Positive label probability selection (00d738e)
- Queue logic incompatible with list logic, always flush in the end (9d59f90)
- Requests cache was causing memory overload (b734e08)
- Schema name, remove unused tasks (e1944fa)
- Train/test split now separate from cross-validation (train only) (11b448e)
- Transform failed when other datasets were not ingested (029e8c4)
- Update to new lakehouse schema (1240dc9)
- Update to new ml types and lakehouse schema (063d28b)
Chores
- Add a second topic for updating inference results with user feedback (6abb221)
- Add config for new stage catalog with secure storage (1009792)
- Add config for pairs of topic and expected consumer group (6224c6c)
- Add config for stage catalog with secure storage (7d3fbb9)
- Add kafka config section (175474f)
- Add name to each asyncio task (dad0fbe)
- Create justfile with tasks from previous and upcoming videos (9289e70)
- Delete unused test module (04964ef)
- Reduce sample fraction (a9ab9a7)
- Rename insert/update to result/feedback to match new event topics (4ae3ef1)
- Setup mlflow service with sqlite and s3 (3a0f1ca)
- deps: Add anyascii and inflection for a more robust sanitization, add just for task running, add xgboost for ML project (17cbf48)
- deps: Add faker to create random dates (0814e54)
- deps: Add fastapi and uvicorn for the ml server (5907a38)
- deps: Add joblib to use Memory for caching (1676db1)
- deps: Add kafka official library (2de36e8)
- deps: Add mlflow (5d0b94a)
- deps: Add pip so its version is properly detected by mlflow during model logging (63edc5a)
- deps: Add scikit-learn (204411b)
- deps: Add sentence transformers for text embedding (0a852d8)
- deps: Downgrade from 3.0.3 to 3.0.2 due to mlflow compatibility (8e2d1bd)
- deps: Replace confluent-kafka with aiokafka (a052aa8)
Features
- Add 3-folds (9c58bfb)
- Add create_at timestamp that defaults to the current date (6badaf2)
- Add custom MLflow user for tracking (4818df8)
- Add model logging to mlflow_end_run (54027f2)
- Add monitor compute task (36d2b91)
- Add monitor dataset to mlops etl pipeline (c227f62)
- Add monitor plot task (27bd6bb)
- Add reload option to use during development (5e175ad)
- Add sample fraction parameter (813f031)
- Add train and test tasks, update ETL task with transformation (54e1adb)
- Apache Kafka server (b2acb5d)
- Basic training pipelines and CLI command (e27cc5d)
- Check for curl and add -f to ensure the task fails when status code is >= 400 (b02c5a1)
- Default to 3-folds, since it is now supported (daf3db9)
- Drop tasks for mlflow model server (images are too bloated) (c20ec10)
- Enable artifacts proxy and install boto3 as a dependency (8e844a6)
- End-to-end kafka producer/consumer implementation (c1fbbc6)
- Endpoint to flush inference log, refactor inference request to handle A/B/n testing (8ba2a7e)
- Feature pipelines for TF-IDF and sentence transformers (70c4182)
- Feedback is now an array and created_at keeps track of time (19d0527)
- Generic ml dataset loader function (f364327)
- Health check endpoint, and refactor insert/update to results/feedback for clarity ([d00f612](https://github.com...
v0.5.0
v0.5.0 (2025-08-05)
Bug Fixes
- Any positive ESI is now considered competition, and is separate from intensity (25844f1)
- Log file relative path to cwd failed when not directly contained using Path (e4f5b62)
Chores
- Commit notebook generated during video recording (454d0dd)
- deps: Add adjustText to optionally fix rendering of overlapping node labels (36cbc33)
- deps: Add geopandas to plot maps (62d5ef1)
- deps: Add jupyterlab, matplotlib, and networkx for graph data science (e29c08f)
- deps: Remove unneeded adjustText and add scipy back as a requirement for networkx layout computation (76ef5d4)
Features
- Add CLI support for computing the CON score (8c94f6e)
- Add edge arrows and node colors per label (ed56184)
- Add graph analytics module, starting with a CON score (ff1f926)
- Add graph transparency and improve labels (02dc859)
- Add scale to arrow placement, add optional visualization weight (9190d2c)
- Compare communities and components, study economic pressure (afceea8)
- Competition network analysis, including community and weak component analysis (62e54fd)
- Create a basic graph theme matching DLT (3210fa5)
- Dominating and weaker economy individual analysis (986a2d6)
- Edge direction now based on common exports, from highest to lowest total amount (77325bd)
- Improve graph plotting and add map plotting (266dfca)
- Networkx graph plot helper to use with notebooks (a36b6c9)
- Revisited the whole notebook, restructuring and adding depth where needed (6a3dcb1)
- Script to easily convert Jupyter Notebooks to markdown (4b0c792)
- Set label with property per node type and render labels without overlapping (8c0b6fb)
- Setup notebook for graph data science (1d96e63)
- Support for loading Parquet into DuckLake from Python (4035f63)
- Trade alignment analysis (80d5ef1)
- Trade alignment analysis (cont) (da6e848)
Refactoring
- Different score reset strategy (d4d7d9d)
- No longer setting flags for dominating and weaker (d8013c4)
- Remove unused import (65defb1)
- Replace os.path ops with Path ops (84c73a9)
- Use kuzu extension instead of kz (d815cef)
- Use ref instead of hardcoded FQN (ba6de1a)
Detailed Changes: v0.4.0...v0.5.0
v0.4.0
v0.4.0 (2025-07-16)
Bug Fixes
- Add missing schema configs for new econ comp models (c4daafb)
- Edges needed to be defined based on node_id, which required these changes (398ba70)
- Remove nonexistent property (918f23a)
- Remove not null tests where they were not required (43efc61)
- Remove product parent relationship, as there is no multi-level data here (2d26651)
- Remove repeated country pairs in reverse order (1f2f867)
- Required aggregation per country and product, disregarding partner (635dc72)
- Types and missing null strings (40a79d7)
Chores
- Add cypher script to compute music_taste graph stats (7a0a48d)
- Add env var for econ comp graph db (3e34e80)
- Configs for analytics mart (40dee56)
- Re-enable requests-cache with streaming (62c7dff)
- Rename KuzuDBs to match new single-file format (0e797ae)
- Simplify music taste graph stats script (5b964fb)
- Upgrade explorer script to work with kuzu 0.11.0 (36f6cf7)
- deps: Add humanize to print byte sizes in human-readable format (6238484)
- deps: Add requests cache dep (b7c5fd5)
- deps: Add tqdm dep for tracking download progress (5e2ba51)
- deps: Bump up kuzu to 0.11.0 (74f2f4f)
- deps: Bump up version inside uv.lock (7124ff4)
Documentation
- Fill in the missing schema models for analytics, and econ_comp nodes and edges (aa65fcd)
Features
- Add model selection CLI option to test cmd (499bac0)
- Aggregated view for 2020-2023 trade covering recent years (c579742)
- Cli command to expunge/clean cache (f412b51)
- Complete dataset template for The Atlas of Economic Complexity (6e2cb9c)
- Country and product nodes, product-country export and import edges, and product parent edges (cca6d5c)
- Country-country ESI calculation (0ca0346)
- Datacite working downloader (bf09fb1)
- Ingest country classification data (09c3ac7)
- Logic changed to account for the last 3 years in data instead of a fixed range (8599498)
- Move cache to shared level and add expunge function and requests cache (805511f)
- Rename 2020-2023 to latest 3y and add schema for country-country metrics (af044f8)
- Select top 5% ESI country-country relations for edges (3356e4f)
- Skip cache for downloads and display progress bar (039e08a)
- Split ingestion into multiple modules and add dataset templates (8e3c6b8)
- Stage transformations for TAoEC (6e082e3)
- Support for cache usage statistic printing (436391b)
- Support for loading econ_comp graph (93396df)
Performance Improvements
- Increase chunk size and make sure temp files are cleaned even when the script is stopped (39943df)
Refactoring
- Log debug message containing produced context (5917a15)
- Rename context to entities when referring to entity nodes (ff6e0df)
Testing
- Ensure ESI is within a 0..1 range (d1ef5ce)
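These notes do not define ESI. Assuming it refers to the Finger-Kreinin export similarity index (for each product, take the smaller of the two countries' export shares, then sum), the 0..1 bound follows from the shares of each country summing to 1. The sketch below is an illustration under that assumption; `export_shares` and `esi` are hypothetical names, not the project's functions.

```python
def export_shares(exports: dict[str, float]) -> dict[str, float]:
    """Convert export values per product into shares that sum to 1."""
    total = sum(exports.values())
    if total <= 0:
        raise ValueError("total exports must be positive")
    return {product: value / total for product, value in exports.items()}


def esi(exports_a: dict[str, float], exports_b: dict[str, float]) -> float:
    """Export similarity: sum over products of the smaller of the two shares."""
    shares_a = export_shares(exports_a)
    shares_b = export_shares(exports_b)
    products = shares_a.keys() | shares_b.keys()
    return sum(
        min(shares_a.get(p, 0.0), shares_b.get(p, 0.0)) for p in products
    )
```

Under this definition, identical export baskets score 1, baskets with no product in common score 0, and everything else falls in between.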
Detailed Changes: v0.3.0...v0.4.0
v0.3.0
v0.3.0 (2025-07-08)
Bug Fixes
- Add error control to the GraphRAG chain (4f015ca)
Chores
- deps: Add colorama to color error messages (389a8a1)
Features
- Graph rag CLI options for interactive and direct querying (8f54d81)
Refactoring
- Remove unused import (c5bfb82)
Detailed Changes: v0.2.0...v0.3.0
v0.2.0
v0.2.0 (2025-07-04)
Bug Fixes
- Correct logic for deleting vector index if exists (516b677)
Chores
- Add missing word in prompt (2001d8d)
- Container names will now use the default naming schema (6d267b8)
- Ensure predictable table indexing order (4547bd3)
- Graph retriever and context assembler class scaffolds (eae806d)
- Make sure kuzudb-explorer is using a fixed image version (0.10.0 currently) (80c8aca)
- Path combination and scaffolding for hydrating (1c7db62)
- Prefix log message is now debug-level (de7d708)
- Print version from pyproject.toml via CLI argument (2fa5b86)
- Remove unused semantic-release config (1692e14)
  This option was set in the wrong location, so it did nothing. We don't need it.
- Replace default nomic-embed-text ollama model with phi4:latest (ee324f1)
- Setup ollama service and add env var for default model install (4af078b)
- deps: Add ollama dependency (4d1608d)
- deps: Add pytest to dev deps and configure default CLI options (baabcd5)
- deps: Langchain with ollama support, and a prompt helper library (4565ec9)
- deps: Langchain-kuzu (eed603d)
- deps: More-itertools (ecb7f9c)
Continuous Integration
- Add missing version to semantic-version command (c6facd1)
- Fix call to semantic release using a function (d577a45)
- Fix changelog_file config location (b5bb8d7)
- Fix pyproject.toml version setting for semantic release (db96d22)
- Remove redundant build option, already set on pyproject.toml (e8f6d6b)
Documentation
- Add knn method info to clarify the max_distance param (0fdf01f)
Features
- Add file logging by default (and option to disable) (2f9a36e)
- Add final answer pipeline and improve interactive mode (58bff5a)
- Basic prompt for graph RAG and langchain scaffolding (50173de)
- Combined knn step for context assembler (33b20ab)
- Context assembly based on ANN, paths to neighbors, and random walks from neighbors (9323352)
- Cypher friendly schema format (87f8171)
- First working NER implementation based on langchain-kuzu (a743062)
- Graphrag is now a LangChain Runnable and components became methods (cd04d33)
- Knn query support (2bca4a0)
- Knn, shortest paths sampler and random walk computation for context assembler (22d4f0a)
- Kuzudb-explorer launcher script now handles different paths (4dc65a9)
- Lazy singleton S3 resource and bucket connection (63388a1)
- Ollama service with gemma3 and nomic-embed-text (83b68dd)
- Path hydration and bulk description (97ea465)
- Return paths as interleavings of node_id and rel label (17b790a)
- Support for indexing embeddings (c687f81)
- graph.ops: Automatically add a custom embeddings column to all node tables (1900f21)
  Closes #2
- graph.ops: Produce node schema with properties names and types (291d42f)
Performance Improvements
- Migrated from KuzuQAChain to a custom strategy still based on langchain-kuzu (ebce585)
Refactoring
- Change property match to WHERE cond and lower the temperature (f0f9198)
Testing
- Correct paths_df fixture and add missing exclude_props (c167b0c)
- Invoke test for GraphRAG runnable (f724224)
- Move graph db check to global fixtures (d2963e3)
- Print final chain output (40f2d14)
- Setup ops and paths_df to test path_descriptions() (3f3c160)
- Tests will only print logs to stderr and always use debug level (fafb3bf)
Detailed Changes: v0.1.0...v0.2.0
v0.1.0
v0.1.0 (2025-06-25)
Bug Fixes
- Add node_id to all nodes (f927dcd)
- Batch should be column, not parameters (73eeb9e)
- Condition for ignoring files during deletion (9da0e0f)
  The manifest.json was being deleted by mistake.
- Correct name for placeholder models (fa07609)
  feat: implement all missing edge models
- Ducklake integration using dev version for upcoming dbt-duckdb 1.4.1 (effe0d7)
- Duplicate alias for source_id and target_id columns (d6b6790)
- Ensure tags are checked out (d833c89)
- Generate sequential node ID globally for all nodes (8d019ac)
- Genre loading queries (abc6833)
  refactor: reorganize models into stage and marts
  feat: support for edge loading (untested)
- Genre nodes become a single table to ensure uniqueness (7ec7f03)
- Incorrect S3 secret variable (6fa7394)
- Missing description for playcount (c505c9c)
- Missing node ID dataset-based prefix (bfecf9f)
- Missing nodes prefix on ref table (58b04f5)
- Missing underscore after prefix (3a0d6d9)
- No longer defaulting to upstream dependencies (6d2d68a)
- Regression introduced by removing key_parts (668a31c)
- Removed extra bracket in log message (782bcc9)
- Should be alias, not name (d03e079)
- Should be list of list, not list of tuple (c3b7419)
- Sqlite prefix missing (78bae87)
- Switch to single table for genre nodes (6d5dd1f)
- Update graph loading process based on new config schema (eebc677)
- Update prune to use class prefix (49ca20f)
- Using map instead of list per node embedding (e6f1caf)
  fix: add missing schema alter to add embedding property to all nodes
- Wrapper to copy from data mart via a temporary file (3cd0268)
- Wrong column name in schema (af2a693)
- Wrong filename case, should be RO, not ro (905a303)
- Wrong model name in schema (588f3bc)
- Wrong reference, missing schema prefix (701fb1d)
- Wrong variable order in log message (40cb055)
Chores
- Add description and pandas dep (b7c40d0)
- Add DUCKLAKE_PATH to .env (3ab2bee)
- Add kuzu as a dependency (f1e2a5c)
- Add S3 prefix for exports (88ee16c)
- Add solid background to diagram (7499f46)
- Add torch, torch-sparse, and torch-geometric deps (5ddd0f8)
- Better schema name organization for graphs (948759a)
- Click and minio deps (c6e450f)
- Default to eu-west-1, as MinIO also defaulted to it (6afa282)
- Delete example models (7012b5b)
- Fix version for python-semantic-release to match deps (3ff93d9)
- Github dark mode background color (ca97d44)
- Gitignore vscode directory (a2fcabd)
- Initial log message for export (1207e93)
- Initial log string is now a welcome string (0edefc6)
- Make sure we start from 0.1.0, not 1.0.0 (cb2b7c5)
- Remove unused dep (ba09c90)
- Remove unused deps and update docs referring to them (7da0d24)
- Replace with official GHA for python-semantic-release (7301029)
- Script to launch temporary docker container with KuzuDB Explorer for a database (52b617a)
- Setup dltctl CLI tool (replaces Makefile) (405d800)
- Simplify node and edge schemas, using Gremlin-like notation (887dddc)
- Solid background in individual rectangles (8f68204)
- Switch to a multi-database marts config (1cf615c)
- Temporarily removed (bd26438)
  Schema was outdated and was blocking dbt run.
- Update config to match multi-database marts (d47f96f)
- Won't use the extra command in favor of one entry point (698a3d1)
- deps: Move python-semantic-release to dev deps (9df3ed6)
Documentation
- Add graph and shared (866b37c)
- Add specification for exports pruning (e42acbf)
- Dependency management development instructions (2f9393f)
- Duckdb init script description (94b1959)
- End-to-end documentation (190ef1b)
- Fix section links (e00a537)
- Latest.json is now manifest.json (6cdb057)
- Remove suffix from info boxes ...