Commit 0437835

Merge pull request #4 from KxSystems/add-similarity-search-tool
add similarity-search tool
2 parents 168cac5 + a922690 commit 0437835

12 files changed (+739 additions, -392 deletions)

README.md

Lines changed: 25 additions & 6 deletions
@@ -16,6 +16,7 @@ The server leverages a combination of curated resources, intelligent prompts, an
 - [MCP Server Installation](#mcp-server-installation)
 - [Transport Options](#transport-options)
 - [Command line Parameters](#command-line-parameters)
+- [Configure Embeddings](#configure-embeddings)
 - [Usage with Claude Desktop](#usage-with-claude-desktop)
 - [Prompts/Resources/Tools](#promptsresourcestools)
 - [Development](#development)
@@ -54,7 +55,7 @@ Before installing and running the KDB-X MCP Server, ensure you have met the foll
 - [Cloned this repo](#clone-the-repository)
 - A `KDB-X/KDB+` Service listening on a host and port that will be accessible to the MCP Server
 - See examples - [KDB-X Setup](#kdb-x-setup) / [KDB+ Setup](#kdb-setup)
-- KDB-X can be installed by signing up to the [kdb-x public preview](https://kdb-x.kx.com/sign-in) - see [KDB-X documentation](https://docs.kx.com/public-preview/kdb-x/home.htm) for supporting information
+- KDB-X can be installed by signing up to the [KDB-X public preview](https://kdb-x.kx.com/sign-in) - see [KDB-X documentation](https://docs.kx.com/public-preview/kdb-x/home.htm) for supporting information
 - Windows users can run the KDB-X MCP Server on Windows and connect to a local KDB-X database via WSL or remote KDB-X database running on Linux
 - Windows users can run a local KDB-X database by installing KDB-X on [WSL](https://learn.microsoft.com/en-us/windows/wsl/install), and use the default [streamable-http transport](#transport-options) when running the [KDB-X MCP Server](#run-the-server) - both share the same localhost network.
 - For details on KDB-X usage restrictions see [documentation](https://docs.kx.com/product/licensing/usage-restrictions.htm#kdb-x-personal-trial-download)
@@ -246,6 +247,22 @@ options:
 2. **Environment variables** (middle precedence) - `KDBX_MCP_TRANSPORT=streamable-http`, `KDBX_HOST=myhost`
 3. **Default values** (lowest precedence)

+## Configure Embeddings
+
+Before starting the KDB-X MCP Server, you must configure embedding models for your tables if you wish to use Similarity Search.
+The repository includes two ready-to-use embedding providers: OpenAI and SentenceTransformers.
+You can customize these implementations as needed, or add your own provider by following the steps outlined below.
+
+1. Update Dependencies - Add your required embedding providers to `pyproject.toml` dependencies section.
+
+2. Set Environment Variables - Configure required API keys for your chosen embedding providers if necessary (for example, set the environment variable `OPENAI_API_KEY` to use OpenAI's API)
+
+3. Add New Provider - The file `src/mcp_server/utils/embeddings.py` defines the base class `EmbeddingProvider` for all embedding providers.
+To add a new provider, create a class in the same file that extends this base class and implements all required abstract methods.
+You can use the existing implementations of OpenAI and SentenceTransformers in the same file as templates — simply copy and modify them to suit your needs. To register your provider, use the `@register_provider` decorator above your class definition. It is not compulsory for the registered provider name to follow the provider's Python package name.
+
+4. Configure Table Embeddings - Update the embeddings configuration file at `src/mcp_server/utils/embeddings.csv` with your actual database and table names, embedding providers and models. The name you provide at `embeddings.csv` should match the registered provider name specified in file `embeddings.py`.
+
 ## Usage with Claude Desktop

 ### Configure Claude Desktop
@@ -302,7 +319,8 @@ If you have pre-existing MCP servers see [example config with multiple mcp-serve
 "--directory",
 "/path/to/this/repo/",
 "run",
-"mcp_server"
+"mcp-server",
+"--stdio"
 ]
 }
 }
@@ -311,7 +329,7 @@ If you have pre-existing MCP servers see [example config with multiple mcp-serve

 **Note**

-- Update your `<user>` to point to the absolute path of the uv executable - only required if `uv` is on your path
+- Update your `<user>` to point to the absolute path of the uv executable - only required if `uv` is not on your path
 - Update the `--directory` path to the absolute path of this repo
 - Currently `KDB-X` does not support Windows, meaning `stdio` is not an option for Windows users
 - Claude Desktop is responsible for starting/stopping the MCP server when using `stdio`
@@ -372,7 +390,7 @@ To enable Developer mode:

 | Name | Purpose | Params | Return |
 |-------------------|------------------------------------------|-------------------------------------------------|------------------------------------------------|
-| kdbx_table_analysis | Generate a detailed analysis prompt for a specific table. | table_name: Name of the table to analyze<br> analysis_type (optional): Type of analysis options statistical, data_quality<br> sample_size (optional): Suggested sample size for data exploration | The generated table analysis prompt |
+| kdbx_table_analysis | Generate a detailed analysis prompt for a specific table. | `table_name`: Name of the table to analyze<br> `analysis_type` (optional): Type of analysis options statistical, data_quality<br> `sample_size` (optional): Suggested sample size for data exploration | The generated table analysis prompt |

 ### Resources

@@ -385,7 +403,8 @@ To enable Developer mode:

 | Name | Purpose | Params | Return |
 |-------------------|------------------------------------------|-------------------------------------------------|------------------------------------------------|
-| kdbx_run_sql_query | Execute SQL SELECT against KDB-X database | query (str): SQL SELECT query string to execute | JSON object with query results (max 1000 rows) |
+| kdbx_run_sql_query | Execute SQL SELECT against KDB-X database | `query`: SQL SELECT query string to execute | JSON object with query results (max 1000 rows) |
+| kdbx_similarity_search | Perform vector similarity search on a KDB-X table | `table_name`: Name of the table to search <br> `query`: Text query to convert to vector and search <br> `n` (optional): Number of results to return | Dictionary containing search result |

 ## Development

@@ -441,7 +460,7 @@ If the MCP Server port is being used by another process you will need to specify

 ### Invalid transport

-You can only specify `streamable-http`, `stdio.`
+You can only specify `streamable-http` or `stdio.`

 ### Missing tools/resources

pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ dependencies = [
     "mcp[cli]>=1.2.0",
     "pykx>=4.0.0.b2",
     "pydantic-settings",
+    # Optional: Manage below packages for embedding support
+    # "sentence_transformers",
+    # "openai",
+    # "tiktoken",
 ]

 [project.optional-dependencies]
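`sentence_transformers`, `openai` and `tiktoken` are left commented out, so provider code cannot assume they are installed. One common approach is to import the package lazily and fail with an actionable message. This sketch shows that approach; it is an assumption and not necessarily what `embeddings.py` does. `require_package` is a hypothetical helper name.

```python
import importlib
import importlib.util
from types import ModuleType


def require_package(module_name: str, hint: str = "") -> ModuleType:
    """Import `module_name` on first use, or raise an ImportError that says what to do.

    Hypothetical helper: lets a provider keep its optional dependency
    (e.g. "openai" or "sentence_transformers") off the import path until it is needed.
    """
    if importlib.util.find_spec(module_name) is None:
        message = f"Optional dependency {module_name!r} is not installed."
        if hint:
            message += " " + hint
        raise ImportError(message)
    return importlib.import_module(module_name)


if __name__ == "__main__":
    print(require_package("json").dumps({"ok": True}))
    try:
        # Hypothetical package name, used only to show the failure path.
        require_package("no_such_embedding_pkg",
                        hint="Uncomment it in pyproject.toml and reinstall.")
    except ImportError as err:
        print(err)
```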

screenshots/claude_resources.png

28.3 KB

screenshots/claude_tools.png

33.6 KB

src/mcp_server/resources/kdbx_database_tables.py

Lines changed: 8 additions & 2 deletions
@@ -3,6 +3,7 @@
 from typing import List
 from mcp.types import TextContent
 from mcp_server.utils.kdbx import get_kdb_connection
+from mcp_server.utils.embeddings_helpers import get_embedding_config
 from mcp_server.server import config

 logger = logging.getLogger(__name__)
@@ -45,7 +46,7 @@ async def kdbx_describe_table_impl(table: str) -> List[TextContent]:

         output_lines.extend([
             f"\n Data Preview ({preview_size} records):",
-            _format_data(preview_data)
+            _format_data(preview_data, table)
         ])
     else:
         output_lines.append("\n Table is empty - no data to preview")
@@ -94,7 +95,12 @@ async def kdbx_describe_tables_impl() -> List[TextContent]:
         )]


-def _format_data(data) -> str:
+def _format_data(data, table=None) -> str:
+    if table:
+        embeddings_column, _, _, _, _ = get_embedding_config(table)
+        if embeddings_column and hasattr(data, "pop"):
+            data.pop(embeddings_column, None)
+            # data = data.drop(embeddings_column, axis=1, errors="ignore")
     if hasattr(data, 'to_string'):
         return data.to_string()
     return str(data)
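`_format_data` now drops the embedding column before rendering the preview, so raw vectors do not flood the table description. `hasattr(data, "pop")` is true for both a `dict` and a pandas `DataFrame`. However, `DataFrame.pop` accepts only the column label and raises `KeyError` when the label is absent. The two-argument `pop(embeddings_column, None)` call is therefore safe only for dict-like previews; for a DataFrame, the commented-out `drop(..., errors="ignore")` line is the equivalent. Below is a minimal sketch for a column-oriented dict. The name `drop_vector_column` is hypothetical, and the helper returns a copy instead of mutating the caller's data.

```python
from typing import Any, Dict, Optional


def drop_vector_column(data: Dict[str, Any], vector_column: Optional[str]) -> Dict[str, Any]:
    """Return a copy of a column-oriented preview without `vector_column`.

    Hypothetical helper mirroring the dict branch of `_format_data`.
    A missing or empty column name leaves the data unchanged.
    """
    trimmed = dict(data)  # shallow copy so the caller's preview is not mutated
    if vector_column:
        trimmed.pop(vector_column, None)  # dict.pop accepts a default; DataFrame.pop does not
    return trimmed


if __name__ == "__main__":
    preview = {"id": [1, 2], "text": ["a", "b"], "vecs": [[0.1, 0.2], [0.3, 0.4]]}
    print(drop_vector_column(preview, "vecs"))
```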

src/mcp_server/server.py

Lines changed: 3 additions & 0 deletions
@@ -111,6 +111,9 @@ def _check_kdb_connection(self):
         if not conn('@[{2< count .s};(::);{0b}]').py():
             self.logger.error("KDB-X SQL interface check: FAILED - KDB-X service does not have the SQL interface loaded. Load it by running .s.init[] in your KDB-X Session")
             sys.exit(1)
+        if not conn('@[{2< count .ai};(::);{0b}]').py():
+            self.logger.error("KDB-X AI Libs check: FAILED - KDB-X service does not have the AI Libs loaded. Load it by running \l ai-libs/init.q in your KDB-X Session")
+            sys.exit(1)
         else:
             self.logger.info("KDB-X SQL interface check: SUCCESS - SQL interface is loaded")
         conn.close()
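Each startup check sends a protected q evaluation such as `@[{2< count .ai};(::);{0b}]`. It applies the lambda and, if that signals an error (for example, when `.ai` is undefined), returns the handler's result, `0b`. The new AI Libs `if` now sits between the SQL check and the existing `else:`, so that `else:` pairs with the AI Libs check. As a result, "SQL interface check: SUCCESS" is logged only after the AI Libs check passes, and the AI Libs check logs no success line of its own. The sketch below gives each check its own pass/fail message. It assumes a `query` callable that returns a Python bool, standing in for `conn(expr).py()`. `Check` and `run_preflight_checks` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("preflight")


@dataclass(frozen=True)
class Check:
    name: str        # label used in log messages
    expression: str  # protected q expression expected to return a boolean
    remedy: str      # hint logged when the check fails


def run_preflight_checks(query: Callable[[str], bool], checks: List[Check]) -> List[str]:
    """Run every check, log one SUCCESS/FAILED line per check, return the failed names.

    Hypothetical helper: `query` stands in for `lambda q: conn(q).py()`.
    """
    failed = []
    for check in checks:
        if query(check.expression):
            logger.info("%s check: SUCCESS", check.name)
        else:
            logger.error("%s check: FAILED - %s", check.name, check.remedy)
            failed.append(check.name)
    return failed


CHECKS = [
    Check("KDB-X SQL interface", "@[{2< count .s};(::);{0b}]",
          "Load it by running .s.init[] in your KDB-X Session"),
    Check("KDB-X AI Libs", "@[{2< count .ai};(::);{0b}]",
          r"Load it by running \l ai-libs/init.q in your KDB-X Session"),
]

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    fake_results = {CHECKS[0].expression: True, CHECKS[1].expression: False}
    print(run_preflight_checks(fake_results.__getitem__, CHECKS))
```

Running every check before exiting reports all missing components in one start-up attempt. The caller would still call `sys.exit(1)` when the returned list is non-empty. The raw string `r"...\l..."` also avoids Python's invalid-escape warning for `\l`.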

src/mcp_server/settings.py

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@ class KDBConfig(BaseSettings):
     password: Optional[SecretStr] = ""
     timeout: Optional[int] = 1
     retry: Optional[int] = 2
+
+    # Similarity Search tool
+    embedding_csv_path: str = "src/mcp_server/utils/embeddings.csv"
+    metric: str = "CS"
+    k: int = 5

     class Config:
         env_prefix = 'KDBX_'
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+import logging
+from typing import Optional, Dict, Any, List
+from mcp_server.settings import KDBConfig
+from mcp_server.utils.embeddings import get_provider
+from mcp_server.utils.embeddings_helpers import get_embedding_config
+import numpy as np
+import pandas as pd
+import logging
+import pykx as kx
+import json
+from typing import Dict, Any
+from mcp_server.utils.kdbx import get_kdb_connection
+
+config = KDBConfig()
+logger = logging.getLogger(__name__)
+
+
+# Normalizes the result from the search operation
+def normalize_result(df: Dict)-> Any:
+    # serialize numpy ndarray type
+    df = df.map(lambda x: x.tolist() if isinstance(x, np.ndarray) else x)
+    # convert timespan type (KDB time type)
+    for col_name, col_type in df.dtypes.items():
+        timespan_type = str(col_type).lower().startswith("timedelta")
+        duration_type = str(col_type).lower().startswith("duration")
+        if timespan_type or duration_type:
+            df[col_name] = (pd.Timestamp("1970-01-01") + df[col_name]).dt.time
+    # convert to dict
+    return df.to_dict('records') if hasattr(df, 'to_dict') else df
+
+
+async def kdbx_similarity_search_impl( table_name: str,
+                                       query: str,
+                                       n: Optional[int] = None) -> Dict[str, Any]:
+
+    try:
+        if n is None:
+            n = config.k
+
+        embeddings_column, embeddings_provider, embeddings_model, _, _ = get_embedding_config(table_name)
+
+        dense_provider = get_provider(embeddings_provider)
+        query_vector = await dense_provider.dense_embed(query, embeddings_model)
+
+        # Build search parameters
+        search_params = {
+            "table" : table_name,
+            "vcol" : embeddings_column,
+            "qvec" : query_vector,
+            "metric": config.metric,
+            "n" : int(n),
+        }
+
+        conn = get_kdb_connection()
+
+        result = conn('''{[args]
+            c:args`vcol;
+            $[(args`table) in .Q.pt;
+                [
+                    res:raze{[d;args;tbl;c]
+                        vecs:?[tbl;enlist (=;.Q.pf;d);0b;(enlist c)!enlist c]c;
+                        if[not count vecs; :()];
+                        res:.ai.flat.search[vecs;args`qvec;args`n;args`metric];
+                        res:res@\:iasc res[1];
+                        `dist xcols update dist:res[0] from ?[tbl;((=;.Q.pf;d);(in;`i;res[1]));0b;()]
+                    }[;args;get args`table;c] each .Q.pv;
+                    ![(args`n)#`dist xdesc res;();0b;enlist c]
+                ];
+                [
+                    res:.ai.flat.search[?[args`table;();();c];args`qvec;args`n;args`metric];
+                    ![(args`table) res[1];();0b;enlist c]
+                ]
+            ]}''', search_params)
+
+        result = normalize_result(result.pd())
+
+        return {
+            "status": "success",
+            "table": table_name,
+            "recordsCount": len(result),
+            "records": result
+        }
+    except Exception as e:
+        logger.error(f"Error performing search on table {table_name}: {e}")
+        return {
+            "status": "error",
+            "message": str(e),
+            "table": table_name,
+        }
+
+
+def register_tools(mcp_server):
+    @mcp_server.tool()
+    async def kdbx_similarity_search(table_name: str,
+                                     query: str,
+                                     n: Optional[int] = None) -> Dict[str, Any]:
+        """
+        Perform vector similarity search on a KDB-X table.
+
+        Args:
+            table_name: Name of the table to search
+            query: Text query to convert to vector and search
+            n (Optional[int], optional): Number of results to return
+
+        Returns:
+            Dictionary containing search result.
+        """
+        results = await kdbx_similarity_search_impl(
+            table_name,
+            query,
+            n,
+        )
+        return results
+
+    return ["kdbx_similarity_search"]
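The q function passed to `conn` relies on `.ai.flat.search[vecs;qvec;n;metric]`. Judging by how its result is used (`res[0]` becomes the `dist` column and `res[1]` selects rows), it returns the scores and row indices of the top `n` matches. For a partitioned table, the q code searches each partition in `.Q.pv`, concatenates the hits, and keeps the overall top `n` after sorting on `dist` descending. The pure-Python sketch below mirrors that flow for the default `CS` metric. It assumes `CS` denotes cosine similarity, with a higher score meaning a closer match, as the `xdesc` merge implies. Partitions are keyed by a plain label such as a date string. `flat_search_cosine` and `merge_partition_hits` are hypothetical names and are not part of the KDB-X AI libraries.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_search_cosine(vecs: Sequence[Sequence[float]], qvec: Sequence[float],
                       n: int) -> Tuple[List[float], List[int]]:
    """Exhaustive (flat) search: score every row, return (scores, indices) of the top n."""
    scored = [(cosine(v, qvec), i) for i, v in enumerate(vecs)]
    top = heapq.nlargest(n, scored)  # highest similarity first
    return [s for s, _ in top], [i for _, i in top]


def merge_partition_hits(partitions: Dict[str, Sequence[Sequence[float]]],
                         qvec: Sequence[float], n: int) -> List[Tuple[float, str, int]]:
    """Search each partition separately, then keep the global top n by score.

    Returns (score, partition, row_index) triples, best first.
    """
    hits: List[Tuple[float, str, int]] = []
    for part, vecs in partitions.items():
        if not vecs:  # mirrors `if[not count vecs; :()]`
            continue
        scores, idxs = flat_search_cosine(vecs, qvec, n)
        hits.extend((s, part, i) for s, i in zip(scores, idxs))
    return heapq.nlargest(n, hits)


if __name__ == "__main__":
    parts = {"2024.01.01": [[0.0, 1.0]], "2024.01.02": [[1.0, 0.0], [1.0, 1.0]]}
    for score, part, row in merge_partition_hits(parts, [1.0, 0.0], 2):
        print(part, row, round(score, 3))
```

This sketch differs from the q code in one way. q's take (`#`) cycles when `n` exceeds the number of available rows (`5#1 2 3` gives `1 2 3 1 2`), whereas `heapq.nlargest` simply returns fewer items. In q, `sublist` is the non-wrapping analogue.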
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+table,embedding_column,embedding_provider,embedding_model,sparse_tokenizer_provider,sparse_tokenizer_model
+_example_table1,vecs,sentence_transformers,all-MiniLM-L12-v2,sentence_transformers,all-MiniLM-L12-v2
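This CSV maps each table to its vector column, dense embedding provider and model, and a sparse tokenizer provider and model. The code above unpacks `get_embedding_config(table)` into five values and checks `if embeddings_column` before using them, which suggests unlisted tables yield empty values. The sketch below reads the file under those assumptions. `load_embedding_config` is a hypothetical stand-in for `get_embedding_config`, whose real body in `embeddings_helpers` is not part of this diff.

```python
import csv
import io
from typing import Optional, TextIO, Tuple

# Column order assumed to match the five values unpacked from get_embedding_config.
FIELDS = ("embedding_column", "embedding_provider", "embedding_model",
          "sparse_tokenizer_provider", "sparse_tokenizer_model")


def load_embedding_config(source: TextIO, table: str) -> Tuple[Optional[str], ...]:
    """Return the five embedding settings for `table`, or five Nones if it is not listed.

    Hypothetical stand-in for get_embedding_config; `source` is any text stream
    with the same header row as embeddings.csv.
    """
    for row in csv.DictReader(source):
        if (row.get("table") or "").strip() == table:
            # Empty or missing cells are reported as None ("not configured").
            return tuple((row.get(field) or "").strip() or None for field in FIELDS)
    return (None,) * len(FIELDS)


SAMPLE = (
    "table,embedding_column,embedding_provider,embedding_model,"
    "sparse_tokenizer_provider,sparse_tokenizer_model\n"
    "_example_table1,vecs,sentence_transformers,all-MiniLM-L12-v2,"
    "sentence_transformers,all-MiniLM-L12-v2\n"
)

if __name__ == "__main__":
    column, provider, model, _, _ = load_embedding_config(io.StringIO(SAMPLE), "_example_table1")
    print(column, provider, model)
```

In the server, the stream would presumably come from opening `config.embedding_csv_path`. With the `KDBX_` prefix, pydantic-settings lets `KDBX_EMBEDDING_CSV_PATH` override that field. The default `src/mcp_server/utils/embeddings.csv` is a relative path, so it resolves against the process's working directory.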
