Overview
We want a tool that can search for a component using its name or description. By default, the tool should return at most 5 components. The output should contain the full component definitions, including inputs and outputs where available.
Our approach is as follows:
- use the service to fetch the component schema
- embed component name and description
- perform cosine similarity search between the search query and the embeddings
- return the 5 highest ranking definitions
- apply the same logic as in get_component_definition to extract definitions from the schema and to fetch component inputs and outputs
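The steps above can be sketched roughly as follows. This is a minimal illustration, not the final tool: the function name `search_component_definitions`, the component dict shape, and the `model` argument (anything satisfying the `ModelProtocol` below) are assumptions for the sketch.

```python
import numpy as np


def search_component_definitions(query, components, model, top_k=5):
    """Rank components by cosine similarity between the search query and
    each component's combined "name description" text, returning the top_k.

    Hypothetical sketch: `components` is assumed to be a list of dicts with
    "name" and optional "description" keys extracted from the schema.
    """
    texts = [f"{c['name']} {c.get('description', '')}" for c in components]

    query_emb = model.encode(query)   # shape: (dim,)
    comp_embs = model.encode(texts)   # shape: (n, dim)

    # Cosine similarity: normalize both sides, then take dot products.
    query_emb = query_emb / np.linalg.norm(query_emb)
    comp_embs = comp_embs / np.linalg.norm(comp_embs, axis=1, keepdims=True)
    scores = comp_embs @ query_emb

    # Sort by similarity, descending, and keep the top_k definitions.
    ranked = sorted(zip(components, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```

The remaining step (extracting full definitions plus inputs/outputs) would then reuse the logic from `get_component_definition` on the returned components.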
Here is some code from a different project for the semantic similarity search and the embeddings. Use a similar approach. The model we use should be injected into the tool via DI (hence the protocol):
```python
import numpy as np
from typing import Any, Protocol


class ModelProtocol(Protocol):
    def encode(self, sentences: list[str] | str) -> np.ndarray[Any, Any]:
        """
        Encodes a single sentence or multiple sentences.

        :param sentences: Single sentence or list of sentences to encode
        :returns: Numpy array of encoded sentences
        """
        ...


class SimilarPathsErrorHandler:
    """
    Error handler that provides similar path suggestions when a resource is not found.

    This handler uses Model2Vec to find paths in the repository that are similar to
    the requested path, and includes them as suggestions in the error response.
    """

    def __init__(self, model: ModelProtocol, top_k: int = 5):
        """
        Initialize the SimilarPathsErrorHandler.

        :param model: The Model2Vec model to use
        :param top_k: Number of similar paths to suggest
        """
        self.top_k = top_k
        self.model = model

    @staticmethod
    def normalize_path(path: str) -> str:
        """
        Normalize a path for comparison by removing leading/trailing slashes
        and converting to lowercase.

        :param path: The path to normalize
        :returns: Normalized path string
        """
        # Remove leading/trailing slashes and whitespace
        normalized = path.strip().strip("/")
        # Convert to lowercase for case-insensitive comparison
        return normalized.lower()

    def prepare_path_for_embedding(self, path: str) -> str:
        """
        Prepare a path for embedding by replacing slashes with spaces.
        This helps the model understand the path structure better.

        :param path: The path to prepare
        :returns: Prepared path string
        """
        normalized = self.normalize_path(path)
        return normalized.replace("/", " ")

    def find_similar_paths(self, target_path: str, all_paths: list[str]) -> list[tuple[str, float]]:
        """
        Find paths similar to the target path using Model2Vec embeddings.

        :param target_path: The path to find similar paths for
        :param all_paths: List of all available paths
        :returns: List of tuples containing (path, similarity_score)
        """
        # Normalize the target path and filter out exact matches
        normalized_target = self.normalize_path(target_path)
        filtered_paths = [p for p in all_paths if self.normalize_path(p) != normalized_target]
        if not filtered_paths:
            return []

        # Prepare paths for embedding by replacing slashes with spaces
        prepared_target = self.prepare_path_for_embedding(target_path)
        prepared_paths = [self.prepare_path_for_embedding(p) for p in filtered_paths]

        # Generate embeddings for the target and all candidate paths
        target_embedding = self.model.encode(prepared_target)
        path_embeddings = self.model.encode(prepared_paths)

        # Dot products serve as similarity scores; this equals cosine similarity
        # when the model returns normalized embeddings.
        target_embedding_reshaped = target_embedding.reshape(1, -1)
        similarities = np.dot(path_embeddings, target_embedding_reshaped.T).flatten()

        # Pair each path with its score, sort descending, and return the top k
        path_similarities = list(zip(filtered_paths, similarities))
        path_similarities.sort(key=lambda x: x[1], reverse=True)
        return path_similarities[: self.top_k]
```
Look at src/deepset_mcp/tools/haystack_service.py to figure out how the existing tools work.
Refactor the get_component_definition tool so that we can reuse some of the code in our search_component_definition tool.
Also add tests here: test/unit/tools/test_haystack_service.py
Fake the model when running the tests.
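For the tests, a deterministic stand-in satisfying `ModelProtocol` can be injected in place of the real Model2Vec model. The sketch below is one possible fake; the class name `FakeModel` and the character-statistics vector scheme are assumptions, chosen only so that rankings are reproducible across test runs.

```python
import numpy as np


class FakeModel:
    """Deterministic stand-in for ModelProtocol in unit tests.

    Encodes text into a tiny fixed-size vector derived from simple
    character statistics, so similarity rankings are reproducible
    without loading a real embedding model.
    """

    def encode(self, sentences):
        single = isinstance(sentences, str)
        if single:
            sentences = [sentences]
        vectors = np.array(
            [[len(s), s.count(" ") + 1.0, float(ord(s[0])) if s else 0.0] for s in sentences],
            dtype=float,
        )
        # Normalize rows so dot products behave like cosine similarity,
        # matching what the production code assumes about embeddings.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        vectors = vectors / np.where(norms == 0, 1.0, norms)
        return vectors[0] if single else vectors
```

The fake mirrors the `ModelProtocol.encode` contract: a single string yields a 1-D array, a list yields a 2-D array with one row per sentence.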