Overview
We want a tool that can search for a component using its name or description. By default, the tool should return at most 5 components. The output should contain the full component definitions, including inputs and outputs where available.
Our approach is as follows:
- use the service to fetch the component schema
- embed component name and description
- perform cosine similarity search between the search query and the embeddings
- return the 5 highest ranking definitions
- apply the same logic as in get_component_definition to extract definitions from the schema and to fetch component inputs and outputs
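The steps above can be sketched roughly as follows. This is a minimal illustration, not the final tool: the function name `search_component_definitions`, the component dict shape, and the `model` argument (anything satisfying the `ModelProtocol` below) are assumptions for the sketch.

```python
import numpy as np


def search_component_definitions(query, components, model, top_k=5):
    """Rank components by cosine similarity between the search query and
    each component's combined "name description" text, returning the top_k.

    Hypothetical sketch: `components` is assumed to be a list of dicts with
    "name" and optional "description" keys extracted from the schema.
    """
    texts = [f"{c['name']} {c.get('description', '')}" for c in components]

    query_emb = model.encode(query)   # shape: (dim,)
    comp_embs = model.encode(texts)   # shape: (n, dim)

    # Cosine similarity: normalize both sides, then take dot products.
    query_emb = query_emb / np.linalg.norm(query_emb)
    comp_embs = comp_embs / np.linalg.norm(comp_embs, axis=1, keepdims=True)
    scores = comp_embs @ query_emb

    # Sort by similarity, descending, and keep the top_k definitions.
    ranked = sorted(zip(components, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```

The remaining step (extracting full definitions plus inputs/outputs) would then reuse the logic from `get_component_definition` on the returned components.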
Here is some code from a different project for the semantic similarity search and the embeddings. Use a similar approach. The model we use should be injected into the tool via DI (hence the protocol):
```python
import numpy as np
from typing import Any, Protocol


class ModelProtocol(Protocol):
    def encode(self, sentences: list[str] | str) -> np.ndarray[Any, Any]:
        """
        Encodes a single sentence or multiple sentences.

        :param sentences: Single sentence or list of sentences to encode
        :returns: Numpy array of encoded sentences
        """
        ...


class SimilarPathsErrorHandler:
    """
    Error handler that provides similar path suggestions when a resource is not found.

    This handler uses Model2Vec to find paths in the repository that are similar to
    the requested path, and includes them as suggestions in the error response.
    """

    def __init__(self, model: ModelProtocol, top_k: int = 5):
        """
        Initialize the SimilarPathsErrorHandler.

        :param model: The Model2Vec model to use
        :param top_k: Number of similar paths to suggest
        """
        self.top_k = top_k
        self.model = model

    @staticmethod
    def normalize_path(path: str) -> str:
        """
        Normalize a path for comparison by removing leading/trailing slashes
        and converting to lowercase.

        :param path: The path to normalize
        :returns: Normalized path string
        """
        # Remove leading/trailing slashes and whitespace
        normalized = path.strip().strip("/")
        # Convert to lowercase for case-insensitive comparison
        return normalized.lower()

    def prepare_path_for_embedding(self, path: str) -> str:
        """
        Prepare a path for embedding by replacing slashes with spaces.
        This helps the model understand the path structure better.

        :param path: The path to prepare
        :returns: Prepared path string
        """
        normalized = self.normalize_path(path)
        return normalized.replace("/", " ")

    def find_similar_paths(self, target_path: str, all_paths: list[str]) -> list[tuple[str, float]]:
        """
        Find paths similar to the target path using Model2Vec embeddings.

        :param target_path: The path to find similar paths for
        :param all_paths: List of all available paths
        :returns: List of tuples containing (path, similarity_score)
        """
        # Normalize the target path and filter out exact matches
        normalized_target = self.normalize_path(target_path)
        filtered_paths = [p for p in all_paths if self.normalize_path(p) != normalized_target]
        if not filtered_paths:
            return []

        # Prepare paths for embedding by replacing slashes with spaces
        prepared_target = self.prepare_path_for_embedding(target_path)
        prepared_paths = [self.prepare_path_for_embedding(p) for p in filtered_paths]

        # Generate embeddings for the target and all candidate paths
        target_embedding = self.model.encode(prepared_target)
        path_embeddings = self.model.encode(prepared_paths)

        # Dot products serve as similarity scores; this equals cosine similarity
        # when the model returns normalized embeddings.
        target_embedding_reshaped = target_embedding.reshape(1, -1)
        similarities = np.dot(path_embeddings, target_embedding_reshaped.T).flatten()

        # Pair each path with its score, sort descending, and return the top k
        path_similarities = list(zip(filtered_paths, similarities))
        path_similarities.sort(key=lambda x: x[1], reverse=True)
        return path_similarities[: self.top_k]
```
Look at src/deepset_mcp/tools/haystack_service.py to figure out how the existing tools work.
Refactor the get_component_definition tool so that we can reuse some of the code in our search_component_definition tool.
Also add tests here: test/unit/tools/test_haystack_service.py
Fake the model when running the tests.
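For the tests, a deterministic stand-in satisfying `ModelProtocol` can be injected in place of the real Model2Vec model. The sketch below is one possible fake; the class name `FakeModel` and the character-statistics vector scheme are assumptions, chosen only so that rankings are reproducible across test runs.

```python
import numpy as np


class FakeModel:
    """Deterministic stand-in for ModelProtocol in unit tests.

    Encodes text into a tiny fixed-size vector derived from simple
    character statistics, so similarity rankings are reproducible
    without loading a real embedding model.
    """

    def encode(self, sentences):
        single = isinstance(sentences, str)
        if single:
            sentences = [sentences]
        vectors = np.array(
            [[len(s), s.count(" ") + 1.0, float(ord(s[0])) if s else 0.0] for s in sentences],
            dtype=float,
        )
        # Normalize rows so dot products behave like cosine similarity,
        # matching what the production code assumes about embeddings.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        vectors = vectors / np.where(norms == 0, 1.0, norms)
        return vectors[0] if single else vectors
```

The fake mirrors the `ModelProtocol.encode` contract: a single string yields a 1-D array, a list yields a 2-D array with one row per sentence.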