Add tools for component schema #22

@mathislucka

We want to add a tool function that wraps haystack_service.get_component_schemas.

Specifically, we want the following tool:

async def list_component_families(client: AsyncClientProtocol) -> str:
  """Lists all Haystack component families that are available on deepset."""
  # implementation missing

This tool will use the HaystackServiceResource to fetch the component schemas.

Then, it needs to parse the schema and extract the component families and descriptions.
Additionally, it needs to format these families with descriptions into a nice string that is consumable by an LLM.

We are expecting the following response structure from get_component_schemas:

{
  "component_schema": {
    "definitions": {"Components": {<component_name>: <component_definition>, ...}}
  }
}
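
A minimal sketch of how the nested structure above could be unpacked, assuming the response arrives as a plain Python dict. The helper name `get_components` is hypothetical, and returning an empty dict when a level is missing is one possible choice, not established project behavior.

```python
from typing import Any


def get_components(response: dict[str, Any]) -> dict[str, Any]:
    """Return the {component_name: component_definition} mapping.

    Hypothetical helper: falls back to an empty dict when any level
    of the expected nesting is missing or is not a dict.
    """
    node: Any = response
    for key in ("component_schema", "definitions", "Components"):
        if not isinstance(node, dict):
            return {}
        node = node.get(key, {})
    return node if isinstance(node, dict) else {}
```

Walking the keys in a loop keeps the tool from raising a `KeyError` or `AttributeError` on a malformed response, which fits the requirement that errors surface as strings rather than exceptions.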

Here is an example of a component definition:

{'description': 'Converts XLSX (Excel) files into Documents.

    Supports reading data from specific sheets or all sheets in the Excel file. If all sheets are read, a Document is
    created for each sheet. The content of the Document is the table which can be saved in CSV or Markdown format.

    ### Usage example

    ```python
    from haystack.components.converters.xlsx import XLSXToDocument

    converter = XLSXToDocument()
    results = converter.run(sources=["sample.xlsx"], meta={"date_added": datetime.now().isoformat()})
    documents = results["documents"]
    print(documents[0].content)
    # ",A,B
1,col_a,col_b
2,1.5,test
"
    ```',
 'properties': {
     'init_parameters': {
         'properties': {
             'read_excel_kwargs': {
                 '_annotation': 'typing.Optional[typing.Dict[str, typing.Any]]',
                 'default': None,
                 'description': 'Additional arguments to pass to `pandas.read_excel`.
            See https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas-read-excel',
                 'properties': {'_python_type': {...}},
                 'type': ['null', 'object']},
             'sheet_name': {
                 '_annotation': 'typing.Union[str, int, typing.List[typing.Union[str, int]], NoneType]',
                 'anyOf': [{...}, {...}, {...}, {...}],
                 'default': None,
                 'description': 'The name of the sheet to read. If None, all sheets are read.'},
             'store_full_path': {
                 '_annotation': "<class 'bool'>",
                 'default': False,
                 'description': 'If True, the full path of the file is stored in the metadata of the document.
            If False, only the file name is stored.',
                 'type': ['boolean']},
             'table_format': {
                 '_annotation': "typing.Literal['csv', 'markdown']",
                 'default': 'csv',
                 'description': 'The format to convert the Excel file to.',
                 'enum': ['csv', 'markdown'],
                 'type': 'string'},
             'table_format_kwargs': {
                 '_annotation': 'typing.Optional[typing.Dict[str, typing.Any]]',
                 'default': None,
                 'description': 'Additional keyword arguments to pass to the table format function.
            - If `table_format` is "csv", these arguments are passed to `pandas.DataFrame.to_csv`.
              See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas-dataframe-to-csv
            - If `table_format` is "markdown", these arguments are passed to `pandas.DataFrame.to_markdown`.
              See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_markdown.html#pandas-dataframe-to-markdown',
                 'properties': {'_python_type': {...}},
                 'type': ['null', 'object']}},
         'required': [],
         'type': 'object'},
     'type': {
         'component_frequency': 29,
         'const': 'haystack.components.converters.xlsx.XLSXToDocument',
         'family': 'converters',
         'family_description': 'Convert data into a format your pipeline can query. Use a converter that matches your data type.',
         'readme_link': 'https://docs.haystack.deepset.ai/docs/xlsxtodocument',
         'type': 'string'}},
 'title': 'XLSXToDocument',
 'type': 'object'}

From that definition we need to extract the family and family_description.

Many components can belong to the same family.
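
Because several components share a family, the extraction step has to deduplicate. Below is a minimal sketch, assuming `family` and `family_description` always sit under `properties.type` as in the example definition. The function names and the output layout are illustrative only; the existing formatting utilities in the repo may call for a different format.

```python
from typing import Any


def extract_families(components: dict[str, Any]) -> dict[str, str]:
    """Map each family name to its description, one entry per family."""
    families: dict[str, str] = {}
    for definition in components.values():
        type_info = definition.get("properties", {}).get("type", {})
        family = type_info.get("family")
        if not family:
            continue  # skip definitions without family information
        # Keep the first non-empty description seen for a family.
        if not families.get(family):
            families[family] = type_info.get("family_description", "")
    return families


def format_families(families: dict[str, str]) -> str:
    """Render families as one sorted '- name: description' line each."""
    if not families:
        return "No component families found."
    lines = [
        f"- {name}: {desc or 'No description available.'}"
        for name, desc in sorted(families.items())
    ]
    return "Available Haystack component families:\n" + "\n".join(lines)
```

Sorting by family name keeps the output stable across calls, which makes the string easier to assert on in tests.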

Steps:

  • look at src/deepset_mcp/tools to see how other tools are implemented; it also has good examples of how formatting_utils is used
  • src/deepset_mcp/api/haystack_service/resource.py has the HaystackServiceResource that you will need to use
  • create the tool as specified above; handle errors (by returning strings)
  • also create tests for the tool; refer to test/unit/tools/test_pipeline.py for how to structure the tests; we use the same approach with a FakeClient and FakeHaystackServiceResource to isolate tool testing
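
The steps above could fit together roughly as follows. This is a sketch under stated assumptions, not the project's actual interface: the `haystack_service()` accessor on the client and the no-argument `get_component_schemas()` coroutine are guesses, the fakes here are simplified stand-ins for the ones in the test suite, and the error message wording is made up.

```python
import asyncio
from typing import Any, Optional


class FakeHaystackServiceResource:
    """Test double that returns a canned response or raises a canned error."""

    def __init__(self, response: Optional[dict[str, Any]] = None, error: Optional[Exception] = None) -> None:
        self._response = response or {}
        self._error = error

    async def get_component_schemas(self) -> dict[str, Any]:
        if self._error is not None:
            raise self._error
        return self._response


class FakeClient:
    """Stand-in for AsyncClientProtocol; the accessor name is an assumption."""

    def __init__(self, resource: FakeHaystackServiceResource) -> None:
        self._resource = resource

    def haystack_service(self) -> FakeHaystackServiceResource:
        return self._resource


async def list_component_families(client: Any) -> str:
    """Lists all Haystack component families that are available on deepset."""
    try:
        response = await client.haystack_service().get_component_schemas()
    except Exception as exc:  # failures are reported as strings, not raised
        return f"Failed to retrieve component schemas: {exc}"

    components = response.get("component_schema", {}).get("definitions", {}).get("Components", {})
    families: dict[str, str] = {}
    for definition in components.values():
        type_info = definition.get("properties", {}).get("type", {})
        if type_info.get("family"):
            # setdefault keeps the first description seen for each family
            families.setdefault(type_info["family"], type_info.get("family_description", ""))
    if not families:
        return "No component families found."
    return "\n".join(f"{name}: {desc}" for name, desc in sorted(families.items()))


def test_returns_error_string() -> None:
    client = FakeClient(FakeHaystackServiceResource(error=RuntimeError("boom")))
    result = asyncio.run(list_component_families(client))
    assert result == "Failed to retrieve component schemas: boom"
```

The test drives the coroutine with `asyncio.run` only to keep the example self-contained; the real tests should follow whatever async test setup test/unit/tools/test_pipeline.py already uses.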
