Add tools for component schema

We want to add a tool function that interacts with `haystack_service.get_component_schemas`.

Specifically, we want the following tool:

```python
async def list_component_families(client: AsyncClientProtocol) -> str:
    """Lists all Haystack component families that are available on deepset."""
    # Implementation missing: fetch, parse, and format the component schemas.
    ...
```

The tool uses the `HaystackServiceResource` to fetch the component schemas.

It then parses the schema, extracts the component families with their descriptions, and formats them into a readable string that an LLM can consume.
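A minimal sketch of the formatting step, assuming the families have already been collected into a mapping from family name to description. The helper name `format_families` and the plain-text layout are illustrative choices, not the existing `formatting_utils` API:

```python
def format_families(families: dict[str, str]) -> str:
    """Render a {family: description} mapping as a plain-text list for an LLM."""
    if not families:
        return "No component families found."
    lines = [f"Available Haystack component families ({len(families)}):"]
    # Sort by name so repeated calls produce identical output.
    for name in sorted(families):
        lines.append(f"- {name}: {families[name]}")
    return "\n".join(lines)


print(
    format_families(
        {
            "converters": "Convert data into a format your pipeline can query. "
            "Use a converter that matches your data type.",
            "example_family": "Placeholder description for a second family.",
        }
    )
)
```

Sorting by family name keeps the output deterministic, which makes the string easy to assert on in tests.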

We expect the following response structure from `get_component_schemas`:

```
{
  "component_schema": {
    "definitions": {"Components": {<component_name>: <component_definition>, ...}}
  }
}
```
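A sketch of how the tool could reach the component definitions, assuming the response arrives as a plain dict with exactly the nesting shown above. The helper `get_component_definitions` is hypothetical; it raises `ValueError` so the tool can turn the failure into an error string:

```python
from typing import Any


def get_component_definitions(response: dict[str, Any]) -> dict[str, Any]:
    """Return the {component_name: component_definition} mapping from a schema response."""
    try:
        components = response["component_schema"]["definitions"]["Components"]
    except (KeyError, TypeError) as exc:
        # A key is missing, or one level of the nesting is not a dict.
        raise ValueError(f"Unexpected component schema structure: {exc!r}") from exc
    if not isinstance(components, dict):
        raise ValueError("Unexpected component schema structure: 'Components' is not a mapping")
    return components


response = {"component_schema": {"definitions": {"Components": {"XLSXToDocument": {"title": "XLSXToDocument"}}}}}
print(list(get_component_definitions(response)))
```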

Here is an example of a component definition:

```
{'description': 'Converts XLSX (Excel) files into Documents.

    Supports reading data from specific sheets or all sheets in the Excel file. If all sheets are read, a Document is
    created for each sheet. The content of the Document is the table which can be saved in CSV or Markdown format.

    ### Usage example

    ```python
    from haystack.components.converters.xlsx import XLSXToDocument

    converter = XLSXToDocument()
    results = converter.run(sources=["sample.xlsx"], meta={"date_added": datetime.now().isoformat()})
    documents = results["documents"]
    print(documents[0].content)
    # ",A,B
1,col_a,col_b
2,1.5,test
"
    ```', 'properties': {'init_parameters': {'properties': {'read_excel_kwargs': {'_annotation': 'typing.Optional[typing.Dict[str, typing.Any]]', 'default': None, 'description': 'Additional arguments to pass to `pandas.read_excel`.
            See https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas-read-excel', 'properties': {'_python_type': {...}}, 'type': ['null', 'object']}, 'sheet_name': {'_annotation': 'typing.Union[str, int, typing.List[typing.Union[str, int]], NoneType]', 'anyOf': [{...}, {...}, {...}, {...}], 'default': None, 'description': 'The name of the sheet to read. If None, all sheets are read.'}, 'store_full_path': {'_annotation': '<class 'bool'>', 'default': False, 'description': 'If True, the full path of the file is stored in the metadata of the document.
            If False, only the file name is stored.', 'type': ['boolean']}, 'table_format': {'_annotation': 'typing.Literal['csv', 'markdown']', 'default': 'csv', 'description': 'The format to convert the Excel file to.', 'enum': ['csv', 'markdown'], 'type': 'string'}, 'table_format_kwargs': {'_annotation': 'typing.Optional[typing.Dict[str, typing.Any]]', 'default': None, 'description': 'Additional keyword arguments to pass to the table format function.
            - If `table_format` is "csv", these arguments are passed to `pandas.DataFrame.to_csv`.
              See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas-dataframe-to-csv
            - If `table_format` is "markdown", these arguments are passed to `pandas.DataFrame.to_markdown`.
              See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_markdown.html#pandas-dataframe-to-markdown', 'properties': {'_python_type': {...}}, 'type': ['null', 'object']}}, 'required': [], 'type': 'object'}, 'type': {'component_frequency': 29, 'const': 'haystack.components.converters.xlsx.XLSXToDocument', 'family': 'converters', 'family_description': 'Convert data into a format your pipeline can query. Use a converter that matches your data type.', 'readme_link': 'https://docs.haystack.deepset.ai/docs/xlsxtodocument', 'type': 'string'}}, 'title': 'XLSXToDocument', 'type': 'object'}
```

From each definition, we need to extract `family` and `family_description`, both of which sit under `properties.type`.
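Reading the two fields from a single definition could look like this. The helper `extract_family` is hypothetical, and the example input is a trimmed copy of the `XLSXToDocument` definition above:

```python
from typing import Any, Optional


def extract_family(definition: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Return (family, family_description) for one component definition.

    Returns None when the definition declares no family, so callers can skip it.
    """
    type_info = (definition.get("properties") or {}).get("type") or {}
    family = type_info.get("family")
    if not family:
        return None
    return family, type_info.get("family_description", "")


# Trimmed-down version of the XLSXToDocument definition shown above.
xlsx_definition = {
    "title": "XLSXToDocument",
    "properties": {
        "type": {
            "const": "haystack.components.converters.xlsx.XLSXToDocument",
            "family": "converters",
            "family_description": "Convert data into a format your pipeline can query. "
            "Use a converter that matches your data type.",
        }
    },
}
print(extract_family(xlsx_definition))
```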

Many components can belong to the same family, so each family should appear only once in the tool's output.
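One way to deduplicate, sketched under the assumption that every component in a family reports the same `family_description` (the example above does not confirm this). `collect_families` and the component names are hypothetical:

```python
from typing import Any


def collect_families(components: dict[str, dict[str, Any]]) -> dict[str, str]:
    """Collapse per-component definitions into one entry per family."""
    families: dict[str, str] = {}
    for definition in components.values():
        type_info = (definition.get("properties") or {}).get("type") or {}
        family = type_info.get("family")
        if not family:
            continue  # skip components that do not declare a family
        # setdefault keeps the first description seen for a family. This assumes
        # that all components in a family share the same family_description.
        families.setdefault(family, type_info.get("family_description", ""))
    return families


# Hypothetical component names; two of them belong to the same family.
components = {
    "ComponentA": {"properties": {"type": {"family": "converters", "family_description": "Convert data."}}},
    "ComponentB": {"properties": {"type": {"family": "converters", "family_description": "Convert data."}}},
    "ComponentC": {"properties": {}},
}
print(collect_families(components))
```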


Steps:
- Look at `src/deepset_mcp/tools` to see how other tools are implemented; it also contains good examples of how to use `formatting_utils`.
- `src/deepset_mcp/api/haystack_service/resource.py` contains the `HaystackServiceResource` that you will need to use.
- Create the tool as specified above, and handle errors by returning strings.
- Also create tests for the tool. Refer to `test/unit/tools/test_pipeline.py` for how to structure them; we use the same approach with a `FakeClient` and a `FakeHaystackServiceResource` to isolate tool testing.
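An end-to-end sketch of the tool together with tests in the fake-client style described in the steps. Everything about the client is an assumption: `ClientLike` stands in for `AsyncClientProtocol`, and the `haystack_service()` accessor and the `get_component_schemas()` coroutine signature are guesses that should be checked against `resource.py` and the existing tools. The fakes are written from scratch, not copied from `test_pipeline.py`:

```python
import asyncio
from typing import Any, Optional, Protocol


class HaystackServiceLike(Protocol):
    """Assumed interface of HaystackServiceResource (check resource.py)."""

    async def get_component_schemas(self) -> dict[str, Any]: ...


class ClientLike(Protocol):
    """Hypothetical stand-in for AsyncClientProtocol; the accessor name is assumed."""

    def haystack_service(self) -> HaystackServiceLike: ...


async def list_component_families(client: ClientLike) -> str:
    """Lists all Haystack component families that are available on deepset."""
    try:
        response = await client.haystack_service().get_component_schemas()
    except Exception as exc:  # tools report failures as strings instead of raising
        return f"Failed to retrieve component schemas: {exc}"

    try:
        components = response["component_schema"]["definitions"]["Components"]
        families: dict[str, str] = {}
        for definition in components.values():
            type_info = (definition.get("properties") or {}).get("type") or {}
            if type_info.get("family"):
                # One entry per family; the first description seen wins.
                families.setdefault(type_info["family"], type_info.get("family_description", ""))
    except (KeyError, TypeError, AttributeError):
        return "Failed to parse component schemas: unexpected response structure."

    if not families:
        return "No component families found."
    lines = ["Available Haystack component families:"]
    lines += [f"- {name}: {families[name]}" for name in sorted(families)]
    return "\n".join(lines)


class FakeHaystackServiceResource:
    """Returns a canned response or raises a canned error."""

    def __init__(self, response: Optional[dict[str, Any]] = None, error: Optional[Exception] = None) -> None:
        self._response = response
        self._error = error

    async def get_component_schemas(self) -> dict[str, Any]:
        if self._error is not None:
            raise self._error
        return self._response if self._response is not None else {}


class FakeClient:
    """Hands out the fake resource through the assumed accessor."""

    def __init__(self, resource: FakeHaystackServiceResource) -> None:
        self._resource = resource

    def haystack_service(self) -> FakeHaystackServiceResource:
        return self._resource


def _definition(family: str) -> dict[str, Any]:
    return {"properties": {"type": {"family": family, "family_description": f"{family} description"}}}


def test_lists_each_family_once() -> None:
    components = {"ComponentA": _definition("converters"), "ComponentB": _definition("converters")}
    response = {"component_schema": {"definitions": {"Components": components}}}
    result = asyncio.run(list_component_families(FakeClient(FakeHaystackServiceResource(response))))
    assert result == "Available Haystack component families:\n- converters: converters description"


def test_returns_error_string_on_failure() -> None:
    client = FakeClient(FakeHaystackServiceResource(error=RuntimeError("boom")))
    result = asyncio.run(list_component_families(client))
    assert result == "Failed to retrieve component schemas: boom"


def test_returns_error_string_on_bad_structure() -> None:
    client = FakeClient(FakeHaystackServiceResource(response={"unexpected": {}}))
    result = asyncio.run(list_component_families(client))
    assert result.startswith("Failed to parse component schemas")


if __name__ == "__main__":
    test_lists_each_family_once()
    test_returns_error_string_on_failure()
    test_returns_error_string_on_bad_structure()
```

The tests call `asyncio.run` directly to keep the sketch dependency-free; the project's existing tests may use a pytest async plugin instead.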


Add tools for component schema #22

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add tools for component schema #22

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions