Most clients are likely to use the official Python or TypeScript SDKs, which track the spec closely. We should add some end-to-end tests to ensure that an example application (a test 'host') can connect and list/use tools, especially when custom headers are involved.
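A minimal sketch of what such a test 'host' could look like, using the official MCP Python SDK (`mcp` on PyPI) with its SSE transport. The endpoint URL and header values below are placeholders, not part of this repo:

```python
import asyncio


async def list_tools_with_headers(url: str, headers: dict[str, str]) -> list[str]:
    """Connect to an MCP server over SSE, passing custom headers, and list its tools."""
    # Imports are local so the module can be inspected without the SDK installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(url, headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


# Example invocation (placeholder URL and header):
# asyncio.run(list_tools_with_headers("http://localhost:8000/sse", {"X-Custom-Header": "e2e"}))
```

An e2e test would assert that the returned tool names match what the server is expected to expose, and that requests without the required headers fail cleanly.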
We could also use this setup for evaluation with LangEval (related to LangWatch): define a set of standard questions, run them against multiple models with MCP tools enabled, and assert that we obtain the correct answers within X tokens or API calls.
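The evaluation loop could be sketched with a small stdlib-only harness; the LangEval integration would replace the stub model and check predicates below, which are purely illustrative:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    check: Callable[[str], bool]  # does the answer contain what we expect?
    max_calls: int                # budget: "correct within X API calls"


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run each question against a model callable; record pass/fail and call counts."""
    results = {}
    for case in cases:
        calls, passed, answer = 0, False, ""
        while calls < case.max_calls and not passed:
            answer = model(case.question)  # in practice: an LLM with MCP tools enabled
            calls += 1
            passed = case.check(answer)
        results[case.question] = {"passed": passed, "calls": calls}
    return results


# Stub standing in for a real model; real runs would call each model under test.
def stub_model(question: str) -> str:
    return "team-backend is on call" if "on call" in question else "unknown"


cases = [
    EvalCase("who is on call for team X?", lambda a: "on call" in a, max_calls=3),
]
report = run_eval(stub_model, cases)
```

Asserting on both correctness and call count keeps the budget ("within X tokens or API calls") as a first-class part of each test case.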
TODO:
- Require contributor approval before running actions on PRs
- Create a new cloud instance for tests
- Create a `tests` subdirectory
- Create a Python package in there
- Add langeval
- Add tests
- Run in CI
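For the "Run in CI" item, a hypothetical GitHub Actions workflow (names, paths, and the secret name are placeholders); the "require contributor approval" item is a repository setting rather than workflow config:

```yaml
name: e2e-tests
on:
  pull_request:  # pair with the repo setting requiring approval for outside contributors
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install test package
        run: pip install -e ./tests
      - name: Run e2e tests
        run: pytest tests
        env:
          GRAFANA_API_KEY: ${{ secrets.GRAFANA_API_KEY }}  # placeholder secret name
```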
Workflows to test:
- [docker-compose] what are the most recent log lines from Grafana?
- [docker-compose] what dashboard should I look at to see container CPU?
- [cloud] who is on call for team X?
- [cloud] what incidents are active?
- (add more)
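The workflows above can be expressed as data for a test runner to parametrize over, so docker-compose-only and cloud-only runs pick up the right subset. The prompts come straight from this list; any assertion predicates would be added per case:

```python
# (environment, prompt) pairs, taken from the workflow list in this issue.
WORKFLOWS = [
    ("docker-compose", "what are the most recent log lines from Grafana?"),
    ("docker-compose", "what dashboard should I look at to see container CPU?"),
    ("cloud", "who is on call for team X?"),
    ("cloud", "what incidents are active?"),
]


def workflows_for(environment: str) -> list[str]:
    """Select the prompts that apply to a given test environment."""
    return [prompt for env, prompt in WORKFLOWS if env == environment]
```

With pytest this could feed `@pytest.mark.parametrize`, keeping new workflows a one-line addition to the list.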