LLM findings

Kevin Etchells edited this page Oct 3, 2025 · 1 revision

This is a list of things we have discovered about specific models and setups during this project. Our use of Langchain, and our system prompt, are likely factors in some of these discoveries.

  • One ongoing challenge is finding models that both reliably call the correct tools and return links in the correct format.

  • o4-mini was our default choice for a long time. Following the switch to LiteLLM it became significantly slower, to the point that it was causing problems for users. This appears to have been a gradual decline after the LiteLLM switch rather than an immediate one.

  • We looked at switching from Langchain to the Vercel AI SDK. This would have simplified the streaming code, but we found that calling MCP tools didn't work with the OpenAI reasoning models. It may be worth revisiting now that we're looking at models from other vendors.

  • Claude Sonnet 3.7 is our current default, as it provides good accuracy combined with speed. Claude Sonnet 4.5, however, does not call tools.

  • We had Claude 3 Haiku as a 'fast mode' option for a while. It worked well with tool calls, but didn't provide links.

  • Gemini Flash 2.5 also seems to be a good performer.
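When comparing models on the "links in the correct format" criterion above, a small check over model answers can make the evaluation repeatable. A minimal sketch, assuming links are expected in Markdown `[text](url)` form (the helper names and the expected format are assumptions, not part of our actual setup):

```python
import re

# Matches well-formed Markdown links: [link text](http(s)://...)
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def extract_links(answer: str) -> list[tuple[str, str]]:
    """Return (text, url) pairs for every well-formed Markdown link in an answer."""
    return MARKDOWN_LINK.findall(answer)

def has_bare_urls(answer: str) -> bool:
    """Flag answers containing raw URLs outside Markdown link syntax."""
    stripped = MARKDOWN_LINK.sub("", answer)
    return "http://" in stripped or "https://" in stripped

good = "See [GOV.UK](https://www.gov.uk) for details."
bad = "See https://www.gov.uk for details."
print(extract_links(good))  # [('GOV.UK', 'https://www.gov.uk')]
print(has_bare_urls(good), has_bare_urls(bad))  # False True
```

Running this over a batch of answers per model gives a quick pass/fail signal for the link-format half of the challenge; the tool-calling half still needs checking against the actual tool-call payloads.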
