The Bifrost chat uses an agentic loop: the model can call tools, the server runs them, and the results are fed back so the model can decide the next step or produce a final answer. This guide describes how that loop works and how Heimdall plugins hook into it (PrePrompt, PreExecute, PostExecute).
When you send a message in Bifrost:
- Request – Your message is sent with the current conversation history to the LLM (local GGUF, Ollama, or OpenAI).
- Model output – The model may return:
- Content only – A direct reply (e.g. “Here are the database stats…”). The loop ends.
- Tool calls – One or more tool invocations (e.g.
heimdall_watcher_querywith a Cypher string, orstore/linkif MCP tools are enabled).
- Execution – The server runs each tool in process (no external HTTP call for in-memory tools). Results are formatted and appended to the conversation as “tool” messages.
- Repeat – The model is called again with the updated history (user message, assistant tool calls, tool results). It can issue more tool calls or respond with content. This repeats until the model returns content only or a round limit is reached.
So the assistant can chain actions: e.g. run a query, then summarize; or store a fact, then link it to another node.
The model only sees tools that are registered for the request. Two sources are combined:
Plugins register actions (e.g. heimdall_watcher_query, heimdall_watcher_status). These are converted to the same shape as MCP tools (name, description, inputSchema) and passed to the model. The handler executes them by dispatching to the plugin that owns the action.
- Watcher plugin (bundled) – Query, status, metrics, health, discover, etc.
- Your plugins – Custom actions (e.g.
heimdall.myplugin.analyze) appear as tools with the same name and schema.
When MCP tools are enabled, the loop also gets store, recall, discover, link, task, tasks. These run in process against the same database (no separate MCP HTTP call). You can enable all of them or an allowlist (e.g. only store and link).
sequenceDiagram
participant User
participant Bifrost
participant Handler
participant PrePrompt
participant LLM
participant PreExecute
participant Tools
participant PostExecute
User->>Bifrost: Send message
Bifrost->>Handler: handleChatCompletions
Handler->>PrePrompt: CallPrePromptHooks (plugins)
PrePrompt-->>Handler: (optional) modify context / cancel
Handler->>LLM: GenerateWithTools(system + messages, tools)
alt Model returns content only
LLM-->>Handler: content
Handler->>Bifrost: Stream reply
Bifrost->>User: Show reply
else Model returns tool calls
LLM-->>Handler: tool_calls[]
loop For each tool call
Handler->>PreExecute: CallPreExecuteHooks (plugins)
PreExecute-->>Handler: (optional) modify params / cancel
Handler->>Tools: Execute (plugin action or MCP tool)
Tools-->>Handler: result
Handler->>PostExecute: CallPostExecuteHooks (plugins)
end
Handler->>LLM: Append tool results, call again
Note over Handler,LLM: Repeat until content-only or limit
end
Plugins can implement lifecycle hooks that run at specific points in the request. These run for every Bifrost chat request that reaches the handler (including those that use the agentic loop).
- When: Before the system prompt is finalized and sent to the LLM.
- Use: Inject context, add examples, or cancel the request (e.g. policy: “no write actions in this environment”).
- Data:
PromptContext– user message, messages, plugin data, token budget. Plugins can setPluginDatafor use in later hooks.
- When: Before each tool execution (plugin action or MCP tool).
- Use: Validate parameters, check permissions, modify params (e.g. add a default database), or cancel this tool call (e.g. “user cannot run delete”).
- Data:
PreExecuteContext– action/tool name, params, request ID, database router, etc. Plugins can returnModifiedParamsorAbortMessage.
- When: After each tool execution.
- Use: Log results, update internal state, send notifications to the user (e.g. “Query returned 5 rows”).
- Data:
PostExecuteContext– action, duration, success/error, result summary.
Notifications (e.g. NotifyInfo, NotifyError) from hooks are sent as streaming chunks so the user sees them inline in the chat.
| Aspect | Plugin actions | MCP tools (when enabled) |
|---|---|---|
| Examples | heimdall_watcher_query, heimdall_watcher_status |
store, recall, discover, link, task, tasks |
| Defined by | Heimdall plugins (Watcher, your .so) | NornicDB MCP server (built-in) |
| Execution | Dispatched to plugin’s action handler | Run in process by MCP server (store/recall/link etc.) |
| Enabled by | Loading the plugin (Heimdall enabled) | mcp_enable: true (+ optional allowlist) |
Both are presented to the model as tools with name, description, and inputSchema. The handler decides whether to call a plugin or the in-memory MCP runner based on the tool name.
- The HTTP request has a timeout (e.g. 300 seconds) so long-running agentic sessions don’t hang indefinitely. The server’s write timeout is tuned to allow multiple tool calls in one request.
- Enabling MCP tools in the agentic loop – Turn on store/recall/link etc. and use an allowlist.
- Heimdall AI Assistant – Configuration, providers, Bifrost UI.
- Heimdall Plugins – Implementing actions and lifecycle hooks.
- Event triggers and automatic remediation – Using database events to trigger the model and take actions.