Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions skills/developing-genkit-go/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ Load the appropriate reference based on what you need:
| Generation | [references/generation.md](references/generation.md) | `Generate`, `GenerateText`, `GenerateData`, streaming, output formats |
| Prompts | [references/prompts.md](references/prompts.md) | `DefinePrompt`, `DefineDataPrompt`, `.prompt` files, schemas |
| Tools | [references/tools.md](references/tools.md) | `DefineTool`, tool interrupts, `RestartWith`/`RespondWith` |
| Middleware | [references/middleware.md](references/middleware.md) | `ai.Middleware`, `ai.WithUse`, `Hooks` (Generate/Model/Tool), built-ins (`Retry`, `Fallback`, `ToolApproval`, `Filesystem`, `Skills`) |
| Flows & HTTP | [references/flows-and-http.md](references/flows-and-http.md) | `DefineFlow`, `DefineStreamingFlow`, `genkit.Handler`, HTTP serving |
| Model Providers | [references/providers.md](references/providers.md) | Google AI, Vertex AI, Anthropic, OpenAI-compatible, Ollama setup |

Expand Down Expand Up @@ -92,4 +93,5 @@ See [references/getting-started.md](references/getting-started.md) for full CLI
- **Use `jsonschema:"description=..."` struct tags on output types.** The model uses these descriptions to understand what each field should contain. Without them, structured output quality drops significantly.
- **Write good tool descriptions.** The model decides which tools to call based on their description string. Vague descriptions lead to missed or incorrect tool calls.
- **Use `.prompt` files for complex prompts.** They separate prompt content from Go code, support Handlebars templating, and can be iterated on without recompilation. Code-defined prompts are better for simple, single-line cases.
- **Reach for built-in middleware before writing one.** `Retry`, `Fallback`, `ToolApproval`, `Filesystem`, and `Skills` cover the common cross-cutting needs and compose with each other via `ai.WithUse`. See [references/middleware.md](references/middleware.md). When you do write custom middleware, allocate per-call state in closures captured by `New`, and guard anything that `WrapTool` mutates because tools may run concurrently.
- **Look up the latest model IDs.** Model names change frequently. Check provider documentation for current model IDs rather than relying on hardcoded names. See [references/providers.md](references/providers.md).
14 changes: 6 additions & 8 deletions skills/developing-genkit-go/references/getting-started.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,13 @@ go mod init my-genkit-app
go get github.com/genkit-ai/genkit/go@latest
```

Add provider plugin(s) for the models you want to use:
```bash
go get github.com/genkit-ai/genkit/go/plugins/googlegenai # Google AI / Vertex AI
go get github.com/genkit-ai/genkit/go/plugins/anthropic # Anthropic Claude
go get github.com/genkit-ai/genkit/go/plugins/compat_oai # OpenAI-compatible
go get github.com/genkit-ai/genkit/go/plugins/ollama # Ollama (local)
```
Provider plugins ship in the same module under `plugins/`, so they don't need to be fetched separately. Just import the ones you want and run `go mod tidy` afterwards. The available plugins include:

After writing your code, run `go mod tidy` to resolve all dependencies.
- `plugins/googlegenai` for Google AI and Vertex AI
- `plugins/anthropic` for Anthropic Claude
- `plugins/compat_oai` for OpenAI-compatible APIs (OpenAI, Groq, xAI, etc.)
- `plugins/ollama` for local Ollama models
- `plugins/middleware` for the built-in middleware bundle (`Retry`, `Fallback`, `ToolApproval`, `Filesystem`, `Skills`)

## Initialization

Expand Down
363 changes: 363 additions & 0 deletions skills/developing-genkit-go/references/middleware.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,363 @@
# Middleware

Middleware wraps `Generate` calls to add cross-cutting behavior (retries, logging, fallback, gating, sandboxed tools) without touching the flow that uses it. Middleware composes, so a single `Generate` call can stack several behaviors. Built-ins ship in the `plugins/middleware` package; custom middleware is just a Go struct with two methods.

## The mental model

A middleware is a config struct that implements two methods:

```go
type Middleware interface {
Name() string // stable, registered identifier
New(ctx context.Context) (*Hooks, error) // produces a per-call hook bundle
}
```

The same struct value the user passes to `ai.WithUse` is the value the runtime calls `New` on. There is no separate factory parameter and no embedded base type. Per-call state goes in closures captured by `New`. Plugin-level state goes on unexported fields of the struct.

`New` is invoked once per `Generate` call. The returned `*Hooks` is reused across every iteration of the tool loop within that call.

```go
type Hooks struct {
Tools []Tool // injected for this call
WrapGenerate func(ctx, *GenerateParams, GenerateNext) ... // tool-loop iteration
WrapModel func(ctx, *ModelParams, ModelNext) ... // model API call
WrapTool func(ctx, *ToolParams, ToolNext) ... // tool execution
}
```

A nil hook is a pass-through. Implement only what the middleware needs.

## When each hook fires

A `Generate` call executes a tool loop: model produces output, any tool calls execute, results feed back into a new model call, repeat until the model stops. The hooks fire at three different layers of this loop:

| Hook | Fires | Sees |
| --- | --- | --- |
| `WrapGenerate` | Once per tool-loop iteration. `N` tool turns means `N+1` invocations. | The accumulated `ModelRequest`, the iteration index, the streaming callback, and `MessageIndex` (the next streamed-message slot). |
| `WrapModel` | Once per actual model API call, inside the iteration. | The `ModelRequest` about to go to the model and the streaming callback. |
| `WrapTool` | Once per tool execution. May run **concurrently** for parallel tool calls in the same iteration. | The `ToolRequest` and the resolved `Tool`. |

`WrapGenerate` is the right place for logic that needs to see the whole conversation (rewrites, system-prompt injection, message accumulation). `WrapModel` is the right place for logic about the model call itself (retry, fallback, caching). `WrapTool` is the right place for logic about a single tool execution (approval, sandboxing, logging).

## Composition order

`ai.WithUse(A, B, C)` expands to `A { B { C { actual } } }` at call time. Each layer's `next` continuation runs the next inner layer:

```go
ai.WithUse(
&middleware.Retry{MaxRetries: 3}, // outer: retries the whole inner stack
&middleware.Fallback{Models: ...}, // inner: tries fallback models on failure
)
// effective chain: Retry { Fallback { model } }
```

Order matters. `Retry` outside `Fallback` retries the entire fallback cascade as a unit. Swapped, you'd retry the primary first and only fall back after exhausting retries.

## Per-call state

State that should be shared across the hooks of a single `Generate` call lives in closures captured by `New`. Each `Generate` call gets a fresh `Hooks` bundle, so nothing leaks between calls.

```go
type Counter struct{}

func (Counter) Name() string { return "mine/counter" }

func (Counter) New(ctx context.Context) (*ai.Hooks, error) {
var modelCalls int
return &ai.Hooks{
WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
modelCalls++
return next(ctx, p)
},
WrapGenerate: func(ctx context.Context, p *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error) {
// The same modelCalls variable is visible here because both closures
// capture it from the enclosing New scope.
resp, err := next(ctx, p)
if err == nil {
log.Printf("iteration %d: %d model calls so far", p.Iteration, modelCalls)
}
return resp, err
},
}, nil
}
```

`WrapTool` may be invoked concurrently for parallel tool calls in the same iteration. Any state it mutates must be guarded:

```go
func (Counter) New(ctx context.Context) (*ai.Hooks, error) {
var (
mu sync.Mutex
toolCalls int
)
return &ai.Hooks{
WrapTool: func(ctx context.Context, p *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error) {
mu.Lock()
toolCalls++
mu.Unlock()
return next(ctx, p)
},
}, nil
}
```

`WrapGenerate` and `WrapModel` are not called concurrently within a single `Generate` call.

## Plugin-level state

When a middleware needs resources its config can't carry as JSON (an HTTP client, a database handle, a logger), put them on **unexported** fields of the config struct. The plugin sets them on a prototype, and `ai.NewMiddleware` captures that prototype in a closure that value-copies it across JSON-dispatched invocations:

```go
type Logger struct {
Prefix string `json:"prefix,omitempty"`
out io.Writer // unexported; preserved across JSON dispatch via value-copy
}

func (Logger) Name() string { return "mine/logger" }

func (l Logger) New(ctx context.Context) (*ai.Hooks, error) {
return &ai.Hooks{
WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
start := time.Now()
resp, err := next(ctx, p)
fmt.Fprintf(l.out, "%s model call took %s\n", l.Prefix, time.Since(start))
return resp, err
},
}, nil
}

type LoggerPlugin struct{ Out io.Writer }

func (p *LoggerPlugin) Name() string { return "logger" }
func (p *LoggerPlugin) Init(ctx context.Context) []api.Action { return nil }

func (p *LoggerPlugin) Middlewares(ctx context.Context) ([]*ai.MiddlewareDesc, error) {
return []*ai.MiddlewareDesc{
ai.NewMiddleware("logs model call latency", Logger{out: p.Out}),
}, nil
}
```

The Dev UI and other-runtime callers send JSON config; the prototype's value copy preserves `out` (unexported, not in JSON) while `Prefix` is overridden by the unmarshaled config.

## Composition with WithUse

```go
response, _ := genkit.Generate(ctx, g,
ai.WithModelName("googleai/gemini-flash-latest"),
ai.WithPrompt("Explain quantum computing."),
ai.WithUse(
Logger{Prefix: "[trace]"},
&middleware.Retry{MaxRetries: 3},
),
)
```

No registration is required for pure-Go use. `WithUse` calls each value's `New` directly on a fast path; the registry is only consulted for JSON-dispatched calls (Dev UI or cross-runtime). Registration is what makes a middleware visible to the Dev UI and addressable by name.

## Inline middleware

For ad-hoc middleware that does not need Dev UI visibility or a named type, use `ai.MiddlewareFunc`:

```go
ai.WithUse(ai.MiddlewareFunc(func(ctx context.Context) (*ai.Hooks, error) {
return &ai.Hooks{
WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
log.Printf("model call: %s", p.Request.Messages[len(p.Request.Messages)-1].Text())
return next(ctx, p)
},
}, nil
}))
```

The adapter satisfies `Middleware` with a placeholder `Name()` of `"inline"`. Inline middleware is resolved on the fast path and never touches the registry, so the placeholder name is fine.

## Application-owned middleware

Use `genkit.DefineMiddleware` to register a middleware your application owns directly. Registration surfaces it in the Dev UI and lets cross-runtime callers reference it by name:

```go
genkit.DefineMiddleware(g, "logs model call latency", Logger{out: os.Stderr})

// Lookup by name (mostly for inspection / cross-runtime dispatch).
desc := genkit.LookupMiddleware(g, "mine/logger")
```

For application code, `DefineMiddleware` is the typical entry point. For plugin authors, `ai.NewMiddleware` (no registration) plus `MiddlewarePlugin.Middlewares()` is the typical entry point. `genkit.Init` registers the returned descriptors automatically.

## Built-in middleware

The `plugins/middleware` package bundles five production-ready implementations. Register the plugin once during `Init` to make them visible to the Dev UI:

```go
import "github.com/genkit-ai/genkit/go/plugins/middleware"

g := genkit.Init(ctx, genkit.WithPlugins(
&googlegenai.GoogleAI{},
&middleware.Middleware{},
))
```

### `Retry`

Retries failed model API calls with exponential backoff and jitter. Hooks `WrapModel`.

```go
ai.WithUse(&middleware.Retry{
MaxRetries: 3, // default 3
InitialDelayMs: 1000, // default 1000
MaxDelayMs: 60000, // default 60000
BackoffFactor: 2, // default 2
NoJitter: false, // default false
// Statuses (default: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL)
// Statuses: []core.StatusName{core.UNAVAILABLE, core.RESOURCE_EXHAUSTED},
})
```

Non-`GenkitError` errors (network, parse, etc.) are always retried. `GenkitError` errors are retried only if their status is in `Statuses`. The backoff respects `ctx.Done()`: a canceled context aborts the retry loop with the last error.

### `Fallback`

Tries alternative models when the primary fails with a fallback-eligible status. Hooks `WrapModel`.

```go
ai.WithUse(&middleware.Fallback{
Models: []ai.ModelRef{
googlegenai.ModelRef("googleai/gemini-flash-latest", nil),
googlegenai.ModelRef("vertexai/gemini-flash-latest", nil),
},
// default Statuses: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED,
// ABORTED, INTERNAL, NOT_FOUND, UNIMPLEMENTED
})
```

Each fallback model uses its own `ModelRef.Config()` verbatim; the original request's config is **not** inherited. Compose with `Retry` outside to retry the whole cascade, or inside to retry just the primary before falling back.

### `ToolApproval`

Interrupts any tool call not in the allow list, exposing approval as a human-in-the-loop step. Hooks `WrapTool`.

```go
ai.WithUse(&middleware.ToolApproval{
AllowedTools: []string{"lookup", "search"}, // anything else triggers an interrupt
})
```

The interrupt rides on the existing tool-interrupt machinery. Approve a call by setting `toolApproved: true` in the resume metadata when restarting:

```go
restart, _ := tool.Restart(interruptPart, &ai.RestartOptions{
ResumedMetadata: map[string]any{"toolApproved": true},
})
genkit.Generate(ctx, g, ai.WithMessages(resp.History()...), ai.WithToolRestarts(restart))
```

A bare resume without that flag is **not** treated as approval, so unrelated resume flows can't bypass gating.

### `Filesystem`

Grants the model scoped file access under a single root directory via `list_files`, `read_file`, plus `write_file` and `search_and_replace` when writes are enabled. Hooks `WrapGenerate` and `WrapTool` and contributes `Tools`.

```go
ai.WithUse(&middleware.Filesystem{
RootDir: "./workspace",
AllowWriteAccess: true,
ToolNamePrefix: "", // set distinct prefixes if attaching multiple Filesystem middlewares
})
```

Path safety is enforced by `os.Root` (Go 1.25+), which rejects any path resolving outside the root, including via `..`, absolute paths, or symlinks. `read_file` returns its content as a queued user message on the next turn (not as the tool's direct output) so binary types like images can be inlined as media parts.
Comment thread
apascal07 marked this conversation as resolved.

### `Skills`

Exposes a local library of `SKILL.md` files as loadable system instructions. Hooks `WrapGenerate` and contributes a `use_skill` tool.

```go
ai.WithUse(&middleware.Skills{SkillPaths: []string{"skills"}}) // default: ["skills"]
```

A skill is a directory containing `SKILL.md`, optionally with YAML frontmatter (`name`, `description`). The middleware injects a system prompt listing available skills, and the model calls `use_skill("name")` to pull the skill body into the conversation on demand. Heavier persona instructions stay off the hot path until actually loaded.

## Practical patterns

### Streaming-aware middleware

If your `WrapGenerate` or `WrapModel` hook emits its own messages (injected user content, system updates), use the streaming callback and the `MessageIndex` cursor in `GenerateParams`:

```go
WrapGenerate: func(ctx context.Context, p *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error) {
if p.Callback != nil {
_ = p.Callback(ctx, &ai.ModelResponseChunk{
Role: ai.RoleUser,
Index: p.MessageIndex,
Content: []*ai.Part{ai.NewTextPart("[injected context]")},
})
p.MessageIndex++ // advance so downstream middleware and the model see the shifted index
}
p.Request.Messages = append(p.Request.Messages, ai.NewUserMessage(ai.NewTextPart("[injected context]")))
return next(ctx, p)
},
```

`Filesystem` does this to deliver `read_file` content to the model while preserving streamed-chunk ordering.

### Adding tools from middleware

`Hooks.Tools` registers extra tools for the duration of the call without the user wiring them through `ai.WithTools`. Useful when the middleware's hooks and tools work as a pair (e.g., `Filesystem`'s read/write tools, `Skills`'s `use_skill` tool):

```go
return &ai.Hooks{
Tools: []ai.Tool{myTool},
WrapTool: myInterceptor, // intercepts both myTool and any user-supplied tools
}, nil
```

Duplicate tool names across user-supplied tools and middleware-contributed tools error out at call setup; the call won't run.

### Interrupting from `WrapTool`

`ai.NewToolInterruptError` is exported precisely so `WrapTool` hooks can interrupt without constructing a `ToolContext`:

```go
WrapTool: func(ctx context.Context, p *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error) {
if shouldGate(p.Tool.Name()) {
return nil, ai.NewToolInterruptError(map[string]any{
"message": "needs approval",
})
}
return next(ctx, p)
},
```

`ToolApproval` uses this pattern.

### Modifying the request safely

`p.Request` is the live request for the iteration. Mutating it in place affects later layers. If the change should be visible only to the inner layer, copy first:

```go
WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
req := *p.Request
req.Messages = append([]*ai.Message(nil), p.Request.Messages...)
req.Messages = append(req.Messages, extraSystemMessage)
p.Request = &req
return next(ctx, p)
},
```

`Skills.injectSkillsPrompt` shows the same pattern for `ModelRequest` cloning.

### Idempotent re-injection across iterations

`WrapGenerate` runs once per tool-loop iteration. If you inject content into the request, you'll inject it on every iteration unless you mark and detect what you've already added. `Skills` tags its system prompt part with metadata (`skills-instructions: true`) and refreshes that one part in place rather than appending a new one each turn.

## Migration note

The legacy `ai.ModelMiddleware` / `ai.WithMiddleware` API is preserved and marked deprecated. Prefer `ai.Middleware` / `ai.WithUse`, which adds `WrapGenerate` and `WrapTool` hooks plus `Hooks.Tools` for dynamically injected tools.

## See also

- [`tools.md`](tools.md) for tool definition, interrupt/restart machinery used by `ToolApproval`.
- Sample sources under `go/samples/basic-middleware/`: `retry-fallback`, `filesystem`, `skills`.
- The `plugins/middleware` package source for reference implementations.