-
Notifications
You must be signed in to change notification settings - Fork 3
feat(go): add middleware reference to Genkit Go skill #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
040acd0
feat(go): add middleware reference to Genkit Go skill
apascal07 498ef10
Update skills/developing-genkit-go/references/middleware.md
apascal07 c3cd4e2
Update skills/developing-genkit-go/references/middleware.md
apascal07 5ef1c05
unwound wrong fixes
apascal07 8d61723
Merge branch 'ap/go-skill-middleware' of https://github.com/genkit-ai…
apascal07 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,363 @@ | ||
| # Middleware | ||
|
|
||
| Middleware wraps `Generate` calls to add cross-cutting behavior (retries, logging, fallback, gating, sandboxed tools) without touching the flow that uses it. Middleware composes, so a single `Generate` call can stack several behaviors. Built-ins ship in the `plugins/middleware` package; custom middleware is just a Go struct with two methods. | ||
|
|
||
| ## The mental model | ||
|
|
||
| A middleware is a config struct that implements two methods: | ||
|
|
||
| ```go | ||
| type Middleware interface { | ||
| Name() string // stable, registered identifier | ||
| New(ctx context.Context) (*Hooks, error) // produces a per-call hook bundle | ||
| } | ||
| ``` | ||
|
|
||
| The same struct value the user passes to `ai.WithUse` is the value the runtime calls `New` on. There is no separate factory parameter and no embedded base type. Per-call state goes in closures captured by `New`. Plugin-level state goes on unexported fields of the struct. | ||
|
|
||
| `New` is invoked once per `Generate` call. The returned `*Hooks` is reused across every iteration of the tool loop within that call. | ||
|
|
||
| ```go | ||
| type Hooks struct { | ||
| Tools []Tool // injected for this call | ||
| WrapGenerate func(ctx, *GenerateParams, GenerateNext) ... // tool-loop iteration | ||
| WrapModel func(ctx, *ModelParams, ModelNext) ... // model API call | ||
| WrapTool func(ctx, *ToolParams, ToolNext) ... // tool execution | ||
| } | ||
| ``` | ||
|
|
||
| A nil hook is a pass-through. Implement only what the middleware needs. | ||
|
|
||
| ## When each hook fires | ||
|
|
||
| A `Generate` call executes a tool loop: model produces output, any tool calls execute, results feed back into a new model call, repeat until the model stops. The hooks fire at three different layers of this loop: | ||
|
|
||
| | Hook | Fires | Sees | | ||
| | --- | --- | --- | | ||
| | `WrapGenerate` | Once per tool-loop iteration. `N` tool turns means `N+1` invocations. | The accumulated `ModelRequest`, the iteration index, the streaming callback, and `MessageIndex` (the next streamed-message slot). | | ||
| | `WrapModel` | Once per actual model API call, inside the iteration. | The `ModelRequest` about to go to the model and the streaming callback. | | ||
| | `WrapTool` | Once per tool execution. May run **concurrently** for parallel tool calls in the same iteration. | The `ToolRequest` and the resolved `Tool`. | | ||
|
|
||
| `WrapGenerate` is the right place for logic that needs to see the whole conversation (rewrites, system-prompt injection, message accumulation). `WrapModel` is the right place for logic about the model call itself (retry, fallback, caching). `WrapTool` is the right place for logic about a single tool execution (approval, sandboxing, logging). | ||
|
|
||
| ## Composition order | ||
|
|
||
| `ai.WithUse(A, B, C)` expands to `A { B { C { actual } } }` at call time. Each layer's `next` continuation runs the next inner layer: | ||
|
|
||
| ```go | ||
| ai.WithUse( | ||
| &middleware.Retry{MaxRetries: 3}, // outer: retries the whole inner stack | ||
| &middleware.Fallback{Models: ...}, // inner: tries fallback models on failure | ||
| ) | ||
| // effective chain: Retry { Fallback { model } } | ||
| ``` | ||
|
|
||
| Order matters. `Retry` outside `Fallback` retries the entire fallback cascade as a unit. Swapped, you'd retry the primary first and only fall back after exhausting retries. | ||
|
|
||
| ## Per-call state | ||
|
|
||
| State that should be shared across the hooks of a single `Generate` call lives in closures captured by `New`. Each `Generate` call gets a fresh `Hooks` bundle, so nothing leaks between calls. | ||
|
|
||
| ```go | ||
| type Counter struct{} | ||
|
|
||
| func (Counter) Name() string { return "mine/counter" } | ||
|
|
||
| func (Counter) New(ctx context.Context) (*ai.Hooks, error) { | ||
| var modelCalls int | ||
| return &ai.Hooks{ | ||
| WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) { | ||
| modelCalls++ | ||
| return next(ctx, p) | ||
| }, | ||
| WrapGenerate: func(ctx context.Context, p *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error) { | ||
| // The same modelCalls variable is visible here because both closures | ||
| // capture it from the enclosing New scope. | ||
| resp, err := next(ctx, p) | ||
| if err == nil { | ||
| log.Printf("iteration %d: %d model calls so far", p.Iteration, modelCalls) | ||
| } | ||
| return resp, err | ||
| }, | ||
| }, nil | ||
| } | ||
| ``` | ||
|
|
||
| `WrapTool` may be invoked concurrently for parallel tool calls in the same iteration. Any state it mutates must be guarded: | ||
|
|
||
| ```go | ||
| func (Counter) New(ctx context.Context) (*ai.Hooks, error) { | ||
| var ( | ||
| mu sync.Mutex | ||
| toolCalls int | ||
| ) | ||
| return &ai.Hooks{ | ||
| WrapTool: func(ctx context.Context, p *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error) { | ||
| mu.Lock() | ||
| toolCalls++ | ||
| mu.Unlock() | ||
| return next(ctx, p) | ||
| }, | ||
| }, nil | ||
| } | ||
| ``` | ||
|
|
||
| `WrapGenerate` and `WrapModel` are not called concurrently within a single `Generate` call. | ||
|
|
||
| ## Plugin-level state | ||
|
|
||
| When a middleware needs resources its config can't carry as JSON (an HTTP client, a database handle, a logger), put them on **unexported** fields of the config struct. The plugin sets them on a prototype, and `ai.NewMiddleware` captures that prototype in a closure that value-copies it across JSON-dispatched invocations: | ||
|
|
||
| ```go | ||
| type Logger struct { | ||
| Prefix string `json:"prefix,omitempty"` | ||
| out io.Writer // unexported; preserved across JSON dispatch via value-copy | ||
| } | ||
|
|
||
| func (Logger) Name() string { return "mine/logger" } | ||
|
|
||
| func (l Logger) New(ctx context.Context) (*ai.Hooks, error) { | ||
| return &ai.Hooks{ | ||
| WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) { | ||
| start := time.Now() | ||
| resp, err := next(ctx, p) | ||
| fmt.Fprintf(l.out, "%s model call took %s\n", l.Prefix, time.Since(start)) | ||
| return resp, err | ||
| }, | ||
| }, nil | ||
| } | ||
|
|
||
| type LoggerPlugin struct{ Out io.Writer } | ||
|
|
||
| func (p *LoggerPlugin) Name() string { return "logger" } | ||
| func (p *LoggerPlugin) Init(ctx context.Context) []api.Action { return nil } | ||
|
|
||
| func (p *LoggerPlugin) Middlewares(ctx context.Context) ([]*ai.MiddlewareDesc, error) { | ||
| return []*ai.MiddlewareDesc{ | ||
| ai.NewMiddleware("logs model call latency", Logger{out: p.Out}), | ||
| }, nil | ||
| } | ||
| ``` | ||
|
|
||
| The Dev UI and other-runtime callers send JSON config; the prototype's value copy preserves `out` (unexported, not in JSON) while `Prefix` is overridden by the unmarshaled config. | ||
|
|
||
| ## Composition with WithUse | ||
|
|
||
| ```go | ||
| response, _ := genkit.Generate(ctx, g, | ||
| ai.WithModelName("googleai/gemini-flash-latest"), | ||
| ai.WithPrompt("Explain quantum computing."), | ||
| ai.WithUse( | ||
| Logger{Prefix: "[trace]"}, | ||
| &middleware.Retry{MaxRetries: 3}, | ||
| ), | ||
| ) | ||
| ``` | ||
|
|
||
| No registration is required for pure-Go use. `WithUse` calls each value's `New` directly on a fast path; the registry is only consulted for JSON-dispatched calls (Dev UI or cross-runtime). Registration is what makes a middleware visible to the Dev UI and addressable by name. | ||
|
|
||
| ## Inline middleware | ||
|
|
||
| For ad-hoc middleware that does not need Dev UI visibility or a named type, use `ai.MiddlewareFunc`: | ||
|
|
||
| ```go | ||
| ai.WithUse(ai.MiddlewareFunc(func(ctx context.Context) (*ai.Hooks, error) { | ||
| return &ai.Hooks{ | ||
| WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) { | ||
| log.Printf("model call: %s", p.Request.Messages[len(p.Request.Messages)-1].Text()) | ||
| return next(ctx, p) | ||
| }, | ||
| }, nil | ||
| })) | ||
| ``` | ||
|
|
||
| The adapter satisfies `Middleware` with a placeholder `Name()` of `"inline"`. Inline middleware is resolved on the fast path and never touches the registry, so the placeholder name is fine. | ||
|
|
||
| ## Application-owned middleware | ||
|
|
||
| Use `genkit.DefineMiddleware` to register a middleware your application owns directly. Registration surfaces it in the Dev UI and lets cross-runtime callers reference it by name: | ||
|
|
||
| ```go | ||
| genkit.DefineMiddleware(g, "logs model call latency", Logger{out: os.Stderr}) | ||
|
|
||
| // Lookup by name (mostly for inspection / cross-runtime dispatch). | ||
| desc := genkit.LookupMiddleware(g, "mine/logger") | ||
| ``` | ||
|
|
||
| For application code, `DefineMiddleware` is the typical entry point. For plugin authors, `ai.NewMiddleware` (no registration) plus `MiddlewarePlugin.Middlewares()` is the typical entry point. `genkit.Init` registers the returned descriptors automatically. | ||
|
|
||
| ## Built-in middleware | ||
|
|
||
| The `plugins/middleware` package bundles five production-ready implementations. Register the plugin once during `Init` to make them visible to the Dev UI: | ||
|
|
||
| ```go | ||
| import "github.com/genkit-ai/genkit/go/plugins/middleware" | ||
|
|
||
| g := genkit.Init(ctx, genkit.WithPlugins( | ||
| &googlegenai.GoogleAI{}, | ||
| &middleware.Middleware{}, | ||
| )) | ||
| ``` | ||
|
|
||
| ### `Retry` | ||
|
|
||
| Retries failed model API calls with exponential backoff and jitter. Hooks `WrapModel`. | ||
|
|
||
| ```go | ||
| ai.WithUse(&middleware.Retry{ | ||
| MaxRetries: 3, // default 3 | ||
| InitialDelayMs: 1000, // default 1000 | ||
| MaxDelayMs: 60000, // default 60000 | ||
| BackoffFactor: 2, // default 2 | ||
| NoJitter: false, // default false | ||
| // Statuses (default: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL) | ||
| // Statuses: []core.StatusName{core.UNAVAILABLE, core.RESOURCE_EXHAUSTED}, | ||
| }) | ||
| ``` | ||
|
|
||
| Non-`GenkitError` errors (network, parse, etc.) are always retried. `GenkitError` errors are retried only if their status is in `Statuses`. The backoff respects `ctx.Done()`: a canceled context aborts the retry loop with the last error. | ||
|
|
||
| ### `Fallback` | ||
|
|
||
| Tries alternative models when the primary fails with a fallback-eligible status. Hooks `WrapModel`. | ||
|
|
||
| ```go | ||
| ai.WithUse(&middleware.Fallback{ | ||
| Models: []ai.ModelRef{ | ||
| googlegenai.ModelRef("googleai/gemini-flash-latest", nil), | ||
| googlegenai.ModelRef("vertexai/gemini-flash-latest", nil), | ||
| }, | ||
| // default Statuses: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, | ||
| // ABORTED, INTERNAL, NOT_FOUND, UNIMPLEMENTED | ||
| }) | ||
| ``` | ||
|
|
||
| Each fallback model uses its own `ModelRef.Config()` verbatim; the original request's config is **not** inherited. Compose with `Retry` outside to retry the whole cascade, or inside to retry just the primary before falling back. | ||
|
|
||
| ### `ToolApproval` | ||
|
|
||
| Interrupts any tool call not in the allow list, exposing approval as a human-in-the-loop step. Hooks `WrapTool`. | ||
|
|
||
| ```go | ||
| ai.WithUse(&middleware.ToolApproval{ | ||
| AllowedTools: []string{"lookup", "search"}, // anything else triggers an interrupt | ||
| }) | ||
| ``` | ||
|
|
||
| The interrupt rides on the existing tool-interrupt machinery. Approve a call by setting `toolApproved: true` in the resume metadata when restarting: | ||
|
|
||
| ```go | ||
| restart, _ := tool.Restart(interruptPart, &ai.RestartOptions{ | ||
| ResumedMetadata: map[string]any{"toolApproved": true}, | ||
| }) | ||
| genkit.Generate(ctx, g, ai.WithMessages(resp.History()...), ai.WithToolRestarts(restart)) | ||
| ``` | ||
|
|
||
| A bare resume without that flag is **not** treated as approval, so unrelated resume flows can't bypass gating. | ||
|
|
||
| ### `Filesystem` | ||
|
|
||
| Grants the model scoped file access under a single root directory via `list_files`, `read_file`, plus `write_file` and `search_and_replace` when writes are enabled. Hooks `WrapGenerate` and `WrapTool` and contributes `Tools`. | ||
|
|
||
| ```go | ||
| ai.WithUse(&middleware.Filesystem{ | ||
| RootDir: "./workspace", | ||
| AllowWriteAccess: true, | ||
| ToolNamePrefix: "", // set distinct prefixes if attaching multiple Filesystem middlewares | ||
| }) | ||
| ``` | ||
|
|
||
| Path safety is enforced by `os.Root` (Go 1.25+), which rejects any path resolving outside the root, including via `..`, absolute paths, or symlinks. `read_file` returns its content as a queued user message on the next turn (not as the tool's direct output) so binary types like images can be inlined as media parts. | ||
|
|
||
| ### `Skills` | ||
|
|
||
| Exposes a local library of `SKILL.md` files as loadable system instructions. Hooks `WrapGenerate` and contributes a `use_skill` tool. | ||
|
|
||
| ```go | ||
| ai.WithUse(&middleware.Skills{SkillPaths: []string{"skills"}}) // default: ["skills"] | ||
| ``` | ||
|
|
||
| A skill is a directory containing `SKILL.md`, optionally with YAML frontmatter (`name`, `description`). The middleware injects a system prompt listing available skills, and the model calls `use_skill("name")` to pull the skill body into the conversation on demand. Heavier persona instructions stay off the hot path until actually loaded. | ||
|
|
||
| ## Practical patterns | ||
|
|
||
| ### Streaming-aware middleware | ||
|
|
||
| If your `WrapGenerate` or `WrapModel` hook emits its own messages (injected user content, system updates), use the streaming callback and the `MessageIndex` cursor in `GenerateParams`: | ||
|
|
||
| ```go | ||
| WrapGenerate: func(ctx context.Context, p *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error) { | ||
| if p.Callback != nil { | ||
| _ = p.Callback(ctx, &ai.ModelResponseChunk{ | ||
| Role: ai.RoleUser, | ||
| Index: p.MessageIndex, | ||
| Content: []*ai.Part{ai.NewTextPart("[injected context]")}, | ||
| }) | ||
| p.MessageIndex++ // advance so downstream middleware and the model see the shifted index | ||
| } | ||
| p.Request.Messages = append(p.Request.Messages, ai.NewUserMessage(ai.NewTextPart("[injected context]"))) | ||
| return next(ctx, p) | ||
| }, | ||
| ``` | ||
|
|
||
| `Filesystem` does this to deliver `read_file` content to the model while preserving streamed-chunk ordering. | ||
|
|
||
| ### Adding tools from middleware | ||
|
|
||
| `Hooks.Tools` registers extra tools for the duration of the call without the user wiring them through `ai.WithTools`. Useful when the middleware's hooks and tools work as a pair (e.g., `Filesystem`'s read/write tools, `Skills`'s `use_skill` tool): | ||
|
|
||
| ```go | ||
| return &ai.Hooks{ | ||
| Tools: []ai.Tool{myTool}, | ||
| WrapTool: myInterceptor, // intercepts both myTool and any user-supplied tools | ||
| }, nil | ||
| ``` | ||
|
|
||
| Duplicate tool names across user-supplied tools and middleware-contributed tools error out at call setup; the call won't run. | ||
|
|
||
| ### Interrupting from `WrapTool` | ||
|
|
||
| `ai.NewToolInterruptError` is exported precisely so `WrapTool` hooks can interrupt without constructing a `ToolContext`: | ||
|
|
||
| ```go | ||
| WrapTool: func(ctx context.Context, p *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error) { | ||
| if shouldGate(p.Tool.Name()) { | ||
| return nil, ai.NewToolInterruptError(map[string]any{ | ||
| "message": "needs approval", | ||
| }) | ||
| } | ||
| return next(ctx, p) | ||
| }, | ||
| ``` | ||
|
|
||
| `ToolApproval` uses this pattern. | ||
|
|
||
| ### Modifying the request safely | ||
|
|
||
| `p.Request` is the live request for the iteration. Mutating it in place affects later layers. If the change should be visible only to the inner layer, copy first: | ||
|
|
||
| ```go | ||
| WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) { | ||
| req := *p.Request | ||
| req.Messages = append([]*ai.Message(nil), p.Request.Messages...) | ||
| req.Messages = append(req.Messages, extraSystemMessage) | ||
| p.Request = &req | ||
| return next(ctx, p) | ||
| }, | ||
| ``` | ||
|
|
||
| `Skills.injectSkillsPrompt` shows the same pattern for `ModelRequest` cloning. | ||
|
|
||
| ### Idempotent re-injection across iterations | ||
|
|
||
| `WrapGenerate` runs once per tool-loop iteration. If you inject content into the request, you'll inject it on every iteration unless you mark and detect what you've already added. `Skills` tags its system prompt part with metadata (`skills-instructions: true`) and refreshes that one part in place rather than appending a new one each turn. | ||
|
|
||
| ## Migration note | ||
|
|
||
| The legacy `ai.ModelMiddleware` / `ai.WithMiddleware` API is preserved and marked deprecated. Prefer `ai.Middleware` / `ai.WithUse`, which adds `WrapGenerate` and `WrapTool` hooks plus `Hooks.Tools` for dynamically injected tools. | ||
|
|
||
| ## See also | ||
|
|
||
| - [`tools.md`](tools.md) for tool definition, interrupt/restart machinery used by `ToolApproval`. | ||
| - Sample sources under `go/samples/basic-middleware/`: `retry-fallback`, `filesystem`, `skills`. | ||
| - The `plugins/middleware` package source for reference implementations. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.