
Commit ce96f10

tylerhogarth, cursoragent, ankur-arch, and claude authored
feat(blog): add Gremlin post on autonomous task-to-PR workflow (#7936)
* feat(blog): add Gremlin post on autonomous task-to-PR workflow

  Adds Tyler Hogarth's post on Gremlin: Mastra orchestration, OpenCode sandboxing, and turning Sentry, Linear, and Slack work into reviewable PRs.

* feat(blog): expand Gremlin post and add hero artwork

  Reframe product integration around Fix with AI and link the Compute beta launch. Replace the placeholder SVG with cropped 1024x537 hero and meta images for cards and social previews.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(blog): update Gremlin post hero image

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(blog): remove outdated Gremlin meta cover image

  Drop the previous goblin cover (meta.png) and the metaImagePath override so social/OG and RSS fall back to the updated hero image.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(blog): pin posts to the featured slot

  Add a `pinned` frontmatter flag. Pinned posts are hoisted ahead of the chronological feed so the latest pinned post takes the featured slot and the top of the list, instead of the most recent post by date. Pin the Prisma Compute launch post.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(blog): prevent series chip overflow on mobile

  The series shelf chip row overflowed the viewport on narrow screens because min-width:0 did not propagate through the flex chain, so long series titles never truncated. Add min-w-0 to the chip list and max-w-full to each item so titles truncate cleanly.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ankur Datta <64993082+ankur-arch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent f13d426 commit ce96f10

6 files changed

Lines changed: 191 additions & 3 deletions

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
---
title: "Gremlin: turning open tasks into pull requests"
slug: "gremlin-turning-open-tasks-into-pull-requests"
date: "2026-06-09"
authors:
  - "Tyler Hogarth"
metaTitle: "Gremlin: turning open tasks into pull requests"
metaDescription: "Gremlin is an autonomous engineering agent for small, well-scoped tasks. It uses Mastra for orchestration and OpenCode in a sandbox to turn Sentry issues, Linear tasks, and Slack instructions into reviewable pull requests."
heroImagePath: "/gremlin-turning-open-tasks-into-pull-requests/imgs/hero.png"
heroImageAlt: "Gremlin Cloud Agent illustration with infrastructure icons, terminal output, and deployment status panels"
tags: ["ai"]
---

Every engineering organisation has a long tail of small problems.

A noisy Sentry issue. A flaky edge case. A minor performance regression. A bug that is understood, annoying, and never quite important enough to interrupt the current roadmap.

Individually, these issues are small. Collectively, they create drag. They interrupt engineers, pollute error dashboards, and accumulate into operational debt. The cost is not just fixing them. It is the context switch: reproduce the failure, find the right code, make a change, validate it, and shepherd the fix through review.

**Gremlin** is my attempt to reduce that friction. It is an autonomous engineering agent scoped to small and medium-sized tasks: identify a well-scoped issue, gather context, make a targeted code change, and open a reviewable pull request.

The goal is to remove the low-leverage interruptions that keep engineers from larger, higher-judgement work.

## Why start with small tasks?

"Build an AI engineer" is not a useful product requirement. It is too vague and too unconstrained. [Devin](https://devin.ai/) is pushing in that direction at scale, and the results are impressive. I am not setting out to rebuild an entire product like that. Starting small and focused helps me build something relevant to my team and to Prisma's users.

Real engineering work is messy: codebases have conventions, permissions, runtime assumptions, deployment constraints, and review expectations.

Gremlin starts with a more practical question: what work is valuable, repetitive enough to justify automation, and bounded enough that an agent can safely attempt it?

[Sentry](https://sentry.io/) issues are a good first target. They carry a concrete failure signal: stack traces, affected files, runtime context, frequency, environment, and sometimes user impact. Many are not architectural problems. They are localised defects or rough edges that require judgement but not deep product strategy.

[Sentry Seer](https://sentry.io/lp/seer/) already exists: an AI layer that investigates production errors and opens pull requests. That capability is real, but it lives inside a single surface, the Sentry user interface. Gremlin takes a different angle: orchestration across the systems where engineering work actually happens, not just where the error was reported.

Gremlin can inspect the issue, connect it to the relevant code, generate a fix, run checks where available, and raise a pull request for review. Even when the PR is not perfect, it can shrink the initial investigation from forty-five minutes to five.

The first version of Gremlin is not a general-purpose autonomous developer. It is a focused system for converting well-scoped engineering signals into concrete, reviewable patches.

## The core workflow

Tasks enter Gremlin from more than one place. A Sentry issue can trigger a run when it matches scope criteria. Anyone can also assign work manually through Linear or Slack when they have something well-bounded.

At a high level, Gremlin follows a simple loop:

```mermaid
flowchart TD
    S["Sentry issue"] --> B["Gather context"]
    L["Linear task"] --> B
    SL["Slack thread"] --> B
    B --> C["Sandbox environment"]
    C --> D["Agent implements fix"]
    D --> E["Validate"]
    E --> F["Open PR"]
```
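
The loop in the diagram can be read as a fixed sequence of steps over a shared run state. The sketch below is only an illustration of that shape: every type, function name, and the placeholder URL are hypothetical, the steps are stubs, and none of it is Mastra's or OpenCode's actual API.

```typescript
// Hypothetical types for illustration; not taken from Mastra or OpenCode.
type Source = "sentry" | "linear" | "slack";

interface Task {
  source: Source;
  summary: string;
}

interface RunState {
  task: Task;
  context: string[]; // notes gathered before the agent starts
  sandboxId?: string; // set once an isolated environment exists
  patch?: string; // the agent's proposed change
  checksPassed?: boolean; // outcome of the validation step
  prUrl?: string; // set when a pull request has been opened
}

type Step = (state: RunState) => RunState;

// Each step is a stub standing in for a real integration.
const gatherContext: Step = (s) => ({
  ...s,
  context: [...s.context, `origin: ${s.task.source}`, s.task.summary],
});
const createSandbox: Step = (s) => ({ ...s, sandboxId: "sandbox-1" });
const implementFix: Step = (s) => ({ ...s, patch: `fix for: ${s.task.summary}` });
const validate: Step = (s) => ({ ...s, checksPassed: s.patch !== undefined });
const openPr: Step = (s) => ({ ...s, prUrl: "https://example.invalid/pr/1" });

// The orchestrator owns the order of the steps; the agent is only one of them.
const pipeline: Step[] = [gatherContext, createSandbox, implementFix, validate, openPr];

function run(task: Task): RunState {
  return pipeline.reduce<RunState>((state, step) => step(state), { task, context: [] });
}
```

Calling `run({ source: "sentry", summary: "TypeError in checkout" })` threads one task through all five steps. Keeping the step order in a plain array makes the orchestration explicit and keeps it separate from whatever the agent does inside `implementFix`.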

The interesting part is not any single step. It is the orchestration between them.

A coding agent by itself is not enough. It needs the right repository state, credentials, issue context, permissions, runtime constraints, and guardrails. Without that, the agent either cannot act or acts in ways that are too brittle to trust.

That is where **[Mastra](https://mastra.ai/docs)** comes in.

## Mastra as the orchestration layer

Gremlin uses [Mastra](https://mastra.ai/docs) as the orchestration layer around the coding agent. The agent does not receive a vague instruction and operate freely. Mastra prepares the task, controls the environment, injects credentials and context, and defines the boundaries in which the agent can operate.

The architecture separates responsibilities:

- **Mastra** handles orchestration, task setup, credential injection, access validation, and workflow control.
- **[OpenCode](https://opencode.ai/)** runs as the sandboxed coding agent inside that environment. It is the open-source agent that performs the implementation work: navigating the repo, editing files, and running checks. See the [OpenCode docs](https://opencode.ai/docs) for how it is configured and extended.
- The **model provider** is exchangeable. OpenCode supports many LLM providers; Gremlin can swap models without changing the orchestration layer or the workflow around the PR.
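
One way to picture this separation is as three narrow interfaces, where each layer depends only on the one below it. The following is a minimal sketch under that assumption; `ModelProvider`, `CodingAgent`, `createAgent`, and `orchestrate` are hypothetical names invented for the example and do not correspond to real Mastra or OpenCode types.

```typescript
// Hypothetical interfaces for illustration only.
interface ModelProvider {
  name: string;
  complete(prompt: string): string;
}

interface CodingAgent {
  implement(task: string): string;
}

// The agent depends on the ModelProvider interface, not on a concrete vendor.
function createAgent(provider: ModelProvider): CodingAgent {
  return {
    implement: (task) => provider.complete(`Implement: ${task}`),
  };
}

// The orchestrator only talks to CodingAgent, so it never sees the provider.
function orchestrate(agent: CodingAgent, task: string): { task: string; patch: string } {
  return { task, patch: agent.implement(task) };
}

// Two stand-in providers that tag their output so the swap is visible.
const providerA: ModelProvider = { name: "provider-a", complete: (p) => `[a] ${p}` };
const providerB: ModelProvider = { name: "provider-b", complete: (p) => `[b] ${p}` };
```

Swapping `providerA` for `providerB` changes only the argument to `createAgent`; `orchestrate` is untouched, which is the property the list above describes.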

This lets Gremlin use an autonomous coding loop while preserving a clear operational boundary. The system can attempt fixes without granting unconstrained access to production systems or sensitive infrastructure.

## Why sandboxing alone was not enough

An early instinct was to treat Gremlin primarily as a sandboxing problem. Give the agent a repo, give it the issue, let it work.

In practice, that framing was incomplete. Sandboxing controls *where* the agent runs. It does not solve the workflow problem. The agent still needs to know what task it is solving, what code it can access, which secrets it may use, which checks matter, how to authenticate to internal systems, and how to report its work.

Isolation is necessary, but not sufficient. The harder problem is orchestration: converting operational signals into well-formed engineering tasks, setting up the environment correctly, enforcing access boundaries, and routing the result back into normal engineering workflows.

That is why Mastra became central to the architecture.

## What Gremlin does today

Gremlin is scoped around small to medium-sized engineering tasks with clear, bounded scope.

**Automated Sentry fixes** are the clearest automated entry point. When an issue matches scope criteria, Gremlin collects context, reasons about the failure, makes a targeted code change, and opens a PR.

**Instructions through Linear or Slack** cover the rest. An instruction might be to add a field to an API response, fix a button alignment issue, or correct copy on a settings page. The scope is narrow; the acceptance criteria are implicit or stated in the message.

That second path matters for the work people spot in passing:

- A bug an engineer or agent notices during implementation, logged without pulling either off the task they are already on
- A styling or copy issue anyone flags in Slack
- A regression with a clear repro that is not worth a context switch right now
- Cleanup with an obvious before and after

These are not architectural problems. They are real, localised changes where the cost of picking them up manually is the problem, not the difficulty of the fix itself.

Gremlin should not begin by making broad architectural decisions, rewriting major subsystems, or interpreting ambiguous product requirements. Those tasks require deeper context, stakeholder alignment, and trade-off analysis. They may become partially automatable later, but they are not the right starting point.

The correct initial target is well-scoped work: real, valuable, and bounded enough that an engineer can tell when it is done.
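
The post does not spell out what the scope criteria are, so the following is purely a hypothetical illustration of how such a gate could look. The field names and thresholds (`maxAffectedFiles`, `minEventCount`, `requireStackTrace`) are assumptions made for the example, not Gremlin's actual rules.

```typescript
// Hypothetical shape of an incoming issue; field names are illustrative.
interface IssueSignal {
  title: string;
  affectedFiles: string[];
  hasStackTrace: boolean;
  eventCount: number;
}

// Hypothetical criteria; a real system would likely have more of them.
interface ScopeCriteria {
  maxAffectedFiles: number;
  minEventCount: number;
  requireStackTrace: boolean;
}

interface ScopeDecision {
  inScope: boolean;
  reasons: string[]; // why the issue was rejected; empty when in scope
}

function checkScope(issue: IssueSignal, criteria: ScopeCriteria): ScopeDecision {
  const reasons: string[] = [];
  if (issue.affectedFiles.length > criteria.maxAffectedFiles) {
    reasons.push("touches too many files to count as a localised change");
  }
  if (issue.eventCount < criteria.minEventCount) {
    reasons.push("too few events to justify an automated run");
  }
  if (criteria.requireStackTrace && !issue.hasStackTrace) {
    reasons.push("no stack trace, so there is no concrete failure signal");
  }
  return { inScope: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the decision keeps rejections explainable, which fits the idea that an engineer should be able to tell why a task was or was not picked up.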

## The pull request as the interface

One of the most important product decisions is that Gremlin's output is a pull request.

Engineers and agents already know how to review PRs. CI systems already know how to validate them. Code owners, branch protections, comments, and review workflows already exist. A separate process for agent-generated work would add friction, not remove it.

A Gremlin PR should explain:

- What issue triggered the change
- What the agent changed
- Why the change is expected to fix the issue
- What validation was performed
- What uncertainty remains

That last point matters. Trustworthy automation surfaces uncertainty clearly. A good Gremlin PR is not just a patch. It is a review artifact that helps an engineer and agent decide whether the change is safe.
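
The five points in the list map naturally onto a structured PR body. As a rough sketch, assuming a hypothetical `PrReport` shape and section titles chosen for the example, a renderer could always emit the uncertainty section, even when the agent has nothing to report:

```typescript
// Hypothetical report shape; one field per item a Gremlin PR should explain.
interface PrReport {
  trigger: string;
  changes: string[];
  rationale: string;
  validation: string[];
  uncertainty: string[];
}

// Render one Markdown section; an empty list is stated explicitly, not omitted.
function section(title: string, items: string[]): string {
  const body =
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- None reported";
  return `## ${title}\n${body}`;
}

function renderPrBody(report: PrReport): string {
  return [
    section("Trigger", [report.trigger]),
    section("Changes", report.changes),
    section("Why this should fix it", [report.rationale]),
    section("Validation", report.validation),
    section("Remaining uncertainty", report.uncertainty),
  ].join("\n\n");
}
```

Rendering "None reported" instead of dropping an empty section means a reviewer can distinguish "the agent found no open questions" from "the agent never reported on them".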

## Current boundaries

Gremlin is not intended to handle every engineering task today. It works best when the task is well-scoped, the relevant context is available, and the expected change is localised. It is less appropriate for work that requires open-ended product judgement, large-scale refactoring, unclear ownership, or deep cross-system design.

This is not a weakness of the approach. It is part of making the system useful. By defining the boundary clearly, Gremlin can be evaluated honestly, improved against real tasks rather than hypothetical ones, and earn trust in a limited domain before expanding.

Engineering teams will not adopt an autonomous agent because it is impressive in a demo. They will adopt it if it repeatedly saves them time without creating hidden risk.

## What I learned building it

One lesson is that the agent is only one part of the product. The surrounding system matters just as much.

A capable coding model still needs task framing, context retrieval, environment setup, permissions, validation, and output formatting. Without those, even a strong agent produces inconsistent results. With them, the same agent becomes much more useful.

The other lesson is about the economics of small work, not the difficulty of small fixes. When the context is clear, a bounded bug usually does not need much judgement. The problem is that picking it up still has a cost, and that cost is rarely worth paying in the moment.

Two failure modes show up repeatedly.

**Small issues accumulate because fixing them one at a time never pays off.** A typo in an error message, a missing null check, a button that misaligns on mobile. Each is the kind of paper cut that takes ten minutes if you stop what you are doing, reproduce it, branch, validate, and open a PR. So they wait. They pile up. The dashboard gets noisier. The product gets rougher at the edges. Gremlin changes the math by moving that work off the engineer's machine and into a sandbox where it can run without interrupting anything else.

**Side quests delay the work that actually matters.** An engineer spots something small while implementing a feature and decides to fix it now because it is right there. That fix fails tests, introduces a regression, or expands scope in a way that has nothing to do with the feature they were shipping. I have seen features stall or fail to merge because of a handful of unrelated fixes bundled into the same PR. Gremlin is a way to offload those tangents: log the issue, assign it, let it run elsewhere, review the PR when it is ready.

The product challenge is not getting an agent to write code. It is getting an agent to absorb work that is cheap to describe but expensive to pick up in the middle of something else.

## From tasks to projects

The longer-term vision for Gremlin goes beyond the long tail. Once the system can reliably handle bounded tasks, the next step is picking up work that is already well defined elsewhere and executing it in a sandbox.

At Prisma, that definition already exists. As I wrote in [Agentic Engineering at Prisma](/agentic-engineering-at-prisma), projects run through [Drive and the Maker](/drive-and-the-maker): upfront specs, milestones, acceptance criteria, and test coverage mapped before implementation starts. [Agent skills](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview) already encode much of that process. Gremlin does not need to invent the planning layer. The future goal is to take tasks and milestones that are already specified through that process and turn them into actual changes without the engineer running another agent locally on their laptop.

That matters because engineers are already hitting local machine resource limits. Parallel agents, parallel workstreams, a feature branch in one window and a review in another: the execution load is growing faster than a single machine can comfortably carry. Gremlin moves execution off the engineer's machine and into an isolated environment where work can run in the background. The engineer stays focused on judgement, review, and the work that still needs them in the loop.

This is a different problem from fixing a Sentry issue or acting on a Slack instruction. It requires tracking state across multiple PRs, respecting dependencies between milestones, and knowing when not to proceed. But the path starts with reliable execution on smaller tasks. Gremlin's current architecture is designed with that progression in mind. Mastra provides the orchestration layer needed to move from one-off task execution towards longer-running workflows.
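
To make the three requirements concrete (state across PRs, milestone dependencies, and knowing when not to proceed), here is a small hypothetical scheduler. The `Milestone` shape, the status values, and the halt-on-failure rule are assumptions for illustration, not a description of how Gremlin or Mastra tracks projects.

```typescript
// Hypothetical milestone state; each milestone maps to one pull request.
type MilestoneStatus = "pending" | "in_review" | "merged" | "failed";

interface Milestone {
  id: string;
  dependsOn: string[]; // ids of milestones that must be merged first
  status: MilestoneStatus;
}

type NextAction =
  | { kind: "run"; ids: string[] } // milestones that can start now
  | { kind: "wait" } // nothing runnable until a review finishes
  | { kind: "halt"; reason: string } // do not proceed; a human should look
  | { kind: "done" };

function nextAction(milestones: Milestone[]): NextAction {
  const byId = new Map(milestones.map((m) => [m.id, m] as const));

  // Knowing when not to proceed: any failure stops the whole project.
  const failed = milestones.filter((m) => m.status === "failed");
  if (failed.length > 0) {
    return { kind: "halt", reason: `failed: ${failed.map((m) => m.id).join(", ")}` };
  }
  if (milestones.every((m) => m.status === "merged")) {
    return { kind: "done" };
  }

  // A pending milestone is runnable only when every dependency is merged.
  // An unknown dependency id is treated as not merged, so it blocks the run.
  const runnable = milestones.filter(
    (m) =>
      m.status === "pending" &&
      m.dependsOn.every((dep) => byId.get(dep)?.status === "merged"),
  );
  return runnable.length > 0
    ? { kind: "run", ids: runnable.map((m) => m.id) }
    : { kind: "wait" };
}
```

Halting on the first failure is deliberately conservative; a real system might instead continue with milestones that do not depend on the failed one.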

## Fix with AI

The most interesting product direction for Gremlin is not a separate agent console. It is **Fix with AI**: a button in the places where Prisma already surfaces a problem, wired to orchestration that returns a pull request instead of a prompt to paste elsewhere.

That pattern only works if something like Gremlin sits behind it. Detection alone is not enough. The product has to gather context, spin up a sandbox, attempt a bounded fix, validate what it can, and route the result back into normal review workflows. Without that layer, "Fix with AI" is just another copyable prompt: the friction moves, but it does not disappear.

Prisma is already moving in this direction. With the [launch of Prisma Compute in public beta](https://www.prisma.io/blog/launching-prisma-compute-public-beta), the platform is built around an agentic loop: build, deploy, read logs, fix, and redeploy, with app and database on the same infrastructure. The hard part is no longer only writing code. It is everything after: chasing build output, feeding log context back into the agent, and keeping that loop inside one place instead of jumping between dashboards.

Gremlin is how we extend that loop outward, into the repositories and review processes teams already use.

### Query Insights today

[Query Insights](https://www.prisma.io/docs/query-insights) is the clearest existing example. It is built into Prisma Postgres and surfaces slow queries, expensive reads, and repeated statement shapes. When a query group is hurting performance, the console already answers a useful question: what should change next?

Today that answer often arrives as AI-generated analysis and a copyable prompt. The engineer still switches to an editor, applies the fix, runs checks, and opens a PR. **Fix with AI** closes that gap. Query Insights detects the issue and passes structured context to Gremlin; Gremlin returns a reviewable pull request. The engineer's interface becomes the PR, not another surface to babysit.

## What comes next

Autonomous engineering systems do not need to start by replacing large parts of the software development lifecycle. A more practical path is to begin where the pain is concrete and the scope is bounded.

Gremlin starts with the long tail of well-scoped work: Sentry noise, Slack and Linear instructions, and the fixes nobody picks up because the interrupt cost is too high. By combining Mastra's orchestration layer with OpenCode, Gremlin turns that work into reviewable pull requests while preserving clear boundaries and leaving the merge decision where it belongs: with the engineer reviewing the PR.

The current focus is small to medium-sized tasks. The longer-term direction is larger project execution, **Fix with AI** wired into more Prisma surfaces, and proactive operational quality across [Compute](https://www.prisma.io/blog/launching-prisma-compute-public-beta) and Postgres. The core principle remains the same: agents are most useful when they are embedded into real workflows, constrained by clear boundaries, and evaluated by the amount of friction they remove.

apps/blog/content/blog/launching-prisma-compute-public-beta/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ metaDescription: "Prisma Compute is now in public beta: TypeScript app hosting t
 heroImagePath: "/launching-prisma-compute-public-beta/imgs/hero.png"
 heroImageAlt: "Prisma Compute, now in public beta"
 metaImagePath: "/launching-prisma-compute-public-beta/imgs/meta.png"
+pinned: true
 tags:
   - "announcement"
   - "platform"

apps/blog/source.config.ts

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ export const blogPosts = defineCollections({
     metaImagePath: z.string().optional(),
     series: z.string().optional(),
     seriesIndex: z.number().int().positive().optional(),
+    pinned: z.boolean().optional(),
     prev: z.string().optional(),
     next: z.string().optional(),
     tags: z

apps/blog/src/app/(blog)/page.tsx

Lines changed: 9 additions & 1 deletion
@@ -49,7 +49,7 @@ export async function generateMetadata(): Promise<Metadata> {
 }
 
 export default async function BlogHome() {
-  const posts = blog.getPages().sort((a, b) => {
+  const sortedByDate = blog.getPages().sort((a, b) => {
     const aTime =
       a.data.date instanceof Date
         ? a.data.date.getTime()
@@ -61,6 +61,14 @@ export default async function BlogHome() {
     return bTime - aTime;
   });
 
+  // Pinned posts are surfaced ahead of the chronological feed so the latest
+  // pinned post takes the featured slot (and the top of the list) instead of
+  // the most recent post by date. The date sort above is stable, so pinned
+  // posts keep their newest-first order among themselves.
+  const isPinned = (post: (typeof sortedByDate)[number]): boolean =>
+    (post.data as { pinned?: boolean }).pinned === true;
+  const posts = [...sortedByDate.filter(isPinned), ...sortedByDate.filter((p) => !isPinned(p))];
+
   const getAllAuthors = (post: (typeof posts)[number]): string[] => {
     const data = post.data as any;
     const authors = Array.isArray(data?.authors) ? data.authors : [];
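
The hoisting behaviour in this hunk can be checked in isolation. The sketch below reproduces the same two-step idea (sort by date, then partition by `pinned`) on a simplified, hypothetical `Post` type; `orderPosts` and the flat `date` and `pinned` fields are stand-ins for the real `blog.getPages()` entries, whose data shape is richer. Note that within each partition the newest-first order survives because `Array.prototype.filter` keeps elements in their original relative order.

```typescript
// Simplified, hypothetical post shape; the real entries carry `data.date`
// and `data.pinned` inside a collection page object.
interface Post {
  slug: string;
  date: Date;
  pinned?: boolean;
}

function orderPosts(posts: Post[]): Post[] {
  // Copy before sorting so the caller's array is not mutated.
  const sortedByDate = [...posts].sort((a, b) => b.date.getTime() - a.date.getTime());

  // Only an explicit `true` counts as pinned; a missing flag means unpinned.
  const isPinned = (post: Post): boolean => post.pinned === true;

  // filter() preserves relative order, so both groups stay newest-first.
  return [...sortedByDate.filter(isPinned), ...sortedByDate.filter((p) => !isPinned(p))];
}

const ordered = orderPosts([
  { slug: "older-unpinned", date: new Date("2026-04-01") },
  { slug: "newest-unpinned", date: new Date("2026-06-09") },
  { slug: "pinned-launch", date: new Date("2026-05-01"), pinned: true },
]);
// The pinned post comes first even though it is not the most recent by date.
```

With this input, `ordered[0]` is the pinned post and the two unpinned posts follow in newest-first order, which is what lets a pinned post take the featured slot.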

apps/blog/src/components/SeriesShelf.tsx

Lines changed: 2 additions & 2 deletions
@@ -117,9 +117,9 @@ export function FeaturedSeriesShelf({ series }: { series: SeriesShelfItem[] }) {
         <span className="shrink-0 text-xs uppercase tracking-wide font-semibold text-foreground-neutral-weak">
           Series
         </span>
-        <ul className="flex flex-wrap items-center gap-2">
+        <ul className="flex min-w-0 flex-wrap items-center gap-2">
           {chips.map((item) => (
-            <li key={item.key} className="min-w-0">
+            <li key={item.key} className="min-w-0 max-w-full">
               <SeriesChip item={item} />
             </li>
           ))}
