Feat: Durable Execution Revamp #2954

Draft
mrkaye97 wants to merge 9 commits into main from feat-durable-execution

@mrkaye97 commented Feb 5, 2026

Description

WIP - Long-living branch for durable execution work

Type of change

  • New feature (non-breaking change which adds functionality)

@mrkaye97 marked this pull request as draft on February 5, 2026 at 14:47

IsSatisfied bool
}

type DurableEventsRepository interface {

I think we'll need a method for creating a new branch from a specific event id

})
}

func (r *durableEventsRepository) CreateEventLogEntries(ctx context.Context, opts []CreateEventLogEntryOpts) ([]*sqlcv1.V1DurableEventLogEntry, error) {

Biggest concern here is that we need to lock the file row in exclusive mode, and updates need to be serialized against the latest read. In other words, we can't just use latest_node_id + 1 to generate these log entries, because concurrent writers will read the same latest_node_id.

These cases should be exceedingly rare though, so perhaps we're OK with an advisory TryLock and a simple retry? If we're getting a lot of concurrent writes to these tables, we're doing something wrong (they should be serialized by the durable listener anyway).
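
To make the TryLock-plus-retry idea concrete, here is a rough in-memory sketch. It is not the repository code: `logFile`, `appendEntries`, `appendWithRetry`, and `errLockBusy` are hypothetical names, and a `sync.Mutex` stands in for whatever database-level advisory lock the real implementation would take on the log file row. The point is only that node ids are read and bumped while the lock is held, and that a writer who loses the race backs off and retries instead of reusing a stale latest_node_id.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// logFile is a hypothetical in-memory stand-in for the row that tracks
// latest_node_id. The mutex plays the role of an advisory lock (assumption).
type logFile struct {
	lock         sync.Mutex
	latestNodeID int64
}

// errLockBusy signals that another writer currently holds the lock.
var errLockBusy = errors.New("log file lock is held by another writer")

// appendEntries reserves n consecutive node ids. It fails fast with
// errLockBusy instead of blocking when the lock is already taken.
func (f *logFile) appendEntries(n int) ([]int64, error) {
	if !f.lock.TryLock() {
		return nil, errLockBusy
	}
	defer f.lock.Unlock()

	// The read of latestNodeID and the increment both happen under the
	// lock, so two writers can never observe the same starting value.
	ids := make([]int64, 0, n)
	for i := 0; i < n; i++ {
		f.latestNodeID++
		ids = append(ids, f.latestNodeID)
	}
	return ids, nil
}

// appendWithRetry retries a busy lock up to `attempts` times with a fixed
// backoff. Any other error is returned immediately.
func appendWithRetry(f *logFile, n, attempts int, backoff time.Duration) ([]int64, error) {
	for i := 0; i < attempts; i++ {
		ids, err := f.appendEntries(n)
		if err == nil {
			return ids, nil
		}
		if !errors.Is(err, errLockBusy) {
			return nil, err
		}
		time.Sleep(backoff)
	}
	return nil, fmt.Errorf("gave up after %d attempts: %w", attempts, errLockBusy)
}

func main() {
	f := &logFile{}
	seen := map[int64]bool{}
	var seenMu sync.Mutex
	var wg sync.WaitGroup

	// Eight concurrent writers each reserve three node ids.
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ids, err := appendWithRetry(f, 3, 1000, time.Millisecond)
			if err != nil {
				panic(err)
			}
			seenMu.Lock()
			defer seenMu.Unlock()
			for _, id := range ids {
				if seen[id] {
					panic(fmt.Sprintf("duplicate node id %d", id))
				}
				seen[id] = true
			}
		}()
	}
	wg.Wait()
	fmt.Println("distinct ids:", len(seen), "latest node id:", f.latestNodeID)
}
```

If contention turns out to be as rare as expected, the retry path should almost never run. If it runs often, that is probably a sign that writes are not being serialized upstream.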

* feat: add new models

* feat: migration

* chore: generate

* fix: linter

* fix: couple types

* Feat: Initial work on CRUD operations for durable events (#2943)

* feat: initial query work

* feat: first pass at durable events repo + queries

* feat: add new payload type for durable event data

* chore: gen

* fix: payload key

* fix: lint

* feat: initial protos

* chore: lint

* fix: work on improving naming

* chore: rename session id to invocation count

* feat: scaffold implementation of durabletask rpc

* fix: one more session rename

* feat: initial work on the server scaffolding

* chore: gen protos for python

* feat: initial durable task client

* feat: initial durable context work for python

* fix: pass client through to runner

* fix: clean up type checking errors

* fix: cruft

* feat: initial work wiring up durable events

* fix: get -> getorcreate

* feat: query + wiring for updating latest node id

* fix: simplify, bump latest node ids in the same query

* chore: note

* feat: wire up sleeps with internal signal matches

* chore: gen

* fix: callback data writes

* feat: cache previous events

* fix: wire up external id writes

* feat: got sleeps sorta working!

* fix: tenant and external id wiring

* chore: comments

* fix: clean up some types a bit

* feat: add run triggering params to proto to allow for spawning children

* feat: first pass at child spawning

* feat: start wiring up child spawning

* fix: use `triggerWriter` for spawn

* feat: update trigger proto def

* chore: regen python

* feat: start wiring up spawning correctly with all opts

* refactor: share trigger code

* chore: remove log lines, lint

* fix: add triggered run external id

* feat: start wiring up child key storage better

* chore: gen again

* fix: gen, colname

* fix: trigger opts panicking

* hack: get things working for now

* feat: shared rpc message

* chore: fix imports

* feat: add tenant id to tables

* fix: improve ingest logic

* refactor: shared trigger opt type

* fix: send tenant id through everywhere

* chore: fix log file insert on conflict

* fix: repo

* fix: generate external id upstream

* feat: add columns to the match

* feat: first pass at durable waits on the controllers instead of the dispatcher

* fix: types

* feat: wire up callbacks

* fix: invoc counts

* fix: typing, lint

* driveby: more constants for message ids

* refactor: struct for callback keys everywhere

* fix: bugs, passing tests

* fix: return errnorows

* fix: schema

* fix: remove current callback flow

* feat: new message types

* fix: remove key from callback model

* fix: rm unused queries

* refactor: start reworking flow

* fix: start working on feedback

* fix: query

* fix: wire up external ids

* revert: drive by

* refactor: rm extra interface

* chore: move listener, lint

* refactor: remove old listener, rename

* refactor: consolidate migrations

* fix: immediately send already-satisfied callbacks

* fix: union

* chore: rm unused queries

* fix: check if entry already exists before re-spawning / signaling

* fix: node id incrementation

* fix: rm json dump

* fix: don't pass node id

* fix: store latest invocation, update query

* fix: upsert logic

* Revert "fix: upsert logic"

This reverts commit cf7c609.

* fix: change logic slightly

* fix: split up get and create queries

* fix: err

* fix: pass node ids around properly

* fix: invocation handling

* fix: callback bug

* fix: naming

* fix: rm cruft method, dynamic kind

* fix: wire up memo payload and kind stuff

* fix: propagate trigger opts

* fix: child spawn signaling + olap wiring

* fix: extract output method

* feat: improve test coverage a bit

* fix: child spawning

* feat: another test

* fix: query fixes, overwrite

* fix: match bug

* fix: proto indexes, regen

* fix: eviction comment

* fix: warning for non-async durable tasks

* fix: rm contracts import

* fix: basic locking, rm sync durable tasks

* fix: invocation counts, etc.

* chore: add fixme

* fix: rm unused invocation count param from callback response

* fix: rm dispatcher id from the callback

* fix: di test

* Revert "fix: rm dispatcher id from the callback"

This reverts commit 26e6c82.

* fix: migration

* fix: use optimistictx

* fix: lift grpc codes out of trigger repo

* fix: span names

* fix: rm comment

* fix: consolidate kind types, batching, not-null kinds

* fix: null bug

* fix: satisfied claim bug, simplify queries

* fix: add back payload storage

* fix: match bug, simplification

* fix: factor out trigger opts to the dispatcher level

* fix: factor out conditions

* fix: rm unused structs

* fix: rm dupes

* fix: migration

* refactor: switch case helpers

* fix: panic

* fix: couple warnings

* fix: lint

* fix: generate external ids properly

* refactor: return trigger task data from helper

* fix: handle matches correctly for dag spawns

* fix: add validators, one more uuid type

* chore: gen

* chore: bump pytest-asyncio to latest

* fix: store the worker instead of the dispatcher, then look up the dispatcher

* fix: store dispatcher id on the worker

* chore: lint

2 participants