ACTIVE event-publish degradation on events.extrachill.com: upsert handler rejects every tool call (job_id missing from satellite handler tool defs); new event writes stopped ~16:00 UTC June 6 #2560

@chubes4

Summary

After the "require explicit tool context bindings" contract migration, the only way job_id reaches a handler's handle_tool_call() is via an explicitly declared client_context_bindings entry on the tool definition. Every data-machine-core handler tool was updated to declare 'client_context_bindings' => array( 'job_id' ) — but the satellite plugin handler tool definitions were never updated. As a result, every invocation of those handlers' tools fails immediately because job_id is absent from $parameters.

This is firing as a live error storm on events.extrachill.com (blog 7) via the data-machine-events upsert_event tool. The base-class guard rejects every single AI tool call.

The error

Upsert Handler Error: job_id parameter is required for update operations

Source: inc/Core/Steps/Upsert/Handlers/UpsertHandler.php:73

final public function handle_tool_call( array $parameters, array $tool_def = array() ): array {
    $job_id = (int) ( $parameters['job_id'] ?? null );
    if ( ! $job_id ) {
        return $this->errorResponse( 'job_id parameter is required for update operations' );
    }
    ...

Scale / frequency

  • 9,162+ entries of this exact error in the events.extrachill.com DM log (the log table itself only retains back to 2026-06-03 22:02, so the true count is higher).
  • Firing multiple times per minute, continuously. Latest observed: 2026-06-06 18:00:38 UTC.
  • First occurrence in the (retention-limited) log: 2026-06-04 16:56:27 UTC. The error storm is contiguous from that timestamp forward.
  • Agent: agent_id = 6 (events-bot) on every entry.

Root cause (traced end to end)

How job_id is supposed to reach the upsert handler:

  1. The conversation loop builds the loop payload (which carries top-level job_id) and mirrors context-bindable run fields into client_context via datamachine_payload_with_client_context_bindings() (inc/Engine/AI/conversation-loop.php).
  2. ToolExecutor::executeTool() calls applyDeclaredContextBindings() (inc/Engine/AI/Tools/ToolExecutor.php:204), which injects payload/context values into tool parameters only for keys named in the tool definition's client_context_bindings.
  3. The substrate's WP_Agent_Tool_Parameters::buildParameters() (vendor/wordpress/agents-api/src/Tools/class-wp-agent-tool-parameters.php) also only maps context → parameters through client_context_bindings. There is no ambient/key-name matching anywhere in the path. If a tool def does not declare the binding, job_id is never injected.
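The three steps above amount to a whitelist copy: a context value reaches the tool parameters only when its key is named in client_context_bindings. A minimal sketch of that rule follows. The function name apply_declared_bindings and the sample job_id value are hypothetical, and this is not the actual ToolExecutor or agents-api code. It also assumes a bound context value overwrites any same-named parameter, which the source does not confirm.

```php
<?php
/**
 * Hypothetical model of bindings-only injection. It illustrates the
 * contract described above; it is not applyDeclaredContextBindings().
 */
function apply_declared_bindings( array $parameters, array $context, array $tool_def ): array {
    $bindings = $tool_def['client_context_bindings'] ?? array();

    foreach ( $bindings as $key ) {
        // Only declared keys are copied. Undeclared context stays invisible
        // to the handler, even if a key with the right name exists.
        // Assumption: the context value wins over a same-named parameter.
        if ( array_key_exists( $key, $context ) ) {
            $parameters[ $key ] = $context[ $key ];
        }
    }

    return $parameters;
}

$context = array( 'job_id' => 123 ); // Sample value for illustration.

$with_binding    = apply_declared_bindings(
    array(),
    $context,
    array( 'client_context_bindings' => array( 'job_id' ) )
);
$without_binding = apply_declared_bindings( array(), $context, array() );

var_dump( isset( $with_binding['job_id'] ) );    // bool(true)
var_dump( isset( $without_binding['job_id'] ) ); // bool(false)
```

The second call models the upsert_event case: the context carries job_id, but with no declared binding the handler never sees it.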

Where it breaks:

The data-machine-events upsert_event tool definition omits client_context_bindings entirely:

data-machine-events/inc/Steps/Upsert/Events/EventUpsertFilters.php:107-114 (getDynamicEventTool()):

return array(
    'class'          => EventUpsert::class,
    'method'         => 'handle_tool_call',
    'handler'        => 'upsert_event',
    'description'    => 'Create or update WordPress event post...',
    'parameters'     => $parameters,
    'handler_config' => $ue_config,
    // <-- NO 'client_context_bindings' => array( 'job_id' )
);

EventUpsert extends UpsertHandler (data-machine-events/inc/Steps/Upsert/Events/EventUpsert.php:39), so it inherits the strict job_id guard at UpsertHandler.php:71-74. With no binding, $parameters['job_id'] is always empty → every call returns the error and the event is never created/updated.

For contrast, the data-machine-core WordPress upsert handler does declare it:

inc/Core/Steps/Upsert/Handlers/WordPress/WordPress.php:41:

'client_context_bindings' => array( 'job_id' ),
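Applied to the events tool definition, the same key might look like the sketch below. This is an illustration of the shape only, not a reviewed patch: $parameters and $ue_config are empty stand-ins so the fragment runs on its own, the class is given as a string rather than EventUpsert::class for the same reason, and the truncated description is kept exactly as quoted above.

```php
<?php
// Stand-ins so the fragment is self-contained. In the plugin these values
// are built inside getDynamicEventTool().
$parameters = array();
$ue_config  = array();

$tool_def = array(
    'class'                   => 'DataMachineEvents\\Steps\\Upsert\\Events\\EventUpsert',
    'method'                  => 'handle_tool_call',
    'handler'                 => 'upsert_event',
    'description'             => 'Create or update WordPress event post...',
    'parameters'              => $parameters,
    'handler_config'          => $ue_config,
    // Declares that the run's job_id should be injected into $parameters.
    'client_context_bindings' => array( 'job_id' ),
);
```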

Verdict on the deploy: NOT introduced by v0.140.0

The initial hypothesis was that the 2026-06-06 ~17:07 UTC v0.139.18 → v0.140.0 deploy introduced or exposed this. The data contradicts that: the storm began 2026-06-04 16:56:27 UTC, ~2 days before the v0.140.0 deploy.

The actual offending commit is:

  • eeb4affa "fix: require explicit tool context bindings" (2026-06-02 22:13 EDT) — first shipped in tag v0.139.6.

That commit added 'client_context_bindings' => array( 'job_id' ) to every data-machine-core handler (Fetch reject/defer disposition tools, Email publish, WordPress publish, WordPress upsert) as part of the substrate migration that made bindings the only injection mechanism. The satellite plugins (data-machine-events, etc.) were not part of that commit and were never updated.

The onset timestamp (2026-06-04 16:56) lines up with the production deploy that carried eeb4affa to events.extrachill.com — tags v0.139.18 (2026-06-04 15:26 UTC) / v0.139.19 (2026-06-04 16:12 UTC). eeb4affa is an ancestor of v0.139.18, v0.139.19, and v0.140.0. Note: de057266 "Fix declared tool context bindings (#2543)" (June 5, in v0.140.0) hardens the re-application of bindings inside ToolExecutor but is not the cause — the storm predates it by ~19 hours.

Offending pipeline / agent / handler

  • Site: events.extrachill.com — blog 7 (c8c_7_* tables).
  • Agent: agent_id = 6 (events-bot, owner chubes).
  • Handler subclass: DataMachineEvents\Steps\Upsert\Events\EventUpsert (tool upsert_event), deployed data-machine-events v0.40.7.
  • The events-bot city/venue pipelines all terminate in the upsert_event upsert step, so every pipeline run that reaches the upsert tool call fails.

Blast radius

  • events.extrachill.com only for the active storm — the main site (blog 1) has zero of these errors, because only data-machine-events ships an upsert handler tool that lacks the binding.

  • Jobs impact on blog 7: datamachine jobs summary reports 743 stuck-processing and 10,450 failed jobs (large failed buckets per pipeline). The upsert failures are a primary contributor to the failed/stuck event-creation jobs.

  • Wider latent exposure (same class of bug, not all firing yet): a platform-wide grep shows handler/tool definitions across the satellite plugins that call into handle_tool_call / getEngineData($job_id) but do not declare client_context_bindings:

    • data-machine-events: EventUpsertFilters.php (the live one) + numerous inc/Api/Chat/Tools/* tools.
    • data-machine-socials: most inc/Handlers/*/*.php publish handlers (Facebook, Twitter, Instagram, Bluesky, Threads, LinkedIn, Pinterest) + inc/Chat/Tools/*.
    • data-machine-business: GoogleAnalytics, BingWebmaster, GoogleSearchConsole, GoogleSearch, PageSpeedTool, AmazonAffiliateLink.

    Not all of these need job_id (many chat tools don't), but any that call getEngineData() / depend on the run's job_id are silently exposed to the same failure mode the moment they're used in a pipeline. This should be audited as part of the fix.
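One way to run that audit over already-collected tool definitions, rather than by grep, is sketched below. The helper name tools_missing_binding and the sample tool names are hypothetical. It can only report which definitions omit the binding; whether a given tool actually needs job_id still has to be decided by reading the handler.

```php
<?php
/**
 * Hypothetical audit helper: returns the names of tool definitions that do
 * not list $key in client_context_bindings.
 */
function tools_missing_binding( array $tool_defs, string $key = 'job_id' ): array {
    $missing = array();

    foreach ( $tool_defs as $name => $def ) {
        $bindings = $def['client_context_bindings'] ?? array();
        if ( ! in_array( $key, $bindings, true ) ) {
            $missing[] = $name;
        }
    }

    return $missing;
}

// Illustrative input only; real definitions would come from the registry.
$tool_defs = array(
    'wordpress_upsert' => array( 'client_context_bindings' => array( 'job_id' ) ),
    'upsert_event'     => array( 'handler' => 'upsert_event' ),
);

print_r( tools_missing_binding( $tool_defs ) );
```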

Recommended fix direction (do NOT implement here — separate PR)

Two layers:

  1. Immediate (unblocks events): add 'client_context_bindings' => array( 'job_id' ) to the upsert_event tool definition in data-machine-events/inc/Steps/Upsert/Events/EventUpsertFilters.php::getDynamicEventTool(). (This is a data-machine-events change — track/cross-link there.)
  2. Systemic (data-machine): the contract migration silently broke any external handler that relied on the old implicit job_id injection. data-machine should either (a) make the base UpsertHandler / PublishHandler / FetchHandler registration helpers auto-attach the job_id binding for handler-class tools that go through handle_tool_call, or (b) emit a loud one-time registration-time warning when a handler tool def that resolves to a *Handler subclass omits client_context_bindings, so future satellite handlers can't regress into a silent storm. Decide whether the binding should be a registration-helper default rather than a per-tool-def opt-in.
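Options (a) and (b) in the systemic layer could share one registration-time helper, sketched below under stated assumptions. ensure_job_id_binding is a hypothetical name, not an existing data-machine API, and the sketch skips the check that the tool resolves to a *Handler subclass. It also warns on every call rather than once, so a real version would need de-duplication.

```php
<?php
/**
 * Hypothetical registration helper.
 * Option (a): $auto_bind = true appends the job_id binding when missing.
 * Option (b): $auto_bind = false leaves the definition alone and warns.
 */
function ensure_job_id_binding( array $tool_def, bool $auto_bind = true ): array {
    $bindings = $tool_def['client_context_bindings'] ?? array();

    if ( in_array( 'job_id', $bindings, true ) ) {
        return $tool_def; // Already declared; nothing to do.
    }

    if ( ! $auto_bind ) {
        $handler = $tool_def['handler'] ?? 'unknown';
        trigger_error(
            "Tool '{$handler}' does not declare a job_id context binding.",
            E_USER_WARNING
        );
        return $tool_def;
    }

    $bindings[]                          = 'job_id';
    $tool_def['client_context_bindings'] = $bindings;

    return $tool_def;
}

$fixed = ensure_job_id_binding( array( 'handler' => 'upsert_event' ) );
var_dump( $fixed['client_context_bindings'] === array( 'job_id' ) ); // bool(true)
```

Auto-binding makes the default safe for satellite handlers, while the warning path keeps the per-tool opt-in explicit; which of the two fits is the open design question stated above.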

Acceptance criteria

  • upsert_event (and any other handler-class pipeline tool that needs the run's job_id) receives a non-zero job_id in $parameters at handle_tool_call() time on events.extrachill.com.
  • The Upsert Handler Error: job_id parameter is required for update operations log entries stop appearing on blog 7.
  • events-bot pipelines that reach the upsert step successfully create/update events again (verify with a real flow run + a drop in the failed/stuck job counts).
  • A guard or default exists so a satellite handler that omits the binding fails loudly at registration (or auto-binds), instead of silently rejecting every tool call at runtime.
  • The latent-exposure list above is triaged: every satellite handler tool that depends on job_id either declares the binding or is confirmed not to need it.

Evidence references

  • Error guard: inc/Core/Steps/Upsert/Handlers/UpsertHandler.php:71-74
  • Core binding (present): inc/Core/Steps/Upsert/Handlers/WordPress/WordPress.php:41
  • Events tool def (missing binding): data-machine-events/inc/Steps/Upsert/Events/EventUpsertFilters.php:107-114
  • Binding application: inc/Engine/AI/Tools/ToolExecutor.php:204-231
  • Loop context mirroring: inc/Engine/AI/conversation-loop.php (datamachine_payload_with_client_context_bindings)
  • Substrate (bindings-only, no ambient match): vendor/wordpress/agents-api/src/Tools/class-wp-agent-tool-parameters.php::buildParameters()
  • Root-cause commit: eeb4affa "fix: require explicit tool context bindings" (first in tag v0.139.6)

Metadata

Labels: architecture, bug
