Skip to content

feat(sdk/go): add observability package (OpenTelemetry logging)#1740

Open
MarcusElwin wants to merge 3 commits into
iii-hq:mainfrom
MarcusElwin:feat/go-sdk-observability
Open

feat(sdk/go): add observability package (OpenTelemetry logging)#1740
MarcusElwin wants to merge 3 commits into
iii-hq:mainfrom
MarcusElwin:feat/go-sdk-observability

Conversation

@MarcusElwin

@MarcusElwin MarcusElwin commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Refs #1719
Follow-up to #1739 (the core Go SDK).

What

Adds sdk/packages/go/observability — the Go counterpart of the Rust iii-observability crate and the Node @iii-dev/observability package, completing the per-language set (iii, iii-example, observability).

  • observability/ — a separate module providing opt-in OpenTelemetry logging:

    • Logger — structured Info/Warn/Error/Debug emitting OTel LogRecords correlated with the active span, with a slog fallback when OTel isn't initialized. Go numerics and nested maps/slices are preserved as structured OTLP attributes, not stringified.
    • Init / OtelConfig — wires a logger provider with a stdout (default) or OTLP/HTTP exporter; returns a shutdown func.
  • iii-example/ — an observability.go demo plus a docker-compose.yaml + otel-collector-config.yaml stack (OpenTelemetry Collector + OpenObserve UI), mirroring the Node/Python example stacks. Ships logs over OTLP to the collector, or prints to stdout when no endpoint is set.

  • I license my contributions to this repository under Apache 2.0, and I have all necessary rights over the code I am contributing.

Why

The core SDK (#1739) carries W3C trace context on the wire and ships a no-op OTel seam; this is the real OpenTelemetry wiring, which was scoped as a follow-up in #1719. It brings Go to parity with the other SDKs, which all ship a dedicated observability package and demonstrate it in their examples.

Notes

  • Verified end-to-end: Logger checked via the stdout exporter and against a live OpenTelemetry Collector over OTLP/HTTP (correct severities, structured attributes, numeric types). Two bugs found during testing — native-int stringification and a wrong OTLP endpoint path — are fixed with regression tests. 86.7% coverage.
  • go 1.25: the observability module needs 1.25 (OTLP exporter's dependency graph), higher than the SDK's 1.23. iii-example is bumped to match since it now imports both modules.
  • Scope: v1 wires logs only (the Logger and its OTLP/stdout export). Trace/metric exporters, span operations, baggage propagation, and HTTP instrumentation — present in the Rust/Node packages — are a noted next layer.
  • Publishing: a third Go module (…/sdk/packages/go/observability); like the SDK it publishes via a subdir-scoped tag (sdk/packages/go/observability/vX.Y.Z). The _go.yml workflow from feat(sdk/go): add official Go SDK for the iii worker protocol #1739 can be reused with a second sdk-go-obs job or extended — happy to wire whichever you prefer.

Summary by CodeRabbit

  • New Features

    • Added comprehensive Go SDK examples covering HTTP endpoints, scheduled cron jobs, custom trigger types, streaming channels, and structured logging/observability.
    • Included a standalone observability module with OpenTelemetry-based logging support and initialization utilities.
    • Provided a Docker Compose telemetry stack to run OpenObserve + OpenTelemetry Collector locally.
  • Documentation

    • Added detailed READMEs and run instructions for the Go examples and the observability module.
  • Refactor

    • Example app reorganized to use per-feature setup for clearer demos.

@vercel

vercel Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

@MarcusElwin is attempting to deploy a commit to the motia Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a standalone Go observability package (OTel logging, exporter selection, JSON→OTel value conversion, Init/shutdown) and refactors/expands the iii-example module with per-feature demos (HTTP, cron, custom trigger type, logging, channels) plus Docker Compose and collector config for local telemetry.

Changes

Go Observability Module and iii-example Enhancements

Layer / File(s) Summary
Observability module go.mod and README
sdk/packages/go/observability/go.mod, sdk/packages/go/observability/README.md
Adds the observability module declaration, Go 1.25.0 toolchain, OpenTelemetry dependency pins, and README documenting usage and Init/shutdown.
JSON→OTel value conversion
sdk/packages/go/observability/anyvalue.go
Implements jsonToLogValue and helpers to convert nested arrays/objects and numeric edge cases into OTel log.Value.
Logger implementation and API
sdk/packages/go/observability/logger.go
Adds Logger, NewLogger, severity methods, OTel record emission with trace-context correlation, and slog fallback.
OTel init and exporter wiring
sdk/packages/go/observability/telemetry.go
Defines ExporterKind/OtelConfig, Init/shutdown lifecycle, exporter selection (stdout vs OTLP), and OTLP endpoint parsing.
Observability tests
sdk/packages/go/observability/logger_test.go
Adds comprehensive tests for jsonToLogValue, numeric edge cases, logger fallback, Init/shutdown behavior, and otlpOptions parsing.
iii-example go.mod and docs
sdk/packages/go/iii-example/go.mod, sdk/packages/go/iii-example/README.md
Updates iii-example module to Go 1.25.0, adds observability replace, and documents feature-to-file mapping and telemetry run instructions.
Main wiring and observability demo
sdk/packages/go/iii-example/main.go, sdk/packages/go/iii-example/observability.go
Refactors main to call setupHTTP, setupCron, setupTriggerType, setupLogger; initializes demo observability and defers shutdown; triggers demos and serves until interrupted.
HTTP and cron trigger examples
sdk/packages/go/iii-example/http.go, sdk/packages/go/iii-example/cron.go
Adds hello::greet HTTP POST /greet example and example::scheduled_cleanup cron example with 0 * * * * * schedule.
Custom interval trigger type example
sdk/packages/go/iii-example/triggertype.go
Implements an "interval" trigger type with per-instance tickers, cancellable contexts, and periodic client.Trigger payloads.
Logging and channels examples
sdk/packages/go/iii-example/logger.go, sdk/packages/go/iii-example/channels.go
Adds example::logger_demo for fire-and-forget structured logging and demoChannel streaming demo (binary write, text message, reader ReadAll, OnMessage handler).
Docker Compose and collector config
sdk/packages/go/iii-example/docker-compose.yaml, sdk/packages/go/iii-example/otel-collector-config.yaml
Provides Docker Compose for OpenObserve and an OTEL collector config that receives OTLP, batches, and exports to OpenObserve plus console debug, with health checks and network bridging.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • iii-hq/iii#1232: Related logging/OTel attribute handling changes that align with structured log.data and nested OTLP attribute improvements.

Suggested reviewers

  • sergiofilhowz

Poem

🐰 A tiny rabbit logs with glee,
Sending traces out to see,
Cronticks, channels, HTTP cheer,
Docker stacks make telemetry near,
Hooray — observability is here!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding an OpenTelemetry logging observability package to the Go SDK, which is the primary purpose of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 84.21% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description follows the required template with all sections completed: What, Why, and Notes are all present and well-documented.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/release-iii.yml (1)

653-742: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include sdk-go in the completion gate.

notify-complete never depends on sdk-go, never captures its result, and never folds it into OVERALL. A broken Go module publish would still send a successful Slack summary.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-iii.yml around lines 653 - 742, The
notify-complete job currently omits the sdk-go job from its dependency list and
result aggregation; add needs: - sdk-go to notify-complete's needs, capture its
result in the env block as SDK_GO_RESULT="${{ needs.sdk-go.result }}", append a
line to build the human-readable status into LINES (e.g., LINES="$LINES$(icon
"$SDK_GO_RESULT") SDK Go\n"), and include "$SDK_GO_RESULT" in the for loop that
sets OVERALL so a failed sdk-go marks the overall status as failure; refer to
notify-complete, the env keys (SDK_GO_RESULT), LINES construction, and the for
loop that tests results.
🟡 Minor comments (9)
sdk/packages/go/iii/tests/errors_test.go-39-42 (1)

39-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the exact error code here.

This test claims the engine should surface handler failures as invocation_failed, but it currently passes for any non-empty code.

Suggested assertion
-	// The engine reports a handler failure as code "invocation_failed".
-	if ie.Code == "" {
-		t.Error("InvocationError.Code is empty")
-	}
+	if ie.Code != "invocation_failed" {
+		t.Errorf("code = %q, want invocation_failed", ie.Code)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/tests/errors_test.go` around lines 39 - 42, The test
currently only checks that ie.Code is non-empty; change it to assert the exact
expected error code by comparing ie.Code (InvocationError.Code) to
"invocation_failed" so the test fails if the engine surfaces any other code;
update the assertion in the errors_test.go test that uses the ie variable to
require ie.Code == "invocation_failed" (e.g., replace the t.Error check with a
t.Fatalf/t.Errorf that compares ie.Code to the expected string).
sdk/packages/go/iii/tests/errors_test.go-67-77 (1)

67-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify the unknown-function contract, not just “some error”.

This still passes on an unrelated timeout or transport failure. Assert the returned error type and stable engine code so the integration test actually protects the public error surface.

Suggested assertion shape
 	if err == nil {
 		t.Fatal("expected an error invoking an unregistered function")
 	}
+	var ie *iii.InvocationError
+	if !errors.As(err, &ie) {
+		t.Fatalf("error is not *InvocationError: %v", err)
+	}
+	if ie.Code == "" {
+		t.Fatal("InvocationError.Code is empty")
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/tests/errors_test.go` around lines 67 - 77, The test
TestInvokeUnknownFunctionErrors currently only checks for a non-nil error;
update it to assert the error is the expected engine "unknown function" error by
using errors.As (or a type assertion) to convert err into the SDK's engine error
type (e.g., iii.EngineError or iii.Error) and then assert the engine-level
code/constant (e.g., EngineCode or Code equals the unknown-function constant
like iii.EngineErrorCode_UNKNOWN_FUNCTION or the string "UNKNOWN_FUNCTION"); if
the assertion fails, call t.Fatalf with a message showing the actual type/code
so unrelated timeouts/transports no longer pass this test.
sdk/packages/go/iii/protocol.go-256-285 (1)

256-285: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix MarshalMessage pointer/value contract mismatch in iii protocol
sdk/packages/go/iii/protocol.go’s MarshalMessage comment says to pass message structs “by pointer or value”, but the implementation type-switch only matches *RegisterTriggerTypeMessage, *RegisterTriggerMessage, etc. (and *PingMessage/*PongMessage), so passing a value (e.g. PingMessage{}) will hit the “cannot marshal unknown message type” default. Either update the comment to pointer-only or add non-pointer cases to the switch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/protocol.go` around lines 256 - 285, MarshalMessage's
comment promises pointer-or-value but the type switch only matches pointer types
so value arguments (e.g. PingMessage{}) fall through; fix by either updating the
comment to require pointers only or (preferred) extend the switch in
MarshalMessage to include the non-pointer variants (e.g.
RegisterTriggerTypeMessage, RegisterTriggerMessage,
TriggerRegistrationResultMessage, UnregisterTriggerMessage,
RegisterFunctionMessage, UnregisterFunctionMessage, InvokeFunctionMessage,
InvocationResultMessage, RegisterServiceMessage, WorkerRegisteredMessage,
PingMessage, PongMessage) and route them to the same marshalEnvelope calls as
their pointer counterparts so both value and pointer inputs are handled
consistently.
sdk/packages/go/observability/README.md-42-55 (1)

42-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the OTLP example a standalone Init flow.

This snippet reads like a second Init call after the stdout example, and the OTLP branch drops both err and shutdown. Split these into separate examples, or show the OTLP branch with the same error/shutdown handling so readers do not copy a leaking initialization pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/observability/README.md` around lines 42 - 55, The OTLP
example currently appears as a second Init call and omits error handling and
shutdown; make it a standalone Init flow by showing observability.Init called
with observability.OtelConfig (using ServiceName, Exporter set to
observability.ExporterOTLP, and OTLPEndpoint) and include the same error capture
and deferred shutdown pattern as the stdout example (capture err from
observability.Init and defer shutdown(ctx)) so readers don't copy a leaking
initialization pattern; reference observability.Init, observability.OtelConfig,
observability.ExporterOTLP, err, and shutdown in the example.
sdk/packages/go/iii/README.md-178-183 (1)

178-183: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify that OTel logging already exists.

This section reads as if all OpenTelemetry integration is still future work, but this PR adds sdk/packages/go/observability for logging today. Narrow the follow-up wording to spans/invocation tracing and point readers at the logging module so they do not miss the supported path.

As per coding guidelines, "Check for related inconsistencies with the docs/".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/README.md` around lines 178 - 183, Update the
Observability README paragraph to clarify that OpenTelemetry-based logging is
already implemented in the Go SDK by referencing the new package
(sdk/packages/go/observability) and that the planned follow-up is specifically
adding real spans/invocation tracing (not basic logging); adjust the follow-up
sentence to point readers to the observability logging module and narrow wording
to "spans/invocation tracing" and check for related inconsistencies in docs/ to
keep documentation consistent with the added observability package.
sdk/packages/go/iii-example/logger.go-11-16 (1)

11-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the comment; the follow-up already landed.

This still says a dedicated observability package is "a planned follow-up", but this PR adds sdk/packages/go/observability and observability.go already demonstrates it. Leaving the comment as-is sends readers to stale guidance.

As per coding guidelines, "Check for related inconsistencies with the docs/".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii-example/logger.go` around lines 11 - 16, The file comment
above setupLogger is stale—remove the "planned follow-up" wording and update it
to reflect that a dedicated observability package now exists; mention the new
sdk/packages/go/observability package and observability.go and state that the Go
SDK v1 also provides a real observability implementation there instead of
deferring future work. Keep the rest of the context about mirroring
logger_example.rs and engine log usage but replace the future-tense note with a
brief pointer to observability.go as the implemented follow-up.
sdk/packages/go/iii-example/README.md-19-26 (1)

19-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the Go 1.25 requirement next to the run steps.

This module’s go.mod is pinned to go 1.25.0, so go run . will fail for readers following this README on Go 1.23/1.24. Please call out the higher toolchain requirement near ## Run, especially since the base iii README advertises Go 1.23+ for the core SDK.

Based on learnings, sdk/packages/go/observability/README.md states that the observability module "requires Go 1.25+".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii-example/README.md` around lines 19 - 26, Add a short note
next to the "## Run" section stating that this example requires Go 1.25 (go.mod
is pinned to go 1.25.0) so that running "go run ." on older toolchains will
fail; update README.md (near the "## Run" header and the "go run ." instruction)
to explicitly call out "Requires Go 1.25+" and optionally link or mention go.mod
for verification.
sdk/packages/go/iii-example/channels.go-33-35 (1)

33-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Return after a writer-close failure.

ReadAll depends on the writer sending EOF. If ch.Writer.Close() fails and we keep going, this example can sit until the 10s context timeout and report a read failure instead of the real cause.

Suggested fix
 	if err := ch.Writer.Close(); err != nil {
 		log.Printf("channel writer close: %v", err)
+		return
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii-example/channels.go` around lines 33 - 35, The
ch.Writer.Close() error is currently logged but ignored, causing ReadAll to hang
until the context times out; update the close handling in the function (where
ch.Writer.Close() is called) to return the error immediately after logging so
callers (and ReadAll) see the real failure instead of waiting for the deadline —
locate the close call in channels.go (the block that calls ch.Writer.Close())
and change control flow to log and then return the error (propagate it up from
this function).
sdk/packages/go/iii-example/cron.go-16-24 (1)

16-24: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t swallow the cron payload decode error.

If the event shape drifts or the payload is malformed, this handler logs empty fields and still returns {"cleaned": true}. Returning the unmarshal error keeps the example honest about the trigger contract.

Suggested fix
 	if err := client.RegisterFunction("example::scheduled_cleanup", func(ctx context.Context, data json.RawMessage) (any, error) {
 		// The engine delivers a cron event payload; we just acknowledge the run.
 		var ev struct {
 			Trigger string `json:"trigger"`
 			JobID   string `json:"job_id"`
 		}
-		_ = json.Unmarshal(data, &ev)
+		if err := json.Unmarshal(data, &ev); err != nil {
+			return nil, err
+		}
 		log.Printf("cron fired: trigger=%s job_id=%s", ev.Trigger, ev.JobID)
 		return map[string]any{"cleaned": true, "job_id": ev.JobID}, nil
 	}); err != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii-example/cron.go` around lines 16 - 24, The handler
registered with client.RegisterFunction for "example::scheduled_cleanup"
currently ignores json.Unmarshal errors into ev and always returns {"cleaned":
true}; change it to check the error from json.Unmarshal(data, &ev) and if
non-nil return nil and that error (or wrap it with context mentioning
"scheduled_cleanup payload unmarshal") so malformed cron payloads propagate
instead of being swallowed; update the anonymous function passed to
RegisterFunction to perform the error check before logging and returning the
success map.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_go.yml:
- Line 61: Replace floating version tags with immutable commit SHAs for the
GitHub Actions used: update the three "uses:" entries
slackapi/slack-github-action@v2.0.0, actions/checkout@v4, and
actions/setup-go@v5 to the corresponding full commit SHAs (e.g.,
slackapi/slack-github-action@<full-sha>, actions/checkout@<full-sha>,
actions/setup-go@<full-sha>); fetch the latest intended release commit SHA from
each action's GitHub repository and substitute the tag with that SHA in the
workflow so the runs reference an immutable commit.
- Around line 101-106: The step "Verify the proxy resolves the published
version" currently inlines inputs.module_path and inputs.version into the run:
shell, which is unsafe; instead add two env entries (e.g. MODULE_PATH and
MODULE_VERSION) set from ${{ inputs.module_path }} and ${{ inputs.version }},
then change the run command in that step to use the quoted env expansions (for
example: go list -m "$MODULE_PATH@v$MODULE_VERSION"); keep existing GOPROXY and
GOFLAGS env entries and ensure the run line quotes the variables to prevent
shell injection.
- Around line 74-77: Replace the hardcoded Go toolchain in the reusable publish
workflow by switching the actions/setup-go step to use go-version-file: ${{
inputs.package_path }}/go.mod (instead of go-version: '1.23') so each module
honors its declared toolchain; for the go list command, stop interpolating
inputs directly into the run line and pass ${{ inputs.module_path }} and ${{
inputs.version }} via environment variables (or safely quote/construct the
argument) to avoid shell injection; and pin third‑party actions currently using
floating tags (actions/setup-go@v5 and actions/checkout@v4) to specific commit
SHAs for supply‑chain hardening while keeping the already-pinned slack action as
is.

In @.github/workflows/ci.yml:
- Around line 748-789: The sdk-go-ci job currently only targets
sdk/packages/go/iii; update the job to also run formatting, vet, and tests for
sdk/packages/go/observability and sdk/packages/go/iii-example and ensure each
module uses the Go toolchain declared in its go.mod (e.g., use Go 1.25 for
observability and iii-example) by invoking actions/setup-go with the correct
go-version per module (or add per-module subjobs/matrix entries), run gofmt -l,
go vet ./..., and go test (with -race where appropriate) in each module’s
working-directory, and update cache-dependency-path entries to point at each
module’s go.sum so caching remains correct.
- Line 753: The workflow uses floating action tags (e.g., actions/checkout@v4,
actions/setup-go@v5, actions/download-artifact@v4 and others like
actions/setup-node@v4, actions/setup-python@v5, pnpm/action-setup@v4,
dtolnay/rust-toolchain@stable, Swatinem/rust-cache@v2, dorny/paths-filter@v3)
which can be retagged upstream; update each `uses:` entry to the corresponding
immutable commit SHA form (owner/repo@<full-commit-sha>) by replacing the tag
with the pinned SHA for that action repository so CI behavior is stable and
repeatable.

In @.github/workflows/release-iii.yml:
- Around line 490-508: Add a parallel release job for the new Go module by
duplicating the existing sdk-go job (which uses ./.github/workflows/_go.yml) and
updating the inputs to point to the observability module: set package_path to
sdk/packages/go/observability, module_path to
github.com/iii-hq/iii/sdk/packages/go/observability, version to ${{
needs.setup.outputs.version }}, dry_run and slack_thread_ts to the same
expressions as sdk-go, and adjust slack_label (e.g., "SDK Go (observability)");
ensure the job has the same needs and conditional (if: ${{ !failure() &&
!cancelled() }}) and forwards the same secrets (SLACK_BOT_TOKEN,
SLACK_CHANNEL_ID).

In `@sdk/packages/go/iii/channels.go`:
- Around line 102-103: OpenWriter (and the analogous OpenReader at lines
~188-189) currently override ref.Direction with a hardcoded
ChannelWrite/ChannelRead; change them to use the ref.Direction when calling
channelURL and validate the ref.Direction matches the constructor being called
(in OpenWriter require ref.Direction == ChannelWrite; in OpenReader require
ref.Direction == ChannelRead). If the direction does not match, return nil (or
an error/handle per existing constructor pattern) instead of building a URL with
the wrong direction; update the channelURL invocation in both
ChannelWriter/ChannelReader constructors to pass ref.Direction and add the
simple guard checks before constructing the ChannelWriter/ChannelReader.

In `@sdk/packages/go/iii/client.go`:
- Around line 607-615: The issue is that enqueueOutboundDirect currently sends
connection-specific frames into the shared c.outbound channel which can be
written by a subsequent connection; fix by routing direct replies to the current
writer instead of the shared queue: add a per-connection outbound channel or
attach a connection identifier to frames and have writeLoop drop frames whose
connID doesn't match the active connection. Concretely, modify
enqueueOutboundDirect to either push into a connection-scoped channel created in
the writer loop (e.g., connOutbound) or include a connID when enqueuing and
update writeLoop to check that connID matches the active connection before
writing (references: enqueueOutboundDirect, c.outbound, writeLoop, c.shutdown).
- Around line 585-595: The dispatch switch never routes trigger teardown
messages, so add a branch for the unregister message (e.g., case
MsgUnregisterTrigger:) that offloads to a goroutine and calls the appropriate
teardown handler (e.g., c.handleUnregisterTrigger(ctx, dec.UnregisterTrigger));
if handleUnregisterTrigger doesn’t exist, implement it to invoke the
TriggerHandler.UnregisterTrigger hook and clean up per-instance background work,
mirroring how MsgRegisterTrigger is handled (use the same ctx/dec pattern and
ensure any identifiers in DecodedMessage are used to find the right trigger
instance).
- Around line 333-347: Connect currently launches c.supervise() unbounded;
change it so the supervisor is tied to that Connect call's lifecycle: ensure
only a single supervisor goroutine runs per active Connect attempt (use a
sync.Once or a per-client supervisorStarted flag) and pass a derived context
(e.g., ctx or a superviseCtx with cancel) into supervise so it returns when the
Connect context is done; update Client.Connect to start supervise only if not
already running, call supervise(ctx) (or superviseWithCtx) instead of go
c.supervise(), and ensure supervise checks ctx.Done() as well as c.shutdown and
signals c.connected only while its ctx is active to avoid orphaned reconnect
loops and shared outbound queues across retries.

In `@sdk/packages/go/iii/tests/api_triggers_test.go`:
- Around line 40-47: The route path and trigger name are hardcoded causing
collisions; make them unique by deriving both from the test name and a
random/unique suffix (e.g., use t.Name() + a unique token) before calling
c.RegisterTrigger: build a uniquePath variable for the api_path JSON (replacing
path) and a uniqueTriggerName for the first argument to c.RegisterTrigger
(replacing "test-go-http"); ensure the JSON payload uses the uniquePath and pass
uniqueTriggerName into c.RegisterTrigger so parallel runs do not conflict.

In `@sdk/packages/go/iii/tests/worker_metadata_test.go`:
- Around line 56-73: The current search picks the first connected Go worker
which can be the wrong process; instead locate the worker entry by the specific
worker ID created in this test. When you create/open the test worker (capture
its unique id in a variable like createdWorkerID or worker.ID), then iterate
out.Workers and select the worker whose ID matches that captured id (e.g.,
compare worker.ID to createdWorkerID) and only then perform the OS/Version
assertions on that workerInfo (instead of using the existing runtime+status
filter/ours variable).

In `@sdk/packages/go/observability/anyvalue.go`:
- Around line 42-58: The jsonToLogValue conversion narrows uint* and float64 to
int64 without bounds checks causing truncation; update the uint cases (uint,
uint8, uint16, uint32, uint64) to check against math.MaxInt64 and only call
log.Int64Value(int64(t)) when t <= math.MaxInt64, otherwise return
log.Float64Value(float64(t)) to mirror the Rust as_i64-first contract; likewise
in the float64 branch first verify t is within [math.MinInt64, math.MaxInt64]
and represents a whole integer before computing int64(t) and calling
log.Int64Value, otherwise return log.Float64Value(t).

---

Outside diff comments:
In @.github/workflows/release-iii.yml:
- Around line 653-742: The notify-complete job currently omits the sdk-go job
from its dependency list and result aggregation; add needs: - sdk-go to
notify-complete's needs, capture its result in the env block as
SDK_GO_RESULT="${{ needs.sdk-go.result }}", append a line to build the
human-readable status into LINES (e.g., LINES="$LINES$(icon "$SDK_GO_RESULT")
SDK Go\n"), and include "$SDK_GO_RESULT" in the for loop that sets OVERALL so a
failed sdk-go marks the overall status as failure; refer to notify-complete, the
env keys (SDK_GO_RESULT), LINES construction, and the for loop that tests
results.

---

Minor comments:
In `@sdk/packages/go/iii-example/channels.go`:
- Around line 33-35: The ch.Writer.Close() error is currently logged but
ignored, causing ReadAll to hang until the context times out; update the close
handling in the function (where ch.Writer.Close() is called) to return the error
immediately after logging so callers (and ReadAll) see the real failure instead
of waiting for the deadline — locate the close call in channels.go (the block
that calls ch.Writer.Close()) and change control flow to log and then return the
error (propagate it up from this function).

In `@sdk/packages/go/iii-example/cron.go`:
- Around line 16-24: The handler registered with client.RegisterFunction for
"example::scheduled_cleanup" currently ignores json.Unmarshal errors into ev and
always returns {"cleaned": true}; change it to check the error from
json.Unmarshal(data, &ev) and if non-nil return nil and that error (or wrap it
with context mentioning "scheduled_cleanup payload unmarshal") so malformed cron
payloads propagate instead of being swallowed; update the anonymous function
passed to RegisterFunction to perform the error check before logging and
returning the success map.

In `@sdk/packages/go/iii-example/logger.go`:
- Around line 11-16: The file comment above setupLogger is stale—remove the
"planned follow-up" wording and update it to reflect that a dedicated
observability package now exists; mention the new sdk/packages/go/observability
package and observability.go and state that the Go SDK v1 also provides a real
observability implementation there instead of deferring future work. Keep the
rest of the context about mirroring logger_example.rs and engine log usage but
replace the future-tense note with a brief pointer to observability.go as the
implemented follow-up.

In `@sdk/packages/go/iii-example/README.md`:
- Around line 19-26: Add a short note next to the "## Run" section stating that
this example requires Go 1.25 (go.mod is pinned to go 1.25.0) so that running
"go run ." on older toolchains will fail; update README.md (near the "## Run"
header and the "go run ." instruction) to explicitly call out "Requires Go
1.25+" and optionally link or mention go.mod for verification.

In `@sdk/packages/go/iii/protocol.go`:
- Around line 256-285: MarshalMessage's comment promises pointer-or-value but
the type switch only matches pointer types so value arguments (e.g.
PingMessage{}) fall through; fix by either updating the comment to require
pointers only or (preferred) extend the switch in MarshalMessage to include the
non-pointer variants (e.g. RegisterTriggerTypeMessage, RegisterTriggerMessage,
TriggerRegistrationResultMessage, UnregisterTriggerMessage,
RegisterFunctionMessage, UnregisterFunctionMessage, InvokeFunctionMessage,
InvocationResultMessage, RegisterServiceMessage, WorkerRegisteredMessage,
PingMessage, PongMessage) and route them to the same marshalEnvelope calls as
their pointer counterparts so both value and pointer inputs are handled
consistently.

In `@sdk/packages/go/iii/README.md`:
- Around line 178-183: Update the Observability README paragraph to clarify that
OpenTelemetry-based logging is already implemented in the Go SDK by referencing
the new package (sdk/packages/go/observability) and that the planned follow-up
is specifically adding real spans/invocation tracing (not basic logging); adjust
the follow-up sentence to point readers to the observability logging module and
narrow wording to "spans/invocation tracing" and check for related
inconsistencies in docs/ to keep documentation consistent with the added
observability package.

In `@sdk/packages/go/iii/tests/errors_test.go`:
- Around line 39-42: The test currently only checks that ie.Code is non-empty;
change it to assert the exact expected error code by comparing ie.Code
(InvocationError.Code) to "invocation_failed" so the test fails if the engine
surfaces any other code; update the assertion in the errors_test.go test that
uses the ie variable to require ie.Code == "invocation_failed" (e.g., replace
the t.Error check with a t.Fatalf/t.Errorf that compares ie.Code to the expected
string).
- Around line 67-77: The test TestInvokeUnknownFunctionErrors currently only
checks for a non-nil error; update it to assert the error is the expected engine
"unknown function" error by using errors.As (or a type assertion) to convert err
into the SDK's engine error type (e.g., iii.EngineError or iii.Error) and then
assert the engine-level code/constant (e.g., EngineCode or Code equals the
unknown-function constant like iii.EngineErrorCode_UNKNOWN_FUNCTION or the
string "UNKNOWN_FUNCTION"); if the assertion fails, call t.Fatalf with a message
showing the actual type/code so unrelated timeouts/transports no longer pass
this test.

In `@sdk/packages/go/observability/README.md`:
- Around line 42-55: The OTLP example currently appears as a second Init call
and omits error handling and shutdown; make it a standalone Init flow by showing
observability.Init called with observability.OtelConfig (using ServiceName,
Exporter set to observability.ExporterOTLP, and OTLPEndpoint) and include the
same error capture and deferred shutdown pattern as the stdout example (capture
err from observability.Init and defer shutdown(ctx)) so readers don't copy a
leaking initialization pattern; reference observability.Init,
observability.OtelConfig, observability.ExporterOTLP, err, and shutdown in the
example.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f7856da7-b4a5-42bc-91a5-786e1261bd00

📥 Commits

Reviewing files that changed from the base of the PR and between 1a1bd95 and fe71477.

⛔ Files ignored due to path filters (3)
  • sdk/packages/go/iii-example/go.sum is excluded by !**/*.sum
  • sdk/packages/go/iii/go.sum is excluded by !**/*.sum
  • sdk/packages/go/observability/go.sum is excluded by !**/*.sum
📒 Files selected for processing (44)
  • .github/workflows/WORKFLOWS.md
  • .github/workflows/_go.yml
  • .github/workflows/ci.yml
  • .github/workflows/release-iii.yml
  • sdk/packages/go/iii-example/README.md
  • sdk/packages/go/iii-example/channels.go
  • sdk/packages/go/iii-example/cron.go
  • sdk/packages/go/iii-example/docker-compose.yaml
  • sdk/packages/go/iii-example/go.mod
  • sdk/packages/go/iii-example/http.go
  • sdk/packages/go/iii-example/logger.go
  • sdk/packages/go/iii-example/main.go
  • sdk/packages/go/iii-example/observability.go
  • sdk/packages/go/iii-example/otel-collector-config.yaml
  • sdk/packages/go/iii-example/triggertype.go
  • sdk/packages/go/iii/README.md
  • sdk/packages/go/iii/channels.go
  • sdk/packages/go/iii/channels_test.go
  • sdk/packages/go/iii/client.go
  • sdk/packages/go/iii/client_test.go
  • sdk/packages/go/iii/constants.go
  • sdk/packages/go/iii/errors.go
  • sdk/packages/go/iii/errors_test.go
  • sdk/packages/go/iii/go.mod
  • sdk/packages/go/iii/mockengine_test.go
  • sdk/packages/go/iii/otel.go
  • sdk/packages/go/iii/protocol.go
  • sdk/packages/go/iii/protocol_test.go
  • sdk/packages/go/iii/tests/api_triggers_test.go
  • sdk/packages/go/iii/tests/bridge_test.go
  • sdk/packages/go/iii/tests/common_test.go
  • sdk/packages/go/iii/tests/data_channels_test.go
  • sdk/packages/go/iii/tests/errors_test.go
  • sdk/packages/go/iii/tests/queue_test.go
  • sdk/packages/go/iii/tests/registration_dedup_test.go
  • sdk/packages/go/iii/tests/trigger_registration_error_test.go
  • sdk/packages/go/iii/tests/worker_metadata_test.go
  • sdk/packages/go/iii/triggers.go
  • sdk/packages/go/observability/README.md
  • sdk/packages/go/observability/anyvalue.go
  • sdk/packages/go/observability/go.mod
  • sdk/packages/go/observability/logger.go
  • sdk/packages/go/observability/logger_test.go
  • sdk/packages/go/observability/telemetry.go

Comment thread .github/workflows/_go.yml
if: inputs.slack_thread_ts != ''
id: slack
continue-on-error: true
uses: slackapi/slack-github-action@v2.0.0

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/_go.yml

Repository: iii-hq/iii

Length of output: 235


Pin third-party GitHub Actions to immutable SHAs.

In .github/workflows/_go.yml, these uses: entries use floating tags that can be retagged:

  • slackapi/slack-github-action@v2.0.0 (lines 61, 111)
  • actions/checkout@v4 (line 70)
  • actions/setup-go@v5 (line 74)

Pin each uses: to a full commit SHA (immutable) instead of version tags.

🧰 Tools
🪛 zizmor (1.25.2)

[error] 61-61: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/_go.yml at line 61, Replace floating version tags with
immutable commit SHAs for the GitHub Actions used: update the three "uses:"
entries slackapi/slack-github-action@v2.0.0, actions/checkout@v4, and
actions/setup-go@v5 to the corresponding full commit SHAs (e.g.,
slackapi/slack-github-action@<full-sha>, actions/checkout@<full-sha>,
actions/setup-go@<full-sha>); fetch the latest intended release commit SHA from
each action's GitHub repository and substitute the tag with that SHA in the
workflow so the runs reference an immutable commit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is already on main (merged in #1739) — it isn't part of #1740's diff (this PR only adds observability/ and the example). Flagging here is a re-scan artifact from the rebase. Out of scope for this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Resolve the review comment
gh api graphql -f query='
mutation {
  resolveReviewThread(input: {threadId: "PRRT_kwDONqv2_c6ArxUd"}) {
    thread {
      isResolved
    }
  }
}'

Length of output: 733


🏁 Script executed:

#!/bin/bash
gh api graphql -f query='
{
  repository(owner: "iii-hq", name: "iii") {
    pullRequest(number: 1740) {
      reviewThreads(first: 20) {
        nodes {
          id
          isResolved
          path
          comments(first: 1) {
            nodes {
              author { login }
              body
            }
          }
        }
      }
    }
  }
}'

Length of output: 38273


🏁 Script executed:

#!/bin/bash
gh api graphql -f query='
mutation {
  resolveReviewThread(input: {threadId: "PRRT_kwDONlFhxc6GkpF-"}) {
    thread {
      isResolved
    }
  }
}'

Length of output: 668



@MarcusElwin, understood — you're correct that _go.yml was introduced in #1739 and isn't part of this PR's diff. I'm unable to auto-resolve the comment due to a permissions limitation, but you can dismiss it manually.

The SHA-pinning concern still applies to that file as a general improvement. Would you like me to open a follow-up issue to track pinning those actions (slackapi/slack-github-action, actions/checkout, actions/setup-go) to immutable commit SHAs?


🧠 Learnings used
Learnt from: guibeira
Repo: iii-hq/iii PR: 1711
File: docs/0-13-0/understanding-iii/functions.mdx:30-30
Timestamp: 2026-05-29T18:27:48.078Z
Learning: In the iii-hq/iii repository, `docs/0-13-0/` (and any similar version-prefixed snapshot directories, e.g., `docs/0-11-0/`) are immutable archived snapshots of the documentation at the time of that release. Do not flag unresolved TODOs, incomplete content, or style issues in these archived snapshot directories during PR review — they intentionally preserve the state of the docs as they existed at that version and should not be updated to reflect later versions (e.g., 0.16).

Learnt from: guibeira
Repo: iii-hq/iii PR: 1601
File: .github/scripts/collect_engine_worker_interface.py:0-0
Timestamp: 2026-05-04T21:30:12.894Z
Learning: In iii-hq/iii, when collecting trigger types for worker publish payloads, the only observable cross-worker trigger-type leak (as of PR `#1601`) is `iii-observability`'s `log` trigger type appearing in other workers' payloads. `engine::*` trigger types from mandatory co-workers (`iii-engine-functions`, `iii-worker-manager`) are already filtered out client-side in `build_engine_publish_payload.py`. A proper fix requires engine-side work: extend `WorkerRegistrations` to capture `trigger_type_ids`, or track `worker_id` on `TriggerType` and add a filter to `TriggerTypesListInput`.

Learnt from: guibeira
Repo: iii-hq/iii PR: 1711
File: docs/0-13-0/using-iii/cli.mdx.skill.md:10-10
Timestamp: 2026-05-29T18:28:42.809Z
Learning: In the iii-hq/iii repo, the `docs/0-13-0/` directory is a frozen archived snapshot of the 0.13.x documentation state. It should not be modified to fix typos, update wording, or align with newer versions (e.g., 0.16). Do not flag or require changes to files under `docs/0-13-0/` for content corrections that would diverge from the original snapshot state.

Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/client.go:583-596
Timestamp: 2026-06-03T20:40:32.973Z
Learning: In `sdk/packages/go/iii/client.go` (Go SDK), the shared `c.outbound` channel is used across connections. There is a known narrow race: a pong or `InvocationResult` frame queued on `c.outbound` for a socket that drops before the write loop reads it may be written to the next socket after reconnect. The agreed fix direction is a per-connection reply queue (or connection-generation stamping). This is tracked as a deliberate follow-up in issue iii-hq/iii#1749, not a blocker for the initial Go SDK PR (`#1739`).

Comment thread .github/workflows/_go.yml
Comment thread .github/workflows/_go.yml Outdated
Comment thread .github/workflows/ci.yml
Comment on lines +748 to +789
sdk-go-ci:
name: SDK Go Tests
needs: engine-build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- uses: actions/setup-go@v5
with:
go-version: '1.23'
cache-dependency-path: sdk/packages/go/iii/go.sum

- name: Check formatting
run: test -z "$(gofmt -l sdk/packages/go/iii)" || { gofmt -l sdk/packages/go/iii; exit 1; }

- name: Vet
working-directory: sdk/packages/go/iii
run: go vet ./...

- name: Unit tests (race)
working-directory: sdk/packages/go/iii
run: go test -race ./...

- name: Download III binary
uses: actions/download-artifact@v4
with:
name: iii-binary

- name: Start III Engine
run: |
chmod +x ./iii
bash scripts/start-iii.sh \
--config sdk/fixtures/config-test.yaml \
--port 49199

- name: Integration tests
working-directory: sdk/packages/go/iii
env:
III_URL: ws://localhost:49199
III_HTTP_URL: http://localhost:3199
run: go test -tags integration ./tests/...

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf 'SDK Go CI job:\n'
sed -n '748,789p' .github/workflows/ci.yml

printf '\nGo modules touched by this PR:\n'
for f in sdk/packages/go/iii/go.mod sdk/packages/go/observability/go.mod sdk/packages/go/iii-example/go.mod; do
  printf '\n== %s ==\n' "$f"
  sed -n '1,10p' "$f"
done

Repository: iii-hq/iii

Length of output: 2419


🏁 Script executed:

#!/bin/bash
set -euo pipefail

WORKFLOW=".github/workflows/ci.yml"

echo "== References to observability module =="
rg -n "sdk/packages/go/observability|observability" "$WORKFLOW" || true

echo
echo "== References to iii-example module =="
rg -n "sdk/packages/go/iii-example|iii-example" "$WORKFLOW" || true

echo
echo "== Go module working directories in workflow =="
rg -n "working-directory:\s*sdk/packages/go/|go test .*sdk/packages/go/" "$WORKFLOW" || true

Repository: iii-hq/iii

Length of output: 465


🏁 Script executed:

#!/bin/bash
set -euo pipefail
WORKFLOW=".github/workflows/ci.yml"

echo "== go test / go vet working directories (sdk/packages/go/*) =="
rg -n "run:\s*go (test|vet)|gofmt -l|go test -tags|working-directory:\s*sdk/packages/go/" "$WORKFLOW"

echo
echo "== Any go tests specifically referencing observability or iii-example paths =="
rg -n "sdk/packages/go/observability|sdk/packages/go/iii-example" "$WORKFLOW" || true

echo
echo "== Any other jobs with name containing 'Go' =="
rg -n "name:\s*.*Go|job(s)?:|^  [a-zA-Z0-9_-]+:\s*$" "$WORKFLOW" | head -n 200

Repository: iii-hq/iii

Length of output: 1171


Fix sdk-go-ci to cover all Go modules added in this PR

  • In .github/workflows/ci.yml, sdk-go-ci runs gofmt, go vet, and go test only for sdk/packages/go/iii using Go 1.23, but sdk/packages/go/observability and sdk/packages/go/iii-example both declare go 1.25.0.
  • The workflow does not run Go vet/test for those two modules (it only builds observability via pnpm), so the PR can merge with those paths broken.
    Add Go gofmt/go vet/go test coverage for both sdk/packages/go/observability and sdk/packages/go/iii-example, and set the Go toolchain per module based on each module’s go.mod.
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 753-753: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[warning] 748-796: overly broad permissions (excessive-permissions): default permissions used due to no permissions: block

(excessive-permissions)


[error] 753-753: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)


[error] 755-755: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)


[error] 772-772: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 748 - 789, The sdk-go-ci job currently
only targets sdk/packages/go/iii; update the job to also run formatting, vet,
and tests for sdk/packages/go/observability and sdk/packages/go/iii-example and
ensure each module uses the Go toolchain declared in its go.mod (e.g., use Go
1.25 for observability and iii-example) by invoking actions/setup-go with the
correct go-version per module (or add per-module subjobs/matrix entries), run
gofmt -l, go vet ./..., and go test (with -race where appropriate) in each
module’s working-directory, and update cache-dependency-path entries to point at
each module’s go.sum so caching remains correct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is already on main (merged in #1739) — it isn't part of #1740's diff (this PR only adds observability/ and the example). Flagging here is a re-scan artifact from the rebase. Out of scope for this PR. Note: CI does run the observability + iii-example modules via their own build/test; the sdk-go-ci job itself lives on main and any change to it belongs in a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Resolve the review comment on this PR
gh pr view 1740 --repo iii-hq/iii --json reviews,comments

# List PR review comments to find the one to resolve
gh api repos/iii-hq/iii/pulls/1740/comments --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: .body[0:80]}'

Length of output: 77066


🏁 Script executed:

#!/bin/bash
# Resolve the review comment using GraphQL mutation
# First, find the full node ID of comment 3344556480
gh api graphql -f query='
{
  repository(owner: "iii-hq", name: "iii") {
    pullRequest(number: 1740) {
      reviewThreads(first: 50) {
        nodes {
          id
          isResolved
          comments(first: 1) {
            nodes {
              databaseId
              body
            }
          }
        }
      }
    }
  }
}' | python3 -c "
import json, sys
data = json.load(sys.stdin)
threads = data['data']['repository']['pullRequest']['reviewThreads']['nodes']
for t in threads:
    c = t['comments']['nodes'][0]
    if c['databaseId'] == 3344556480:
        print('Thread node ID:', t['id'])
        print('Is resolved:', t['isResolved'])
        print('Body preview:', c['body'][:80])
"

Length of output: 1021


🏁 Script executed:

#!/bin/bash
# Resolve the review thread
gh api graphql -f query='
mutation {
  resolveReviewThread(input: { threadId: "PRRT_kwDONlFhxc6GkpGK" }) {
    thread {
      id
      isResolved
    }
  }
}'

Length of output: 707



@MarcusElwin, fully understood — this comment is a rebase artifact and is out of scope for #1740. I wasn't able to auto-resolve the thread due to permission limitations on the integration token, but you can dismiss it manually using the Resolve conversation button on GitHub.


✏️ Learnings added
Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1740
File: .github/workflows/ci.yml:748-789
Timestamp: 2026-06-04T20:01:17.874Z
Learning: In iii-hq/iii, review comments on `.github/workflows/ci.yml` for the `sdk-go-ci` job (lines ~748-789) that appear in PRs rebased on top of `#1739` are rebase artifacts — the job was introduced and merged in `#1739`. These comments are not applicable to follow-up PRs (e.g., `#1740` adding `sdk/packages/go/observability`). Any CI coverage gaps for the `observability` and `iii-example` modules should be tracked in a dedicated follow-up PR, not flagged inline on downstream observability/example PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: guibeira
Repo: iii-hq/iii PR: 1703
File: sdk/packages/node/iii/tests/helpers.test.ts:33-45
Timestamp: 2026-05-28T12:58:59.504Z
Learning: In the Node SDK (`sdk/packages/node/iii/`), the test suite does not have a `MockEngine` implementation. Tests that check the public surface of the `iii` object (e.g., `registerWorker`) instantiate against a real `ws://` URL (consistent with how `tests/utils.ts` and `tests/exports.test.ts` work). CI runs the `SDK Node Tests` job against a live engine on the default port. Adding a Node-side mock transport/MockEngine is considered a larger refactor that belongs in its own dedicated PR, not inline with feature changes.

Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/client.go:583-596
Timestamp: 2026-06-03T20:40:32.973Z
Learning: In `sdk/packages/go/iii/client.go` (Go SDK), the shared `c.outbound` channel is used across connections. There is a known narrow race: a pong or `InvocationResult` frame queued on `c.outbound` for a socket that drops before the write loop reads it may be written to the next socket after reconnect. The agreed fix direction is a per-connection reply queue (or connection-generation stamping). This is tracked as a deliberate follow-up in issue iii-hq/iii#1749, not a blocker for the initial Go SDK PR (`#1739`).

Learnt from: rohitg00
Repo: iii-hq/iii PR: 1383
File: skills/references/state-management.rs:234-273
Timestamp: 2026-03-31T20:15:12.278Z
Learning: In the `iii-hq/iii` repo, files under `skills/references/` (e.g., `skills/references/state-management.rs`) are standalone educational reference files demonstrating iii engine patterns, not compilable production crates. Concurrency issues flagged in these files (e.g., read-modify-write races) are intentional simplifications to keep the pattern clear; reviewers should not require defensive synchronization code in these files. A brief inline comment is acceptable, but code-level fixes are not required.

Learnt from: rohitg00
Repo: iii-hq/iii PR: 1383
File: skills/references/trigger-conditions.rs:45-53
Timestamp: 2026-03-31T20:15:09.189Z
Learning: In iii-hq/iii, files under `skills/references/` (e.g., trigger-conditions.rs, state-reactions.js, etc.) are standalone pattern-demonstration files, not production or compilable crate code. Defensive guards, idempotency checks, and production-safety patterns should NOT be flagged as issues in these files, as adding them would obscure the core pattern being taught.

Comment thread .github/workflows/ci.yml
needs: engine-build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


🏁 Script executed:

rg -n 'uses:\s+\S+@(v[0-9]+(\.[0-9]+){0,2}|stable)$' .github/workflows/ci.yml

Repository: iii-hq/iii

Length of output: 3003


Pin floating GitHub Actions uses: to immutable SHAs

In .github/workflows/ci.yml, these uses: entries use floating tags: actions/checkout@v4 (753), actions/setup-go@v5 (755), and actions/download-artifact@v4 (772). Pin each to a full commit SHA (owner/repo@<sha>) to prevent upstream retags from changing CI behavior. The same workflow also contains other floating uses: tags (e.g., actions/setup-node@v4, actions/setup-python@v5, pnpm/action-setup@v4, dtolnay/rust-toolchain@stable, Swatinem/rust-cache@v2, dorny/paths-filter@v3, etc.) that should be pinned too.

🧰 Tools
🪛 zizmor (1.25.2)

[warning] 753-753: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[error] 753-753: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml at line 753, The workflow uses floating action tags
(e.g., actions/checkout@v4, actions/setup-go@v5, actions/download-artifact@v4
and others like actions/setup-node@v4, actions/setup-python@v5,
pnpm/action-setup@v4, dtolnay/rust-toolchain@stable, Swatinem/rust-cache@v2,
dorny/paths-filter@v3) which can be retagged upstream; update each `uses:` entry
to the corresponding immutable commit SHA form (owner/repo@<full-commit-sha>) by
replacing the tag with the pinned SHA for that action repository so CI behavior
is stable and repeatable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is already on main (merged in #1739) — it isn't part of #1740's diff (this PR only adds observability/ and the example). Flagging here is a re-scan artifact from the rebase. Out of scope for this PR. (SHA-pinning matches the repo's existing tag-pin convention across all workflows; a consistent hardening pass would be its own PR.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusElwin, thanks for clarifying — agreed. I’ll treat this as a re-scan artifact and out of scope for #1740, since the workflow already came from #1739 and this PR is focused on the Go observability/ module and example.

If you’d like, I can open a separate follow-up GitHub issue for a repository-wide workflow hardening pass to evaluate immutable SHA pinning consistently across all workflows.

🐇💭


✏️ Learnings added
Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1740
File: .github/workflows/ci.yml:753-753
Timestamp: 2026-06-04T20:01:54.737Z
Learning: In iii-hq/iii workflows, floating action tags are an existing repository-wide convention; immutable GitHub Actions SHA pinning should be treated as a separate hardening pass rather than a blocker for unrelated pull requests unless the PR is specifically changing workflow security policy.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: iii-hq/iii PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-28T16:43:14.040Z
Learning: Request review before making changes to CI/CD workflows (`.github/`)

Comment on lines +585 to +595
// dispatch routes one decoded inbound frame. Handler execution is offloaded to its own
// goroutine so the read loop is never blocked by user code.
func (c *Client) dispatch(ctx context.Context, dec *DecodedMessage) {
switch dec.Type {
case MsgInvokeFunction:
go c.handleInvoke(ctx, dec.InvokeFunction)
case MsgInvocationResult:
c.handleInvocationResult(dec.InvocationResult)
case MsgRegisterTrigger:
go c.handleRegisterTrigger(ctx, dec.RegisterTrigger)
case MsgPing:

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Trigger teardown is never dispatched to handlers.

TriggerHandler exposes UnregisterTrigger, but this switch only routes inbound MsgRegisterTrigger and the runtime never calls the teardown hook. Any custom trigger type that starts background work per instance will leak that work when the engine removes the instance.

As per coding guidelines, "Make sure the shape of objects and arguments to functions and triggers are consistent within and between sdks."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/client.go` around lines 585 - 595, The dispatch switch
never routes trigger teardown messages, so add a branch for the unregister
message (e.g., case MsgUnregisterTrigger:) that offloads to a goroutine and
calls the appropriate teardown handler (e.g., c.handleUnregisterTrigger(ctx,
dec.UnregisterTrigger)); if handleUnregisterTrigger doesn’t exist, implement it
to invoke the TriggerHandler.UnregisterTrigger hook and clean up per-instance
background work, mirroring how MsgRegisterTrigger is handled (use the same
ctx/dec pattern and ensure any identifiers in DecodedMessage are used to find
the right trigger instance).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is already on main (merged in #1739) — it isn't part of #1740's diff (this PR only adds observability/ and the example). Flagging here is a re-scan artifact from the rebase. Out of scope for this PR. The TriggerHandler teardown (UnregisterTrigger) not being dispatched is a real gap on main — noting it for a follow-up against the merged SDK rather than this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusElwin, thanks for the clarification — noted that client.go is a rebase artifact here and the flag is out of scope for #1740.

The UnregisterTrigger dispatch gap is a legitimate issue on main. Would you like me to open a GitHub issue to track it? I can link back to this PR and #1739 for context.


🧠 Learnings used
Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/client.go:583-596
Timestamp: 2026-06-03T20:40:32.973Z
Learning: In `sdk/packages/go/iii/client.go` (Go SDK), the shared `c.outbound` channel is used across connections. There is a known narrow race: a pong or `InvocationResult` frame queued on `c.outbound` for a socket that drops before the write loop reads it may be written to the next socket after reconnect. The agreed fix direction is a per-connection reply queue (or connection-generation stamping). This is tracked as a deliberate follow-up in issue iii-hq/iii#1749, not a blocker for the initial Go SDK PR (`#1739`).

Learnt from: rohitg00
Repo: iii-hq/iii PR: 1651
File: docs/creating-workers/triggers.mdx:171-191
Timestamp: 2026-05-18T10:08:29.296Z
Learning: In the iii-hq/iii repository, the handle returned by registerTriggerType / register_trigger_type has an unregister() method only in the Node/TypeScript SDK (TriggerTypeRef in Node). In Python (sdk/packages/python/iii/src/iii/triggers.py lines 54-126) and Rust (sdk/packages/rust/iii/src/iii.rs lines 170-229), TriggerTypeRef only exposes register_trigger / register_function and has no unregister() method. The correct teardown API for Python is worker.unregister_trigger_type(RegisterTriggerTypeInput(...)) and for Rust is worker.unregister_trigger_type("id"). Docs that show these static worker-level calls for Python and Rust are intentionally correct and should not be changed to webhook.unregister() until the SDK adds symmetry.

Learnt from: guibeira
Repo: iii-hq/iii PR: 1601
File: .github/scripts/collect_engine_worker_interface.py:0-0
Timestamp: 2026-05-04T21:30:12.894Z
Learning: In iii-hq/iii, when collecting trigger types for worker publish payloads, the only observable cross-worker trigger-type leak (as of PR `#1601`) is `iii-observability`'s `log` trigger type appearing in other workers' payloads. `engine::*` trigger types from mandatory co-workers (`iii-engine-functions`, `iii-worker-manager`) are already filtered out client-side in `build_engine_publish_payload.py`. A proper fix requires engine-side work: extend `WorkerRegistrations` to capture `trigger_type_ids`, or track `worker_id` on `TriggerType` and add a filter to `TriggerTypesListInput`.

Learnt from: guibeira
Repo: iii-hq/iii PR: 1313
File: docs/content/modules/module-stream.mdx:349-351
Timestamp: 2026-03-16T11:50:55.021Z
Learning: In iii-hq/iii, `RegisterTriggerInput` (sdk/packages/rust/iii/src/protocol.rs:208) has exactly three required fields — `trigger_type: String`, `function_id: String`, `config: Value` — and NO optional fields. It does NOT implement Default. All Rust doc examples must use `RegisterTriggerInput { trigger_type: "...".into(), function_id: "...".into(), config: json!({...}) }` without `..Default::default()`. The docs incorrectly used `type_:` instead of `trigger_type:` in files modified by PR `#1313` (module-stream.mdx, architecture/trigger-types.mdx, advanced/sdk-implementation.mdx, examples/conditions.mdx, examples/cron.mdx, examples/hello-world.mdx, examples/observability.mdx).

Learnt from: rohitg00
Repo: iii-hq/iii PR: 1374
File: connect/a2a/src/handler.rs:20-21
Timestamp: 2026-03-28T11:32:32.078Z
Learning: In `connect/a2a/src/handler.rs` (iii-hq/iii), `EXPOSE_ALL: AtomicBool` is intentionally a process-scoped static. `register()` is called exactly once per process lifetime in the `iii-a2a` single-purpose binary. Threading the flag through every async closure is deliberately avoided for simplicity. `register_trigger()` errors are logged but not returned from `register()` — this is accepted as pragmatic for a single-purpose binary where startup failures surface via missing HTTP routes. Do not flag the static or the lack of Result propagation as bugs.

Learnt from: anthonyiscoding
Repo: iii-hq/iii PR: 1331
File: website/components/sections/code-examples/iii/jobs.ts:32-38
Timestamp: 2026-03-19T20:02:17.308Z
Learning: In iii-hq/iii (website/components/sections/code-examples/iii/), not every `iii.trigger()` call needs to be awaited. Some trigger invocations (e.g., with `TriggerAction.Enqueue`) are intentionally fire-and-forget. Do not flag unawaited `iii.trigger()` calls as bugs — the author may deliberately omit `await` for async/non-blocking dispatch patterns.

Learnt from: ytallo
Repo: iii-hq/iii PR: 1283
File: sdk/packages/node/iii/src/types.ts:113-120
Timestamp: 2026-03-11T12:19:25.145Z
Learning: In `sdk/packages/node/iii/src/types.ts` (Node SDK), `triggerVoid` was intentionally removed from the `ISdk` interface. It is replaced by `trigger(request: TriggerRequest<TInput>)` combined with `TriggerAction.Void()` as the action. Do not flag the absence of `triggerVoid` on `ISdk` as a breaking change — it is a deliberate API consolidation as part of the named-queue / TriggerAction redesign.

Learnt from: guibeira
Repo: iii-hq/iii PR: 1313
File: docs/content/modules/module-stream.mdx:349-351
Timestamp: 2026-03-16T11:50:55.021Z
Learning: Guideline for iii-hq/iii Rust SDK docs: `RegisterFunctionMessage` and `RegisterTriggerInput` do NOT implement `Default` (they only derive `Debug, Clone, Serialize, Deserialize`). Never use `..Default::default()` when constructing these structs in any documentation example under `docs/content/`. Instead, always explicitly set every optional field to `None` (e.g., `description: None, request_format: None, response_format: None, metadata: None, invocation: None` for `RegisterFunctionMessage`). This applies to all `.mdx` files across `docs/content/examples/`, `docs/content/architecture/`, `docs/content/advanced/`, and `docs/content/modules/`.

Learnt from: guibeira
Repo: iii-hq/iii PR: 1601
File: .github/scripts/collect_engine_worker_interface.py:0-0
Timestamp: 2026-05-04T21:30:12.894Z
Learning: In the iii engine (iii-hq/iii), `engine::trigger-types::list` (`TriggerTypesListInput`) only accepts `include_internal: Option<bool>` — there is no worker/worker_id filter field. `list_trigger_type_infos()` iterates `trigger_registry.trigger_types` globally without per-worker scoping, and the underlying `TriggerType` struct carries no owner/worker reference. `WorkerRegistrations` (engine/src/workers/reload.rs) only tracks `function_ids`, not trigger_type_ids. Per-worker trigger-type filtering at the script level is therefore not possible without engine-side changes.

Comment thread sdk/packages/go/iii/client.go Outdated
Comment on lines +607 to +615
// enqueueOutboundDirect sends a frame to the writer without the offline-buffering path —
// used for protocol replies (pong, invocation results) that are only meaningful on the
// connection they answer.
func (c *Client) enqueueOutboundDirect(frame []byte) {
select {
case c.outbound <- frame:
case <-c.shutdown:
}
}

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Connection-specific replies can leak onto the next socket.

enqueueOutboundDirect still pushes pong / invocation-result frames into the shared c.outbound queue. If the current connection is canceled after enqueue and before writeLoop drains it, that stale reply survives reconnect and gets written by the next connection's writer, even though it only makes sense on the old socket.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/client.go` around lines 607 - 615, The issue is that
enqueueOutboundDirect currently sends connection-specific frames into the shared
c.outbound channel which can be written by a subsequent connection; fix by
routing direct replies to the current writer instead of the shared queue: add a
per-connection outbound channel or attach a connection identifier to frames and
have writeLoop drop frames whose connID doesn't match the active connection.
Concretely, modify enqueueOutboundDirect to either push into a connection-scoped
channel created in the writer loop (e.g., connOutbound) or include a connID when
enqueuing and update writeLoop to check that connID matches the active
connection before writing (references: enqueueOutboundDirect, c.outbound,
writeLoop, c.shutdown).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is already on main (merged in #1739) — it isn't part of #1740's diff (this PR only adds observability/ and the example). Flagging here is a re-scan artifact from the rebase. Out of scope for this PR. Tracked as a follow-up in #1749.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusElwin, understood — the existing learning I have already captures this exact context (the c.outbound race tracked as a deliberate follow-up in #1749, not a blocker). Apologies for the noise from the rebase re-scan.

Would you like me to resolve this review comment?


🧠 Learnings used
Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/client.go:583-596
Timestamp: 2026-06-03T20:40:32.973Z
Learning: In `sdk/packages/go/iii/client.go` (Go SDK), the shared `c.outbound` channel is used across connections. There is a known narrow race: a pong or `InvocationResult` frame queued on `c.outbound` for a socket that drops before the write loop reads it may be written to the next socket after reconnect. The agreed fix direction is a per-connection reply queue (or connection-generation stamping). This is tracked as a deliberate follow-up in issue iii-hq/iii#1749, not a blocker for the initial Go SDK PR (`#1739`).

Learnt from: andersonleal
Repo: iii-hq/iii PR: 1514
File: crates/iii-worker/src/cli/shell_relay.rs:355-371
Timestamp: 2026-04-20T19:06:18.650Z
Learning: In `crates/iii-worker/src/cli/shell_relay.rs` (Rust), the terminal-frame cleanup in `vm_reader_task` uses an inline `routes.lock().expect(…).remove(&corr_id)` pattern where the `MutexGuard` is an unnamed temporary and is dropped at the statement semicolon. This means the `routes` lock is released before `entry.owner_ids.lock()` is called in the subsequent `if let` block — the two mutexes are never held concurrently on this path, so there is no lock-ordering deadlock. Do not flag this pattern as a deadlock in future reviews unless the guard is explicitly bound to a variable that spans the second lock acquisition.

Comment thread sdk/packages/go/iii/tests/api_triggers_test.go Outdated
Comment on lines +56 to +73
var ours *workerInfo
for i := range out.Workers {
w := &out.Workers[i]
if w.Runtime != nil && *w.Runtime == "go" && w.Status == "connected" {
ours = w
break
}
}
if ours == nil {
t.Fatalf("no connected worker with runtime \"go\" in engine::workers::list (worker metadata not registered?)")
}
// Sanity-check the rest of the metadata we send.
if ours.OS == nil || *ours.OS == "" {
t.Error("worker os metadata is empty")
}
if ours.Version == nil || *ours.Version == "" {
t.Error("worker version metadata is empty")
}

@coderabbitai coderabbitai Bot Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This assertion can pass on the wrong worker.

Lines 57-63 stop at the first connected worker with runtime "go", so another Go worker in the same integration environment can satisfy the test even if the client created here never registered its metadata. Please correlate the check to the specific worker opened by this test instead of matching any connected Go worker.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/iii/tests/worker_metadata_test.go` around lines 56 - 73, The
current search picks the first connected Go worker which can be the wrong
process; instead locate the worker entry by the specific worker ID created in
this test. When you create/open the test worker (capture its unique id in a
variable like createdWorkerID or worker.ID), then iterate out.Workers and select
the worker whose ID matches that captured id (e.g., compare worker.ID to
createdWorkerID) and only then perform the OS/Version assertions on that
workerInfo (instead of using the existing runtime+status filter/ours variable).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file is already on main (merged in #1739) and isn't part of #1740's diff — re-scan artifact from the rebase. The assertion-could-match-another-go-worker point is fair; worth tightening in a follow-up against the merged SDK rather than this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusElwin, you're right — that file is a rebase artifact from #1739 and shouldn't have been flagged here. Apologies for the noise.

The concern about the assertion matching a different connected Go worker is still worth tracking. Would you like me to:

  1. Open a follow-up GitHub issue to tighten the assertion (correlate by worker ID rather than runtime + status)?
  2. Resolve this review comment so it doesn't clutter the PR?

Happy to do either or both.


🧠 Learnings used
Learnt from: guibeira
Repo: iii-hq/iii PR: 1703
File: sdk/packages/node/iii/tests/helpers.test.ts:33-45
Timestamp: 2026-05-28T12:58:59.504Z
Learning: In the Node SDK (`sdk/packages/node/iii/`), the test suite does not have a `MockEngine` implementation. Tests that check the public surface of the `iii` object (e.g., `registerWorker`) instantiate against a real `ws://` URL (consistent with how `tests/utils.ts` and `tests/exports.test.ts` work). CI runs the `SDK Node Tests` job against a live engine on the default port. Adding a Node-side mock transport/MockEngine is considered a larger refactor that belongs in its own dedicated PR, not inline with feature changes.

Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/tests/common_test.go:57-59
Timestamp: 2026-06-03T20:36:12.526Z
Learning: In the Go SDK integration tests (`sdk/packages/go/iii/tests/common_test.go`), the `settle()` helper (a fixed 300ms sleep) intentionally mirrors the Rust SDK's `common::settle()` (`tokio::time::sleep(Duration::from_millis(300))`) for cross-SDK consistency. Do not flag it as a flakiness risk without evidence; the pattern is deliberate and shared across SDKs.

Learnt from: MarcusElwin
Repo: iii-hq/iii PR: 1739
File: sdk/packages/go/iii/client.go:583-596
Timestamp: 2026-06-03T20:40:32.973Z
Learning: In `sdk/packages/go/iii/client.go` (Go SDK), the shared `c.outbound` channel is used across connections. There is a known narrow race: a pong or `InvocationResult` frame queued on `c.outbound` for a socket that drops before the write loop reads it may be written to the next socket after reconnect. The agreed fix direction is a per-connection reply queue (or connection-generation stamping). This is tracked as a deliberate follow-up in issue iii-hq/iii#1749, not a blocker for the initial Go SDK PR (`#1739`).

Comment thread sdk/packages/go/observability/anyvalue.go
@anthonyiscoding

Copy link
Copy Markdown
Contributor

@MarcusElwin Just to confirm, you're licensing this contribution under Apache 2 the same as all of our existing SDKs, correct?

@MarcusElwin

Copy link
Copy Markdown
Contributor Author

@MarcusElwin Just to confirm, you're licensing this contribution under Apache 2 the same as all of our existing SDKs, correct?

@anthonyiscoding yes sir ! 😄

@MarcusElwin MarcusElwin force-pushed the feat/go-sdk-observability branch 2 times, most recently from b1c26ae to f5f5d31 Compare June 4, 2026 19:34

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sdk/packages/go/observability/logger_test.go`:
- Around line 139-150: The current assertions only check Kind; strengthen them
by asserting the actual numeric values to prevent wrapped/corrupted conversions:
after calling jsonToLogValue(big) (the uint64 > MaxInt64 case) assert the
returned value’s float value equals float64(big) and is positive (use the
Value->Float64 accessor), for jsonToLogValue(uint64(42)) assert the returned
Int64 accessor equals 42, and for jsonToLogValue(huge) (the float64 beyond int64
range) assert the returned Float64 accessor equals huge and is positive; update
the tests around jsonToLogValue, big, uint64(42), and huge to include these
value-level checks in addition to Kind checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 73c616d9-1cc0-40a6-816f-8c1c99fd4a75

📥 Commits

Reviewing files that changed from the base of the PR and between f5f5d31 and f974216.

📒 Files selected for processing (2)
  • sdk/packages/go/observability/anyvalue.go
  • sdk/packages/go/observability/logger_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk/packages/go/observability/anyvalue.go

Comment on lines +139 to +150
if v := jsonToLogValue(big); v.Kind() != otellog.KindFloat64 {
t.Errorf("uint64 > MaxInt64 kind = %v, want Float64 (no wrap)", v.Kind())
}
// A uint64 that fits stays Int64.
if v := jsonToLogValue(uint64(42)); v.Kind() != otellog.KindInt64 {
t.Errorf("small uint64 kind = %v, want Int64", v.Kind())
}
// A whole float64 beyond int64 range must not collapse to a truncated Int64.
huge := math.Pow(2, 64) // > math.MaxInt64
if v := jsonToLogValue(huge); v.Kind() != otellog.KindFloat64 {
t.Errorf("float64 beyond int64 range kind = %v, want Float64", v.Kind())
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strengthen overflow regression assertions to validate numeric value, not only kind.

Lines 139-150 only assert KindFloat64; that can still pass if overflow logic returns a corrupted float (e.g., wrong sign/magnitude). Add value assertions (sign and/or exact expected float where representable) so this truly guards the wrap-around regression.

Suggested test hardening
 func TestNumericOverflowNotCorrupted(t *testing.T) {
 	const big = uint64(1) << 63 // math.MaxInt64 + 1; int64(big) would be negative
-	if v := jsonToLogValue(big); v.Kind() != otellog.KindFloat64 {
-		t.Errorf("uint64 > MaxInt64 kind = %v, want Float64 (no wrap)", v.Kind())
-	}
+	if v := jsonToLogValue(big); v.Kind() != otellog.KindFloat64 {
+		t.Errorf("uint64 > MaxInt64 kind = %v, want Float64 (no wrap)", v.Kind())
+	} else if got := v.AsFloat64(); got != float64(big) || got < 0 {
+		t.Errorf("uint64 > MaxInt64 value = %v, want %v and non-negative", got, float64(big))
+	}
@@
 	huge := math.Pow(2, 64) // > math.MaxInt64
-	if v := jsonToLogValue(huge); v.Kind() != otellog.KindFloat64 {
-		t.Errorf("float64 beyond int64 range kind = %v, want Float64", v.Kind())
-	}
+	if v := jsonToLogValue(huge); v.Kind() != otellog.KindFloat64 {
+		t.Errorf("float64 beyond int64 range kind = %v, want Float64", v.Kind())
+	} else if got := v.AsFloat64(); got != huge {
+		t.Errorf("float64 beyond int64 range value = %v, want %v", got, huge)
+	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if v := jsonToLogValue(big); v.Kind() != otellog.KindFloat64 {
t.Errorf("uint64 > MaxInt64 kind = %v, want Float64 (no wrap)", v.Kind())
}
// A uint64 that fits stays Int64.
if v := jsonToLogValue(uint64(42)); v.Kind() != otellog.KindInt64 {
t.Errorf("small uint64 kind = %v, want Int64", v.Kind())
}
// A whole float64 beyond int64 range must not collapse to a truncated Int64.
huge := math.Pow(2, 64) // > math.MaxInt64
if v := jsonToLogValue(huge); v.Kind() != otellog.KindFloat64 {
t.Errorf("float64 beyond int64 range kind = %v, want Float64", v.Kind())
}
if v := jsonToLogValue(big); v.Kind() != otellog.KindFloat64 {
t.Errorf("uint64 > MaxInt64 kind = %v, want Float64 (no wrap)", v.Kind())
} else if got := v.AsFloat64(); got != float64(big) || got < 0 {
t.Errorf("uint64 > MaxInt64 value = %v, want %v and non-negative", got, float64(big))
}
// A uint64 that fits stays Int64.
if v := jsonToLogValue(uint64(42)); v.Kind() != otellog.KindInt64 {
t.Errorf("small uint64 kind = %v, want Int64", v.Kind())
}
// A whole float64 beyond int64 range must not collapse to a truncated Int64.
huge := math.Pow(2, 64) // > math.MaxInt64
if v := jsonToLogValue(huge); v.Kind() != otellog.KindFloat64 {
t.Errorf("float64 beyond int64 range kind = %v, want Float64", v.Kind())
} else if got := v.AsFloat64(); got != huge {
t.Errorf("float64 beyond int64 range value = %v, want %v", got, huge)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/go/observability/logger_test.go` around lines 139 - 150, The
current assertions only check Kind; strengthen them by asserting the actual
numeric values to prevent wrapped/corrupted conversions: after calling
jsonToLogValue(big) (the uint64 > MaxInt64 case) assert the returned value’s
float value equals float64(big) and is positive (use the Value->Float64
accessor), for jsonToLogValue(uint64(42)) assert the returned Int64 accessor
equals 42, and for jsonToLogValue(huge) (the float64 beyond int64 range) assert
the returned Float64 accessor equals huge and is positive; update the tests
around jsonToLogValue, big, uint64(42), and huge to include these value-level
checks in addition to Kind checks.

@MarcusElwin

Copy link
Copy Markdown
Contributor Author

@sergiofilhowz when you have time 🙏🏼

@guibeira

guibeira commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Hey @MarcusElwin thanks for your contribution. I'm responsible for the SDKs, and I'm going to review your PR as soon as possible. We are in the process of doing a major cleanup, for this reason, I can guide you on the new changes, or we can do it.

@MarcusElwin

Copy link
Copy Markdown
Contributor Author

Hey @MarcusElwin thanks for your contribution. I'm responsible for the SDKs, and I'm going to review your PR as soon as possible. We are in the process of doing a major cleanup, for this reason, I can guide you on the new changes, or we can do it.

Hey @guibeira sure what ever works best for you let me know what you want me to fix 😄

@MarcusElwin MarcusElwin force-pushed the feat/go-sdk-observability branch from f974216 to eb49a53 Compare June 14, 2026 21:02
@iii-hq-ci

iii-hq-ci Bot commented Jun 14, 2026

Copy link
Copy Markdown

License agreement recorded

@MarcusElwin, your agreement has been recorded and you have been added to contributors.md. All future PRs from your account will pass this check automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants