Skip to content

salesforce: introduce inputs and align sink output#4355

Merged
mmatczuk merged 12 commits intomainfrom
mmt/sf_input_v2
May 4, 2026
Merged

salesforce: introduce inputs and align sink output#4355
mmatczuk merged 12 commits intomainfrom
mmt/sf_input_v2

Conversation

@mmatczuk
Copy link
Copy Markdown
Contributor

@mmatczuk mmatczuk commented Apr 28, 2026

Summary

  • Adds Salesforce input family salesforce (SOQL one-shot), salesforce_cdc (multi-topic Pub/Sub: CDC, firehose, Platform Events, with REST snapshot phase and per-topic replay-ID checkpoints), and salesforce_graphql (with Bloblang-evaluated variables).
  • Extracts a Subscription type from salesforcegrpc.Client so one Client can own multiple subscriptions sharing a single connection, auth, and schema cache.
  • Reworks the salesforce_sink output to use the shared authFieldSpecs() / httpFieldSpec() helpers from config.go. Breaking config rename: restapi_versionapi_version; flat request_timeout and max_retries move under the nested http namespace.
  • Adds salesforcehttp helpers (SObjectPath, variables-aware GraphQL methods) and a RPCN_Test__e Platform Event definition for the SFDX scratch org used by integration tests.
image

@mmatczuk mmatczuk changed the title salesforce: introduce v2 inputs and align sink output salesforce: introduce inputs and align sink output Apr 28, 2026
@mmatczuk mmatczuk force-pushed the mmt/sf_input_v2 branch 2 times, most recently from 6c73578 to 6885768 Compare April 28, 2026 13:47
Comment on lines +1009 to +1042
func (e *salesforceCDCInputExecutor) saveState(ctx context.Context) error {
e.mu.Lock()
state := persistState{
SnapshotComplete: e.snapshotComplete,
RestCursor: e.restCursor,
Topics: make(TopicReplays, len(e.topicReplays)),
}
maps.Copy(state.Topics, e.topicReplays)
prev := e.lastPersisted
e.mu.Unlock()

b, err := json.Marshal(state)
if err != nil {
return fmt.Errorf("marshal checkpoint state: %w", err)
}
if bytes.Equal(prev, b) {
return nil
}

var setErr error
if err := e.mgr.AccessCache(ctx, e.conf.Checkpoint.Cache, func(cache service.Cache) {
setErr = cache.Set(ctx, e.conf.Checkpoint.CacheKey, b, nil)
}); err != nil {
return fmt.Errorf("access cache %q: %w", e.conf.Checkpoint.Cache, err)
}
if setErr != nil {
return fmt.Errorf("set cache key %q: %w", e.conf.Checkpoint.CacheKey, setErr)
}

e.mu.Lock()
e.lastPersisted = b
e.mu.Unlock()
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saveState releases e.mu between snapshotting in-memory state and writing to the cache. When multiple per-topic ack callbacks run concurrently (which they will — every topic goroutine calls this independently), two concurrent calls can:

  1. G1 reads state S1, releases lock, starts cache.Set(b1).
  2. G2 reads state S2 (newer, contains an additional topic's replay), releases lock, starts cache.Set(b2).
  3. The cache writes can complete in either order. If b1 lands last, the cache holds older state than memory.
  4. After the writes, G1 sets lastPersisted = b1 and G2 sets lastPersisted = b2. End state: lastPersisted = b2 (newer) but cache = b1 (older).
  5. On the next saveState call where the in-memory state still equals S2, bytes.Equal(prev, b) returns true and the write is skipped — so the cache permanently lags memory until the in-memory state changes again.

The result is regressed replay-IDs in the persisted checkpoint: on restart a topic loses progress and replays already-acked events. To make this safe, hold e.mu for the duration of the cache write, or serialize saveState calls through a single writer goroutine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduces state type and variable

Use a different Salesforce input instead if:

- You need continuous change events — use xref:components:inputs/salesforce_cdc.adoc[` + "`salesforce_cdc`" + `].
- You need Platform Events or channel subscriptions — use xref:components:inputs/salesforce_platform_events.adoc[` + "`salesforce_platform_events`" + `].
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This xref points to salesforce_platform_events.adoc, but no salesforce_platform_events component is added in this PR (only salesforce, salesforce_cdc, and salesforce_graphql are registered). The reference is propagated into the generated docs/modules/components/pages/inputs/salesforce.adoc and renders as a broken AsciiDoc link. Either drop the bullet or point readers at salesforce_cdc (which already handles /event/... Platform Event topics).


- You only need single-object SELECTs — use xref:components:inputs/salesforce.adoc[` + "`salesforce`" + `] (simpler, no GraphQL schema knowledge needed).
- You need continuous change events — use xref:components:inputs/salesforce_cdc.adoc[` + "`salesforce_cdc`" + `].
- You need Platform Events or channels — use xref:components:inputs/salesforce_platform_events.adoc[` + "`salesforce_platform_events`" + `].
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in input_salesforce.go: this xref points at salesforce_platform_events.adoc for a component that does not exist in this PR. The reference flows into docs/modules/components/pages/inputs/salesforce_graphql.adoc and renders as a broken AsciiDoc link. Replace the pointer with salesforce_cdc (which subscribes to /event/... Platform Event topics) or remove the bullet.

Comment thread internal/impl/salesforce/README.md Outdated
./app/setup.sh > /dev/null
```

When `SALESFORCE_CLIENT_SECRET` is set, `setup.sh` probes the `client_credentials` OAuth flow end-to-end. `OAuth: OK` on stderr means the integration tests (`internal/impl/salesforce/processor_salesforce_integration_test.go`) can run against this org. Any other response is printed verbatim and the script exits non-zero.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence references internal/impl/salesforce/processor_salesforce_integration_test.go, but the processor (and all its tests) is removed in this PR. The integration tests now live in input_salesforce_integration_test.go, input_salesforce_cdc_integration_test.go, and input_salesforce_graphql_integration_test.go. Update the file reference so the README doesn't point at a deleted file.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Commits
LGTM

Review
Three new Salesforce inputs (salesforce, salesforce_cdc, salesforce_graphql), a refactored salesforce_sink, and the supporting salesforcegrpc.Subscription extraction all landed cleanly. Component registration, info.csv, public wrapper, and bundle imports are wired correctly; license headers and shared config helpers are consistent. Minor doc/reference issues only.

  1. internal/impl/salesforce/input_salesforce.go:57 and internal/impl/salesforce/input_salesforce_graphql.go:54 xref a salesforce_platform_events component that is not added in this PR, producing broken links in the generated AsciiDoc.
  2. internal/impl/salesforce/README.md:39 references the deleted processor_salesforce_integration_test.go; should point at the new input_salesforce*_integration_test.go files.

Comment thread internal/plugins/info.csv Outdated
Comment on lines +244 to +246
salesforce ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
salesforce_cdc ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
salesforce_graphql ,input ,Salesforce ,0.0.0 ,enterprise ,n ,y ,y
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cloud / cloud_with_gpu columns are inconsistent across the new salesforce inputs: salesforce and salesforce_cdc are n,n but salesforce_graphql is y,y. salesforce_cdc reasonably differs because it depends on a checkpoint_cache resource, but salesforce and salesforce_graphql are both stateless one-shot extracts that share the same auth/HTTP plumbing — they should match. Either both should be cloud-eligible or neither should. Worth verifying which is intended; the schema filter uses these flags to gate availability in the cloud distribution.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Commits

  1. The final commit Update docs only modifies salesforce-scoped docs (docs/modules/components/pages/inputs/salesforce*.adoc, docs/modules/components/pages/outputs/salesforce_sink.adoc), so per the system: message rule it should be salesforce: update docs rather than the sentence-case form (which is reserved for repo-wide changes like Bump to Go 1.26).

Review

One inline issue: the cloud flags for the new salesforce inputs in internal/plugins/info.csv look inconsistent — salesforce and salesforce_cdc are n,n while salesforce_graphql is y,y. salesforce and salesforce_graphql are both stateless one-shot extracts using the same auth/HTTP plumbing, so the asymmetry is suspicious. See

salesforce ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
salesforce_cdc ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
salesforce_graphql ,input ,Salesforce ,0.0.0 ,enterprise ,n ,y ,y
salesforce_sink ,output ,Salesforce ,4.0.0 ,enterprise ,n ,y ,y
.

Comment thread internal/plugins/info.csv Outdated
ristretto ,cache ,Ristretto ,0.0.0 ,community ,n ,y ,y
salesforce ,processor ,Salesforce ,4.0.0 ,enterprise ,n ,y ,y
salesforce ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
salesforce_cdc ,input ,Salesforce ,0.0.0 ,enterprise ,n ,n ,n
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please enable all 3 for cloud

mmatczuk added 12 commits May 4, 2026 14:48
Adds setup.sh (and Metadata API descriptors) under app/ that deploys
a Connected App via sf CLI and prints SALESFORCE_ORG_URL and
SALESFORCE_CLIENT_ID. SALESFORCE_CLIENT_SECRET is not exposed by the
Salesforce API and must be copied from the UI.

Updates the README with the automated flow and splits manual steps
into their own section.
GetNextBatchParallel can return a non-empty batch together with done=true
when the last SObject's final page arrives in the same call. The snapshot
completion path discarded that batch and returned nil, silently dropping
records — most visible with rest_objects scoped to a single SObject small
enough to finish in one parallel-fetch cycle.
…tp.Client

Adds GraphQLWithVariables and GraphQLQueryPageWithVariables so callers can
forward GraphQL variables alongside the query. The original GraphQL and
GraphQLQueryPage methods now delegate to the new variants, preserving
backward compatibility.
… Client

Introduces Events(), WaitReady(ctx), and a subscribeSettleDelay-gated ready
channel so callers can wait for the Pub/Sub subscription to be fully
routed server-side before publishing events. Without the settle delay,
events emitted immediately after Send returns are silently dropped on
ReplayPreset_LATEST because Salesforce's server-side subscription
routing lags the gRPC-level stream acknowledgement.
…e subscriptions

Splits subscription stream management out of salesforcegrpc.Client into a
standalone Subscription type. A single Client can now own multiple
Subscriptions sharing one connection, auth, and schema cache, instead of
being limited to one subscribe stream per Client.

Also moves Avro decoding helpers to avro.go and shared types to types.go.
Introduces salesforce input family:

- salesforce: extracts the SOQL builder into soql.go with tests and
  pulls SalesforceInputConfig parsing into a reusable struct.
- salesforce_graphql: extracts the config struct and introduces
  Bloblang-evaluated variables for the GraphQL query.
- salesforce_cdc: supports multiple topics on one Pub/Sub gRPC client
  (one Subscription per topic), accepts CDC sObject shorthand, full
  CDC paths, the /data/ChangeEvents firehose, and /event/* Platform
  Events. Introduces a snapshot-then-stream phase backed by REST and
  a durable per-topic replay-ID checkpoint (TopicReplays).

Adds integration tests and a RPCN_Test__e Platform Event definition
under app/objects for the SFDX scratch org used by the test suite.
Replaces the four hand-rolled auth fields and the flat HTTP transport
fields with the shared helpers from config.go:

- authFieldSpecs() / NewAuthConfigFromParsed for org_url, client_id,
  client_secret, api_version.
- httpFieldSpec() / newHTTPConfigFromParsed for the nested http
  namespace; the inline httpclient.Config literal is gone.

Breaking config changes (output not yet released):

- restapi_version → api_version
- request_timeout (flat, 30s) removed; configure under http.timeout
  (default 5s from httpclient).
- max_retries (flat, 10) removed; configure under http.retries
  (default 3 from httpclient).
@mmatczuk mmatczuk force-pushed the mmt/sf_input_v2 branch from 9ff8765 to efa773c Compare May 4, 2026 12:57
@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

Commits
LGTM

Review
salesforce input family (salesforce, salesforce_cdc, salesforce_graphql) introduced cleanly alongside salesforce_sink rework to share auth/HTTP config helpers. httpclient.Fields/FieldsWithBaseURL split is propagated to all existing callers. Subscription extraction in salesforcegrpc enables multi-topic CDC over a single gRPC connection.

LGTM

@mmatczuk mmatczuk merged commit 9ae58ec into main May 4, 2026
7 checks passed
@mmatczuk mmatczuk deleted the mmt/sf_input_v2 branch May 4, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants