Skip to content

refactor!: Centralize KVS value semantics in @crawlee/core#3774

Merged
janbuchar merged 14 commits into
v4from
centralize-kvs-value-semantics
Jun 24, 2026
Merged

refactor!: Centralize KVS value semantics in @crawlee/core#3774
janbuchar merged 14 commits into
v4from
centralize-kvs-value-semantics

Conversation

@janbuchar

Copy link
Copy Markdown
Contributor

KeyValueStore now owns serialize/parse; the storage clients (backends) are pure byte transports.

  • New key_value_store_codec.ts (serializeValue + parseValue); removed maybeStringify.
  • Frontend getValue/iterator parse, setValue serializes.
  • Memory client no longer serializes/parses; deleted body-parser.ts.

Behavior-preserving. Read contract changes atomically across core + memory. fs-storage/apify clients deferred to #3762. Tests green.

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Jun 22, 2026
@janbuchar janbuchar requested review from B4nan and barjin June 22, 2026 15:43

@barjin barjin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @janbuchar, I like the idea of this PR 👍

See some issues I encountered while testing below ⬇️

Comment thread packages/core/src/storages/key_value_store_codec.ts
Comment thread packages/core/src/storages/key_value_store.ts Outdated
@janbuchar janbuchar requested a review from barjin June 23, 2026 08:58
@janbuchar janbuchar self-assigned this Jun 23, 2026
Comment thread test/core/storages/key_value_store.test.ts

@barjin barjin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my last comment, but otherwise lgtm. Thanks!

Comment thread packages/core/src/storages/key_value_store_codec.ts Outdated

@apify-service-account apify-service-account left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Staff review — KVS value-semantics centralization

Verdict (no linked issue; judged against the PR's stated goal): The refactor cleanly delivers what it describes for the core + memory pair — serializeValue/parseValue own the codec, the memory client becomes a pure byte transport, isBuffer/isStream are centralized, and the new round-trip tests are thorough. The one material concern is that the read contract change is global (it runs for every backend) while only the memory client was migrated, so the default FileSystemStorageClient double-parses. This is fine only if the deferred #3762 lands together before any v4 release; merged alone, this branch regresses the default backend's JSON reads. Worth an explicit gate.

# Location Issue Severity
1 key_value_store.ts:203,313 Frontend parses on read for all backends; default fs-storage still parses internally → double-parse throws on JSON records High
2 key_value_store.ts:242 getRecord declares value: Buffer via unchecked cast; untrue for non-memory backends Low
3 utils/general.ts:201 isStream loosened to pipe OR pipeTo (was on AND pipe); plain { pipe } now misclassifies Nit
4 key_value_store_codec.ts:109 isomorphicBufferToString ignores charset for ArrayBuffer input (always UTF-8) Nit

Inline comments below. Submitted as a non-blocking COMMENT review — a human makes the final call.

Comment thread packages/core/src/storages/key_value_store.ts Outdated
Comment thread packages/core/src/storages/key_value_store.ts Outdated
Comment thread packages/utils/src/internals/general.ts
Comment thread packages/core/src/storages/key_value_store_codec.ts
Finish the fs-storage migration and clean up the edge cases the review dug up:

- fs-storage setValue is now a pure byte transport: dropped the leftover
  content-type inference and JSON.stringify so it can't double-serialize and
  matches memory-storage's behavior.
- isStream no longer waves through plain { pipe } ducks; the pipe branch now
  also requires async-iterability, so fakes fail validation instead of blowing
  up mid-drain with a cryptic TypeError.
- getValue keeps a stored literal null instead of silently swapping it for the
  default value; only a genuinely absent record falls back now.
- isomorphicBufferToString honors the charset for ArrayBuffer input in Node
  instead of always decoding as UTF-8 (browsers stay UTF-8-only).
- Both clients map mime.contentType's false return to undefined so the frontend
  sees 'no content type' rather than a bogus value.
- Added a static KeyValueStore.getRecord to match the instance method, and gave
  the raw record its own named KeyValueStoreRawRecord type.
CI caught two direct-client setValue calls that relied on the old client-side
serialization/inference, which the byte-transport refactor removed:

- async-iteration.test.ts passed a raw object with no content type, which now
  hits toBuffer and throws (TypeError: Received undefined). Serialize it and
  pass application/json, mirroring the memory-storage test.
- write-metadata.test.ts wrote a bare string with no content type and waited
  for foo.txt; without inference the client defaults to octet-stream -> foo.bin,
  so the wait timed out at 60s. Pass text/plain so the extension stays txt.
@janbuchar janbuchar merged commit 13976b3 into v4 Jun 24, 2026
7 checks passed
@janbuchar janbuchar deleted the centralize-kvs-value-semantics branch June 24, 2026 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants