refactor!: Centralize KVS value semantics in @crawlee/core#3774
Conversation
barjin
left a comment
There was a problem hiding this comment.
Thank you, @janbuchar, I like the idea of this PR 👍
See some issues I encountered while testing below ⬇️
barjin
left a comment
There was a problem hiding this comment.
Please see my last comment, but otherwise lgtm. Thanks!
apify-service-account
left a comment
There was a problem hiding this comment.
Staff review — KVS value-semantics centralization
Verdict (no linked issue; judged against the PR's stated goal): The refactor cleanly delivers what it describes for the core + memory pair — serializeValue/parseValue own the codec, the memory client becomes a pure byte transport, isBuffer/isStream are centralized, and the new round-trip tests are thorough. The one material concern is that the read contract change is global (it runs for every backend) while only the memory client was migrated, so the default FileSystemStorageClient double-parses. This is fine only if the deferred #3762 lands together before any v4 release; merged alone, this branch regresses the default backend's JSON reads. Worth an explicit gate.
| # | Location | Issue | Severity |
|---|---|---|---|
| 1 | key_value_store.ts:203,313 | Frontend parses on read for all backends; default fs-storage still parses internally → double-parse throws on JSON records | High |
| 2 | key_value_store.ts:242 | getRecord declares value: Buffer via unchecked cast; untrue for non-memory backends |
Low |
| 3 | utils/general.ts:201 | isStream loosened to pipe OR pipeTo (was on AND pipe); plain { pipe } now misclassifies |
Nit |
| 4 | key_value_store_codec.ts:109 | isomorphicBufferToString ignores charset for ArrayBuffer input (always UTF-8) |
Nit |
Inline comments below. Submitted as a non-blocking COMMENT review — a human makes the final call.
Finish the fs-storage migration and clean up the edge cases the review dug up:
- fs-storage setValue is now a pure byte transport: dropped the leftover
content-type inference and JSON.stringify so it can't double-serialize and
matches memory-storage's behavior.
- isStream no longer waves through plain { pipe } ducks; the pipe branch now
also requires async-iterability, so fakes fail validation instead of blowing
up mid-drain with a cryptic TypeError.
- getValue keeps a stored literal null instead of silently swapping it for the
default value; only a genuinely absent record falls back now.
- isomorphicBufferToString honors the charset for ArrayBuffer input in Node
instead of always decoding as UTF-8 (browsers stay UTF-8-only).
- Both clients map mime.contentType's false return to undefined so the frontend
sees 'no content type' rather than a bogus value.
- Added a static KeyValueStore.getRecord to match the instance method, and gave
the raw record its own named KeyValueStoreRawRecord type.
CI caught two direct-client setValue calls that relied on the old client-side serialization/inference, which the byte-transport refactor removed: - async-iteration.test.ts passed a raw object with no content type, which now hits toBuffer and throws (TypeError: Received undefined). Serialize it and pass application/json, mirroring the memory-storage test. - write-metadata.test.ts wrote a bare string with no content type and waited for foo.txt; without inference the client defaults to octet-stream -> foo.bin, so the wait timed out at 60s. Pass text/plain so the extension stays txt.
KeyValueStorenow owns serialize/parse; the storage clients (backends) are pure byte transports.key_value_store_codec.ts(serializeValue+parseValue); removedmaybeStringify.getValue/iterator parse,setValueserializes.body-parser.ts.Behavior-preserving. Read contract changes atomically across core + memory. fs-storage/apify clients deferred to #3762. Tests green.