When mydumper runs with --stream NO_STREAM or NO_STREAM_AND_NO_DELETE, it writes files to disk and outputs markers to stdout. This new mode reads those markers from stdin, performs search-and-replace on the data files in-place, and optionally forwards the marker stream to stdout for piping to myloader via --forward. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Let's add a benchmark here to compare streaming output against in-place file replacement.
The containsRegex.Match() call on every line was expensive for the common case of literal string replacements. bytes.Contains is much cheaper and produces identical results since replacement "from" values are always literal byte sequences, not regex patterns. Also makes runStreamMode testable by accepting an io.Reader parameter instead of reading os.Stdin directly, and adds benchmarks with cached test data generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
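The fast path described in this commit can be sketched as a per-line check: only when bytes.Contains reports a hit is the line rewritten, so a line that matches nothing is returned as-is without allocating. The function name replaceLine and the [2][]byte pair layout are hypothetical illustrations, not the PR's code.

```go
package main

import "bytes"

// replaceLine (hypothetical) applies literal from->to pairs to one line.
// bytes.Contains is a cheap pre-check: a line that matches no pair is
// returned unchanged (same backing array, no allocation), which is
// presumably what most lines in a large dump do.
func replaceLine(line []byte, pairs [][2][]byte) ([]byte, bool) {
	changed := false
	for _, p := range pairs {
		from, to := p[0], p[1]
		if len(from) == 0 || !bytes.Contains(line, from) {
			continue
		}
		line = bytes.ReplaceAll(line, from, to)
		changed = true
	}
	return line, changed
}
```

Because the "from" values are literal, no escaping is involved. A regex compiled from a value like old.example.com without regexp.QuoteMeta would treat each '.' as a wildcard, which is one more reason a literal byte check fits this use.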
Stream mode now processes files concurrently using a worker pool. Defaults to runtime.NumCPU() workers, configurable via --workers flag. Workers pick files from a channel as markers arrive on stdin, so multiple files can be processed simultaneously on multi-core systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, markers were forwarded to stdout immediately when read from stdin, before the worker finished processing the file. This could cause downstream consumers (myloader) to read a file still being modified. Now workers send markers to a forwarding goroutine only after processFileInPlace completes successfully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend dataFileRegex to match both .sql and .dat extensions so files produced by mydumper's LOAD_DATA format are also processed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
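The actual dataFileRegex is not shown on this page, so the pattern below is only an illustration. It assumes data files end in a numeric chunk suffix followed by .sql or .dat (for example db.table.00000.sql), which also keeps schema files such as db.table-schema.sql out; the isDataFile helper is hypothetical.

```go
package main

import (
	"path/filepath"
	"regexp"
)

// Illustrative pattern only: a numeric chunk suffix, then .sql or .dat.
var dataFileRegex = regexp.MustCompile(`\.\d+\.(sql|dat)$`)

// isDataFile (hypothetical) reports whether a path named on a marker line
// looks like a data file, judged by its base name only.
func isDataFile(path string) bool {
	return dataFileRegex.MatchString(filepath.Base(path))
}
```

Anchoring with $ also means compressed outputs such as .sql.gz do not match, which is what an in-place text rewrite needs anyway.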
The forwarding goroutine now emits lines in the original stdin order by waiting on per-item done channels. Workers still process files in parallel, but the forwarder blocks until each item (in order) is complete before writing to stdout. Non-data lines are forwarded immediately via a pre-closed channel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- --stream flag (accepts NO_STREAM or NO_STREAM_AND_NO_DELETE) that reads mydumper marker lines from stdin and performs search-and-replace on the already-written data files in-place
- --forward flag to re-emit stdin lines to stdout after processing, enabling piping to myloader
- Existing logic moved into runSplitMode() with no behavior changes

Usage
Test plan
- go vet and go build pass
- --forward re-emits markers to stdout
- Invalid --stream values are rejected

🤖 Generated with Claude Code