[finelog] Durable writes, async server, and catalog/append refactors #5777
rjpower wants to merge 3 commits into
Conversation
Claude finished @rjpower's task in 7m 18s. PR review complete.
Verdict: No issues found. Checked for bugs and CLAUDE.md/AGENTS.md compliance.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bea281eac
    ns = self.catalog.require_live(name)
    schema = ns.schema
    aligned = validate_and_align_batch(batch, schema)
    ns.append_aligned_batch(aligned)
Synchronize schema updates with the write_rows append
The split write_rows path now snapshots ns.schema, validates, and appends later without any shared lock with register_table/update_schema. A concurrent additive schema evolution can therefore race the append, stamping rows with an older Arrow schema and silently dropping values for newly added columns instead of persisting them. The issue appears when register_table updates a live namespace while writes are in flight: validation succeeds against one schema version, but append_aligned_batch may project with another.
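One way to close the race is to take the same lock for schema snapshot, alignment, and append that register_table/update_schema takes, so evolution cannot interleave between validation and append. A toy sketch of that idea (the Namespace class, dict-based rows, and method names here are invented for illustration, not finelog's actual code):

```python
import threading


class Namespace:
    """Toy model of the race fix: schema snapshot, validation, and append
    all happen under one lock, so a concurrent update_schema cannot slip
    in between them. Illustrative only."""

    def __init__(self, schema):
        self._lock = threading.Lock()
        self._schema = list(schema)  # column names, in order
        self.rows = []

    def update_schema(self, new_schema):
        # Schema evolution takes the same lock the write path takes.
        with self._lock:
            self._schema = list(new_schema)

    def write_rows(self, batch):
        # Snapshot + align + append atomically w.r.t. schema evolution,
        # so rows are always stamped with the schema they validated against.
        with self._lock:
            schema = self._schema
            aligned = [{col: row.get(col) for col in schema} for row in batch]
            self.rows.extend(aligned)
```

With this shape, a write that lands after an additive evolution always carries the new column rather than silently dropping it; the cost is that appends serialize against schema updates, which are rare.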
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
PushLogs and WriteRows now block until the rows land in an L0 parquet segment, so callers never observe success on data that a crash could lose. Each namespace exposes a max_persisted_seq cursor advanced by the background flush thread; the now-async ASGI handlers poll it with asyncio.sleep, so the wait parks a coroutine instead of a threadpool worker. The default flush interval drops from 60s to 5s and every append wakes the background loop, so the typical persistence wait is well under one second.

Adds a SlowRpcInterceptor (delegates to rigging.log_setup.slow_log) wired into both LogService and StatsService, so query/write_rows calls that exceed the threshold are logged once. Also includes pre-existing branch tweaks: drop the Docker --cpus cap so finelog isn't subject to CFS bandwidth parking spikes, and tests for the ConnectionPool query-timeout watchdog.
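The SlowRpcInterceptor idea can be approximated as a plain timing decorator. A hedged sketch (the slow_rpc name, threshold default, and log format are invented here; the real code delegates to rigging.log_setup.slow_log and plugs into the RPC interceptor chain):

```python
import functools
import logging
import time

logger = logging.getLogger("slow_rpc")


def slow_rpc(threshold_s=1.0):
    """Time a handler and emit one warning only when it exceeds the
    threshold, so fast calls produce no log traffic. Illustrative
    stand-in for the interceptor, not finelog's actual API."""
    def wrap(handler):
        @functools.wraps(handler)
        def timed(*args, **kwargs):
            start = time.monotonic()
            try:
                return handler(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > threshold_s:
                    logger.warning("slow rpc %s: %.2fs", handler.__name__, elapsed)
        return timed
    return wrap
```

Logging once per offending call (rather than per poll tick) keeps the signal usable when a handler is slow under load.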
🤖 Specification (this PR is >500 LOC, per the pull-request skill)
Problem
Approach
Key code

Cursor advance in log_namespace.py:

    # log_namespace.py
    try:
        self._write_new_segment(visible)
    except Exception:
        self._buffers.restore_flush()
        return
    # Atomic int write; service handlers polling max_persisted_seq() will
    # observe this on their next tick.
    self._max_persisted_seq = visible.max_seq

Async waiter:

    # server/persistence_wait.py
    async def await_persisted(log_store, namespace, target_seq, *, timeout=30.0,
                              poll_interval=0.05):
        if target_seq < 0:
            return
        deadline = time.monotonic() + timeout
        while log_store.max_persisted_seq(namespace) < target_seq:
            if time.monotonic() >= deadline:
                raise ConnectError(Code.DEADLINE_EXCEEDED, ...)
            await asyncio.sleep(poll_interval)

Wake-driven flush in log_namespace.py:

    # log_namespace.py
    woken_by_append = self._wake.wait(timeout=min(self._flush_rl.time_until_next(), 1.0))
    self._wake.clear()
    writer_pending = woken_by_append and chunk_count > 0
    if force_drain or writer_pending:
        self._flush_step()

Tests
Total: 244 tests pass in ~3s.
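The await_persisted polling loop shown under Key code can be exercised end to end with a fake store. A minimal harness (FakeLogStore and main are invented for illustration, and ConnectError is swapped for TimeoutError so the sketch has no server dependencies):

```python
import asyncio
import time


class FakeLogStore:
    """Invented stand-in for the log store: the background flush thread
    would advance max_persisted_seq; here a task does it after 100ms."""

    def __init__(self):
        self.seq = -1

    def max_persisted_seq(self, namespace):
        return self.seq


async def await_persisted(log_store, namespace, target_seq, *, timeout=30.0,
                          poll_interval=0.05):
    # Same polling loop as the PR snippet, with TimeoutError in place of
    # ConnectError(Code.DEADLINE_EXCEEDED, ...).
    if target_seq < 0:
        return
    deadline = time.monotonic() + timeout
    while log_store.max_persisted_seq(namespace) < target_seq:
        if time.monotonic() >= deadline:
            raise TimeoutError("persistence wait exceeded deadline")
        await asyncio.sleep(poll_interval)


async def main():
    store = FakeLogStore()

    async def flush():
        await asyncio.sleep(0.1)
        store.seq = 5  # cursor advance, as _write_new_segment would trigger

    task = asyncio.create_task(flush())
    await await_persisted(store, "ns", 5, timeout=2.0)
    await task
    return store.seq
```

Running `asyncio.run(main())` parks the waiter coroutine on the event loop until the fake flush advances the cursor, mirroring how the ASGI handlers wait without occupying a threadpool worker.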
Make WriteRows/PushLogs block until rows are persisted to an L0 parquet segment via a max_persisted_seq cursor polled on the event loop; the server moves to ASGI handlers and drops the flush interval to 5s. Bundles the prior catalog refactor (live namespace registry lifted into Catalog, leaf types extracted to break the catalog/namespace import cycle) and the append-path slim-down (AlignedBatch replaces the RecordBatch repack). Adds a SlowRpcInterceptor on top of rigging.slow_log.
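The align step referenced in the diff can be illustrated with a pure-Python stand-in (this validate_and_align_batch operates on column dicts rather than Arrow RecordBatches; the exact behavior, null-filling columns added by schema evolution and rejecting unknown ones, is an assumption, not taken from the PR):

```python
def validate_and_align_batch(batch, schema):
    """Reorder a column dict to the namespace schema, null-fill columns
    the batch is missing (e.g. added by additive schema evolution), and
    reject columns the schema does not know. Pure-Python sketch of the
    align step; the real path works on Arrow RecordBatches."""
    unknown = set(batch) - set(schema)
    if unknown:
        raise ValueError(f"columns not in schema: {sorted(unknown)}")
    # All columns in a batch share one row count; take it from any column.
    n = len(next(iter(batch.values()), []))
    return {col: batch.get(col, [None] * n) for col in schema}
```

Keeping alignment as a pure projection like this is what lets the append path hand a ready-made AlignedBatch to the namespace instead of repacking a RecordBatch per write.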