pgduck_server: add RECEIVE protocol for streaming COPY-IN sink
Adds a TCP listener and a new "RECEIVE" query prefix to pgduck_server.
With these in place, a remote client can stream CSV (or other data)
to a server-local sink path via the standard libpq COPY-IN protocol,
and pgduck runs a deferred query reading from that path once
CopyDone arrives.
Motivation:
The existing pgduck_server design assumes the client and pgduck share
a filesystem: the standard pg_lake "sidecar" deployment co-locates
postgres and pgduck_server in the same pod and lets pg_lake's
bulk-write paths drop CSV under $PGDATA/pgsql_tmp for pgduck to read
with read_csv(). That topology breaks under deployment models that
don't co-locate the two processes (operator-managed Kubernetes
deployments where postgres and pgduck run in separate pods, multi-host
setups, etc.). The streaming-write paths in the companion pg_lake
patch rely on this RECEIVE protocol to push bytes directly to pgduck
via libpq without needing a shared filesystem.
What this patch adds:
- TCP listener on pgduck_server controlled by --listen_addresses /
--port (the Unix-socket transport remains the default). Lets remote
postgres backends reach pgduck over the network.
- A "RECEIVE <inner-query>" query prefix recognized by
process_query_message. When it sees "RECEIVE …", the server:
1. Picks a server-local sink path (under --recv_dir).
2. Substitutes the bare token "@@PGLAKE_RECV_SINK@@" inside the
inner query with a properly-quoted SQL literal containing
the sink path. The server adds the surrounding single quotes
and escapes any embedded single quotes; the placeholder
itself is intentionally NOT inside a SQL string literal in
the client-emitted query, so it can never collide with user-
supplied data.
3. Sends CopyInResponse to the client and accepts CopyData
chunks, writing them straight to the sink.
4. On CopyDone, runs the deferred inner query against the sink
path and streams its result rows back via the standard
PGresult flow.
This lets clients use the existing libpq COPY-IN flow as the
transport for arbitrary inner queries that read from a path.
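
The placeholder substitution in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the function names `append_sql_literal` and `substitute_recv_sink` are hypothetical, and the sketch assumes every occurrence of the token is replaced and that quoting follows the usual SQL rule of doubling embedded single quotes. It does not parse SQL, so it relies on the client never emitting the token inside a string literal, as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RECV_SINK_TOKEN "@@PGLAKE_RECV_SINK@@"

/*
 * Append src to dst as a single-quoted SQL literal, doubling any embedded
 * single quote. Returns a pointer just past the last byte written.
 */
static char *
append_sql_literal(char *dst, const char *src)
{
    *dst++ = '\'';
    for (; *src != '\0'; src++)
    {
        if (*src == '\'')
            *dst++ = '\'';      /* ' becomes '' */
        *dst++ = *src;
    }
    *dst++ = '\'';
    return dst;
}

/*
 * Hypothetical helper: return a malloc'd copy of query in which each bare
 * token is replaced by the quoted sink path. Returns NULL on allocation
 * failure; the caller frees the result.
 */
char *
substitute_recv_sink(const char *query, const char *sink_path)
{
    const size_t token_len = strlen(RECV_SINK_TOKEN);
    size_t      hits = 0;
    const char *p;
    const char *hit;
    char       *result;
    char       *out;

    assert(query != NULL && sink_path != NULL);

    /* Count tokens so the output buffer can be sized up front. */
    for (p = query; (p = strstr(p, RECV_SINK_TOKEN)) != NULL; p += token_len)
        hits++;

    /* Worst case per token: every path byte is a quote, plus two delimiters. */
    result = malloc(strlen(query) + hits * (2 * strlen(sink_path) + 2) + 1);
    if (result == NULL)
        return NULL;

    out = result;
    p = query;
    while ((hit = strstr(p, RECV_SINK_TOKEN)) != NULL)
    {
        memcpy(out, p, (size_t) (hit - p));     /* text before the token */
        out += hit - p;
        out = append_sql_literal(out, sink_path);
        p = hit + token_len;
    }
    strcpy(out, p);                             /* remaining tail and NUL */
    return result;
}
```

Because the server adds the quotes itself, a sink path containing a single quote still yields one well-formed literal in the rewritten query.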
- recv_sink module: opens, writes, and cleans up the per-client sink
files; bounded by --recv-max-bytes; refuses path traversal.
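
The two guards mentioned for recv_sink (a byte cap and path-traversal refusal) might look something like the sketch below. The names `recv_name_is_safe`, `RecvBudget`, and `recv_budget_consume` are hypothetical and are not taken from the patch; the sketch assumes the sink file name is a relative path joined under the receive directory and that the cap is enforced per CopyData chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: accept only a non-empty relative path that contains
 * no ".." component, so it cannot escape the receive directory.
 */
bool
recv_name_is_safe(const char *name)
{
    const char *p = name;

    assert(name != NULL);

    if (*p == '\0' || *p == '/')
        return false;           /* empty or absolute */

    while (*p != '\0')
    {
        size_t      len = strcspn(p, "/");  /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;
        p += len;
        if (*p == '/')
            p++;
    }
    return true;
}

/* Hypothetical per-sink byte accounting. */
typedef struct RecvBudget
{
    size_t      written;        /* bytes accepted so far */
    size_t      max_bytes;      /* cap, e.g. from a server option */
} RecvBudget;

/*
 * Account for one incoming chunk. Returns false, leaving the budget
 * unchanged, if the chunk would exceed the cap.
 */
bool
recv_budget_consume(RecvBudget *budget, size_t chunk_len)
{
    assert(budget != NULL && budget->written <= budget->max_bytes);

    /* Compare against the remaining room to avoid size_t overflow. */
    if (chunk_len > budget->max_bytes - budget->written)
        return false;
    budget->written += chunk_len;
    return true;
}
```

Checking the remaining room rather than `written + chunk_len` keeps the comparison safe even for very large chunk lengths.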
- SSL-negotiate-byte handling: pgduck_server replies 'N' to libpq's
SSLRequest instead of closing the connection. Lets clients with
sslmode=prefer fall back to plaintext cleanly.
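
For reference, recognizing an SSLRequest can be sketched as below. In the PostgreSQL wire protocol the SSLRequest is an 8-byte packet: a big-endian length of 8 followed by the request code 80877103. The function name `ssl_request_reply` is hypothetical and this is not the handler from the patch; it only illustrates the decision to answer 'N' instead of closing the connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SSLRequest code from the PostgreSQL protocol: (1234 << 16) | 5679. */
#define SSL_REQUEST_CODE 80877103u

/* Read a 32-bit big-endian (network order) integer. */
static uint32_t
read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/*
 * Hypothetical helper: if the first packet on a connection is an
 * SSLRequest, return the single byte to send back ('N' = SSL not
 * supported, continue in plaintext). Return 0 for any other packet.
 */
char
ssl_request_reply(const unsigned char *packet, size_t len)
{
    assert(packet != NULL);

    if (len == 8 &&
        read_be32(packet) == 8 &&
        read_be32(packet + 4) == SSL_REQUEST_CODE)
        return 'N';
    return 0;
}
```

After receiving 'N', a client using sslmode=prefer sends its normal StartupMessage on the same connection.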
- Explicit pgsession_flush() after pgsession_send_copy_in_response.
Without this, the small CopyInResponse message can sit in pgduck's
send buffer indefinitely (no auto-flush on small messages); the client
blocks forever in PQputCopyData waiting for the server to be ready,
and the RECEIVE handshake deadlocks. Found while debugging a ~4-hour
hang of exactly this kind during smoke runs.
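
The failure mode is easy to reproduce with any buffered sender. The toy model below is not pgsession.c: `SendBuf`, `sendbuf_put`, and `sendbuf_flush` are hypothetical names, and the `on_wire` counter stands in for an actual socket write. It assumes a buffer that is only written out when it fills or when flushed explicitly, which is the behavior the bullet above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEND_BUF_SIZE 8192

/* Toy send buffer; on_wire stands in for bytes handed to the socket. */
typedef struct SendBuf
{
    unsigned char data[SEND_BUF_SIZE];
    size_t      pending;        /* queued but not yet written */
    size_t      on_wire;        /* written out so far */
} SendBuf;

/* Write out everything queued so far. */
void
sendbuf_flush(SendBuf *buf)
{
    assert(buf != NULL);
    buf->on_wire += buf->pending;
    buf->pending = 0;
}

/*
 * Queue a message. The buffer is flushed only when the new message would
 * not fit, so a small message stays queued until someone calls flush.
 */
void
sendbuf_put(SendBuf *buf, const void *msg, size_t len)
{
    assert(buf != NULL && msg != NULL && len <= SEND_BUF_SIZE);

    if (buf->pending + len > SEND_BUF_SIZE)
        sendbuf_flush(buf);
    memcpy(buf->data + buf->pending, msg, len);
    buf->pending += len;
}
```

In this model a short message queued with `sendbuf_put` never reaches the peer unless `sendbuf_flush` follows it, which is why a handshake message that the client must see before it proceeds needs an explicit flush.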
Compatibility:
- Existing simple SELECT and COPY paths are unchanged.
- Unix-socket transport remains supported when --listen_addresses is
not set; nothing forces TCP.
- No new dependencies. The recv_sink uses the same memory and I/O
primitives the rest of pgsession.c uses.
- The "RECEIVE" prefix is only recognized in the simple-query path
(because it needs the server-side prefix detection that the
extended-query protocol doesn't offer). Clients using
PQsendQueryParams keep using the existing path.
Signed-off-by: Tim McLaughlin <tim@gotab.io>
pgduck_server/src/command_line/command_line.c
20 additions & 2 deletions
@@ -57,7 +57,8 @@ print_usage()
 	printf(" --unix_socket_directory <path> Specify the unix socket directory, default is %s\n", DEFAULT_UNIX_DOMAIN_PATH);
 	printf(" --unix_socket_group <group name> Specify the unix socket group owner, default is \"%s\"\n", DEFAULT_UNIX_DOMAIN_GROUP);
 	printf(" --unix_socket_permissions <mask> Specify the unix socket (chmod) permissions, default is %o\n", DEFAULT_UNIX_DOMAIN_PERMISSIONS);
-	printf(" --port <port> Specify the port number, default is %d\n", DEFAULT_PORT);
+	printf(" --listen_addresses <addrs> Comma-separated TCP addresses to listen on (e.g. \"0.0.0.0,::\"). Default: empty (Unix socket only)\n");
+	printf(" --port <port> Specify the port number (used for both Unix socket suffix and TCP listener), default is %d\n", DEFAULT_PORT);
 	printf(" --max_clients <max_clients> Specify the maximum allowed clients, default is %d\n", DEFAULT_MAX_CLIENTS);
 	printf(" --memory_limit=<memory_limit> Optionally specify the maximum memory of pgduck_server similar to DuckDB's memory_limit, the default is 80 percent of the system memory\n");
 	printf(" --continue_on_oom If out of memory error occurs, continue operating\n");
@@ -66,6 +67,7 @@ print_usage()
 	printf(" --check_cli_params_only Only check the cli arguments, do not run the server\n");
 	printf(" --init_file_path <path> Execute all statements in this file on start-up\n");
 	printf(" --cache_dir Specify the directory to use to cache remote files (from S3)\n");
+	printf(" --recv_dir <path> Base dir for RECEIVE-query landing files; default <cache_dir>/recv or /tmp/pgduck_recv\n");
 	printf(" --extensions_dir <path> Install and load extensions in the specified directory\n");
 	printf(" --pidfile <path> Write the pid of this program to the given path\n");