feat(csharp): optimize CloudFetch defaults for JDBC parity (~32% throughput) #377
msrathore-db wants to merge 2 commits into main
📊 CloudFetch Benchmark Results (.NET 8.0 and .NET Framework 4.7.2)
Legend: 🟢 Improvement | 🔴 Regression | ⚪ No change
@@ -43,6 +43,13 @@ public static void Main(string[] args)
// Enable TLS 1.2/1.3 for .NET Framework 4.7.2 (required for modern HTTPS endpoints)
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | (SecurityProtocolType)3072; // 3072 = Tls13
Not related to the current change, but it would be better to do something like ServicePointManager.SecurityProtocol = ServicePointManager.SecurityProtocol | SecurityProtocolType.Tls12 | (SecurityProtocolType)3072; // 3072 = Tls13 in case the host has deliberately disabled TLS 1.1.
Derp! This is in benchmark code. Please ignore.
Yeah please ignore. This is some experimentation on my end.
…provement)
Tune CloudFetch pipeline defaults and architecture to match JDBC driver:
Pipeline optimizations:
- Pre-parse Arrow IPC on download threads (JDBC parity): moves Arrow deserialization from the single reader thread to the 16 download threads. Reader now iterates pre-parsed RecordBatch objects (pure memory access) instead of parsing Arrow IPC on-the-fly. This is the key architectural change: it enables the sliding window to work because the reader can consume chunks faster than downloads complete.
- Replace memory polling (Task.Delay 10ms) with async SemaphoreSlim signaling for instant wakeup on memory release.
Default tuning:
- Increase ParallelDownloads from 3 to 16 (JDBC uses 16 threads)
- Increase PrefetchCount from 2 to 16 (resultQueue 4→32, implicit sliding window)
- Increase MemoryBufferSizeMB from 200 to 400 (supports higher parallelism)
- Add LinkPrefetchWindowSize=128 (downloadQueue capacity, matches JDBC)
Benchmark (catalog_sales SF10, 14.4M rows, 34 columns):
Baseline: ~57K rows/sec (avg), best 70K
Optimized: ~76K rows/sec (avg), best 94K (~40% improvement)
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
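The switch from memory polling to semaphore signaling described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `MemoryGate`, `AcquireAsync`, and `Release` are hypothetical names, and the sketch assumes a single byte budget shared by all downloads. A waiter that still does not fit after being woken simply waits for the next release.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical memory budget gate for illustration; not a type from the driver.
public sealed class MemoryGate
{
    private readonly long _maxBytes;
    private long _usedBytes;
    private readonly object _lock = new object();
    // Released once per memory release so a waiter wakes without polling.
    private readonly SemaphoreSlim _released = new SemaphoreSlim(0);

    public MemoryGate(long maxBytes) { _maxBytes = maxBytes; }

    public long UsedBytes { get { lock (_lock) { return _usedBytes; } } }

    // Reserves the bytes if they fit in the budget; never blocks.
    public bool TryAcquire(long bytes)
    {
        lock (_lock)
        {
            if (_usedBytes + bytes > _maxBytes) return false;
            _usedBytes += bytes;
            return true;
        }
    }

    // Waits asynchronously (no thread is blocked) until the bytes fit.
    public async Task AcquireAsync(long bytes, CancellationToken ct = default)
    {
        if (bytes > _maxBytes)
            throw new ArgumentOutOfRangeException(nameof(bytes), "Request exceeds the total budget.");

        while (!TryAcquire(bytes))
        {
            // A release that happens between the failed TryAcquire and this call
            // is not lost: the semaphore count is already incremented.
            await _released.WaitAsync(ct).ConfigureAwait(false);
        }
    }

    // Returns bytes to the budget and signals one waiter immediately.
    public void Release(long bytes)
    {
        lock (_lock) { _usedBytes -= bytes; }
        _released.Release();
    }
}
```

Compared with a `Task.Delay(10)` loop, the waiter here resumes as soon as `Release` is called rather than on the next polling tick, which is the latency floor the change is meant to remove.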
Summary
- … LinkPrefetchWindow)
- Replace memory polling (Task.Delay(10ms) loop) with async SemaphoreSlim signaling: eliminates the 10ms latency floor and ThreadPool thread blocking
- Add catalog_sales_sf10 to benchmark-queries.json (14.4M rows, 34 columns)
- Add QuickValidation tool for quick correctness + timing checks (uses env vars for credentials)
Benchmark Results
Query: SELECT * FROM main.tpcds_sf10_delta.catalog_sales (14.4M rows, 34 columns)
Parameter Sweep
Selected Config (16/16/400) vs Baseline
Key Design Decisions
resultQueue = PrefetchCount x 2 = 32 is the implicit sliding window. Too small (4) starves downloads. Too big (64) causes GC pressure. 32 is the sweet spot.
downloadQueue = LinkPrefetchWindowSize = 128 lets the fetcher run far ahead (lightweight URL metadata only), ensuring downloads never starve waiting for links.
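The two queue capacities above can be illustrated with bounded collections. This is a sketch under assumptions: `CloudFetchQueues` is a hypothetical type, and `BlockingCollection<T>` is used here only because it makes the bounds explicit; the PR does not say which queue type the driver uses.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical container for illustration; not a type from the driver.
public sealed class CloudFetchQueues
{
    // Lightweight link (URL) metadata waiting to be downloaded.
    public BlockingCollection<string> DownloadQueue { get; }

    // Downloaded chunks waiting for the reader; its bound acts as the sliding window.
    public BlockingCollection<byte[]> ResultQueue { get; }

    public CloudFetchQueues(int prefetchCount = 16, int linkPrefetchWindowSize = 128)
    {
        // downloadQueue capacity = LinkPrefetchWindowSize (128 by default).
        DownloadQueue = new BlockingCollection<string>(linkPrefetchWindowSize);

        // resultQueue capacity = PrefetchCount x 2 (32 by default).
        ResultQueue = new BlockingCollection<byte[]>(prefetchCount * 2);
    }
}
```

With these bounds, `Add` blocks (and `TryAdd` returns false) once a queue is full, so downloads can run at most 32 chunks ahead of the reader while the link fetcher can run up to 128 links ahead of the downloads.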
Async memory signaling via SemaphoreSlim instead of Task.Delay(10ms) polling: truly async (no ThreadPool thread blocked), with instant wakeup on ReleaseMemory().
Test plan
- catalog_returns SF10 (1.4M rows): row count = 1,439,882 verified
- catalog_sales SF10 (14.4M rows): row count = 14,400,425 verified
- … benchmark label
This pull request was AI-assisted by Isaac.