
feat(csharp): optimize CloudFetch defaults for JDBC parity (~32% throughput) #377

Draft
msrathore-db wants to merge 2 commits into main from cloudfetch-parity

@msrathore-db
Collaborator

Summary

  • Increase ParallelDownloads from 3 → 16 (matches JDBC's 16-thread pool)
  • Increase PrefetchCount from 2 → 16 (resultQueue capacity 4 → 32, the implicit sliding window)
  • Increase MemoryBufferSizeMB from 200 → 400 (leaves enough buffer for the higher parallelism, so downloads are not starved of memory)
  • Add LinkPrefetchWindowSize = 128 (downloadQueue capacity, matches JDBC's LinkPrefetchWindow)
  • Replace memory polling (Task.Delay(10ms) loop) with async SemaphoreSlim signaling — eliminates 10ms latency floor and ThreadPool thread blocking
  • Add catalog_sales_sf10 to benchmark-queries.json (14.4M rows, 34 columns)
  • Add a QuickValidation tool for quick correctness and timing checks (reads credentials from environment variables)
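
A minimal C# sketch of how the new defaults map to the queue capacities described above. The names are hypothetical local constants chosen to mirror the parameter names; they are not the driver's configuration API. The only rules assumed are the ones stated in this PR: resultQueue capacity = PrefetchCount × 2, and downloadQueue capacity = LinkPrefetchWindowSize.

```csharp
using System;

// Hypothetical local names mirroring the parameters in the summary;
// they are not the driver's actual configuration API.
const int ParallelDownloads = 16;       // concurrent HTTP downloads (was 3)
const int PrefetchCount = 16;           // was 2
const int MemoryBufferSizeMB = 400;     // was 200
const int LinkPrefetchWindowSize = 128; // new: downloadQueue capacity

// Per the PR description, resultQueue capacity is PrefetchCount x 2.
int ResultQueueCapacity(int prefetchCount) => prefetchCount * 2;

int resultQueueCapacity = ResultQueueCapacity(PrefetchCount);
int downloadQueueCapacity = LinkPrefetchWindowSize;
long memoryBudgetBytes = MemoryBufferSizeMB * 1024L * 1024L;

// prints: parallelDownloads=16, resultQueue=32, downloadQueue=128, memoryBudgetBytes=419430400
Console.WriteLine($"parallelDownloads={ParallelDownloads}, resultQueue={resultQueueCapacity}, " +
                  $"downloadQueue={downloadQueueCapacity}, memoryBudgetBytes={memoryBudgetBytes}");
```

Plugging the sweep's PrefetchCount values (2, 16, 20, 32) into the same rule gives the rQ values listed in the parameter sweep (4, 32, 40, 64).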

Benchmark Results

Query: SELECT * FROM main.tpcds_sf10_delta.catalog_sales (14.4M rows, 34 columns)

Parameter Sweep

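| Config | HTTP (ParallelDownloads) | Prefetch (rQ = resultQueue capacity) | Memory | Rows/sec |
|---|---|---|---|---|
| Baseline | 3 | 2 (rQ=4) | 200 MB | ~57K |
| 16/16/400 (this PR) | 16 | 16 (rQ=32) | 400 MB | ~75K |
| 16/32/400 | 16 | 32 (rQ=64) | 400 MB | ~55K |
| 16/16/200 | 16 | 16 (rQ=32) | 200 MB | ~73K |
| 10/16/400 | 10 | 16 (rQ=32) | 400 MB | ~83K |
| 16/20/400 | 16 | 20 (rQ=40) | 400 MB | ~45K |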

Selected Config (16/16/400) vs Baseline

| Metric | Baseline | This PR | Delta |
|---|---|---|---|
| Avg rows/sec | ~57,000 | ~75,000 | +32% |
| Best single run | 70,128 | 92,977 | +33% |
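
As a sanity check, the Delta column follows from the figures in the table (the averages are rounded to the nearest thousand):

```math
\frac{75{,}000 - 57{,}000}{57{,}000} \approx 0.316 \;(\approx +32\%)
\qquad
\frac{92{,}977 - 70{,}128}{70{,}128} = \frac{22{,}849}{70{,}128} \approx 0.326 \;(\approx +33\%)
```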

Key Design Decisions

  1. resultQueue = PrefetchCount × 2 = 32 is the implicit sliding window. Too small (4) starves downloads; too big (64) causes GC pressure. Among the resultQueue sizes tried with 16 parallel downloads and 400 MB (32, 40, 64), 32 gave the highest throughput.

  2. downloadQueue = LinkPrefetchWindowSize = 128 lets the link fetcher run far ahead (the queue holds only lightweight URL metadata), so downloads never starve waiting for links.

  3. Async memory signaling via SemaphoreSlim instead of Task.Delay(10ms) polling: waiting is truly async (no ThreadPool thread is blocked), and waiters wake up immediately when ReleaseMemory() is called.
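
The signaling pattern in item 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's implementation: the 400 MB budget, the helper names (`TryReserve`, `AcquireMemoryAsync`, `UsedBytes`), and the choice to let only one waiter wait at a time (via a gate semaphore) are hypothetical; `ReleaseMemory` only mirrors the method name mentioned above. A second `SemaphoreSlim` counts release signals, so a waiter wakes as soon as memory is returned instead of re-checking every 10 ms, and waiting never blocks a ThreadPool thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical budget mirroring MemoryBufferSizeMB = 400.
long capacityBytes = 400L * 1024 * 1024;
long usedBytes = 0;
object sync = new object();

// Gate: only one acquirer waits at a time, so each release wakes the request
// at the head of the line instead of an arbitrary waiter.
var acquireGate = new SemaphoreSlim(1, 1);
// Counts "memory was released" signals; replaces the Task.Delay(10ms) poll.
var memoryReleased = new SemaphoreSlim(0);

bool TryReserve(long size)
{
    lock (sync)
    {
        if (usedBytes + size > capacityBytes) return false;
        usedBytes += size;
        return true;
    }
}

async Task AcquireMemoryAsync(long size, CancellationToken ct = default)
{
    if (size > capacityBytes)
        throw new ArgumentOutOfRangeException(nameof(size), "Request exceeds the total budget.");

    await acquireGate.WaitAsync(ct).ConfigureAwait(false);
    try
    {
        // Re-check after every signal: it may be stale, or the freed amount may be too small.
        while (!TryReserve(size))
            await memoryReleased.WaitAsync(ct).ConfigureAwait(false);
    }
    finally
    {
        acquireGate.Release();
    }
}

void ReleaseMemory(long size)
{
    lock (sync) { usedBytes -= size; }
    memoryReleased.Release(); // wakes a pending waiter immediately; no polling
}

long UsedBytes() { lock (sync) { return usedBytes; } }

// Usage: the second request waits (asynchronously) until the first releases memory.
const long MB = 1024 * 1024;
await AcquireMemoryAsync(300 * MB);
Task pending = AcquireMemoryAsync(200 * MB); // 300 + 200 > 400, so this must wait
Console.WriteLine($"second request completed before release: {pending.IsCompleted}");
ReleaseMemory(300 * MB);
await pending;
Console.WriteLine($"reserved after handoff: {UsedBytes() / MB} MB");
```

Serializing waiters through the gate means every release reaches the request at the head of the line, so a small request is never left asleep while a larger one consumes the only wakeup. The trade-off is head-of-line blocking, which is probably acceptable if chunks are consumed in order anyway (an assumption). Releases that happen while nobody is waiting leave stale counts on the signal semaphore; the loop simply re-checks and waits again.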

Test plan

  • catalog_returns SF10 (1.4M rows): row count = 1,439,882 verified
  • catalog_sales SF10 (14.4M rows): row count = 14,400,425 verified
  • Build succeeds with 0 errors, 0 warnings
  • Run benchmark suite with benchmark label

This pull request was AI-assisted by Isaac.

msrathore-db added the benchmark label (Run performance benchmarks on this PR) on Mar 30, 2026
@github-actions

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 1260.69 | 408.42 | 408.42 | 13 | 1441548 | 34 |
| catalog_sales_sf10 | 8775.18 | 2808.22 | 2808.22 | 44 | 14400425 | 34 |
| customer | 759.42 | 37.17 | 37.17 | 2 | 100000 | 18 |
| inventory | 6831.99 | 366.56 | 366.56 | 33 | 11745000 | 5 |
| sales_with_timestamps | 3460.59 | 289.68 | 289.68 | 15 | 2880404 | 13 |
| store_sales_numeric | 1820.78 | 516.95 | 516.95 | 15 | 2880404 | 16 |
| store_sales_sf10 | 16218.81 | 3754.52 | 3754.52 | 61 | 28800501 | 23 |
| web_sales | 1047.26 | 277.36 | 277.36 | 11 | 719384 | 34 |
| wide_sales_analysis | 4758.08 | 1325.69 | 1325.69 | 26 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 4833.28 | 630.05 | 630.05 | 3 | 1441548 | 34 |
| catalog_sales_sf10 | 57488.72 | 7004.48 | 7004.48 | 18 | 14400425 | 34 |
| customer | 1825.38 | 38.79 | 38.79 | 0 | 100000 | 18 |
| inventory | 45894.32 | 394.57 | 394.57 | 2 | 11745000 | 5 |
| sales_with_timestamps | 14021.43 | 335.19 | 335.19 | 1 | 2880404 | 13 |
| store_sales_numeric | 12960.08 | 602.79 | 602.79 | 2 | 2880404 | 16 |
| store_sales_sf10 | 108483.19 | 4075.40 | 4075.40 | 46 | 28800501 | 23 |
| web_sales | 3815.22 | 503.31 | 503.31 | 2 | 719384 | 34 |
| wide_sales_analysis | 13736.88 | 3200.25 | 3200.25 | 7 | 2880404 | 54 |

🟢 Improvement | 🔴 Regression | ⚪ No change

Format: current_value (baseline) diff%

  • Baseline: Latest successful run on main branch

Metrics:

  • Mean: Execution time in milliseconds
  • Peak Memory: Total bytes allocated during operation in MB
  • Allocated Memory: Bytes allocated per operation in MB
  • Gen2: Number of Gen2 garbage collections
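
The benchmark harness itself is not shown in this PR. As a rough illustration only, allocated bytes and Gen2 counts of the kind reported above can be collected with standard .NET GC APIs by diffing counters around an operation. `GC.GetTotalAllocatedBytes` is a .NET Core 3.0+ API, so this sketch targets .NET 8 rather than .NET Framework 4.7.2; it also times a single run rather than computing a mean, and the `Measure` helper is hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: diff GC counters around one run of an operation.
(long AllocatedBytes, int Gen2Collections, double ElapsedMs) Measure(Action operation)
{
    long allocatedBefore = GC.GetTotalAllocatedBytes(precise: true);
    int gen2Before = GC.CollectionCount(2);
    var stopwatch = Stopwatch.StartNew();

    operation();

    stopwatch.Stop();
    long allocated = GC.GetTotalAllocatedBytes(precise: true) - allocatedBefore;
    int gen2 = GC.CollectionCount(2) - gen2Before;
    return (allocated, gen2, stopwatch.Elapsed.TotalMilliseconds);
}

// Example workload: allocate a 10 MB buffer, then force a full (Gen2) collection.
var result = Measure(() =>
{
    byte[] buffer = new byte[10 * 1024 * 1024];
    GC.KeepAlive(buffer);
    GC.Collect(2);
});

Console.WriteLine($"Allocated: {result.AllocatedBytes / (1024.0 * 1024.0):F2} MB, " +
                  $"Gen2: {result.Gen2Collections}, Elapsed: {result.ElapsedMs:F2} ms");
```

A real harness would repeat the run several times and average the elapsed time to produce a Mean column.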

msrathore-db added and removed the benchmark label on Mar 30, 2026
@github-actions

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 1030.56 | 333.26 | 333.26 | 8 | 1441548 | 34 |
| catalog_sales_sf10 | 11355.40 | 2536.58 | 2536.58 | 130 | 14400425 | 34 |
| customer | 429.13 | 33.49 | 33.49 | 2 | 100000 | 18 |
| inventory | 6577.67 | 215.71 | 215.71 | 12 | 11745000 | 5 |
| sales_with_timestamps | 2815.85 | 145.55 | 145.55 | 6 | 2880404 | 13 |
| store_sales_numeric | 1720.55 | 378.83 | 378.83 | 19 | 2880404 | 16 |
| store_sales_sf10 | 15389.10 | 3448.34 | 3448.34 | 264 | 28800501 | 23 |
| web_sales | 711.66 | 208.38 | 208.38 | 7 | 719384 | 34 |
| wide_sales_analysis | 4050.66 | 1160.05 | 1160.05 | 25 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 4308.78 | 486.08 | 486.08 | 4 | 1441548 | 34 |
| catalog_sales_sf10 | 56818.82 | 6650.03 | 6650.03 | 85 | 14400425 | 34 |
| customer | 923.53 | 39.00 | 39.00 | 0 | 100000 | 18 |
| inventory | 45319.08 | 238.09 | 238.09 | 3 | 11745000 | 5 |
| sales_with_timestamps | 13312.51 | 101.97 | 101.97 | 1 | 2880404 | 13 |
| store_sales_numeric | 12481.35 | 359.02 | 359.02 | 2 | 2880404 | 16 |
| store_sales_sf10 | 107773.68 | 3719.43 | 3719.43 | 61 | 28800501 | 23 |
| web_sales | 3406.43 | 359.51 | 359.51 | 3 | 719384 | 34 |
| wide_sales_analysis | 13254.09 | 2874.72 | 2874.72 | 23 | 2880404 | 54 |

msrathore-db added and removed the benchmark label on Mar 30, 2026
@github-actions

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 2372.62 | 288.18 | 288.18 | 33 | 1441548 | 34 |
| catalog_sales_sf10 | 9022.56 | 2505.67 | 2505.67 | 183 | 14400425 | 34 |
| customer | 2115.03 | 33.36 | 33.36 | 2 | 100000 | 18 |
| inventory | 7812.26 | 209.71 | 209.71 | 29 | 11745000 | 5 |
| sales_with_timestamps | 3793.96 | 102.48 | 102.48 | 24 | 2880404 | 13 |
| store_sales_numeric | 2860.48 | 338.08 | 338.08 | 40 | 2880404 | 16 |
| store_sales_sf10 | 16291.15 | 3442.46 | 3442.46 | 290 | 28800501 | 23 |
| web_sales | 1946.31 | 155.99 | 155.99 | 23 | 719384 | 34 |
| wide_sales_analysis | 6099.68 | 1148.64 | 1148.64 | 57 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 3979.26 | 516.62 | 516.62 | 0 | 1441548 | 34 |
| catalog_sales_sf10 | 57289.41 | 6681.01 | 6681.01 | 22 | 14400425 | 34 |
| customer | 1263.40 | 34.99 | 34.99 | 0 | 100000 | 18 |
| inventory | 45952.37 | 247.32 | 247.32 | 3 | 11745000 | 5 |
| sales_with_timestamps | 13939.86 | 129.32 | 129.32 | 0 | 2880404 | 13 |
| store_sales_numeric | 12781.17 | 382.41 | 382.41 | 0 | 2880404 | 16 |
| store_sales_sf10 | 108373.50 | 3747.75 | 3747.75 | 15 | 28800501 | 23 |
| web_sales | 3795.06 | 383.33 | 383.33 | 0 | 719384 | 34 |
| wide_sales_analysis | 13777.16 | 2907.47 | 2907.47 | 2 | 2880404 | 54 |

@@ -43,6 +43,13 @@ public static void Main(string[] args)
// Enable TLS 1.2/1.3 for .NET Framework 4.7.2 (required for modern HTTPS endpoints)
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | (SecurityProtocolType)3072; // 3072 = Tls13
Collaborator

Not related to the current change, but it would be better to OR the new flags into the existing value, e.g. `ServicePointManager.SecurityProtocol = ServicePointManager.SecurityProtocol | SecurityProtocolType.Tls12 | (SecurityProtocolType)3072; // 3072 = Tls13`, rather than also enabling Tls11, in case the host has deliberately disabled TLS 1.1.

Collaborator

Derp! This is in benchmark code. Please ignore.

Collaborator Author

Yeah, please ignore. This is some experimentation on my end.

msrathore-db added and removed the benchmark label on Mar 30, 2026
msrathore-db marked this pull request as draft on March 30, 2026 17:03
@github-actions

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 2312.72 | 292.41 | 292.41 | 30 | 1441548 | 34 |
| catalog_sales_sf10 | 9016.54 | 2519.57 | 2519.57 | 149 | 14400425 | 34 |
| customer | 2044.65 | 33.36 | 33.36 | 2 | 100000 | 18 |
| inventory | 7718.41 | 218.69 | 218.69 | 25 | 11745000 | 5 |
| sales_with_timestamps | 3491.06 | 92.84 | 92.84 | 38 | 2880404 | 13 |
| store_sales_numeric | 2767.17 | 338.75 | 338.75 | 37 | 2880404 | 16 |
| store_sales_sf10 | 16399.88 | 3442.68 | 3442.68 | 278 | 28800501 | 23 |
| web_sales | 1886.43 | 163.57 | 163.57 | 18 | 719384 | 34 |
| wide_sales_analysis | 5825.44 | 1151.58 | 1151.58 | 61 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 3761.38 | 512.81 | 512.81 | 0 | 1441548 | 34 |
| catalog_sales_sf10 | 56902.08 | 6675.17 | 6675.17 | 18 | 14400425 | 34 |
| customer | 1094.36 | 34.98 | 34.98 | 0 | 100000 | 18 |
| inventory | 45582.38 | 263.17 | 263.17 | 2 | 11745000 | 5 |
| sales_with_timestamps | 13394.89 | 131.89 | 131.89 | 0 | 2880404 | 13 |
| store_sales_numeric | 12565.32 | 384.78 | 384.78 | 0 | 2880404 | 16 |
| store_sales_sf10 | 107997.20 | 3742.39 | 3742.39 | 31 | 28800501 | 23 |
| web_sales | 3560.60 | 382.03 | 382.03 | 0 | 719384 | 34 |
| wide_sales_analysis | 13405.32 | 2902.55 | 2902.55 | 2 | 2880404 | 54 |

msrathore-db added and removed the benchmark label on Mar 30, 2026
msrathore-db added and removed the benchmark label on Apr 2, 2026
msrathore-db force-pushed the cloudfetch-parity branch 2 times, most recently from 60d41f6 to 5221e42 on April 2, 2026 06:01
msrathore-db added and removed the benchmark label on Apr 2, 2026
msrathore-db added and removed the benchmark label on Apr 2, 2026
@github-actions

github-actions bot commented Apr 2, 2026

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 1085.80 | 1545.46 | 1545.46 | 13 | 1441548 | 34 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 3906.76 | 1854.13 | 1854.13 | 9 | 1441548 | 34 |

msrathore-db force-pushed the cloudfetch-parity branch 2 times, most recently from 816bfe2 to efe9c92 on April 2, 2026 10:39
msrathore-db added and removed the benchmark label on Apr 2, 2026
@github-actions

github-actions bot commented Apr 2, 2026

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 1464.78 | 1540.76 | 1540.76 | 15 | 1441548 | 34 |
| catalog_sales_sf10 | 9166.15 | 14592.50 | 14592.50 | 66 | 14400425 | 34 |
| customer | 690.97 | 53.93 | 53.93 | 1 | 100000 | 18 |
| inventory | 6707.19 | 731.39 | 731.39 | 20 | 11745000 | 5 |
| sales_with_timestamps | 2908.39 | 812.27 | 812.27 | 14 | 2880404 | 13 |
| store_sales_numeric | 1945.41 | 1918.43 | 1918.43 | 16 | 2880404 | 16 |
| store_sales_sf10 | 15462.31 | 22162.70 | 22162.70 | 108 | 28800501 | 23 |
| web_sales | 1050.79 | 766.82 | 766.82 | 14 | 719384 | 34 |
| wide_sales_analysis | 6963.46 | 5464.58 | 5464.58 | 30 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 3857.90 | 1858.75 | 1858.75 | 8 | 1441548 | 34 |

msrathore-db added and removed the benchmark label on Apr 3, 2026
@github-actions

github-actions bot commented Apr 3, 2026

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 1118.85 | 1546.43 | 1546.43 | 10 | 1441548 | 34 |
| catalog_sales_sf10 | 8476.12 | 14559.63 | 14559.63 | 61 | 14400425 | 34 |
| customer | 448.60 | 58.05 | 58.05 | 1 | 100000 | 18 |
| inventory | 6554.74 | 727.26 | 727.26 | 13 | 11745000 | 5 |
| sales_with_timestamps | 2911.79 | 849.01 | 849.01 | 8 | 2880404 | 13 |
| store_sales_numeric | 1492.18 | 1899.64 | 1899.64 | 14 | 2880404 | 16 |
| store_sales_sf10 | 16111.72 | 22171.60 | 22171.60 | 88 | 28800501 | 23 |
| web_sales | 739.51 | 706.58 | 706.58 | 8 | 719384 | 34 |
| wide_sales_analysis | 4958.31 | 5604.30 | 5604.30 | 29 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
|---|---|---|---|---|---|---|
| catalog_sales | 4133.20 | 1856.74 | 1856.74 | 11 | 1441548 | 34 |
| catalog_sales_sf10 | 58165.59 | 21485.25 | 21485.25 | 67 | 14400425 | 34 |
| customer | 910.76 | 64.47 | 64.47 | 2 | 100000 | 18 |
| inventory | 46176.97 | 745.73 | 745.73 | 5 | 11745000 | 5 |
| sales_with_timestamps | 13393.95 | 850.39 | 850.39 | 7 | 2880404 | 13 |
| store_sales_numeric | 12449.10 | 2202.72 | 2202.72 | 8 | 2880404 | 16 |
| store_sales_sf10 | 109855.37 | 26969.23 | 26969.23 | 80 | 28800501 | 23 |
| web_sales | 3478.31 | 989.51 | 989.51 | 6 | 719384 | 34 |
| wide_sales_analysis | 13011.72 | 8163.36 | 8163.36 | 25 | 2880404 | 54 |

…provement)

Tune CloudFetch pipeline defaults and architecture to match JDBC driver:

Pipeline optimizations:
- Pre-parse Arrow IPC on download threads (JDBC parity): moves Arrow
  deserialization from the single reader thread to the 16 download threads.
  The reader now iterates pre-parsed RecordBatch objects (pure memory access)
  instead of parsing Arrow IPC on the fly. This is the key architectural
  change: it makes the sliding window effective, because the reader can now
  consume chunks faster than downloads complete.
- Replace memory polling (Task.Delay 10ms) with async SemaphoreSlim
  signaling for instant wakeup on memory release.
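
The pre-parse change and the resultQueue sliding window can be sketched together in C#. Everything here is a simplified stand-in, not the driver's code: `DownloadAndParseAsync` is hypothetical, simulates an HTTP download with `Task.Delay`, and returns an `int[]` in place of a parsed Arrow `RecordBatch`; the constants mirror this PR's defaults (16 parallel downloads, resultQueue = 32). The bounded channel holds chunk tasks in request order, so the reader consumes results in order while up to 16 downloads, each parsing on its own task, run ahead of it. It uses `System.Threading.Channels` and `Random.Shared`, so it assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

const int ParallelDownloads = 16;   // mirrors ParallelDownloads
const int ResultQueueCapacity = 32; // mirrors PrefetchCount (16) x 2
const int RowsPerChunk = 10;        // hypothetical chunk size for the demo

var downloadSlots = new SemaphoreSlim(ParallelDownloads);

// Hypothetical stand-in for "download a chunk over HTTP, then parse Arrow IPC".
// Parsing happens here, on the download task, not on the reader.
async Task<int[]> DownloadAndParseAsync(int chunkIndex)
{
    await downloadSlots.WaitAsync().ConfigureAwait(false);
    try
    {
        await Task.Delay(Random.Shared.Next(1, 5)).ConfigureAwait(false); // simulated network latency
        return Enumerable.Range(chunkIndex * RowsPerChunk, RowsPerChunk).ToArray(); // simulated parse
    }
    finally
    {
        downloadSlots.Release();
    }
}

// Bounded queue of chunk tasks in request order: this is the sliding window.
var resultQueue = Channel.CreateBounded<Task<int[]>>(ResultQueueCapacity);

async Task ProduceAsync(int chunkCount)
{
    ChannelWriter<Task<int[]>> writer = resultQueue.Writer;
    for (int i = 0; i < chunkCount; i++)
    {
        // Wait for a free slot before starting the next download.
        if (!await writer.WaitToWriteAsync().ConfigureAwait(false)) break;
        writer.TryWrite(DownloadAndParseAsync(i)); // single producer, so the slot is still free
    }
    writer.Complete();
}

// The reader awaits chunks strictly in order and only touches already-parsed data.
async Task<List<int>> ConsumeAsync()
{
    var rows = new List<int>();
    await foreach (Task<int[]> chunk in resultQueue.Reader.ReadAllAsync())
        rows.AddRange(await chunk.ConfigureAwait(false));
    return rows;
}

const int ChunkCount = 100;
Task producer = ProduceAsync(ChunkCount);
List<int> allRows = await ConsumeAsync();
await producer;
Console.WriteLine($"rows read in order: {allRows.Count}");
```

Because parsing completes inside each download task, the reader's loop only appends rows that are already materialized, which is the property the commit message relies on for the reader to outpace the downloads.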

Default tuning:
- Increase ParallelDownloads from 3 to 16 (JDBC uses 16 threads)
- Increase PrefetchCount from 2 to 16 (resultQueue 4→32, implicit sliding window)
- Increase MemoryBufferSizeMB from 200 to 400 (supports higher parallelism)
- Add LinkPrefetchWindowSize=128 (downloadQueue capacity, matches JDBC)

Benchmark (catalog_sales SF10, 14.4M rows, 34 columns):
  Baseline:  ~57K rows/sec (avg), best 70K
  Optimized: ~76K rows/sec (avg), best 94K (~33% improvement)

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

Labels

benchmark Run performance benchmarks on this PR


2 participants