feat(csharp): optimize CloudFetch defaults for JDBC parity (~32% throughput) by msrathore-db · Pull Request #377 · adbc-drivers/databricks

msrathore-db · 2026-03-30T06:59:15Z

Summary

Increase ParallelDownloads from 3 → 16 (matches JDBC's 16-thread pool)
Increase PrefetchCount from 2 → 16 (resultQueue capacity 4 → 32, the implicit sliding window)
Increase MemoryBufferSizeMB from 200 → 400 (supports higher parallelism without starving)
Add LinkPrefetchWindowSize = 128 (downloadQueue capacity, matches JDBC's LinkPrefetchWindow)
Replace memory polling (Task.Delay(10ms) loop) with async SemaphoreSlim signaling — eliminates 10ms latency floor and ThreadPool thread blocking
Add catalog_sales_sf10 to benchmark-queries.json (14.4M rows, 34 columns)
Add QuickValidation tool for quick correctness + timing checks (uses env vars for credentials)

Benchmark Results

Query: SELECT * FROM main.tpcds_sf10_delta.catalog_sales (14.4M rows, 34 columns)

Parameter Sweep

Config	HTTP	Prefetch (rQ)	Memory	Rows/sec
Baseline	3	2 (rQ=4)	200MB	~57K
16/16/400 (this PR)	16	16 (rQ=32)	400MB	~75K
16/32/400	16	32 (rQ=64)	400MB	~55K
16/16/200	16	16 (rQ=32)	200MB	~73K
10/16/400	10	16 (rQ=32)	400MB	~83K
16/20/400	16	20 (rQ=40)	400MB	~45K

Selected Config (16/16/400) vs Baseline

Metric	Baseline	This PR	Delta
Avg rows/sec	~57,000	~75,000	+32%
Best single run	70,128	92,977	+33%
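
The Delta column and the rQ values in the sweep follow from simple arithmetic. A minimal sketch, assuming only the figures quoted above (`SweepMath` and its members are illustrative names, not part of the driver):

```csharp
using System;

public static class SweepMath
{
    // resultQueue capacity as stated in this PR: PrefetchCount x 2.
    public static int ResultQueueCapacity(int prefetchCount) => prefetchCount * 2;

    // Relative change of `current` over `baseline`, in percent.
    public static double PercentChange(double baseline, double current)
        => (current - baseline) / baseline * 100.0;
}
```

With the table's figures, `Math.Round(SweepMath.PercentChange(57_000, 75_000))` is 32 and `Math.Round(SweepMath.PercentChange(70_128, 92_977))` is 33, matching the Delta column; `SweepMath.ResultQueueCapacity(16)` is 32.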

Key Design Decisions

resultQueue = PrefetchCount × 2 = 32 is the implicit sliding window. Too small (4, the baseline) starves downloads. Too big (64) causes GC pressure (~55K rows/sec in the sweep). 32 is the sweet spot among the values tested.
downloadQueue = LinkPrefetchWindowSize = 128 lets the fetcher run far ahead (lightweight URL metadata only), ensuring downloads never starve waiting for links.
Async memory signaling via SemaphoreSlim instead of Task.Delay(10ms) polling — truly async (no ThreadPool thread blocked), with instant wakeup on ReleaseMemory().
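
The signaling approach in the last point can be sketched as follows. This is a minimal illustration under stated assumptions, not the driver's implementation: `MemoryBudget`, `AcquireAsync`, and `Release` are hypothetical names, and the waiter-counting scheme is one possible way to wake pending acquirers without a `Task.Delay` loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical memory budget; the names are illustrative, not the driver's API.
public sealed class MemoryBudget
{
    private readonly object _gate = new object();
    private readonly SemaphoreSlim _released = new SemaphoreSlim(0);
    private readonly long _maxBytes;
    private long _usedBytes;
    private int _waiters;

    public MemoryBudget(long maxBytes) => _maxBytes = maxBytes;

    public long UsedBytes { get { lock (_gate) { return _usedBytes; } } }

    // Reserves `bytes`, waiting asynchronously (no polling) until they fit.
    public async Task AcquireAsync(long bytes, CancellationToken ct = default)
    {
        if (bytes > _maxBytes) throw new ArgumentOutOfRangeException(nameof(bytes));
        while (true)
        {
            lock (_gate)
            {
                if (_usedBytes + bytes <= _maxBytes) { _usedBytes += bytes; return; }
                _waiters++; // registered before waiting, so a later Release cannot be missed
            }
            try { await _released.WaitAsync(ct).ConfigureAwait(false); }
            finally { lock (_gate) { _waiters--; } }
        }
    }

    // Returns `bytes` to the budget and signals every registered waiter to re-check.
    public void Release(long bytes)
    {
        int toWake;
        lock (_gate) { _usedBytes -= bytes; toWake = _waiters; }
        if (toWake > 0) _released.Release(toWake);
    }
}
```

A waiter that wakes but still does not fit simply loops and waits again, so extra signals are harmless. The design choice illustrated here is that a blocked acquirer holds no thread while waiting and resumes as soon as memory is released, rather than on the next 10 ms poll.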

Test plan

catalog_returns SF10 (1.4M rows): row count = 1,439,882 verified
catalog_sales SF10 (14.4M rows): row count = 14,400,425 verified
Build succeeds with 0 errors, 0 warnings
Run benchmark suite with benchmark label

This pull request was AI-assisted by Isaac.

github-actions · 2026-03-30T07:58:16Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	1260.69	408.42	408.42	13	1441548	34
catalog_sales_sf10	8775.18	2808.22	2808.22	44	14400425	34
customer	759.42	37.17	37.17	2	100000	18
inventory	6831.99	366.56	366.56	33	11745000	5
sales_with_timestamps	3460.59	289.68	289.68	15	2880404	13
store_sales_numeric	1820.78	516.95	516.95	15	2880404	16
store_sales_sf10	16218.81	3754.52	3754.52	61	28800501	23
web_sales	1047.26	277.36	277.36	11	719384	34
wide_sales_analysis	4758.08	1325.69	1325.69	26	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	4833.28	630.05	630.05	3	1441548	34
catalog_sales_sf10	57488.72	7004.48	7004.48	18	14400425	34
customer	1825.38	38.79	38.79	0	100000	18
inventory	45894.32	394.57	394.57	2	11745000	5
sales_with_timestamps	14021.43	335.19	335.19	1	2880404	13
store_sales_numeric	12960.08	602.79	602.79	2	2880404	16
store_sales_sf10	108483.19	4075.40	4075.40	46	28800501	23
web_sales	3815.22	503.31	503.31	2	719384	34
wide_sales_analysis	13736.88	3200.25	3200.25	7	2880404	54

🟢 Improvement | 🔴 Regression | ⚪ No change

Format: current_value (baseline) diff%

Baseline: Latest successful run on main branch

Metrics:

Mean: Execution time in milliseconds
Peak Memory: Total bytes allocated during operation in MB
Allocated Memory: Bytes allocated per operation in MB
Gen2: Number of Gen2 garbage collections

github-actions · 2026-03-30T13:55:35Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	1030.56	333.26	333.26	8	1441548	34
catalog_sales_sf10	11355.40	2536.58	2536.58	130	14400425	34
customer	429.13	33.49	33.49	2	100000	18
inventory	6577.67	215.71	215.71	12	11745000	5
sales_with_timestamps	2815.85	145.55	145.55	6	2880404	13
store_sales_numeric	1720.55	378.83	378.83	19	2880404	16
store_sales_sf10	15389.10	3448.34	3448.34	264	28800501	23
web_sales	711.66	208.38	208.38	7	719384	34
wide_sales_analysis	4050.66	1160.05	1160.05	25	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	4308.78	486.08	486.08	4	1441548	34
catalog_sales_sf10	56818.82	6650.03	6650.03	85	14400425	34
customer	923.53	39.00	39.00	0	100000	18
inventory	45319.08	238.09	238.09	3	11745000	5
sales_with_timestamps	13312.51	101.97	101.97	1	2880404	13
store_sales_numeric	12481.35	359.02	359.02	2	2880404	16
store_sales_sf10	107773.68	3719.43	3719.43	61	28800501	23
web_sales	3406.43	359.51	359.51	3	719384	34
wide_sales_analysis	13254.09	2874.72	2874.72	23	2880404	54

github-actions · 2026-03-30T15:11:03Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	2372.62	288.18	288.18	33	1441548	34
catalog_sales_sf10	9022.56	2505.67	2505.67	183	14400425	34
customer	2115.03	33.36	33.36	2	100000	18
inventory	7812.26	209.71	209.71	29	11745000	5
sales_with_timestamps	3793.96	102.48	102.48	24	2880404	13
store_sales_numeric	2860.48	338.08	338.08	40	2880404	16
store_sales_sf10	16291.15	3442.46	3442.46	290	28800501	23
web_sales	1946.31	155.99	155.99	23	719384	34
wide_sales_analysis	6099.68	1148.64	1148.64	57	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	3979.26	516.62	516.62	0	1441548	34
catalog_sales_sf10	57289.41	6681.01	6681.01	22	14400425	34
customer	1263.40	34.99	34.99	0	100000	18
inventory	45952.37	247.32	247.32	3	11745000	5
sales_with_timestamps	13939.86	129.32	129.32	0	2880404	13
store_sales_numeric	12781.17	382.41	382.41	0	2880404	16
store_sales_sf10	108373.50	3747.75	3747.75	15	28800501	23
web_sales	3795.06	383.33	383.33	0	719384	34
wide_sales_analysis	13777.16	2907.47	2907.47	2	2880404	54

CurtHagenlocher · 2026-03-30T15:41:31Z

csharp/Benchmarks/CloudFetchBenchmarkRunner.cs

@@ -43,6 +43,13 @@ public static void Main(string[] args)
            // Enable TLS 1.2/1.3 for .NET Framework 4.7.2 (required for modern HTTPS endpoints)
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | (SecurityProtocolType)3072; // 3072 = Tls13


Not related to the current change, but it would be better to do something like

            ServicePointManager.SecurityProtocol = ServicePointManager.SecurityProtocol | SecurityProtocolType.Tls12 | (SecurityProtocolType)3072; // 3072 = Tls13

in case the host has deliberately disabled TLS 1.1.

Derp! This is in benchmark code. Please ignore.

msrathore-db: Yeah, please ignore. This is some experimentation on my end.
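
For reference, a hedged sketch of the additive form suggested in this thread. One detail worth noting: in `System.Net.SecurityProtocolType`, 3072 is the numeric value of `Tls12`, while `Tls13` is 12288, so the sketch casts 12288 for TLS 1.3 (the cast is only needed on targets where the `Tls13` member is not defined). `TlsFlags` and `AddModernTls` are hypothetical names.

```csharp
using System.Net;

public static class TlsFlags
{
    // 12288 is the numeric value of SecurityProtocolType.Tls13 (3072 is Tls12).
    public const SecurityProtocolType Tls13 = (SecurityProtocolType)12288;

    // Adds TLS 1.2 and TLS 1.3 to whatever is already enabled, without
    // re-enabling protocols the host has switched off.
    public static SecurityProtocolType AddModernTls(SecurityProtocolType current)
        => current | SecurityProtocolType.Tls12 | Tls13;
}
```

Usage would look like `ServicePointManager.SecurityProtocol = TlsFlags.AddModernTls(ServicePointManager.SecurityProtocol);`. If the current value is `SystemDefault` (0), the result is an explicit TLS 1.2 + 1.3 set rather than the OS default.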

github-actions · 2026-03-30T17:39:00Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	2312.72	292.41	292.41	30	1441548	34
catalog_sales_sf10	9016.54	2519.57	2519.57	149	14400425	34
customer	2044.65	33.36	33.36	2	100000	18
inventory	7718.41	218.69	218.69	25	11745000	5
sales_with_timestamps	3491.06	92.84	92.84	38	2880404	13
store_sales_numeric	2767.17	338.75	338.75	37	2880404	16
store_sales_sf10	16399.88	3442.68	3442.68	278	28800501	23
web_sales	1886.43	163.57	163.57	18	719384	34
wide_sales_analysis	5825.44	1151.58	1151.58	61	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	3761.38	512.81	512.81	0	1441548	34
catalog_sales_sf10	56902.08	6675.17	6675.17	18	14400425	34
customer	1094.36	34.98	34.98	0	100000	18
inventory	45582.38	263.17	263.17	2	11745000	5
sales_with_timestamps	13394.89	131.89	131.89	0	2880404	13
store_sales_numeric	12565.32	384.78	384.78	0	2880404	16
store_sales_sf10	107997.20	3742.39	3742.39	31	28800501	23
web_sales	3560.60	382.03	382.03	0	719384	34
wide_sales_analysis	13405.32	2902.55	2902.55	2	2880404	54

github-actions · 2026-04-02T10:06:06Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	1085.80	1545.46	1545.46	13	1441548	34

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	3906.76	1854.13	1854.13	9	1441548	34

github-actions · 2026-04-02T12:05:41Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	1464.78	1540.76	1540.76	15	1441548	34
catalog_sales_sf10	9166.15	14592.50	14592.50	66	14400425	34
customer	690.97	53.93	53.93	1	100000	18
inventory	6707.19	731.39	731.39	20	11745000	5
sales_with_timestamps	2908.39	812.27	812.27	14	2880404	13
store_sales_numeric	1945.41	1918.43	1918.43	16	2880404	16
store_sales_sf10	15462.31	22162.70	22162.70	108	28800501	23
web_sales	1050.79	766.82	766.82	14	719384	34
wide_sales_analysis	6963.46	5464.58	5464.58	30	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	3857.90	1858.75	1858.75	8	1441548	34

github-actions · 2026-04-03T18:04:20Z

📊 CloudFetch Benchmark Results

.NET 8.0

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	1118.85	1546.43	1546.43	10	1441548	34
catalog_sales_sf10	8476.12	14559.63	14559.63	61	14400425	34
customer	448.60	58.05	58.05	1	100000	18
inventory	6554.74	727.26	727.26	13	11745000	5
sales_with_timestamps	2911.79	849.01	849.01	8	2880404	13
store_sales_numeric	1492.18	1899.64	1899.64	14	2880404	16
store_sales_sf10	16111.72	22171.60	22171.60	88	28800501	23
web_sales	739.51	706.58	706.58	8	719384	34
wide_sales_analysis	4958.31	5604.30	5604.30	29	2880404	54

.NET Framework 4.7.2

Query	Mean (ms)	Peak Memory (MB)	Allocated Memory (MB)	Gen2	Rows	Cols
catalog_sales	4133.20	1856.74	1856.74	11	1441548	34
catalog_sales_sf10	58165.59	21485.25	21485.25	67	14400425	34
customer	910.76	64.47	64.47	2	100000	18
inventory	46176.97	745.73	745.73	5	11745000	5
sales_with_timestamps	13393.95	850.39	850.39	7	2880404	13
store_sales_numeric	12449.10	2202.72	2202.72	8	2880404	16
store_sales_sf10	109855.37	26969.23	26969.23	80	28800501	23
web_sales	3478.31	989.51	989.51	6	719384	34
wide_sales_analysis	13011.72	8163.36	8163.36	25	2880404	54

…provement)

Tune CloudFetch pipeline defaults and architecture to match JDBC driver:

Pipeline optimizations:
- Pre-parse Arrow IPC on download threads (JDBC parity): moves Arrow deserialization from the single reader thread to the 16 download threads. Reader now iterates pre-parsed RecordBatch objects (pure memory access) instead of parsing Arrow IPC on-the-fly. This is the key architectural change: it enables the sliding window to work because the reader can consume chunks faster than downloads complete.
- Replace memory polling (Task.Delay 10ms) with async SemaphoreSlim signaling for instant wakeup on memory release.

Default tuning:
- Increase ParallelDownloads from 3 to 16 (JDBC uses 16 threads)
- Increase PrefetchCount from 2 to 16 (resultQueue 4→32, implicit sliding window)
- Increase MemoryBufferSizeMB from 200 to 400 (supports higher parallelism)
- Add LinkPrefetchWindowSize=128 (downloadQueue capacity, matches JDBC)

Benchmark (catalog_sales SF10, 14.4M rows, 34 columns):
Baseline: ~57K rows/sec (avg), best 70K
Optimized: ~76K rows/sec (avg), best 94K (~40% improvement)

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
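
A minimal sketch of the pre-parse idea from the commit message, assuming .NET 8 and `System.Threading.Channels`: each download task both fetches and parses its chunk, results are queued in link order in a bounded queue, and the reader only awaits finished batches. `PrefetchPipeline`, `RunAsync`, and the `fetchAndParse` delegate are hypothetical names; the driver's actual classes, the 128-entry link queue, cancellation, and error handling are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PrefetchPipeline
{
    // Runs `fetchAndParse` for each link with at most `parallelDownloads` in flight,
    // keeps at most `resultQueueCapacity` pending results queued, and yields the
    // parsed batches in link order.
    public static async IAsyncEnumerable<TBatch> RunAsync<TLink, TBatch>(
        IEnumerable<TLink> links,
        Func<TLink, Task<TBatch>> fetchAndParse,
        int parallelDownloads = 16,
        int resultQueueCapacity = 32)
    {
        // The bounded queue is the sliding window: the producer stalls once it is full.
        var results = Channel.CreateBounded<Task<TBatch>>(resultQueueCapacity);
        var slots = new SemaphoreSlim(parallelDownloads);

        var producer = Task.Run(async () =>
        {
            try
            {
                foreach (var link in links)
                {
                    await slots.WaitAsync().ConfigureAwait(false);
                    // Download and parse happen together, off the reader's path.
                    Task<TBatch> work = Task.Run(async () =>
                    {
                        try { return await fetchAndParse(link).ConfigureAwait(false); }
                        finally { slots.Release(); }
                    });
                    // Queue the task itself so output order matches link order.
                    await results.Writer.WriteAsync(work).ConfigureAwait(false);
                }
            }
            finally { results.Writer.Complete(); }
        });

        await foreach (var work in results.Reader.ReadAllAsync())
        {
            // The reader only awaits already-parsed batches.
            yield return await work;
        }
        await producer;
    }
}
```

Queueing `Task<TBatch>` rather than finished batches is one way to keep results ordered while downloads complete out of order; whether the driver does the same is not stated in this PR.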

msrathore-db added the benchmark label (Run performance benchmarks on this PR), Mar 30, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), March 30, 2026 07:12

msrathore-db force-pushed the cloudfetch-parity branch from f9fe389 to fa6da7b, March 30, 2026 07:35

msrathore-db force-pushed the cloudfetch-parity branch from fa6da7b to fea5d6d, March 30, 2026 11:21

msrathore-db added and removed the benchmark label, Mar 30, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), March 30, 2026 13:26

msrathore-db force-pushed the cloudfetch-parity branch from fea5d6d to 4de0e50, March 30, 2026 14:33

msrathore-db added and removed the benchmark label, Mar 30, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), March 30, 2026 14:37

CurtHagenlocher reviewed Mar 30, 2026

msrathore-db force-pushed the cloudfetch-parity branch from 4de0e50 to 9be0cb2, March 30, 2026 16:23

msrathore-db added and removed the benchmark label, Mar 30, 2026

msrathore-db marked this pull request as draft March 30, 2026 17:03

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), March 30, 2026 17:03

msrathore-db force-pushed the cloudfetch-parity branch from 9be0cb2 to 332e0ca, March 30, 2026 19:21

msrathore-db added and removed the benchmark label, Mar 30, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), March 30, 2026 19:25

msrathore-db added and removed the benchmark label, Apr 2, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), April 2, 2026 05:18

msrathore-db had a problem deploying to azure-prod with GitHub Actions (Error), April 2, 2026 05:18

msrathore-db force-pushed the cloudfetch-parity branch 2 times, most recently from 60d41f6 to 5221e42, April 2, 2026 06:01

msrathore-db added and removed the benchmark label, Apr 2, 2026

msrathore-db had a problem deploying to azure-prod with GitHub Actions (Error), April 2, 2026 06:04

msrathore-db force-pushed the cloudfetch-parity branch from 5221e42 to 3ff0e84, April 2, 2026 06:24

msrathore-db added and removed the benchmark label, Apr 2, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), April 2, 2026 08:54

msrathore-db force-pushed the cloudfetch-parity branch 2 times, most recently from 816bfe2 to efe9c92, April 2, 2026 10:39

msrathore-db added and removed the benchmark label, Apr 2, 2026

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), April 2, 2026 10:43

msrathore-db force-pushed the cloudfetch-parity branch from efe9c92 to 0094324, April 2, 2026 13:35

msrathore-db added and removed the benchmark label, Apr 3, 2026

msrathore-db deployed to azure-prod with GitHub Actions (Active), April 3, 2026 17:34

msrathore-db temporarily deployed to azure-prod with GitHub Actions (Inactive), April 3, 2026 17:34

msrathore-db wants to merge 2 commits into main from cloudfetch-parity
