Commit df0898b

Add odbc native app
1 parent c552cac commit df0898b

32 files changed

Lines changed: 5135 additions & 1 deletion
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
# ICM 792529661: Native Memory OOM with `Encrypt=Strict`

## Executive Summary

When using **`Encrypt=Strict`** (TDS 8.0) with **SQL Authentication** on **Windows native SNI**, each new physical connection opened by `SqlConnection.Open()` leaks **~50–100 KB of native memory** that is never reclaimed, even after the connection is closed and disposed. Under high-throughput workloads, this leads to **out-of-memory crashes**.

**Root Cause:** Windows SChannel's **TLS 1.3 session ticket cache** stores resumption tickets in a process-global, per-credential cache. Each new TLS 1.3 connection receives session tickets from the server, and SChannel caches them indefinitely. There is **no public API** to selectively evict, limit, or disable this cache from user mode.

**Key observation: the leak does NOT occur with:**
- Managed SNI (uses .NET `SslStream`, which doesn't use SChannel's session cache)
- TLS 1.2 connections (no TLS 1.3 post-handshake session tickets)
- Non-Strict encryption modes (TLS handshake is handled differently)

---

## Detailed Technical Explanation

### 1. The Connection Flow with `Encrypt=Strict` (TDS 8.0)

In TDS 8.0 ("Strict" encryption), the TLS handshake happens **before any TDS traffic**. The flow is:

```
Client                   SQL Server
|                                 |
|--- TCP Connect ---------------->|
|--- TLS ClientHello ------------>|  ← TLS wraps the entire connection
|<-- TLS ServerHello + Cert ------|
|--- TLS Finished --------------->|
|<-- TLS Finished ----------------|
|                                 |
|=== TDS traffic inside TLS ======|
|--- TDS Login7 (SQL Auth) ------>|
|<-- TDS Login Response ----------|
```

This differs from `Encrypt=Mandatory`, where the TDS pre-login exchange happens first and the TLS handshake is then tunneled inside TDS pre-login packets. With `Encrypt=Optional`, TLS may cover only the login packet.

### 2. TLS 1.3 Session Tickets

TLS 1.3 introduced **post-handshake session tickets** (RFC 8446 §4.6.1). After the handshake completes, the server sends `NewSessionTicket` messages:

```
Client                    SQL Server
|                                  |
|=== Handshake complete ===========|
|<-- NewSessionTicket (ticket 1) --|  ← Server pushes tickets
|<-- NewSessionTicket (ticket 2) --|  ← Often 2+ tickets
|                                  |
```

These tickets allow the client to perform **0-RTT or 1-RTT resumption** on future connections, skipping the full certificate-based handshake. SQL Server typically sends **2 session tickets** per connection.
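
To make the message layout concrete, here is a minimal sketch of a parser for the body of a `NewSessionTicket` message, following the structure in RFC 8446 §4.6.1 (`ticket_lifetime`, `ticket_age_add`, `ticket_nonce<0..255>`, `ticket<1..2^16-1>`, `extensions<0..2^16-2>`). The type and function names (`NewSessionTicket`, `Reader`, `parse_new_session_ticket`) are illustrative assumptions, not code from native SNI or SChannel, and the input is assumed to be the already-decrypted message body without the 4-byte handshake header.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fields of the TLS 1.3 NewSessionTicket body (RFC 8446 section 4.6.1).
// Hypothetical type for illustration only.
struct NewSessionTicket {
    std::uint32_t lifetime_seconds = 0;   // ticket_lifetime
    std::uint32_t age_add = 0;            // ticket_age_add
    std::vector<std::uint8_t> nonce;      // ticket_nonce<0..255>
    std::vector<std::uint8_t> ticket;     // ticket<1..2^16-1>
    std::vector<std::uint8_t> extensions; // extensions<0..2^16-2>, kept raw
};

// Bounds-checked big-endian reader over a byte buffer.
class Reader {
public:
    Reader(const std::uint8_t* data, std::size_t size) : p_(data), left_(size) {}

    // Reads an nbytes-wide big-endian unsigned integer (nbytes <= 4).
    bool read_uint(std::size_t nbytes, std::uint32_t& out) {
        if (left_ < nbytes) return false;
        out = 0;
        for (std::size_t i = 0; i < nbytes; ++i) out = (out << 8) | p_[i];
        p_ += nbytes;
        left_ -= nbytes;
        return true;
    }

    // Copies the next n bytes into out.
    bool read_bytes(std::size_t n, std::vector<std::uint8_t>& out) {
        if (left_ < n) return false;
        out.assign(p_, p_ + n);
        p_ += n;
        left_ -= n;
        return true;
    }

    bool empty() const { return left_ == 0; }

private:
    const std::uint8_t* p_;
    std::size_t left_;
};

// Returns the parsed ticket, or std::nullopt if the body is malformed.
std::optional<NewSessionTicket> parse_new_session_ticket(
    const std::vector<std::uint8_t>& body) {
    Reader r(body.data(), body.size());
    NewSessionTicket t;
    std::uint32_t len = 0;
    if (!r.read_uint(4, t.lifetime_seconds)) return std::nullopt;
    if (!r.read_uint(4, t.age_add)) return std::nullopt;
    if (!r.read_uint(1, len) || !r.read_bytes(len, t.nonce)) return std::nullopt;
    // The ticket itself must be at least one byte long.
    if (!r.read_uint(2, len) || len == 0 || !r.read_bytes(len, t.ticket)) return std::nullopt;
    if (!r.read_uint(2, len) || !r.read_bytes(len, t.extensions)) return std::nullopt;
    if (!r.empty()) return std::nullopt; // trailing bytes are an error
    return t;
}
```

The 2-byte length prefix caps the ticket on the wire at 65,535 bytes. Summing `ticket.size()` over the messages received on one connection gives a lower bound on what a client-side cache has to retain for that connection.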
52+
53+
### 3. SChannel's Session Ticket Cache
54+
55+
On Windows, the TLS implementation is **SChannel** (Secure Channel), a system DLL (`schannel.dll`). When SChannel receives `NewSessionTicket` messages, it:
56+
57+
1. Deserializes the ticket (contains encrypted session state, PSK identity, expiry)
58+
2. Stores it in a **process-global hash table** keyed by server name + credential handle
59+
3. Each ticket is ~20–50 KB (includes the PSK, ticket nonce, server certificate chain hash, etc.)
60+
61+
**The critical problem:** SChannel has **no public API** to:
62+
- Limit the number of cached tickets
63+
- Evict specific tickets
64+
- Disable ticket acceptance per-context
65+
- Set a maximum cache size
66+
67+
The cache grows unbounded as new connections produce new tickets.
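
The growth pattern described above can be illustrated with a toy model. This is a sketch under stated assumptions, not SChannel's actual data structure: `TicketCacheModel`, `simulate_connections`, and the server name `sql.example.test` are hypothetical, and the per-key cap exists only to contrast an unbounded cache with the kind of bounded one a cache control API would allow.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Ticket = std::vector<unsigned char>;
using CacheKey = std::pair<std::string, int>; // {server name, credential id}

// Toy model of a ticket cache keyed by {server, credential}.
// max_per_key == 0 models the unbounded behavior described in this section;
// a nonzero value models a hypothetical bounded cache with FIFO eviction.
class TicketCacheModel {
public:
    explicit TicketCacheModel(std::size_t max_per_key = 0)
        : max_per_key_(max_per_key) {}

    void store(const CacheKey& key, Ticket ticket) {
        std::list<Ticket>& bucket = buckets_[key];
        bytes_ += ticket.size();
        bucket.push_back(std::move(ticket));
        if (max_per_key_ != 0 && bucket.size() > max_per_key_) {
            bytes_ -= bucket.front().size(); // evict the oldest ticket
            bucket.pop_front();
        }
    }

    std::size_t bytes() const { return bytes_; }

private:
    std::size_t max_per_key_;
    std::size_t bytes_ = 0;
    std::map<CacheKey, std::list<Ticket>> buckets_;
};

// Simulates `connections` handshakes to one server with one credential,
// each delivering `tickets_per_connection` tickets of `ticket_bytes` bytes.
// Returns the number of bytes the cache retains afterwards.
std::size_t simulate_connections(TicketCacheModel& cache,
                                 std::size_t connections,
                                 std::size_t tickets_per_connection,
                                 std::size_t ticket_bytes) {
    const CacheKey key{"sql.example.test", 1}; // hypothetical server and credential
    for (std::size_t c = 0; c < connections; ++c) {
        for (std::size_t t = 0; t < tickets_per_connection; ++t) {
            cache.store(key, Ticket(ticket_bytes));
        }
    }
    return cache.bytes();
}
```

With 1,000 connections, 2 tickets each, and 25 KiB per ticket, the unbounded model retains all 2,000 tickets, while a model capped at 2 tickets per key stays at the size of a single connection's tickets regardless of how many connections are made.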

### 4. Why Only `Encrypt=Strict`?

With `Encrypt=Mandatory` or `Encrypt=Optional`:
- The TLS session is often **reused** across the connection pool because the pool keeps TCP connections alive
- New TLS handshakes happen infrequently (only on pool misses or reconnects)
- The ticket cache grows slowly

With `Encrypt=Strict`:
- In high-throughput scenarios or when connections are frequently created/destroyed, many new TLS sessions occur
- Each new TLS 1.3 handshake → server sends 2 new tickets → SChannel caches them
- **~50–100 KB per connection** leaked permanently

### 5. Memory Growth Mechanics

```
Connection 1: TLS handshake → 2 tickets cached → +50–100 KB
Connection 2: TLS handshake → 2 tickets cached → +50–100 KB  (old tickets NOT evicted)
Connection 3: TLS handshake → 2 tickets cached → +50–100 KB
...
Connection N: TLS handshake → 2 tickets cached → +50–100 KB

Total leaked: N × ~50–100 KB (never freed)
```

Even though connections are closed and disposed, the tickets remain in SChannel's process-global cache. The `DeleteSecurityContext` and `FreeCredentialsHandle` calls do NOT purge associated tickets.
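
The arithmetic behind "N × per-connection leak" can be worked through with a few helper functions. This is a back-of-the-envelope sketch: the function names are hypothetical, and the memory budget and connection rate passed in are example inputs chosen by the caller, not measurements from this investigation.

```cpp
#include <cstdint>

constexpr std::uint64_t kKiB = 1024;

// Total bytes leaked after `connections` new TLS 1.3 handshakes,
// assuming a constant leak of `kib_per_connection` KiB each.
constexpr std::uint64_t leaked_bytes(std::uint64_t connections,
                                     std::uint64_t kib_per_connection) {
    return connections * kib_per_connection * kKiB;
}

// Number of whole connections that fit into `budget_bytes` of headroom.
// kib_per_connection must be nonzero.
constexpr std::uint64_t connections_until_exhausted(std::uint64_t budget_bytes,
                                                    std::uint64_t kib_per_connection) {
    return budget_bytes / (kib_per_connection * kKiB);
}

// Seconds until the budget is exhausted at a steady rate of new physical
// connections per second. new_connections_per_second must be positive.
constexpr double seconds_until_exhausted(std::uint64_t budget_bytes,
                                         std::uint64_t kib_per_connection,
                                         double new_connections_per_second) {
    return static_cast<double>(
               connections_until_exhausted(budget_bytes, kib_per_connection)) /
           new_connections_per_second;
}
```

For example, with 2 GiB of headroom and the lower bound of 50 KiB per connection, `connections_until_exhausted(2ull << 30, 50)` yields 41,943 connections. At an assumed 100 new physical connections per second that is roughly seven minutes, and at the upper bound of 100 KiB per connection the count halves.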

### 6. Why Managed SNI Doesn't Leak

Managed SNI uses .NET's `SslStream` class, which:
- Runs the handshake through .NET's own TLS code path rather than native SNI's direct SSPI calls in `ssl.cpp`
- Disposes cleanly, and the managed GC reclaims all associated buffers
- Keeps a session cache that is bounded and properly evicted

### 7. The Native SNI Code Path

In `ssl.cpp`, the relevant flow is:

```cpp
// Credential acquisition
AcquireCredentialsHandle(..., &schCredentials, ..., &credHandle);

// TLS handshake
InitializeSecurityContext(&credHandle, ..., &ctxtHandle, ...);
// ↑ This is where SChannel receives and caches session tickets

// Connection close
DeleteSecurityContext(&ctxtHandle);   // Does NOT purge ticket cache
FreeCredentialsHandle(&credHandle);   // Does NOT purge ticket cache
```

---

## Fix Attempts (All Failed)

| # | Approach | Implementation | Outcome |
|---|----------|----------------|---------|
| 1 | **`SCH_CRED_DISABLE_RECONNECTS`** | Set flag on `SCHANNEL_CRED` structure passed to `AcquireCredentialsHandle` | Only prevents client from *offering* tickets for resumption. Does NOT prevent server from *sending* tickets, and does NOT prevent SChannel from *caching* received tickets. **Still leaks.** |
| 2 | **Per-connection unique credentials** | Create fresh `CredHandle` for each connection instead of sharing | Ticket cache is indexed by {server name, credential config}. Fresh creds just create new cache buckets, so tickets still accumulate. **Still leaks.** |
| 3 | **`dwSessionLifespan = 1`** | Set minimum session lifetime on credential | Controls how long SChannel will *reuse* a cached ticket for outbound reconnection. Does NOT control how long tickets are *stored* in memory. **Still leaks.** |
| 4 | **`ApplyControlToken` + `SSL_SESSION_DISABLE_RECONNECTS`** | Applied post-handshake to disable caching on the security context | Only applies to future operations on that context. Tickets already received and cached are not affected. **Still leaks.** |
| 5 | **`SslEmptyCacheW(NULL, 0)`** | Called periodically or per-connection to flush entire SChannel cache | Nuclear option: purges ALL cached sessions process-wide. Causes thundering-herd re-handshakes, race conditions, and performance collapse. Tickets re-accumulate immediately. **Not viable.** |

**Benchmark results after all fixes:** ~68–108 KB/connection growth (unchanged from baseline).
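
One way a per-connection growth figure like this can be derived from raw samples is a least-squares fit of memory against connection count. The sketch below is an assumption about the method, not the benchmark tool's actual code: `Sample` and `growth_bytes_per_connection` are hypothetical names, and the samples are presumed to be (connections opened so far, process memory in bytes) pairs taken at intervals during a run.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// One observation: connections opened so far and process memory in bytes.
struct Sample {
    double connections;
    double bytes;
};

// Ordinary least-squares slope of bytes versus connections.
// Returns std::nullopt when fewer than two samples are given or when
// all samples share the same connection count.
std::optional<double> growth_bytes_per_connection(const std::vector<Sample>& samples) {
    if (samples.size() < 2) return std::nullopt;
    const double n = static_cast<double>(samples.size());
    double sum_x = 0.0, sum_y = 0.0, sum_xx = 0.0, sum_xy = 0.0;
    for (const Sample& s : samples) {
        sum_x += s.connections;
        sum_y += s.bytes;
        sum_xx += s.connections * s.connections;
        sum_xy += s.connections * s.bytes;
    }
    const double denom = n * sum_xx - sum_x * sum_x;
    if (denom == 0.0) return std::nullopt;
    return (n * sum_xy - sum_x * sum_y) / denom;
}
```

A slope is less sensitive to a one-off allocation at startup than dividing total growth by total connections, which matters when comparing a baseline run against a run with one of the attempted fixes applied.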

---

## Platform Constraints

| Platform | Managed SNI Available? | Native SNI Required? | Workaround Possible? |
|----------|------------------------|----------------------|----------------------|
| **.NET 8/9 (Windows)** | Yes (opt-in via the `Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows` AppContext switch) | Default but not required | Yes: use managed SNI |
| **.NET Framework 4.6.2+** | **No** | **Yes (only option)** | **No managed fallback** |

---

## Impact

- **Affected:** Any Windows application using native SNI + `Encrypt=Strict` + TLS 1.3 (default for SQL Server 2022+)
- **Severity:** Process eventually OOMs under sustained connection creation patterns
- **Rate:** ~50–100 KB per unique TLS session
- **.NET Framework:** Cannot use managed SNI; permanently affected unless a native fix is found
- **.NET Core:** Can work around by enabling managed SNI (AppContext switch `Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows` set to `true`)

---

## Viable Paths Forward

### For .NET Core/.NET 8+ (Short-term)
- Auto-switch to managed SNI when `Encrypt=Strict` is used
- Or document enabling managed SNI (AppContext switch `Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows` set to `true`) as the recommended workaround

### For .NET Framework (Short-term)
- **Cap TLS to 1.2** for Strict connections on native SNI (avoids TLS 1.3 session tickets entirely, but loses TLS 1.3 benefits)
- **Accept and document** the limitation with guidance on connection pooling to minimize new TLS handshakes

### Long-term
- **File a Windows/SChannel bug** requesting a cache eviction API or per-context opt-out
- Have the Windows team provide a proper API to control session ticket caching behavior per-credential or per-context

---

## The Fundamental Problem

This is a **design limitation in Windows SChannel**. The session ticket cache was designed for web browsers where:
- You connect to a few hundred unique servers
- Cache growth is bounded by the number of unique servers
- The browser process restarts regularly

For database drivers:
- You connect to the SAME server thousands/millions of times
- Each connection gets new tickets (SQL Server rotates tickets)
- The process runs for months/years (service lifetime)
- The cache grows without bound because SQL Server issues fresh tickets per-connection

**There is no user-mode fix on the native path: it requires either a Windows/SChannel update to provide a cache control API, or avoiding TLS 1.3 on the native path.**

---

## References

- **ICM:** 792529661
- **Affected component:** `Microsoft.Data.SqlClient.SNI` (native SNI, `ssl.cpp`)
- **Branch (SNI):** `dev/ad/oom-fix` in `Microsoft.Data.SqlClient.sni` repo
- **Branch (SqlClient):** `dev/ad/strict-oom` in `dotnet/SqlClient`
- **Benchmark tool:** `tools/StrictEncryptMemoryBenchmark/`
- **RFC 8446 §4.6.1:** TLS 1.3 Post-Handshake Messages, New Session Ticket Message
- **MS-TDS 8.0:** Strict encryption mode specification
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
cmake_minimum_required(VERSION 3.16)
project(OdbcMemoryBenchmark LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_executable(OdbcMemoryBenchmark OdbcMemoryBenchmark.cpp)

# Link ODBC and Windows memory APIs
target_link_libraries(OdbcMemoryBenchmark PRIVATE odbc32 psapi)
