
Commit 2cafc03

BrandtVavasour and Brandt authored
Fix multi-process race in Wrapper.Start (LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST) (#929)
* reproduce concurrency error
* fix: serialize LocalDB `Start` concurrency issue

  Implement serialization of the `Wrapper.Start` method for LocalDB instances using both in-process and cross-process locks. For in-process synchronization, employ a `ConcurrentDictionary` to manage `SemaphoreSlim` instances keyed by instance name, ensuring that two `Wrapper` instances within the same process don't race on the same LocalDB instance. For cross-process synchronization, use a lock file in `%TEMP%` with `FileShare.None`, preventing other processes from starting until the current instance has completed its startup task.

  This fix addresses the concurrency issues demonstrated by `ConcurrentStartTests` in single- and multi-process scenarios. Such synchronization ensures no template rebuilds occur unexpectedly, avoiding potential races during internal state setup. The `WrapperTests` suite remains passing, confirming that existing functionality is preserved.

  These changes are motivated by the need to prevent the awkward interactions that can arise when multiple processes or threads attempt to interact with the same LocalDB instance simultaneously, posing risks of database corruption or test flakiness.
* Revert "fix: serialize LocalDB `Start` concurrency issue"

  This reverts commit f15cf15.
* update to detect if an AI is running the test session and prefix the localdb
* fix test instance

---------

Co-authored-by: Brandt <brandt.vavasour@devsoul.onmicrosoft.com>
1 parent 874a92a commit 2cafc03

14 files changed

Lines changed: 768 additions & 3 deletions
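
The reverted `fix:` commit in the message above describes a two-level lock. Callers in the same process share a per-instance-name `SemaphoreSlim` held in a `ConcurrentDictionary`, and callers in other processes are kept out by a lock file in `%TEMP%` opened with `FileShare.None`. The following is a minimal sketch of that pattern, not the reverted code itself. `LockFileStartGate`, `RunSerializedAsync`, and the `LocalDb_{name}.lock` file name are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the reverted approach; names are illustrative, not LocalDb APIs.
static class LockFileStartGate
{
    // In-process: one SemaphoreSlim per instance name, shared by every caller in this process.
    static readonly ConcurrentDictionary<string, SemaphoreSlim> gates = new();

    public static async Task RunSerializedAsync(string instanceName, Func<Task> body, CancellationToken cancel = default)
    {
        var gate = gates.GetOrAdd(instanceName, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync(cancel);
        try
        {
            // Cross-process: hold an exclusive handle on a per-instance file in the temp directory.
            var lockPath = Path.Combine(Path.GetTempPath(), $"LocalDb_{instanceName}.lock");
            await using var lockFile = await AcquireLockFileAsync(lockPath, cancel);
            await body();
            // lockFile is disposed here, before the semaphore is released in finally.
        }
        finally
        {
            gate.Release();
        }
    }

    static async Task<FileStream> AcquireLockFileAsync(string path, CancellationToken cancel)
    {
        while (true)
        {
            try
            {
                // FileShare.None makes every other open of this file fail until the stream is disposed.
                return new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException)
            {
                // Another process holds the lock: back off briefly and retry.
                await Task.Delay(50, cancel);
            }
        }
    }
}
```

Unlike a named `Mutex`, an open `FileStream` has no thread affinity, so `body` may await freely while the lock is held. The file itself stays on disk after release. Only the open handle acts as the lock.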

pages/directory-and-name-resolution.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -113,7 +113,7 @@ public Task<SqlDatabase> Build(
 string? databaseSuffix = null,
 [CallerMemberName] string memberName = "")
 ```
-<sup><a href='/src/LocalDb/SqlInstance.cs#L133-L155' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConventionBuildSignature' title='Start of snippet'>anchor</a></sup>
+<sup><a href='/src/LocalDb/SqlInstance.cs#L138-L160' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConventionBuildSignature' title='Start of snippet'>anchor</a></sup>
 <!-- endSnippet -->
 
 With these parameters the database name is the derived as follows:
@@ -150,7 +150,7 @@ If full control over the database name is required, there is an overload that ta
 /// </summary>
 public async Task<SqlDatabase> Build(string dbName)
 ```
-<sup><a href='/src/LocalDb/SqlInstance.cs#L170-L177' title='Snippet source file'>snippet source</a> | <a href='#snippet-ExplicitBuildSignature' title='Start of snippet'>anchor</a></sup>
+<sup><a href='/src/LocalDb/SqlInstance.cs#L175-L182' title='Snippet source file'>snippet source</a> | <a href='#snippet-ExplicitBuildSignature' title='Start of snippet'>anchor</a></sup>
 <!-- endSnippet -->
 
 Which can be used as follows:
````
Lines changed: 16 additions & 0 deletions
```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
    <SignAssembly>true</SignAssembly>
    <AssemblyOriginatorKeyFile>..\key.snk</AssemblyOriginatorKeyFile>
    <RootNamespace>LocalDb.MultiProcessHelper</RootNamespace>
    <GeneratePackageOnBuild>false</GeneratePackageOnBuild>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <ProjectReference Include="..\LocalDb\LocalDb.csproj" />
    <PackageReference Include="ProjectDefaults" PrivateAssets="all" />
  </ItemGroup>
</Project>
```
Lines changed: 189 additions & 0 deletions
```csharp
// Child-process driver for the multi-process race tests.
// Mode "wrapper-start": <wrapper-start> <instanceName> <directory> <signalFile>
//   Reproduces the symmetric race: runs Wrapper.Start once and reports the outcome.
// Mode "killer": <killer> <instanceName> <signalFile> <durationMs>
//   Calls LocalDbApi.StopAndDelete(name) in a tight loop for the given duration.
// Mode "victim": <victim> <instanceName> <signalFile> <durationMs>
//   Opens a SqlConnection to (LocalDb)\name in a tight loop, captures the first
//   exception whose Win32 native error code is 0x89C50107 (LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST)
//   and exits 0 to signal "race observed". Any other failure exits 1. If no error fires
//   within the duration, exits 2 ("race not observed in window").

using System.ComponentModel;
using LocalDb;
using Microsoft.Data.SqlClient;

if (args.Length < 1)
{
    Console.Error.WriteLine("Usage: <mode> <args...> (mode is wrapper-start | killer | victim)");
    return 64;
}

var mode = args[0];
return mode switch
{
    "wrapper-start" => await RunWrapperStartAsync(args.AsSpan()[1..].ToArray()),
    "killer" => await RunKillerAsync(args.AsSpan()[1..].ToArray()),
    "victim" => await RunVictimAsync(args.AsSpan()[1..].ToArray()),
    _ => Fail($"Unknown mode: {mode}")
};

int Fail(string message)
{
    Console.Error.WriteLine(message);
    return 64;
}

async Task<int> RunWrapperStartAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("wrapper-start usage: <instanceName> <directory> <signalFile>");
    }
    var instanceName = args[0];
    var directory = args[1];
    var signalFile = args[2];

    await WaitForSignalAsync(signalFile);

    try
    {
        using var wrapper = new Wrapper(instanceName, directory);
        Func<SqlConnection, Task> noOp = _ => Task.CompletedTask;
        wrapper.Start(new DateTime(2000, 1, 1), noOp);
        await wrapper.AwaitStart();
        Console.Out.WriteLine($"pid {Environment.ProcessId}: success");
        return 0;
    }
    catch (Exception exception)
    {
        ReportException(exception);
        return 1;
    }
}

async Task<int> RunKillerAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("killer usage: <instanceName> <signalFile> <durationMs>");
    }
    var instanceName = args[0];
    var signalFile = args[1];
    var durationMs = int.Parse(args[2]);

    await WaitForSignalAsync(signalFile);

    var deadline = Environment.TickCount64 + durationMs;
    var killCount = 0;
    while (Environment.TickCount64 < deadline)
    {
        try
        {
            LocalDbApi.StopAndDelete(instanceName);
            killCount++;
        }
        catch
        {
            // Expected: the instance may already be gone, or a victim is using it. Keep hammering.
        }
    }
    Console.Out.WriteLine($"pid {Environment.ProcessId} killer: {killCount} StopAndDelete cycles");
    return 0;
}

async Task<int> RunVictimAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("victim usage: <instanceName> <signalFile> <durationMs>");
    }
    var instanceName = args[0];
    var signalFile = args[1];
    var durationMs = int.Parse(args[2]);

    await WaitForSignalAsync(signalFile);

    var connectionString = $@"Data Source=(LocalDb)\{instanceName};Initial Catalog=master;Pooling=False;Connect Timeout=2";
    var deadline = Environment.TickCount64 + durationMs;
    var attempts = 0;
    Exception? otherError = null;

    while (Environment.TickCount64 < deadline)
    {
        attempts++;
        try
        {
            await using var connection = new SqlConnection(connectionString);
            await connection.OpenAsync();
        }
        catch (SqlException sql)
        {
            if (HasNativeCode(sql, unchecked((int)0x89C50107)))
            {
                Console.Out.WriteLine(
                    $"pid {Environment.ProcessId} victim: observed LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST (0x89C50107) on attempt {attempts}: {FirstLine(sql.Message)}");
                return 0;
            }
            otherError = sql;
        }
        catch (Exception other)
        {
            otherError = other;
        }
    }

    if (otherError == null)
    {
        Console.Error.WriteLine($"pid {Environment.ProcessId} victim: no errors after {attempts} attempts in {durationMs}ms");
        return 2;
    }

    Console.Error.WriteLine($"pid {Environment.ProcessId} victim: {attempts} attempts, no 0x89C50107; last other error: {otherError.GetType().Name}: {FirstLine(otherError.Message)}");
    var inner = otherError.InnerException;
    while (inner != null)
    {
        Console.Error.WriteLine($"  inner: {inner.GetType().Name}: {FirstLine(inner.Message)}");
        inner = inner.InnerException;
    }
    return 1;
}

bool HasNativeCode(Exception exception, int code)
{
    var current = exception;
    while (current != null)
    {
        if (current is Win32Exception win32 && win32.NativeErrorCode == code)
        {
            return true;
        }
        current = current.InnerException;
    }
    return false;
}

async Task WaitForSignalAsync(string signalFile)
{
    while (!File.Exists(signalFile))
    {
        await Task.Delay(20);
    }
}

void ReportException(Exception exception)
{
    Console.Error.WriteLine($"pid {Environment.ProcessId}: {exception.GetType().Name}: {FirstLine(exception.Message)}");
    var inner = exception.InnerException;
    while (inner != null)
    {
        Console.Error.WriteLine($"  inner: {inner.GetType().Name}: {FirstLine(inner.Message)}");
        if (inner is Win32Exception win32)
        {
            Console.Error.WriteLine($"    NativeErrorCode: 0x{win32.NativeErrorCode:X8}");
        }
        inner = inner.InnerException;
    }
}

string FirstLine(string message) => message.Replace("\r", "").Split('\n')[0];
```
Lines changed: 117 additions & 0 deletions
# Multi-process race reproducers

This folder is a regression-test scaffold, not a shipped artifact. It exists to deterministically reproduce a class of races in `Wrapper.InnerStart` that surface when multiple OS processes share a single LocalDB user instance.

## Symptom

When two test-host processes (e.g. two `dotnet test` invocations, or Rider's runner running concurrently with a CLI run) target the same `SqlInstance<T>` for the same Windows user, intermittent failures appear with stack traces like:

```
SetUp : Microsoft.Data.SqlClient.SqlException :
A network-related or instance-specific error occurred while establishing a connection to SQL Server.
... error: 50 - Local Database Runtime error occurred.
The specified LocalDB instance does not exist.
  ----> System.ComponentModel.Win32Exception : Unknown error (0x89c50107)
   at Wrapper.OpenMasterConnection() in C:\projects\localdb\src\LocalDb\Wrapper.cs:line 274
   at Wrapper.CreateAndDetachTemplate(...) in ...:line 229
   at Wrapper.CreateDatabaseFromTemplate(String name) in ...:line 83
   at EfLocalDb.SqlInstance`1.Build(String, IEnumerable`1) in ...:line 73
   at EfLocalDbNunit.LocalDbTestBase`1.Reset() in ...:line 85
```

The `0x89C50107` native code is `LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST`. Other manifestations of the same underlying race include SQL deadlocks during `CREATE DATABASE [template]` and `Operating system error 2: cannot find the file specified` on `template.mdf`.

Once a machine is in this state it tends to stay broken. Every subsequent `dotnet test` triggers the same race because the wrapper directory is empty (no `template.mdf`), so every process re-runs the destructive `StopAndDelete + CleanStart` branch.

## Root cause

`Wrapper.InnerStart` (LocalDb.csproj, `Wrapper.cs`):

```csharp
var info = LocalDbApi.GetInstance(instance);
if (!info.Exists) { CleanStart(); return; }
if (!info.IsRunning) { LocalDbApi.StartInstance(instance); }
if (!File.Exists(DataFile))
{
    LocalDbApi.StopAndDelete(instance);
    CleanStart(); // CreateInstance + StartInstance + CreateAndDetachTemplate
    return;
}
```

There are two unsynchronized concurrency surfaces here:

1. **In-process:** `Wrapper.semaphoreSlim` is declared but never `WaitAsync`'d, so two `Wrapper` instances for the same instance name running in the same process race on `LocalDbApi.*` calls and on the SQL DDL inside `CreateAndDetachTemplate`.
2. **Cross-process:** even if (1) were fixed with an in-process lock, the `LocalDbApi.*` calls reach into the per-Windows-user LocalDB metadata, which is shared across all processes belonging to that user. Two processes both running `InnerStart` against the same instance race on `StopAndDelete` / `CreateInstance` / `StartInstance` and on the same master DB.
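
Both surfaces share the same check-then-act shape. `File.Exists(DataFile)` is checked, and the destructive branch runs afterwards with nothing preventing a second caller from passing the same check in between. The sketch below models that shape in-process with a hypothetical `FakeInstance` class that makes no LocalDB calls. A `Barrier` forces the interleaving that the real race only hits occasionally.

```csharp
using System.Threading;

// Hypothetical model of InnerStart's check-then-act shape. "templateExists" stands in for
// File.Exists(DataFile), and incrementing "rebuilds" stands in for StopAndDelete + CleanStart.
sealed class FakeInstance
{
    volatile bool templateExists;
    int rebuilds;
    readonly SemaphoreSlim gate = new(1, 1);

    public int Rebuilds => Volatile.Read(ref rebuilds);

    // Unsynchronized, like InnerStart today. The barrier forces every caller past the
    // check before any caller acts, which is the interleaving the race depends on.
    public void UnguardedStart(Barrier afterCheck)
    {
        var missing = !templateExists;             // check
        afterCheck.SignalAndWait();
        if (missing)
        {
            Interlocked.Increment(ref rebuilds);   // act: destructive rebuild
            templateExists = true;
        }
    }

    // Check and act inside one critical section: later callers see the rebuilt template.
    public void GuardedStart()
    {
        gate.Wait();
        try
        {
            if (!templateExists)
            {
                Interlocked.Increment(ref rebuilds);
                templateExists = true;
            }
        }
        finally
        {
            gate.Release();
        }
    }
}
```

With the barrier in place, the unguarded path rebuilds twice on every run. The guarded path rebuilds once, because the second caller performs its check only after the first has finished acting. A per-process lock like this covers only surface 1. Surface 2 needs a lock that other processes can also see.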

Both surfaces are closed by one fix: serialize `InnerStart` per instance name with an in-process lock **and** a named cross-process mutex.

## The three reproducer tests

| Test | Race surface | Failure surfaced |
|---|---|---|
| `ConcurrentStartTests.ConcurrentStartWithMissingTemplateShouldNotRace` | In-process (two `Wrapper` instances, one process, no helper exe) | SQL deadlock 1205 during `CREATE DATABASE [template]` |
| `MultiProcessConcurrentStartTests.MultiProcessConcurrentStartShouldNotRace` | Multi-process, symmetric (3 child processes all running `Wrapper.Start`) | SQL deadlock, `template.mdf` not found, or `0x89C50107` (varies by timing) |
| `InstanceDoesNotExistRaceTests.KillerVsVictimSurfacesInstanceDoesNotExist` | Multi-process, asymmetric (one killer hammering `StopAndDelete`, one victim opening `SqlConnection`) | **Exactly `0x89C50107`, deterministically:** the victim exits 0 only when it observes that specific code |

## Why each part exists

### `LocalDb.MultiProcessHelper` project

Both multi-process tests (symmetric and asymmetric) need to spawn separate Windows processes via `Process.Start`. A Windows process needs an executable, an executable needs an entry point, and that entry point lives in `Program.cs`.

We can't reuse `LocalDb.Tests.exe` for this. Its entry point is owned by the test runner (NUnit + Microsoft.Testing.Platform), and we'd have to either fight the runner or invoke `dotnet test --filter` recursively (slow and awkward). A purpose-built console exe is simpler and faster.
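
Here is a minimal sketch of how a parent test might launch the helper, assuming the mode-plus-arguments command line documented at the top of `Program.cs`. `HelperLauncher` is hypothetical and is not the repository's test code. `ProcessStartInfo.ArgumentList` and `Process.WaitForExitAsync` are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical parent-side helper; helperExe is the path to LocalDb.MultiProcessHelper.
static class HelperLauncher
{
    // Builds the start info for one child, e.g. Create(exe, "killer", name, signalFile, "5000").
    public static ProcessStartInfo Create(string helperExe, string mode, params string[] modeArgs)
    {
        var psi = new ProcessStartInfo(helperExe)
        {
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            RedirectStandardError = true,
        };
        // ArgumentList passes each entry as a separate argument (escaped as needed),
        // so a signal-file path containing spaces arrives intact.
        psi.ArgumentList.Add(mode);
        foreach (var arg in modeArgs)
        {
            psi.ArgumentList.Add(arg);
        }
        return psi;
    }

    // Starts the child, drains both streams, and returns its exit code and combined output.
    public static async Task<(int ExitCode, string Output)> RunAsync(ProcessStartInfo psi)
    {
        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException($"Could not start {psi.FileName}");
        var stdout = process.StandardOutput.ReadToEndAsync();
        var stderr = process.StandardError.ReadToEndAsync();
        await process.WaitForExitAsync();
        return (process.ExitCode, await stdout + await stderr);
    }
}
```

The reads on both streams start before the wait on exit. Otherwise a chatty child could block on a full output pipe while the parent waits for it to exit.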

### `Program.cs` with three modes (`wrapper-start`, `killer`, `victim`)

Different tests need different child behaviors. Rather than ship three executables, the same exe takes a mode argument:

- **`wrapper-start`**: full `Wrapper.Start` cycle. Used by the symmetric multi-process test, where every child runs the same code path.
- **`killer`**: bare `LocalDbApi.StopAndDelete(name)` in a tight loop. Maximizes the chance of catching a victim mid-handshake.
- **`victim`**: `SqlConnection.OpenAsync` in a tight loop, walking exception chains for `Win32Exception.NativeErrorCode == 0x89C50107`. Exits 0 the first time it observes that exact code. Otherwise it exits 1 if some other error occurred during the window, or 2 if no error occurred at all.

Splitting the killer and victim into separate processes is what makes `0x89C50107` reliably reproducible. Symmetric children all running `Wrapper.Start` race on multiple things at once and surface a mix of error types. The asymmetric setup isolates the specific race window in which the LocalDB API resolves the instance name as "does not exist."
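
The victim compares against `unchecked((int)0x89C50107)` because `NativeErrorCode` is an `int`. The value `0x89C50107` (2311389447) exceeds `int.MaxValue`, so the cast wraps to -1983577849. Without `unchecked`, the compiler rejects the overflowing constant conversion. The following standalone sketch of the same chain walk uses the hypothetical name `NativeCodeSketch`.

```csharp
using System;
using System.ComponentModel;

// Hypothetical standalone version of the victim's check in Program.cs.
static class NativeCodeSketch
{
    // LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST. 0x89C50107 does not fit in an int, so the
    // constant conversion must be unchecked; the stored value is -1983577849.
    public const int InstanceDoesNotExist = unchecked((int)0x89C50107);

    // Walks the exception and its InnerException chain looking for a Win32Exception with the code.
    public static bool HasNativeCode(Exception exception, int code)
    {
        for (var current = exception; current != null; current = current.InnerException)
        {
            if (current is Win32Exception win32 && win32.NativeErrorCode == code)
            {
                return true;
            }
        }
        return false;
    }
}
```

Formatting the constant with `ToString("X8")` yields `89C50107`, which matches how `ReportException` in `Program.cs` prints native codes (`0x{win32.NativeErrorCode:X8}`).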

### Strong-name signing (`SignAssembly` + `..\key.snk` in the .csproj)

`Wrapper`, `LocalDbApi`, and `DirectoryFinder` are `internal` types in the LocalDb assembly. The LocalDb assembly is strong-named and grants `InternalsVisibleTo` only to assemblies whose public key matches a specific `PublicKey=...` blob. For the helper to use those internal types, it must be signed with the same key. `..\key.snk` is the existing project-wide signing key (the same one Benchmark uses).

Alternative considered: drive the race entirely through `EfLocalDb.SqlInstance<T>` (a public API). That works but requires defining a `DbContext` and adds EF Core to the helper's dependency surface. Reaching for `Wrapper` directly keeps the helper minimal and exercises exactly the layer where the race lives.

### `InternalsVisibleTo` entry for `LocalDb.MultiProcessHelper`

Standard IVT plumbing, added next to the existing entries in `src/LocalDb/InternalsVisibleTo.cs`. It uses the same `PublicKey=` blob as the others (the public half of `key.snk`).

### `<ProjectReference ... ReferenceOutputAssembly="false" Private="false" />` in `LocalDb.Tests.csproj`

The test project does **not** want to link the helper's assembly into its own output. It only wants the helper exe to exist on disk before tests run. `ReferenceOutputAssembly="false"` says "build it, but don't add a reference to its DLL in my compile inputs." `Private="false"` says "don't copy its outputs into my bin folder." With both set, the helper builds whenever the test project does (so a fresh `dotnet test` always finds an up-to-date helper), but there's no compile-time coupling between them.

The test resolves the helper path at runtime via `HelperExeResolver.cs`, which walks up from the test's `bin/<Config>/net10.0/` to find the sibling project's matching `bin/<Config>/net10.0/LocalDb.MultiProcessHelper.exe`.

### `LocalDb.slnx` entry

Nothing surprising: this registers the new project so tooling (Rider, Visual Studio, `dotnet sln` operations) sees it. Without it the project still builds via the test project's `ProjectReference`, but it won't appear in solution-level views.

### Signal-file barrier (`signalFile` argument)

`Process.Start` spin-up jitter is on the order of 100–300 ms, far wider than the actual race window for `0x89C50107`, which is measured in microseconds. If children started their work immediately, each would begin at a time set by its spawn order and startup cost. They would then rarely overlap inside the race window, and the test would be flaky.

The barrier flips this around. Each child spawns, waits in a polling loop for a signal file to appear, and only proceeds once the parent test creates that file. The parent waits 750 ms after spawning all children (giving them enough time to load the CLR and reach the wait loop), then writes the signal. Because each child polls every 20 ms, they are all released within one polling interval of each other. That's tight enough to land the children in the actual race window reliably.
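
The following sketch shows both sides of the barrier. The child side mirrors `WaitForSignalAsync` from `Program.cs`. The parent side, `ReleaseAsync`, is a hypothetical stand-in for the test's "wait, then write the signal" step. `SignalBarrierSketch` is likewise an illustrative name.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the signal-file barrier used by the multi-process tests.
static class SignalBarrierSketch
{
    // Child: poll every 20 ms until the parent creates the signal file.
    public static async Task WaitForSignalAsync(string signalFile, CancellationToken cancel = default)
    {
        while (!File.Exists(signalFile))
        {
            await Task.Delay(20, cancel);
        }
    }

    // Parent: give every child time to reach its polling loop, then release them all with one write.
    public static async Task ReleaseAsync(string signalFile, TimeSpan warmUp)
    {
        await Task.Delay(warmUp);
        await File.WriteAllTextAsync(signalFile, "go");
    }
}
```

In the real tests the children are separate processes, so the file system is their only shared state. The file's contents are irrelevant. Only its existence is checked.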

### `HelperExeResolver` (shared lookup)

Both multi-process tests need to find the helper exe at runtime, and the path resolution is non-trivial enough to want one place to update if the build layout changes. Pulling it out also avoids duplicate logic that could drift between the two tests.
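
Here is a minimal sketch of the lookup described in the `ProjectReference` section, assuming the conventional `src/<Project>/bin/<Config>/<TFM>/` layout for both projects. `HelperPathSketch` is hypothetical and may differ from the actual `HelperExeResolver`. It climbs four fixed levels rather than searching.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not the repository's HelperExeResolver. Assumes the layout
// <repo>/src/<Project>/bin/<Config>/<TFM>/ for both the test project and the helper.
static class HelperPathSketch
{
    public static string Resolve(string testBinDir, string helperProject = "LocalDb.MultiProcessHelper")
    {
        var tfm = new DirectoryInfo(Path.TrimEndingDirectorySeparator(testBinDir)); // e.g. net10.0
        var config = Parent(tfm);                                                   // Release or Debug
        var bin = Parent(config);                                                   // bin
        var testProject = Parent(bin);                                              // LocalDb.Tests
        var src = Parent(testProject);                                              // src

        // Reuse the test's own Config and TFM names so both builds stay in step.
        return Path.Combine(src.FullName, helperProject, "bin", config.Name, tfm.Name, helperProject + ".exe");
    }

    static DirectoryInfo Parent(DirectoryInfo dir) =>
        dir.Parent ?? throw new InvalidOperationException($"'{dir.FullName}' has no parent; unexpected layout");
}
```

Called from inside a test as `HelperPathSketch.Resolve(AppContext.BaseDirectory)`, it keeps the helper's `<Config>` and `<TFM>` in step with the test's own, so a Release test run picks up the Release helper.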

## Suggested fix in `LocalDb`

Wire up the existing `Wrapper.semaphoreSlim` field around `InnerStart`'s body to handle the in-process race, and add a named OS mutex keyed on the instance name (e.g. `Global\\LocalDb_Wrapper_InnerStart_{instanceName}`) around the entire `InnerStart` operation to handle the cross-process race. Both `ShouldNotRace` tests should pass once that lock is in place; if either still fails, the lock isn't covering the right span. (`KillerVsVictimSurfacesInstanceDoesNotExist` calls `LocalDbApi` and `SqlConnection` directly rather than going through `Wrapper`, so a lock inside `Wrapper` does not affect it.)
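
Here is a minimal sketch of that suggested fix, assuming `InnerStart` stays synchronous. `InnerStartGate` and `RunSerialized` are hypothetical. The mutex name is the one suggested in the paragraph. A `Mutex` is owned by the thread that acquired it and must be released on that same thread, which is why the protected body is an `Action` rather than an async delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of the suggested fix; not existing LocalDb code.
static class InnerStartGate
{
    // In-process lock per instance name (the role Wrapper.semaphoreSlim would play).
    static readonly ConcurrentDictionary<string, SemaphoreSlim> gates = new();

    public static void RunSerialized(string instanceName, Action innerStart, TimeSpan timeout)
    {
        var gate = gates.GetOrAdd(instanceName, _ => new SemaphoreSlim(1, 1));
        if (!gate.Wait(timeout))
        {
            throw new TimeoutException($"In-process start lock for '{instanceName}' not acquired");
        }
        try
        {
            // Cross-process lock. Acquire, run, and release all happen on this thread.
            using var mutex = new Mutex(false, $"Global\\LocalDb_Wrapper_InnerStart_{instanceName}");
            bool owned;
            try
            {
                owned = mutex.WaitOne(timeout);
            }
            catch (AbandonedMutexException)
            {
                // A previous owner exited without releasing; this thread now owns the mutex.
                owned = true;
            }
            if (!owned)
            {
                throw new TimeoutException($"Cross-process start lock for '{instanceName}' not acquired");
            }
            try
            {
                innerStart();
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Callers in the same process queue on the `SemaphoreSlim` first, so at most one thread per process waits on the mutex. Catching `AbandonedMutexException` matters for test runs. If a test host is killed mid-start, the next process to acquire the mutex receives ownership together with that exception, and catching it lets the start proceed.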

## Running the tests

```powershell
dotnet test src/LocalDb.Tests/LocalDb.Tests.csproj `
  --configuration Release `
  --filter "FullyQualifiedName~ConcurrentStart|FullyQualifiedName~MultiProcessConcurrentStart|FullyQualifiedName~KillerVsVictim"
```

The deterministic `KillerVsVictimSurfacesInstanceDoesNotExist` finishes in ~8 s. The symmetric `MultiProcessConcurrentStartShouldNotRace` finishes in ~15-30 s. The in-process `ConcurrentStartWithMissingTemplateShouldNotRace` finishes in ~2 minutes (it intentionally rebuilds the template 5× for a non-flaky signal).
