Commits
acdf1c6
feat: Update schema to version 104 and enhance MergeSearchParams stor…
jestradaMS Feb 26, 2026
a76ad96
Merge branch 'main' into users/jestrada/atomicsearchparameteroperations
jestradaMS Mar 3, 2026
b9e1733
fixing 104.sql
jestradaMS Mar 3, 2026
efc97f3
Enhance reindexing and search parameter operations with retry policie…
jestradaMS Mar 3, 2026
ce3d1af
Refactor search parameter operations to consolidate validation logic …
jestradaMS Mar 4, 2026
5d806a2
fixing appsettings that should not have been checked in
jestradaMS Mar 4, 2026
ef9b530
fixing white space after comment
jestradaMS Mar 4, 2026
35cd84b
Implement pending search parameter status handling in bundle operatio…
jestradaMS Mar 6, 2026
0d7ca29
Refactor transaction handling in MergeSearchParams to improve concurr…
jestradaMS Mar 6, 2026
64fef20
Clarify comments regarding pending search parameter status handling i…
jestradaMS Mar 6, 2026
3ba9312
Update ADR 2602 to implement atomic SearchParameter CRUD operations, …
jestradaMS Mar 6, 2026
fd27c7b
Update schema version to 108 and adjust related constants and conditions
jestradaMS Mar 6, 2026
318591e
ADR 2602: Atomic SearchParameter CRUD Operations
jestradaMS Mar 6, 2026
d994a32
Add ADR 2603: Atomic SearchParameter CRUD Operations and Cache Refres…
jestradaMS Mar 6, 2026
ea6c08f
Remove ADR 2602: Atomic SearchParameter CRUD Operations document
jestradaMS Mar 6, 2026
59e0acb
Merge branch 'main' into users/jestrada/atomicsearchparameteradr
jestradaMS Mar 10, 2026
0a24ed5
Merge branch 'main' into users/jestrada/atomicsearchparameteroperations
jestradaMS Mar 10, 2026
9b74e37
Update schema versioning to V107
jestradaMS Mar 10, 2026
67bef92
Adding OR ALTER to migration diff file.
jestradaMS Mar 10, 2026
04aa622
Remove redundant search parameter update calls in reindex handlers an…
jestradaMS Mar 10, 2026
999ac0c
Merge branch 'users/jestrada/atomicsearchparameteradr' into users/jes…
jestradaMS Mar 11, 2026
c5fe031
removing old 2602 adr for atomic search parameters
jestradaMS Mar 11, 2026
f1e5520
adding test for ensuring cache is not mutated when status manager is …
jestradaMS Mar 11, 2026
c9957c7
removing 107 files
jestradaMS Mar 17, 2026
7f912b9
Merge branch 'main' into users/jestrada/atomicsearchparameteroperations
jestradaMS Mar 17, 2026
5784146
Bump schema version to 108
jestradaMS Mar 17, 2026
b14f3cd
Implement search parameter operations to handle reindexing conflicts …
jestradaMS Mar 17, 2026
b00ce93
Refactor GetSearchParametersByUrls method to simplify URL resolution …
jestradaMS Mar 17, 2026
be7f304
Improve URL resolution logic from GetSearchParameterByUrls method
jestradaMS Mar 18, 2026
47cc2ac
Fixing SP State update handler tests
jestradaMS Mar 18, 2026
09a1c10
Fixing tests
jestradaMS Mar 18, 2026
7cae0ad
adding logging to reindex failures
jestradaMS Mar 18, 2026
5d83f96
adding Sergey's test for 500 serach parameters
jestradaMS Mar 18, 2026
a9239b7
Refactor reindex job to batch update disabled and deleted search para…
jestradaMS Mar 18, 2026
213164c
Updating new test for large custom search params
jestradaMS Mar 19, 2026
82245bb
Refactor ReindexTests to support dynamic search parameter handling an…
jestradaMS Mar 19, 2026
d0daa62
skipping flaky test to test run again
jestradaMS Mar 19, 2026
ade3cfe
updating to use output in ReindexTests opposed to System Diag
jestradaMS Mar 19, 2026
2734794
Updates per Sergey's comments
jestradaMS Mar 19, 2026
46df45c
removing test
jestradaMS Mar 19, 2026
899dc01
fixing MergeResourcesWrapperAsync
jestradaMS Mar 19, 2026
cd0d5c9
testing increased retries for reliability of tests
jestradaMS Mar 20, 2026
c4c8bb8
Removing old stored proc per feedback on PR
jestradaMS Mar 20, 2026
8dda219
Enhance search parameter operations with cache refresh cycle handling
jestradaMS Mar 20, 2026
a3290ea
Add ADR 2603 to address reindex cache race conditions and improve not…
jestradaMS Mar 20, 2026
58f9b6f
Fixing integration tests
jestradaMS Mar 20, 2026
0f5145b
adding timeout to ensure we don't hang if reresh never signals.
jestradaMS Mar 20, 2026
00d7275
Disposal update in integration tests
jestradaMS Mar 20, 2026
459adee
Ensure proper disposal of timer in SearchParameterCacheRefreshBackgro…
jestradaMS Mar 20, 2026
dddee07
fixing refreshtimer disoposal
jestradaMS Mar 20, 2026
baedead
fixing tests
jestradaMS Mar 20, 2026
4d97074
Refactor Search Parameter Cache Refresh Logic and Enhance Cache Consi…
jestradaMS Mar 21, 2026
224b41f
Fixing style cop issue
jestradaMS Mar 21, 2026
192e009
Fixing signals to ensure they log even if no changes.
jestradaMS Mar 23, 2026
9b6402c
Testing with LegacyInitializeSearchParameterStatuses
jestradaMS Mar 23, 2026
0f2d317
comment out code
jestradaMS Mar 23, 2026
92e7c98
updating convergence logic in operations
jestradaMS Mar 23, 2026
a076be6
trailing space :(
jestradaMS Mar 23, 2026
7583792
adding logging to cache consistency check
jestradaMS Mar 24, 2026
2e31e83
fixing casting in proc
jestradaMS Mar 24, 2026
c47c185
fix white space diff
jestradaMS Mar 24, 2026
6d442cc
Changing convergence check time to 30 second delay
jestradaMS Mar 24, 2026
95ddad7
Enhance cache consistency checks with sync timestamps and active host…
jestradaMS Mar 24, 2026
9ad7a76
Refactor cache update logic to ensure accurate timestamp logging and …
jestradaMS Mar 24, 2026
01ea4db
Reducing logging only when there is an active waiter
jestradaMS Mar 24, 2026
da818c1
Update syncStartDate to align with active host detection for cache co…
jestradaMS Mar 24, 2026
c2033df
Adding back blind wait for Cosmos
jestradaMS Mar 25, 2026
a1e6977
Fix: Update resource type check to remove unnecessary whitespace cond…
jestradaMS Mar 25, 2026
218172b
removing 108 on this branch to merge in from main
jestradaMS Mar 27, 2026
aebd351
Merge branch 'main' into users/jestrada/atomicsearchparameteroperations
jestradaMS Mar 27, 2026
42a1235
updated to schema 109 post main merge
jestradaMS Mar 27, 2026
dd30efc
Refactor transaction handling in MergeSearchParams to simplify rollba…
jestradaMS Mar 27, 2026
@@ -0,0 +1,60 @@
# ADR 2603: Atomic SearchParameter CRUD Operations
Labels: [SQL](https://github.com/microsoft/fhir-server/labels/Area-SQL) | [Core](https://github.com/microsoft/fhir-server/labels/Area-Core) | [SearchParameter](https://github.com/microsoft/fhir-server/labels/Area-SearchParameter)

## Context
SearchParameter create, update, and delete operations require two coordinated writes: the SearchParameter status row (`dbo.SearchParam` table) and the resource itself (`dbo.Resource` via `dbo.MergeResources`). Previously, these writes occurred in separate steps in the request pipeline, creating partial-failure windows in which one write could succeed while the other failed, leaving orphaned or inconsistent state.

Composing these writes into a single atomic SQL operation introduced a new problem for transaction bundles: `dbo.MergeSearchParams` acquires an exclusive lock on `dbo.SearchParam` per entry, and that lock is held by the outer bundle transaction. When the next entry's behavior pipeline calls `GetAllSearchParameterStatus` on a separate connection, it blocks on the same table, causing a timeout. This required a deferred-flush approach for transaction bundles.

Additionally, SearchParameter CRUD behaviors previously performed direct in-memory cache mutations (`AddNewSearchParameters`, `DeleteSearchParameter`, etc.) during the request pipeline. This duplicated responsibility with the `SearchParameterCacheRefreshBackgroundService`, which already polls the database and applies cache updates across all instances.

Key considerations:
- Eliminating partial-commit windows between status and resource persistence.
- Handling the lock contention on `dbo.SearchParam` introduced by composed writes in transaction bundles.
- Simplifying cache ownership by removing direct cache mutations from the CRUD request path.
- Preserving existing behavior for non-SearchParameter resources and Cosmos DB paths.

## Decision
We implement three complementary changes:

### 1. Composed writes for single operations (SQL)
For individual SearchParameter CRUD, behaviors queue pending status updates in request context (`SearchParameter.PendingStatusUpdates`) instead of persisting directly. `SqlServerFhirDataStore` detects pending statuses and calls `dbo.MergeSearchParams` (which internally calls `dbo.MergeResources`) so both writes execute in one stored-procedure-owned transaction.
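
The hand-off described above can be pictured with a small sketch. It is illustrative only and assumes hypothetical stand-in types (`PendingStatus`, `RequestContext`, `SearchParameterBehavior`, `DataStoreSketch`); it is not the server's actual code and makes no database calls. It only shows the behavior queueing intent and the data store choosing which stored procedure a single upsert would use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; these are not the server's real types.
public sealed class PendingStatus
{
    public PendingStatus(string url, string status)
    {
        Url = url;
        Status = status;
    }

    public string Url { get; }

    public string Status { get; }
}

public sealed class RequestContext
{
    // Plays the role of the SearchParameter.PendingStatusUpdates entry in request context.
    public List<PendingStatus> PendingStatusUpdates { get; } = new List<PendingStatus>();
}

public sealed class SearchParameterBehavior
{
    // The behavior records intent only; it performs no database write.
    public void OnUpsert(RequestContext context, string url, string status)
    {
        context.PendingStatusUpdates.Add(new PendingStatus(url, status));
    }
}

public sealed class DataStoreSketch
{
    // Returns the name of the stored procedure a single (non-bundle) upsert would call.
    public string ChooseProcedure(RequestContext context)
    {
        if (context.PendingStatusUpdates.Count == 0)
        {
            // No queued statuses: the plain resource write is enough.
            return "dbo.MergeResources";
        }

        // Queued statuses travel with the resource in one call, so drain the queue here.
        context.PendingStatusUpdates.Clear();
        return "dbo.MergeSearchParams";
    }
}
```

In this sketch the queue is drained at the point of hand-off so that a later write on the same request does not resend the same statuses; whether the real data store drains at exactly this point is an assumption.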

### 2. Deferred flush for transaction bundles
For transaction bundles, per-entry resource upserts call `dbo.MergeResources` only (no SearchParam table touch), avoiding the exclusive lock. `BundleHandler` accumulates pending statuses across all entries and flushes them in a single `dbo.MergeSearchParams` call at the end of the bundle, still within the outer transaction scope.

```mermaid
graph TD;
A[SearchParameter Operation] -->|Single operation| B[Behavior queues pending status in request context];
B --> C[SqlServerFhirDataStore calls MergeSearchParams];
C --> D[MergeSearchParams: status + MergeResources in one transaction];
A -->|Transaction bundle| E[Per-entry: Behavior queues status, UpsertAsync calls MergeResources only];
E --> F[BundleHandler drains pending statuses after each entry];
F --> G[After all entries: single MergeSearchParams flush within outer transaction];
G --> H[Transaction commits atomically];
```
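
The bundle path in the diagram can be sketched in the same spirit. This is a minimal model under stated assumptions: `BundleFlushSketch` and its members are hypothetical names, and the "calls" are recorded as strings instead of being executed against SQL. It shows per-entry resource writes, accumulation of drained statuses, and one flush before the commit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the deferred flush; names are illustrative only.
public sealed class BundleFlushSketch
{
    private readonly List<string> _accumulated = new List<string>();

    // Ordered log of the stored-procedure calls the sketch would issue.
    public List<string> Calls { get; } = new List<string>();

    // Per entry: write the resource only, then drain that entry's queued statuses.
    public void ProcessEntry(string resourceId, List<string> pendingStatusesForEntry)
    {
        Calls.Add("MergeResources(" + resourceId + ")");
        _accumulated.AddRange(pendingStatusesForEntry);
        pendingStatusesForEntry.Clear();
    }

    // After the last entry: one status flush, issued before the outer transaction commits.
    public void Complete()
    {
        if (_accumulated.Count > 0)
        {
            Calls.Add("MergeSearchParams(" + _accumulated.Count + " statuses)");
            _accumulated.Clear();
        }

        Calls.Add("Commit");
    }
}
```

Because no entry touches `dbo.SearchParam` until `Complete`, the status lookup made by the next entry's pipeline has nothing to block on while entries are still being processed.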

### 3. Cache update removal from CRUD path
Contributor

@SergeyGaluzo SergeyGaluzo Mar 18, 2026

The update search param workflow calls validation logic that triggers a cache update, so the cache update currently runs twice for an update workflow. How will the proposed change work with respect to validation?

Contributor Author

Currently these checks are still required due to limitations in Cosmos. I looked into removing them and handling this with SQL deadlock / OptimisticConcurrency error handling and throwing a 409, but Cosmos would still require these upper-level checks. For now, it's acceptable to ensure consistency at this layer until we are able to remove the Cosmos code path and move this all to the SQL layer.

Contributor

I think the situation is worse than you describe :-) as any validation (whether with Cosmos or with SQL) requires the cache to be in sync. I don't think sync will go away after gen 1 is retired.
This means we need to add a work item to ensure that sync via the background worker and the on-demand sync required by validation can coexist without cache pollution.

SearchParameter CRUD behaviors no longer perform direct in-memory cache mutations. All cache updates (`AddNewSearchParameters`, `DeleteSearchParameter`, `UpdateSearchParameterStatus`) are now solely owned by the `SearchParameterCacheRefreshBackgroundService`, which periodically polls the database via `GetAndApplySearchParameterUpdates`. This simplifies the CRUD path and ensures consistent cache convergence across distributed instances.
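
A single refresh cycle of that ownership model might look like the sketch below. It is a simplified illustration and not the real `GetAndApplySearchParameterUpdates`: the `StatusRow` type, the last-updated watermark, and the `"Deleted"` status string are assumptions made for the example, and the database is replaced by a delegate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape; the real status table has more columns.
public sealed class StatusRow
{
    public StatusRow(string url, string status, DateTimeOffset lastUpdated)
    {
        Url = url;
        Status = status;
        LastUpdated = lastUpdated;
    }

    public string Url { get; }

    public string Status { get; }

    public DateTimeOffset LastUpdated { get; }
}

public sealed class CacheRefreshSketch
{
    // Stand-in for the database read: returns rows changed after the given time.
    private readonly Func<DateTimeOffset, IReadOnlyList<StatusRow>> _readChangedSince;
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();
    private DateTimeOffset _watermark = DateTimeOffset.MinValue;

    public CacheRefreshSketch(Func<DateTimeOffset, IReadOnlyList<StatusRow>> readChangedSince)
    {
        _readChangedSince = readChangedSince;
    }

    public IReadOnlyDictionary<string, string> Cache
    {
        get { return _cache; }
    }

    // One polling cycle: the only place in this sketch where the cache is mutated.
    public int RefreshOnce()
    {
        IReadOnlyList<StatusRow> changed = _readChangedSince(_watermark);

        foreach (StatusRow row in changed)
        {
            if (row.Status == "Deleted")
            {
                _cache.Remove(row.Url);
            }
            else
            {
                _cache[row.Url] = row.Status;
            }

            // Advance the watermark so the next cycle only sees newer rows.
            if (row.LastUpdated > _watermark)
            {
                _watermark = row.LastUpdated;
            }
        }

        return changed.Count;
    }
}
```

A write that lands between two cycles is invisible to the cache until the next `RefreshOnce`, which is the eventual-consistency window this ADR accepts.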

### Scope
- **SQL Server**: Full atomic guarantees for create, update, and delete of SearchParameter resources, both single and in transaction bundles.
- **Cosmos DB**: Pending statuses are flushed after resource upsert (improved sequencing, not a single transactional unit).
- **Unchanged**: Non-SearchParameter CRUD, existing SearchParameter status lifecycle states, cache convergence model.

## Status
Pending acceptance

## Consequences
- **Positive Impacts:**
  - Eliminates orphaned status/resource records from partial commits.
  - Clarifies ownership: behaviors queue intent, data stores persist atomically, background service owns cache.
  - Deferred-flush approach avoids lock contention introduced by composed writes in transaction bundles.
  - Removing cache mutations from CRUD simplifies the request path and eliminates a class of cache-divergence bugs.

- **Potential Drawbacks:**
  - Increased complexity in request context, data store, and bundle handler coordination.
  - SQL schema migration required (`MergeSearchParams` expanded to accept resource TVPs).
  - Eventual consistency window: cache may lag behind database until the next background refresh cycle.
  - Cosmos DB path remains best-effort sequencing rather than true atomic commit.

@@ -15,7 +15,9 @@
using Microsoft.Health.Fhir.Core.Features.Operations.BulkDelete;
using Microsoft.Health.Fhir.Core.Features.Persistence;
using Microsoft.Health.Fhir.Core.Features.Search;
using Microsoft.Health.Fhir.Core.Features.Search.Parameters;
using Microsoft.Health.Fhir.Core.Messages.Delete;
using Microsoft.Health.Fhir.Core.Models;
using Microsoft.Health.Fhir.Core.UnitTests.Extensions;
using Microsoft.Health.Fhir.Tests.Common;
using Microsoft.Health.JobManagement;
@@ -24,6 +26,8 @@
using NSubstitute;
using Xunit;

using FhirJobConflictException = global::Microsoft.Health.Fhir.Core.Features.Operations.JobConflictException;

namespace Microsoft.Health.Fhir.Core.UnitTests.Features.Operations.BulkDelete
{
[Trait(Traits.OwningTeam, OwningTeam.Fhir)]
@@ -32,6 +36,7 @@
{
private IDeletionService _deleter;
private BulkDeleteProcessingJob _processingJob;
private ISearchParameterOperations _searchParameterOperations;
private ISearchService _searchService;
private IQueueClient _queueClient;

@@ -42,7 +47,8 @@
.Returns(Task.FromResult(new SearchResult(5, new List<Tuple<string, string>>())));
_queueClient = Substitute.For<IQueueClient>();
_deleter = Substitute.For<IDeletionService>();
_processingJob = new BulkDeleteProcessingJob(_deleter.CreateMockScopeFactory(), Substitute.For<RequestContextAccessor<IFhirRequestContext>>(), Substitute.For<IMediator>(), _searchService.CreateMockScopeFactory(), _queueClient);
_searchParameterOperations = Substitute.For<ISearchParameterOperations>();
_processingJob = new BulkDeleteProcessingJob(_deleter.CreateMockScopeFactory(), Substitute.For<RequestContextAccessor<IFhirRequestContext>>(), Substitute.For<IMediator>(), _searchParameterOperations, _searchService.CreateMockScopeFactory(), _queueClient);
}

[Fact]
@@ -103,5 +109,23 @@
var actualDefinition = JsonConvert.DeserializeObject<BulkDeleteDefinition>(definitions[0]);
Assert.Equal(2, actualDefinition.Type.SplitByOrSeparator().Count());
}

[Fact]
public async Task GivenProcessingJobForSearchParameter_WhenReindexStartsBeforeExecution_ThenConflictIsThrown()
{
    var definition = new BulkDeleteDefinition(JobType.BulkDeleteProcessing, DeleteOperation.HardDelete, KnownResourceTypes.SearchParameter, new List<Tuple<string, string>>(), new List<string>(), "https:\\test.com", "https:\\test.com", "test");
    var jobInfo = new JobInfo
    {
        Id = 1,
        Definition = JsonConvert.SerializeObject(definition),
    };

    _searchParameterOperations
        .When(x => x.EnsureNoActiveReindexJobAsync(Arg.Any<CancellationToken>()))
        .Do(_ => throw new FhirJobConflictException("reindex running"));

    await Assert.ThrowsAsync<FhirJobConflictException>(() => _processingJob.ExecuteAsync(jobInfo, CancellationToken.None));
    await _deleter.DidNotReceiveWithAnyArgs().DeleteMultipleAsync(default, default, default);
}
}
}

@@ -205,5 +205,54 @@ public async Task GivenASPStatusManager_WhenInitializingAndResolverThrowsExcepti
Assert.False(list[2].IsSupported);
Assert.False(list[2].IsPartiallySupported);
}

[Fact]
public async Task GivenASPStatusManager_WhenUpdatingStatus_ThenInMemoryCacheIsNotMutatedAndMediatorIsNotPublished()
{
    // Arrange - Initialize so search parameters have known in-memory state
    await _manager.EnsureInitializedAsync(CancellationToken.None);

    var list = _searchParameterDefinitionManager.GetSearchParameters("Account").ToList();

    // Capture initial in-memory state for the Enabled parameter (index 0: ResourceId)
    bool initialIsSearchable = list[0].IsSearchable;
    bool initialIsSupported = list[0].IsSupported;
    bool initialIsPartiallySupported = list[0].IsPartiallySupported;
    SortParameterStatus initialSortStatus = list[0].SortStatus;

    // Clear any mediator calls from initialization
    _mediator.ClearReceivedCalls();

    // Act - Call UpdateSearchParameterStatusAsync to change status to Supported
    await _manager.UpdateSearchParameterStatusAsync(
        new[] { ResourceId },
        SearchParameterStatus.Supported,
        CancellationToken.None);

    // Assert - DB write occurred
    await _searchParameterStatusDataStore
        .Received(1)
        .UpsertStatuses(
            Arg.Is<List<ResourceSearchParameterStatus>>(statuses =>
                statuses.Count == 1 &&
                statuses[0].Uri.OriginalString == ResourceId &&
                statuses[0].Status == SearchParameterStatus.Supported),
            Arg.Any<CancellationToken>());

    // Assert - In-memory SearchParameterInfo was NOT modified
    // Re-fetch from the definition manager to make the assertion intent explicit
    var refreshedList = _searchParameterDefinitionManager.GetSearchParameters("Account").ToList();
    Assert.Equal(initialIsSearchable, refreshedList[0].IsSearchable);
    Assert.Equal(initialIsSupported, refreshedList[0].IsSupported);
    Assert.Equal(initialIsPartiallySupported, refreshedList[0].IsPartiallySupported);
    Assert.Equal(initialSortStatus, refreshedList[0].SortStatus);

    // Assert - Mediator was NOT called (no SearchParametersUpdatedNotification)
    await _mediator
        .DidNotReceive()
        .Publish(
            Arg.Any<SearchParametersUpdatedNotification>(),
            Arg.Any<CancellationToken>());
}
}
}
@@ -1,4 +1,4 @@
// -------------------------------------------------------------------------------------------------
// -------------------------------------------------------------------------------------------------
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.
// -------------------------------------------------------------------------------------------------
@@ -63,7 +63,8 @@ internal static void Build(
searchParameters,
uriDictionary,
modelInfoProvider,
isSystemDefined).ToLookup(
isSystemDefined,
logger).ToLookup(
entry => entry.ResourceType,
entry => entry.SearchParameter);

@@ -121,7 +122,8 @@ private static SearchParameterInfo GetOrCreateSearchParameterInfo(SearchParamete
IReadOnlyCollection<ITypedElement> searchParamCollection,
ConcurrentDictionary<string, SearchParameterInfo> uriDictionary,
IModelInfoProvider modelInfoProvider,
bool isSystemDefined = false)
bool isSystemDefined,
ILogger logger)
{
var issues = new List<OperationOutcomeIssue>();
var searchParameters = searchParamCollection.Select((x, entryIndex) =>
@@ -151,8 +153,27 @@ private static SearchParameterInfo GetOrCreateSearchParameterInfo(SearchParamete
{
SearchParameterInfo searchParameterInfo = GetOrCreateSearchParameterInfo(searchParameter, uriDictionary);

// Mark spec-defined search parameters as system-defined
searchParameterInfo.IsSystemDefined = isSystemDefined;
// Mark spec-defined search parameters as system-defined.
// Once marked, this should remain true across subsequent Build calls.
bool wasSystemDefined = searchParameterInfo.IsSystemDefined;
searchParameterInfo.IsSystemDefined |= isSystemDefined;

if (!wasSystemDefined && searchParameterInfo.IsSystemDefined)
{
    logger.LogDebug(
        "SearchParameter IsSystemDefined enabled: Url={Url}, Code={Code}, BuildIsSystemDefined={BuildIsSystemDefined}",
        searchParameterInfo.Url?.OriginalString,
        searchParameterInfo.Code,
        isSystemDefined);
}
else if (wasSystemDefined && !isSystemDefined)
{
    logger.LogWarning(
        "SearchParameter IsSystemDefined downgrade ignored: Url={Url}, Code={Code}, BuildIsSystemDefined={BuildIsSystemDefined}",
        searchParameterInfo.Url?.OriginalString,
        searchParameterInfo.Code,
        isSystemDefined);
}

if (searchParameterInfo.Code == "_profile" && searchParameterInfo.Type == SearchParamType.Reference)
{
@@ -174,6 +195,8 @@ private static SearchParameterInfo GetOrCreateSearchParameterInfo(SearchParamete

EnsureNoIssues();

SearchParameterInfo.ResourceTypeSearchParameter.IsSystemDefined = true;

var validatedSearchParameters = new List<(string ResourceType, SearchParameterInfo SearchParameter)>
{
// _type is currently missing from the search params definition bundle, so we inject it in here.

@@ -19,7 +19,9 @@
using Microsoft.Health.Fhir.Core.Features.Operations.BulkDelete.Messages;
using Microsoft.Health.Fhir.Core.Features.Persistence;
using Microsoft.Health.Fhir.Core.Features.Search;
using Microsoft.Health.Fhir.Core.Features.Search.Parameters;
using Microsoft.Health.Fhir.Core.Messages.Delete;
using Microsoft.Health.Fhir.Core.Models;
using Microsoft.Health.JobManagement;
using Newtonsoft.Json;

@@ -31,19 +33,22 @@ public class BulkDeleteProcessingJob : IJob
private readonly Func<IScoped<IDeletionService>> _deleterFactory;
private readonly RequestContextAccessor<IFhirRequestContext> _contextAccessor;
private readonly IMediator _mediator;
private readonly ISearchParameterOperations _searchParameterOperations;
private readonly Func<IScoped<ISearchService>> _searchService;
private readonly IQueueClient _queueClient;

public BulkDeleteProcessingJob(
Func<IScoped<IDeletionService>> deleterFactory,
RequestContextAccessor<IFhirRequestContext> contextAccessor,
IMediator mediator,
ISearchParameterOperations searchParameterOperations,
Func<IScoped<ISearchService>> searchService,
IQueueClient queueClient)
{
_deleterFactory = EnsureArg.IsNotNull(deleterFactory, nameof(deleterFactory));
_contextAccessor = EnsureArg.IsNotNull(contextAccessor, nameof(contextAccessor));
_mediator = EnsureArg.IsNotNull(mediator, nameof(mediator));
_searchParameterOperations = EnsureArg.IsNotNull(searchParameterOperations, nameof(searchParameterOperations));
_searchService = EnsureArg.IsNotNull(searchService, nameof(searchService));
_queueClient = EnsureArg.IsNotNull(queueClient, nameof(queueClient));
}
@@ -78,6 +83,13 @@ public async Task<string> ExecuteAsync(JobInfo jobInfo, CancellationToken cancel
Exception exception = null;
List<string> types = definition.Type.SplitByOrSeparator().ToList();

if (types.Count > 0
&& string.Equals(types[0], KnownResourceTypes.SearchParameter, StringComparison.OrdinalIgnoreCase)
&& !(definition.ExcludedResourceTypes?.Any(x => string.Equals(x, KnownResourceTypes.SearchParameter, StringComparison.OrdinalIgnoreCase)) ?? false))
{
await _searchParameterOperations.EnsureNoActiveReindexJobAsync(cancellationToken);
Contributor

@SergeyGaluzo SergeyGaluzo Mar 19, 2026

There are several problems with this implementation of the "running reindex" check:

  1. This check is not transacted with the search param write, so there is always a chance of a race: successful check, reindex started, search param write, resources are indexed incorrectly.
  2. It is up to each workflow that allows search param writes to call this check. This increases the chance that it is missed in code and reindex is not protected.

An implementation that is free from the above problems is to move this check into the MergeSearchParams stored procedure, in the same SQL transaction that runs the write, and roll back this transaction if the check does not pass. For the reindex orchestrator to bypass this check, we can add its job id as input.

MergeSearchParams is already significantly rewritten in this PR, so it is the right time to add the above logic and get robust processing.

Contributor Author

This change closes the BulkDelete gap and ensures this path now checks for active reindex instead of bypassing the rule entirely. Moving the enforcement further down the stack is a larger follow-up: we would need to handle Cosmos correctly and define an internal override mechanism for flows like reindex, which must be able to update search parameter status while reindex is running. I think those extra considerations warrant a separate follow-up item rather than adding that additional improvement now as the PR is already pretty large. It is at least centralized in SearchParameterOperations which clarifies the intent and creates a pattern of centralized check.

Contributor

I agree that moving down the stack is more work, but it is not that much more.
For example, we do not need to deal with Cosmos in the same way as with SQL; just add a separate check call in the corresponding Cosmos data store. Passing an extra nullable job id parameter is not big either, especially if AI generates the tests.
Please point me to the test added to check bulk delete and reindex coordination.

Contributor Author

@jestradaMS jestradaMS Mar 20, 2026

Feel free to create a follow-up item for this work. Below is the test for bulk delete reindex coordination.

src/Microsoft.Health.Fhir.Core.UnitTests/Features/Operations/BulkDelete/BulkDeleteProcessingJobTests.cs

Contributor

I will add a repair item.
As for tests, I expected an e2e test that covers this functionality. Do we have one?

Contributor Author

Not e2e, I will add an e2e test to cover this.

Contributor

@SergeyGaluzo SergeyGaluzo Mar 20, 2026

I added this task https://microsofthealth.visualstudio.com/Health/_workitems/edit/186796/

BTW, would we absolutely need e2e tests if the above task were implemented? Couldn't we just rely on the current exception trapping?

}

try
{
resourcesDeleted = await deleter.Value.DeleteMultipleAsync(

@@ -69,11 +69,6 @@ public async Task<CreateReindexResponse> Handle(CreateReindexRequest request, Ca
return new CreateReindexResponse(existingJob);
}

// We need to pull in latest search parameter updates from the data store before creating a reindex job.
// There could be a potential delay of <see cref="ReindexJobConfiguration.JobPollingFrequency"/> before
// search parameter updates on one instance propagates to other instances.
await _searchParameterOperations.GetAndApplySearchParameterUpdates(cancellationToken);

// What this handles is the scenario where a user is effectively forcing a reindex to run by passing
// in a parameter of targetSearchParameterTypes. From those we can identify the base resource types.
var searchParameterResourceTypes = new HashSet<string>();