-
Notifications
You must be signed in to change notification settings - Fork 648
Update health check to ensure blob containers created at right time #9159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, we had this debate when @sebastienros did the database creation and I thought we decided to use the ResourceReady event (which runs after health checks). |
For databases, do we: aspire/src/Aspire.Hosting.SqlServer/SqlServerBuilderExtensions.cs Lines 58 to 77 in 88ff939
aspire/src/Aspire.Hosting.PostgreSQL/PostgresBuilderExtensions.cs Lines 66 to 90 in 88ff939
In general, I dislike using health checks to mutate state. They should just be used to check health, not make something "healthy". |
I am not super fond of this result, but I couldn't find a way to add a healthcheck for individual blob containers. Any suggestions? |
I don't think we do healthchecks for child resources anywhere else. For example, CosmosDB seems to honly have it for the whole service. aspire/src/Aspire.Hosting.Azure.CosmosDB/AzureCosmosDBExtensions.cs Lines 104 to 122 in e3d170c
Because the I think we should be able to follow the existing patterns in Sql, Postgres, and in Azure CosmosDB. What doesn't work about the existing pattern? Any other suggestions here @sebastienros or @mitchdenny ? |
SqlServer/Postgres databases have one. It's done by using their own connection string which has the |
538fc7d
to
dae66f7
Compare
dae66f7
to
ec3a2df
Compare
Thanks @sebastienros for help and guidance. How does this look now? |
src/Aspire.Hosting.Azure.Storage/AzureBlobStorageContainerHealthCheck.cs
Outdated
Show resolved
Hide resolved
...ents/Aspire.Azure.Storage.Blobs/AspireBlobStorageExtensions.BlobStorageContainerComponent.cs
Show resolved
Hide resolved
tests/Aspire.Hosting.Azure.Tests/AzureStorageEmulatorFunctionalTests.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
{ | ||
throw new DistributedApplicationException($"BlobServiceClient was not created for the '{builder.Resource.Name}' resource."); | ||
} | ||
// This event is triggered when the health check is healthy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This event is triggered when the health check is healthy. | |
// This event is triggered when the emulator has started, and BlobServiceClient is marked as healthy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with the change, that's not what I wanted to convey. I am saying that this event happens after the health check, only if it's healthy. Yes it implied the emulator has started (with more information than just "started") and "BlobServiceClient" is healthy doesn't mean much. The storage itself is healthy, the client is not "marked" anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, but "health check is healthy" isn't providing much information either. There are multiple health checks now; it would be good to clarify which health check is triggering this event.
var healthCheckKey = $"{resource.Name}_check"; | ||
|
||
BlobServiceClient? blobServiceClient = null; | ||
builder.ApplicationBuilder.Services.AddHealthChecks().AddAzureBlobStorage(sp => | ||
{ | ||
return blobServiceClient ??= CreateBlobServiceClient(connectionString ?? throw new InvalidOperationException("Connection string is not initialized.")); | ||
}, name: healthCheckKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to duplicate the HC here? Or why do we need to keep the HC on lines:160-167?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's on the Blobs resource it. Line 160 is on the Emulator resource. Doing it on the storage is not sufficient as the WaitForHealthyAsync
doesn't bubble up to the parent resources.
If it were just for the existing tests we could probably not have this specific one. But it's more consistent to keep it if we do it for containers.
@@ -135,9 +135,10 @@ public async Task VerifyAzureStorageEmulatorResource() | |||
|
|||
[Fact] | |||
[RequiresDocker] | |||
[QuarantinedTest("https://github.com/dotnet/aspire/issues/9139")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that this can be dropped? For quarantined tests we want to take it out after it has been green for a certain number of runs (~100 right now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add it back then.
What else? Reopening the issues? Is tracking automatic or is there a process to follow to unquarantine like for aspnet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, re-open the issue. And it will be tracked automatically. And I will take care of taking it out of quarantine for now. It will get semi-automated in medium term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Resolves #9139
Resolves #9145
The underlying issue is attributed to a lack of health checks for child resources (as those have no lifetime of their own). In the nutshell, whenever an Azurite emulator is starting up, the readiness of the emulator is indicated by the readiness of the "blobs" resources (represented by
BloblServiceClient
). Previously, child blob contrainers were created onResourceReadyEvent
, but this created an opportunity for a race condition - a client could attempt to connect after the resource reported healthy but before child resources were created - the flaky test highlighted this problem.To fix the issue we're now creating an individual health check for each blob container resource. To make it simple, a blob container is being created within the health check itself.