-
Notifications
You must be signed in to change notification settings - Fork 2k
feat(azure_blob sink): Update crates and migrate to the new SDK #24255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution! I haven't had the time to look closely at everything yet, but I'm leaving some comments now because there is a lot to discuss.
As per your PR description it seems like we're now supporting new auth methods with this upgrade too, so it would be nice to include a changelog in this PR. Let me know if I misunderstood this.
I do have some concerns about using these new azure crates though. The azure_storage_blob crate has this warning:
Note: The azure_storage_blob crate is currently under active development and not all features may be implemented or work as intended. This crate is in beta and not suitable for Production environments. For any general feedback or usage issues, please open a GitHub issue
I see that we are using just some parts of that crate. Namely only BlobContainerClient/BlobContainerClientOptions, clients::BlobClient and models::BlockBlobClientUploadOptions.
Have you checked if they are just a port from what was being used in the old azure_storage_blobs crate or is it a full rewrite? If it is a port I'd have more confidence in using it. Either way I think our hands are tied here because MS doesn't seem to put enough effort into releasing good and stable Azure crates so anything as good as or better than the old azure_storage_blobs will do.
I haven't, but I suspect it is a full rewrite since I think it is just adding support for auth from a url to the SDK. I wrote a fix for the SDK and submitted it to Microsoft, but was informed that they had fixed what I was trying to fix in a release of azure_storage_blob the day before.
If I'm remembering properly, the only thing keeping SAS from working before is that the client constructor required the AD auth method. I think now it is making that credential optional, which then opened the door for auth from url. Anyway, I went about refactoring my original Vector PR to use the newly released azure_storage_blob SDK and broke it up into two PRs. I'm mainly doing all of this because we need per-sink proxy support and the legacy crates just didn't really support that so much. I have another PR that builds on this one to add proxy support: #24256
Same auth methods, SAS and Access key, are supported, but I think the implementation is different. I think this PR should help address #24167 though, so it can lead to additional auth methods. |
Changelog not needed then There are a few failing checks (and merge conflicts), I'll circle back to this once there are new commits on this branch. |
|
Any updates on this? @thomasqueirozb @joshcoughlan |
I'm working through this today getting the clippy whatnot passing and resolving merge conflicts. I didn't really realize there were other PRs waiting on this. Also, full disclosure here, I'd be happy if more eyeballs were on this PR as a good amount of it is faith-based coding with AI guidance. I'm none too familiar with rust and having more knowledgeable humans involved wouldn't be a bad thing. |
|
All contributors have signed the CLA ✍️ ✅ |
3cd1ced to
7785983
Compare
|
I have read the CLA Document and I hereby sign the CLA |
|
recheck |
Cargo.toml
Outdated
| # Azure | ||
| azure_core = { version = "0.25", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true } | ||
| azure_identity = { version = "0.25", default-features = false, features = ["reqwest"], optional = true } | ||
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] } | |
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true } |
This should remain optional as we don't want this to be compiled in when Vector is built with minimal features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, looking at https://docs.rs/crate/azure_core/latest/features I'm worried that without reqwest_native_tls and maybe also without reqwest_deflate we can run into some issues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW reqwest now sets its default TLS feature to use rustls, instead of native-tls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/sinks/azure_common/config.rs
Outdated
| // Force Azure SDK to use reqwest_0_12_24 transport to avoid affecting global reqwest | ||
| options.client_options.transport = Some(azure_core::http::Transport::new(std::sync::Arc::new( | ||
| reqwest_0_12_24::ClientBuilder::new() | ||
| .build() | ||
| .map_err(|e| format!("Failed to build reqwest_0_12_24 client: {e}"))?, | ||
| ))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using another version of reqwest here? I'd assume that this would work just fine if we used reqwest::ClientBuilder instead - or is there a version issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a version issue. The MS SDK uses reqwest 0.12 and trying to use 0.11 results in the following error: expected Arc<dyn HttpClient + ’static, Global>, found Arc<Client, Global> (rust-analyzer E0308). From what I understand this is due to HttpClient being implemented in the SDK and the SDK uses reqwest 0.12. I'm going pause here to reiterate that I am a rust novice and there may be better ways to do this. The most straight-forward way I could come up with to use reqwest 0.12 without changing it globally was to use a renamed dependency. Also, I did look at using reqwest 0.12 everywhere, but there would have had to have been updates made to other components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reqwest is only being used in as a dev dependency in tests. I will bump it to use 0.12 everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like I'm not able to push to your branch. I made a PR here: Automattic#4
|
Hi @joshcoughlan I'm not able to get azure integration tests to pass by running |
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost good to go. Please merge Automattic#4 and commit these suggestions afterwards. Thanks for your work on this!
Bump reqwest to 0.12 and consolidate reqwest versions
|
Done, but several integration tests for other components are now failing with reqwest 0.12 everywhere. |
This is due to reqwest using http v1 and I didn't run clippy for all features locally - my bad! It was at the top of the release notes I just missed it... I'll follow up with another PR to your branch to fix this which is going to restore some of your code. |
Use reqwest 0.12 and 0.11. Allow CDLA-Permissive-2.0
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
Hi. There are some failing tests with Also, if you'd rather undo the last commit and just handle that separately that's fine by me. |
This PR migrates the Azure Blob sink to the latest official Azure SDK crates and aligns our internal auth/HTTP plumbing with the new pipeline model.
Highlights:
Follow-ups planned:
azure_blobproxy support #21314)Vector configuration
Examples used during testing.
SAS-based connection string:
Shared Key (access key) connection string:
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
ERROR vector::topology::builder: Configuration error. error=Sink "testing-azure_blob": Account name missing in connection string