-
Notifications
You must be signed in to change notification settings - Fork 144
Improve metadata request handling #285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve metadata request handling #285
Conversation
| /// Session actor for metadata requests with redirect handling. | ||
| /// | ||
| /// Static to share a single URLSession across all HubApi instances, preventing resource | ||
| /// exhaustion when many instances are created. Persists for process lifetime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go this way, it might be useful to document that this is only being used for HEAD requests.
Sources/Hub/HubApi.swift
Outdated
| /// Cache for metadata responses with configurable expiration. | ||
| /// | ||
| /// Shared cache across all HubApi instances for better hit rates and lower memory usage. | ||
| /// Reduces redundant network requests for file metadata. Default TTL is 1 minute (60 seconds). | ||
| /// Entries auto-expire based on TTL on next get call for any key. | ||
| internal static let metadataCache: MetadataCache = .init(defaultTTL: 1 * 60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A 1 minute cache sounds reasonable, but I wonder if this will introduce issues or edge cases that will be difficult to debug. It also introduces some complexity, in exchange for solving a very specific problem: allow unauthenticated CIs to proceed. Have you explored how difficult it would be to fix this problem as part of the testing suite, or would that be impractical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: The Foundation URL Loading System has built-in response caching via URLCache. If we're getting cache misses, there are two possibilities, either:
- The cache isn't big enough
- The server isn't sending the right headers
The default cache is tiny (IIRC, <1MB in-memory, 10MB on disk), so setting a larger, explicit size in the URLSessionConfiguration is a good first step.
Unfortunately, the API isn't sending the headers needed to get server-negotiated cache behavior:
$ curl -I https://huggingface.co/api/models/bert-base-uncased
Access-Control-Allow-Origin: https://huggingface.co
Access-Control-Expose-Headers: X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash
Access-Control-Max-Age: 86400
Connection: close
Content-Length: 4464
Content-Type: application/json; charset=utf-8
Cross-Origin-Opener-Policy: same-origin
Date: Tue, 04 Nov 2025 15:38:16 GMT
Etag: W/"1170-c8f41jvZ6In8xh7lkgxKmGX+dKI"
Ratelimit: "api";r=498;t=176
Ratelimit-Policy: "fixed window";"api";q=500;w=300
Referrer-Policy: strict-origin-when-cross-origin
Vary: Origin
Via: 1.1 4d89e7f6870714b602988e2ed1135996.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 5f5unAR82VYVA3CC1Vo5g6NCp1SYkkeqt78EX_zO9ugLTbYNhBcjUA==
X-Amz-Cf-Pop: IAD55-P8
X-Cache: Miss from cloudfront
X-Powered-By: huggingface-moon
X-Request-Id: Root=1-690a1de8-66e423c063a9090f15c12908We get ETag which we can use for validation, but not Cache-Control, Expires, or Last-Modified. So we'd need to configure request.cachePolicy = .returnCacheDataElseLoad.
It'd be great to avoid implementing our own caching mechanism, but I know a lot of devs end up doing that because of how mysterious URLCache can be in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, this is probably an unnecessary addition to this PR, just figured it would be useful because I was working on the SPM project's metadata cache recently. Caching the HEAD response in particular is interesting because it is often specifically used to determine if a cache should be used or not. Any caching at all for this HEAD request could definitely lead to a model/file being out of date.
One hybrid option we could take is using the URLCache and .returnCacheDataElseLoad but clearing the cache manually on some interval with an override.
Something like this
override func cachedResponse(for request: URLRequest) -> CachedURLResponse? {
let key = cacheKey(for: request)
// Check if expired
if isExpired(key) {
super.removeCachedResponse(for: request)
return nil
}
// Return from underlying cache if still valid
return super.cachedResponse(for: request)
}Perhaps this would be better suited for it's own PR, though.
| /// Performs an HTTP HEAD request to retrieve metadata without downloading content. | ||
| /// | ||
| /// Allows relative redirects but ignores absolute ones for LFS files. | ||
| /// Uses a shared URLSession with custom redirect handling that only allows relative redirects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the use of a shared session is documented here 👍 I would still mention it when the var is defined.
- Better handled with a follow up PR to prevent unintended edge cases
This PR attempts to fix a couple issues with the httpHead request, particularly with unit tests in our project, but also useful in general.
URLSessionappears to have an internal limit on how many can be instantiated in a short amount of time. New sessions are necessary for the httpGet requests because the sessionConfig can change when using background mode (potential for improvement here), but for httpHead, we are using the default config with a single redirect delegate, so it can be a shared singleton for the process.This also cleans up the interface of the httpHead function to return only the
HTTPURLResponseinstead of(Data, HTTPURLResponse), since it's an internal function and was being ignored at all call sites so far. The.gitignorechange was to exclude the build folder for thetransformers-cliexample.Interested in feedback on the TTL time and any interface changes, as well as general comments.