Skip to content

Conversation

@ZachNagengast
Copy link
Collaborator

This PR attempts to fix a couple issues with the httpHead request, particularly with unit tests in our project, but also useful in general.

  1. URLSession appears to have an internal limit on how many can be instantiated in a short amount of time. New sessions are necessary for the httpGet requests because the sessionConfig can change when using background mode (potential for improvement here), but for httpHead, we are using the default config with a single redirect delegate, so it can be a shared singleton for the process.
  2. With the addition of httpHead checks we increased the volume of API requests to HF which may have led to some rate limiting w/o an hf token - this will now cache them with a 1 minute ttl to specifically reduce repeated requests within a short timeframe to the same endpoint.

This also cleans up the interface of the httpHead function to return only the HTTPURLResponse instead of (Data, HTTPURLResponse), since it's an internal function and was being ignored at all call sites so far. The .gitignore change was to exclude the build folder for the transformers-cli example.

Interested in feedback on the TTL time and any interface changes, as well as general comments.

Comment on lines +82 to +85
/// Session actor for metadata requests with redirect handling.
///
/// Static to share a single URLSession across all HubApi instances, preventing resource
/// exhaustion when many instances are created. Persists for process lifetime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go this way, it might be useful to document that this is only being used for HEAD requests.

Comment on lines 88 to 93
/// Cache for metadata responses with configurable expiration.
///
/// Shared cache across all HubApi instances for better hit rates and lower memory usage.
/// Reduces redundant network requests for file metadata. Default TTL is 1 minute (60 seconds).
/// Entries auto-expire based on TTL on next get call for any key.
internal static let metadataCache: MetadataCache = .init(defaultTTL: 1 * 60)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 1 minute cache sounds reasonable, but I wonder if this will introduce issues or edge cases that will be difficult to debug. It also introduces some complexity, in exchange for solving a very specific problem: allow unauthenticated CIs to proceed. Have you explored how difficult it would be to fix this problem as part of the testing suite, or would that be impractical?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud: The Foundation URL Loading System has built-in response caching via URLCache. If we're getting cache misses, there are two possibilities, either:

  1. The cache isn't big enough
  2. The server isn't sending the right headers

The default cache is tiny (IIRC, <1MB in-memory, 10MB on disk), so setting a larger, explicit size in the URLSessionConfiguration is a good first step.

Unfortunately, the API isn't sending the headers needed to get server-negotiated cache behavior:

$ curl -I https://huggingface.co/api/models/bert-base-uncased
Access-Control-Allow-Origin: https://huggingface.co
Access-Control-Expose-Headers: X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash
Access-Control-Max-Age: 86400
Connection: close
Content-Length: 4464
Content-Type: application/json; charset=utf-8
Cross-Origin-Opener-Policy: same-origin
Date: Tue, 04 Nov 2025 15:38:16 GMT
Etag: W/"1170-c8f41jvZ6In8xh7lkgxKmGX+dKI"
Ratelimit: "api";r=498;t=176
Ratelimit-Policy: "fixed window";"api";q=500;w=300
Referrer-Policy: strict-origin-when-cross-origin
Vary: Origin
Via: 1.1 4d89e7f6870714b602988e2ed1135996.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 5f5unAR82VYVA3CC1Vo5g6NCp1SYkkeqt78EX_zO9ugLTbYNhBcjUA==
X-Amz-Cf-Pop: IAD55-P8
X-Cache: Miss from cloudfront
X-Powered-By: huggingface-moon
X-Request-Id: Root=1-690a1de8-66e423c063a9090f15c12908

We get ETag which we can use for validation, but not Cache-Control, Expires, or Last-Modified. So we'd need to configure request.cachePolicy = .returnCacheDataElseLoad.

It'd be great to avoid implementing our own caching mechanism, but I know a lot of devs end up doing that because of how mysterious URLCache can be in practice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, this is probably an unnecessary addition to this PR, just figured it would be useful because I was working on the SPM project's metadata cache recently. Caching the HEAD response in particular is interesting because it is often specifically used to determine if a cache should be used or not. Any caching at all for this HEAD request could definitely lead to a model/file being out of date.

One hybrid option we could take is using the URLCache and .returnCacheDataElseLoad but clearing the cache manually on some interval with an override.

Something like this

    override func cachedResponse(for request: URLRequest) -> CachedURLResponse? {
        let key = cacheKey(for: request)
        
        // Check if expired
        if isExpired(key) {
           super.removeCachedResponse(for: request)
            return nil
        }
        
        // Return from underlying cache if still valid
        return super.cachedResponse(for: request)
    }

Perhaps this would be better suited for it's own PR, though.

/// Performs an HTTP HEAD request to retrieve metadata without downloading content.
///
/// Allows relative redirects but ignores absolute ones for LFS files.
/// Uses a shared URLSession with custom redirect handling that only allows relative redirects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the use of a shared session is documented here 👍 I would still mention it when the var is defined.

- Better handled with a follow up PR to prevent unintended edge cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants