
OpenCLIP bootstrap downloader and hardening #70

Merged
tazarov merged 13 commits into main from codex/issue-65-openclip-bootstrap on Mar 6, 2026

@tazarov tazarov commented Mar 5, 2026


Summary

Closes #65.

This change introduces OpenCLIP model bootstrap and hardening in embeddings/openclip, plus follow-up fixes from iterative review.

Change Log

Added

  • Add OpenCLIP default HF bootstrap path for the four standard artifacts (ONNX text model, ONNX vision model, tokenizer, preprocessor config).
  • Add README/documentation updates describing bootstrap defaults and offline artifact flow.
  • Add local cache layout and download orchestration in EnsureDefaultAssets.

Changed

  • Add cache integrity checks with optional override support and robust stale-file handling.
  • Add HTTP redirect policy with allowlisted host enforcement and HTTPS checks when authenticated.
  • Make lock-based bootstrap concurrency safe with stale lock cleanup and deterministic error reporting.
  • Improve checksum verification logic and stale artifact cleanup behavior.

Security and Reliability

  • Enforce checksum/size validation before using cached assets.
  • Add retry behavior for transient errors in download pipeline and explicit non-retry classification for non-transient DNS failures.
  • Replace insecure path/token behavior with stricter base URL validation and redirection safety.

Files

  • embeddings/openclip/bootstrap.go
  • embeddings/openclip/bootstrap_test.go
  • embeddings/openclip/embedder.go
  • embeddings/openclip/embedder_test.go
  • embeddings/openclip/embedder_integration_test.go
  • README.md

Verification

  • go test ./embeddings/openclip/... -short
  • go test ./...
  • go vet ./embeddings/openclip/...

Reviewer Checklist

  • Confirm bootstrap default model/revision behavior
  • Confirm checksum and stale cache error paths are deterministic
  • Confirm redirect and HTTPS policy is compatible with intended HF workflows
  • Confirm new tests cover retry and streaming max-size behavior

Notes

  • This PR includes the full sequence of review-driven fixes (security, reliability, and error-handling), and has no known blocking issues remaining.


tazarov commented Mar 5, 2026


@claude review

@tazarov tazarov requested a review from Copilot March 5, 2026 08:09

claude Bot commented Mar 5, 2026


Claude finished @tazarov's task in 2m 37s.


Code Review: OpenCLIP Bootstrap Downloader and Hardening

Tasks

  • Understand the request
  • Read and review all changed files
  • Post final review

Overall this is a well-structured, security-conscious implementation. The checksum-by-default, HTTPS enforcement with tokens, atomic temp-file-then-rename, and redirect allowlisting are all solid choices. Below are findings organized by severity.


bootstrap.go

Medium Issues

1. ONNXRUNTIME_TEST_MODEL_CACHE_DIR in production config (defaultBootstrapConfig, line 243)

if cacheDir == "" {
    cacheDir = strings.TrimSpace(os.Getenv("ONNXRUNTIME_TEST_MODEL_CACHE_DIR"))
}

A TEST_-prefixed env var is leaking into the production default config path. If this is a test convenience, it should live only in test setup (e.g., in setupORTTestEnvironment), not in defaultBootstrapConfig. Callers who set this env var for tests could silently influence production behavior.

2. Redundant security setup between ensureHTTPClientSecurity and ensureModelAssets (lines 323–347 vs 424–437)

EnsureDefaultAssets calls ensureHTTPClientSecurity, which sets CheckRedirect; then ensureModelAssets calls parseBootstrapBaseHost and makeRedirectPolicy again (lines 424–437) when httpClient.CheckRedirect == nil. The two code paths are nearly identical, so they can easily drift apart. The internal ensureModelAssets fallback was presumably added for direct-call safety, but having two places that do the same setup obscures the invariants. Consider documenting the contract (or asserting it at the start of ensureModelAssets instead of silently patching the client).

3. No context propagation in HTTP requests (downloadFileOnce, line 558)

req, err := http.NewRequest(http.MethodGet, assetURL, nil)

There is no way for callers to cancel an in-progress download. For large ONNX files (the vision model is ~335 MB), a stalled connection can hold the download until the 60 s http.Client.Timeout expires (a total-response timeout that also covers reading the body, not a read-idle timeout), and callers cannot abort it sooner. Using http.NewRequestWithContext with a passed-in context would let callers handle signals and shutdowns.

Minor / Low Issues

4. isTransientURLSubError uses error message string matching (lines 657–674)

Matching on lowercase substrings of err.Error() is fragile and locale-sensitive. While Go's net package doesn't export specific error types for every case, using errors.As to unwrap *net.OpError, or errors.Is against syscall.ECONNREFUSED / syscall.ECONNRESET, would be more robust. At minimum, document why string matching is used here.

5. acquireFileLock busy-polls at a fixed 200 ms interval (line 708)

With a 30 s wait timeout, this can loop up to 150 times. A short exponential backoff (e.g., starting at 50 ms, capped at 500 ms) would reduce unnecessary syscalls during parallel bootstrap.

6. URL path construction uses unsanitized cfg.repoID and cfg.revision (lines 513–519)

assetURL := fmt.Sprintf(
    "%s/%s/resolve/%s/%s",
    strings.TrimRight(cfg.baseURL, "/"),
    cfg.repoID,
    cfg.revision,
    fileName,
)

Path traversal sequences like ../../ in repoID or revision would produce unexpected URLs. The file name is protected by validateAssetFileName, but there's no equivalent guard on repoID/revision. The default values are safe (a pinned SHA), but a custom WithBootstrapRevision("../../etc/passwd") would not be caught. Consider calling url.PathEscape on path segments, or validating that neither field contains / (except the single slash separating namespace and repo in repoID).


embedder.go

Minor / Low Issues

7. EmbedImages unnecessarily checks e.tokenizer == nil (line 456)

if e.tokenizer == nil || e.textSessionsByBatch == nil || e.visionSessionsByBatch == nil {
    return nil, fmt.Errorf("embedder has been closed")
}

The tokenizer is irrelevant to image embeddings. The check is harmless but misleading: if only the tokenizer were nil, EmbedImages would report "has been closed" for an embedder that was never closed. A separate check per modality, or checking only e.visionSessionsByBatch == nil on the image path, would be clearer.

8. resizeImage uses nearest-neighbor interpolation only (lines 930–962)

HuggingFace's CLIP preprocessor resamples with bicubic interpolation by default, so nearest-neighbor output will differ from the reference pipeline and may produce slightly different embeddings. This is worth documenting as a known limitation.

9. Image preprocessing is pixel-by-pixel (preprocessImagesInto, lines 836–854)

For 224×224 images at a batch of 16, this is ~800K At() calls. Each call returns a color.Color interface value, which may allocate, and that value must then be converted to the target color model. Using type assertions to access raw pixel arrays (*image.NRGBA, *image.RGBA, *image.YCbCr) would substantially improve throughput. This is a performance concern, not a correctness one.

10. Hardcoded assumption that padding token ID is 0 (deriveAttentionMask, lines 1019–1025)

func deriveAttentionMask(dst []int64, tokenIDs []int64) {
    for i := range dst {
        if tokenIDs[i] != 0 {
            dst[i] = 1
        }
    }
}

This is correct for CLIP's BPE tokenizer, but would silently produce wrong masks for other tokenizers. Consider adding a comment documenting this assumption.
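One way to make the assumption explicit is to pass the pad ID in. The sketch below uses a hypothetical deriveAttentionMaskWithPad; it also writes every position, whereas the quoted version only writes 1s and so relies on dst starting zeroed. Any value-based mask still breaks down when the pad ID equals a token that also appears in real text (for example an end-of-text token reused for padding); in that case the tokenizer's own attention mask is the safer source.

```go
package main

// deriveAttentionMaskWithPad is a hypothetical generalization: the pad token
// ID is a parameter instead of a hardcoded 0, and every position is written
// explicitly so a reused dst buffer cannot keep stale 1s.
func deriveAttentionMaskWithPad(dst, tokenIDs []int64, padID int64) {
	for i := range dst {
		if tokenIDs[i] != padID {
			dst[i] = 1
		} else {
			dst[i] = 0
		}
	}
}
```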


Tests

11. LRU eviction is not unit-tested

textSessionForBatchLocked and visionSessionForBatchLocked implement LRU eviction when maxCachedBatchCount is hit, but there are no tests for this path. Given the complexity of the LRU bookkeeping (textSessionsByBatch, textSessionLRU, textSessionLRUIndex), a unit test that fills the cache past the limit and verifies the oldest entry is evicted would reduce regression risk.
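A sketch of the shape such a test could take, using a hypothetical, simplified stand-in for the per-batch-size cache. It mirrors the three-structure bookkeeping the review names (values map, recency list, index map) but is not the embedder's code; the evict callback stands in for destroying an ONNX session.

```go
package main

import "container/list"

// sessionLRU is a hypothetical stand-in for the embedder's per-batch-size
// session cache. evict is called (e.g. to destroy a session) on eviction.
type sessionLRU[V any] struct {
	capacity int
	order    *list.List            // front = most recently used batch size
	index    map[int]*list.Element // batch size -> element in order
	values   map[int]V
	evict    func(batch int, v V)
}

func newSessionLRU[V any](capacity int, evict func(int, V)) *sessionLRU[V] {
	return &sessionLRU[V]{
		capacity: capacity,
		order:    list.New(),
		index:    map[int]*list.Element{},
		values:   map[int]V{},
		evict:    evict,
	}
}

// getOrCreate returns the cached value for batch, creating it on a miss and
// evicting the least recently used entry when the cache exceeds capacity.
func (c *sessionLRU[V]) getOrCreate(batch int, create func(int) V) V {
	if el, ok := c.index[batch]; ok {
		c.order.MoveToFront(el)
		return c.values[batch]
	}
	v := create(batch)
	c.index[batch] = c.order.PushFront(batch)
	c.values[batch] = v
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		old := oldest.Value.(int)
		c.order.Remove(oldest)
		delete(c.index, old)
		if c.evict != nil {
			c.evict(old, c.values[old])
		}
		delete(c.values, old)
	}
	return v
}
```

The eviction test then fills the cache, touches one entry, and checks that the other, least recently used, entry is the one evicted.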

12. assertApproxUnitNorm is duplicated across test files

embedder_test.go:331 and embedder_integration_test.go:142 have near-identical helpers (assertApproxUnitNorm / assertApproxUnitNormIntegration) in the same package. They could be unified into one with a consistent epsilon float64 signature.

13. Redirect policy not exercised in the integration test flow

TestEnsureModelAssetsDownloadsAndCaches constructs bootstrapConfig directly with a custom *http.Client (no CheckRedirect), so the redirect allowlisting logic is never exercised in the end-to-end flow. TestRedirectPolicyRejectsHTTPSDowngrade and TestIsAllowedRedirectHost cover the unit-level behavior, which is fine, but a test that drives EnsureDefaultAssets (or ensureModelAssets) with a server that issues a redirect would complete the coverage.
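A sketch of such a test, using two httptest servers: an origin that redirects to a "CDN". makeRedirectPolicy here is a hypothetical stand-in whose signature may differ from the PR's. Because both test servers listen on 127.0.0.1 and differ only by port, this sketch keys the allowlist on host:port; a real test against the PR's hostname-based allowlist would likely need an injectable allowlist or a custom transport.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// makeRedirectPolicy is a hypothetical allowlist policy keyed by host:port.
func makeRedirectPolicy(allowed map[string]bool) func(*http.Request, []*http.Request) error {
	return func(req *http.Request, via []*http.Request) error {
		if len(via) >= 10 {
			return errors.New("stopped after 10 redirects")
		}
		if via[0].URL.Scheme == "https" && req.URL.Scheme != "https" {
			return fmt.Errorf("refusing HTTPS downgrade to %s", req.URL)
		}
		if !allowed[req.URL.Host] {
			return fmt.Errorf("redirect to non-allowlisted host %q", req.URL.Host)
		}
		return nil
	}
}

// fetchThroughRedirect downloads via an origin that redirects to a CDN server;
// allowCDN controls whether the CDN host is on the allowlist.
func fetchThroughRedirect(allowCDN bool) error {
	cdn := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("onnx-bytes"))
	}))
	defer cdn.Close()
	origin := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, cdn.URL+r.URL.Path, http.StatusFound)
	}))
	defer origin.Close()

	allowed := map[string]bool{
		origin.Listener.Addr().String(): true,
		cdn.Listener.Addr().String():    allowCDN,
	}
	client := &http.Client{CheckRedirect: makeRedirectPolicy(allowed)}
	resp, err := client.Get(origin.URL + "/model.onnx")
	if err != nil {
		return err // CheckRedirect errors arrive wrapped in *url.Error
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}
```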


Summary

| Severity | Count | Topics |
|---|---|---|
| Medium | 3 | Test env var in prod config, redundant security setup, no context propagation |
| Minor/Low | 7 | URL path sanitization, string-based transient detection, lock spin interval, image resize quality, pixel-by-pixel preprocessing, closed-check accuracy, pad token assumption |
| Test gaps | 3 | LRU eviction, redirect integration, duplicated helpers |

The overall implementation quality is high. The most impactful fix before merge would be removing ONNXRUNTIME_TEST_MODEL_CACHE_DIR from the production default config, as it could silently cross test/production environment boundaries.

Copilot AI left a comment


Pull request overview

Adds an embeddings/openclip package that can (a) bootstrap OpenCLIP ONNX assets from Hugging Face into a local cache with integrity/redirect hardening, and (b) run text+image embedding inference with session caching and preprocessing aligned to the exported artifacts.

Changes:

  • Introduce OpenCLIP asset bootstrapper with checksum/size verification, safe(ish) redirects, retry logic, and lock-based concurrency control.
  • Add OpenCLIP embedder implementation (text + vision) including tokenizer integration, image preprocessing, LRU session caching, and similarity helpers.
  • Add unit + integration tests and README documentation for OpenCLIP usage and default bootstrap behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| embeddings/openclip/bootstrap.go | Implements default asset download/cache with integrity checks, retries, redirects policy, and file locks. |
| embeddings/openclip/bootstrap_test.go | Adds coverage for caching, corrupt-file replacement, oversize protection, retry classification, lock behavior, and redirect allowlisting. |
| embeddings/openclip/embedder.go | Implements the OpenCLIP embedder (tokenization, preprocessing, ONNX sessions, normalization, similarity helpers). |
| embeddings/openclip/embedder_test.go | Adds unit tests for options/validation, preprocessing helpers, normalization, and similarity helpers. |
| embeddings/openclip/embedder_integration_test.go | End-to-end embedding test using bootstrapped artifacts (or env-provided paths) against ONNX Runtime. |
| README.md | Documents OpenCLIP artifacts, defaults, bootstrap usage, and similarity helpers. |


Comment thread embeddings/openclip/bootstrap.go Outdated
Comment on lines +440 to +445
revisionSlug := strings.ReplaceAll(cfg.revision, "/", "--")
baseDir := filepath.Join(cfg.cacheDir, repoSlug, revisionSlug)
if err := os.MkdirAll(baseDir, 0o700); err != nil {
    return ModelAssets{}, fmt.Errorf("failed to create bootstrap cache directory %q: %w", baseDir, err)
}

Copilot AI Mar 5, 2026


repoID/revision are used to form baseDir via filepath.Join(cfg.cacheDir, repoSlug, revisionSlug), but repoSlug/revisionSlug are only sanitized by replacing "/" with "--". Values like ".." (or platform-specific separators) will cause filepath.Join to clean the path and can escape cacheDir, allowing writes outside the intended cache root. Consider rejecting unsafe values and/or verifying the resulting baseDir stays within cacheDir (e.g., compare filepath.Abs + filepath.Rel), and sanitize both repo and revision for all OS path separators.

Suggested change
-revisionSlug := strings.ReplaceAll(cfg.revision, "/", "--")
-baseDir := filepath.Join(cfg.cacheDir, repoSlug, revisionSlug)
-if err := os.MkdirAll(baseDir, 0o700); err != nil {
-    return ModelAssets{}, fmt.Errorf("failed to create bootstrap cache directory %q: %w", baseDir, err)
-}
+repoSlug = strings.ReplaceAll(repoSlug, "\\", "--")
+if repoSlug == "." || repoSlug == ".." {
+    return ModelAssets{}, fmt.Errorf("invalid repo ID %q: resolved slug %q is not allowed", cfg.repoID, repoSlug)
+}
+revisionSlug := strings.ReplaceAll(cfg.revision, "/", "--")
+revisionSlug = strings.ReplaceAll(revisionSlug, "\\", "--")
+if revisionSlug == "." || revisionSlug == ".." {
+    return ModelAssets{}, fmt.Errorf("invalid revision %q: resolved slug %q is not allowed", cfg.revision, revisionSlug)
+}
+baseDir := filepath.Join(cfg.cacheDir, repoSlug, revisionSlug)
+absBaseDir, err := filepath.Abs(baseDir)
+if err != nil {
+    return ModelAssets{}, fmt.Errorf("failed to resolve absolute path for bootstrap cache directory %q: %w", baseDir, err)
+}
+absCacheDir, err := filepath.Abs(cfg.cacheDir)
+if err != nil {
+    return ModelAssets{}, fmt.Errorf("failed to resolve absolute path for cache root %q: %w", cfg.cacheDir, err)
+}
+rel, err := filepath.Rel(absCacheDir, absBaseDir)
+if err != nil {
+    return ModelAssets{}, fmt.Errorf("failed to compute relative path from cache root %q to %q: %w", absCacheDir, absBaseDir, err)
+}
+if rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
+    return ModelAssets{}, fmt.Errorf("bootstrap cache directory %q escapes cache root %q", absBaseDir, absCacheDir)
+}
+if err := os.MkdirAll(absBaseDir, 0o700); err != nil {
+    return ModelAssets{}, fmt.Errorf("failed to create bootstrap cache directory %q: %w", absBaseDir, err)
+}

Comment on lines +861 to +869
func getRGB(c color.Color, convertRGB bool) (float32, float32, float32) {
if !convertRGB {
gray := color.GrayModel.Convert(c).(color.Gray)
v := float32(gray.Y)
return v, v, v
}
rgba := color.NRGBAModel.Convert(c).(color.NRGBA)
return float32(rgba.R), float32(rgba.G), float32(rgba.B)
}

Copilot AI Mar 5, 2026


getRGB treats do_convert_rgb=false as “convert to grayscale” by converting to color.Gray and returning (v,v,v). That changes the semantics of the preprocessor flag: disabling RGB conversion should usually mean “don’t perform a colorspace conversion step”, not “force grayscale”. This will silently distort color inputs when do_convert_rgb is false. Consider always extracting R/G/B channels (via color.NRGBAModel.Convert) and only using a special-case conversion when the model truly expects grayscale.

Comment on lines +505 to +511
maxBytes := cfg.maxDownloadBytes
if expectedSize > 0 && expectedSize < maxBytes {
maxBytes = expectedSize
}
if maxBytes <= 0 {
return fmt.Errorf("invalid max download limit for %s: %d", fileName, maxBytes)
}

Copilot AI Mar 5, 2026


If a caller sets expectedSize larger than cfg.maxDownloadBytes, maxBytes remains capped while downloadFileOnce still enforces written == expectedSize, making the download impossible to succeed (it will always hit size-mismatch after being limited). Consider validating expectedSize <= maxDownloadBytes up-front (or auto-bumping maxBytes to expectedSize when verification is enabled) so misconfiguration fails with a clear error.
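A sketch of the first option (validate up front), assuming, as in the quoted code, that an expectedSize of zero or less means "size unknown". effectiveDownloadLimit is a hypothetical helper name.

```go
package main

import "fmt"

// effectiveDownloadLimit is a hypothetical helper that resolves the byte limit
// for one asset and fails fast when the expected size can never fit.
func effectiveDownloadLimit(fileName string, expectedSize, maxDownloadBytes int64) (int64, error) {
	if maxDownloadBytes <= 0 {
		return 0, fmt.Errorf("invalid max download limit for %s: %d", fileName, maxDownloadBytes)
	}
	if expectedSize <= 0 {
		return maxDownloadBytes, nil // size unknown: only the global cap applies
	}
	if expectedSize > maxDownloadBytes {
		return 0, fmt.Errorf("expected size %d for %s exceeds max download limit %d",
			expectedSize, fileName, maxDownloadBytes)
	}
	return expectedSize, nil
}
```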


tazarov commented Mar 5, 2026


@claude review


claude Bot commented Mar 5, 2026


Claude finished @tazarov's task in 2m 43s.


Tasks

  • Understand the request
  • Read and review all changed files
  • Post final review

Code Review: OpenCLIP Bootstrap Downloader and Hardening

This is a well-constructed, security-conscious implementation overall. The checksum-by-default, HTTPS enforcement with tokens, atomic temp-file-then-rename pattern, and redirect allowlisting are solid choices. Below are findings organized by severity, including items that supersede or add to the previous automated review.


bootstrap.go

Medium Issues

1. ONNXRUNTIME_TEST_MODEL_CACHE_DIR in production default config (defaultBootstrapConfig, line 243)

if cacheDir == "" {
    cacheDir = strings.TrimSpace(os.Getenv("ONNXRUNTIME_TEST_MODEL_CACHE_DIR"))
}

A TEST_-prefixed env var is read in the production default config path. The integration test already handles this explicitly at line 24 of embedder_integration_test.go, so this fallback in production code is unnecessary and creates an environment bleed risk.


2. URL construction uses raw (unescaped) cfg.repoID and cfg.revision (ensureAssetFile, lines 530–536)

assetURL := fmt.Sprintf(
    "%s/%s/resolve/%s/%s",
    strings.TrimRight(cfg.baseURL, "/"),
    cfg.repoID,   // raw value — not URL-escaped
    cfg.revision, // raw value — not URL-escaped
    fileName,
)

sanitizeBootstrapPathComponent correctly sanitizes these values for the filesystem cache path, but cfg.repoID and cfg.revision go into the URL without url.PathEscape. A caller using WithBootstrapRevision("branch with spaces") would generate a broken URL. More critically, since WithBootstrapRepoID only rejects empty strings, a value like "owner/repo?token=malicious&" would inject query parameters. Consider running path segments through url.PathEscape before composing the URL.


3. No context propagation through the download chain (downloadFileOnce, line 616)

req, err := http.NewRequest(http.MethodGet, assetURL, nil)

The 60 s http.Client.Timeout is a total-response timeout, not a stall-detection timeout. For large files (the vision model is ~335 MB), a server that drips data byte-by-byte could hold the goroutine until that limit expires, and callers have no way to cancel sooner. Using http.NewRequestWithContext and threading a context.Context through EnsureDefaultAssets → ensureModelAssets → ensureAssetFile → downloadFileWithRetry → downloadFileOnce would allow callers to cancel in-progress downloads on shutdown or signal.


Minor / Low Issues

4. Redundant HTTP client setup between ensureHTTPClientSecurity and ensureModelAssets (lines 323–347 vs 424–437)

Both functions independently call parseBootstrapBaseHost and makeRedirectPolicy when CheckRedirect == nil. The guard comment says it's for "direct-call safety", but having two divergent setup paths makes it easy for them to drift. Consider asserting at the start of ensureModelAssets that the client is already secured (i.e., panic or an internal error if CheckRedirect == nil after EnsureDefaultAssets called ensureHTTPClientSecurity), or document why the fallback must stay.

5. isTransientURLSubError uses error message string matching (lines 715–732)

Matching on lowercase substrings of err.Error() is fragile across Go versions and locales. The existing logic is reasonable for the common cases, but at minimum add a comment explaining why errors.As against *net.OpError + syscall.ECONNRESET/ECONNREFUSED wasn't used (e.g., portability across operating systems).

6. acquireFileLock busy-polls with a fixed 200ms sleep (line 766)

With a 30s timeout, this can spin up to ~150 times. For concurrent downloads of the same repo (e.g., multiple services starting simultaneously), a short exponential backoff with jitter (e.g., starting at 25ms, capped at 300ms) would reduce unnecessary polling.

7. downloadFileWithRetry sleep has no jitter (line 603)

time.Sleep(time.Duration(attempt) * 500 * time.Millisecond)

Linear backoff without jitter (500 ms, then 1000 ms) is acceptable for sequential calls. If multiple embedder instances start simultaneously and hit the same 503, they will retry in lockstep, creating a small thundering herd against the HF CDN. Adding jitter (e.g., attempt × 500 ms plus a random offset in [0, 200 ms)) would spread retries out.

8. staleReason is silently lost on successful re-download (ensureAssetFile, lines 490–543)

If a cached file fails checksum validation (staleReason is set), the stale file is removed and a re-download is attempted. If the re-download succeeds, staleReason is never surfaced. This is generally correct behavior (success means success), but a log.Printf or structured log at the staleReason point would help operators debug cache corruption without needing to add debug builds. Consider at least a debug/trace log here.


embedder.go

Minor / Low Issues

9. Both EmbedTexts and EmbedImages check all three resources regardless of modality (lines 403, 456)

// In both EmbedTexts and EmbedImages:
if e.tokenizer == nil || e.textSessionsByBatch == nil || e.visionSessionsByBatch == nil {
    return nil, fmt.Errorf("embedder has been closed")
}

EmbedImages checking e.tokenizer == nil and EmbedTexts checking e.visionSessionsByBatch == nil are misleading — they could produce "embedder has been closed" for a state that isn't a proper closure. Since Close() sets all three to nil atomically, this works in practice, but per-modality guards would be clearer:

  • EmbedTexts: check e.tokenizer == nil || e.textSessionsByBatch == nil
  • EmbedImages: check e.visionSessionsByBatch == nil

10. getRGB with doConvertRGB=false forces grayscale, not a no-op (lines 861–868)

func getRGB(c color.Color, convertRGB bool) (float32, float32, float32) {
    if !convertRGB {
        gray := color.GrayModel.Convert(c).(color.Gray)
        v := float32(gray.Y)
        return v, v, v
    }
    ...
}

do_convert_rgb: false in HuggingFace preprocessor configs means "the input image is already in the expected color space; skip the RGB conversion step." It does not mean "convert to grayscale." For a color image with do_convert_rgb=false, this collapses R, G, and B into a single luminance value, discarding all color information and producing wrong embeddings. The Copilot review also flagged this.

The correct behavior for convertRGB=false should simply read R/G/B from the pixel as-is (same as the convertRGB=true path). The flag should only control whether a colorspace conversion is applied before extracting channels.


11. resizeImage uses nearest-neighbor interpolation (lines 930–962)

HuggingFace CLIP preprocessors default to bicubic interpolation. Nearest-neighbor will produce visually different preprocessed images, which may cause subtle embedding differences compared to the Python reference implementation. This is worth documenting as a known limitation, especially since the checksum verification pins specific ONNX artifacts that were exported using bicubic preprocessing.

12. Image preprocessing is pixel-by-pixel via image.Image.At() (preprocessImagesInto, lines 836–854)

For a 224×224 image at batch size 16, this is ~800K At() calls, each potentially allocating a color.Color interface value. Because resizeImage and centerCropImage both return *image.NRGBA, type-asserting to *image.NRGBA and reading the raw Pix slice directly would avoid this allocation overhead.

13. deriveAttentionMask relies on pad token ID = 0 (lines 1019–1025)

func deriveAttentionMask(dst []int64, tokenIDs []int64) {
    for i := range dst {
        if tokenIDs[i] != 0 {
            dst[i] = 1
        }
    }
}

This is correct for CLIP's BPE tokenizer, but would silently produce wrong masks for any tokenizer whose padding token isn't ID 0. A comment documenting this CLIP-specific assumption would prevent future misuse.


Tests

14. assertApproxUnitNorm is duplicated across test files

embedder_test.go:346 and embedder_integration_test.go:142 contain near-identical helpers (assertApproxUnitNorm / assertApproxUnitNormIntegration) in the same package. They could be unified into one function.

15. LRU eviction path has no unit test

textSessionForBatchLocked / visionSessionForBatchLocked include LRU eviction when maxCachedBatchCount is reached. The eviction bookkeeping across textSessionsByBatch, textSessionLRU, and textSessionLRUIndex is non-trivial but has no test covering the eviction path. A unit test that calls with more batch sizes than maxCachedBatchCount and verifies the oldest is evicted would reduce regression risk.

16. TestEnsureModelAssetsDownloadsAndCaches uses bootstrapConfig directly, bypassing redirect policy

The test constructs bootstrapConfig directly with an http.Client that has no CheckRedirect, so the redirect allowlist logic is never exercised in the download flow. The unit tests for redirect policy (TestRedirectPolicyRejectsHTTPSDowngrade, TestIsAllowedRedirectHost) cover the individual functions, but an end-to-end test driving EnsureDefaultAssets (or ensureModelAssets) through a server that issues a redirect would confirm the wiring is correct.


Summary

| Severity | # | Topics |
|---|---|---|
| Medium | 3 | Test env var in prod config, URL path injection risk, no context propagation |
| Minor/Low | 10 | Redundant HTTP setup, string-based error classification, lock spin interval, retry jitter, silent stale reason, cross-modal nil checks, doConvertRGB semantics, resize quality, pixel-by-pixel throughput, pad token assumption |
| Test gaps | 3 | LRU eviction, redirect integration, duplicated helpers |

The most impactful fixes before merge:

  1. Remove ONNXRUNTIME_TEST_MODEL_CACHE_DIR from defaultBootstrapConfig (env bleed)
  2. Fix getRGB for doConvertRGB=false (correctness bug — grayscale conversion is wrong)
  3. Apply url.PathEscape to URL path segments (injection hardening)

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@tazarov tazarov merged commit c0d56dc into main Mar 6, 2026
15 of 17 checks passed


Development

Successfully merging this pull request may close these issues.

[OpenCLIP] Implement embeddings/openclip (text + image encoders)

2 participants