Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that artifact cache files will not have long file names #406

Merged
merged 3 commits into from
Mar 29, 2025
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Ensure that artifact cache files will not have long file names
Resolves #287

The previous fix here used a sha256 hash of the full URI as a way to ensure that the resulting file name for the artifact in the dependency cache won't exceed filesystem file name limits (i.e. it won't generate a file name that is too long).

This causes some issues, mostly that you can't easily see what's in the dependency cache. This can be painful when building offline buildpacks. This would also cause a lot of test breakage as it would change the format of the dependency cache on disk and we have a lot of buildpacks that mock this format for testing purposes, if we change that, it would require updating a lot of tests.

I think we can get away without doing that though. The real problem is that you might have a URL with query parameters on it, where the query params can be very long. What I believe we can do is to simply not include the query params in the name, which I believe is the actual intent of our implementation anyway.

This fixes the issue with offline downloads because you still have a human readable name as the file name in the cache.

This is OK with our test cases because test cases don't as far as I know mock any URIs with query parameters. If this does happen, it is much less frequent and shouldn't be a big deal to resolve.

This should be safe because the path to the artifact file in the cache consists of the sha256 of the file being downloaded and then the last part of the URL's path. For example, `https://example.com/foo/bar/file.tgz` where the sha256 hash of `file.tgz` is `sha256-foo` will generate a path on disk of `cache-path/sha256-foo/file.tgz`. Since the dependency configured in `buildpack.toml` consists of the url & the sha256 of the file, it would be very odd to have a second dependency configured in `buildpack.toml` with a different URL that points to the exact same file (I can't think of a reason anyway). In fact, if you had that what libpak would do is to download the first file and then reuse the first file from the cache, because the sha256 hashes of the file match (i.e. it would never download the second URL).

I debated on dropping the file name altogether and just using the dependency's sha256 hash, but for the offline scenario, it is nice to have.

This solves the reported problem as well because it removes the arbitrarily long parts (i.e. the query params) which were causing the reported issue.

Signed-off-by: Daniel Mikusa <dan@mikusa.com>
dmikusa committed Mar 22, 2025
commit bb5c454205c43d9fbf536464b1752f5a92200628
4 changes: 2 additions & 2 deletions dependency_cache.go
Original file line number Diff line number Diff line change
@@ -264,7 +264,7 @@ func (d *DependencyCache) Artifact(dependency BuildModuleDependency, mods ...Req
color.New(color.FgYellow, color.Bold).Sprint("Warning:"))

d.Logger.Bodyf("%s from %s", color.YellowString("Downloading"), urlP.Redacted())
artifact = filepath.Join(d.DownloadPath, filepath.Base(uri))
artifact = filepath.Join(d.DownloadPath, filepath.Base(urlP.Path))
if err := d.download(urlP, artifact, mods...); err != nil {
return nil, fmt.Errorf("unable to download %s\n%w", urlP.Redacted(), err)
}
@@ -296,7 +296,7 @@ func (d *DependencyCache) Artifact(dependency BuildModuleDependency, mods ...Req
}

d.Logger.Bodyf("%s from %s", color.YellowString("Downloading"), urlP.Redacted())
artifact = filepath.Join(d.DownloadPath, dependency.SHA256, filepath.Base(uri))
artifact = filepath.Join(d.DownloadPath, dependency.SHA256, filepath.Base(urlP.Path))
if err := d.download(urlP, artifact, mods...); err != nil {
return nil, fmt.Errorf("unable to download %s\n%w", urlP.Redacted(), err)
}
26 changes: 26 additions & 0 deletions dependency_cache_test.go
Original file line number Diff line number Diff line change
@@ -18,6 +18,7 @@ package libpak_test

import (
"bytes"
"crypto/sha256"
"fmt"
"io"
"net/http"
@@ -560,6 +561,31 @@ func testDependencyCache(t *testing.T, context spec.G, it spec.S) {
Expect(io.ReadAll(a)).To(Equal([]byte("alternate-fixture")))
})

it("sets downloaded file name to uri's sha256", func() {
server.AppendHandlers(ghttp.RespondWith(http.StatusOK, "test-fixture"))

a, err := dependencyCache.Artifact(dependency)
Expect(err).NotTo(HaveOccurred())

Expect(io.ReadAll(a)).To(Equal([]byte("test-fixture")))
Expect(filepath.Base(a.Name())).To(Equal("test-path"))
})

it("sets downloaded file name to uri's sha256 with empty SHA256 and query parameters in the uri", func() {
dependency.SHA256 = ""
dependency.URI = fmt.Sprintf("%s/test-path?param1=value1&param2=value2", server.URL())
server.AppendHandlers(ghttp.RespondWith(http.StatusOK, "alternate-fixture"))

hasher := sha256.New()
hasher.Write([]byte(dependency.URI))

a, err := dependencyCache.Artifact(dependency)
Expect(err).NotTo(HaveOccurred())

Expect(io.ReadAll(a)).To(Equal([]byte("alternate-fixture")))
Expect(filepath.Base(a.Name())).To(Equal("test-path"))
})

it("sets User-Agent", func() {
server.AppendHandlers(ghttp.CombineHandlers(
ghttp.VerifyHeaderKV("User-Agent", "test-user-agent"),