Skip to content

Conversation

@neilotoole
Copy link
Contributor

This PR proposes adding a method FSCache.MapFile that maps an existing on-disk file into a StandardFS-backed FSCache without performing an (expensive) copy operation. I wasn't able to figure out how to avoid the copy without adding new API surface (although it's entirely possible I've missed something obvious).

Scenario

An example scenario is demonstrated in TestMapFile. We've got a Getter type:

// Getter gets a reader for a URL or filepath.
type Getter interface {
	Get(urlOrFilepath string) (io.ReadCloser, error)
}

An implementation of Getter gets a reader for a URL, or a file, storing the contents in an FSCache. There's two implementations of Getter provided:

  • StdGetter uses the previously available mechanism to copy contents into the cache using the w returned by FSCache.Get.
  • MapGetter uses the proposed FSCache.MapFile mechanism.

The problem with StdGetter can be seen here:

	// Thus, urlOrFilepath must be a filepath.
	fp := urlOrFilepath
	r, w, err := g.fc.Get(fp)
	if err != nil {
		return nil, err
	}

	if w == nil {
		g.logf("Cache hit: %s", fp)
		return r, nil
	}

	g.logf("Cache miss: %s", fp)

	f, err := os.Open(fp)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// We copy the contents of f to w and thus into the cache.
	// But, for our use-case, this is useless work.
	// We're already using a filesystem FSCache, so we're just
	// copying the file from disk to memory and back to disk.
	// Boo!
	var n int64
	if n, err = io.Copy(w, f); err != nil {
		return nil, err
	}

	if err = w.Close(); err != nil {
		return nil, err
	}
	g.logf("EXPENSIVE: Copied %d bytes from %s to cache", n, fp)
	return r, nil

In the real-world scenario that inspired this proposal, the file was large (several GB), and this copy operation became a performance chokepoint.

Proposal

Instead, we can do something like this:

	if g.fc.Exists(fp) {
		g.logf("Cache hit: %s", fp)
		r, _, err := g.fc.Get(fp)
		return r, err
	}

	g.logf("Cache miss: %s", fp)
	g.logf("Mapping file into cache: %s", fp)

	if err := g.fc.MapFile(fp); err != nil {
		return nil, err
	}

	r, _, err := g.fc.Get(fp)
	if err != nil {
		return nil, err
	}

	return r, nil

In the very limited tests that I've run so far in my own project, this completely addresses the issue.

@djherbis: What are your thoughts on this mechanism? Am I missing some other existing way of avoiding the copy?

@neilotoole
Copy link
Contributor Author

Note that this PR incorporates the stretchr/testify package... I can remove it if there's a strict stdlib-only policy, but I've found it makes the test code much more succinct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant