[HOPSFS-364] Add support for configurable virtual root directory #119

Open
gibchikafa wants to merge 6 commits into logicalclocks:master from gibchikafa:add_support_for_hopsfs-virtual-root

@gibchikafa
Collaborator

Introduce an optional synthetic directory layer at the mount root so callers can expose a user-defined top-level directory without using symlinks.

The new configuration is driven by --virtualDirectoryName, --virtualDirectoryPaths, and --virtualDirectoryBackendRoot. When enabled, the FUSE root merges the synthetic directory into ls output, and lookups beneath that directory are resolved back to the configured backend HopsFS paths.

The virtual directory is fully opt-in: if the name or path list is omitted, the mount behaves exactly as before. Synthetic directory entries reuse backend ownership and metadata where available so the exposed tree is consistent with the rest of HopsFS.

Add regression tests covering enabled and disabled virtual-root behavior, nested lookups, and ownership mapping.

@gibchikafa requested a review from smkniazi May 22, 2026 12:05
@gibchikafa changed the title from "[HOPSFS-364] Add configurable virtual root directory for shared datasets" to "[HOPSFS-364] Add support for configurable virtual root directory" May 22, 2026
@gibchikafa requested a review from Copilot May 22, 2026 12:07

Copilot AI left a comment

Pull request overview

Adds an opt-in “virtual root” directory layer to the HopsFS FUSE mount so operators can expose a synthetic top-level directory that maps to a configurable set of backend paths (avoiding symlink-based layouts).

Changes:

  • Introduces FileSystemOption + WithVirtualDirectory(...) to configure a synthetic root directory name, exposed paths, and backend root resolution.
  • Extends DirINode lookup and readdir logic to surface the synthetic directory and resolve child lookups into the configured backend paths.
  • Adds CLI flags (--virtualDirectoryName, --virtualDirectoryPaths, --virtualDirectoryBackendRoot) and regression tests for enabled/disabled behavior.
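
For readers unfamiliar with the option shape named in the first bullet, here is a minimal sketch of the functional-options pattern it follows. Only `FileSystemOption`, `WithVirtualDirectory`, `HasVirtualDirectory`, and the `VirtualDirectoryName` field appear in this PR; the other two struct fields, the simplified `NewFileSystem` signature, and the example paths are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// FileSystem is a pared-down stand-in for the real type. Only
// VirtualDirectoryName is visible in the diff; the other fields are assumed.
type FileSystem struct {
	VirtualDirectoryName        string
	VirtualDirectoryPaths       []string // assumed field name
	VirtualDirectoryBackendRoot string   // assumed field name
}

// FileSystemOption mutates a FileSystem during construction.
type FileSystemOption func(*FileSystem)

// WithVirtualDirectory records the synthetic directory configuration.
func WithVirtualDirectory(name string, paths []string, backendRoot string) FileSystemOption {
	return func(filesystem *FileSystem) {
		filesystem.VirtualDirectoryName = strings.TrimSpace(name)
		filesystem.VirtualDirectoryPaths = append([]string(nil), paths...)
		filesystem.VirtualDirectoryBackendRoot = backendRoot
	}
}

// NewFileSystem takes only options here; the real constructor takes more.
func NewFileSystem(opts ...FileSystemOption) *FileSystem {
	filesystem := &FileSystem{}
	for _, opt := range opts {
		opt(filesystem)
	}
	return filesystem
}

// HasVirtualDirectory encodes the opt-in rule from the description:
// the feature is active only when both a name and a path list are set.
func (filesystem *FileSystem) HasVirtualDirectory() bool {
	return filesystem.VirtualDirectoryName != "" && len(filesystem.VirtualDirectoryPaths) > 0
}

func main() {
	plain := NewFileSystem()
	shared := NewFileSystem(WithVirtualDirectory(" Shared ", []string{"other-project/shared-a"}, "/Projects"))
	fmt.Println(plain.HasVirtualDirectory(), shared.HasVirtualDirectory(), shared.VirtualDirectoryName)
}
```

With no option passed, the zero-value struct leaves the feature off, which is what keeps existing mounts unchanged.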

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| internal/hopsfsmount/VirtualRoot_test.go | New tests validating virtual-root listing, nested lookup resolution, and disabled-by-default behavior. |
| internal/hopsfsmount/FileSystem.go | Adds virtual-root configuration options, normalization helpers, and path allowance logic for virtual-backed paths. |
| internal/hopsfsmount/Dir.go | Implements synthetic directory nodes at the mount root and virtual subtree traversal (lookup/readdir) backed by configured HopsFS paths. |
| internal/hopsfsmount/config.go | Adds new CLI flag configuration variables for the virtual-root feature. |
| cmd/main.go | Parses virtual-root flags, builds the filesystem option, and adds a CSV-splitting helper. |
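
The CSV-splitting helper mentioned for cmd/main.go is not shown in this conversation. A helper of that kind could look roughly like the sketch below; the name `splitCSV` and the choice to drop empty fields are assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCSV (hypothetical name) turns a comma-separated flag value into a
// slice, trimming whitespace and dropping empty fields.
func splitCSV(raw string) []string {
	var out []string
	for _, field := range strings.Split(raw, ",") {
		if field = strings.TrimSpace(field); field != "" {
			out = append(out, field)
		}
	}
	return out
}

func main() {
	// Example value for --virtualDirectoryPaths; the paths are illustrative.
	fmt.Printf("%q\n", splitCSV(" other-project/shared-a, ,other-project/shared-b,"))
}
```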

Comment on lines +52 to +54

    func WithVirtualDirectory(name string, paths []string, backendRoot string) FileSystemOption {
        return func(filesystem *FileSystem) {
            filesystem.VirtualDirectoryName = strings.TrimSpace(name)

Collaborator Author

Addressed in 11ed664: NewFileSystem now validates the virtual directory name eagerly and rejects invalid names like foo/bar, . and .. so the feature fails fast instead of exposing an ambiguous root entry.
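
The eager name check described in this reply can be pictured as follows. This is a sketch under assumptions: the function name `validateVirtualDirectoryName` and the error wording are hypothetical, and an empty name is treated as "feature off" to match the opt-in behavior in the PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// validateVirtualDirectoryName (hypothetical name) accepts only a single,
// non-reserved path element. An empty name means the feature is disabled.
func validateVirtualDirectoryName(name string) error {
	name = strings.TrimSpace(name)
	switch {
	case name == "":
		return nil // feature not requested
	case name == "." || name == "..":
		return fmt.Errorf("virtual directory name %q is reserved", name)
	case strings.ContainsAny(name, "/\x00"):
		return fmt.Errorf("virtual directory name %q must be a single path element", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"Shared", "foo/bar", ".", ".."} {
		fmt.Printf("%q -> %v\n", name, validateVirtualDirectoryName(name))
	}
}
```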

Comment on lines +172 to +176

    for _, p := range paths {
        p = strings.TrimSpace(p)
        p = strings.Trim(p, "/")
        if p == "" {
            continue

Collaborator Author

Addressed in 11ed664: virtual paths are now normalized and rejected if they contain traversal segments like .., and the backend root is validated before the feature is enabled.
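
To illustrate the normalization and traversal check, a sketch that extends the loop from the hunk above. The function name `normalizeVirtualPaths` and the decision to also reject empty and `.` segments are assumptions. Segments are inspected before any `path.Clean` call, because `Clean` would silently resolve `..` instead of reporting it.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVirtualPaths (hypothetical name) trims each configured path and
// rejects any that contains an empty, ".", or ".." segment.
func normalizeVirtualPaths(paths []string) ([]string, error) {
	var out []string
	for _, p := range paths {
		p = strings.TrimSpace(p)
		p = strings.Trim(p, "/")
		if p == "" {
			continue
		}
		for _, segment := range strings.Split(p, "/") {
			if segment == "" || segment == "." || segment == ".." {
				return nil, fmt.Errorf("invalid segment %q in virtual path %q", segment, p)
			}
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	fmt.Println(normalizeVirtualPaths([]string{" /other-project/shared-a/ ", ""}))
	fmt.Println(normalizeVirtualPaths([]string{"other-project/../secret"}))
}
```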

Comment on lines +238 to +245

    if dir.Parent == nil && dir.FileSystem.HasVirtualDirectory() {
        if child := dir.ensureVirtualDirectoryRootChild(ReadDir); child != nil {
            entries = append(entries, fuse.Dirent{
                Inode: child.Attrs.Inode,
                Name:  child.Attrs.Name,
                Type:  child.Attrs.FuseNodeType(),
            })
        }

Collaborator Author

Addressed in 11ed664: root listing now detects collisions with a real backend child of the same name, suppresses the synthetic entry, and Lookup resolves the real backend node when that collision exists.
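
The collision rule in this reply reduces to a small merge step over the root listing. The sketch below works on plain names rather than `fuse.Dirent` values to stay self-contained; `rootEntryNames` is a hypothetical helper, not a function from the PR.

```go
package main

import "fmt"

// rootEntryNames (hypothetical) merges the synthetic name into the backend
// root listing. When the backend already has an entry with that name, the
// real entry wins and the synthetic one is suppressed.
func rootEntryNames(backend []string, virtualName string) (names []string, collision bool) {
	names = append(names, backend...)
	if virtualName == "" {
		return names, false // feature disabled
	}
	for _, name := range backend {
		if name == virtualName {
			return names, true
		}
	}
	return append(names, virtualName), false
}

func main() {
	fmt.Println(rootEntryNames([]string{"Projects", "user"}, "Shared"))
	fmt.Println(rootEntryNames([]string{"Projects", "Shared"}, "Shared"))
}
```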

Comment thread internal/hopsfsmount/Dir.go Outdated
Comment on lines +330 to +352

    func (dir *DirINode) syntheticDirectoryAttrs(relPath, name, statPath string, fallback Attrs) Attrs {
        attrs := Attrs{
            Name:  name,
            Mode:  os.ModeDir | 0755,
            Inode: syntheticInode(path.Join("/", dir.FileSystem.VirtualDirectoryName, relPath)),
            Uid:   fallback.Uid,
            Gid:   fallback.Gid,
            Mtime: dir.FileSystem.Clock.Now(),
            Ctime: dir.FileSystem.Clock.Now(),
        }

        if statPath != "" {
            if backendAttrs, err := dir.FileSystem.getDFSConnector().Stat(statPath); err == nil {
                attrs.Mode = backendAttrs.Mode
                attrs.Uid = backendAttrs.Uid
                attrs.Gid = backendAttrs.Gid
                attrs.DFSUserName = backendAttrs.DFSUserName
                attrs.DFSGroupName = backendAttrs.DFSGroupName
                attrs.Mtime = backendAttrs.Mtime
                attrs.Ctime = backendAttrs.Ctime
                attrs.Size = backendAttrs.Size
            }
        }

Validate virtual root configuration eagerly at filesystem construction time so invalid names and paths fail fast. Reject synthetic directory names that contain path separators or resolve to . or .., and reject virtual paths that contain traversal segments or malformed path elements.

Detect collisions between the configured synthetic root name and a real backend entry at the mount root. If the backend already exposes that name, prefer the real directory entry and suppress the synthetic one so the virtual layer cannot hide backend data.

Cache synthetic directory metadata on the in-memory inode and only refresh it when the cached attributes expire. This avoids repeated backend stat calls for ls/lookup on the synthetic tree while still keeping ownership and mode information aligned with the backend.

Add tests covering invalid configuration, root collisions, and repeated lookup/readdir behavior for the synthetic tree.

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines +323 to +342

    if dir.FileSystem.virtualDirectoryLeafExists(childRelPath) {
        if node := dir.getChildInode(ReadDir, childName); node != nil && nodeAttrsFresh(node, dir.FileSystem.Clock.Now()) {
            attrs, ok := nodeAttrs(node)
            if !ok {
                return nil, fmt.Errorf("unexpected cached node type for %s", childName)
            }
            entries = append(entries, fuse.Dirent{
                Inode: attrs.Inode,
                Name:  childName,
                Type:  attrs.FuseNodeType(),
            })
            continue
        }

        var attrs Attrs
        node, err := dir.statInodeInHopsFS(ReadDir, childName, &attrs)
        if err != nil {
            return nil, err
        }
        entries = append(entries, fuse.Dirent{

Collaborator Author

Addressed in 11ed664: synthetic leaf entries no longer stat during ReadDirAll; we now emit dirents directly and defer backend resolution to Lookup/Attr, which removes the N+1 pattern.
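
One way to list a synthetic directory without any backend call is to derive child names purely from the configured paths. The sketch below shows that idea; `virtualChildren` is a hypothetical helper, the paths are taken from the test hunk in this PR, and the real code emits `fuse.Dirent` values rather than bare names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// virtualChildren (hypothetical) returns the distinct child names directly
// below rel, computed from the configured virtual paths alone.
func virtualChildren(paths []string, rel string) []string {
	prefix := ""
	if rel != "" {
		prefix = rel + "/"
	}
	seen := map[string]bool{}
	var names []string
	for _, p := range paths {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		rest := p[len(prefix):]
		if rest == "" {
			continue
		}
		name := strings.SplitN(rest, "/", 2)[0]
		if !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	paths := []string{"other-project/shared-b", "other-project/shared-a"}
	fmt.Println(virtualChildren(paths, ""))
	fmt.Println(virtualChildren(paths, "other-project"))
}
```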

Comment on lines 202 to 244

    func (dir *DirINode) LookupInt(opName string, name string) (fs.Node, error) {
        if dir.Parent == nil && dir.FileSystem.HasVirtualDirectory() && name == dir.FileSystem.VirtualDirectoryName {
            now := dir.FileSystem.Clock.Now()
            if node := dir.getChildInode(opName, name); node != nil {
                switch cached := node.(type) {
                case *DirINode:
                    if cached.VirtualKind == VirtualDirSynthetic {
                        if !dir.VirtualRootCollision && now.Before(cached.Attrs.Expires) {
                            return cached, nil
                        }
                    } else if now.Before(cached.Attrs.Expires) {
                        return cached, nil
                    }
                case *FileINode:
                    if now.Before(cached.Attrs.Expires) {
                        return cached, nil
                    }
                }
            }

            dir.removeChildInode(opName, name)
            var attrs Attrs
            node, err := dir.statInodeInHopsFS(opName, name, &attrs)
            if err == nil {
                return node, nil
            }
            if err != syscall.ENOENT {
                return nil, err
            }
            child, childErr := dir.ensureVirtualDirectoryRootChild(opName)
            if childErr != nil {
                return nil, childErr
            }
            return child, nil
        }

        if dir.VirtualKind == VirtualDirSynthetic {
            return dir.lookupVirtualDirectoryChild(opName, name)
        }

        if !dir.FileSystem.IsPathAllowed(dir.AbsolutePathForChild(name)) {
            return nil, syscall.ENOENT
        }
}
Collaborator Author

Addressed in 11ed664: mutating operations now reject targets outside the configured virtual leaf tree with EPERM, so the synthetic root cannot be used to write arbitrary backend paths.

Comment on lines +72 to +80

    hdfsAccessor.EXPECT().Stat("/Projects/other-project/shared-a").Return(Attrs{Name: "shared-a", Mode: os.ModeDir}, nil).AnyTimes()
    sharedDataset, err := sharedProject.(*DirINode).Lookup(nil, "shared-a")
    assert.Nil(t, err)
    assert.NotNil(t, sharedDataset)

    projectDirents, err := sharedProject.(*DirINode).ReadDirAll(nil)
    assert.Nil(t, err)
    assert.Equal(t, []string{"shared-a", "shared-b"}, direntNames(projectDirents))

Reject invalid virtual directory names and virtual paths that contain traversal segments at filesystem construction time. Normalize the configured backend root once. The virtual-directory feature stays opt-in, and bad configuration now fails fast instead of producing ambiguous FUSE entries.

Avoid N+1 backend stats for synthetic leaf listings by returning placeholder dirents from ReadDirAll and deferring backend resolution to Lookup and Attr. Synthetic node metadata stays cached and is refreshed from the backend only when the cached attrs expire.

Protect the synthetic tree from writes that escape the configured virtual path set. Mutating operations under the synthetic directory now reject mkdir/create/remove/rename/setattr requests unless the target stays within a configured virtual leaf subtree. Also keep the synthetic root collision handling aligned with the real backend root so an existing backend child cannot be hidden by the virtual entry.

2 participants