@Goddhi
Contributor

Goddhi commented Dec 5, 2025

Implements the guppy explore command, which lets users interactively traverse and list the contents of a UnixFS DAG stored on the network.
I added the consumer/get capability because, while implementing the explore feature, I found that the Indexer (indexer.storacha.network) would return "no locations found" for known valid CIDs.

Upon investigation (and referencing issue #226), it became clear that the Indexer now enforces an authorization check requiring the consumer/get capability. It seems the previous space/content/retrieve capability is no longer sufficient for these indexer queries.

To resolve this, I added the pkg/capabilities/consumer package to model the new capability.
Usage:

guppy explore <root-cid> --space 

Closes #222

@Peeja
Member

Peeja commented Dec 5, 2025

Ah: the capabilities are in storacha/go-libstoracha#76, which I only just merged.

This is good, but might have some issues at larger data sizes. I wasn't really intending anyone to pick up that issue yet; it still needs some thought and has some context behind it that hasn't been written out yet. In general, safer to check in on issues before you start working on them; I wouldn't want you to spend a bunch of effort on something we end up having to throw out because of something you weren't aware of.

@Goddhi
Contributor Author

Goddhi commented Dec 7, 2025

Thanks for the feedback! I appreciate the advice on checking in first.
Regarding the "large data" concerns: I suspect this is because my current implementation recursively walks the entire DAG (fs.WalkDir), which would be expensive and slow for massive directory trees.
Do you have a preferred pattern in mind to solve this?
Dhruv suggested some commands that need to be implemented in Guppy. I’d love to hear your thoughts on them as well. Could you kindly highlight which ones should be prioritized so I can focus on those first?
Below is the link to the Notion page: guppy-proposed-commads

@Peeja
Member

Peeja commented Dec 10, 2025

@Goddhi Yep, that's it exactly: every access in that walk is (potentially) a network call, which will be slow.
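To make the cost difference concrete, here is a minimal sketch that contrasts a recursive walk with a single-directory read. It uses the standard library's in-memory fstest.MapFS as a stand-in for the DAG-backed filesystem, which is an assumption; the names demoFS, walkAll, and listOne are hypothetical and not taken from the PR. With a real DAG-backed fs.FS, each directory visited could be a network round trip.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// demoFS is an in-memory stand-in (an assumption) for a UnixFS DAG exposed
// as an fs.FS. Parent directories are synthesized by MapFS.
func demoFS() fs.FS {
	return fstest.MapFS{
		"someFile.txt":                              {Data: []byte("hello")},
		"a-subdirectory/data.json":                  {Data: []byte("{}")},
		"a-subdirectory/deeper-directory/notes.txt": {Data: []byte("notes")},
	}
}

// walkAll visits every entry under the root, as a recursive walk does,
// and reports how many entries it touched.
func walkAll(fsys fs.FS) (int, error) {
	visited := 0
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		visited++
		return nil
	})
	return visited, err
}

// listOne reads only the named directory and never descends into children.
func listOne(fsys fs.FS, dir string) ([]string, error) {
	entries, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	fsys := demoFS()
	n, err := walkAll(fsys)
	if err != nil {
		panic(err)
	}
	names, err := listOne(fsys, ".")
	if err != nil {
		panic(err)
	}
	fmt.Println("entries touched by the walk:", n)
	fmt.Println("entries in the root only:", names)
}
```

The walk touches every directory and file in the tree, while the single read touches only the root's immediate children, so its cost does not grow with the depth of the tree.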

But actually: an ls-like command could be great! Listing deeply can get rough, but listing a single directory would be pretty useful. We have a lot of commands called ls, so let's call this one unixfs ls for now; that prefix might change later, but that's easy to do. Here's what I'm imagining:

$ guppy unixfs ls <space-did> <root-cid>
a-subdirectory
someFile.txt
some_other_file.csv
one_more_file.json

$ guppy unixfs ls -l <space-did> <root-cid>
drwx------ bafybeiepqeefjxfmy2lyqa2sqhatwo2js5v64w5yszgcpevuvnhmqhplda    160 Jun 21 15:37 a-subdirectory
-rw-r--r-- bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi    328 Mar 13  2024 someFile.txt
-rw-r--r-- bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku  13513 Mar 13  2024 some_other_file.csv
-rw-r--r-- bafybeif4lztwxrdbwk2axtqn3aueqaueq7o2aaa4ijj74b6wscuabvqyma  33855 Mar  2  2025 one_more_file.json

$ guppy unixfs ls -l <space-did> <root-cid>/a-subdirectory
drwx------ bafybeiabc123def456ghi789jkl012mno345pqr678stu901vwx234yz56     256 Jul 15 14:22 deeper-directory
-rw-r--r-- bafybeif7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6a7b8c9d0e1f2    1024 Jul 15 14:20 document.pdf
-rw-r--r-- bafybeig2h3i4j5k6l7m8n9o0p1q2r3s4t5u6v7w8x9y0z1a2b3c4d5e6f7   45678 Jul 15 14:18 data.json

# Equivalent:
$ guppy unixfs ls -l <space-did> bafybeiepqeefjxfmy2lyqa2sqhatwo2js5v64w5yszgcpevuvnhmqhplda
drwx------ bafybeiabc123def456ghi789jkl012mno345pqr678stu901vwx234yz56     256 Jul 15 14:22 deeper-directory
-rw-r--r-- bafybeif7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6a7b8c9d0e1f2    1024 Jul 15 14:20 document.pdf
-rw-r--r-- bafybeig2h3i4j5k6l7m8n9o0p1q2r3s4t5u6v7w8x9y0z1a2b3c4d5e6f7   45678 Jul 15 14:18 data.json

$ guppy unixfs ls -l <space-did> <root-cid>/a-subdirectory/data.json
-rw-r--r-- bafybeig2h3i4j5k6l7m8n9o0p1q2r3s4t5u6v7w8x9y0z1a2b3c4d5e6f7   45678 Jul 15 14:18 data.json

So:

  • The arguments are similar to guppy retrieve.
  • There's a simple output and a long output (-l) like actual ls. I've adapted that format here to contain what we actually should have available: mode, CID, size, modtime, and filename.
  • It would be neat for the simple output to format with columns like real ls when the output is a tty, but I don't see an easy tool to do that, and it's probably a bunch of work (and code that'll have to live somewhere) for not a lot of value. But if you see an easy way to do it and feel like it, go for it. 😄
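As a rough illustration of the long format described above (mode, CID, size, modtime, filename), here is a sketch of a line formatter. The entry type, the formatLong function, the six-character size column, and the roughly six-month cutoff for switching from time of day to year are all assumptions modeled on the example output and on common ls behavior; none of this is taken from the actual implementation.

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// entry is a hypothetical record of the fields shown in the long listing.
// The CID is kept as a plain string to keep the sketch dependency-free.
type entry struct {
	Mode    fs.FileMode
	CID     string
	Size    int64
	ModTime time.Time
	Name    string
}

// formatLong renders one long-listing line. Entries modified within roughly
// the last six months show the time of day; older (or future-dated) entries
// show the year instead. The cutoff is an assumption.
func formatLong(e entry, now time.Time) string {
	stamp := e.ModTime.Format("Jan _2 15:04")
	if age := now.Sub(e.ModTime); age > 182*24*time.Hour || age < 0 {
		stamp = e.ModTime.Format("Jan _2  2006")
	}
	return fmt.Sprintf("%s %s %6d %s %s", e.Mode, e.CID, e.Size, stamp, e.Name)
}

// sampleLines formats two entries borrowed from the example output above,
// using a fixed "now" so the result is deterministic.
func sampleLines() []string {
	now := time.Date(2025, time.December, 10, 0, 0, 0, 0, time.UTC)
	entries := []entry{
		{
			Mode:    fs.ModeDir | 0o700,
			CID:     "bafybeiepqeefjxfmy2lyqa2sqhatwo2js5v64w5yszgcpevuvnhmqhplda",
			Size:    160,
			ModTime: time.Date(2025, time.June, 21, 15, 37, 0, 0, time.UTC),
			Name:    "a-subdirectory",
		},
		{
			Mode:    0o644,
			CID:     "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
			Size:    328,
			ModTime: time.Date(2024, time.March, 13, 9, 0, 0, 0, time.UTC),
			Name:    "someFile.txt",
		},
	}
	lines := make([]string, 0, len(entries))
	for _, e := range entries {
		lines = append(lines, formatLong(e, now))
	}
	return lines
}

func main() {
	for _, line := range sampleLines() {
		fmt.Println(line)
	}
}
```

Passing the current time in as a parameter, rather than calling time.Now inside the formatter, keeps the output reproducible in tests.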

What's great about this is that each command results in fetching ~1 node—except for large directories that are HAMTs, but it's still a lot less than enumerating the entire space contents in one go.

Also would be nice:

  • Incremental output, so for large directories you'll see the first entries as soon as possible, rather than waiting for the whole thing to finish.
  • A --limit option to limit the number of returned entries. But that's probably better if we have an --offset or something as well, and that's probably not currently easy to do, and probably needs some thought. We can leave this for a future PR.
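One way the incremental output and a simple limit could be sketched is with the standard fs.ReadDirFile interface, whose ReadDir(n) method returns at most n entries per call. This assumes the DAG-backed filesystem returns directory handles that implement fs.ReadDirFile and load entries lazily; the in-memory fstest.MapFS below is only a stand-in, and streamDir and demoNames are hypothetical names. An offset is deliberately left out, as in the discussion above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"io/fs"
	"testing/fstest"
)

// streamDir reads dir in batches and calls emit for each entry as soon as
// its batch arrives. It stops after limit entries; limit <= 0 means no limit.
func streamDir(fsys fs.FS, dir string, batch, limit int, emit func(fs.DirEntry)) error {
	if batch <= 0 {
		batch = 64 // ReadDir(n <= 0) would read everything at once
	}
	f, err := fsys.Open(dir)
	if err != nil {
		return err
	}
	defer f.Close()

	rd, ok := f.(fs.ReadDirFile)
	if !ok {
		return fmt.Errorf("%s: not a directory", dir)
	}

	emitted := 0
	for {
		entries, err := rd.ReadDir(batch)
		for _, e := range entries {
			emit(e)
			emitted++
			if limit > 0 && emitted >= limit {
				return nil
			}
		}
		if errors.Is(err, io.EOF) {
			return nil // directory exhausted
		}
		if err != nil {
			return err
		}
	}
}

// demoNames lists an in-memory directory in batches of two and collects
// the emitted names.
func demoNames(limit int) []string {
	fsys := fstest.MapFS{
		"a.txt": {}, "b.txt": {}, "c.txt": {}, "d.txt": {}, "e.txt": {},
	}
	var names []string
	err := streamDir(fsys, ".", 2, limit, func(e fs.DirEntry) {
		names = append(names, e.Name())
	})
	if err != nil {
		panic(err)
	}
	return names
}

func main() {
	fmt.Println(demoNames(3))
	fmt.Println(demoNames(0))
}
```

In a real command, emit would print each entry immediately, so the first lines appear before the rest of a large directory has been fetched.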

@Goddhi
Contributor Author

Goddhi commented Dec 13, 2025

Hi @Peeja, I implemented the guppy unixfs ls command as requested, including the --long flag and incremental streaming for directories.

Regarding the output format, I matched the standard Unix ls -l columns (Mode, Size, Time, Name). I noticed that your example output included a CID column.

Currently, pkg/dagfs relies on the standard fs.FileInfo interface, which doesn't expose the underlying CID. To include the CID column, we would need to update some files under pkg/dagfs to pass the CID via the Sys() method.

I decided to keep this PR focused on the CLI command implementation. I can implement the dagfs update to expose CIDs in a follow-up PR if you think that adds enough value.
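For reference, a follow-up along those lines might look roughly like the sketch below. It shows the general pattern of returning a value from Sys() and recovering it with a type assertion. The nodeInfo and nodeSys types and the cidOf helper are hypothetical, not the actual pkg/dagfs types, and the CID is a plain string here to avoid extra dependencies.

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// nodeSys is a hypothetical value returned from Sys() carrying the CID.
type nodeSys struct {
	CID string
}

// nodeInfo is a hypothetical fs.FileInfo implementation for a DAG node.
type nodeInfo struct {
	name    string
	size    int64
	mode    fs.FileMode
	modTime time.Time
	cid     string
}

func (n nodeInfo) Name() string       { return n.name }
func (n nodeInfo) Size() int64        { return n.size }
func (n nodeInfo) Mode() fs.FileMode  { return n.mode }
func (n nodeInfo) ModTime() time.Time { return n.modTime }
func (n nodeInfo) IsDir() bool        { return n.mode.IsDir() }
func (n nodeInfo) Sys() any           { return nodeSys{CID: n.cid} }

// cidOf extracts the CID from any fs.FileInfo whose Sys() returns a nodeSys.
// It reports false for FileInfo values from other filesystems.
func cidOf(info fs.FileInfo) (string, bool) {
	sys, ok := info.Sys().(nodeSys)
	if !ok {
		return "", false
	}
	return sys.CID, true
}

// sampleInfo builds a FileInfo using a CID from the example output earlier
// in this thread.
func sampleInfo() fs.FileInfo {
	return nodeInfo{
		name:    "someFile.txt",
		size:    328,
		mode:    0o644,
		modTime: time.Date(2024, time.March, 13, 9, 0, 0, 0, time.UTC),
		cid:     "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
	}
}

func main() {
	if c, ok := cidOf(sampleInfo()); ok {
		fmt.Println("CID:", c)
	}
}
```

Because the CLI would only depend on the type assertion succeeding, it could fall back to omitting the CID column when Sys() returns something else.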

Command usage:

guppy unixfs ls <space-did> <root-cid>

Goddhi changed the title from "feat: add guppy explore command and consumer/get capability support" to "feat: add guppy explore command" on Dec 16, 2025

Development

Successfully merging this pull request may close these issues.

Guppy should be able to explore a tree from a CID
