Scalability: number of files #2

Use case: collecting information about all files across many datasets

For an imaging research consortium, this easily reaches tens of millions of files without much effort.

One (metadata) file for each file that exists, all gathered in a single git repo, doesn't work. We have explored that space with metalad. The current spec already has provisions for splitting a long list of files into a tree, which could be assembled from a series of nested datalad datasets to host the metadata dump. We could also think about making it possible to store multiple records in a single file. Such a file would be a JSON/YAML object with record IDs as top-level keys. We can do this here because we have identifiers for everything.
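To make the multi-record idea concrete, here is a minimal sketch, not taken from the spec. It assumes records are plain dicts keyed by a string ID and groups them into shard files by a hash prefix of that ID. The names `shard_key`, `write_sharded`, `lookup`, the SHA-1 prefix rule, and the `<prefix>.json` layout are all hypothetical choices for illustration.

```python
import hashlib
import json
from collections import defaultdict
from pathlib import Path


def shard_key(record_id: str, width: int = 2) -> str:
    """Hypothetical sharding rule: first `width` hex chars of the ID's SHA-1."""
    return hashlib.sha1(record_id.encode("utf-8")).hexdigest()[:width]


def write_sharded(records: dict, root: Path, width: int = 2) -> list:
    """Write one JSON object per shard, with record IDs as top-level keys."""
    shards = defaultdict(dict)
    for record_id, record in records.items():
        shards[shard_key(record_id, width)][record_id] = record

    root.mkdir(parents=True, exist_ok=True)
    written = []
    for key, shard in sorted(shards.items()):
        path = root / f"{key}.json"
        # sort_keys keeps the serialization stable, which keeps git diffs small
        path.write_text(json.dumps(shard, indent=2, sort_keys=True), encoding="utf-8")
        written.append(path)
    return written


def lookup(record_id: str, root: Path, width: int = 2):
    """Open only the one shard file that can contain `record_id`."""
    path = root / f"{shard_key(record_id, width)}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8")).get(record_id)
```

With `width=2` there are at most 256 shard files, and with `width=4` at most 65,536, so the prefix width trades the number of files against the number of records per file. The same prefix could equally name a subdirectory or a nested dataset rather than a single file.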

