Avoid exceeding a user's configured max entries per DFS directory #6413

@ctubbsii

Description

In DFS, we use a structure like:

<volume>/tables/<tableId>/<tablet>/<file>.rf

Some DFS implementations limit the number of entries per directory, and that limit may be tunable. Some users may set it to a value that is small relative to the number of tablets on their system. Users can control the number of files in a tablet through compactions, but the number of tables and the number of tablets per table could each pose a problem for a directory's entry count. The number of tablets is particularly important, since it's generally advantageous to have many tablets in order to distribute work and avoid hotspots.

I propose a directory structure more like:

<volume>/tables/<namespace>/<tableId>/<prefix(tablet)>/<suffix(tablet)>/<file>.rf

For example, take a tablet in table 1a in namespace a whose srv:dir used to be t-1234567, giving an absolute path like:

hdfs://server:port/accumulo/tables/1a/t-1234567/C0000000a.rf

With the proposed structure, it would instead use a srv:dir such as t-12/34567 or t-123/4567, or even t-123/456/789 for longer tablet IDs if needed, resulting in an absolute path that looks like:

hdfs://server:port/accumulo/tables/a/1a/t-12/34567/C0000000a.rf
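To make the proposed mapping concrete, here is a minimal Python sketch of how a flat tablet directory name could be split into nested segments. It is only an illustration, not Accumulo code: the function names `shard_tablet_dir` and `tablet_file_path` and the `widths` parameter are hypothetical, and the choice of segment widths is exactly the open question this issue raises.

```python
def shard_tablet_dir(tablet_dir, widths=(2,)):
    """Split a flat tablet dir name such as 't-1234567' into nested segments.

    `widths` (hypothetical parameter) lists how many ID characters each
    leading segment takes; whatever remains becomes the final segment.
    """
    marker, sep, ident = tablet_dir.partition("-")
    if not sep or not ident:
        # Names without a '<marker>-<id>' shape are left unchanged.
        return tablet_dir

    parts = []
    pos = 0
    for width in widths:
        # Stop splitting if it would leave nothing for the final segment.
        if len(ident) - pos <= width:
            break
        parts.append(ident[pos:pos + width])
        pos += width
    parts.append(ident[pos:])

    # Keep the 't-' style marker attached to the first segment.
    parts[0] = marker + "-" + parts[0]
    return "/".join(parts)


def tablet_file_path(volume, namespace, table_id, tablet_dir, file_name, widths=(2,)):
    """Build the absolute file path under the proposed layout."""
    return "/".join([
        volume.rstrip("/"),
        "tables",
        namespace,
        table_id,
        shard_tablet_dir(tablet_dir, widths),
        file_name,
    ])


print(shard_tablet_dir("t-1234567"))                 # t-12/34567
print(shard_tablet_dir("t-1234567", widths=(3,)))    # t-123/4567
print(shard_tablet_dir("t-123456789", widths=(3, 3)))  # t-123/456/789
print(tablet_file_path("hdfs://server:port/accumulo", "a", "1a",
                       "t-1234567", "C0000000a.rf"))
```

Short IDs are deliberately left with fewer levels rather than padded. A real implementation might prefer fixed depth so that paths are predictable.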

The exact structure would need to be carefully considered during implementation to ensure that the hierarchy is sufficiently deep so that a reasonably configured max entry limit in DFS will not be at risk of being exceeded by Accumulo, even when users create many splits.
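As a rough aid for reasoning about "sufficiently deep", the following Python sketch does the underlying arithmetic. It assumes each directory level is keyed by a fixed-width chunk of the tablet ID drawn from a known alphabet; the alphabet size of 36 and the limit values below are illustrative assumptions, not numbers taken from Accumulo or from any user's configuration. In HDFS the relevant setting is `dfs.namenode.fs-limits.max-directory-items`.

```python
def max_prefix_width(limit, alphabet_size):
    """Largest chunk width k such that alphabet_size**k <= limit.

    A level keyed by k characters can hold at most alphabet_size**k
    subdirectories, so this is the widest chunk that cannot exceed `limit`.
    Returns 0 if even a single character could exceed the limit.
    """
    if limit < 1 or alphabet_size < 2:
        raise ValueError("limit must be >= 1 and alphabet_size >= 2")
    width = 0
    entries = 1
    while entries * alphabet_size <= limit:
        entries *= alphabet_size
        width += 1
    return width


def min_levels(num_tablets, limit):
    """Fewest directory levels d such that limit**d >= num_tablets.

    This is a lower bound: it assumes tablets spread perfectly evenly,
    which real tablet IDs may not do.
    """
    if limit < 2 or num_tablets < 1:
        raise ValueError("limit must be >= 2 and num_tablets >= 1")
    levels = 1
    capacity = limit
    while capacity < num_tablets:
        capacity *= limit
        levels += 1
    return levels


# Illustrative values only (assumed, not from the issue).
print(max_prefix_width(limit=1000, alphabet_size=36))   # 1
print(min_levels(num_tablets=1_000_000, limit=1000))    # 2
```

Integer loops are used instead of logarithms to avoid floating-point rounding at exact powers.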

Labels: enhancement
