In DFS, we use a structure like:
<volume>/tables/<tableId>/<tablet>/<file>.rf
Some DFS implementations limit the number of entries per directory, and this limit may be tunable. Some users may set it to a value that is small relative to the number of tablets on their system. Users can control the number of files in a tablet through compactions, but the number of tables and the number of tablets per table can still push a directory's entry count toward the limit. The number of tablets matters most, since it is generally advantageous to have many tablets in order to distribute work and avoid hotspots.
I propose a directory structure more like:
<volume>/tables/<namespace>/<tableId>/<prefix(tablet)>/<suffix(tablet)>/<file>.rf
For example, a tablet in table 1a in namespace a currently has a srv:dir like t-1234567, giving file paths like:
hdfs://server:port/accumulo/tables/1a/t-1234567/C0000000a.rf
Under this proposal, its srv:dir would instead look like t-12/34567 or t-123/4567 (or even t-123/456/789 for longer tablet IDs, if needed), resulting in an absolute path like:
hdfs://server:port/accumulo/tables/a/1a/t-12/34567/C0000000a.rf
The exact structure would need to be carefully considered during implementation to ensure that the hierarchy is sufficiently deep so that a reasonably configured max entry limit in DFS will not be at risk of being exceeded by Accumulo, even when users create many splits.
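To make the proposed mapping concrete, here is a minimal Java sketch of how a tablet directory name might be split into a prefix and suffix and assembled into a file path. The class name TabletDirLayout, the methods shard and filePath, and the prefixLen parameter are all hypothetical and are not existing Accumulo code. The sketch assumes a single split level and leaves names that do not start with t- unchanged.

```java
public final class TabletDirLayout {
  // Assumption: generated tablet directory names start with "t-".
  private static final String TABLET_PREFIX = "t-";

  // Splits "t-1234567" into "t-12/34567" when prefixLen is 2.
  // Names without the "t-" prefix, or too short to split, are returned unchanged.
  public static String shard(String tabletDir, int prefixLen) {
    if (!tabletDir.startsWith(TABLET_PREFIX)) {
      return tabletDir;
    }
    String id = tabletDir.substring(TABLET_PREFIX.length());
    if (prefixLen <= 0 || id.length() <= prefixLen) {
      return tabletDir;
    }
    return TABLET_PREFIX + id.substring(0, prefixLen) + "/" + id.substring(prefixLen);
  }

  // Builds <volume>/tables/<namespace>/<tableId>/<prefix>/<suffix>/<file>.
  public static String filePath(String volume, String namespace, String tableId,
      String tabletDir, int prefixLen, String fileName) {
    return String.join("/", volume, "tables", namespace, tableId,
        shard(tabletDir, prefixLen), fileName);
  }

  public static void main(String[] args) {
    // Reproduces the example path from the proposal.
    System.out.println(filePath("hdfs://server:port/accumulo", "a", "1a",
        "t-1234567", 2, "C0000000a.rf"));
  }
}
```

How long the prefix should be, and whether a second split level is needed, is the open design question noted above. A shorter prefix keeps the first-level directory small but puts more entries under each prefix.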