
[Go] Allow prepending dictionary #37634

Open
apache/arrow
#37634
@brancz

Description

@brancz

Describe the enhancement requested

The dictionary builders already have methods to insert whole arrays, but unfortunately these can consume a lot of potentially unnecessary CPU time, since every inserted value goes through the builder's deduplication.

Take the following scenario: I have two sources of data. One of them is already dictionary encoded and the other is not. I would like to initialize the dictionary builder with the existing dictionary, and only insert new items for the data that is not dictionary encoded. Now comes the important part: I'm OK with inserts potentially creating duplicates in the dictionary.

I would like to propose a new API, PrependInitialDict, that takes an array and must be called before any indices are appended (otherwise it returns an error). Any new dictionary item inserted afterwards then gets the index len(initialDict)+i, where i counts the newly inserted items.
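A minimal sketch of the proposed semantics, assuming a hypothetical, simplified string dictionary builder. Only the name PrependInitialDict comes from this proposal; the dictBuilder type and its AppendIndex and Append methods are illustrative and are not the Arrow Go API. Prepended values are copied without being added to the memo table, so no CPU time is spent hashing them, and a later insert of the same value can create a duplicate entry.

```go
package main

import (
	"errors"
	"fmt"
)

// dictBuilder is a hypothetical string dictionary builder used only to
// illustrate the proposal; it is not part of Arrow Go.
type dictBuilder struct {
	dict    []string       // prepended values first, then newly inserted ones
	memo    map[string]int // deduplicates values inserted via Append only
	indices []int          // one dictionary index per appended element
}

func newDictBuilder() *dictBuilder {
	return &dictBuilder{memo: make(map[string]int)}
}

// PrependInitialDict copies an existing dictionary verbatim without hashing
// its values. It errors if anything has already been inserted or appended.
func (b *dictBuilder) PrependInitialDict(initial []string) error {
	if len(b.dict) > 0 || len(b.indices) > 0 {
		return errors.New("PrependInitialDict must be called before appending")
	}
	b.dict = append(b.dict, initial...)
	return nil
}

// AppendIndex appends an index that already refers to an existing
// dictionary entry, e.g. from the dictionary-encoded source.
func (b *dictBuilder) AppendIndex(i int) error {
	if i < 0 || i >= len(b.dict) {
		return fmt.Errorf("index %d out of range [0, %d)", i, len(b.dict))
	}
	b.indices = append(b.indices, i)
	return nil
}

// Append inserts a plain value. New values get index len(b.dict), so the
// first new value after a prepend of n entries gets index n.
func (b *dictBuilder) Append(v string) int {
	idx, ok := b.memo[v]
	if !ok {
		idx = len(b.dict)
		b.memo[v] = idx
		b.dict = append(b.dict, v)
	}
	b.indices = append(b.indices, idx)
	return idx
}

func main() {
	b := newDictBuilder()

	// Source 1: already dictionary encoded; reuse its dictionary and indices.
	if err := b.PrependInitialDict([]string{"a", "b", "c"}); err != nil {
		panic(err)
	}
	for _, i := range []int{2, 0, 1} {
		if err := b.AppendIndex(i); err != nil {
			panic(err)
		}
	}

	// Source 2: plain values; "a" is not found in the memo table, so it is
	// inserted again as a duplicate entry.
	for _, v := range []string{"d", "a", "d"} {
		b.Append(v)
	}

	fmt.Println(b.dict)    // [a b c d a]
	fmt.Println(b.indices) // [2 0 1 3 4 3]
}
```

The duplicate "a" at index 4 is the trade-off accepted in this proposal: skipping the hashing of the prepended dictionary in exchange for possibly redundant dictionary entries.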

Theoretically it could even be designed to allow prepending dictionaries multiple times, but I would suggest starting with the API as described and only extending it once we have the use cases.


Alternative I have considered: building the "new" dictionary first, having its indices start at the length of the existing dictionary, and prepending the existing dictionary afterwards. I've found this not to be workable, for three reasons:

  1. There would still have to be an API to set the initial index.
  2. It would rely on the user actually prepending the dictionary afterward (easy to misuse).
  3. It would be quite awkward to use in scenarios with deeply nested lists and structs, where the final record is mostly built with a record builder and this array would be the only exception.
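For comparison, the rejected alternative can be sketched in plain Go, assuming a hypothetical buildNewDict helper (not an Arrow Go function) that deduplicates values with numbering starting at a caller-supplied offset. The offset parameter stands in for the "set the initial index" API from reason 1, and the final concatenation is the step from reason 2 that the caller must not forget.

```go
package main

import "fmt"

// buildNewDict is a hypothetical helper: it dictionary-encodes values with
// deduplication, numbering new entries starting at offset.
func buildNewDict(values []string, offset int) (dict []string, indices []int) {
	memo := make(map[string]int)
	for _, v := range values {
		idx, ok := memo[v]
		if !ok {
			idx = offset + len(dict)
			memo[v] = idx
			dict = append(dict, v)
		}
		indices = append(indices, idx)
	}
	return dict, indices
}

func main() {
	existingDict := []string{"a", "b", "c"}
	existingIdx := []int{2, 0, 1}

	// Reason 1: the caller has to pass the initial index explicitly.
	newDict, newIdx := buildNewDict([]string{"d", "a", "d"}, len(existingDict))

	// Reason 2: the caller must remember to prepend the existing dictionary;
	// without this step, newIdx points past the end of newDict.
	finalDict := append(append([]string{}, existingDict...), newDict...)
	finalIdx := append(append([]int{}, existingIdx...), newIdx...)

	fmt.Println(finalDict) // [a b c d a]
	fmt.Println(finalIdx)  // [2 0 1 3 4 3]
}
```

The result matches what a prepend-first API would produce, but correctness depends on two separate manual steps, and this outside-the-builder assembly would be hard to integrate when the array is nested inside a record builder.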

cc @zeroshade

Component(s)

Go
