Data_Pack level attribute(/"primitive-like") direct access interface (for user) design/requirements/considerations #924

Open
@J007X

Description

Is your feature request related to a problem? Please describe.
Per discussion in earlier meetings and emails, a Data_Pack-level, "primitive-like" interface for direct attribute access (without using classes) is preferred for batch/mass retrieval. Because this interface will be exposed to users like an API, it needs more discussion, so this ticket collects the high-level design, considerations, and requirements. It also organizes the subtasks identified during the requirement/design phase.

Describe the solution you'd like
This Data_Pack-level, primitive-like direct attribute access interface will provide higher performance for typical batch/mass retrieval scenarios, such as NLP pipelines built with Forte (for example, POS tagging and NER). It also adds the ability to access attributes as a range or batch, either for one or more tids or for a specific type, so that the data can be read without using classes, thus avoiding the related performance overhead.
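To make the contrast concrete, here is a minimal sketch, not Forte code. It assumes a toy store that keeps each entry as a plain list, and uses a hypothetical `Token` wrapper to stand in for class-based access. Every name in it (`TOY_STORE`, `POS_INDEX`, `Token`, `pos_via_classes`, `pos_direct`) is invented for illustration.

```python
from typing import List

# Hypothetical layout: each entry is [tid, begin, end, pos].
TOY_STORE: List[list] = [
    [101, 0, 5, "NNP"],
    [102, 6, 11, "VBZ"],
    [103, 12, 16, "NN"],
]
POS_INDEX = 3  # assumed column position of the "pos" attribute


class Token:
    """Hypothetical wrapper: one Python object is built per entry."""

    def __init__(self, raw: list) -> None:
        self._raw = raw

    @property
    def pos(self) -> str:
        return self._raw[POS_INDEX]


def pos_via_classes(store: List[list]) -> List[str]:
    # Class-based access: allocates a wrapper object for every entry.
    return [Token(raw).pos for raw in store]


def pos_direct(store: List[list]) -> List[str]:
    # Direct access: reads the attribute column without building objects.
    return [raw[POS_INDEX] for raw in store]
```

Both functions return the same values; the direct version simply skips the per-entry object construction, which is the overhead this request aims to avoid.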

Describe alternatives you've considered
Several overall designs were considered, including cached data in the data_pack. Per a recent discussion with Hector, the preferred approach is now to build on the current data_store implementation: provide a basic interface first, and possibly expand its capabilities later.

Some current method design/considerations and sub tasks

  • Access attributes/primitive-like data by type name plus a list of attribute names, covering the most frequently used data types in typical scenarios such as an NLP pipeline. (Name suggestion: get_attr_of_type, similar to the "get" method of data_pack but adding attr_names: List[str] and an optional attr_ids list as parameters.)
  • "Range-select" attributes by a list of attribute names (or a list of attr_ids) plus a tid (or a list of tids). (Name suggestion: get_attr_data; as suggested, this combines the tid/tids methods and the attr_name and attr_id options into one method.)
  • The return format for attributes can be a dict, for easy access by attribute name. It can be returned together with the entry for compatibility/mixed-usage scenarios, which could be common.
  • Write access is very likely to be needed in addition to read access, to further boost performance in batch mode.
  • Demo python script
  • Documentation (in source code)
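The proposed methods could look roughly like the sketch below. This is only an illustration under stated assumptions: `ToyDataPack`, `add_entry`, and `set_attr_data` are hypothetical names, the storage layout is invented, and the optional attr_ids parameter is left out for brevity. Only the method names `get_attr_of_type` and `get_attr_data` come from the suggestions above; their final signatures are still open.

```python
from typing import Any, Dict, Iterator, List, Union


class ToyDataPack:
    """Toy stand-in for a data pack; NOT Forte's DataPack."""

    def __init__(self) -> None:
        # type name -> ordered attribute names (assumed per-type layout)
        self._schema: Dict[str, List[str]] = {}
        # tid -> (type name, list of attribute values)
        self._entries: Dict[int, tuple] = {}

    def add_entry(self, tid: int, type_name: str, **attrs: Any) -> None:
        # The first entry of a type fixes that type's attribute layout.
        schema = self._schema.setdefault(type_name, list(attrs))
        self._entries[tid] = (type_name, [attrs.get(n) for n in schema])

    def _select(self, tid: int, attr_names: List[str]) -> Dict[str, Any]:
        type_name, values = self._entries[tid]
        schema = self._schema[type_name]
        return {name: values[schema.index(name)] for name in attr_names}

    def get_attr_of_type(
        self, type_name: str, attr_names: List[str]
    ) -> Iterator[Dict[str, Any]]:
        # Yield one dict per entry of the given type, keyed by attribute name.
        for tid, (entry_type, _) in self._entries.items():
            if entry_type == type_name:
                row = self._select(tid, attr_names)
                row["tid"] = tid  # keep the tid for mixed usage with entries
                yield row

    def get_attr_data(
        self, tids: Union[int, List[int]], attr_names: List[str]
    ) -> Dict[int, Dict[str, Any]]:
        # Accept a single tid or a list of tids in one method.
        if isinstance(tids, int):
            tids = [tids]
        return {tid: self._select(tid, attr_names) for tid in tids}

    def set_attr_data(self, tid: int, new_values: Dict[str, Any]) -> None:
        # Hypothetical write counterpart: update attributes in place.
        type_name, values = self._entries[tid]
        schema = self._schema[type_name]
        for name, value in new_values.items():
            values[schema.index(name)] = value


pack = ToyDataPack()
pack.add_entry(1, "Token", begin=0, end=5, pos="NNP")
pack.add_entry(2, "Token", begin=6, end=11, pos="VBZ")
pack.add_entry(3, "EntityMention", begin=0, end=5, ner_type="PER")

tokens = list(pack.get_attr_of_type("Token", ["pos"]))
picked = pack.get_attr_data([1, 3], ["begin", "end"])
pack.set_attr_data(2, {"pos": "VBD"})
updated = pack.get_attr_data(2, ["pos"])
```

Returning plain dicts keeps lookups by attribute name simple, and carrying the tid in each row leaves room for callers to fetch the full entry when they need it.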

Additional context
This is a higher-level interface for users, unlike the lower-level interfaces in Data_Store, which provide the related underlying services.
