Description
Is your feature request related to a problem? Please describe.
The resources used to define the test cases in data_store_test.py and data_store_serialization_test.py are the same in many cases. For example, the structure of _type_attributes is quite similar for most entries in both files:
DataStore._type_attributes = {
    "ft.onto.base_ontology.Document": {
        "attributes": {
            "begin": {"index": 2, "type": (None, (int,))},
            "end": {"index": 3, "type": (None, (int,))},
            "payload_idx": {"index": 4, "type": (None, (int,))},
            "document_class": {"index": 5, "type": (list, (str,))},
            "sentiment": {"index": 6, "type": (dict, (str, float))},
            "classifications": {
                "index": 7,
                "type": (FDict, (str, Classification)),
            },
        },
        "parent_entry": "forte.data.ontology.top.Annotation",
    },
}
Describe the solution you'd like
To reduce this redundancy, there should be a central file that stores these configurations, along with a clear format for the tests to access them. Note that although the configurations look quite similar, some of them differ in subtle ways that are intentional. For example, in data_store_serialization_test.py, the _type_attributes for Document is given by
"ft.onto.base_ontology.Document": {
    "attributes": {
        "begin": {"index": 2, "type": (None, (int,))},
        "end": {"index": 3, "type": (None, (int,))},
        "payload_idx": {"index": 4, "type": (None, (int,))},
        "sentiment": {"index": 5, "type": (dict, (str, float))},
        "classifications": {
            "index": 6,
            "type": (FDict, (str, Classification)),
        },
    },
    "parent_entry": "forte.data.ontology.top.Annotation",
},
Note that this configuration intentionally omits the document_class attribute, so the indices of sentiment and classifications shift down by one. Thus, the proposed solution needs provisions to handle such slight changes in structure.
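One possible shape for such a central file is sketched below. It is only an illustration, not an existing Forte API: the names DOCUMENT_ATTRIBUTES, FIRST_ATTRIBUTE_INDEX, and build_type_attributes are hypothetical, and FDict and Classification are replaced by empty stand-in classes so the snippet runs on its own. The sketch assumes that attribute indices are always consecutive and start at 2, as in both configurations above.

```python
from typing import Any, Dict, Iterable, Tuple

# Stand-ins so the sketch runs without Forte installed. The real tests
# would import FDict and Classification from Forte instead.
class FDict:
    pass


class Classification:
    pass


# Ordered (name, type) pairs for Document, copied from the issue text.
DOCUMENT_ATTRIBUTES: Tuple[Tuple[str, Any], ...] = (
    ("begin", (None, (int,))),
    ("end", (None, (int,))),
    ("payload_idx", (None, (int,))),
    ("document_class", (list, (str,))),
    ("sentiment", (dict, (str, float))),
    ("classifications", (FDict, (str, Classification))),
)

# Assumption: the first attribute always gets index 2, as "begin" does above.
FIRST_ATTRIBUTE_INDEX = 2


def build_type_attributes(
    attributes: Iterable[Tuple[str, Any]],
    parent_entry: str,
    exclude: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build one _type_attributes entry, skipping excluded attributes.

    Indices are assigned consecutively after the exclusion, so dropping
    an attribute shifts the indices of the attributes that follow it.
    """
    excluded = set(exclude)
    kept = [(name, typ) for name, typ in attributes if name not in excluded]
    return {
        "attributes": {
            name: {"index": FIRST_ATTRIBUTE_INDEX + offset, "type": typ}
            for offset, (name, typ) in enumerate(kept)
        },
        "parent_entry": parent_entry,
    }


ANNOTATION = "forte.data.ontology.top.Annotation"

# Variant with all six attributes (the first configuration above).
full_document = build_type_attributes(DOCUMENT_ATTRIBUTES, ANNOTATION)

# Variant that intentionally omits document_class (the second configuration).
serialization_document = build_type_attributes(
    DOCUMENT_ATTRIBUTES, ANNOTATION, exclude=("document_class",)
)
```

Each test file could then build the variant it needs from the shared definition, which keeps the intentional differences explicit at the call site rather than hidden in near-duplicate dictionaries.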
Additional Context
- This is part of the data efficiency project
- This PR should be made to the master branch.