
Commit 51c0337

#316 improve task type (#369)
* made task_types only accept transformers tasks
* renamed base_model_spec
* added additional task_type options
* updated docu and changes
* added unit test case with illegal task_type for delete models udf

---------

Co-authored-by: Ariel Schulz <43442541+ArBridgeman@users.noreply.github.com>
1 parent 529c90a commit 51c0337

28 files changed

Lines changed: 824 additions & 495 deletions

doc/changes/unreleased.md

Lines changed: 19 additions & 0 deletions
@@ -4,6 +4,8 @@ Code name: T.B.D
 
 ## Summary
 
+T.B.D
+
 ### BREAKING CHANGES:
 
 * The `max_length` parameter has been renamed to `max_new_tokens`, and its behavior changed. Both of these changes were done in accordance with changes in the [transformers library](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline).
@@ -20,6 +22,21 @@ Code name: T.B.D
 | TE_TOKEN_CLASSIFICATION_UDF | AI_EXTRACT_EXTENDED |
 | TE_ZERO_SHOT_CLASSIFICATION_UDF | AI_CLASSIFY_EXTENDED |
 
+* `task_type` handling has been changed.
+The Transformers extension now allows only specific transformers task types in
+the installation and execution of models.
+You may need to re-install your models from HuggingFace using the new `task_types` in order to use them.
+Models installed with legacy task_types can still be listed and deleted using the respective UDFs.
+
+* Allowed task_types are:
+"fill-mask" (previously "filling_mask"),
+"translation",
+"zero-shot-classification",
+"text-classification" (previously "sequence_classification"),
+"question-answering",
+"text-generation",
+"token-classification"
+
 ## Features
 
 * #351: Added functionality for installing default models.
* #351: Added functionality for installing default models.
@@ -31,6 +48,7 @@ Code name: T.B.D
 
 ## Security
 
+* Updated urllib3 (2.5.0 -> 2.6.3)
 * Updated exasol-integration-test-docker-environment (4.4.1 -> 5.0.0)
 * Updated exasol-script-languages-container-tool (3.4.1 -> 3.5.0)
 * Updated exasol-saas-api (2.3.0 -> 2.6.0)
* Updated exasol-saas-api (2.3.0 -> 2.6.0)
@@ -83,3 +101,4 @@ This release fixes vulnerabilities by updating dependencies:
 * #372: Added Transformation Protocol and extracted GetPredictionFromBatch into Transformations
 * #374: Extracted Span handling into Transformations
 * #375: Added implementation for a generalized extract_unique_param_based_dataframes function
+* #316: Changed task_types to only allow transformers task_types, allowing both underscores and dashes
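
The new rule combines three statements from this commit: only transformers task types are accepted, underscores may stand in for dashes, and `filling_mask` and `sequence_classification` were replaced by `fill-mask` and `text-classification`. A minimal Python sketch of that rule follows; `normalize_task_type`, `SUPPORTED_TASK_TYPES` and `LEGACY_REPLACEMENTS` are hypothetical names, and the extension's actual validation code is not part of the files shown here.

```python
# Hypothetical validation helper; names are illustrative, not the extension's API.
SUPPORTED_TASK_TYPES = frozenset(
    {
        "fill-mask",
        "translation",
        "zero-shot-classification",
        "text-classification",
        "question-answering",
        "text-generation",
        "token-classification",
    }
)

# Legacy names from the changelog, keyed by their dash-normalized form and
# mapped to the transformers task type that replaces them.
LEGACY_REPLACEMENTS = {
    "filling-mask": "fill-mask",
    "sequence-classification": "text-classification",
}


def normalize_task_type(task_type: str) -> str:
    """Return the dash form of task_type, or raise ValueError if it is not supported."""
    normalized = task_type.replace("_", "-")  # underscores are accepted as dashes
    if normalized in SUPPORTED_TASK_TYPES:
        return normalized
    replacement = LEGACY_REPLACEMENTS.get(normalized)
    if replacement is not None:
        # Legacy models must be re-installed, so point the user at the new name.
        raise ValueError(
            f"task_type {task_type!r} is no longer supported; "
            f"re-install the model using {replacement!r}"
        )
    raise ValueError(f"unsupported task_type {task_type!r}")
```

Raising on legacy names, rather than silently mapping them, mirrors the changelog's note that models installed with legacy task types need to be re-installed.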

doc/user_guide/manage_models.md

Lines changed: 66 additions & 12 deletions
@@ -104,21 +104,67 @@ SELECT TE_MODEL_DOWNLOADER_UDF(
 * `bucketfs_conn`
 
 Specific parameters
-* `token_conn`: The connection name containing the token required for private models. You can use an empty string ('') for public models. For details on how to create a connection object with token information, please check the [Getting Started](./setup#getting-started) section.
+* `token_conn`: The connection name containing the token required for private models.
+You can use an empty string ('') for public models. For details on how to create a
+connection object with token information, please see the
+[Getting Started](./setup#getting-started) section.
 * `task_type`: See below.
 
 #### Selecting the Task Type
 
-Some models can be used for multiple types of tasks, but Hugging Face Transformers stores different metadata depending on the task of the model, which affects how the model is loaded later. Setting an incorrect task type, or leaving the task type empty may affect the models performance severely.
+Some models can be used for multiple types of tasks, but Hugging Face Transformers
+stores different metadata depending on the task of the model, which affects how the
+model is loaded later. Setting an incorrect task type, or leaving the task type empty
+may affect the model's performance severely.
 
 Available task types are:
-* `filling_mask`
-* `question_answering`
-* `sequence_classification`
-* `text_generation`
-* `token_classification`
-* `translation`
-* `zero_shot_classification`
+
+| task_type | UDFs using this task_type |
+|-------------------------------|-----------------------------|
+| `fill-mask` | AI_FILL_MASK_EXTENDED |
+| `question-answering` | AI_ANSWER_EXTENDED |
+| `text-classification` | AI_CUSTOM_CLASSIFY_EXTENDED |
+| `text-classification` | AI_ENTAILMENT_EXTENDED |
+| `text-classification` | AI_SENTIMENT |
+| `text-classification` | AI_ENTAILMENT_EXTENDED |
+| `text-generation` | AI_COMPLETE_EXTENDED |
+| `token-classification` | AI_EXTRACT_EXTENDED |
+| `token-classification` | AI_EXTRACT_ENTITIES |
+| `translation` | AI_TRANSLATE_EXTENDED |
+| `zero-shot-classification` | AI_CLASSIFY_EXTENDED |
+| `zero-shot-classification` | AI_CLASSIFY |
+
+
+Note that you may use underscores (`_`) instead of dashes (`-`).
+
+We also support the installation of models using the following transformers tasks:
+
+* `document-question-answering`
+* `mask-generation`
+* `table-question-answering`
+* `feature-extraction`
+
+However, we do not offer built-in UDFs for using these models. So if you need models
+supporting these tasks, you will need to write your own UDF to run them.
+
+More information about transformers tasks can be found in the [pipeline task parameter description](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.pipeline.task).
+
+##### Legacy Task Types
+
+We previously supported the following task_types:
+
+* `ai_fill_mask_extended`
+* `filling_mask`
+* `sequence_classification`
+* `ai_complete_extended`
+* `ai_extract_extended`
+
+Models installed with these task types are no longer compatible with our
+prediction UDFs. If you still need these models, you will need to re-install
+them using the new task types described above.
+
+However, listing and deleting these models will still work using both the UDF and the Python API.
+
 
 ### Model Uploader Script
 
@@ -164,10 +210,14 @@ Similar to [Store Models in BucketFS](#store-models-in-bucketfs), you have two o
 In order to do this, you might need to find out which models are saved in the Exasol BucketFS. To do this,
 we provide the `TE_LIST_MODELS_UDF`. See details at the end of this section.
 
+Note: If you installed models using a custom task_type, or a task_type we do
+not support anymore, you are still able to list and then delete these models using both the UDF and the Python API.
+
 
 ### Delete Model UDF
 
-Using the `TE_DELETE_MODEL_UDF` below, you can delete a model from BucketFS. The parameter values are similar to that one used in [Store Models in BucketFS](#store-models-in-bucketfs).
+Using the `TE_DELETE_MODEL_UDF` below, you can delete a model from BucketFS.
+The parameter values are similar to those used in [Store Models in BucketFS](#store-models-in-bucketfs).
 
 Run the UDF with:
 
@@ -223,6 +273,10 @@ for potential error messages, in addition to the input.
 This UDF will fail to return a model if it was saved with the sub_dir parameter empty,
 or if no config.json file can be found in the model files.
 
+Note: If you installed models using a custom task_type, or a task_type we do
+not support anymore, you are still able to list these models using the UDF.
+
+
 Call the UDF like this:
 
 ```sql
@@ -233,8 +287,8 @@ SELECT TE_LIST_MODELS_UDF(
 ```
 Example Output:
 
-| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TASK_NAME | MODEL_PATH | ERROR_MESSAGE |
+| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TASK_TYPE | MODEL_PATH | ERROR_MESSAGE |
 |---------------|---------|------------|-----------|--------------------------|---------------|
-| conn_name | dir/ | model_name | task_name | dir/model_name_task_name | None |
+| conn_name | dir/ | model_name | task_type | dir/model_name_task_type | None |
 | ... | ... | ... | ... | ... | ... |
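
The task-type table in `manage_models.md` can be read as a lookup from task type to built-in UDFs, with the four additional tasks installable but lacking a built-in UDF. A sketch of that reading, assuming a hypothetical `udfs_for_task_type` helper that is not part of the extension's API; the table lists `AI_ENTAILMENT_EXTENDED` twice under `text-classification`, so the sketch keeps a single entry.

```python
# Hypothetical lookup built from the task_type table; not the extension's API.
UDFS_BY_TASK_TYPE = {
    "fill-mask": ["AI_FILL_MASK_EXTENDED"],
    "question-answering": ["AI_ANSWER_EXTENDED"],
    "text-classification": [
        "AI_CUSTOM_CLASSIFY_EXTENDED",
        "AI_ENTAILMENT_EXTENDED",
        "AI_SENTIMENT",
    ],
    "text-generation": ["AI_COMPLETE_EXTENDED"],
    "token-classification": ["AI_EXTRACT_EXTENDED", "AI_EXTRACT_ENTITIES"],
    "translation": ["AI_TRANSLATE_EXTENDED"],
    "zero-shot-classification": ["AI_CLASSIFY_EXTENDED", "AI_CLASSIFY"],
}

# Tasks that can be installed but have no built-in prediction UDF.
INSTALL_ONLY_TASK_TYPES = frozenset(
    {
        "document-question-answering",
        "mask-generation",
        "table-question-answering",
        "feature-extraction",
    }
)


def udfs_for_task_type(task_type: str) -> list[str]:
    """Return the built-in UDFs for task_type; an empty list for install-only tasks."""
    key = task_type.replace("_", "-")  # underscores are accepted in place of dashes
    if key in UDFS_BY_TASK_TYPE:
        return list(UDFS_BY_TASK_TYPE[key])
    if key in INSTALL_ONLY_TASK_TYPES:
        return []  # installable, but running it needs a custom UDF
    raise ValueError(f"task_type {task_type!r} cannot be used to install a model")
```

An empty list, as opposed to an error, separates "installable but bring your own UDF" from task types that are rejected outright.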

exasol_transformers_extension/deployment/default_udf_parameters.py

Lines changed: 0 additions & 6 deletions
@@ -40,10 +40,4 @@
         bucketfs_conn_name=DEFAULT_BUCKETFS_CONN_NAME,
         sub_dir=Path(DEFAULT_SUBDIR),
     ),
-    "model_for_another_udf": model_spec_factory.create(
-        model_name="prajjwal1/bert-tiny",
-        task_type="different_task",
-        bucketfs_conn_name=DEFAULT_BUCKETFS_CONN_NAME,
-        sub_dir=Path(DEFAULT_SUBDIR),
-    ),
 }

exasol_transformers_extension/resources/templates/ls_models_udf.jinja.sql

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ CREATE OR REPLACE {{ language_alias }} SCALAR SCRIPT "TE_LIST_MODELS_UDF"(
 bucketfs_conn VARCHAR(2000000),
 sub_dir VARCHAR(2000000),
 model_name VARCHAR(2000000),
-task_name VARCHAR(2000000),
+task_type VARCHAR(2000000),
 model_path VARCHAR(2000000),
 error_message VARCHAR(2000000) ) AS
 
exasol_transformers_extension/udfs/models/delete_model_udf.py

Lines changed: 20 additions & 3 deletions
@@ -46,9 +46,26 @@ def _delete_model(self, ctx) -> tuple[str, str, str, str, bool, str]:
             ctx.task_type,
         )
 
-        current_model_specification = self._current_model_specification_factory.create(
-            model_name, task_type, bucketfs_conn, Path(sub_dir)
-        )  # specifies details of Huggingface model
+        try:
+            current_model_specification = (
+                self._current_model_specification_factory.create(
+                    model_name, task_type, bucketfs_conn, Path(sub_dir)
+                )
+            )  # specifies details of Huggingface model
+        except ValueError as e:
+            # if task_type is not allowed for model_specification, use a placeholder for creation
+            # and then replace using the legacy_set_task_type_from_udf_name.
+            # needed to allow for deletion of already installed models with illegal task_types
+            current_model_specification = (
+                self._current_model_specification_factory.create(
+                    model_name, "fill-mask", bucketfs_conn, Path(sub_dir)
+                )
+            )  # specifies details of Huggingface model
+            current_model_specification.task_type = (
+                current_model_specification.legacy_set_task_type_from_udf_name(
+                    task_type
+                )
+            )
         try:
             # create bucketfs location
             bfs_conn_obj = self._exa.get_connection(bucketfs_conn)
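
The `TE_DELETE_MODEL_UDF` change uses a try-then-fallback pattern: build the specification normally, and if the factory rejects the task type with a `ValueError`, build it with a valid placeholder (`"fill-mask"`) and overwrite `task_type` afterwards so the already installed model can still be addressed and deleted. A self-contained sketch of the pattern, assuming a hypothetical `ModelSpec` class in place of the real specification factory; where the UDF calls `legacy_set_task_type_from_udf_name()`, whose body is not shown in this diff, the sketch simply keeps the raw task type string.

```python
from pathlib import Path

# Task types the hypothetical ModelSpec accepts (taken from the changelog list).
ALLOWED_TASK_TYPES = frozenset(
    {
        "fill-mask",
        "translation",
        "zero-shot-classification",
        "text-classification",
        "question-answering",
        "text-generation",
        "token-classification",
    }
)


class ModelSpec:
    """Hypothetical model specification that rejects task types it does not know."""

    def __init__(
        self, model_name: str, task_type: str, bucketfs_conn: str, sub_dir: Path
    ) -> None:
        normalized = task_type.replace("_", "-")
        if normalized not in ALLOWED_TASK_TYPES:
            raise ValueError(f"task_type {task_type!r} is not allowed")
        self.model_name = model_name
        self.task_type = normalized
        self.bucketfs_conn = bucketfs_conn
        self.sub_dir = sub_dir


def spec_for_deletion(
    model_name: str, task_type: str, bucketfs_conn: str, sub_dir: str
) -> ModelSpec:
    """Build a spec even for models stored under a legacy or custom task_type."""
    try:
        return ModelSpec(model_name, task_type, bucketfs_conn, Path(sub_dir))
    except ValueError:
        # Create with a valid placeholder, then overwrite the task type so the
        # spec still refers to the model as it was installed. The real UDF
        # calls legacy_set_task_type_from_udf_name() at this point instead.
        spec = ModelSpec(model_name, "fill-mask", bucketfs_conn, Path(sub_dir))
        spec.task_type = task_type
        return spec
```

Validation therefore stays strict for installation and prediction, while deletion tolerates task types that were valid under the old rules.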

exasol_transformers_extension/udfs/models/ls_models_udf.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ class ListModelsUDF:
     | directory where models are | BucketFS connection |
 
     returns a table of:
-    bucketfs_conn, sub_dir, model_name, task_name, path of model in BucketFS
+    bucketfs_conn, sub_dir, model_name, task_type, path of model in BucketFS
     """
 
     def __init__(

exasol_transformers_extension/utils/bucketfs_model_specification.py

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@ def __init__(
             Name of the model. This is the same name as it's seen on the Huggingface
             model card, for example 'cross-encoder/nli-deberta-base'.
         task_type:
-            Name of an NLP task, filling_mask, question_answering,
-            text_classification, text_generation, token_classification,
-            translation, zero_shot_classification.
+            Name of an NLP task, fill-mask, question-answering,
+            text-classification, text-generation, token-classification,
+            translation, zero-shot-classification.
         bucketfs_conn_name:
             Name of the BucketFS connection to retrieve the BucketFS location from.
         sub_dir:
