<h1 style="margin-top:auto;">Hugging Face Inference Toolkit</h1>
Hugging Face Inference Toolkit is a library for serving 🤗 Transformers models in containers. It provides default pre-processing, prediction, and post-processing for Transformers and Sentence Transformers models, and it is also possible to define a custom `handler.py` for customization. The Toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models) and is used as the default option in [Inference Endpoints](https://ui.endpoints.huggingface.co/).
## 💻 Getting Started with Hugging Face Inference Toolkit
- Clone the repository `git clone https://github.com/huggingface/huggingface-inference-toolkit`
- Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
- If you develop on AWS Inferentia2, install with `pip install -e ".[inf2,test,quality]" --upgrade`
- If you develop on Google Cloud, install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
- Unit Testing: `make unit-test`
- Integration testing: `make integ-test`
### Local run
The Hugging Face Inference Toolkit allows users to provide custom inference logic through a `handler.py` file located in the model repository.
For an example, check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):
```bash
model.tar.gz/
|- pytorch_model.bin
|- ....
|- handler.py
|- requirements.txt
```
In this example, `pytorch_model.bin` is the model file saved from training, `handler.py` is the custom inference handler, and `requirements.txt` is a requirements file to add additional dependencies.
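As a minimal, illustrative sketch only (not the repository's actual handler), a custom `handler.py` commonly follows the `EndpointHandler` pattern used for custom handlers on Inference Endpoints: an `__init__` that loads the model from the unpacked archive directory and a `__call__` that receives the deserialized request body. Names and payload shape below are assumptions.

```python
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Load the pipeline once at startup; `path` is the directory the
        # model archive was unpacked into, so local weights are reused.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the deserialized request body; "inputs" mirrors the default
        # payload format, and optional "parameters" are forwarded to the pipeline.
        inputs = data["inputs"]
        parameters = data.get("parameters", {})
        return self.pipeline(inputs, **parameters)
```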
The custom module can override the following methods:
### Vertex AI Support
The Hugging Face Inference Toolkit supports deploying Hugging Face models on AWS Inferentia2. To deploy a model on Inferentia2, you have three options:
- Provide `HF_MODEL_ID`, the model repo id on huggingface.co of a repository that contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
- Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
- Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `"neuron": {"static_batch_size": 1, "static_sequence_length": 128}`, as sketched below
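As a small illustration of the third option, the `neuron` entry can be added to an existing `config.json` programmatically; the file path below is a placeholder, not something the toolkit prescribes.

```python
import json
from pathlib import Path

# Illustration only: add the `neuron` compilation shapes to the config.json
# that ships inside the model archive. The path below is a placeholder.
config_path = Path("model/config.json")
config = json.loads(config_path.read_text())
config["neuron"] = {"static_batch_size": 1, "static_sequence_length": 128}
config_path.write_text(json.dumps(config, indent=2))
```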
The currently supported tasks can be found [here](https://huggingface.co/docs/optimum-neuron/en/package_reference/supported_models). If you plan to deploy an LLM, we recommend taking a look at [Neuronx TGI](https://huggingface.co/blog/text-generation-inference-on-inferentia2), which is purpose-built for LLMs.
Start the Hugging Face Inference Toolkit with the following environment variables.
_Note: You need to run this on an Inferentia2 instance._
- transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
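Once the server is running with those variables, a request can be sent from a small client like the sketch below; the URL, port, and payload shape are assumptions based on the default toolkit payload format, so adjust them to your setup.

```python
import requests

# Minimal client sketch: POST a text-classification request to a locally
# running toolkit server. URL and port are assumptions -- change as needed.
response = requests.post(
    "http://localhost:5000",
    json={"inputs": "Wow, the Hugging Face Inference Toolkit makes serving models easy!"},
)
print(response.json())  # e.g. [{"label": "POSITIVE", "score": 0.99}]
```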
**setup.py** (1 addition, 1 deletion):

```python
# We don't declare our dependency on transformers here because we build with
# different packages for different variants

VERSION = "0.5.0"  # bumped from "0.4.3"

# Ubuntu packages
# libsndfile1-dev: torchaudio requires the development version of the libsndfile package which can be installed via a system package manager. On Ubuntu it can be installed as follows: apt install libsndfile1-dev
```