
Commit 4307021

Merge pull request #85 from huggingface/prep-os

update

2 parents 28d0018 + 33b84b2

14 files changed: +46 / -50 lines

README.md

Lines changed: 22 additions & 32 deletions
@@ -1,21 +1,18 @@
-
 <div style="display:flex; text-align:center; justify-content:center;">
 <img src="https://huggingface.co/front/assets/huggingface_logo.svg" width="100"/>
 <h1 style="margin-top:auto;"> Hugging Face Inference Toolkit <h1>
 </div>
 
-Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, predict and postprocessing for Transformers, Sentence Tranfsformers. It is also possible to define custom `handler.py` for customization. The Toolkit is build to work with the [Hugging Face Hub](https://huggingface.co/models).
-
----
+Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, predict and postprocessing for Transformers, Sentence Tranfsformers. It is also possible to define custom `handler.py` for customization. The Toolkit is build to work with the [Hugging Face Hub](https://huggingface.co/models) and is used as "default" option in [Inference Endpoints](https://ui.endpoints.huggingface.co/)
 
-## 💻 Getting Started with Hugging Face Inference Toolkit
+## 💻 Getting Started with Hugging Face Inference Toolkit
 
-* Clone the repository `git clone <https://github.com/huggingface/huggingface-inference-toolkit``>
-* Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
-* If you develop on AWS inferentia2 install with `pip install -e ".[test,quality]" optimum-neuron[neuronx] --upgrade`
-* If you develop on Google Cloud install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
-* Unit Testing: `make unit-test`
-* Integration testing: `make integ-test`
+- Clone the repository `git clone https://github.com/huggingface/huggingface-inference-toolkit`
+- Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
+- If you develop on AWS Inferentia2 install with `pip install -e ".[inf2,test,quality]" --upgrade`
+- If you develop on Google Cloud install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
+- Unit Testing: `make unit-test`
+- Integration testing: `make integ-test`
 
 ### Local run
 
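Not part of this commit, but a quick way to sanity-check the editable install described in the hunk above: import the package helpers whose module paths appear verbatim in the test files changed further down this diff.

```python
# Sanity-check the dev install (sketch, not part of this commit).
# These module paths are taken from the test imports elsewhere in this diff.
from huggingface_inference_toolkit.utils import _load_repository_from_hf, get_pipeline

print("huggingface-inference-toolkit imports OK")
```
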
@@ -68,18 +65,18 @@ curl --request POST \
 
 The Hugging Face Inference Toolkit allows user to provide a custom inference through a `handler.py` file which is located in the repository.
 
-For an example check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):
+For an example check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):
 
 ```bash
 model.tar.gz/
 |- pytorch_model.bin
 |- ....
 |- handler.py
-|- requirements.txt
+|- requirements.txt
 ```
 
 In this example, `pytroch_model.bin` is the model file saved from training, `handler.py` is the custom inference handler, and `requirements.txt` is a requirements file to add additional dependencies.
-The custom module can override the following methods:
+The custom module can override the following methods:
 
 ### Vertex AI Support
 
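The hunk above documents the custom `handler.py` mechanism. A minimal sketch follows, assuming the `EndpointHandler` convention used by Hugging Face Inference Endpoints; the class name and method signatures are an assumption, not taken from this commit, so verify them against the toolkit's handler-loading code.

```python
# handler.py - minimal custom-handler sketch. ASSUMPTION: the toolkit looks
# for an EndpointHandler class shaped like the Inference Endpoints convention.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the model directory, e.g. the unpacked model.tar.gz above
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the deserialized request body; `inputs` follows the
        # standard payload convention used by the toolkit's API examples
        return self.pipeline(data["inputs"])
```
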
@@ -136,9 +133,9 @@ curl --request POST \
 
 The Hugging Face Inference Toolkit provides support for deploying Hugging Face on AWS Inferentia2. To deploy a model on Inferentia2 you have 3 options:
 
-* Provide `HF_MODEL_ID`, the model repo id on huggingface.co which contains the compiled model under `.neuron` format e.g. `optimum/bge-base-en-v1.5-neuronx`
-* Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
-* Include `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}`
+- Provide `HF_MODEL_ID`, the model repo id on huggingface.co which contains the compiled model under `.neuron` format e.g. `optimum/bge-base-en-v1.5-neuronx`
+- Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
+- Include `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}`
 
 The currently supported tasks can be found [here](https://huggingface.co/docs/optimum-neuron/en/package_reference/supported_models). If you plan to deploy an LLM, we recommend taking a look at [Neuronx TGI](https://huggingface.co/blog/text-generation-inference-on-inferentia2), which is purposly build for LLMs.
 
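For the third option in the list above, a small sketch of the `neuron` entry as it would be written into `config.json`, using the example values shown in the hunk:

```python
# Build the `neuron` entry for config.json from the example values above.
import json

neuron_config = {"neuron": {"static_batch_size": 1, "static_sequence_length": 128}}
print(json.dumps(neuron_config, indent=2))
```
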
@@ -148,14 +145,14 @@ Start Hugging Face Inference Toolkit with the following environment variables.
 
 _Note: You need to run this on an Inferentia2 instance._
 
-* transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
+- transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
 
 ```bash
 mkdir tmp2/
 HF_MODEL_ID="distilbert/distilbert-base-uncased-finetuned-sst-2-english" HF_TASK="text-classification" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
 ```
 
-* sentence transformers `feature-extration` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
+- sentence transformers `feature-extration` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
 
 ```bash
 HF_MODEL_ID="sentence-transformers/all-MiniLM-L6-v2" HF_TASK="feature-extraction" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
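
With either server above running, the endpoint can be exercised the same way as the README's `curl --request POST` examples. A Python sketch; the root route and the `inputs` payload shape are assumptions based on those examples:

```python
# Send a test request to the locally running toolkit (sketch).
# ASSUMPTION: the server listens on port 5000 and serves predictions at "/".
import requests

response = requests.post(
    "http://localhost:5000",
    json={"inputs": "Wow, this inference toolkit is neat."},
)
print(response.status_code, response.json())
```
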
@@ -284,19 +281,12 @@ HF_OPTIMUM_SEQUENCE_LENGTH="128"
 
 ## ⚙ Supported Front-Ends
 
-* [x] Starlette (HF Endpoints)
-* [x] Starlette (Vertex AI)
-* [ ] Starlette (Azure ML)
-* [ ] Starlette (SageMaker)
+- [x] Starlette (HF Endpoints)
+- [x] Starlette (Vertex AI)
+- [ ] Starlette (Azure ML)
+- [ ] Starlette (SageMaker)
 
----
+## 📜 License
 
-## 🤝 Contributing
+This project is licensed under the Apache-2.0 License.
 
----
-
-## 📜 License
-
-TBD.
-
----

pyproject.toml

Lines changed: 5 additions & 5 deletions
@@ -4,15 +4,15 @@ no_implicit_optional = true
 scripts_are_modules = true
 
 [tool.ruff]
-lint.select = [
+select = [
     "E", # pycodestyle errors
     "W", # pycodestyle warnings
     "F", # pyflakes
     "I", # isort
     "C", # flake8-comprehensions
     "B", # flake8-bugbear
 ]
-lint.ignore = [
+ignore = [
     "E501", # Line length (handled by ruff-format)
     "B008", # do not perform function calls in argument defaults
     "C901", # too complex
@@ -21,13 +21,13 @@ lint.ignore = [
 line-length = 119
 
 # Allow unused variables when underscore-prefixed.
-lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
 
 # Assume Python 3.11.
 target-version = "py311"
 
-lint.per-file-ignores = {"__init__.py" = ["F401"]}
+per-file-ignores = { "__init__.py" = ["F401"] }
 
 [tool.isort]
 profile = "black"
-known_third_party = ["transformers", "starlette", "huggingface_hub"]
+known_third_party = ["transformers", "starlette", "huggingface_hub"]

setup.cfg

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ known_third_party =
     datasets
     tensorflow
    torch
-    robyn
 
 line_length = 119
 lines_after_imports = 2

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 # We don't declare our dependency on transformers here because we build with
 # different packages for different variants
 
-VERSION = "0.4.3"
+VERSION = "0.5.0"
 
 # Ubuntu packages
 # libsndfile1-dev: torchaudio requires the development version of the libsndfile package which can be installed via a system package manager. On Ubuntu it can be installed as follows: apt install libsndfile1-dev

tests/integ/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -7,9 +7,9 @@
 import docker
 import pytest
 import tenacity
-from huggingface_inference_toolkit.utils import _load_repository_from_hf
 from transformers.testing_utils import _run_slow_tests
 
+from huggingface_inference_toolkit.utils import _load_repository_from_hf
 from tests.integ.config import task2model
 
 HF_HUB_CACHE = os.environ.get("HF_HUB_CACHE", "/home/ubuntu/.cache/huggingface/hub")
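
This hunk, and the remaining test-file hunks below, make the same mechanical change: first-party `huggingface_inference_toolkit` imports move into their own block after third-party imports, matching isort's stdlib / third-party / first-party grouping (see `known_third_party` in `pyproject.toml` and `setup.cfg`). Schematically, on a hypothetical module:

```python
# Import ordering these hunks converge on (schematic example):
import os        # 1. standard library
import tempfile

import pytest    # 2. third-party packages
from transformers.testing_utils import require_torch

from huggingface_inference_toolkit.utils import _load_repository_from_hf  # 3. first-party
```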

tests/integ/helpers.py

Lines changed: 1 addition & 1 deletion
@@ -8,9 +8,9 @@
 import pytest
 import requests
 from docker import DockerClient
-from huggingface_inference_toolkit.utils import _load_repository_from_hf
 from transformers.testing_utils import _run_slow_tests, require_tf, require_torch
 
+from huggingface_inference_toolkit.utils import _load_repository_from_hf
 from tests.integ.config import task2input, task2model, task2output, task2validation
 
 IS_GPU = _run_slow_tests

tests/integ/test_pytorch_local_inf2.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 import pytest
-from huggingface_inference_toolkit.optimum_utils import is_optimum_neuron_available
 from transformers.testing_utils import require_torch
 
+from huggingface_inference_toolkit.optimum_utils import is_optimum_neuron_available
 from tests.integ.helpers import verify_task
 
 require_inferentia = pytest.mark.skipif(

tests/unit/test_diffusers.py

Lines changed: 3 additions & 2 deletions
@@ -1,11 +1,12 @@
 import logging
 import tempfile
 
-from huggingface_inference_toolkit.diffusers_utils import IEAutoPipelineForText2Image
-from huggingface_inference_toolkit.utils import _load_repository_from_hf, get_pipeline
 from PIL import Image
 from transformers.testing_utils import require_torch, slow
 
+from huggingface_inference_toolkit.diffusers_utils import IEAutoPipelineForText2Image
+from huggingface_inference_toolkit.utils import _load_repository_from_hf, get_pipeline
+
 logging.basicConfig(level="DEBUG")
 
 @require_torch

tests/unit/test_handler.py

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,8 @@
 import tempfile
 
 import pytest
+from transformers.testing_utils import require_tf, require_torch
+
 from huggingface_inference_toolkit.handler import (
     HuggingFaceHandler,
     get_inference_handler_either_custom_or_default_handler,
@@ -9,7 +11,6 @@
     _is_gpu_available,
     _load_repository_from_hf,
 )
-from transformers.testing_utils import require_tf, require_torch
 
 TASK = "text-classification"
 MODEL = "hf-internal-testing/tiny-random-distilbert"

tests/unit/test_optimum_utils.py

Lines changed: 2 additions & 1 deletion
@@ -2,13 +2,14 @@
 import tempfile
 
 import pytest
+from transformers.testing_utils import require_torch
+
 from huggingface_inference_toolkit.optimum_utils import (
     get_input_shapes,
     get_optimum_neuron_pipeline,
     is_optimum_neuron_available,
 )
 from huggingface_inference_toolkit.utils import _load_repository_from_hf
-from transformers.testing_utils import require_torch
 
 require_inferentia = pytest.mark.skipif(
     not is_optimum_neuron_available(),

tests/unit/test_sentence_transformers.py

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,7 @@
 import tempfile
 
+from transformers.testing_utils import require_torch
+
 from huggingface_inference_toolkit.sentence_transformers_utils import (
     SentenceEmbeddingPipeline,
     get_sentence_transformers_pipeline,
@@ -8,7 +10,6 @@
     _load_repository_from_hf,
     get_pipeline,
 )
-from transformers.testing_utils import require_torch
 
 
 @require_torch

tests/unit/test_serializer.py

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@
 
 import numpy as np
 import pytest
-from huggingface_inference_toolkit.serialization import Audioer, Imager, Jsoner
 from PIL import Image
 
+from huggingface_inference_toolkit.serialization import Audioer, Imager, Jsoner
+
 
 def test_json_serialization():
     t = {"res": np.array([2.0]), "text": "I like you.", "float": 1.2}

tests/unit/test_utils.py

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,9 @@
 import tempfile
 from pathlib import Path
 
+from transformers.file_utils import is_torch_available
+from transformers.testing_utils import require_tf, require_torch, slow
+
 from huggingface_inference_toolkit.handler import get_inference_handler_either_custom_or_default_handler
 from huggingface_inference_toolkit.utils import (
     _get_framework,
@@ -11,8 +14,6 @@
     check_and_register_custom_pipeline_from_directory,
     get_pipeline,
 )
-from transformers.file_utils import is_torch_available
-from transformers.testing_utils import require_tf, require_torch, slow
 
 TASK_MODEL = "sshleifer/tiny-dbmdz-bert-large-cased-finetuned-conll03-english"

tests/unit/test_vertex_ai_utils.py

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ def _load_repository_from_gcs(artifact_uri: str, target_dir: Path) -> str:
 import re
 
 from google.cloud import storage
+
 from huggingface_inference_toolkit.vertex_ai_utils import GCS_URI_PREFIX
 
 if isinstance(target_dir, str):
