update #85

Merged: 4 commits merged on Aug 20, 2024
54 changes: 22 additions & 32 deletions README.md
@@ -1,21 +1,18 @@

<div style="display:flex; text-align:center; justify-content:center;">
<img src="https://huggingface.co/front/assets/huggingface_logo.svg" width="100"/>
<h1 style="margin-top:auto;"> Hugging Face Inference Toolkit </h1>
</div>

Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, prediction and post-processing for Transformers and Sentence Transformers models. It is also possible to define a custom `handler.py` for customization. The Toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models).

---
Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, prediction and post-processing for Transformers and Sentence Transformers models. It is also possible to define a custom `handler.py` for customization. The Toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models) and is used as the default serving option in [Inference Endpoints](https://ui.endpoints.huggingface.co/).

## 💻 Getting Started with Hugging Face Inference Toolkit

* Clone the repository `git clone <https://github.com/huggingface/huggingface-inference-toolkit``>
* Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
* If you develop on AWS inferentia2 install with `pip install -e ".[test,quality]" optimum-neuron[neuronx] --upgrade`
* If you develop on Google Cloud install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
* Unit Testing: `make unit-test`
* Integration testing: `make integ-test`
- Clone the repository `git clone https://github.com/huggingface/huggingface-inference-toolkit`
- Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
- If you develop on AWS Inferentia2 install with `pip install -e ".[inf2,test,quality]" --upgrade`
- If you develop on Google Cloud install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
- Unit Testing: `make unit-test`
- Integration testing: `make integ-test`

### Local run

@@ -68,18 +65,18 @@ curl --request POST \

The Hugging Face Inference Toolkit allows users to provide custom inference logic through a `handler.py` file located in the model repository.

For an example check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):

```bash
model.tar.gz/
|- pytorch_model.bin
|- ....
|- handler.py
|- requirements.txt
```

In this example, `pytorch_model.bin` is the model file saved from training, `handler.py` is the custom inference handler, and `requirements.txt` is a requirements file to add additional dependencies.
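
For illustration, a minimal `handler.py` for this layout could look like the sketch below; the `EndpointHandler` class name and method signatures follow the common custom-handler convention and are assumptions here, not code taken from this repository.

```python
# Illustrative sketch of a custom handler.py; class name and signatures are assumptions.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Load the weights shipped in the archive (e.g. pytorch_model.bin) as a pipeline.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # "inputs" mirrors the request payload shape used by the default handlers.
        inputs = data.get("inputs", data)
        return self.pipeline(inputs)
```
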
The custom module can override the following methods:

### Vertex AI Support

@@ -136,9 +133,9 @@ curl --request POST \

The Hugging Face Inference Toolkit provides support for deploying Hugging Face models on AWS Inferentia2. To deploy a model on Inferentia2, you have three options:

- Provide `HF_MODEL_ID`, the model repo id on huggingface.co which contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
- Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
- Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}` (see the sketch after this list)
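
For the third option, a small sketch of how the `neuron` entry could be added to a local `config.json` before packaging the archive; the helper below is illustrative only and not part of the toolkit.

```python
# Illustrative only: write the `neuron` compilation settings into config.json.
import json
from pathlib import Path

config_path = Path("model/config.json")  # assumed local copy of the model files
config = json.loads(config_path.read_text())
config["neuron"] = {"static_batch_size": 1, "static_sequence_length": 128}
config_path.write_text(json.dumps(config, indent=2) + "\n")
```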

The currently supported tasks can be found [here](https://huggingface.co/docs/optimum-neuron/en/package_reference/supported_models). If you plan to deploy an LLM, we recommend taking a look at [Neuronx TGI](https://huggingface.co/blog/text-generation-inference-on-inferentia2), which is purpose-built for LLMs.

@@ -148,14 +145,14 @@ Start Hugging Face Inference Toolkit with the following environment variables.

_Note: You need to run this on an Inferentia2 instance._

- transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`

```bash
mkdir tmp2/
HF_MODEL_ID="distilbert/distilbert-base-uncased-finetuned-sst-2-english" HF_TASK="text-classification" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
```
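
Once the server is running, you can send it a request. Below is a minimal client sketch using `requests`; the URL, route and payload shape are assumptions based on the curl examples elsewhere in this README.

```python
# Illustrative client; adjust URL and payload to your deployment.
import requests

response = requests.post(
    "http://localhost:5000",
    json={"inputs": "Wow, this inference toolkit makes serving models easy!"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```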

- sentence transformers `feature-extraction` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`

```bash
HF_MODEL_ID="sentence-transformers/all-MiniLM-L6-v2" HF_TASK="feature-extraction" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
```

@@ -284,19 +281,12 @@ HF_OPTIMUM_SEQUENCE_LENGTH="128"

## ⚙ Supported Front-Ends

- [x] Starlette (HF Endpoints)
- [x] Starlette (Vertex AI)
- [ ] Starlette (Azure ML)
- [ ] Starlette (SageMaker)

---
## 📜 License

This project is licensed under the Apache-2.0 License.

## 🤝 Contributing

---

## 📜 License

TBD.

---
10 changes: 5 additions & 5 deletions pyproject.toml
@@ -4,15 +4,15 @@ no_implicit_optional = true
scripts_are_modules = true

[tool.ruff]
lint.select = [
select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # pyflakes
"I", # isort
"C", # flake8-comprehensions
"B", # flake8-bugbear
]
lint.ignore = [
ignore = [
"E501", # Line length (handled by ruff-format)
"B008", # do not perform function calls in argument defaults
"C901", # too complex
@@ -21,13 +21,13 @@ lint.ignore = [
line-length = 119

# Allow unused variables when underscore-prefixed.
lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"

# Assume Python 3.11.
target-version = "py311"

lint.per-file-ignores = {"__init__.py" = ["F401"]}
per-file-ignores = { "__init__.py" = ["F401"] }

[tool.isort]
profile = "black"
known_third_party = ["transformers", "starlette", "huggingface_hub"]
1 change: 0 additions & 1 deletion setup.cfg
@@ -10,7 +10,6 @@ known_third_party =
datasets
tensorflow
torch
robyn

line_length = 119
lines_after_imports = 2
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@
# We don't declare our dependency on transformers here because we build with
# different packages for different variants

VERSION = "0.4.3"
VERSION = "0.5.0"

# Ubuntu packages
# libsndfile1-dev: torchaudio requires the development version of the libsndfile package which can be installed via a system package manager. On Ubuntu it can be installed as follows: apt install libsndfile1-dev
2 changes: 1 addition & 1 deletion tests/integ/conftest.py
@@ -7,9 +7,9 @@
import docker
import pytest
import tenacity
from huggingface_inference_toolkit.utils import _load_repository_from_hf
from transformers.testing_utils import _run_slow_tests

from huggingface_inference_toolkit.utils import _load_repository_from_hf
from tests.integ.config import task2model

HF_HUB_CACHE = os.environ.get("HF_HUB_CACHE", "/home/ubuntu/.cache/huggingface/hub")
2 changes: 1 addition & 1 deletion tests/integ/helpers.py
@@ -8,9 +8,9 @@
import pytest
import requests
from docker import DockerClient
from huggingface_inference_toolkit.utils import _load_repository_from_hf
from transformers.testing_utils import _run_slow_tests, require_tf, require_torch

from huggingface_inference_toolkit.utils import _load_repository_from_hf
from tests.integ.config import task2input, task2model, task2output, task2validation

IS_GPU = _run_slow_tests
2 changes: 1 addition & 1 deletion tests/integ/test_pytorch_local_inf2.py
@@ -1,7 +1,7 @@
import pytest
from huggingface_inference_toolkit.optimum_utils import is_optimum_neuron_available
from transformers.testing_utils import require_torch

from huggingface_inference_toolkit.optimum_utils import is_optimum_neuron_available
from tests.integ.helpers import verify_task

require_inferentia = pytest.mark.skipif(
5 changes: 3 additions & 2 deletions tests/unit/test_diffusers.py
@@ -1,11 +1,12 @@
import logging
import tempfile

from huggingface_inference_toolkit.diffusers_utils import IEAutoPipelineForText2Image
from huggingface_inference_toolkit.utils import _load_repository_from_hf, get_pipeline
from PIL import Image
from transformers.testing_utils import require_torch, slow

from huggingface_inference_toolkit.diffusers_utils import IEAutoPipelineForText2Image
from huggingface_inference_toolkit.utils import _load_repository_from_hf, get_pipeline

logging.basicConfig(level="DEBUG")

@require_torch
3 changes: 2 additions & 1 deletion tests/unit/test_handler.py
@@ -1,6 +1,8 @@
import tempfile

import pytest
from transformers.testing_utils import require_tf, require_torch

from huggingface_inference_toolkit.handler import (
HuggingFaceHandler,
get_inference_handler_either_custom_or_default_handler,
@@ -9,7 +11,6 @@
_is_gpu_available,
_load_repository_from_hf,
)
from transformers.testing_utils import require_tf, require_torch

TASK = "text-classification"
MODEL = "hf-internal-testing/tiny-random-distilbert"
3 changes: 2 additions & 1 deletion tests/unit/test_optimum_utils.py
@@ -2,13 +2,14 @@
import tempfile

import pytest
from transformers.testing_utils import require_torch

from huggingface_inference_toolkit.optimum_utils import (
get_input_shapes,
get_optimum_neuron_pipeline,
is_optimum_neuron_available,
)
from huggingface_inference_toolkit.utils import _load_repository_from_hf
from transformers.testing_utils import require_torch

require_inferentia = pytest.mark.skipif(
not is_optimum_neuron_available(),
3 changes: 2 additions & 1 deletion tests/unit/test_sentence_transformers.py
@@ -1,5 +1,7 @@
import tempfile

from transformers.testing_utils import require_torch

from huggingface_inference_toolkit.sentence_transformers_utils import (
SentenceEmbeddingPipeline,
get_sentence_transformers_pipeline,
@@ -8,7 +10,6 @@
_load_repository_from_hf,
get_pipeline,
)
from transformers.testing_utils import require_torch


@require_torch
3 changes: 2 additions & 1 deletion tests/unit/test_serializer.py
@@ -2,9 +2,10 @@

import numpy as np
import pytest
from huggingface_inference_toolkit.serialization import Audioer, Imager, Jsoner
from PIL import Image

from huggingface_inference_toolkit.serialization import Audioer, Imager, Jsoner


def test_json_serialization():
t = {"res": np.array([2.0]), "text": "I like you.", "float": 1.2}
5 changes: 3 additions & 2 deletions tests/unit/test_utils.py
@@ -3,6 +3,9 @@
import tempfile
from pathlib import Path

from transformers.file_utils import is_torch_available
from transformers.testing_utils import require_tf, require_torch, slow

from huggingface_inference_toolkit.handler import get_inference_handler_either_custom_or_default_handler
from huggingface_inference_toolkit.utils import (
_get_framework,
@@ -11,8 +14,6 @@
check_and_register_custom_pipeline_from_directory,
get_pipeline,
)
from transformers.file_utils import is_torch_available
from transformers.testing_utils import require_tf, require_torch, slow

TASK_MODEL = "sshleifer/tiny-dbmdz-bert-large-cased-finetuned-conll03-english"

1 change: 1 addition & 0 deletions tests/unit/test_vertex_ai_utils.py
@@ -20,6 +20,7 @@ def _load_repository_from_gcs(artifact_uri: str, target_dir: Path) -> str:
import re

from google.cloud import storage

from huggingface_inference_toolkit.vertex_ai_utils import GCS_URI_PREFIX

if isinstance(target_dir, str):