How to deploy trained models through TorchServe

  1. Verify requirements: requirements_onnx.txt
  2. Install TorchServe
git clone https://github.com/pytorch/serve.git
cd serve
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu102

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

Optional: Run the ONNX example to check you have installed it correctly

  1. Run experiments/torch_converter.py to save the model as a Torch model

  2. From ONNX model

  • https://github.com/pytorch/serve/blob/master/test/pytest/test_onnx.py

    1. Define TorchServe handler

    2. Convert ONNX model into TorchServe MAR file

    • Package model
    • Package all the artifacts of the loadable models

    See also: https://medium.com/analytics-vidhya/deploy-huggingface-s-bert-to-production-with-pytorch-serve-27b068026d18

    Converting the trained checkpoint to TorchServe MAR file

    TorchServe uses a format called MAR (Model Archive) to package models and version them inside its model store. To make it accessible from TorchServe, we need to convert our trained BERT checkpoint to this format and attach our handler above.

    ONNX

    torch-model-archiver -f --model-name onnx --version 1.0 --serialized-file models/onnx/onnx_bert/model.onnx --export-path torchserve_m/ --handler torchserve_handlers/bert_handler.py

    torch-model-archiver --model-name "bert" --version 1.0 --serialized-file models/onnx/onnx_bert/model.onnx --extra-files "./models/onnx/onnx_bert/config.json,./models/onnx/onnx_bert/vocab.txt" --handler torchserve_handlers/bert_handler.py --export-path torchserve_m/

    codet5

      torch-model-archiver -f --model-name codet5 --version 1.0 --serialized-file models/torch/codet5-base/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/t5_handler_codecarbon.py --extra-files "models/torch/codet5-base/config.json,models/torch/codet5-base/generation_config.json"

      torch-model-archiver -f --model-name codet5 --version 1.0 --serialized-file models/torch/codet5-base/pytorch_model.bin --export-path models/torch_m_02/ --handler experiments/torch_serve/t5_handler_codecarbon_2.py  --extra-files "models/torch/codet5-base/config.json,models/torch/codet5-base/generation_config.json"
    

    codeparrot

      torch-model-archiver -f --model-name codeparrot --version 1.0 --serialized-file models/torch/codeparrot-small/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/causal_lm_h.py --extra-files "models/torch/codeparrot-small/config.json,models/torch/codeparrot-small/generation_config.json"

      torch-model-archiver -f --model-name codeparrot --version 1.0 --serialized-file models/torch/codeparrot-small/pytorch_model.bin --export-path models/torch_m_02/ --handler experiments/torch_serve/causal_lm_2.py  --extra-files "models/torch/codeparrot-small/config.json,models/torch/codeparrot-small/generation_config.json"
    

    codegen

      torch-model-archiver -f --model-name codegen --version 1.0 --serialized-file models/torch/codegen-350-mono/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/codegen_h.py --extra-files "models/torch/codegen-350-mono/config.json,models/torch/codegen-350-mono/generation_config.json"

    codet5p

      torch-model-archiver -f --model-name codet5p --version 1.0 --serialized-file models/torch/codet5p-220/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/t5_handler.py --extra-files "models/torch/codet5p-220/config.json,models/torch/codet5p-220/generation_config.json"

      torch-model-archiver -f --model-name codet5p --version 1.0 --serialized-file models/torch/codet5p-220/pytorch_model.bin --export-path models/torch_m_02/ --handler experiments/torch_serve/t5_handler_codecarbon_2.py  --extra-files "models/torch/codet5p-220/config.json,models/torch/codet5p-220/generation_config.json"
    

    gpt-neo

      torch-model-archiver -f --model-name gpt-neo --version 1.0 --serialized-file models/torch/gpt-neo-125m/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/causal_lm_h.py --extra-files "models/torch/gpt-neo-125m/config.json,models/torch/gpt-neo-125m/generation_config.json"

      torch-model-archiver -f --model-name gpt-neo --version 1.0 --serialized-file models/torch/gpt-neo-125m/pytorch_model.bin --export-path models/torch_m_02/ --handler experiments/torch_serve/causal_lm_2.py  --extra-files "models/torch/gpt-neo-125m/config.json,models/torch/gpt-neo-125m/generation_config.json"
    

    pythia-410m

      torch-model-archiver -f --model-name pythia-410m --version 1.0 --serialized-file models/torch/pythia-410m/pytorch_model.bin --export-path models/torch_m/ --handler experiments/torch_serve/causal_lm_h.py --extra-files "models/torch/pythia-410m/config.json,models/torch/pythia-410m/generation_config.json"

      torch-model-archiver -f --model-name pythia-410m --version 1.0 --serialized-file models/torch/pythia-410m/pytorch_model.bin --export-path models/torch_m_02/ --handler experiments/torch_serve/causal_lm_2.py  --extra-files "models/torch/pythia-410m/config.json,models/torch/pythia-410m/generation_config.json"
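The archiver commands above differ only in model name, model directory, handler, and export path, so they can be generated instead of typed by hand. The following is a minimal sketch under stated assumptions: `build_archiver_command` and the `MODELS` table are hypothetical helpers (not part of TorchServe), and every model directory is assumed to contain `pytorch_model.bin`, `config.json`, and `generation_config.json` as in the commands listed here. Only flags already used in this document appear.

```python
import shlex
from pathlib import Path


def build_archiver_command(model_name, model_dir, handler, export_path,
                           version="1.0",
                           extra_files=("config.json", "generation_config.json")):
    """Return the torch-model-archiver argument list for one model (hypothetical helper)."""
    model_dir = Path(model_dir)
    cmd = [
        "torch-model-archiver", "-f",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", str(model_dir / "pytorch_model.bin"),
        "--export-path", str(export_path),
        "--handler", str(handler),
    ]
    if extra_files:
        # --extra-files takes a single comma-separated list of paths.
        joined = ",".join(str(model_dir / name) for name in extra_files)
        cmd += ["--extra-files", joined]
    return cmd


# Hypothetical table mirroring a few of the commands listed above.
MODELS = {
    "codet5": ("models/torch/codet5-base",
               "experiments/torch_serve/t5_handler_codecarbon.py"),
    "codeparrot": ("models/torch/codeparrot-small",
                   "experiments/torch_serve/causal_lm_h.py"),
    "gpt-neo": ("models/torch/gpt-neo-125m",
                "experiments/torch_serve/causal_lm_h.py"),
}


def all_commands(export_path="models/torch_m/"):
    """Return one shell-quoted command line per entry in MODELS."""
    return [
        shlex.join(build_archiver_command(name, model_dir, handler, export_path))
        for name, (model_dir, handler) in MODELS.items()
    ]
```

Printing the strings from `all_commands()` gives command lines that can be pasted into a shell; passing the argument list from `build_archiver_command` to `subprocess.run` would execute one directly, provided torch-model-archiver is installed.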
    
  1. Start server

mkdir model_store && mv bert.mar model_store && torchserve --start --model-store model_store --models bert=bert.mar

torchserve --start --ncs --model-store torchserve_m/ --models torchserve_m/bert1.mar

nohup torchserve --start --ncs --model-store torchserve_m/ --models torchserve_m/bert1.mar 2>&1 | tee server.log

nohup torchserve --start --ncs --model-store models/torch_m/ --models models/torch_m/codet5p.mar 2>&1 | tee server.log

Check status

curl http://localhost:8080/ping
curl http://localhost:8081/models
curl http://localhost:8081/models/bert2
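The same status checks can be scripted. This is a rough sketch using only the Python standard library; `status_urls`, `fetch_json`, and `check_status` are hypothetical names, and the hosts and ports are the defaults used in this document. It assumes the endpoints return JSON.

```python
import json
import urllib.request

INFERENCE_URL = "http://localhost:8080"   # default inference port
MANAGEMENT_URL = "http://localhost:8081"  # default management port


def status_urls(model_name=None):
    """Return the URLs queried by the curl commands above."""
    urls = [f"{INFERENCE_URL}/ping", f"{MANAGEMENT_URL}/models"]
    if model_name:
        urls.append(f"{MANAGEMENT_URL}/models/{model_name}")
    return urls


def fetch_json(url, timeout=5):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_status(model_name=None, fetch=fetch_json):
    """Query every status URL; record an error string if the server is unreachable."""
    results = {}
    for url in status_urls(model_name):
        try:
            results[url] = fetch(url)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            results[url] = f"unreachable: {error}"
    return results
```

Calling `check_status("bert2")` while the server is running returns a dictionary keyed by URL; the `fetch` parameter exists only so the function can be exercised without a live server.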

You should see in the logs something like "WORKER_MODEL_LOADED".

  5. After serving model

  • The Inference API, Management API, and Metrics API are deployed by default on localhost on ports 8080, 8081, and 8082, respectively.
  1. Inference

curl -X POST http://127.0.0.1:8080/predictions/bert -T example.txt

  • Put the input string in the .txt file (example.txt above)
  • Verify your response
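For scripted tests, the inference call can also be issued from Python. The sketch below uses only the standard library; `build_prediction_request` and `predict` are hypothetical helper names, and sending the body as `text/plain` is an assumption about what the handler expects, so adjust it to match your handler's `preprocess()`.

```python
import urllib.request


def build_prediction_request(model_name, payload, host="127.0.0.1", port=8080):
    """Build a POST request for the /predictions/<model_name> endpoint."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    # Assumption: the handler reads the raw request body as UTF-8 text.
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def predict(model_name, text, timeout=30):
    """Send text to a running TorchServe instance and return the raw response body."""
    request = build_prediction_request(model_name, text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server started as above, `predict("bert", open("example.txt").read())` would be roughly equivalent to the curl command.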

Understanding concepts

torch-model-archiver

https://towardsdatascience.com/serving-pytorch-models-with-torchserve-6b8e8cbdb632
https://medium.com/@SrGrace_/a-practical-guide-to-torchserve-197ec913bbd

torch-model-archiver packages all model artifacts into a single model archive file.

It takes:

  • A model checkpoint file (pth, serialized PyTorch state dictionary) or a model definition and a state_dict file in case of eager mode

    • Model file: model architecture, inherits from torch.nn.Module
    • Serialized file: .pt or .pth; a state_dict in case of eager mode
  • Optional assets

  • .mar file

    • the “ready to serve” archive of the model generated with torch-model-archiver.

    Torch Model Archiver is a tool used for creating archives of trained neural net models that can be consumed for TorchServe inference.

    https://github.com/pytorch/serve/tree/master/model-archiver
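To see what the archiver actually packaged, a .mar file can be opened like an ordinary zip archive. This is a hedged sketch: it assumes the archive was built with the archiver's default format (a zip file, as far as I know, with a manifest stored at `MAR-INF/MANIFEST.json`), and `describe_mar` is a hypothetical helper name. Archives built with another archive format will not open this way.

```python
import json
import zipfile

MANIFEST_PATH = "MAR-INF/MANIFEST.json"  # assumed manifest location inside the archive


def describe_mar(mar_path):
    """Return (file names, parsed manifest or None) for a zip-based .mar file."""
    with zipfile.ZipFile(mar_path) as archive:
        names = archive.namelist()
        manifest = None
        if MANIFEST_PATH in names:
            manifest = json.loads(archive.read(MANIFEST_PATH))
    return names, manifest
```

For example, `describe_mar("models/torch_m/codet5p.mar")` should list the serialized file, the handler, and the extra files passed to the archiver.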

Handlers

https://github.com/pytorch/serve/blob/master/docs/default_handlers.md

  • initialize(): initialize the model object
  • preprocess(): prepare the input, e.g. tokenize it
  • inference(): run the model on the preprocessed batch
  • postprocess(): return a list with the same length as the batch size
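The four stages listed above can be illustrated with a toy handler. This is only a sketch of the control flow: `WordCountHandler` is a hypothetical class that does not inherit from TorchServe's `BaseHandler`, so that it runs without TorchServe installed, and its "model" is a plain function that counts whitespace-separated words. It assumes each request is a dictionary carrying its payload under `data` or `body`.

```python
class WordCountHandler:
    """Toy handler showing the initialize/preprocess/inference/postprocess flow."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # A real handler would load model weights and a tokenizer here,
        # using the model directory provided through the context.
        self.model = lambda texts: [len(text.split()) for text in texts]
        self.initialized = True

    def preprocess(self, requests):
        # Each request in the batch is a dict; the payload may be bytes or str.
        texts = []
        for request in requests:
            payload = request.get("data") or request.get("body") or ""
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            texts.append(str(payload))
        return texts

    def inference(self, texts):
        return self.model(texts)

    def postprocess(self, outputs):
        # One entry per request, so the output length equals the batch size.
        return [{"num_words": count} for count in outputs]

    def handle(self, data, context):
        # Lazy initialization is only for running this sketch standalone.
        if not self.initialized:
            self.initialize(context)
        return self.postprocess(self.inference(self.preprocess(data)))
```

Calling `WordCountHandler().handle([{"body": b"hello world"}], None)` exercises all four stages on a batch of one request.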

Workers

Videos

PyTorch

When running inference, put models in eval mode.

Workflow

https://pytorch.org/serve/workflows.html

Only the following output types are supported by workflow models/functions: String, Int, List, Dict of String, Int, JSON-serializable objects, byte array, and Torch Tensors.

AutoModels

tasks: https://huggingface.co/docs/optimum/exporters/task_manager

  • fill-mask: ORTModelForMaskedLM
  • text-generation: ORTModelForCausalLM
  • encoder-decoder models like T5: AutoModelForSeq2SeqLM