Description
🐛 Describe the bug
TorchServe supports batching multiple requests, and the batch size is provided when the model is registered.
The request envelope receives the batched input as a list of request bodies, but the KServe V2 request envelope uses only the first item in that list:
https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L104
As a result, only a single output is sent back for the whole batch, which causes the response-count mismatch shown in the error log.
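The simplified illustration below shows how keeping only the first item turns a batch of five requests into a single output, matching the `expect: 5, got: 1` counts in the log. It is not the actual kservev2.py code: `parse_first_only` and `fake_inference` are hypothetical stand-ins, and the batch items are assumed to carry each V2 request under a "body" key.

```python
from typing import Any, Dict, List


def parse_first_only(batch: List[Dict[str, Any]]) -> List[List[Dict[str, Any]]]:
    """Simplified stand-in for the reported behaviour: keep only request 0."""
    bodies = [item.get("body", item) for item in batch]
    # Indexing with [0] silently drops every request after the first one.
    return [bodies[0]["inputs"]]


def fake_inference(parsed: List[Any]) -> List[str]:
    """Pretend model: returns exactly one prediction per parsed request."""
    return [f"prediction-{i}" for i in range(len(parsed))]


# A batch of five V2 requests, shaped like what the envelope receives.
batch = [
    {"body": {"id": str(i), "inputs": [{"name": "input-0", "shape": [1],
                                        "datatype": "FP32", "data": [float(i)]}]}}
    for i in range(5)
]
outputs = fake_inference(parse_first_only(batch))
print(f"expect: {len(batch)}, got: {len(outputs)}")  # expect: 5, got: 1
```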
Error logs
TorchServe Error
stdout MODEL_LOG - model: resnet50-3, number of batch response mismatched, expect: 5, got: 1.
Installation instructions
Followed the instructions provided here: https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md
Model Packaging
Created resnet50.mar using the default parameters and handler.
config.properties
inference_address=http://0.0.0.0:8085/
management_address=http://0.0.0.0:8085/
metrics_address=http://0.0.0.0:8082/
grpc_inference_port=7075
grpc_management_port=7076
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_format=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model_store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 16,"maxBatchDelay": 200,"responseTimeout": 2000}}}}
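With these settings, TorchServe groups up to batchSize requests (16 here), waiting at most maxBatchDelay milliseconds (200 here), into a single handler call, so parse_input() can receive several request bodies at once; this is consistent with the five concurrent requests reported as `expect: 5` in the log. The snippet below only decodes the model_snapshot value to surface the settings that drive this batching; it is an illustrative helper, not part of TorchServe.

```python
import json

# The model_snapshot value from config.properties above, split for readability.
model_snapshot = (
    '{"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": '
    '{"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,'
    '"maxWorkers": 6,"batchSize": 16,"maxBatchDelay": 200,"responseTimeout": 2000}}}}'
)

snapshot = json.loads(model_snapshot)
for model_name, versions in snapshot["models"].items():
    for version, cfg in versions.items():
        # Print the per-version settings that control server-side batching.
        print(
            f"{model_name} {version}: batchSize={cfg['batchSize']}, "
            f"maxBatchDelay={cfg['maxBatchDelay']} ms, "
            f"workers={cfg['minWorkers']}-{cfg['maxWorkers']}"
        )
# resnet50 1.0: batchSize=16, maxBatchDelay=200 ms, workers=6-6
```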
Versions
Name: kserve
Version: 0.10.0
Name: torch
Version: 1.13.1+cu117
Name: torchserve
Version: 0.7.1
Repro instructions
Followed the instructions provided here: https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md
Ran the kserve_wrapper main.py and sent multiple concurrent curl inference requests using the V2 protocol.
Command used:
seq 1 10 | xargs -n1 -P 5 curl -H "Content-Type: application/json" --data @input_bytes.json http://0.0.0.0:8080/v2/models/resnet50/infer
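The same load pattern (10 requests, at most 5 in flight) can be sketched with the Python standard library for anyone reproducing outside a shell. `post_json` and `fire_requests` are hypothetical helper names; the URL and input_bytes.json come from the command above.

```python
import concurrent.futures
import urllib.request
from typing import Callable, List

URL = "http://0.0.0.0:8080/v2/models/resnet50/infer"


def post_json(url: str, payload: bytes) -> int:
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def fire_requests(send: Callable[[], int], total: int = 10, parallel: int = 5) -> List[int]:
    """Call `send` `total` times with up to `parallel` calls running concurrently."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(send) for _ in range(total)]
        return [future.result() for future in futures]
```

For example, read the payload with `payload = open("input_bytes.json", "rb").read()` and then call `fire_requests(lambda: post_json(URL, payload))`. Running several requests concurrently is what lets TorchServe place them in the same batch.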
Possible Solution
Changes are required to handle TorchServe batched inputs and to generate an output for every request in the batch that TorchServe passes to the handler.
Specifically, the parse_input() and format_output() methods in kservev2.py need to be changed.
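A minimal sketch of the proposed change, written as two standalone functions rather than as a patch to the existing envelope class. `parse_batched_input` and `format_batched_output` are hypothetical names. The sketch assumes each batch item carries its V2 JSON under "body" or "data" (either already decoded to a dict or as raw bytes). The output name "predict" and datatype "FP32" are placeholders, not values taken from kservev2.py.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_batched_input(batch: List[Dict[str, Any]]) -> Tuple[List[str], List[List[Any]]]:
    """Collect the request id and tensor data of every request in the batch."""
    request_ids: List[str] = []
    batch_data: List[List[Any]] = []
    for index, item in enumerate(batch):
        # Assumption: the V2 JSON sits under "body" or "data", as a dict or bytes.
        body = item.get("body") or item.get("data") or item
        if isinstance(body, (bytes, bytearray)):
            body = json.loads(body)
        # Fall back to the batch position when the client sent no "id".
        request_ids.append(str(body.get("id", index)))
        batch_data.append([tensor["data"] for tensor in body["inputs"]])
    return request_ids, batch_data


def format_batched_output(
    request_ids: List[str], results: List[List[float]], model_name: str
) -> List[Dict[str, Any]]:
    """Build one V2 response per request, in the same order as the batch."""
    if len(request_ids) != len(results):
        raise ValueError(f"expected {len(request_ids)} results, got {len(results)}")
    responses = []
    for request_id, result in zip(request_ids, results):
        responses.append({
            "id": request_id,
            "model_name": model_name,
            "outputs": [{
                "name": "predict",    # hypothetical output name
                "shape": [len(result)],
                "datatype": "FP32",   # assumption: float scores
                "data": result,
            }],
        })
    return responses
```

Returning one response per input keeps the list length equal to the batch size, which is the invariant the `number of batch response mismatched` check enforces.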