TorchServe crashes in production with `WorkerThread - IllegalStateException error` #3087

@MaelitoP

Description

Hello,

I have a problem with TorchServe in production. When I start the server, everything works perfectly: I can use the model and make requests against it. But after around three days the server crashes, and I don't understand why.

Looking at the monitoring metrics, ts_inference_requests_total is steady at ~1 req/s, and PredictionTime sits between 40 and 600 ms.

My attempt to trigger the error locally was unsuccessful 😞
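
For reference, a minimal Go sketch of the scenario the server logs below point at: a client that cancels its Predictions call while the worker is still busy (on the server side this shows up as an INBOUND RST_STREAM, followed by the IllegalStateException). The stub import path, the "data" input key, and the 50 ms deadline are assumptions for illustration, not our production code.

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/torchserve/inference" // hypothetical path for stubs generated from TorchServe's inference.proto
)

func main() {
	// Dial the gRPC inference port set in config.properties.
	conn, err := grpc.Dial("localhost:7070",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := pb.NewInferenceAPIsServiceClient(conn)

	// A deadline shorter than the observed PredictionTime (40-600 ms) forces
	// the client to cancel while the worker is still computing, so the server
	// should see the stream reset mid-response.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	_, err = client.Predictions(ctx, &pb.PredictionsRequest{
		ModelName: "distilroberta",
		Input:     map[string][]byte{"data": []byte(`{"text": "some article"}`)}, // input key is an assumption
	})
	log.Printf("client-side result (DeadlineExceeded expected): %v", err)
}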

Server error logs (TorchServe)

Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,559 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7663 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions padding=0 endStream=false
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND DATA: streamId=7663 padding=0 endStream=true length=186 bytes=xxxxxxxxxxx...
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,562 [DEBUG] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1713066279562
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,563 [INFO ] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1713066279563
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Backend received inference at: 1713066279
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,598 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND RST_STREAM: streamId=7663 errorCode=0
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [WARN ] grpc-default-executor-371 org.pytorch.serve.grpcimpl.InferenceImpl - grpc client call already cancelled
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [INFO ] grpc-default-executor-371 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 1 38
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,600 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND RST_STREAM: streamId=7663 errorCode=8
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:85.84|#ModelName:distilroberta,Level:Model|#type:GAUGE|#hostname:xxxxxxxx,1713066279,xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx, pattern=[METRICS]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [ERROR] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
Apr 14 03:44:39 sentiment4_app[941]: java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
Apr 14 03:44:39 sentiment4_app[941]: #011at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.lang.Thread.run(Thread.java:840) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 89
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,651 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Frontend disconnected.
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] epollEventLoopGroup-5-11 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_MODEL_LOADED
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stdout
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stderr
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND HEADERS: streamId=43339 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, padding=0 endStream=false
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND DATA: streamId=43339 padding=0 endStream=true length=625 bytes=xxxxxxxxxxxx...
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,185 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,850 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7665 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-go/1.23.1, grpc-timeout: 120000m

Client error logs (Go client)

Apr 14 03:44:39 augmenter_app[924]: 
  time="2024-04-14T03:44:39Z" 
  level=error 
  msg="An error occurred during augmentation of an article using Sentiment v4" 
  articleId="xxxxx" 
  error="sentiment4c : prediction failed: rpc error: code = Internal desc = Model \"distilroberta\" has no worker to serve inference request. Please use scale workers API to add workers. If this is a sequence inference, please check if it is closed, or expired; or exceeds maxSequenceJobQueueSize\nInternalServerException.()"

Dockerfile

FROM python:3.12.0-slim-bookworm AS build-stage

RUN pip install torch-model-archiver==0.9.0

COPY models models
COPY deployments deployments

# Create model for English
RUN mkdir model-store \
    && torch-model-archiver --model-name "distilroberta" --export-path "model-store" --version 1.0  \
    --serialized-file models/distilRoBERTa/pytorch_distil_roberta_sentiment_v4.pt  \
    --extra-files "./models/distilRoBERTa/vocab.json,./models/distilRoBERTa/merges.txt"  \
    --handler "deployments/distilRoBERTa_torchserve_handler.py" --force

# Create model for French, Italian, German, and Spanish
RUN torch-model-archiver --model-name "multilingual_distilbert" --export-path "model-store" --version 1.0  \
    --serialized-file models/multilingual_distilBERT/pytorch_multi_distil_bert_sentiment_v4.pt  \
    --extra-files "./models/multilingual_distilBERT/vocab.txt"  \
    --handler "deployments/multilingual_distilBERT_torchserve_handler.py" --force

FROM pytorch/torchserve:latest AS run-stage

COPY --from=build-stage model-store model-store
COPY deployments deployments

RUN pip install transformers==4.37.2

CMD ["torchserve", "--start", "--model-store", "model-store", "--models", "distilroberta=distilroberta.mar multilingual_distilbert=multilingual_distilbert.mar", "--ts-config", "deployments/config.properties"]

Configuration (config.properties)

[DEFAULT]
grpc_inference_port=7070
grpc_management_port=7071
metrics_address=http://127.0.0.1:8082
enable_metrics_api=true
metrics_config=deployments/metrics.yaml
metrics_mode=prometheus
vmargs=-Dlog4j.configurationFile=file:deployments/log4j2.xml
async_logging=true
models={"distilroberta": {"1.0": {"defaultVersion": true,"marName": "distilroberta.mar","minWorkers": 3,"maxWorkers": 6,"batchSize": 1}}}

Thank you so much for your time.
