Description
Hello,
I have a problem with TorchServe in production. When I start the server everything works perfectly: I can use the model and make requests against it. But after around 3 days the server crashes and I don't understand why.
Looking at the monitoring metrics, ts_inference_requests_total is ~1 req/s and PredictionTime is between 40 and 600 ms.
My attempt to trigger the error locally was unsuccessful 😞
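For context, the client calls the unary Predictions RPC over gRPC. Below is a minimal sketch of that kind of call in Python, assuming stubs generated from TorchServe's inference.proto; the input key and payload are placeholders, not the production request.

import grpc
import inference_pb2
import inference_pb2_grpc

# Channel to the gRPC inference port (7070 in config.properties below)
channel = grpc.insecure_channel("localhost:7070")
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

# Unary Predictions request for the distilroberta model
request = inference_pb2.PredictionsRequest(
    model_name="distilroberta",
    input={"data": b"some article text"},  # placeholder payload
)

# Same request pattern as production, with a client-side deadline
response = stub.Predictions(request, timeout=120)
print(response.prediction)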
Server error logs (TorchServe)
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,559 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7663 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions padding=0 endStream=false
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND DATA: streamId=7663 padding=0 endStream=true length=186 bytes=xxxxxxxxxxx...
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,562 [DEBUG] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1713066279562
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,563 [INFO ] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1713066279563
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Backend received inference at: 1713066279
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,598 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND RST_STREAM: streamId=7663 errorCode=0
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [WARN ] grpc-default-executor-371 org.pytorch.serve.grpcimpl.InferenceImpl - grpc client call already cancelled
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [INFO ] grpc-default-executor-371 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 1 38
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,600 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND RST_STREAM: streamId=7663 errorCode=8
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:85.84|#ModelName:distilroberta,Level:Model|#type:GAUGE|#hostname:xxxxxxxx,1713066279,xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx, pattern=[METRICS]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [ERROR] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
Apr 14 03:44:39 sentiment4_app[941]: java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
Apr 14 03:44:39 sentiment4_app[941]: #011at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.lang.Thread.run(Thread.java:840) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 89
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,651 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Frontend disconnected.
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] epollEventLoopGroup-5-11 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_MODEL_LOADED
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stdout
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stderr
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND HEADERS: streamId=43339 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, padding=0 endStream=false
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND DATA: streamId=43339 padding=0 endStream=true length=625 bytes=xxxxxxxxxxxx...
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,185 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,850 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7665 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-go/1.23.1, grpc-timeout: 120000m
Client error logs (Go client)
Apr 14 03:44:39 augmenter_app[924]:
time="2024-04-14T03:44:39Z"
level=error
msg="An error occurred during augmentation of an article using Sentiment v4"
articleId="xxxxx"
error="sentiment4c : prediction failed: rpc error: code = Internal desc = Model \"distilroberta\" has no worker to serve inference request. Please use scale workers API to add workers. If this is a sequence inference, please check if it is closed, or expired; or exceeds maxSequenceJobQueueSize\nInternalServerException.()"
Dockerfile
FROM python:3.12.0-slim-bookworm as build-stage
RUN pip install torch-model-archiver==0.9.0
COPY models models
COPY deployments deployments
# Create model for english
RUN mkdir model-store \
&& torch-model-archiver --model-name "distilroberta" --export-path "model-store" --version 1.0 \
--serialized-file models/distilRoBERTa/pytorch_distil_roberta_sentiment_v4.pt \
--extra-files "./models/distilRoBERTa/vocab.json,./models/distilRoBERTa/merges.txt" \
--handler "deployments/distilRoBERTa_torchserve_handler.py" --force
# Create model for french, italian, german and spanish
RUN torch-model-archiver --model-name "multilingual_distilbert" --export-path "model-store" --version 1.0 \
--serialized-file models/multilingual_distilBERT/pytorch_multi_distil_bert_sentiment_v4.pt \
--extra-files "./models/multilingual_distilBERT/vocab.txt" \
--handler "deployments/multilingual_distilBERT_torchserve_handler.py" --force
FROM pytorch/torchserve:latest as run-stage
COPY --from=build-stage model-store model-store
COPY deployments deployments
RUN pip install transformers==4.37.2
CMD ["torchserve", "--start", "--model-store", "model-store", "--models", "distilroberta=distilroberta.mar multilingual_distilbert=multilingual_distilbert.mar", "--ts-config", "deployments/config.properties"]
Configuration (config.properties)
[DEFAULT]
grpc_inference_port=7070
grpc_management_port=7071
metrics_address=http://127.0.0.1:8082
enable_metrics_api=true
metrics_config=deployments/metrics.yaml
metrics_mode=prometheus
vmargs=-Dlog4j.configurationFile=file:deployments/log4j2.xml
async_logging=true
models={"distilroberta": {"1.0": {"defaultVersion": true,"marName": "distilroberta.mar","minWorkers": 3,"maxWorkers": 6,"batchSize": 1}}}
Thank you so much for your time.