Pipeline deployment of a Cascade model reports Segmentation fault after analyzing multiple images over HTTP #1276

Closed
@okooo5km

Description

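As the title says: I deployed a Cascade service using the pipeline approach and run image prediction through the HTTP interface. When the interface is called from multiple threads, a segmentation fault occurs after several images have been analyzed.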
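
The multi-threaded calling pattern described above could look roughly like the sketch below. It assumes the usual Paddle Serving pipeline HTTP convention of POSTing a `{"key": [...], "value": [...]}` JSON body to `/<name>/prediction`; the URL suffix, the `image` key, and the helper names `build_payload`, `post` and `stress` are assumptions for illustration, not taken from the report. Only the port (9292) and the op name (`bank`) come from the files shown in this issue.

```python
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Port 9292 and the name "bank" appear in this issue; the "/prediction"
# suffix follows the usual pipeline convention (assumption).
URL = "http://127.0.0.1:9292/bank/prediction"


def build_payload(image_bytes, key="image"):
    """Encode raw image bytes as the JSON body that preprocess() base64-decodes."""
    b64 = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"key": [key], "value": [b64]}).encode("utf8")


def post(url, body, timeout=30):
    """Send one prediction request and return the decoded JSON response."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf8"))


def stress(image_bytes, n_requests=100, n_threads=8, send=post, url=URL):
    """Issue n_requests requests from a thread pool; `send` is injectable for dry runs."""
    body = build_payload(image_bytes)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(send, url, body) for _ in range(n_requests)]
        return [f.result() for f in futures]
```

With the service running, something like `stress(open("test.jpg", "rb").read())` would exercise it, where `test.jpg` is a hypothetical file name. Passing a stub as `send` allows checking the request bodies without a server.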

Test environment

  • CUDA 11.2
  • GPU: RTX 3090
  • python 3.7.0
  • PaddlePaddle 2.1.0.post112
  • paddle-serving-server-gpu 0.6.0.post11
  • paddle_serving_app 0.6.0

web_service.py

import sys
import base64
import logging

import cv2
import numpy as np
from paddle_serving_app.reader import *
from paddle_serving_server.web_service import WebService, Op

NAME = "bank"

class CascadeBankOp(Op):
    def init_op(self):
        self.img_preprocess = Sequential([
            BGR2RGB(), Div(255.0),
            Normalize([0.4966, 0.4876, 0.4861], [0.0270, 0.0245, 0.0238], False),
            Resize((720, 1280)), Transpose((2, 0, 1)), PadStride(32)
        ])
        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        imgs = []
        #print("keys", input_dict.keys())
        for key in input_dict.keys():
            data = base64.b64decode(input_dict[key].encode('utf8'))
            # np.fromstring is deprecated for binary input; np.frombuffer is its replacement
            data = np.frombuffer(data, np.uint8)
            im = cv2.imdecode(data, cv2.IMREAD_COLOR)
            im = self.img_preprocess(im)
            imgs.append({
              "image": im[np.newaxis,:],
              "im_shape": np.array(list(im.shape[1:])).reshape(-1)[np.newaxis,:],
              "scale_factor": np.array([1.0, 1.0]).reshape(-1)[np.newaxis,:],
            })
        feed_dict = {
            "image": np.concatenate([x["image"] for x in imgs], axis=0),
            "im_shape": np.concatenate([x["im_shape"] for x in imgs], axis=0),
            "scale_factor": np.concatenate([x["scale_factor"] for x in imgs], axis=0)
        }
        #for key in feed_dict.keys():
        #    print(key, feed_dict[key].shape)
        return feed_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
        # print(fetch_dict)
        bbox_result = self.img_postprocess(fetch_dict, visualize=False)
        bbox_result = list(filter(lambda x: x.get("score") > 0.5, bbox_result))
        res_dict = {"bbox_result": str(bbox_result)}
        return res_dict, None, ""


class CascadeBankService(WebService):
    def get_pipeline_response(self, read_op):
        cascade_bank_op = CascadeBankOp(name=NAME, input_ops=[read_op])
        return cascade_bank_op


cascade_bank_service = CascadeBankService(name=NAME)
cascade_bank_service.prepare_pipeline_config("config.yml")
cascade_bank_service.run_service()
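
To make the batching in `preprocess` easier to inspect, here is a NumPy-only sketch of how the three feed entries are assembled. It does not use `paddle_serving_app`: `pad_to_stride` is a hypothetical stand-in that assumes `PadStride(32)` zero-pads height and width up to the next multiple of 32, and the `(3, 720, 1280)` dummy input is likewise an assumption about the array layout after `Transpose`.

```python
import numpy as np


def pad_to_stride(chw, stride=32):
    """Zero-pad a CHW array so H and W are multiples of `stride` (hypothetical helper)."""
    c, h, w = chw.shape
    ph = int(np.ceil(h / stride) * stride)
    pw = int(np.ceil(w / stride) * stride)
    out = np.zeros((c, ph, pw), dtype=chw.dtype)
    out[:, :h, :w] = chw
    return out


def build_feed(images):
    """Mirror the per-image dicts and the concatenation done in preprocess()."""
    items = []
    for im in images:
        im = pad_to_stride(im)
        items.append({
            "image": im[np.newaxis, :],
            "im_shape": np.array(im.shape[1:])[np.newaxis, :],
            "scale_factor": np.array([1.0, 1.0])[np.newaxis, :],
        })
    return {k: np.concatenate([x[k] for x in items], axis=0) for k in items[0]}


# Two dummy images stand in for two decoded request entries.
feed = build_feed([np.zeros((3, 720, 1280), dtype=np.float32)] * 2)
for name, arr in feed.items():
    print(name, arr.shape, arr.dtype)
```

Under these assumptions the `image` entry has shape `(2, 3, 736, 1280)` and the other two have shape `(2, 2)`. Note that `im_shape` and `scale_factor` take NumPy's default integer and float dtypes here, as they do in the original code; whether the exported model expects those dtypes is not stated in the report.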

config.yaml

dag:
  is_thread_op: false
  tracer:
    interval_s: 30
http_port: 9292
op:
  bank:
    concurrency: 4

    local_service_conf:
      client_type: local_predictor
      device_type: 1
      devices: '8'
      fetch_list:
      - save_infer_model/scale_0.tmp_1
      model_config: serving_server/
rpc_port: 9998
worker_num: 32

Error message

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::AnalysisPredictor::ZeroCopyRun()
1   paddle::framework::NaiveExecutor::Run()
2   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
3   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 2ul, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, int>, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, long>, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, double>, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, paddle::platform::complex64>, paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, paddle::platform::complex128> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
6   void paddle::operators::SliceKernel<paddle::platform::CPUDeviceContext, float>::SliceCompute<2ul>(paddle::framework::ExecutionContext const&) const
7   Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorSlicingOp<Eigen::DSizes<int, 2> const, Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::DefaultDevice, true, (Eigen::internal::TiledEvaluation)1>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorSlicingOp<Eigen::DSizes<int, 2> const, Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, int>, 16, Eigen::MakePointer> const> const> const&, Eigen::DefaultDevice const&)
8   paddle::framework::SignalHandle(char const*, int)
9   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1623228748 (unix time) try "date -d @1623228748" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x7fd13d3f1600) received by PID 17800 (TID 0x7fd2b7261740) from PID 1027544576 ***]
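
The two numeric fields in the summary above can be decoded with a few lines of Python. This is only a reading aid: the constants are copied from the log, and the variable names are made up for illustration.

```python
from datetime import datetime, timezone

ABORT_TIME = 1623228748        # from the TimeInfo line
FAULT_ADDR = 0x7fd13d3f1600    # address after "SIGSEGV (@...)"
REPORTED_PID = 1027544576      # value after "from PID"

# Equivalent of the suggested `date -d @1623228748`, in UTC.
crash_utc = datetime.fromtimestamp(ABORT_TIME, tz=timezone.utc)
print(crash_utc.isoformat())   # 2021-06-09T08:52:28+00:00

# Compare the low 32 bits of the faulting address with the reported "PID".
low32 = FAULT_ADDR & 0xFFFFFFFF
print(low32 == REPORTED_PID)   # True
```

The second check shows that the "from PID" number equals the low 32 bits of the faulting address. One plausible reading, not confirmed anywhere in this report, is that the field is the same signal-info bytes printed as an integer rather than the PID of a real sending process.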

Labels

question: Further information is requested