This repository was archived by the owner on Feb 3, 2025. It is now read-only.

Serve tf-trt converted model return error: NodeDef mentions attr 'max_batch_size' not in Op: name=TRTEngineOp #332

Open

Description

@biaochen

I want to use TF-TRT to optimize a TF2 model and then serve it with Triton, but serving the optimized TF-TRT model fails. The process is as follows:

  1. Following this tutorial (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#introduction), create a TF-TRT optimized model.
    I use the image nvcr.io/nvidia/tensorflow:22.07-tf2-py3 to run the code (a conversion sketch follows the tree below), and successfully created the native model and the converted model:
models/
├── native_saved_model
│   ├── assets
│   ├── keras_metadata.pb
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── tftrt_saved_model
    ├── assets
    │   └── trt-serialized-engine.TRTEngineOp_000_000
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
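
Roughly, the conversion follows the tutorial; here is a minimal sketch, assuming FP32 precision mode and the models/ paths shown above (the actual script may differ in details):

import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the native SavedModel into a TF-TRT SavedModel (FP32 assumed here).
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='models/native_saved_model',
    precision_mode=trt.TrtPrecisionMode.FP32)
converter.convert()

# Pre-build the engine so it is serialized into assets/, which matches the
# trt-serialized-engine.TRTEngineOp_000_000 file in the tree above.
def input_fn():
    yield (tf.constant(np.zeros([1, 28, 28], dtype=np.float32)),)

converter.build(input_fn=input_fn)
converter.save('models/tftrt_saved_model')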
  2. Copy the native and converted models into a model repository, creating the directory structure that Triton expects:
├── mnist
│   ├── 1
│   │   └── model.savedmodel
│   │       ├── assets
│   │       ├── keras_metadata.pb
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00001
│   │           └── variables.index
│   └── config.pbtxt
└── mnist_trt
    ├── 1
    │   └── model.savedmodel
    │       ├── assets
    │       │   └── trt-serialized-engine.TRTEngineOp_000_000
    │       ├── saved_model.pb
    │       └── variables
    │           ├── variables.data-00000-of-00001
    │           └── variables.index
    └── config.pbtxt

The native model is copied under mnist/1/model.savedmodel, with a config.pbtxt like this:

name: "mnist"
platform: "tensorflow_savedmodel"
max_batch_size: 0

The converted model is copied under mnist_trt/1/model.savedmodel, with a config.pbtxt that is the same as above.
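
For reference, a fuller config.pbtxt for the TF-TRT model would look roughly like the sketch below. Only name, platform, and max_batch_size are what I actually set; the input/output blocks are assumptions inferred from the client code further down (with minimal configs like mine, Triton can derive them from the SavedModel signature when model config auto-completion is enabled):

name: "mnist_trt"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
  {
    name: "flatten_input"
    data_type: TYPE_FP32
    dims: [ 1, 28, 28 ]
  }
]
output [
  {
    name: "dense_1"
    data_type: TYPE_FP32
    dims: [ 1, 10 ]
  }
]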

  3. Start the Triton server within the container nvcr.io/nvidia/tritonserver:22.07-py3 (launch command sketched below); the log shows both models are loaded successfully.
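
The server is launched roughly like this, assuming /path/to/models is the host path of the repository shown above:

docker run --gpus=all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/models:/models \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  tritonserver --model-repository=/models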

  4. Try to infer. The client code looks like this:

import numpy as np
import tritonclient.http as httpclient

# Setting up client
url = 'SERVER_IP:8000'
triton_client = httpclient.InferenceServerClient(url=url)
input1_shape = [1, 28, 28]
input1 = httpclient.InferInput("flatten_input", input1_shape, datatype="FP32")
input1_data = np.arange(1*28*28).reshape(1,28,28).astype(np.float32)
print('input1_data: ', input1_data)
input1.set_data_from_numpy(input1_data, binary_data=False)

test_output = httpclient.InferRequestedOutput("dense_1", binary_data=False, class_count=10)

# Querying the server
model_name="mnist"
results = triton_client.infer(model_name=model_name, inputs=[input1], outputs=[test_output])
print(results.as_numpy('dense_1'))

If model_name is "mnist", the inference succeeds and prints the prediction result:

[['9575.137695:3' '9021.530273:2' '5957.917969:7' '-416.794525:5'
'-6797.246582:9' '-8895.693359:1' '-9928.074219:0' '-15507.916016:8'
'-22406.882812:6' '-29679.443359:4']]

However, after changing model_name to "mnist_trt", the call fails with this error message:

tritonclient.utils.InferenceServerException: NodeDef mentions attr 'max_batch_size' not in Op<name=TRTEngineOp; signature=in_tensor: -> out_tensor:; attr=serialized_segment:string; attr=segment_func:func,default=[]; attr=InT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=OutT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=max_cached_engines_count:int,default=1; attr=workspace_size_bytes:int; attr=precision_mode:string,allowed=["FP32", "FP16", "INT8"]; attr=calibration_data:string,default=""; attr=use_calibration:bool,default=true; attr=input_shapes:list(shape),default=[]; attr=output_shapes:list(shape),default=[]; attr=segment_funcdef_name:string,default=""; attr=cached_engine_batches:list(int),default=[],min=0; attr=fixed_input_size:bool,default=true; attr=static_engine:bool,default=true>; NodeDef: {{node TRTEngineOp_000_000}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[PartitionedCall/PartitionedCall/TRTEngineOp_000_000]]

I guess it might be a version issue? A quick way to check what the converted graph actually contains is sketched below.
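To see which attributes the serialized TRTEngineOp nodes actually carry (and compare them with the op signature in the error above), a small inspection sketch like this can be used; the path is assumed to be the converted SavedModel from step 1:

from tensorflow.core.protobuf import saved_model_pb2

# Load the converted SavedModel's graph and print the attr keys of every
# TRTEngineOp node, including nodes inside library functions.
path = 'models/tftrt_saved_model/saved_model.pb'  # adjust to your layout
sm = saved_model_pb2.SavedModel()
with open(path, 'rb') as f:
    sm.ParseFromString(f.read())

graph_def = sm.meta_graphs[0].graph_def
nodes = list(graph_def.node)
for fn in graph_def.library.function:
    nodes.extend(fn.node_def)

for node in nodes:
    if node.op == 'TRTEngineOp':
        print(node.name, sorted(node.attr.keys()))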
