This repository was archived by the owner on Feb 3, 2025. It is now read-only.
Serve tf-trt converted model return error: NodeDef mentions attr 'max_batch_size' not in Op: name=TRTEngineOp #332
Description
I want to use TF-TRT to optimize a TF2 model and then serve it with Triton, but serving the optimized TF-TRT model fails. Here is the process:
- Following this tutorial (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#introduction), create a TF-TRT optimized model. I used the image nvcr.io/nvidia/tensorflow:22.07-tf2-py3 to run the code and successfully created the native and converted models (a conversion sketch follows the tree below):
models/
├── native_saved_model
│   ├── assets
│   ├── keras_metadata.pb
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── tftrt_saved_model
    ├── assets
    │   └── trt-serialized-engine.TRTEngineOp_000_000
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
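For reference, a minimal conversion sketch along the lines of the tutorial's TrtGraphConverterV2 API (the paths and the dummy input_fn are illustrative; pre-building the engine with converter.build() is assumed, since a serialized engine ends up under assets/):

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the native SavedModel into a TF-TRT optimized SavedModel.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='models/native_saved_model',
    precision_mode=trt.TrtPrecisionMode.FP32)
converter.convert()

# Build the engine offline so it is serialized into assets/
# (dummy input matching the MNIST input shape; illustrative only).
def input_fn():
    yield (np.zeros((1, 28, 28), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save('models/tftrt_saved_model')
```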
- Copy the native and converted models into a repository, with the directory structure Triton expects:
├── mnist
│   ├── 1
│   │   └── model.savedmodel
│   │       ├── assets
│   │       ├── keras_metadata.pb
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00001
│   │           └── variables.index
│   └── config.pbtxt
└── mnist_trt
    ├── 1
    │   └── model.savedmodel
    │       ├── assets
    │       │   └── trt-serialized-engine.TRTEngineOp_000_000
    │       ├── saved_model.pb
    │       └── variables
    │           ├── variables.data-00000-of-00001
    │           └── variables.index
    └── config.pbtxt
The native model is copied under mnist/1/model.savedmodel, with a config.pbtxt like this:
name: "mnist"
platform: "tensorflow_savedmodel"
max_batch_size: 0
The converted model is copied under mnist_trt/1/model.savedmodel, with the same config.pbtxt as above (except name: "mnist_trt").
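For reference, a more explicit config.pbtxt for the converted model could spell out the I/O (the tensor names come from the client code below; the output dims are an assumption based on the 10-class MNIST head):

```
name: "mnist_trt"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
  {
    name: "flatten_input"
    data_type: TYPE_FP32
    dims: [ 1, 28, 28 ]
  }
]
output [
  {
    name: "dense_1"
    data_type: TYPE_FP32
    dims: [ 1, 10 ]
  }
]
```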
- Start the Triton server within the container nvcr.io/nvidia/tritonserver:22.07-py3. The log shows both models are loaded successfully.
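As a sanity check beyond the server log, a small readiness probe with the tritonclient HTTP API (the URL placeholder matches the client code below):

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url='SERVER_IP:8000')
print(client.is_server_ready())            # True once the server is up
print(client.is_model_ready('mnist'))      # native model loaded?
print(client.is_model_ready('mnist_trt'))  # TF-TRT model loaded?
```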
- Try to infer. The client code looks like this:
import numpy as np
import tritonclient.http as httpclient

# Set up the client
url = 'SERVER_IP:8000'
triton_client = httpclient.InferenceServerClient(url=url)

# Prepare the input tensor
input1_shape = [1, 28, 28]
input1 = httpclient.InferInput("flatten_input", input1_shape, datatype="FP32")
input1_data = np.arange(1 * 28 * 28).reshape(1, 28, 28).astype(np.float32)
print('input1_data: ', input1_data)
input1.set_data_from_numpy(input1_data, binary_data=False)

# Request the top-10 classification result for the "dense_1" output
test_output = httpclient.InferRequestedOutput("dense_1", binary_data=False, class_count=10)

# Query the server
model_name = "mnist"
results = triton_client.infer(model_name=model_name, inputs=[input1], outputs=[test_output])
print(results.as_numpy('dense_1'))
If model_name is mnist, the inference succeeds and prints the prediction:
[['9575.137695:3' '9021.530273:2' '5957.917969:7' '-416.794525:5'
'-6797.246582:9' '-8895.693359:1' '-9928.074219:0' '-15507.916016:8'
'-22406.882812:6' '-29679.443359:4']]
However, after changing model_name to mnist_trt, the call fails with this error message:
tritonclient.utils.InferenceServerException: NodeDef mentions attr 'max_batch_size' not in Op<name=TRTEngineOp; signature=in_tensor: -> out_tensor:; attr=serialized_segment:string; attr=segment_func:func,default=[]; attr=InT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=OutT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=max_cached_engines_count:int,default=1; attr=workspace_size_bytes:int; attr=precision_mode:string,allowed=["FP32", "FP16", "INT8"]; attr=calibration_data:string,default=""; attr=use_calibration:bool,default=true; attr=input_shapes:list(shape),default=[]; attr=output_shapes:list(shape),default=[]; attr=segment_funcdef_name:string,default=""; attr=cached_engine_batches:list(int),default=[],min=0; attr=fixed_input_size:bool,default=true; attr=static_engine:bool,default=true>; NodeDef: {{node TRTEngineOp_000_000}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[PartitionedCall/PartitionedCall/TRTEngineOp_000_000]]
I guess it may be a version issue? The op signature in the error message (with attrs like cached_engine_batches and segment_funcdef_name, but no max_batch_size) looks like an older TF1-style TRTEngineOp, while the model was converted inside a TF2 container.
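If it is a runtime-version mismatch, one thing that might be worth trying (assuming the 22.07 Triton TensorFlow backend still defaults to TF1, which the TF1-style op signature above hints at) is forcing the TF2 runtime with the backend config flag:

```
tritonserver --model-repository=/models --backend-config=tensorflow,version=2
```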