Feature Request
If this is a feature request, please fill out the following form in full:
Describe the problem the feature is intended to solve
Currently, TensorFlow Serving exports metrics per model, as shown below:
```
...
:tensorflow:serving:request_count{model_name="test_model",status="OK"} 6
...
:tensorflow:serving:request_latency_bucket{model_name="test_model",API="predict",entrypoint="REST",le="10"} 0
:tensorflow:serving:request_latency_bucket{model_name="test_model",API="predict",entrypoint="REST",le="18"} 0
...
:tensorflow:serving:runtime_latency_bucket{model_name="test_model",API="Predict",runtime="TF1",le="10"} 0
:tensorflow:serving:runtime_latency_bucket{model_name="test_model",API="Predict",runtime="TF1",le="18"} 0
:tensorflow:serving:runtime_latency_bucket{model_name="test_model",API="Predict",runtime="TF1",le="32.4"} 0
...
```
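For context, these series come from samplers in TensorFlow's `monitoring` library that carry only model-level labels. A minimal sketch of that pattern, reconstructed from the labels and `le` buckets visible above (metric names, bucket parameters, and the helper signature are my approximation, not a verbatim copy of util.cc):

```cpp
#include "tensorflow/core/lib/monitoring/sampler.h"

namespace tensorflow {
namespace serving {

// Per-model request latency: only model_name/API/entrypoint labels,
// so every signature of a model lands in the same histogram.
auto* request_latency = monitoring::Sampler<3>::New(
    {"/tensorflow/serving/request_latency",
     "Distribution of wall time (in microseconds) for processing a request.",
     "model_name", "API", "entrypoint"},
    // Exponential buckets 10, 18, 32.4, ... matching the `le` values above.
    monitoring::Buckets::Exponential(10, 1.8, 33));

void RecordRequestLatency(const string& model_name, const string& api,
                          const string& entrypoint, uint64 latency_usec) {
  request_latency->GetCell(model_name, api, entrypoint)
      ->Add(static_cast<double>(latency_usec));
}

}  // namespace serving
}  // namespace tensorflow
```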
We cannot collect metrics per signature, even when the latencies of different signatures differ significantly.
Related code:
serving/tensorflow_serving/servables/tensorflow/util.h, lines 118 to 119 at 21360c7
serving/tensorflow_serving/servables/tensorflow/util.h, lines 122 to 123 at 21360c7
Describe the solution
It would be better if runtime latency and request latency were recorded with signature names.
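A minimal sketch of what this could look like, extending the sampler with a `signature_name` label (the label name, the overload, and the series shape below are illustrative assumptions, not an actual patch):

```cpp
#include "tensorflow/core/lib/monitoring/sampler.h"

namespace tensorflow {
namespace serving {

// Assumption: a fourth "signature_name" label, so each signature of a
// model gets its own latency distribution.
auto* request_latency = monitoring::Sampler<4>::New(
    {"/tensorflow/serving/request_latency",
     "Distribution of wall time (in microseconds) for processing a request.",
     "model_name", "API", "entrypoint", "signature_name"},
    monitoring::Buckets::Exponential(10, 1.8, 33));

// Hypothetical signature-aware overload; RecordRuntimeLatency could be
// extended the same way alongside its "runtime" label.
void RecordRequestLatency(const string& model_name, const string& api,
                          const string& entrypoint,
                          const string& signature_name, uint64 latency_usec) {
  request_latency->GetCell(model_name, api, entrypoint, signature_name)
      ->Add(static_cast<double>(latency_usec));
}

}  // namespace serving
}  // namespace tensorflow
```

The exported series would then be broken down per signature, e.g. `:tensorflow:serving:request_latency_bucket{model_name="test_model",API="predict",entrypoint="REST",signature_name="serving_default",le="10"}`.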