Loosen model interface checks #1621

Open: shkwsk wants to merge 4 commits into tensorflow:master from shkwsk:shkwsk-patch1

Conversation

@shkwsk (Contributor) commented May 9, 2020

Hello.

This patch loosens the model interface check in the gRPC Predict API.
If the keys of "request.inputs()" cover the required keys of "signature.inputs()", the request can be served; any extra inputs are simply ignored.
This allows a running gRPC client to send additional features at any time, even though they are not used for prediction.
The advantage of this patch is that the model's API remains backward compatible when A/B testing a model with added features on a running service.
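
To illustrate the idea, here is a minimal sketch of the loosened check; the function and type names are hypothetical, not the exact code touched by this PR:

#include <map>
#include <string>

// Sketch: accept a request whose input keys are a superset of the
// signature's required inputs, instead of requiring an exact key match.
template <typename RequestMap, typename SignatureMap>
bool RequestSatisfiesSignature(const RequestMap& request_inputs,
                               const SignatureMap& signature_inputs) {
  for (const auto& entry : signature_inputs) {
    if (request_inputs.find(entry.first) == request_inputs.end()) {
      return false;  // A required input is missing: reject the request.
    }
  }
  return true;  // All required inputs present; extras are ignored.
}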

Let me know what you think.

@geraldstanje commented May 15, 2020

@shkwsk
I also ran into the same issue. Please let me know if the following is what you mean:

My TFS model uses 10 features, but my gRPC request sends 11 features (the 10 features the model needs, plus 1 additional feature called "dummy" that the model does not need).
--> TF Serving fails with the following error:

{
   "error": "Failed to process element: 0 key: dummy of 'instances' list. Error: Invalid argument: JSON object: does not have named input: dummy"
}
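
For concreteness, a hypothetical REST predict payload that would trigger this error (feature names are illustrative: assume the model's signature defines f1, f2, and f3, but not "dummy"):

{
  "instances": [
    {"f1": 1.0, "f2": 2.0, "f3": 3.0, "dummy": 0.0}
  ]
}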

When you want to do A/B testing as follows, let's say the gRPC client sends a request with 20 features:

  • modelA (needs 10 of the features from that request)
  • modelB (needs 5 of the features from that request)

Currently, TensorFlow Serving would not allow that.

cc @rmothukuru

@shkwsk (Contributor, Author) commented May 16, 2020

@geraldstanje
Thanks for your comment.

It seems to be the same issue as mine, but from the error message you posted, it looks like you're using the RESTful API.
This patch loosens the check on the keys of the protocol buffer map in the gRPC API.

A similar fix may be possible for the RESTful API, but I don't have a good approach right now.
In #1622 I modified the RESTful API to use protocol buffers; you can use it in conjunction with this PR to resolve your error message. :)

@geraldstanje commented May 16, 2020

@shkwsk Thanks for the PR; let's get it merged.
I'm using gRPC, but in this case I only tested with the REST API...
That PR will make testing different models possible.

@shkwsk (Contributor, Author) commented May 17, 2020

@geraldstanje
I won't merge that PR into this one, because the issues they solve are different.
Essentially, it would be better to loosen the checks in the RESTful API as well.

In any case, I'll wait for the official reviewers' comments.

@geraldstanje commented May 17, 2020

@shkwsk I think both the gRPC and REST interfaces need to implement the same checks...

cc @misterpeddy @lilao

@shkwsk (Contributor, Author) commented May 23, 2020

@geraldstanje
I tried loosening the checks in the RESTful API as well.
Unfortunately, I can't prepare a model that handles the instances format, so I haven't tested it.
Could you please check whether the error still occurs?

@geraldstanje commented

@shkwsk OK, let me test over the weekend.

@geraldstanje commented Jun 6, 2020

@shkwsk Do you have a script that builds the TFS Docker image based on your branch?

@shkwsk (Contributor, Author) commented Jun 7, 2020

@geraldstanje
I think you can switch branches and build as documented:
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/setup.md#build
I've also merged in the latest commits, so the branch can be built.

@geraldstanje commented Jun 7, 2020

@shkwsk I tried, but that failed, since Dockerfile.devel clones https://github.com/tensorflow/serving... but your code is in https://github.com/shkwsk/serving/tree/shkwsk-patch1

#!/bin/bash

USER=geri
TF_SERVING_VERSION_GIT_BRANCH=shkwsk-patch1
TAG=shkwsk-patch1

git clone --branch="${TF_SERVING_VERSION_GIT_BRANCH}" https://github.com/shkwsk/serving

# Build the devel image, then build the serving image from it.
cd serving && \
  docker build --pull -t "$USER/tensorflow-serving-devel:$TAG" \
    --build-arg TF_SERVING_VERSION_GIT_BRANCH="${TF_SERVING_VERSION_GIT_BRANCH}" \
    -f tensorflow_serving/tools/docker/Dockerfile.devel .

# Already inside ./serving after the cd above, so no second cd is needed.
docker build -t "$USER/tensorflow-serving:$TAG" \
  --build-arg TF_SERVING_BUILD_IMAGE="$USER/tensorflow-serving-devel:$TAG" \
  -f tensorflow_serving/tools/docker/Dockerfile .

any idea?

@shkwsk (Contributor, Author) commented Jun 7, 2020

@geraldstanje
How about temporarily rewriting Dockerfile.devel?
Could you edit it as follows after the git clone?

diff --git a/tensorflow_serving/tools/docker/Dockerfile.devel b/tensorflow_serving/tools/docker/Dockerfile.devel
index fb247c1..3ac4a35 100644
--- a/tensorflow_serving/tools/docker/Dockerfile.devel
+++ b/tensorflow_serving/tools/docker/Dockerfile.devel
@@ -88,8 +88,8 @@ RUN mkdir /bazel && \

 # Download TF Serving sources (optionally at specific commit).
 WORKDIR /tensorflow-serving
-RUN git clone --branch=${TF_SERVING_VERSION_GIT_BRANCH} https://github.com/tensorflow/serving . && \
-    git remote add upstream https://github.com/tensorflow/serving.git && \
+RUN git clone --branch=${TF_SERVING_VERSION_GIT_BRANCH} https://github.com/shkwsk/serving . && \
+    git remote add upstream https://github.com/tensorflow/serving.git && \
     if [ "${TF_SERVING_VERSION_GIT_COMMIT}" != "head" ]; then git checkout ${TF_SERVING_VERSION_GIT_COMMIT} ; fi

@geraldstanje commented Jun 15, 2020

@shkwsk I tested both the gRPC and REST APIs; both worked fine.

Please take a look: @misterpeddy @lilao @netfs

@shkwsk (Contributor, Author) commented Jun 15, 2020

@geraldstanje Thanks for testing!

@shkwsk (Contributor, Author) commented Aug 13, 2020

@misterpeddy @lilao @netfs Could you please review and merge this PR?

@netfs (Collaborator) commented Dec 3, 2020

I think having a different number of features in the request compared to what the model expects is usually an indicator of training-vs-serving "skew".

You are assuming here that it's kosher to send a request with more features than the model expects. That may be OK for your scenario, but in other cases it might indicate a bad model pushed to serving (and you would want to fail the request in such cases, so monitoring can alert appropriately).

IOW, I am not convinced that this change is safe for all situations and that it's OK to loosen this check.

netfs self-requested a review on December 3, 2020 02:57

@shkwsk (Contributor, Author) commented Dec 5, 2020

@netfs
I understand what you mean.

But I need this patch for model migration on a running service.
Loosening the checks allows features to be added or removed seamlessly.

If you are not convinced the validation checks should be loosened by default, how about adding an option to opt in to the looser behavior?

@netfs (Collaborator) commented Dec 7, 2020

> I understand what you mean.
>
> But I need this patch for model migration on a running service.
> Loosening the checks allows features to be added or removed seamlessly.

Why can't versioning be used to do this safely? IOW, send versioned requests, and maybe use labels to make the transition easier [0].

> If you are not convinced the validation checks should be loosened by default, how about adding an option to opt in to the looser behavior?

[0] https://www.tensorflow.org/tfx/serving/serving_config#assigning_string_labels_to_model_versions_to_simplify_canary_and_rollback
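
For reference, a rough sketch of what that could look like in a model server config, per the linked doc (model name, path, and version numbers are illustrative):

model_config_list {
  config {
    name: "my_model"
    base_path: "s3://my-bucket/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}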

@shkwsk (Contributor, Author) commented Dec 28, 2020

@netfs
If you want to switch models by version, you need to distribute a configuration file to the servers.

It is difficult to manage the status of all servers while distributing the file to each of them.
Also, the client would need to know the status of each server and which model version each stateful server is running.

I update multiple models daily in our S3 bucket and run A/B tests between the updated models.
Switching the model version on the client side against stateful servers every day is not practical.
That is very costly, so I want to route requests by model_name instead; see the sketch below.
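
A rough sketch of that setup, with two independently named models so clients pick one via model_spec.name in the request (names and paths are illustrative):

model_config_list {
  config {
    name: "model_a"
    base_path: "s3://my-bucket/models/model_a"
    model_platform: "tensorflow"
  }
  config {
    name: "model_b"
    base_path: "s3://my-bucket/models/model_b"
    model_platform: "tensorflow"
  }
}

By default, each entry serves the latest version found under its base_path, so daily S3 updates are picked up without touching the clients.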

@netfs (Collaborator) commented Jan 6, 2021

> I update multiple models daily in our S3 bucket and run A/B tests between the updated models.
> Switching the model version on the client side against stateful servers every day is not practical.

You still need to notify the client to switch its inference requests to contain a different set of features (corresponding to the new B version of the model), no? How is this switch different from asking the client to also use an explicit version?

> That is very costly, so I want to route requests by model_name instead.

@Arnold1 commented Jun 3, 2022

@netfs @shkwsk @chrisolston Any progress on this?
