@thomasonzhou for more info. Basically, there are times when we would want to downcast our data into a form a quantized model can accept (e.g., a camera driver produces FLOAT32 images while our model expects INT8 input).
Whether we should provide an avenue to downcast tensors before they are passed to our inference backend, or have the backend handle that downcasting itself, is an open question.
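For reference, a minimal sketch of what the downcast step would look like if done on our side, assuming NumPy is available in the pre-processing path and that the scale/zero-point pair comes from the model's input spec (`quantize_to_int8` is a hypothetical helper, not an existing API):

```python
import numpy as np

def quantize_to_int8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Affine-quantize a FLOAT32 tensor to INT8 using a precomputed
    scale and zero point (typically read from the model's input spec)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# e.g. a camera frame already normalized to [0, 1]
frame = np.random.rand(480, 640, 3).astype(np.float32)
int8_frame = quantize_to_int8(frame, scale=1.0 / 255.0, zero_point=-128)
```

If the inference backend handles this instead, it already knows the correct scale/zero-point from the model metadata, which avoids us duplicating those parameters in the driver layer; the trade-off is an extra FLOAT32 tensor crossing the interface.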