Description
Problem Statement
Ray Serve is a scalable model-serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic. It also has several features and performance optimizations for serving Large Language Models, such as response streaming, dynamic request batching, and multi-node/multi-GPU serving.
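For context, a minimal Serve app looks roughly like this (a sketch based on the public Ray Serve 2.x API; the echo response stands in for real model inference):

```python
# Minimal Ray Serve deployment: a class decorated with @serve.deployment
# becomes a scalable HTTP endpoint once bound and run.
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)
class Translator:
    async def __call__(self, request: Request) -> str:
        payload = await request.json()
        # Real model inference (PyTorch, TensorFlow, etc.) would go here.
        return payload["text"]


serve.run(Translator.bind())
```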
We should create a Ray Serve integration, ideally covering both tracing and errors. This integration was requested during a customer call with @leokster and @smeubank (internal call notes), and it would enhance our LLM Monitoring offering.
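From the SDK side, setup could mirror our other integrations. `RayIntegration` below is hypothetical (it does not exist yet) and only illustrates the developer experience this issue is asking for:

```python
import sentry_sdk
# Hypothetical module/class -- the integration proposed by this issue:
from sentry_sdk.integrations.ray import RayIntegration

sentry_sdk.init(
    dsn="https://<key>@<org>.ingest.sentry.io/<project>",
    traces_sample_rate=1.0,           # capture performance traces
    integrations=[RayIntegration()],  # auto-instrument Serve deployments
)
```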
Solution Brainstorm
Ray integrates with OpenTelemetry (OTel), so we might want to wait until #2251 is implemented so we can use OTel for this integration. However, Ray's OTel integration is no longer actively maintained, so we might need to fork it and maintain it ourselves.
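For reference, Ray's existing OTel support is enabled through a tracing startup hook (a sketch based on Ray's tracing docs; the built-in hook shown here only dumps spans to local JSON files, so a real integration would swap in a custom hook):

```python
# Enable Ray's OpenTelemetry tracing via a startup hook. The built-in hook
# below exports spans as JSON under /tmp/spans; a custom hook could instead
# configure an OTLP exporter pointed at our ingestion endpoint.
import ray

ray.init(
    _tracing_startup_hook="ray.util.tracing.setup_local_tmp_tracing:setup_tracing"
)
```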
Related: #2400 (Ray Remote integration)