Description
Implement request timeout handling to prevent long-running requests from blocking the server.
Current Status - Partially Implemented
Model-level timeout exists but HTTP-level timeout handling is not implemented.
Already Implemented
- Model-level timeout in predict_batch_with_timeout method
- Timeout logging and monitoring
- Graceful timeout error handling at model level
Remaining Work
- Add configurable HTTP request timeout (default 30 seconds)
- Return HTTP 408 Request Timeout for exceeded requests
- Add timeout configuration to CLI arguments
- Add timeout middleware to Axum router
- Add tests for HTTP timeout behavior
- Update documentation with timeout information
Implementation Guidance
- Use tower::timeout::TimeoutLayer for HTTP-level timeouts
- Add --request-timeout CLI argument
- Consider different timeouts for different endpoints
- Integrate with existing model timeout for layered timeout handling
Estimated Difficulty
Easy - Half day (reduced due to existing timeout infrastructure)
Description
Implement request timeout handling to prevent long-running requests from blocking the server.
Current Status - Partially Implemented
Model-level timeout exists but HTTP-level timeout handling is not implemented.
Already Implemented
Remaining Work
Implementation Guidance
Estimated Difficulty
Easy - Half day (reduced due to existing timeout infrastructure)