Challenges
The SDK release agent is becoming part of the critical path for self-service releases. As the agent evolves, we need a reliable way to validate its behavior.
Today, the Shanghai (SH) team has been asked to test the SDK release agent manually. Manual testing makes it difficult to cover all cases and does not scale. There is interest in an eval-based approach that covers a defined scenario set more consistently as the agent changes.
Goal
Build a test and evaluation framework for the SDK release agent that enables:
- Repeatable scenario-based testing
- Regression detection for agent changes
- Step-level evaluation as well as end-to-end flow evaluation
The Shanghai team can contribute real service-team release scenarios encountered in self-service releases and help convert them into reusable test cases for the scenario catalog.
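To make the goals above concrete, here is a minimal sketch of how a scenario catalog entry could be scored at both the step level and end to end, and how regressions could be flagged against a baseline run. Everything in it is hypothetical: the `Scenario` record, the `evaluate` and `regressions` helpers, the step names (`validate_inputs`, `check_changelog`, `trigger_pipeline`) and the outcome labels are illustrative assumptions, not the agent's actual interface. It also assumes an agent run can be reduced to an ordered list of step names plus a final outcome.

```python
from dataclasses import dataclass, field


# Hypothetical record format for one entry in the scenario catalog.
@dataclass
class Scenario:
    name: str
    # Ordered step names the agent is expected to perform (illustrative vocabulary).
    expected_steps: list[str]
    # Expected final outcome of the release flow, e.g. "released" or "blocked".
    expected_outcome: str


@dataclass
class Result:
    scenario: str
    step_scores: dict[str, bool] = field(default_factory=dict)
    end_to_end_pass: bool = False


def evaluate(scenario: Scenario, trace: list[str], outcome: str) -> Result:
    """Score one agent run (its step trace and final outcome) against a scenario."""
    result = Result(scenario=scenario.name)
    # Step-level evaluation: did each expected step appear anywhere in the trace?
    for step in scenario.expected_steps:
        result.step_scores[step] = step in trace
    # End-to-end evaluation: the trace, restricted to expected step names,
    # must match the expected sequence exactly, and the outcome must match.
    relevant = [s for s in trace if s in scenario.expected_steps]
    result.end_to_end_pass = (
        relevant == scenario.expected_steps and outcome == scenario.expected_outcome
    )
    return result


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Scenarios that passed end to end in the baseline run but fail now."""
    return sorted(
        name for name, ok in baseline.items() if ok and not current.get(name, False)
    )


if __name__ == "__main__":
    catalog = [
        Scenario("new-package-first-release",
                 ["validate_inputs", "check_changelog", "trigger_pipeline"], "released"),
        Scenario("missing-changelog",
                 ["validate_inputs", "check_changelog"], "blocked"),
    ]
    # Stand-in traces; in practice these would be captured from real agent runs.
    runs = {
        "new-package-first-release":
            (["validate_inputs", "check_changelog", "trigger_pipeline"], "released"),
        "missing-changelog": (["validate_inputs", "trigger_pipeline"], "released"),
    }
    current = {}
    for sc in catalog:
        trace, outcome = runs[sc.name]
        r = evaluate(sc, trace, outcome)
        current[sc.name] = r.end_to_end_pass
        print(sc.name, r.step_scores, r.end_to_end_pass)
    # Pass/fail map from a previous agent version, used for regression detection.
    baseline = {"new-package-first-release": True, "missing-changelog": True}
    print("regressions:", regressions(baseline, current))
```

Keeping step-level scores separate from the end-to-end verdict lets a failing scenario point at the specific step that broke, while the baseline comparison turns the same catalog into a regression check each time the agent changes. Real scenarios contributed by the Shanghai team would become additional `Scenario` entries.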