Currently, the mitigation tasks have the lowest success rate (18%) when evaluated with GPT-4o-mini and ReAct.
Failed traces revealed that sometimes the agent could not even locate the faulty service correctly.
To isolate and evaluate mitigation itself, the agent should be given the localization info at the beginning.
Even when the agent does locate the faulty service, the localization process (reading long logs, etc.) consumes the context window and degrades agent performance.
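
One way to provide the localization info up front is to prepend it to the mitigation task description before the agent's first step. The sketch below is only an illustration of that idea: `LocalizationHint` and `build_mitigation_prompt` are hypothetical names, not part of any existing codebase, and the preamble wording is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalizationHint:
    """Hypothetical container for known localization info."""
    faulty_service: str
    namespace: Optional[str] = None


def build_mitigation_prompt(task_description: str,
                            hint: Optional[LocalizationHint] = None) -> str:
    """Prepend the localization hint (if any) to the task description.

    With no hint, the task description is returned unchanged, so the
    same function covers both the current and the proposed setup.
    """
    if hint is None:
        return task_description

    location = hint.faulty_service
    if hint.namespace:
        location = f"{location} (namespace: {hint.namespace})"

    # Assumed wording; the real preamble would need tuning.
    preamble = (
        f"The faulty service has already been identified: {location}.\n"
        "Skip localization and focus on mitigating the fault."
    )
    return f"{preamble}\n\n{task_description}"
```

Keeping the hint optional would make it possible to run the same mitigation task with and without localization info and compare success rates directly.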