Description
Search before asking
- I had searched in the issues and found no similar issues.
What happened
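SeaTunnel has its own checkpoint mechanism, and DolphinScheduler has its own fault-tolerant recovery mechanism. The two do not yet work well together, and the combination has a bug. It can be verified as follows:
1. Deploy a SeaTunnel CDC job through DolphinScheduler.
2. Simulate an unexpected crash: kill the DolphinScheduler task process (the SeaTunnel client job is not killed at this point).
3. Restart DolphinScheduler.
4. DolphinScheduler then starts its fault-tolerant recovery and submits new SeaTunnel client jobs.
5. When there are many SeaTunnel client jobs, each one is recovered in turn, so the same SeaTunnel tasks are created a second time. With a large number of tasks, this makes CPU usage spike within a short time and eventually causes a cascading failure (avalanche).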
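To make the failure mode in the steps above concrete, here is a toy model. It is not DolphinScheduler code, and every name in it (`live_jobs`, `naive_recover`, the `cdc_job_*` names) is hypothetical. The SeaTunnel jobs survive the scheduler crash, and the recovery pass resubmits every task at once without checking for them, so each job ends up running twice and all submissions land at the same moment.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical model: job name -> number of live SeaTunnel client processes.
# Every job survived the scheduler crash, so each starts with one live process.
live_jobs = {f"cdc_job_{i}": 1 for i in range(20)}
lock = threading.Lock()

def naive_recover(job: str) -> None:
    # Resubmits unconditionally; never checks whether the job is still running.
    with lock:
        live_jobs[job] += 1

# Model of concurrent recovery: one worker per task, all started together.
with ThreadPoolExecutor(max_workers=len(live_jobs)) as pool:
    list(pool.map(naive_recover, list(live_jobs)))

duplicated = sum(1 for count in live_jobs.values() if count > 1)
print(f"{duplicated} of {len(live_jobs)} jobs now run twice")
```

Running it prints `20 of 20 jobs now run twice`. In the real system each duplicate is a full SeaTunnel client process, which is what drives the CPU spike.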
What you expected to happen
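1. DolphinScheduler's fault-tolerant recovery currently appears to run concurrently. Given the number of tasks, should recovery limit its concurrency, or even recover tasks serially?
2. When the scheduler restarts after an unexpected crash and finds that the SeaTunnel (st) job was never killed, that job should not need to be recovered.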
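The two expectations above can be combined in one recovery routine. The sketch below assumes two hypothetical hooks that are not existing DolphinScheduler APIs: `is_still_running(task)`, which would check whether the SeaTunnel job behind a task is still alive (for example, whether its recorded client process still exists; how that is done is left open here), and `resubmit(task)`, which would submit the task again. The `max_parallel` parameter is also hypothetical: it caps how many resubmissions run at once, and `1` means strictly serial recovery.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

def recover_tasks(
    tasks: Iterable[str],
    is_still_running: Callable[[str], bool],  # hypothetical liveness hook
    resubmit: Callable[[str], None],          # hypothetical resubmission hook
    max_parallel: int = 1,                    # 1 = strictly serial recovery
) -> List[str]:
    """Recover tasks after a scheduler restart (illustrative sketch only).

    Tasks whose SeaTunnel job is still alive are skipped instead of being
    resubmitted, and at most `max_parallel` resubmissions run at once.
    Returns the tasks that were actually resubmitted, in input order.
    """
    to_resubmit = [t for t in tasks if not is_still_running(t)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() keeps at most max_parallel calls in flight and re-raises
        # the first exception when its results are consumed.
        list(pool.map(resubmit, to_resubmit))
    return to_resubmit

# Usage: two jobs survived the crash, only the third one needs recovery.
alive = {"cdc_job_0", "cdc_job_1"}
submitted: List[str] = []
resubmitted = recover_tasks(
    ["cdc_job_0", "cdc_job_1", "cdc_job_2"],
    is_still_running=lambda t: t in alive,
    resubmit=submitted.append,
    max_parallel=2,
)
print(resubmitted)  # ['cdc_job_2']
```

A fixed-size pool is one simple way to bound concurrency. A rate limiter or a per-node quota would be alternatives if recovery also needs to be spread out over time.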
How to reproduce
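1. Deploy a SeaTunnel CDC job through DolphinScheduler.
2. Simulate an unexpected crash: kill the DolphinScheduler task process (the SeaTunnel client job is not killed at this point).
3. Restart DolphinScheduler.
4. DolphinScheduler then starts its fault-tolerant recovery and submits new SeaTunnel client jobs.
5. When there are many SeaTunnel client jobs, each one is recovered in turn, so the same SeaTunnel tasks are created a second time. With a large number of tasks, this makes CPU usage spike within a short time and eventually causes a cascading failure (avalanche).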
Anything else
No response
Version
dev
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct