MongoDBReader: problem with how splits are computed in the Job #2188

Open
lazycancerpatients opened this issue Aug 20, 2024 · 0 comments

lazycancerpatients commented Aug 20, 2024

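When the source is a MongoDB database and the data is read with multiple concurrent channels, the Job's split algorithm does not filter the data by the query condition.
Instead, it reads the full data set and splits it by _id, so splitting becomes very slow when the collection holds a large amount of data.
Real-world case: with 1.1 billion documents in the source collection (about 4.5 TB of storage), splitting took 40 min with 2 channels and 90 min with 3 channels.

Is there some other consideration behind not filtering by the query condition?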
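
To make the question concrete, here is a minimal in-memory sketch of the two behaviours. It is not the DataX implementation: `split_by_id`, its parameters, and the sample `status` field are hypothetical names chosen for illustration, and the collection is modelled as a plain Python list rather than a real MongoDB cursor. It only shows that applying the query before choosing the _id boundaries limits the work to the matching documents.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Document = Dict[str, Any]
# (inclusive lower bound, exclusive upper bound); None means unbounded.
IdRange = Tuple[Optional[Any], Optional[Any]]


def split_by_id(
    docs: List[Document],
    channels: int,
    query: Optional[Callable[[Document], bool]] = None,
) -> List[IdRange]:
    """Return up to `channels` contiguous _id ranges of roughly equal size.

    Hypothetical helper, not DataX code. When `query` is given, only the
    matching documents are considered when picking the boundaries.
    """
    if channels < 1:
        raise ValueError("channels must be >= 1")
    # Collect the _id values of the documents that take part in the split.
    ids = sorted(d["_id"] for d in docs if query is None or query(d))
    if not ids:
        return []
    channels = min(channels, len(ids))
    # Take the _id found at each equal-count boundary.
    bounds = [ids[len(ids) * i // channels] for i in range(1, channels)]
    return list(zip([None] + bounds, bounds + [None]))


if __name__ == "__main__":
    # 100 sample documents; only the last 10 match the query.
    docs = [
        {"_id": i, "status": "active" if i >= 90 else "archived"}
        for i in range(100)
    ]
    # Split over the full collection (the behaviour described above).
    print(split_by_id(docs, 2))
    # [(None, 50), (50, None)]
    # Split over the documents matching the query only.
    print(split_by_id(docs, 2, lambda d: d["status"] == "active"))
    # [(None, 95), (95, None)]
```

In this toy data set the unfiltered split walks all 100 _id values and places the boundary at 50, although every matching document has an _id of 90 or above. The filtered split walks only the 10 matching values and places the boundary at 95.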

lazycancerpatients changed the title from "MongoDBReader" to "MongoDBReader: the Job's split problem" on Aug 20, 2024
lazycancerpatients changed the title from "MongoDBReader: the Job's split problem" to "MongoDBReader: problem with how splits are computed in the Job" on Aug 20, 2024