Replies: 1 comment
Yes, it's not a global sort. Typically we don't need a global ordering, which can be very costly. URLs are partitioned by host, so all URLs from a given host end up in the same partition, and then they are sorted within each partition. The goal is to perform aggregations over URLs that share the same prefix.
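A minimal sketch of the partition-then-local-sort pattern described above. It assumes the aggregation prefix is the URL host, and the helper names (`host_of`, `hash_partition`, `sort_within_partitions`, `count_per_host`) are hypothetical; this is not the benchmark's actual code. It shows why no global order is needed: every host lives in exactly one partition, so a linear scan over each locally sorted partition sees all of that host's URLs contiguously.

```python
import itertools
import zlib
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    # Assumption: the "prefix" used for aggregation is the URL's host (netloc).
    return urlsplit(url).netloc


def hash_partition(urls, num_partitions):
    # Deterministic CRC32 of the host; Python's built-in hash() is salted per process.
    parts = [[] for _ in range(num_partitions)]
    for url in urls:
        h = zlib.crc32(host_of(url).encode("utf-8"))
        parts[h % num_partitions].append(url)
    return parts


def sort_within_partitions(parts):
    # Local sort only: each partition is ordered, but partitions are not
    # ordered relative to each other. Sorting by (host, url) keeps a host's
    # URLs adjacent even if they differ in scheme.
    return [sorted(p, key=lambda u: (host_of(u), u)) for p in parts]


def count_per_host(sorted_parts):
    # A host appears in exactly one partition and its URLs are adjacent there,
    # so one groupby pass per partition produces the complete count per host.
    counts = {}
    for part in sorted_parts:
        for host, group in itertools.groupby(part, key=host_of):
            counts[host] = sum(1 for _ in group)
    return counts


urls = [
    "https://b.example/2",
    "https://a.example/1",
    "https://b.example/1",
    "https://c.example/9",
    "https://a.example/3",
]
parts = sort_within_partitions(hash_partition(urls, num_partitions=2))
print(count_per_host(parts))
```

The printed dict's key order depends on which partition each host hashes to, but the counts themselves do not depend on partition order, which is the point of skipping the global sort.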
A quick question about urls_sort_benchmark.py: the code does a hash partition followed by a local (per-partition) sort, but the result is not globally sorted at all. Do I understand the code correctly?
Either a radix sort (replacing string hashing with a prefix) or a merge sort (not implemented in the code) would make sense to me. Isn't the current implementation problematic?
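To make the distinction concrete, here is a small sketch contrasting hash partitioning plus local sort with the two alternatives mentioned in the question: range partitioning on a key prefix (the radix-style idea) and a k-way merge of the locally sorted partitions using Python's `heapq.merge`. The helpers, sample keys, and split points (`boundaries=["h", "n"]`) are hypothetical; in practice the split points would be chosen, for example by sampling, so that partitions come out balanced.

```python
import bisect
import heapq
import zlib


def hash_partition(keys, num_partitions):
    # Route each key by a deterministic hash (CRC32); partition order is
    # unrelated to key order.
    parts = [[] for _ in range(num_partitions)]
    for k in keys:
        parts[zlib.crc32(k.encode("utf-8")) % num_partitions].append(k)
    return parts


def range_partition_by_prefix(keys, boundaries, prefix_len):
    # Radix/range style: route each key by its first `prefix_len` characters,
    # compared against sorted split points. Truncating to a prefix preserves
    # lexicographic order, so the partition index never decreases as keys grow.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        parts[bisect.bisect_right(boundaries, k[:prefix_len])].append(k)
    return parts


def sorted_concat(parts):
    # Sort each partition locally, then concatenate in partition order.
    out = []
    for p in parts:
        out.extend(sorted(p))
    return out


keys = [
    "https://b.example/1",
    "ftp://z.example/",
    "https://a.example/",
    "mailto:x@y",
    "http://c.example/",
    "s3://bucket/k",
]

# Locally sorted partitions concatenated in arbitrary order are not globally sorted.
print(sorted_concat([["b", "d"], ["a", "c"]]))  # ['b', 'd', 'a', 'c']

# Option 1: range partitioning on a prefix; plain concatenation is globally sorted.
ranged = sorted_concat(range_partition_by_prefix(keys, boundaries=["h", "n"], prefix_len=1))

# Option 2: keep hash partitioning, then k-way merge the locally sorted runs.
merged = list(heapq.merge(*(sorted(p) for p in hash_partition(keys, 3))))

print(ranged == sorted(keys), merged == sorted(keys))  # True True
```

Range partitioning moves the ordering work to the partitioner, so the final step is a cheap concatenation; the merge approach keeps the existing hash partitioner but adds a merge pass over all partitions at the end.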