How to increase join performance

When joining two large dataframes (each with only 2 or 3 columns but tens to hundreds of millions of rows, with allow_duplication=True), how can I improve Vaex performance? The join sometimes takes well over an hour, and for much of that time Vaex runs single-threaded. Would presorting the join column in each dataframe help, or is there something else I can do?