Replies: 1 comment
-
Hi, thanks for the question! I think a more scalable approach would be to generate a (fast) model for predicting the difficulty, e.g., as shown in the COPA 2024 tutorial (see the notebook in the docs folder), there using the out-of-bag predictions of a random forest; in your case you would need to set aside part of the training data for this purpose:
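(The quoted snippet did not survive the page export; the following is an illustrative sketch of the idea rather than the tutorial's exact code. It assumes the crepes `ConformalPredictiveSystem` API, with fit taking residuals and sigmas and predict taking point predictions and sigmas; `X_prop_train`, `y_prop_train`, `X_cal`, `y_cal`, and `X_test` are placeholder names for a proper-training/calibration/test split.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from crepes import ConformalPredictiveSystem

# Underlying model; oob_score=True makes out-of-bag predictions available,
# so the difficulty targets come for free from the training data.
learner = RandomForestRegressor(n_estimators=500, oob_score=True, n_jobs=-1)
learner.fit(X_prop_train, y_prop_train)

# Fast difficulty model trained on the absolute out-of-bag residuals.
# If your underlying model has no OOB predictions, set aside part of the
# training data and use hold-out residuals as targets instead.
difficulty_model = RandomForestRegressor(n_estimators=100, n_jobs=-1)
difficulty_model.fit(X_prop_train,
                     np.abs(y_prop_train - learner.oob_prediction_))

# Difficulty estimates (sigmas) are now single batched predict calls,
# which scale far better than nearest-neighbour searches.
sigmas_cal = difficulty_model.predict(X_cal)
sigmas_test = difficulty_model.predict(X_test)

cps = ConformalPredictiveSystem()
cps.fit(residuals=y_cal - learner.predict(X_cal), sigmas=sigmas_cal)
intervals = cps.predict(learner.predict(X_test), sigmas=sigmas_test,
                        lower_percentiles=[5], higher_percentiles=[95])
```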
Best regards,
-
I am working with a very large dataset (shape (32_128_560, 42)). I want to leverage the DifficultyEstimator, but it uses sklearn's NearestNeighbors, which is very slow.
I tried cuML's NearestNeighbors, which runs on the GPU: I simply made a copy of the DifficultyEstimator code and replaced the sklearn import with the cuML one, and it works flawlessly on the GPU.
Still, while this GPU version fits the data very quickly, computing the difficulty scores takes very long and had not returned after 10 minutes on the eval set (~2 million rows).
My code looks like this:
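(The original snippet did not survive the page export; below is an illustrative reconstruction of the swap described above, replacing sklearn's NearestNeighbors with cuML's GPU implementation in a copy of DifficultyEstimator, shown here only for the distance-based variant; `X_prop_train` and `X_test` are placeholders.)

```python
import numpy as np
from cuml.neighbors import NearestNeighbors  # GPU replacement for sklearn's

class DifficultyEstimatorGPU:
    """Distance-based difficulty: mean distance to the k nearest
    neighbours in the proper training set (one of the variants the
    original DifficultyEstimator supports)."""

    def fit(self, X, k=25):
        self.nn = NearestNeighbors(n_neighbors=k)
        self.nn.fit(X)
        return self

    def apply(self, X):
        # cuML mirrors the input array type (NumPy in, NumPy out), but
        # every query row is still matched against the full training set.
        distances, _ = self.nn.kneighbors(X)
        return np.asarray(distances).mean(axis=1)

de = DifficultyEstimatorGPU().fit(X_prop_train, k=25)
sigmas_test = de.apply(X_test)  # this call is the remaining bottleneck
```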
I guess my question is: how else could I accelerate this? My ultimate goal is to use a conformal predictive system, but I need to make the difficulty estimation faster first. I am open to implementing other difficulty estimators that could potentially be faster.
Thanks,