Choose batches manually #894
Replies: 2 comments 3 replies
-
Interesting! Unfortunately there’s nothing built in. You could add this yourself in a forked version though? Not sure if you are using Python or Julia here. On the PySR docs there’s a page explaining how to customise the backend version. The batching is just done in this line: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/cd45719835fdb5260defa7b9bb5cfa0b315eaa4a/src/Dataset.jl#L304. So you could modify that |
Beta Was this translation helpful? Give feedback.
-
@MilesCranmer, relevant to this, I have been experimenting on batching on and off Although with batching on, it convergences much quicker, it seems to hit an evolutionary dead-end earlier and plateau. In comparison, disabling batching (although the runtime is much slower), achieves a lower loss function. Essentially, no batching has a higher ceiling for a lower loss function if both ran indefinitely. Although this is partially mitigated by increasing the batch_size, but a random batch will under sample the extremes of data. I think adding more sophisticated batch samplers would bridge this gap. You could have something like batch_method = random, as default, or sobol or even a custom_batch_func as you do with other parts in the API. Alternatively you could consider having a batch_sequence = for a more expensive batch method like a farthest point sampling which would just cycle through the batch_sequence for each iteration and leave it to to the user to generate the batch_sequence before the run. |
Beta Was this translation helpful? Give feedback.
-
Hi!
Is it possible to control how the batches are chosen? I'm working on a custom loss function where it would be advantageous to control how the dataset is being batched.
Beta Was this translation helpful? Give feedback.
All reactions