Help with speeding up the search and creating custom loss function expression #905
Replies: 3 comments 3 replies
-
A few tips: […]
I also find that […]
You can also try template expressions. If you know the functional form, this can help a lot: https://ai.damtp.cam.ac.uk/pysr/examples/#template-expressions
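A minimal sketch of the template-expression route from Python, assuming a recent PySR version — the template structure below is invented purely for illustration:

```python
from pysr import PySRRegressor, TemplateExpressionSpec

# Fix the outer functional form and search only for the unknown
# sub-expressions f and g (placeholder structure, not from the thread).
spec = TemplateExpressionSpec(
    expressions=["f", "g"],
    variable_names=["x1", "x2"],
    combine="f(x1) + g(x2) * x2",
)

model = PySRRegressor(
    expression_spec=spec,
    binary_operators=["+", "-", "*", "/"],
)
```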
Yes, you can do this with […]
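If the truncated suggestion above refers to resuming a previous fit, a minimal sketch, assuming `warm_start` is what was meant:

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 2)
y = X[:, 0] ** 2 + np.sin(X[:, 1])

# warm_start=True keeps the evolved populations between .fit() calls,
# so a second fit resumes the search rather than restarting it.
model = PySRRegressor(niterations=40, warm_start=True)
model.fit(X, y)
model.fit(X, y)  # continues from where the previous call stopped
```

For picking a finished run back up from disk, `PySRRegressor.from_file` combined with `warm_start=True` should cover that as well, as far as I can tell from the docs.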
-
Hi Miles, I've been implementing your suggestions over the last few days, and they seem to have helped; thank you :D However, I do find that using […]. Alternatively, I realised that maybe using […] (taken from the Dimensional Constraints example) could work. I'm having problems adapting it so it works with […]
-
On your loss: be careful because […]

```julia
function custom_loss(prediction, target)
    T = typeof(prediction)
    zero_point = T(1e-9)
    loss = abs(log(abs(prediction / target) + zero_point))
    sign_loss = 10 * (sign(prediction) - sign(target))^2
    return loss + sign_loss
end
```

For the full dataset-level version (the signature PySR's `loss_function` expects):

```julia
function custom_loss_full(ex, dataset::Dataset{T,L}, options) where {T,L}
    prediction, valid = eval_tree_array(ex, dataset.X, options)
    !valid && return L(Inf)
    y = dataset.y
    function custom_loss(a, b)
        zero_point = T(1e-9)
        loss = abs(log(abs(a / b) + zero_point))
        sign_loss = 10 * (sign(a) - sign(b))^2
        return loss + sign_loss
    end
    total_loss = sum([
        custom_loss(prediction[i], y[i])
        for i in eachindex(y)
    ])
    # The above is written to be pedagogical and python-like.
    # In truth, it's actually a tiny bit faster to write this as
    # `sum(i -> custom_loss(prediction[i], y[i]), eachindex(y))`
    # because you avoid allocating the extra array.
    return L(total_loss)
end
```

Note that the […]
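For completeness, a sketch of how the dataset-level objective can be handed over from the Python side: PySR's `loss_function` parameter accepts the Julia definition as a string (and if a template expression spec is in use, my understanding is that the analogous `loss_function_expression` parameter is the one to reach for):

```python
from pysr import PySRRegressor

# The full objective is passed to Julia as source code; `eval_tree_array`,
# `Dataset`, etc. are already in scope on the Julia side.
objective = """
function custom_loss_full(ex, dataset::Dataset{T,L}, options) where {T,L}
    prediction, valid = eval_tree_array(ex, dataset.X, options)
    !valid && return L(Inf)
    y = dataset.y
    custom_loss(a, b) =
        abs(log(abs(a / b) + T(1e-9))) + 10 * (sign(a) - sign(b))^2
    # The allocation-free form mentioned in the comment above:
    return L(sum(i -> custom_loss(prediction[i], y[i]), eachindex(y)))
end
"""

model = PySRRegressor(loss_function=objective)
```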
-
Hi,
I usually run PySR on an HPC and let it run for 4 days at a time, but I find that the loss is still quite large and the equations haven't evolved as much as I wanted them to. I've been reading and implementing the tips from Astroautomata, like turning on `turbo`, `bumper`, and `batching`, keeping the operators simple, keeping a large `ncycles_per_iteration`, etc. One thing I would like to mention here is that I've kept a large `maxsize`, since I'd like more complex equations (I know this can slow the process down a little bit). I've even downsampled the number of points in my dataset. Do you have any other suggestions that could help speed up the search?
My second question is: is there a way to save a model and then continue training from where it stopped?
Below is what my hyperparameters look like:
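(A hypothetical configuration touching the options described above might look like the following; every value here is a placeholder for illustration, not the actual settings:)

```python
from pysr import PySRRegressor

# Hypothetical settings illustrating the options mentioned above;
# none of these values are the real ones used in this run.
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],  # keep the operator set simple
    unary_operators=["exp", "log"],
    niterations=1_000,
    ncycles_per_iteration=5_000,  # kept large, per the tuning tips
    maxsize=60,                   # large maxsize, at some cost in speed
    turbo=True,                   # faster evaluation via LoopVectorization.jl
    bumper=True,                  # faster memory management via Bumper.jl
    batching=True,
    batch_size=1_000,
)
```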
Any help will be appreciated!! Thank you in advance :D