PySR for (integer) sequences and constant optimization #907

matthiaswilhelmNBI · 2025-05-02T08:54:05Z


Thanks for the great code you are providing to the community!
I would like to apply PySR to a research problem in theoretical particle physics, specifically to identifying integer sequences. To gauge its capabilities, I wanted to see whether it can rediscover the simplest sequence we found by hand, which is A052737 in the OEIS (up to the alternating sign). Here is my code:

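```python
import numpy as np
import sympy
from scipy.special import gamma

# Inputs n = 1, ..., 8 as a single feature column
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
# Target: y = 0.5 * (-4)^n * Γ(2(n-1)+1) / Γ((n-1)+1)
y = 0.5 * (-4)**X[:,0] * gamma(2*(X[:,0]-1)+1) / gamma((X[:,0]-1)+1)

from pysr import PySRRegressor

model = PySRRegressor(
    maxsize=16,
    niterations=1000,  # < Increase me for better results
    binary_operators=["+", "*", "^"],
    unary_operators=[
        # Custom Julia operator Γ(2x+1)/Γ(x+1), returning NaN for x <= -1
        "gammaratio(x::T) where {T} = (x > -1) ? gamma(2*x+1)/gamma(x+1) : T(NaN)",
    ],
    # SymPy version of the custom operator: this must use sympy.gamma,
    # since scipy.special.gamma cannot be applied to symbolic expressions
    extra_sympy_mappings={"gammaratio": lambda x: sympy.gamma(2*x+1) / sympy.gamma(x+1)},
    # Plain squared error
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    complexity_of_operators={"+": 1, "*": 1, "^": 2, "gammaratio": 1},
    complexity_of_constants=1,
    complexity_of_variables=2,
    # Forbid gammaratio and ^ from being nested inside one another (or themselves)
    nested_constraints={
        "gammaratio": {"gammaratio": 0, "^": 0},
        "^": {"gammaratio": 0, "^": 0},
    },
    # Maximum complexity of each argument: (base, exponent) for ^, argument for gammaratio
    constraints={
        "^": (1, 2),
        "gammaratio": 4
    },
)

model.fit(X, y)
```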
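For reference, the target built in the code above has a closed form in factorials (using Γ(k+1) = k! for non-negative integers k), and it can be written with the custom `gammaratio` operator defined in the `unary_operators` string:

```latex
\begin{aligned}
y_n &= \tfrac{1}{2}(-4)^n\,\frac{\Gamma(2(n-1)+1)}{\Gamma((n-1)+1)}
     = \frac{(-4)^n}{2}\,\frac{(2n-2)!}{(n-1)!}, \qquad n = 1,\dots,8,\\
\mathrm{gammaratio}(u) &= \frac{\Gamma(2u+1)}{\Gamma(u+1)}
\;\Longrightarrow\;
y_n = \tfrac{1}{2}(-4)^n\,\mathrm{gammaratio}(n-1).
\end{aligned}
```

For example, n = 3 gives (−64/2) · 4!/2! = −32 · 12 = −384.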

After around 5000 iterations, PySR finds the following candidate (columns: complexity, loss, score, equation):

```
13  1.356e+17  1.097e-01  y = (7.8383 + (-1.5791 ^ x₀)) * gammaratio(0.89622 + x₀)
```

This is one mutation ("+" → "*") and one round of constant optimization away from the right answer:

```
y = (0.5 * (-4 ^ x₀)) * gammaratio(-1 + x₀)
```
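A quick numerical check of this claim, as a sketch: `gammaratio` below is a hypothetical NumPy/SciPy mirror of the custom Julia operator (NaN for x <= -1), used to evaluate both the expected expression and the one PySR found on the training points.

```python
import numpy as np
from scipy.special import gamma

def gammaratio(x):
    """Hypothetical NumPy mirror of the Julia operator: Γ(2x+1)/Γ(x+1), NaN for x <= -1."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    ok = x > -1
    out[ok] = gamma(2 * x[ok] + 1) / gamma(x[ok] + 1)
    return out

x0 = np.arange(1, 9, dtype=float)
# Training target, same formula as in the question
y = 0.5 * (-4.0) ** x0 * gamma(2 * (x0 - 1) + 1) / gamma((x0 - 1) + 1)

# Candidate found by PySR after ~5000 iterations
found = (7.8383 + (-1.5791) ** x0) * gammaratio(0.89622 + x0)
# Expected "right answer"
target_expr = (0.5 * (-4.0) ** x0) * gammaratio(-1 + x0)

print("right answer reproduces y:", np.allclose(target_expr, y))
print("MSE of found candidate:", np.mean((found - y) ** 2))
```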
However, over 25000 further iterations, PySR never found the right answer.
My suspicion is that this happens because, in any given step, PySR either mutates an expression or optimizes its constants, so the mutated expression drops out of tournament selection before its constants can be optimized. Does that sound reasonable?
If so, is there a way to have PySR optimize constants before each tournament selection?
Many thanks in advance!
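On the question of optimizing constants more often, a hedged sketch (not from the original thread): the keyword names below are `PySRRegressor` parameters as documented for recent PySR versions, while the values are arbitrary and untested on this problem. None of them optimizes constants right after every mutation; `optimize_probability` sets the probability of optimizing constants during a single iteration, and `weight_optimize` adds constant optimization as a separate mutation type.

```python
# Hypothetical settings (illustrative values, not tuned) that make
# constant optimization run more often in PySR.
constant_opt_kwargs = dict(
    should_optimize_constants=True,  # constant optimization enabled
    optimize_probability=1.0,        # optimize constants in every iteration
    optimizer_iterations=20,         # more optimizer steps per optimization call
    optimizer_nrestarts=4,           # more restarts from different initial constants
    weight_optimize=0.1,             # also allow constant optimization as a mutation
)

# Usage (requires pysr to be installed):
# from pysr import PySRRegressor
# model = PySRRegressor(**constant_opt_kwargs, maxsize=16, ...)
print(constant_opt_kwargs)
```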

Answered by MilesCranmer


MilesCranmer · 2025-05-02T12:27:53Z · Maintainer

What happens if you normalize your data first? Constants are initialized from a unit-variance Gaussian, so it might take many steps to reach large magnitudes.
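A minimal sketch of the normalization idea, assuming the target is simply divided by its standard deviation (the choice of scale factor is an assumption; `model` refers to the `PySRRegressor` from the question and appears only in comments):

```python
import numpy as np
from scipy.special import gamma

X = np.arange(1, 9).reshape(-1, 1)
n = X[:, 0].astype(float)
y = 0.5 * (-4.0) ** n * gamma(2 * (n - 1) + 1) / gamma((n - 1) + 1)

# Rescale the target so that its standard deviation is 1
y_scale = np.std(y)
y_norm = y / y_scale

# The search would then run on the rescaled target, e.g. model.fit(X, y_norm),
# and predictions are mapped back with model.predict(X) * y_scale.
print("scale factor:", y_scale)
print("normalized target:", y_norm)
```

Rescaling by a single factor changes the magnitude of the constants PySR has to find, but not the relative weight of the points under a squared-error loss.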


MilesCranmer · 2025-05-02T18:55:03Z · Maintainer

I think part of the issue is that the loss function is entirely dominated by a couple of points:

```
[ins] In [3]: X
Out[3]:
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

[ins] In [4]: y
Out[4]:
array([-2.00000000e+00,  1.60000000e+01, -3.84000000e+02,  1.53600000e+04,
       -8.60160000e+05,  6.19315200e+07, -5.44997376e+09,  5.66797271e+11])
```

It is fitting this with mean squared error, so it will focus almost entirely on the last two points.
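To put numbers on this, a sketch assuming a candidate with the same relative error ε at every point: its squared error at point i is ε²·y_i², so each point's share of the MSE is y_i²/Σ_j y_j², independent of ε.

```python
import numpy as np
from scipy.special import gamma

n = np.arange(1, 9, dtype=float)
y = 0.5 * (-4.0) ** n * gamma(2 * (n - 1) + 1) / gamma((n - 1) + 1)

# Share of the total squared error contributed by each point
# when every point has the same relative error.
share = y**2 / np.sum(y**2)
for ni, s in zip(n.astype(int), share):
    print(f"n={ni}: {s:.3e}")
```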

If I change the loss function to a relative error:

```python
model = PySRRegressor(
    # ...
    # Relative error: each point contributes on a comparable scale
    elementwise_loss="(pred, targ) -> abs(pred - targ)/(abs(pred)+1)",
)
```

then it seems to work.
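A NumPy sketch comparing the two losses point by point, for a hypothetical prediction that is off by 10% everywhere (the 10% figure is an arbitrary assumption). `relative_loss` mirrors the Julia expression above, which normalizes by the prediction; dividing by `abs(targ) + 1` instead is a common variant that behaves similarly here.

```python
import numpy as np
from scipy.special import gamma

n = np.arange(1, 9, dtype=float)
y = 0.5 * (-4.0) ** n * gamma(2 * (n - 1) + 1) / gamma((n - 1) + 1)

def squared_loss(pred, targ):
    return (pred - targ) ** 2

def relative_loss(pred, targ):
    # Same formula as the Julia string: abs(pred - targ)/(abs(pred)+1)
    return np.abs(pred - targ) / (np.abs(pred) + 1)

pred = 1.1 * y  # hypothetical prediction with 10% error at every point

sq = squared_loss(pred, y)
rel = relative_loss(pred, y)
print("max/min ratio, squared: ", sq.max() / sq.min())
print("max/min ratio, relative:", rel.max() / rel.min())
```

Under squared error the worst point outweighs the best by many orders of magnitude; under the relative loss all points contribute on a similar scale, so the small-n terms also constrain the fit.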

Reply · matthiaswilhelmNBI · May 2, 2025 · Author

Thank you so much!
