Custom loss with partial derivatives for a function with 2 inputs and 2 outputs #1174

NiccoloAntonelliDziri · 2026-04-07T12:49:49Z


Hi everyone,

I am currently trying to use PySR to find an approximant for the inverse of a function. I apologize if I missed something trivial, but I haven't been able to find a discussion or an issue about anything similar.

I want to find two functions, f1 and f2, such that:

d = f1(P, h)
T = f2(P, h)

I also have a custom condition on their partial derivatives that I want to enforce during the optimization process:

(df1/dh) / (df2/dh) = (df1/dP) / (df2/dP) + (2*(df1/dh)*(df1/dP)) / ((df2/dP)*(df1/dh) + (df2/dh)*(df1/dP))
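To check offline whether a candidate pair (f1, f2) satisfies this condition, the residual LHS − RHS can be evaluated from the four partial derivatives. A minimal NumPy sketch, assuming smooth toy functions and central finite differences; the step size `eps` and the toy `f1`, `f2` below are illustrative assumptions, not part of the original setup:

```python
import numpy as np

def partials(f, P, h, eps=1e-6):
    """Central finite-difference estimates of (df/dP, df/dh)."""
    df_dP = (f(P + eps, h) - f(P - eps, h)) / (2 * eps)
    df_dh = (f(P, h + eps) - f(P, h - eps)) / (2 * eps)
    return df_dP, df_dh

def condition_residual(df1_dP, df1_dh, df2_dP, df2_dh):
    """LHS - RHS of the derivative condition; zero wherever it holds."""
    lhs = df1_dh / df2_dh
    denominator = df2_dP * df1_dh + df2_dh * df1_dP
    rhs = df1_dP / df2_dP + 2 * df1_dh * df1_dP / denominator
    return lhs - rhs

# Hypothetical toy candidates, only to exercise the functions
def f1(P, h):
    return P**2 * h

def f2(P, h):
    return P + h

P = np.array([2.0, 1.5])
h = np.array([3.0, 0.5])
df1_dP, df1_dh = partials(f1, P, h)
df2_dP, df2_dh = partials(f2, P, h)
residual = condition_residual(df1_dP, df1_dh, df2_dP, df2_dh)
print(residual)  # approximately [-14.0, -1.05]
```

Finite differences are only a convenient way to test candidates outside PySR; inside the search, the derivatives would come from PySR itself.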

My hope is that this condition will help steer the algorithm toward solutions that are physically meaningful.

I have X and y of shape (241127, 2), where:

X contains the inputs [P, h]
y contains the targets [d, T]

I used a custom loss function in PySR to combine the standard MSE loss with the derivative condition.

When I tried to implement the custom loss, I got a BoundsError:
BoundsError: attempt to access 2×241127 Matrix{Float32} at index [1:2, 1, 2]

This suggests that eval_grad_tree_array only returns a (2, N) matrix (2 variables × N data points), not the full Jacobian for both outputs.
My attempts to extract the four derivatives (df1/dP, df1/dh, df2/dP, df2/dh) failed because PySR appears to call the loss function separately for each output (or at least that is my guess).
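For reference, the condition needs all four entries of the 2×2 Jacobian at every data point, while a single-output gradient covers only one row of it. A NumPy sketch of the shapes involved, assuming per-output gradient arrays laid out as (n_features, N) like the matrix in the error message; the values are random placeholders:

```python
import numpy as np

N = 5            # number of data points (241127 in the real data set)
n_features = 2   # inputs: P, h

rng = np.random.default_rng(0)
# One gradient per output, each shaped (n_features, N): row 0 is d/dP, row 1 is d/dh
grad_f1 = rng.normal(size=(n_features, N))
grad_f2 = rng.normal(size=(n_features, N))

# Indexing a 2-D array with three indices fails, analogous to the Julia BoundsError
try:
    grad_f1[:, 0, 1]
except IndexError as err:
    print("IndexError:", err)

# Stacking both per-output gradients gives the full Jacobian, shaped (n_outputs, n_features, N)
jacobian = np.stack([grad_f1, grad_f2])
df1_dP, df1_dh = jacobian[0, 0], jacobian[0, 1]
df2_dP, df2_dh = jacobian[1, 0], jacobian[1, 1]
print(jacobian.shape)  # (2, 2, 5)
```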

Is it possible to enforce such a condition in PySR with a single loss function for two outputs?
If not, is there a recommended way to achieve this?

import numpy as np

d = np.linspace(0.001, 5, 500)
T = np.linspace(200, 1000, 500)

D, T_grid = np.meshgrid(d, T)

# ph_dt (not shown) returns p and h on the (d, T) grid
p, h = ph_dt(D, T_grid)

# Keep only the grid points where both p and h are defined
valid = ~np.isnan(p) & ~np.isnan(h)
pp, hh, DD, TT = p[valid], h[valid], D[valid], T_grid[valid]

X = np.column_stack((pp, hh))  # inputs [P, h]
y = np.column_stack((DD, TT))  # targets [d, T]

import sympy
from pysr import jl, PySRRegressor
jl.seval("using Zygote") # For gradient calculations

extra_sympy_mappings={
    "pow_n": lambda x, n: sympy.Pow(x, n),
}

model = PySRRegressor(
    unary_operators=["log", "exp", "sqrt"],
    nested_constraints={
        "log": {"log": 0, "exp": 0},
        "exp": {"exp": 0, "log": 0},
        "sqrt": {"sqrt": 0},
    },

    maxsize=30,
    weight_optimize=0.001,
    batching=True,                 # Enable mini-batch training
    batch_size=1024,               # Recommended batch size (adjust as needed)
    niterations=100,               # Number of iterations (adjust for complexity)
    binary_operators=["+", "-", "*", "/", "pow_n(x::T, n::T) where {T} = x > 0 ? convert(T, x^n) : convert(T, NaN)"],  # Operators
    extra_sympy_mappings=extra_sympy_mappings,  # SymPy mapping for the custom pow_n operator
    constraints={"pow_n": (-1,1)}, # Exponent limited to complexity 1 (a single constant or variable)
    populations=36,               # Number of populations (influences exploration)
    population_size=64,            # Number of individuals in each population
    ncycles_per_iteration=100,     # Number of total mutations to run, per 10 samples of the population, per iteration
    model_selection="best",        # Keep the best-performing model
    parsimony=1e-5,                # parsimony (times complexity) added to the loss
    maxdepth=7,                    # Max depth of the equation
    progress=True,                 # Display training progress
    verbosity=1,                   # Show intermediate output

    loss_function = """
function custom_derivative_loss(tree, dataset::Dataset{T,L}, options) where {T,L}
    # Evaluate the expression tree
    (prediction, gradients, completion) = eval_grad_tree_array(tree, dataset.X, options; variable=true)
    !completion && return L(Inf)

    # Compute MSE loss (standard)
    diffs = prediction .- dataset.y
    mse_loss = sum(abs2.(diffs)) / length(diffs)


    df1_dp = gradients[:, 1, 1]  # ∂f1/∂p for all N data points
    df1_dh = gradients[:, 1, 2]  # ∂f1/∂h for all N data points
    df2_dp = gradients[:, 2, 1]  # ∂f2/∂p for all N data points
    df2_dh = gradients[:, 2, 2]  # ∂f2/∂h for all N data points

    denominator = (df2_dp .* df1_dh .+ df2_dh .* df1_dp)

    if any(x -> isnan(x) || abs(x) < 1e-10, denominator)
        derivative_condition_loss = L(Inf)
    else
        rhs = (df1_dp ./ df2_dp) .+ (2 .* df1_dh .* df1_dp) ./ denominator

        lhs = df1_dh ./ df2_dh

        derivative_condition_loss = sum(abs2.(lhs .- rhs)) / length(lhs)
    end

    total_loss = mse_loss + derivative_condition_loss

    return total_loss
end
"""
)
model.fit(X, y, variable_names = ["p", "h"])
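Even with all four derivatives available, they are vectors over the data points, so the near-zero guard on the denominator has to be reduced over all points (for example with `any`) rather than applied to the vector as a whole. A NumPy transcription of the intended total loss (MSE on both outputs plus the mean squared condition residual), assuming the predictions and partial derivatives are already available as arrays; `tol` mirrors the 1e-10 threshold above:

```python
import numpy as np

def total_loss(pred, target, df1_dP, df1_dh, df2_dP, df2_dh, tol=1e-10):
    """MSE over both outputs plus the mean squared derivative-condition residual.

    pred, target: arrays of shape (N, 2) holding [d, T] per data point.
    df1_dP, ..., df2_dh: arrays of shape (N,) with the partials at each point.
    """
    mse_loss = np.mean((pred - target) ** 2)

    denominator = df2_dP * df1_dh + df2_dh * df1_dP
    # Elementwise guard, reduced with np.any: one bad point rejects the candidate
    if np.any(np.isnan(denominator) | (np.abs(denominator) < tol)):
        return np.inf

    lhs = df1_dh / df2_dh
    rhs = df1_dP / df2_dP + 2 * df1_dh * df1_dP / denominator
    return mse_loss + np.mean((lhs - rhs) ** 2)

# Small usage example with made-up values
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
ones = np.ones(2)
print(total_loss(pred, pred + 1.0, ones, ones, ones, ones))  # 2.0
```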

MilesCranmer · 2026-04-07T17:45:12Z

Maintainer

I think you could try doing this with TemplateExpressionSpec and loss_function_expression; that might also give you faster symbolic derivatives.

I'll do some digging for relevant issues... (@MilesCranmerBot can you please read through discussions/issues over the past year and point to the most relevant ones? Ideally it would be nice if you could find a discussion that shows how to use loss_function_expression in tandem with template expressions. Although I think for this we might not even need to, and can simply have the template return the per-row loss?)


NiccoloAntonelliDziri Apr 8, 2026
Author

Thank you for your answer. I spent a few hours trying to use TemplateExpressionSpec:

template = TemplateExpressionSpec(
    expressions=["f1", "f2"],  # The two functions to learn
    variable_names=["p", "h"],  # Inputs
    combine="(f1(p,h),f2(p,h))",  # Output both functions
    # parameters={"p":2,"h":2},
)

but I could not get it to work (with or without a custom loss). Maybe I haven't tried hard enough, but setting combine="f1(p,h) + f2(p,h)" seemed to work immediately. However, using a single output with this sum would lose information that is potentially crucial for what I'm trying to do.
Is this the intended behavior of TemplateExpressionSpec, or am I still missing something?

MilesCranmer Apr 8, 2026
Maintainer

Yes, this will not work b/c combine has to return a single value rather than a tuple in PySR.

Sorry, my bot didn't run; I ran out of LLM calls for the week... Will run it later today.

MilesCranmerBot Apr 9, 2026

I dug through the most relevant recent threads, and I think the clearest pointers are:

PySR discussion #1002, “Define Multiple Outputs with TemplateExpressionSpec”
- This is the closest match to the current question. The recommended pattern is to pass y_0, y_1, etc. as features, have the template return the per-row loss, and use elementwise_loss="(pred, target) -> pred".
PySR discussion #1011, “BoundsError when using TemplateExpressionSpec with custom loss function”
- Confirms that with templates you generally want loss_function_expression, not loss_function.
- Also suggests that for derivative-based objectives it is often cleaner to put the derivative terms directly in the template, e.g. with D(...), and again return the per-row residual.
PySR discussion #939 / issue #938, “[BUG]: TemplateExpressionSpec and loss_function_expression not compatible with multiprocessing”
- Example of TemplateExpressionSpec working with loss_function_expression; the bug there was just the custom function name/signature.
SymbolicRegression.jl issue astroautomata/SymbolicRegression.jl#380, “[BUG]: when using loss_function, Template Expression cannot have operators outside of binary_operators”
- This is the backend thread that motivated loss_function_expression for TemplateExpression.
- It also explicitly suggests the workaround of using the template itself as the loss, by passing the true targets in X and returning the residual from the template.
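Following the D(...) suggestion, the original derivative condition could also be written as a per-row residual inside the template's combine string. A sketch that assembles such a string in Python, assuming `D(f, i)` denotes the partial derivative of `f` with respect to its i-th argument (so `D(f1,1)` is ∂f1/∂p and `D(f1,2)` is ∂f1/∂h, consistent with the Jacobian determinant used later in this thread), and that the targets `d` and `T` are passed in as extra features. The absolute-value weighting is an illustrative choice:

```python
# Partial derivatives in template syntax; D(f, i) = derivative w.r.t. argument i (assumed)
df1_dp = "D(f1,1)(p,h)"
df1_dh = "D(f1,2)(p,h)"
df2_dp = "D(f2,1)(p,h)"
df2_dh = "D(f2,2)(p,h)"

lhs = f"{df1_dh} / {df2_dh}"
rhs = (
    f"{df1_dp} / {df2_dp}"
    f" + 2 * {df1_dh} * {df1_dp} / ({df2_dp} * {df1_dh} + {df2_dh} * {df1_dp})"
)

# Per-row value returned by the template: data misfit plus condition residual
combine = (
    "abs(d - f1(p,h)) + abs(T - f2(p,h))"
    f" + abs({lhs} - ({rhs}))"
)
print(combine)
```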

So I think the answer to the second part is yes: for this use case, the template can directly return the per-row loss, and that is probably the cleanest PySR route.

The important limitation is that in PySR the template combine still needs to resolve to a single per-row value (internally a ValidVector), not a tuple of outputs. So something like (f1(p,h), f2(p,h)) will not work directly, but something like

(y1 - f1(p,h))^2 + (y2 - f2(p,h))^2

will.
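To see why returning the residual works: with elementwise_loss="(pred, targ) -> pred", the per-row loss is just the template output, which PySR then averages over the rows (assuming the default unweighted mean reduction). A NumPy mock of that reduction, with hypothetical stand-ins for f1 and f2 and the targets packed into the feature matrix next to the inputs:

```python
import numpy as np

def f1(p, h):
    return p * h   # hypothetical candidate for output 1

def f2(p, h):
    return p + h   # hypothetical candidate for output 2

# Feature matrix with inputs and targets side by side: columns [p, h, y1, y2]
rng = np.random.default_rng(0)
p, h = rng.uniform(1, 2, size=(2, 100))
y1, y2 = f1(p, h) + 0.1, f2(p, h) - 0.2   # targets offset by known amounts
X = np.column_stack((p, h, y1, y2))
dummy_y = np.zeros(X.shape[0])            # ignored by the loss below

def combine(p, h, y1, y2):
    """Per-row value returned by the template: squared error summed over both outputs."""
    return (y1 - f1(p, h)) ** 2 + (y2 - f2(p, h)) ** 2

def elementwise_loss(pred, targ):
    return pred   # mirrors "(pred, targ) -> pred"

loss = np.mean(elementwise_loss(combine(*X.T), dummy_y))
print(loss)  # 0.1**2 + 0.2**2 = 0.05, up to rounding
```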

There is now also a docs example mirroring this pattern: “Vector-valued expressions” in the PySR examples: https://ai.damtp.cam.ac.uk/pysr/examples/#13-vector-valued-expressions

NiccoloAntonelliDziri Apr 10, 2026
Author

I'll look into that on Monday and get back to you! Thanks!

NiccoloAntonelliDziri · 2026-04-13T08:42:57Z

Author

So this is what I have:

import numpy as np

d = np.linspace(5, 1400, 500)
T = np.linspace(refprop.T_min, 450, 500)  # refprop and ph_dt are not shown here

D, T_grid = np.meshgrid(d, T)

p, h = ph_dt(D, T_grid)

# Keep only the grid points where both p and h are defined
valid = ~np.isnan(p) & ~np.isnan(h)
pp, hh, DD, TT = p[valid], h[valid], D[valid], T_grid[valid]

# Targets d and T are passed as extra features; the template returns the per-row loss
X = np.column_stack((pp, hh, DD, TT))
dummy_y = np.zeros(X.shape[0])

from pysr import jl, PySRRegressor, TemplateExpressionSpec
jl.seval("using Zygote") # For gradient calculations

template = TemplateExpressionSpec(
    variable_names=["p", "h", "d", "T"],
    expressions=["f1", "f2"],
    combine="abs(d - f1(p,h)) + abs(T - f2(p,h))  + abs(D(f1,1)(p,h)*D(f2,2)(p,h) - D(f1,2)(p,h)*D(f2,1)(p,h))",
    # L1 loss for d and T + det(Jacobian) noteq 0
)

model = PySRRegressor(
    unary_operators=["log", "exp"],
    nested_constraints={
        "log": {"log": 0, "exp": 0},
        "exp": {"exp": 0, "log": 0},
    },

    maxsize=35,
    weight_optimize=0.001,
    batching=True,                 # Enable mini-batch training
    batch_size=1024,               # Recommended batch size (adjust as needed)
    niterations=100,               # Number of iterations (adjust for complexity)
    binary_operators=["+", "-", "*", "/", "^"],  # Operators
    constraints={"^": (-1,1)}, # Exponent limited to complexity 1 (a single constant or variable)
    populations=100,               # Number of populations (influences exploration)
    population_size=100,            # Number of individuals in each population
    ncycles_per_iteration=100,     # Number of total mutations to run, per 10 samples of the population, per iteration
    model_selection="best",        # Keep the best-performing model
    parsimony=1e-5,                # parsimony (times complexity) added to the loss
    maxdepth=7,                    # Max depth of the equation
    progress=True,                 # Display training progress
    verbosity=1,                   # Show intermediate output

    expression_spec=template,
    elementwise_loss="(pred, targ) -> pred"
)
model.fit(X, dummy_y, variable_names = ["p", "h", "d", "T"])

And it seems to work, so thank you!
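One thing worth checking in the combine string above: every term of the template output is minimized, so `+ abs(det J)` rewards Jacobian determinants close to zero rather than keeping them away from zero. If the intent is det(J) ≠ 0, the penalty has to decrease as |det J| grows. A NumPy sketch comparing the two shapes on a few sample determinants; the barrier `exp(-|det|/scale)` and its `scale` are illustrative assumptions (not suggested in the thread), and whether `exp` is accepted inside a combine string is not confirmed here:

```python
import numpy as np

def det_jacobian(df1_dp, df1_dh, df2_dp, df2_dh):
    """Determinant of [[df1/dp, df1/dh], [df2/dp, df2/dh]], as in the combine string."""
    return df1_dp * df2_dh - df1_dh * df2_dp

def abs_det_term(det):
    # The term used in the template: smallest when det == 0
    return np.abs(det)

def nonsingular_penalty(det, scale=1.0):
    # Hypothetical alternative: largest (1.0) when det == 0, decays as |det| grows
    return np.exp(-np.abs(det) / scale)

det = det_jacobian(np.array([1.0, 2.0, 1.0]),
                   np.array([0.0, 1.0, 1.0]),
                   np.array([0.0, 2.0, 1.0]),
                   np.array([1.0, 1.0, 1.0]))
print(det)                       # [1. 0. 0.]
print(abs_det_term(det))         # [1. 0. 0.]  -> singular rows score best
print(nonsingular_penalty(det))  # singular rows now score worst
```

In practice `scale` would need to match the typical magnitude of det J in the data, since p and h have very different units.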

However, I encountered a different issue, and I suspect it has to do with the derivatives:

juliacall.JuliaError: DomainError with -0.5468379:
log was called with a negative real argument but will only return a complex result if called with a complex argument. Try log(Complex(x)).

None of the variables contain negative values, and this wasn't happening before. I suspect something is going on with the derivatives, but I have no idea why.

To work around this issue, I modified the model as follows, using safe versions of log and pow:

import sympy

extra_sympy_mappings = {
    "slog": sympy.log,
    "spow": lambda x, y: x**y,
}

model = PySRRegressor(
    unary_operators=["exp", "slog(x) = x > 0 ? log(x) : typeof(x)(NaN)"],
    nested_constraints={
        "slog": {"slog": 0, "exp": 0},
        "exp": {"exp": 0, "slog": 0},
    },
    binary_operators=["+", "-", "*", "/", "spow(x, y) = x > 0 ? x^y : typeof(x)(NaN)"],
    constraints={"spow": (-1,1)},
    extra_sympy_mappings=extra_sympy_mappings,
    # ... all other arguments as in the previous snippet
)

But I don't know what changed, or whether the log of a negative value is a bug.
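For reference, the two Julia one-liners above return NaN instead of raising on non-positive inputs. A plain-Python sketch mirroring the same semantics, e.g. for evaluating exported expressions outside Julia. Note that the SymPy mappings (`sympy.log`, `x**y`) do not carry the `x > 0` guard, so expressions exported through SymPy behave like the unguarded functions:

```python
import math

def slog(x: float) -> float:
    """Mirror of `slog(x) = x > 0 ? log(x) : typeof(x)(NaN)`."""
    return math.log(x) if x > 0 else math.nan

def spow(x: float, y: float) -> float:
    """Mirror of `spow(x, y) = x > 0 ? x^y : typeof(x)(NaN)`."""
    return x ** y if x > 0 else math.nan

print(slog(math.e))     # 1.0
print(slog(-0.5))       # nan, where math.log(-0.5) would raise ValueError
print(spow(4.0, 0.5))   # 2.0
print(spow(-8.0, 1/3))  # nan, where (-8.0) ** (1/3) would give a complex number
```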


MilesCranmer Apr 13, 2026
Maintainer

Interesting. It could be that the safe_log operation (maps negative input to NaN output) has a normal log derivative. Not sure. @MilesCranmerBot could you please investigate the reason for this?

NiccoloAntonelliDziri Apr 13, 2026
Author

It is the opposite, sorry if that was unclear: having a safe log is my solution to the problem.
The unsafe log works well without the template expression spec, but with it, PySR tries to compute the log of a negative value.

MilesCranmer Apr 13, 2026
Maintainer

The weird thing is that the safe log operation is already what it gets mapped to internally: https://github.com/astroautomata/SymbolicRegression.jl/blob/9c3b5e518e19170f656edad8a07e000b626ab589/src/Operators.jl#L173

Although... maybe it's coming from the fallback meant to handle SIMD types and dual types. https://github.com/astroautomata/SymbolicRegression.jl/blob/9c3b5e518e19170f656edad8a07e000b626ab589/src/Operators.jl#L88 In principle this shouldn't affect things if the forward pass already spits out a NaN. But maybe there are some instances where the derivative goes through a log that the forward pass does not... Weird.

NiccoloAntonelliDziri Apr 15, 2026
Author

Did you manage to reproduce the issue on your machine? Maybe log is not mapped to the safe log when it's computed through D?
