Replies: 1 comment 1 reply
I think this might be a good use case for a custom `complexity_mapping`:

    using SymbolicRegression  # assumed to provide Options and get_tree

    function variable_sparsity_complexity(expression)
        tree = get_tree(expression)  # (for template expressions, would need to do something more complex)
        num_nodes = Ref(0)
        unique_features = Set{Int}()
        foreach(tree) do node  # (equivalent to `for node in tree`, but is slightly faster)
            num_nodes[] += 1
            # leaf nodes that are not constants are input features
            if node.degree == 0 && !node.constant
                push!(unique_features, node.feature)
            end
        end
        # complexity is normal complexity + number of unique features used
        return num_nodes[] + length(unique_features)
    end

    options_sparse = Options(;  # or SRRegressor
        binary_operators=[+, -, *],
        complexity_mapping=variable_sparsity_complexity,
    )
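As a quick worked example of what this mapping computes, consider two hypothetical expressions with the same number of nodes (counting one per operator and one per leaf, as the function above does). The expressions are illustrative only and do not come from an actual search:

$$
\begin{aligned}
x_1 x_2 + x_1 &: \quad 5 \text{ nodes} + |\{x_1, x_2\}| = 5 + 2 = 7 \\
x_1 x_1 + x_1 &: \quad 5 \text{ nodes} + |\{x_1\}| = 5 + 1 = 6
\end{aligned}
$$

Both trees have the same structural size, but the one that touches fewer distinct features gets the lower complexity, so it should be preferred wherever the search compares expressions by complexity.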
I'd like to propose a new penalty to encourage variable sparsity.
I have repeatedly found that searches restricted to fewer input variables reach equal or lower loss than extended searches with more variables. Ideally, when two equations of equal complexity have equal loss, the algorithm should by default prioritise the one that uses fewer input variables.
I know this can be achieved through a custom loss function, but it would be cool to have it as a tunable input parameter: a penalty for each additional unique feature used in an equation. This would be similar to the existing `parsimony` parameter, but instead of penalising expression complexity, it would penalise the number of unique variables used.
This would act as a form of L0 regularisation on the features, helping the search find expressions that are not only simple in structure but also rely on a minimal set of inputs.
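One way to write down the idea, as a rough sketch rather than a description of how the library actually scores expressions: assume the penalty enters additively next to the parsimony term. The symbols $\ell$, $C$, $\alpha$, $\lambda$ and $s$ are my own notation for illustration, not names from the package.

$$
\text{cost}(E) = \ell(E) + \alpha \, C(E) + \lambda \, \lVert s(E) \rVert_0
$$

Here $\ell(E)$ is the loss of expression $E$, $C(E)$ its complexity, $\alpha$ the parsimony weight, and $s(E) \in \{0, 1\}^d$ an indicator vector with $s_j(E) = 1$ when feature $j$ appears in $E$. Then $\lVert s(E) \rVert_0$ is simply the number of unique features used, and $\lambda$ would be the proposed new parameter. With $\lambda = 0$ this reduces to the usual loss-plus-parsimony trade-off.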
From what I understand, this is different from `complexity_of_variables`, which does not address the number of unique variables. Maybe this would help SR slightly in higher-dimensional problems?