Utilising PySR on matrix input data #806
Hi! I have been trying to modify PySR for an application where my input features are pairs of matrices. That is, my input data is a list of pairs of matrices, say [(A1, B1), (A2, B2), ...]. In general, the size of the matrices can vary. I am searching for symbolic expressions between A and B and want to utilize methods within Julia's LinearAlgebra package (there are predefined functions for determinant, trace etc.).

As an example, my input data has the following form:

    x1 = Matrix{Float64}[]
    x2 = Matrix{Float64}[]
    for i in 1:10
        push!(x1, rand(1:10, 2, 2))
        push!(x2, rand(1:10, 2, 2))
    end
    X = DataFrame(x1=x1, x2=x2)
    X = Tables.columntable(X)

and I would like to use the following operators, for example:

    unary = [-, LinearAlgebra.det]
    binary = [+, *, /]

and potential output symbolic expressions of the form:

    y = x1 + LinearAlgebra.det(x2)

I have been running into issues since the scitype of my input doesn't match the model specifications for SRRegressor. Is it possible to allow this through a custom implementation? Or even by passing the input data as a concatenation of vectors along with indices and then reconstructing my matrices while computing the loss in a custom loss function?

My current error:

┌ Error: Problem fitting the machine machine(SRRegressor(defaults = nothing, …), …).
└ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:694
[ Info: Running type checks...
┌ Warning: The number and/or types of data arguments do not match what the specified model
│ supports. Suppress this type check by specifying `scitype_check_level=0`.
│
│ Run `@doc SymbolicRegression.SRRegressor` to learn more about your model's requirements.
│
│ Commonly, but non exclusively, supervised models are constructed using the syntax
│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
│ constructed with `machine(model, X)`. Here `X` are features, `y` a target, and `w`
│ sample or class weights.
│
│ In general, data in `machine(model, data...)` is expected to satisfy
│
│ scitype(data) <: MLJ.fit_data_scitype(model)
│
│ In the present case:
│
│ scitype(data) = Tuple{ScientificTypesBase.Table{AbstractVector{AbstractMatrix{ScientificTypesBase.Continuous}}}, AbstractVector{ScientificTypesBase.Continuous}}
│
│ fit_data_scitype(model) = Union{Tuple{Union{ScientificTypesBase.Table{<:Union{AbstractVector{<:ScientificTypesBase.Continuous}, AbstractVector{<:ScientificTypesBase.Count}}}, AbstractMatrix{<:ScientificTypesBase.Continuous}}, AbstractVector}, Tuple{Union{ScientificTypesBase.Table{<:Union{AbstractVector{<:ScientificTypesBase.Continuous}, AbstractVector{<:ScientificTypesBase.Count}}}, AbstractMatrix{<:ScientificTypesBase.Continuous}}, AbstractVector, AbstractVector{<:Union{ScientificTypesBase.Continuous, ScientificTypesBase.Count}}}}
└ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:237
[ Info: It seems an upstream node in a learning network is providing data of incompatible scitype. See above.
ERROR: MethodError: no method matching equation_search(::Matrix{…}, ::Vector{…}; niterations::Int64, weights::Nothing, variable_names::Vector{…}, display_variable_names::Vector{…}, options::Options{…}, parallelism::Symbol, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, heap_size_hint_in_bytes::Nothing, worker_imports::Nothing, runtests::Bool, saved_state::Nothing, return_state::Bool, run_id::Nothing, loss_type::DataType, X_units::Nothing, y_units::Nothing, verbosity::Int64, extra::@NamedTuple{}, logger::Nothing, v_dim_out::Val{…})
The function `equation_search` exists, but no method is defined for this combination of argument types.
Closest candidates are:
equation_search(::AbstractMatrix{T}, ::AbstractVector; kw...) where T<:Number
@ SymbolicRegression ~/Desktop/research_projects_coding/matrix-op-dynamic-expressions/SymbolicRegression.jl/src/SymbolicRegression.jl:503
equation_search(::Vector{D}; options, saved_state, runtime_options, runtime_options_kws...) where {T<:Number, L<:Real, D<:(Dataset{T, L, AX} where AX<:AbstractMatrix{T})}
@ SymbolicRegression ~/Desktop/research_projects_coding/matrix-op-dynamic-expressions/SymbolicRegression.jl/src/SymbolicRegression.jl:513
equation_search(::Dataset; kws...)
@ SymbolicRegression ~/Desktop/research_projects_coding/matrix-op-dynamic-expressions/SymbolicRegression.jl/src/SymbolicRegression.jl:509
...
Reading the new version of DynamicExpressions.jl, it seemed to me that _eval_tree_array_generic should allow for input and output data of arbitrary type.

Thank you again for the wonderful package!
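
For reference, the "concatenate and reconstruct" idea mentioned above could look roughly like the sketch below. It assumes every matrix has the same known shape (2x2 here), which does not cover the variable-size case, and the helper names `flatten` and `unflatten` are hypothetical, not part of PySR or SymbolicRegression.jl. It only shows the round trip between a list of matrices and a plain numeric features x samples matrix. How a custom loss would then call `unflatten` is left open.

```julia
using LinearAlgebra

# Ten 2x2 matrices, mirroring the x1 column from the question.
x1 = [Float64.(rand(1:10, 2, 2)) for _ in 1:10]

# Hypothetical helper: flatten each matrix into a column (column-major order),
# giving a plain Matrix{Float64} of size (4, 10), i.e. features x samples.
flatten(ms) = reduce(hcat, vec.(ms))

# Hypothetical helper: rebuild the i-th matrix from its column,
# given the (assumed fixed) shape.
unflatten(X, i, shape) = reshape(X[:, i], shape)

X1 = flatten(x1)
A3 = unflatten(X1, 3, (2, 2))

# The reconstructed matrix can be passed to LinearAlgebra functions again.
d3 = det(A3)
```

Note that with this layout the search itself would only see scalar entries, so a matrix-level operator such as `LinearAlgebra.det` could not be applied to `x1` as a whole inside the expression; the reconstruction would have to happen inside the loss.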
Replies: 1 comment 3 replies
So the first thing that needs to be changed is to use a […]

However, this still won't work out-of-the-box in SymbolicRegression.jl, because https://github.com/MilesCranmer/SymbolicRegression.jl/blob/9fabc303dd33c624739e759d960374c7f85e56f7/src/ProgramConstants.jl#L5 is set:

    const DATA_TYPE = Number

Meaning that all elements of […]

So the first thing to do is check out a local version of SymbolicRegression.jl and change this to:

    const DATA_TYPE = Any

and then see what functions need to be updated to accommodate it. Hopefully not too many!
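
To illustrate why a type bound like this produces the MethodError in the question, here is a minimal toy sketch in plain Julia. The names `TOY_DATA_TYPE`, `toy_search`, and `toy_search_any` are hypothetical stand-ins and are not SymbolicRegression.jl functions. The assumption is only that the real method signature constrains the matrix element type in a similar way, as the `where T<:Number` candidate in the error output suggests.

```julia
# Toy stand-in for the element-type bound; not the real package constant.
const TOY_DATA_TYPE = Number

# Only matrices whose element type is a subtype of Number match this method.
toy_search(X::AbstractMatrix{T}) where {T<:TOY_DATA_TYPE} = "accepted eltype $(T)"

X_num = rand(2, 10)                              # Matrix{Float64}
X_mat = [rand(2, 2) for i in 1:2, j in 1:10]     # Matrix{Matrix{Float64}}

accepted = toy_search(X_num)

# A matrix of matrices does not match the bound, so dispatch fails.
rejected = try
    toy_search(X_mat)
    false
catch err
    err isa MethodError
end

# Loosening the bound to Any lets the same input through dispatch.
toy_search_any(X::AbstractMatrix{T}) where {T<:Any} = "accepted eltype $(T)"
loosened = toy_search_any(X_mat)
```

Getting past dispatch is only the first step: any function further down that assumes numeric elements would then need the same kind of adjustment.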
This part will require some internal changes. I would just keep working through MethodErrors and changing the logic as required. I think this problem in particular is about computing a baseline loss or something, so you will need to edit Dataset(...) to change the logic somehow.