Sorry for using GitHub issues to ask questions again. After switching to the cpp20 branch, I was able to run the examples successfully. Thanks!
Now I would like to discuss whether (and how) the following can be achieved. My dataset is actually an ordered time series. This means that, in theory, operations such as "shift 1", "shift -1" or "rolling mean" are valid primitives that could operate in batch on each of the original variables and on intermediate results. After briefly reading the code, mainly functions.hpp, I have a few questions about how to implement this:
- It seems there is an upper limit on the number of primitives because NodeType is a 32-bit integer. If I want to add more primitives, should I extend it to uint64_t?
- It seems that each time a primitive in function.cpp is called, it is called on a batch of a variable or intermediate result. Is it possible to call it across the whole dataset instead (along the data point dimension, which in the time series case is the time dimension)?
- If the above two can be resolved, I think I can probably come up with a solution. Is there any other approach you would recommend?
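To make the second question concrete: a lag or a rolling mean at time t needs values from earlier time steps, so a primitive that only sees one batch of rows would produce wrong values near the start of each batch. Below is a minimal, self-contained C++ sketch (not Operon code; Shift and RollingMean are hypothetical names) of such primitives, assuming they receive the whole column in time order. Positions without enough history are filled with NaN, and shift(1) is treated as a lag, as in pandas.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical whole-column primitive: out[t] = x[t - k].
// k > 0 is a lag, k < 0 a lead; positions with no source value become NaN.
std::vector<double> Shift(const std::vector<double>& x, long k) {
    const auto n = static_cast<long>(x.size());
    std::vector<double> out(x.size(), std::numeric_limits<double>::quiet_NaN());
    for (long t = 0; t < n; ++t) {
        const long src = t - k;
        if (src >= 0 && src < n) {
            out[static_cast<std::size_t>(t)] = x[static_cast<std::size_t>(src)];
        }
    }
    return out;
}

// Hypothetical trailing rolling mean with window w: out[t] = mean(x[t-w+1..t]).
// The first w-1 positions are NaN because their window is incomplete.
std::vector<double> RollingMean(const std::vector<double>& x, std::size_t w) {
    std::vector<double> out(x.size(), std::numeric_limits<double>::quiet_NaN());
    if (w == 0) return out;
    double sum = 0.0;  // running sum of the current window
    for (std::size_t t = 0; t < x.size(); ++t) {
        sum += x[t];
        if (t >= w) sum -= x[t - w];  // drop the value that left the window
        if (t + 1 >= w) out[t] = sum / static_cast<double>(w);
    }
    return out;
}
```

For x = {1, 2, 3, 4, 5}, Shift(x, 1) gives {NaN, 1, 2, 3, 4}, Shift(x, -1) gives {2, 3, 4, 5, NaN}, and RollingMean(x, 3) gives {NaN, NaN, 2, 3, 4}. How the NaN positions are then handled (dropping rows, filling, or excluding them from the error) is a separate design decision.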
Thank you!
foolnotion commented on Sep 24, 2023
Hi, yes, this is theoretically possible. This is what the Dynamic node type is supposed to do. My plan is to eventually get rid of all the hard-coded function types and rely only on functions registered at runtime. The idea is to define a custom function and register it in the dispatch table. Here is an example:
- Define a custom summation_function: https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/atomic.hpp#L72
- Register the function: https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/main.cpp#L219
- Define a Dynamic node type with this function and add it to the primitive set: https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/main.cpp#L229
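The question above suggests that each hard-coded NodeType occupies one bit of a 32-bit integer; under that assumption, widening to uint64_t would only raise the ceiling from 32 to 64 primitives. Runtime registration sidesteps the ceiling: the node keeps one fixed kind (the role Dynamic plays here) and carries a hash that selects the function from a table. The sketch below illustrates that idea in plain C++; Registry, DynamicNode, Evaluate and the choice of FNV-1a hashing are hypothetical, for illustration only, and are not Operon's API.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical column-level primitive: one input column in, one output column out.
using Primitive = std::function<std::vector<double>(const std::vector<double>&)>;

// 64-bit FNV-1a hash of a primitive's name, used as its lookup key.
constexpr std::uint64_t Fnv1a(std::string_view s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical registry: any number of primitives, added at runtime, keyed by hash.
class Registry {
public:
    std::uint64_t Register(std::string_view name, Primitive f) {
        const auto key = Fnv1a(name);
        table_[key] = std::move(f);
        return key;  // the value a node would store to refer to this primitive
    }
    const Primitive& Get(std::uint64_t key) const {
        const auto it = table_.find(key);
        if (it == table_.end()) throw std::out_of_range("unknown primitive");
        return it->second;
    }
private:
    std::unordered_map<std::uint64_t, Primitive> table_;
};

// Hypothetical tree node: its kind is always "dynamic"; the hash picks the function.
struct DynamicNode {
    std::uint64_t hash;
};

std::vector<double> Evaluate(const Registry& reg, const DynamicNode& node,
                             const std::vector<double>& input) {
    return reg.Get(node.hash)(input);
}
```

Registering a new primitive is then a single Register call with a lambda, and the number of primitives is bounded only by the hash space, not by the width of a bit-flag enum.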
The atomic potentials code relies on an older version of Operon, but you should be able to make it work with the latest cpp20 head. If you run into issues or find bugs, please let me know.
breakds commented on Sep 24, 2023
Thanks a lot for the detailed explanation and the links to the files. I understand the code better now. Let me try to implement the idea, and I will report back. I appreciate the prompt response!
breakds commented on Oct 2, 2023
I have been slowly learning the concepts, and I have a few more questions, if you don't mind.
- There is an if branch here. What do symbolic == true and symbolic == false mean? How is this related to the template argument being int or Scalar?
- There are several TreeCreator implementations. Is there a rule of thumb for picking among those candidates before digging into the implementation?
- On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply to it. My current vague understanding is that trees are "symbolic formulas" generated as candidates for evaluation during the solving phase of the algorithm. Because of this (probably wrong) understanding, I find it confusing how and why I should provide a tree when constructing the Interpreter, which happens before the algorithm starts running.
Thanks a lot! Again, sorry if some of the questions seem dumb; I haven't had time to go through all the detailed code yet.
foolnotion commented on Oct 2, 2023
The symbolic boolean flag was meant to configure the algorithm in a way that promotes "nice" models (formulas):
In general, I've noticed that the choice of creator does not make a difference in algorithm performance. I would recommend using the BalancedTreeCreator, which imho is a better version of PTC2. It may also be beneficial to limit the max tree size during initialization to a smaller value (5-15 nodes), while keeping the max tree size during the run at a larger value.
Yes, this was a big change from before, made in the interest of making it easier to program the entire tree evaluation / optimization infrastructure and its integration with likelihoods.
The tree is kept in the Genotype property of the Individual: https://github.com/heal-research/operon/blob/cpp20/include/operon/core/individual.hpp#L18
So normally you'd want to use an interpreter in a context where you already have an individual, and then you'd pass individual.Genotype to the interpreter. Similar to here: https://github.com/heal-research/operon/blob/cpp20/source/operators/evaluator.cpp#L196
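The tree-size advice in this reply can be pictured as two independent limits: a small one that bounds the random trees built at initialization, and a larger one that bounds offspring during the run. A minimal C++ sketch follows; SizeLimits, SampleInitialLength and AcceptOffspring are hypothetical names, the initial limit of 10 is simply a value inside the suggested 5-15 range, and the run limit of 50 is an arbitrary placeholder, not a recommended setting.

```cpp
#include <cstddef>
#include <random>

// Hypothetical size limits: small trees at initialization, more room during the run.
struct SizeLimits {
    std::size_t initMaxLength = 10;  // inside the suggested 5-15 node range
    std::size_t runMaxLength = 50;   // placeholder; "a larger value" than initMaxLength
};

// Pick a target length for a new random tree, uniformly in [1, initMaxLength].
std::size_t SampleInitialLength(const SizeLimits& lim, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(1, lim.initMaxLength);
    return dist(rng);
}

// Accept an offspring produced by crossover or mutation only if it fits the run limit.
bool AcceptOffspring(const SizeLimits& lim, std::size_t length) {
    return length >= 1 && length <= lim.runMaxLength;
}
```

In such a setup the creator would build a tree of roughly the sampled length, while the variation operators would discard or retry offspring that AcceptOffspring rejects, so models can still grow well beyond their initial size.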
breakds commented on Oct 2, 2023
Thank you for the explanation! I now understand why int is used for the symbolic case, and I understand more about the tree creator! One more question about Interpreter, if you don't mind.
I am actually creating the Interpreter before having anything else. This is because (I might be wrong) creating the algorithm instance (e.g. NSGA2) seems to require constructing the following chain:
Interpreter ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2
If ErrorEvaluator is supposed to evaluate all sorts of trees, which specific tree do I need to construct and pass to the Interpreter? At this stage the algorithm has not been constructed yet; does that mean I just create an arbitrary tree by hand?
Thanks!
foolnotion commented on Oct 9, 2023
Hi,
Normally you shouldn't need to initialize the interpreter yourself. The flow should be:
DispatchTable ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2
The specific type of interpreter can be passed as a template parameter to the DispatchTable. The interpreter knows how to evaluate any kind of tree (or, more accurately, any type of node inside the tree) by querying the dispatch table for the appropriate function primitive. The interpreter is meant to be a lightweight, cheap object that is initialized on the spot whenever a tree needs to be interpreted, so you'd construct an interpreter within an evaluator context when you already have a tree. You do not need to construct an interpreter manually before the algorithm runs.
If you show me your code, I can help further.
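The flow described in this reply (dispatch table first, then an evaluator that holds it, with an interpreter created on the spot for each tree) can be illustrated with a deliberately simplified C++ sketch. Everything here is hypothetical and far smaller than Operon: trees are postfix vectors of nodes, Individual mirrors only the Genotype field mentioned earlier in the thread, evaluation is scalar and row by row, and the error is mean squared error.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical node kinds; a tree is a postfix sequence of nodes.
enum class Op { Const, Var, Add, Mul };
struct Node { Op op; double value; std::size_t var; };
using Tree = std::vector<Node>;
struct Individual { Tree Genotype; };
using Row = std::vector<double>;

// Long-lived: built once, before the algorithm runs. Maps binary ops to functions.
using DispatchTable = std::unordered_map<Op, std::function<double(double, double)>>;

// Short-lived: holds references only, so constructing one per tree is cheap.
class Interpreter {
public:
    Interpreter(const DispatchTable& dt, const Tree& tree) : dt_(dt), tree_(tree) {}
    double Evaluate(const Row& row) const {
        std::vector<double> stack;
        for (const Node& n : tree_) {
            if (n.op == Op::Const) { stack.push_back(n.value); continue; }
            if (n.op == Op::Var) { stack.push_back(row.at(n.var)); continue; }
            if (stack.size() < 2) throw std::runtime_error("malformed tree");
            const double b = stack.back(); stack.pop_back();
            const double a = stack.back(); stack.pop_back();
            stack.push_back(dt_.at(n.op)(a, b));  // look up the primitive
        }
        if (stack.size() != 1) throw std::runtime_error("malformed tree");
        return stack.back();
    }
private:
    const DispatchTable& dt_;
    const Tree& tree_;
};

// The evaluator keeps the dispatch table and creates an interpreter per individual.
class ErrorEvaluator {
public:
    explicit ErrorEvaluator(const DispatchTable& dt) : dt_(dt) {}
    // Mean squared error of the individual's tree over the given rows.
    double operator()(const Individual& ind, const std::vector<Row>& rows,
                      const std::vector<double>& target) const {
        const Interpreter interp(dt_, ind.Genotype);  // constructed on the spot
        double sse = 0.0;
        for (std::size_t i = 0; i < rows.size(); ++i) {
            const double e = interp.Evaluate(rows[i]) - target[i];
            sse += e * e;
        }
        return rows.empty() ? 0.0 : sse / static_cast<double>(rows.size());
    }
private:
    const DispatchTable& dt_;
};
```

The only object the caller builds up front is the dispatch table (and the evaluator that references it); no tree is needed until an individual is actually evaluated.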
github-actions commented on Nov 9, 2023
This issue is stale because it has been open for 30 days with no activity.