About implementing specific custom primitives #32

Open
@breakds

Description

Sorry for using GitHub issues to ask questions again. After switching to the cpp20 branch I was able to run the examples successfully. Thanks!

Now I would like to discuss whether (and how) the following can be achieved. The dataset I have is a time series, so it is ordered. This means that, in theory, primitives such as "shift 1", "shift -1" or "rolling mean" are valid primitives that can operate batch-wise on each of the original variables and on intermediate results. After briefly reading the code, mainly functions.hpp, I have a few questions on how to implement the above:

  1. It seems that there is an upper limit on the number of primitives because NodeType is a 32-bit integer. If I would like to add more primitives, should I extend this to uint64_t?
  2. It seems that each time a primitive in function.cpp is called, it is called on a batch of values of a variable or intermediate result. Is it possible to make it operate across the whole dataset (across the data point dimension, which in the time series case is the time dimension)?
  3. If the above two can be resolved, I think I can probably come up with a solution. Is there any other approach that you would recommend?
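
To make the requested primitives concrete, here is a minimal sketch of "shift" and "rolling mean" applied to one full column at a time. The names `Shift` and `RollingMean` are hypothetical helpers written for illustration; they are not part of Operon, and the choice to fill missing positions with NaN is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helpers, not part of Operon. Each one needs the whole
// series (all time steps), which is why a fixed-size batch is not enough.

// Shift a series by `lag` steps: a positive lag looks into the past
// (out[t] = in[t - lag]), a negative lag looks ahead. Positions without
// a source value are filled with NaN.
std::vector<double> Shift(std::vector<double> const& in, std::ptrdiff_t lag) {
    auto const n = static_cast<std::ptrdiff_t>(in.size());
    std::vector<double> out(in.size(), std::numeric_limits<double>::quiet_NaN());
    for (std::ptrdiff_t t = 0; t < n; ++t) {
        auto const src = t - lag;
        if (src >= 0 && src < n) {
            out[static_cast<std::size_t>(t)] = in[static_cast<std::size_t>(src)];
        }
    }
    return out;
}

// Trailing rolling mean over `window` samples, computed with a running sum.
// The first window - 1 outputs are NaN because the window is incomplete.
std::vector<double> RollingMean(std::vector<double> const& in, std::size_t window) {
    std::vector<double> out(in.size(), std::numeric_limits<double>::quiet_NaN());
    if (window == 0) { return out; }
    double sum = 0.0;
    for (std::size_t t = 0; t < in.size(); ++t) {
        sum += in[t];
        if (t >= window) { sum -= in[t - window]; }  // drop the sample leaving the window
        if (t + 1 >= window) { out[t] = sum / static_cast<double>(window); }
    }
    return out;
}
```

For example, `Shift({1.0, 2.0, 3.0, 4.0}, 1)` yields NaN, 1, 2, 3, and `RollingMean({1.0, 2.0, 3.0, 4.0}, 2)` yields NaN, 1.5, 2.5, 3.5. Both depend on neighbouring rows, so splitting the series into independent batches would change the values at batch boundaries.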

Thank you!

Activity

foolnotion (Member) commented on Sep 24, 2023

Hi, yes this is theoretically possible. This is what the Dynamic node type is supposed to do. My plan is to eventually get rid of all the hard-coded function types and only rely on functions registered at runtime.

The idea is to define a custom function and register it in the dispatch table. Here is an example:

The atomic potentials code relies on an older version of Operon but you should be able to make it work with the latest cpp20 head.

If you run into issues or find bugs please let me know.
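
As a rough illustration of the general idea of registering functions at runtime, the sketch below keeps callables in a map keyed by a 64-bit hash. `ToyDispatchTable`, `Column`, `Callable` and `Diff` are made-up names for this sketch; Operon's real DispatchTable and Dynamic node type have their own interfaces, which are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative stand-ins only, not Operon types.
using Column   = std::vector<double>;
using Callable = std::function<Column(Column const&)>;

class ToyDispatchTable {
public:
    // Register (or overwrite) the primitive stored under `hash`.
    void Register(std::uint64_t hash, Callable fn) { table_[hash] = std::move(fn); }

    // Look up the primitive and apply it; throws if nothing is registered.
    Column Call(std::uint64_t hash, Column const& arg) const {
        auto it = table_.find(hash);
        if (it == table_.end()) { throw std::out_of_range("unknown primitive"); }
        return it->second(arg);
    }

private:
    std::unordered_map<std::uint64_t, Callable> table_;
};

// Example custom primitive: first difference, with the first element set to 0.
Column Diff(Column const& x) {
    Column out(x.size(), 0.0);
    for (std::size_t t = 1; t < x.size(); ++t) { out[t] = x[t] - x[t - 1]; }
    return out;
}
```

In this toy design a primitive is identified by its key rather than by a bit in an enumeration, so adding one more primitive only means one more `Register` call, e.g. `table.Register(0xD1FF, Diff)`.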

breakds (Author) commented on Sep 24, 2023

Thanks a lot for the detailed explanation and the links to the files. I now understand the code better. Let me try to implement the idea and report back. I appreciate the prompt response!

breakds (Author) commented on Oct 2, 2023

I have been slowly learning the concepts, and I have a few more questions if you don't mind.

  1. I do not understand the if branch here. What do symbolic == true and symbolic == false mean? How is this related to the template argument being int or Scalar?
  2. There are a few options for TreeCreator. Is there a rule of thumb for picking one of them before digging into the implementation?
  3. On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply. My current, vague understanding is that trees are "symbolic formulas" generated as candidates for evaluation during the solving phase of the algorithm. Because of this (probably wrong) understanding, I find it confusing how and why I should provide a tree when constructing the Interpreter, which happens before the algorithm starts running.

Thanks a lot! Again, sorry if some of the questions seem dumb; I haven't had time to go through all of the code in detail yet.

foolnotion (Member) commented on Oct 2, 2023

> I do not understand the if branch at here. What does it mean by symbolic == true or symbolic == false? How is this related to the template argument to be int or Scalar?

The symbolic boolean flag was meant to configure the algorithm so as to promote "nice" models (formulas):

  • only integer coefficients (during the run and during initialization)
  • mutation operator configured to only support integer values
  • nonlinear least squares coefficient tuning disabled
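
One plausible way a template argument of int versus a floating-point scalar can select between integer and real-valued coefficients is sketched below. This is an assumption made for illustration, not code taken from Operon; `SampleCoefficients` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <type_traits>
#include <vector>

// Hypothetical helper: draw `n` initial coefficients between lo and hi.
// T = int    -> whole-number coefficients (the "symbolic" style),
// T = double -> real-valued coefficients.
template <typename T>
std::vector<double> SampleCoefficients(std::size_t n, T lo, T hi, std::mt19937& rng) {
    std::vector<double> out;
    out.reserve(n);
    if constexpr (std::is_integral_v<T>) {
        std::uniform_int_distribution<T> dist(lo, hi);   // closed range [lo, hi]
        for (std::size_t i = 0; i < n; ++i) { out.push_back(static_cast<double>(dist(rng))); }
    } else {
        std::uniform_real_distribution<T> dist(lo, hi);  // half-open range [lo, hi)
        for (std::size_t i = 0; i < n; ++i) { out.push_back(static_cast<double>(dist(rng))); }
    }
    return out;
}
```

With this shape, a runtime flag only has to pick the instantiation, for example `symbolic ? SampleCoefficients<int>(n, -5, 5, rng) : SampleCoefficients<double>(n, -5.0, 5.0, rng)`.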

> There are a few options for TreeCreator. Is there a rule of thumb to pick from those candidates before digging into the implementation?

In general I've noticed that the choice of creator does not make a difference in algorithm performance. I would recommend the BalancedTreeCreator, which imho is a better version of PTC2. It may also be beneficial to cap the maximum tree size during initialization at a smaller value (5-15 nodes) while keeping the maximum tree size during the run at a larger value.

> On c++20 branch, In order to create an Interpreter, it requires having a dispatch table, a dataset and a tree. I am not sure what the tree that I should supply to it.

Yes, this was a big change from before, in the interest of making it easier to program the entire tree evaluation / optimization infrastructure and integration with likelihoods.

The tree is kept in the Genotype property of the Individual https://github.com/heal-research/operon/blob/cpp20/include/operon/core/individual.hpp#L18

So normally you'd want to use an interpreter in a context where you already have an individual, so then you'd pass individual.Genotype to the interpreter.

Similar to here: https://github.com/heal-research/operon/blob/cpp20/source/operators/evaluator.cpp#L196

breakds (Author) commented on Oct 2, 2023

Thank you for the explanation! I now understand why int is used for the symbolic case, and I know more about the tree creators!

One more question about Interpreter if you don't mind.

I am actually creating the Interpreter before I have anything else. This is because (I might be wrong) creating the algorithm instance (e.g. NSGA2) seems to require constructing the following, in order:

Interpreter ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

If ErrorEvaluator is going to be able to evaluate all sorts of trees, which specific tree do I need to construct and provide to the Interpreter? At this stage the algorithm has not been constructed yet. Does that mean I just create an arbitrary tree by hand?

Thanks!

foolnotion (Member) commented on Oct 9, 2023

Hi,

> I am actually creating the Interpreter before having anything yet. This is because (I might be wrong) to create the algorithm instance (e.g. NSGA2), it seems that the following need to be constructed:

Normally you shouldn't need to initialize the interpreter yourself.

The flow should be:
DispatchTable ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

The specific type of interpreter can be passed as a template parameter to the DispatchTable.

> If ErrorEvaluator is going to be able to evaluate all sorts of trees, which specific tree do I need to construct to provide to the Interpreter?

The interpreter will know how to evaluate any kind of tree (or, more accurately, any type of node inside the tree) by querying the dispatch table for the appropriate function primitive. The interpreter is meant to be a lightweight cheap object initialized on the spot whenever a tree needs to be interpreted (so you'd construct an interpreter within an evaluator context when you already have a tree). You do not need to construct an interpreter manually before the algorithm.
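
The "construct it on the spot" pattern can be sketched with toy types as follows. Everything here (`ToyInterpreter`, `Table`, `Tree`, `Node`, `SumSquaredError`) is a hypothetical stand-in and not Operon's API; the sketch also assumes a well-formed postfix tree containing only variables and unary primitives.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-ins, not Operon types.
using Column  = std::vector<double>;
using Dataset = std::vector<Column>;           // one column per variable

enum class Kind { Variable, Unary };
struct Node { Kind kind; std::size_t id; };    // id: column index or primitive key
using Tree  = std::vector<Node>;               // nodes in postfix order
using Table = std::unordered_map<std::size_t, std::function<Column(Column const&)>>;

// Cheap to construct: it only stores references to objects that already exist.
class ToyInterpreter {
public:
    ToyInterpreter(Table const& table, Dataset const& data, Tree const& tree)
        : table_(table), data_(data), tree_(tree) {}

    // Evaluate the tree; each primitive is looked up in the table by key.
    Column Evaluate() const {
        std::vector<Column> stack;
        for (auto const& node : tree_) {
            if (node.kind == Kind::Variable) {
                stack.push_back(data_.at(node.id));
            } else {
                Column arg = std::move(stack.back());
                stack.pop_back();
                stack.push_back(table_.at(node.id)(arg));
            }
        }
        return stack.back();
    }

private:
    Table const& table_;
    Dataset const& data_;
    Tree const& tree_;
};

// The evaluator is the long-lived object; it builds an interpreter only
// once it has been handed a concrete tree.
double SumSquaredError(Table const& table, Dataset const& data,
                       Column const& target, Tree const& tree) {
    Column const pred = ToyInterpreter{table, data, tree}.Evaluate();
    double sse = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        double const d = pred[i] - target[i];
        sse += d * d;
    }
    return sse;
}
```

Nothing tree-specific is needed before the run starts: only the table and the dataset exist up front, and a temporary interpreter is created for each tree that the evaluator receives.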

If you show me your code I can assist more.

github-actions commented on Nov 9, 2023

This issue is stale because it has been open for 30 days with no activity.


        About implementing specific custom primitives · Issue #32 · heal-research/operon