Design of CIL Algorithms and the base class #1064
Replies: 2 comments
This is an example from PyTorch for [Stochastic Gradient Descent](https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD).
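For reference, the constructor in the linked source validates its inputs and makes the learning rate effectively required. Abridged from the PyTorch source of the time (details may differ between versions):

```python
from torch.optim.optimizer import Optimizer, required

class SGD(Optimizer):
    def __init__(self, params, lr=required, momentum=0, dampening=0,
                 weight_decay=0, nesterov=False):
        # lr has no usable default: omitting it raises an error
        if lr is not required and lr < 0.0:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if momentum < 0.0:
            raise ValueError("Invalid momentum value: {}".format(momentum))
        if weight_decay < 0.0:
            raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
        if nesterov and (momentum <= 0 or dampening != 0):
            raise ValueError("Nesterov momentum requires a momentum and zero dampening")

        defaults = dict(lr=lr, momentum=momentum, dampening=dampening,
                        weight_decay=weight_decay, nesterov=nesterov)
        super(SGD, self).__init__(params, defaults)
```

Note how the essential parameter (`lr`) is enforced while the optional ones get sensible defaults, and how every value is checked before the base class is configured.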
I believe @gfardell @jakobsj @paskino discussed a similar topic for the …

Thanks for bringing this up, it is a great topic for the developer guidelines. The agreement is that for each class there are essential and non-essential parameters. The non-essential parameters can be further divided into often-configured and advanced parameters.

To create an instance of a class, the creator of the class should require the essential and often-configured parameters as named parameters. It should not accept positional arguments. For all iterative algorithms I'd argue that …

Looking at the `Algorithm` base class, the parameters are:

CIL/Wrappers/Python/cil/optimisation/algorithms/Algorithm.py, lines 39 to 66 in 843b899

Trying to answer all your questions:

In the case of FDK the only essential parameter is the …
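A minimal sketch of that convention (class and parameter names are illustrative, not the agreed CIL API): essential and often-configured parameters are keyword-only and required, advanced ones get defaults.

```python
class MyAlgorithm:
    def __init__(self, *, initial, objective_function, step_size,
                 max_iteration=100, update_objective_interval=1):
        # The bare `*` forbids positional use: MyAlgorithm(x0, f, 0.1)
        # raises a TypeError, so every call site names its arguments.
        self.initial = initial
        self.objective_function = objective_function
        self.step_size = step_size
        self.max_iteration = max_iteration
        self.update_objective_interval = update_objective_interval
```

Calls then read as `MyAlgorithm(initial=x0, objective_function=f, step_size=0.1)`, which keeps the essential parameters explicit at every call site.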
I think it is time to decide the design style for our algorithms and the base class `Algorithm`. At the moment, we have the following algorithms: …

Very soon, more algorithms will be added to CIL, so it is urgent to decide the style that we want for our users.
In order to define our algorithms, we use 2 methods, `__init__` and `set_up`. The `__init__` calls the `set_up` method with the same signature. In most or all of the algorithms, the `__init__` method does not do anything else. In practice, using the `kwargs` in the signature of `__init__`, we have access to two important `kwargs` from the base `Algorithm` class: `max_iteration` and `update_objective_interval`.

Let's focus on a specific example, e.g., the `GD` algorithm (GradientDescent).
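Schematically, the pattern looks like this (a simplified sketch assuming the `GD` signature of the time, not the exact CIL code):

```python
from cil.optimisation.algorithms import Algorithm

class GD(Algorithm):
    def __init__(self, x_init=None, objective_function=None, step_size=None,
                 **kwargs):
        # max_iteration and update_objective_interval travel in **kwargs
        # and are consumed by the Algorithm base class
        super(GD, self).__init__(**kwargs)
        if x_init is not None and objective_function is not None:
            self.set_up(x_init=x_init, objective_function=objective_function,
                        step_size=step_size)

    def set_up(self, x_init, objective_function, step_size):
        # the actual configuration of the algorithm happens here
        ...
```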
In the `GD` class, we have: …

To configure the Gradient Descent algorithm, we need 3 things: an initial point, a step size, and a differentiable function (a `Function` class), which define the update

x^{n+1} = x^{n} - \gamma^{n}\nabla f(x^{n})

See for example GradientDescent.
In general (non-convex or strongly convex objective), the initial point is very important. Also, the step size is important for the convergence speed, and we certainly need a function that is differentiable. Finally, for an algorithm the number of iterations is also an important parameter.
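To make the roles of these three ingredients concrete, here is a toy NumPy version of the same update (illustrative only, not CIL code):

```python
import numpy as np

def gradient_descent(x0, grad_f, step_size, max_iteration):
    """Plain gradient descent: x_{n+1} = x_n - step_size * grad_f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iteration):
        x = x - step_size * grad_f(x)
    return x

# Example: minimise f(x) = ||x||^2 / 2, whose gradient is x.
x_min = gradient_descent(x0=[3.0, -2.0], grad_f=lambda x: x,
                         step_size=0.5, max_iteration=50)
```

All four quantities (initial point, step size, gradient, iteration count) must be supplied for the iteration to even be defined, which is the argument for making them required.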
At the moment, these arguments default to `None`, which in my opinion is wrong. These should be required parameters.

Let's continue with the `kwargs`. In practice, we have 2 types of `kwargs`:

1. `kwargs` that are used by the corresponding algorithm. For example, in `GD` we have `alpha` and `beta`, which are used in the `armijo_rule`, and also `rtol` and `atol`, which are used in the `should_stop` method of `GD` that basically overrides the `should_stop` of the `Algorithm` base class.
2. `kwargs` from the `Algorithm` base class, e.g., `max_iteration` and `update_objective_interval`.

At the moment the UI for `GD` is one of the two call styles sketched below.
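A hedged reconstruction of the two call styles (parameter names assumed from the `GD` signature discussed above; `x0` stands for an initialised DataContainer and `f` for a differentiable CIL `Function`):

```python
from cil.optimisation.algorithms import GD

# Style 1: configure everything through __init__
gd = GD(x_init=x0, objective_function=f, step_size=0.1,
        max_iteration=100, update_objective_interval=10)

# Style 2: pass the step size through **kwargs under the wrong name
gd = GD(x_init=x0, objective_function=f, rate=0.1,
        max_iteration=100, update_objective_interval=10)
```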
Note: In the above example, `rate` is passed as a `kwarg`, but there is no actual `rate` in the `GD` class. Therefore, it is silently ignored; the correct name is `step_size`. We need to be careful about what is passed in `kwargs` and whether it is used. For example, we need to check for the allowed `kwargs`, e.g., `atol`, `rtol`, `alpha`, `beta`.

Below are some questions:
Question 1: What do we consider as required parameters for an algorithm?

Question 2: How do we configure required parameters for an algorithm?

Question 3: What do we consider as optional parameters (`kwargs`) for an algorithm?

Question 4: What do we consider as optional parameters (`kwargs`) for the algorithm base class?

Question 5: How do we configure `kwargs` parameters? What do we want for the UI of an algorithm?
Personally, provided that we check for the allowed `kwargs` (of the algorithm and its base), I like the following UI: …

Another option: …
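Whatever the exact signatures, a minimal sketch of the allowed-kwargs check mentioned above could look like this (names and structure assumed, not the actual CIL implementation):

```python
from cil.optimisation.algorithms import Algorithm

class GD(Algorithm):
    # kwargs understood by GD itself plus those of the Algorithm base class
    ALLOWED_KWARGS = {'alpha', 'beta', 'rtol', 'atol',
                      'max_iteration', 'update_objective_interval'}

    def __init__(self, x_init, objective_function, step_size, **kwargs):
        unknown = set(kwargs) - self.ALLOWED_KWARGS
        if unknown:
            # GD(..., rate=0.1) now fails loudly instead of being ignored
            raise ValueError("Unexpected kwargs: {}".format(sorted(unknown)))
        base_kwargs = {k: v for k, v in kwargs.items()
                       if k in ('max_iteration', 'update_objective_interval')}
        super(GD, self).__init__(**base_kwargs)
        # algorithm-specific configuration would follow here
        ...
```

With such a check, the `rate` vs `step_size` confusion above would raise an error at construction time rather than silently running with a default.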
I will continue the discussion by adding more examples.