
Parallelization avenues #25

@jemus42

Description

There are multiple layers of parallelization, some of which we need to rein in and others we should try to enable.

  • Any conditional method using ARFSampler internally will

    • be subject to ranger's default parallelization behavior, which may be undesirable (we could just fix it to 1 thread?)
    • probably want to use arf's own parallelization, which runs via foreach; registering a doFuture backend is an option (see the first sketch after this list)
  • Batched predictions in, e.g., SAGE could benefit from parallelization, but that would need to be balanced carefully (second sketch below).

    • The point of batching is to avoid the excessive RAM usage of predicting on all coalitions' data at once
    • Splitting that data into k chunks and parallelizing over them might just defeat the purpose and add scheduling overhead on top
    • A reasonable batch_size is probably learner-dependent anyway?
  • Some operations are embarrassingly parallel, e.g. the repeats over iter_perm permutations in PFI and friends, or more generally repeats across resampling iterations (third sketch below).

    • Since most methods use mlr3::resample() to fit the initial reference models, we need to be careful about setting up a future::plan(): I assume mlr3 will then use it for the resample step, while we might want to use it for a later step instead (or in addition).
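
For the first point, here is a minimal sketch of how the two layers could be kept apart, relying on arf's `parallel` argument (foreach-based) and its pass-through of `num.threads` to ranger. This is just one possible configuration, not necessarily how ARFSampler should wire it up:

```r
library(arf)
library(doFuture)
library(future)

registerDoFuture()                 # arf's foreach loops now dispatch via future
plan(multisession, workers = 4)

# parallel = TRUE uses the registered foreach backend; num.threads = 1 is
# forwarded to ranger so it does not multi-thread on top of the future workers.
arf_fit <- adversarial_rf(iris, num_trees = 50, parallel = TRUE, num.threads = 1)
psi     <- forde(arf_fit, iris)            # density estimation step
synth   <- forge(psi, n_synth = 100)       # sample synthetic data

plan(sequential)                   # release the parallel workers afterwards
```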
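
For the batching point, a hypothetical helper (predict_in_batches, batch_size, and the parallel switch are all made up for illustration) shows the trade-off: chunking caps peak RAM per predict() call, but parallel workers each hold their own chunk plus a copy of the learner, so memory use scales with the number of workers again:

```r
library(mlr3)
library(future.apply)

# batch_size caps how many rows go into a single predict call; the parallel
# switch exists only to make the memory trade-off visible.
predict_in_batches <- function(learner, newdata, batch_size = 10000L, parallel = FALSE) {
  chunks <- split(seq_len(nrow(newdata)),
                  ceiling(seq_len(nrow(newdata)) / batch_size))
  apply_fun <- if (parallel) future.apply::future_lapply else lapply
  preds <- apply_fun(chunks, function(rows) {
    # each worker holds its own chunk (and a learner copy), so parallel
    # execution multiplies peak memory by the number of workers
    learner$predict_newdata(newdata[rows, , drop = FALSE])$response
  })
  unlist(preds, use.names = FALSE)
}
```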
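
For the plan interaction, a hedged sketch of one option: temporarily force mlr3::resample() onto a sequential plan (plan() returns the previous plan, so it can be restored) and reserve the user's plan for the permutation loop. The loop body is a placeholder and iter_perm just mirrors the naming above:

```r
library(mlr3)
library(future)
library(future.apply)

task    <- tsk("mtcars")
learner <- lrn("regr.rpart")

oplan <- plan(sequential)          # keep mlr3::resample() single-process
rr <- resample(task, learner, rsmp("cv", folds = 3))
plan(oplan)                        # restore the user's plan afterwards

iter_perm <- 10L
perm_results <- future_lapply(seq_len(iter_perm), function(i) {
  # placeholder: permute a feature, re-predict, and score the performance drop
  i
}, future.seed = TRUE)
```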

Computing is hard.

Labels

  • enhancement: Extends package features in some way
  • performance: Doesn't make it more correct but faster or less memory hungry
  • question: Unclear how to proceed without further info / discussion
