CenterPoint Backbone preprocessing optimization #83

@angry-crab

Description

The current implementation of scatter has some limitations.

  1. The GPU implementation hard-codes the iterator thread bindings, which might not work on certain devices. For example, with the OpenCL backend, a GPU may support only a one-dimensional global work size, whereas the loop nest below binds both blockIdx.x and blockIdx.y:
        for j in T.thread_binding(0, 560, thread = "blockIdx.x"):
            for k in T.thread_binding(0, 560, thread = "blockIdx.y"):
                for i in T.thread_binding(0, 32, thread = "threadIdx.x"):
  2. The hard-coded bindings leave no room for optimization. Normally, we would create a schedule from the IRModule and define optimization strategies on it.

  3. We need to create an optimized schedule and measure its performance.
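
To illustrate the first point, one possible way to avoid depending on a second grid dimension is to fuse the two block loops into a single one-dimensional index and recover `j` and `k` from it. The sketch below is plain Python, not TVM code; the names `GRID_X`, `GRID_Y`, `flatten` and `unflatten` are hypothetical and only the extents (560, 560) come from the snippet above. It only demonstrates the index arithmetic, under the assumption of row-major fusion.

```python
from itertools import product

GRID_X = 560  # extent bound to blockIdx.x in the snippet above
GRID_Y = 560  # extent bound to blockIdx.y in the snippet above


def flatten(j, k):
    """Fuse (j, k) into a single block index, row-major (assumed layout)."""
    return j * GRID_Y + k


def unflatten(block_id):
    """Recover (j, k) from a fused one-dimensional block index."""
    return divmod(block_id, GRID_Y)


# Every (j, k) pair of the original 2D grid is visited exactly once,
# in the same order as the nested loops.
fused = [unflatten(b) for b in range(GRID_X * GRID_Y)]
assert fused == list(product(range(GRID_X), range(GRID_Y)))
```

Whether a fused extent of 560 * 560 fits a particular device's work-size limits is a separate check, and the actual fix would presumably be expressed as a schedule transformation on the IRModule rather than by hand.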


Labels: enhancement
