CenterPoint Backbone preprocessing optimization #83

The current implementation of `scatter` has several limitations.
1. The GPU implementation hard-codes the thread bindings of its loop iterators, which might not work on certain devices. For example, with the OpenCL backend, the three-level binding quoted here cannot be mapped directly onto a GPU that supports only a one-dimensional global work size.
```
for j in T.thread_binding(0, 560, thread="blockIdx.x"):
    for k in T.thread_binding(0, 560, thread="blockIdx.y"):
        for i in T.thread_binding(0, 32, thread="threadIdx.x"):
```

2. Because the bindings are hard-coded, there is no room for optimization. Normally, we would create a `schedule` from the `IRModule` and define optimization strategies on it.

3. We need to create an optimized schedule and measure its performance.
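
To make limitation 1 concrete, here is a minimal sketch in plain Python of how the three hard-coded bindings could be collapsed into a single one-dimensional global index and recovered again. The extents (560, 560, 32) come from the snippet quoted above. The helper names `flatten` and `unflatten` are hypothetical and are not part of TVM or the existing `scatter` code. This only illustrates the index arithmetic; an actual fix would presumably express the same mapping through a schedule rather than by hand.

```python
# Extents taken from the hard-coded thread bindings quoted in this issue.
GRID_X, GRID_Y, BLOCK_X = 560, 560, 32

# Total number of work items a 1-D launch would need to cover.
TOTAL_WORK_ITEMS = GRID_X * GRID_Y * BLOCK_X


def flatten(j, k, i):
    """Map (blockIdx.x, blockIdx.y, threadIdx.x) to one 1-D global id.

    Hypothetical helper for illustration only.
    """
    return (j * GRID_Y + k) * BLOCK_X + i


def unflatten(gid):
    """Recover (j, k, i) from a 1-D global id; inverse of flatten()."""
    block, i = divmod(gid, BLOCK_X)
    j, k = divmod(block, GRID_Y)
    return j, k, i
```

With this mapping, a device that exposes only one global dimension can still enumerate every `(j, k, i)` triple by iterating over `range(TOTAL_WORK_ITEMS)` and calling `unflatten` on each id.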
