
Conversation

@congyue1977

During KANLayer initialization, the B-spline knot vector is constructed from the grid_range parameter, so it is identical across all input dimensions (in_dim). The grid tensor therefore stores in_dim redundant copies of the same row, and setting the size of its first dimension to 1 suffices. Subsequent calculations pick up the single row automatically via tensor broadcasting, and the grid update process is unaffected.
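A minimal sketch of the idea in PyTorch (the variable names mirror pykan's KANLayer but are illustrative, not the actual implementation):

```python
import torch

G, k = 5, 3                  # number of grid intervals and spline order
grid_range = [-1.0, 1.0]
in_dim = 4

# Knot vector extended by k points on each side: G + 2k + 1 knots in total.
# It depends only on grid_range, so it is the same for every input dimension.
h = (grid_range[1] - grid_range[0]) / G
knots = grid_range[0] + h * torch.arange(-k, G + k + 1)  # shape: (G + 2k + 1,)

# Before: one redundant copy per input dimension.
grid_before = knots.repeat(in_dim, 1)   # shape: (in_dim, G + 2k + 1)

# After: a single row; broadcasting expands it wherever the grid is
# combined with per-dimension activations.
grid_after = knots.unsqueeze(0)         # shape: (1, G + 2k + 1)

# Any elementwise op against a (batch, in_dim, ...) tensor broadcasts the
# size-1 dimension automatically, so downstream results are identical.
x = torch.randn(8, in_dim, 1)           # dummy activations
assert torch.equal(x - grid_before.unsqueeze(0), x - grid_after.unsqueeze(0))
```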

This optimization significantly reduces CPU/GPU memory usage. Since the knot vector has G + 2k + 1 points, each KANLayer saves (in_dim - 1) * (G + 2k + 1) grid entries. For a network of depth N whose layers share the same input dimension, the total saving is N * (in_dim - 1) * (G + 2k + 1) entries.
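For concreteness, a quick back-of-the-envelope calculation for the test configuration reported below (this arithmetic is my own illustration, not from the PR):

```python
# Hypothetical worked example: width [4, 100, 100, 100, 1], G = 100, k = 3.
G, k = 100, 3
knots = G + 2 * k + 1                          # 107 knots per spline
widths = [4, 100, 100, 100, 1]
# Each layer's in_dim is the preceding width entry.
saved = sum((in_dim - 1) * knots for in_dim in widths[:-1])
print(saved)  # 32100 grid entries saved (~125 KB in float32)
```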

Furthermore, this optimization drastically reduces KANLayer initialization time, improving overall efficiency. In testing with a large G (e.g. G = 100), a KAN of width [4,100,100,100,1], and k = 3, training previously took nearly 30 s to start on an Intel i9-12900K; after the optimization, it starts in under 1 s.

KindXiaoming and others added 12 commits July 21, 2024 20:36
…duce memory/GPU usage significantly and greatly reduce the initialization time of KANLayer.
