Skip to content

Latest commit

 

History

History
521 lines (263 loc) · 51.4 KB

File metadata and controls

521 lines (263 loc) · 51.4 KB

Scientific Report: Optimal Kernel Size for Convolutional Neural Networks

Introduction

This report investigates the question of what constitutes the best kernel size for Convolutional Neural Networks (CNNs). Given the wide range of applications and architectures, this report aims to synthesize findings from a selection of recent research papers to offer a nuanced perspective on kernel size selection. The report considers not only the direct impact of kernel size on accuracy but also its interplay with factors such as computational cost, hardware efficiency, model robustness, and the specific task at hand.

Background

Convolutional Neural Networks (CNNs) have achieved remarkable success in various machine learning tasks, particularly in image processing and computer vision. The convolutional layer, a fundamental building block of CNNs, utilizes kernels to extract features from input data. The size of these kernels, often referred to as the filter size, is a crucial hyperparameter that determines the receptive field of the neurons and influences the network's ability to capture patterns and dependencies in the data.

Historically, smaller kernels such as 3x3 have been preferred due to their computational efficiency and ability to capture local features. However, recent research has challenged this paradigm, exploring the benefits of larger kernels, adaptive kernel sizes, and alternative kernel shapes to enhance model performance and address specific limitations. This report seeks to synthesize these recent developments and provide a comprehensive overview of the considerations involved in kernel size selection for CNNs.

Literature Review and Analysis

1. Adaptive Kernel Sizes and Dynamic Convolution

Several papers emphasize the benefits of adaptive kernel sizes, where the network learns to adjust the kernel size based on the input data. Tek, Çam, and Karlı [2] proposed an adaptive kernel learning approach using a differentiable Gaussian envelope, demonstrating statistically significant improvements over fixed-size kernels. They found that larger adaptive kernels (7x7, 9x9) often perform best, as the adaptivity allows them to adjust their receptive field to the most relevant features.

Chen et al. [6] introduced Dynamic Convolution, where multiple convolution kernels are aggregated dynamically based on input-dependent attentions. While this approach favors smaller kernels for individual components due to computational efficiency, it offers a mechanism to combine them effectively for increased representational power. This suggests that smaller kernels with dynamic combination can rival the performance of static large kernels.

Romero et al. [4] proposed FlexConv, which learns kernel sizes during training and explores continuous kernel convolutions. They argue that different layers benefit from different kernel sizes and that fixing kernel sizes a priori is suboptimal. Their results suggest that kernels should generally increase in size as you go deeper into the network. Similarly, Romero and Zeghidour [18] introduced DNArch, a method for jointly learning the weights and the architecture of CNNs by backpropagation, allowing for learning the size of convolutional kernels at each layer. They found that large kernels are often beneficial in 1D tasks, while diverse, rectangular kernels are often effective in 2D tasks.

Li et al. [3] proposed Omni-dimensional Dynamic Convolution (ODConv), which dynamically adjusts convolutional kernels along various dimensions, including spatial size. While not directly optimizing kernel size, this approach emphasizes that how you use a kernel is just as important as the size itself, with the spatial attention mechanism potentially compensating for a non-optimal kernel size.

These approaches highlight that there isn't a single "best" kernel size for all CNNs and tasks. The optimal size can depend on the network architecture, the dataset, and the specific layer within the network.

2. Large Kernel Convolutions

Several recent studies advocate for the use of large kernel convolutions, challenging the conventional wisdom of stacking small kernels. Luo et al. [5] strongly advocate for large kernel convolutions (specifically a decomposed 21x21 kernel) for single image dehazing, arguing that they increase the receptive field and capture more structured information. However, they address the computational challenges by decomposing the large kernel into a combination of smaller and dilated convolutions.

Xu et al. [11] proposed ParCNetV2, which leverages "oversized convolutions" (kernels with a size approximately twice the input feature size) to model long-range dependencies and achieve implicit positional encoding. Their experiments demonstrate improved performance on image classification, object detection, and semantic segmentation tasks.

Li et al. [7] challenges the notion of simply increasing kernel size, proposing a shift-wise operator that achieves the effects of large kernel sizes using smaller convolutions combined with a shifting mechanism. This operator aims to capture long-range dependencies sparsely, mimicking the sparse attention mechanism in Transformers.

Liu et al. [17] investigated scaling kernel sizes beyond 51x51 using sparsity. Their proposed SLaK (Sparse Large Kernel Network) architecture, based on sparse factorized 51x51 kernels, achieves performance on par with or better than state-of-the-art hierarchical Transformers and modern ConvNets.

These studies suggest that large kernels can be effective for capturing long-range dependencies and improving performance, but they often require careful design and optimization to address the associated computational challenges.

3. Trade-offs and Efficient Implementations

The increased computational cost of larger kernels is a recurring theme in the literature. Ma et al. [1] and [9] introduced hyper-convolutions to decouple kernel size from the number of learnable parameters, enabling the use of larger kernels with improved parameter efficiency. They found that larger kernels are generally better for tasks requiring a larger receptive field but that simply increasing kernel size in standard convolutions leads to overfitting.

Chen et al. [12] proposed XSepConv, which fuses spatially separable convolutions into depthwise convolution to reduce computational cost and parameter size of large kernels. Their experiments demonstrate that replacing 5x5 depthwise convolutions in MobileNetV3-Small with XSepConv leads to improved accuracy with fewer parameters and FLOPs.

Chen et al. [14] also found that by simply replacing 3x3 filters with 5x5 filters in depthwise convolutions in MobileNetV1-0.25, the accuracy on ImageNet increased by more than 1%. This demonstrates that larger kernels in depthwise convolutions can improve accuracy for image classification. They also introduced a "DDC layer" which combines depthwise convolutions with dilated convolutions, reducing computational costs.

Hoang and Jo [15] introduced PydMobileNet, which uses a pyramid of kernel sizes within the Depthwise Separable Convolution operation. Their experiments show that using a combination of kernel sizes (3x3, 5x5, 7x7) generally leads to better performance than using a single 3x3 kernel size.

Chen et al. [10] used dynamic multi-scale convolution to incorporate multiple kernel sizes (3x3 and 5x5) in parallel paths to capture features at different scales. They also explored depth-wise separable convolutions with kernel sizes of 5x5 and 7x7 as a lightweight alternative to standard convolutions.

These studies suggest that a multi-scale approach, where a combination of different kernel sizes are used to capture features at different scales, can be an effective strategy. They also highlight the importance of using efficient implementations, such as depthwise separable convolutions and dynamic convolution, to mitigate the computational cost of larger kernels.

4. Hardware Considerations

Several papers emphasized the importance of considering the target hardware when selecting kernel sizes. Chen et al. [14] designed a hardware architecture that efficiently handles depthwise convolutions with kernels larger than 3x3. Li et al. [2024.12736v1] argued against simply increasing kernel size due to hardware limitations. Chen et al. [2020.02.12046v1] addressed the challenges of using even-sized kernels due to the "shift problem" and emphasized the importance of symmetric padding to improve generalization abilities. They also proposed an improved symmetric padding strategy, performing symmetric padding within four successive even-sized convolution layers instead of a single even-sized convolution layer.

Chen et al. [111] focused on tailoring the hyperparameters of a wide-kernel CNN to fit different bearing fault vibration datasets, demonstrating that dataset properties, such as sampling rate, impact the optimal values of hyperparameters.

Chen et al. [2019.12.12405v2] introduced a genetic algorithm-based technique to reduce the efforts of finding the optimal combination of a hyper-parameter (kernel size) of a convolutional neural network-based architecture, directly addressing the research question.

Chen et al. [24] and Jiang et al. [2410.03435v1] proposed FPGA-based accelerators for CNNs. These papers considered the trade-offs and proposed a hardware design that supports various kernel sizes efficiently. They found that their design has shorter computational time for depthwise convolutions with either 3x3 or 5x5 kernels. The architecture is designed such that increasing kernel size does not significantly decrease computational efficiency.

Chen et al. [2020.02.2711:46:17Z] further emphasized that the choice of hyperparameters for a neural network can be different depending on the target machine.

These papers highlight that the optimal kernel size is not only determined by the network's accuracy but also by the characteristics of the target hardware. A kernel size that is theoretically optimal might not be practical due to hardware limitations.

5. Specific Applications

Some papers focused on specific applications, providing insights into the optimal kernel size within that particular context.

Yang et al. [2024.10.03 01:19:21Z] explored kernel sizes for pancreas segmentation in CT images. They found that a combination of different kernel sizes (e.g., 3x3 and 5x5) can be beneficial for capturing features at multiple scales and that depth-wise separable convolutions can be an effective way to reduce the computational cost of larger kernels.

Luo et al. [2022.09.05 06:56:48Z] strongly advocated for large kernel convolutions (specifically a decomposed 21x21 kernel) for single image dehazing, provided that the computational challenges are addressed.

Shan et al. [2020.01.19 04:21:51Z] argued that a combination of different kernel sizes is better than using a single, fixed kernel size for video action recognition. The best-performing configuration tested in the ablation study used kernel sizes of 1, 3, 5, and 7.

Ahmad et al. [2022.01.04 06:30:24Z] suggested that using a combination of different kernel sizes in a CNN architecture, specifically within a hybrid 2D/3D Inception network framework enhanced with an attention mechanism, can significantly improve hyperspectral image classification accuracy.

Chen et al. [2021.04.06 05:52:25Z] used 3x3 kernels in deformable convolution layers and 2x2 kernels for max-pooling in a network for change detection in SAR images.

Chen et al. [2024.11.19 09:17:13Z] tailored hyperparameters, including kernel width, of a wide-kernel CNN to fit different bearing fault vibration datasets.

These studies highlight the importance of considering the specific characteristics of the task and dataset when selecting kernel sizes. The optimal kernel size for one application may not be optimal for another.

6. Alternatives to Large Kernels

Several papers explored alternative approaches to achieve the benefits of large receptive fields without directly using large kernels.

Li et al. [2024.01.23 13:13:45Z] proposed a "shift-wise operator" that achieves the effects of large kernel sizes using smaller, more hardware-friendly convolutions combined with a shifting mechanism.

Romero et al. [2021.10.15 12:35:49Z] proposed FlexConv, which learns kernel sizes during training and explores continuous kernel convolutions. They found that varying kernel sizes allows the optimal size to be learned, and that the best architecture is somewhere in the middle of small and large kernel sizes.

Hoang and Jo [2018.11.17 02:58:31Z] proposed PydMobileNet, which uses a pyramid of kernel sizes within the Depthwise Separable Convolution operation.

Chen et al. [2020.02.27 11:46:17Z] introduced XSepConv, which fuses spatially separable convolutions into depthwise convolution to reduce the computational cost and parameter size of large kernels.

These approaches suggest that the benefits of large receptive fields can be achieved without directly using large kernels, offering a more efficient and flexible way to design CNN architectures.

7. Challenging the Assumption of Large Kernels

Some recent papers challenge the assumption that large kernels are the primary driver of high performance.

Yasuki and Taki [2024.03.11 12:48:22Z] argued that improvements in feature map quality are more critical than large effective receptive fields for downstream tasks like Weakly Supervised Object Localization (WSOL). They found that modern CNNs, such as ConvNeXt and RepLKNet, generate improved feature maps that address inherent problems in traditional Class Activation Mapping (CAM) methods used for WSOL.

Gavrikov and Keuper [2023.01.26 19:17:10Z] demonstrated that modern CNN architectures can achieve high test accuracies without updating randomly initialized spatial convolution filters. Instead, 1x1 convolutions provide the necessary recombination to create expressive network operators.

These papers suggest that other factors, such as feature map quality and the presence of linear combinations, can be just as important as kernel size in determining CNN performance.

8. Dataset and Task Specific

Many studies highlighted the importance of tailoring kernel size selection to the specific dataset and task at hand.

Tomen and van Gemert [2021.10.15 12:35:49Z] found that the benefits of larger kernel sizes are more pronounced when spectral leakage is addressed using a Hamming window. They also found that a 7x7 kernel size combined with a Hamming window provides better robustness against adversarial attacks.

Yang et al. [2017.03.15 00:52:50Z] advocated for using multiple parallel hybrid sub-nets, each employing different kernel sizes, for hyperspectral image classification. They argued that convolutions with different spatial sizes can capture more discriminative and important information for pixel-based HSIC.

Kulwa et al. [2022.09.13 08:14:49Z] found that larger kernel sizes (specifically 7x7 in their study) are generally better for CNNs used in sEMG-based motion intent classification with raw sEMG signals.

Hudson et al. [2024.11.19 09:17:13Z] emphasized that the optimal kernel size in the first layer of a wide-kernel CNN is highly dependent on the dataset when applied to bearing fault detection.

Yang et al. [2024.10.03 01:19:21Z] found that 5x5 convolutional layers achieved the best segmentation for pancreas segmentation in CT images.

Li et al. [2020.01.19 04:21:51Z] argued that the best approach is to use a mixture of different sizes, implemented in a depthwise convolutional manner to improve both accuracy and efficiency for action recognition tasks.

These studies highlight that the characteristics of the dataset and the requirements of the task play a crucial role in determining the optimal kernel size.

Conclusion

Based on the synthesis of information from the reviewed papers, the following conclusions can be drawn:

  1. No Universally Optimal Kernel Size: The research suggests that there is no single "best" kernel size for all CNNs and tasks. The optimal kernel size depends on a complex interplay of factors, including network architecture, dataset characteristics, computational constraints, and the desired trade-offs between accuracy, efficiency, and robustness.

  2. Context-Specific Optimization: The best approach involves tailoring the kernel size selection to the specific context of the application. This includes considering the following factors:

    • The type of data being processed (e.g., images, audio, time series).
    • The complexity of the features being extracted (e.g., local details, long-range dependencies).
    • The desired trade-off between accuracy and computational cost.
    • The target hardware platform and its architectural characteristics.
  3. Large Kernels Can Be Effective (With Caveats): Recent research suggests that larger kernels can be beneficial for capturing long-range dependencies and improving performance, particularly on tasks requiring a large receptive field. However, simply increasing kernel size can lead to increased computational cost and overfitting. Therefore, careful design and optimization techniques are needed to effectively leverage large kernels.

  4. Adaptive and Multi-Scale Approaches Are Promising: Several studies advocate for adaptive kernel sizes, where the network learns to adjust the kernel size based on the input data. Multi-scale approaches, where a combination of different kernel sizes are used to capture features at multiple scales, are also shown to be effective.

  5. Hardware Considerations Are Critical: The choice of kernel size should consider the target hardware platform and its limitations. Efficient hardware implementations, such as those using depthwise separable convolutions, tensor decompositions, and specialized FPGA architectures, can help to mitigate the computational cost of larger kernels.

  6. Smaller Kernels Can Be Sufficient (If Used Strategically): Smaller kernels can achieve similar performance to larger kernels if they are combined with other techniques to increase the receptive field, such as dilated convolutions, shift-wise operators, or stacking multiple layers. In some cases, smaller kernels may even be preferred due to their reduced computational cost and potential for improved generalization.

  7. The Future of Kernel Size Selection: The trend towards automated neural architecture search (NAS) suggests that the future of kernel size selection will involve algorithms that can automatically discover the optimal kernel sizes (and other architectural parameters) for a given task and hardware platform.

References

[1] Tianyu Ma, Adrian V. Dalca, Mert R. Sabuncu, "Hyper-Convolution Networks for Biomedical Image Segmentation", arXiv preprint:2105.10559v2, 2021.

[2] F. Boray Tek, İlker Çam, Deniz Karlı, "Adaptive Convolution Kernel for Artificial Neural Networks", arXiv preprint:2009.06385v1, 2020.

[3] Chao Li, Aojun Zhou, Anbang Yao, "Omni-Dimensional Dynamic Convolution", arXiv preprint:2209.07947v1, 2022.

[4] David W. Romero, Robert-Jan Bruintjes, Jakub M. Tomczak, Erik J. Bekkers, Mark Hoogendoorn, Jan C. van Gemert, "FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes", arXiv preprint:2110.08059v3, 2021.

[5] Pinjun Luo, Guoqiang Xiao, Xinbo Gao, Song Wu, "LKD-Net: Large Kernel Convolution Network for Single Image Dehazing", arXiv preprint:2209.01788v1, 2022.

[6] Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu, "Dynamic Convolution: Attention over Convolution Kernels", arXiv preprint:1912.03458v2, 2019.

[7] Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li, "Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects", arXiv preprint:2401.12736v1, 2024.

[8] Shuang Wu, Guanrui Wang, Pei Tang, Feng Chen, Luping Shi, "Convolution with even-sized kernels and symmetric padding", arXiv preprint:1903.08385v2, 2019.

[9] Tianyu Ma, Alan Q. Wang, Adrian V. Dalca, Mert R. Sabuncu, "Hyper-Convolutions via Implicit Kernels for Medical Imaging", arXiv preprint:2202.02701v1, 2022.

[10] Jin Yang, Daniel S. Marcus, Aristeidis Sotiras, "DMC-Net: Lightweight Dynamic Multi-Scale and Multi-Resolution Convolution Network for Pancreas Segmentation in CT Images", arXiv preprint:2410.02129v1, 2024.

[11] Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang, "ParCNetV2: Oversized Kernel with Enhanced Attention", arXiv preprint:2211.07157v3, 2022.

[12] Jiarong Chen, Zongqing Lu, Jing-Hao Xue, Qingmin Liao, "XSepConv: Extremely Separated Convolution", arXiv preprint:2002.12046v1, 2020.

[13] Animesh Singh, Sandip Saha, Ritesh Sarkhel, Mahantapas Kundu, Mita Nasipuri, Nibaran Das, "A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network", arXiv preprint:1912.12405v2, 2019.

[14] Tse-Wei Chen, Wei Tao, Deyu Wang, Dongchao Wen, Kinya Osa, Masami Kato, "Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions", arXiv preprint:2104.14125v1, 2021.

[15] Van-Thanh Hoang, Kang-Hyun Jo, "PydMobileNet: Improved Version of MobileNets with Pyramid Depthwise Separable Convolution", arXiv preprint:1811.07083v1, 2018.

[16] Henry H. Yu, Xue Feng, Hao Sun, Ziwen Wang, "MixModule: Mixed CNN Kernel Module for Medical Image Segmentation", arXiv preprint:1910.08728v2, 2019.

[17] Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Tommi Kärkkäinen, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang, "More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity", arXiv preprint:2207.03620v3, 2022.

[18] David W. Romero, Neil Zeghidour, "DNArch: Learning Convolutional Neural Architectures by Backpropagation", arXiv preprint:2302.05400v2, 2023.

[19] Dan Hudson, Jurgen van den Hoogen, Martin Atzmueller, "Tailoring the Hyperparameters of a Wide-Kernel Convolutional Neural Network to Fit Different Bearing Fault Vibration Datasets", arXiv preprint:2411.15191v1, 2024.

[20] Paul Gavrikov, Janis Keuper, "The Power of Linear Combinations: Learning with Random Convolutions", arXiv preprint:2301.11360v2, 2023.

[21] Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie Zhou, Liming Zhang, "LDConv: Linear deformable convolution for improving convolutional neural networks", arXiv preprint:2311.11587v3, 2023.

[22] Junjie Wang, Feng Gao, Junyu Dong, "Change Detection from SAR Images Based on Deformable Residual Convolutional Neural Networks", arXiv preprint:2104.02299v1, 2021.

[23] Hamza Boukraichi, Nissrine Akkari, Fabien Casenave, David Ryckelynck, "A priori compression of convolutional neural networks for wave simulators", arXiv preprint:2304.04964v2, 2023.

[24] Kaiyu Shan, Yongtao Wang, Zhuoying Wang, Tingting Liang, Zhi Tang, Ying Chen, Yangyan Li, "MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recogntion", arXiv preprint:2001.06769v3, 2020.

[25] Muhammad Ahmad, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Swalpa Kumar Roy, Xin Wu, "Attention Mechanism Meets with Hybrid Dense Network for Hyperspectral Image Classification", arXiv preprint:2201.01001v1, 2022.

[26] Frank Kulwa, Oluwarotimi Williams Samuel, Mojisola Grace Asogbon, Olumide Olayinka Obe, Guanglin Li, "Analyzing the Impact of Varied Window Hyper-parameters on Deep CNN for sEMG based Motion Intent Classification", arXiv preprint:2209.05804v1, 2022.

[27] Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun, "Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs", arXiv preprint:2203.06717v4, 2022.

[28] Nergis Tomen, Jan van Gemert, "Spectral Leakage and Rethinking the Kernel Size in CNNs", arXiv preprint:2101.10143v2, 2021.

[29] Brosnan Yuen, "Classifying Multi-Gas Spectrums using Monte Carlo KNN and Multi-Resolution CNN", arXiv preprint:1907.02188v4, 2019.

[30] Shunsuke Yasuki, Masato Taki, "CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective", arXiv preprint:2403.06676v1, 2024.

[31] Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia, "LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs", arXiv preprint:2206.10555v2, 2022.

[32] Tianjin Huang, Tianlong Chen, Zhangyang Wang, Shiwei Liu, "The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need", arXiv preprint:2312.05695v2, 2023.

[33] Yanli Liu, Bochen Guan, Qinwen Xu, Weiyi Li, Shuxue Quan, "SMOF: Squeezing More Out of Filters Yields Hardware-Friendly CNN Pruning", arXiv preprint:2110.10842v1, 2021.

[34] Johannes C. Myburgh, Coenraad Mouton, Marelie H. Davel, "Tracking translation invariance in CNNs", arXiv preprint:2104.05997v2, 2021.

[35] Wensi Tang, Guodong Long, Lu Liu, Tianyi Zhou, Michael Blumenstein, Jing Jiang, "Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification", arXiv preprint:2002.10061v3, 2020.

[36] Anderson de Andrade, "Best Practices for Convolutional Neural Networks Applied to Object Recognition in Images", arXiv preprint:1910.13029v1, 2019.

[37] Christoph Linse, Beatrice Brückner, Thomas Martinetz, "Enhancing Generalization in Convolutional Neural Networks through Regularization with Edge and Line Features", arXiv preprint:2410.16897v1, 2024.

[38] Tharindu P. Miyanawala, Rajeev K. Jaiman, "An Efficient Deep Learning Technique for the Navier-Stokes Equations: Application to Unsteady Wake Flow Dynamics", arXiv preprint:1710.09099v3, 2017.

[39] Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang, "PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution", arXiv preprint:2403.07589v2, 2024.

[40] Ziwei Wang, Martin A. Trefzer, Simon J. Bale, Andy M. Tyrrell, "Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN", arXiv preprint:2106.14776v1, 2021.

[41] Calden Wloka, John K. Tsotsos, "An Empirical Method to Quantify the Peripheral Performance Degradation in Deep Networks", arXiv preprint:2012.02749v1, 2020.

[42] Pengpeng Yang, Wei Zhao, Rongrong Ni, Yao Zhao, "Source Camera Identification Based On Content-Adaptive Fusion Network", arXiv preprint:1703.04856v1, 2017.

[43] Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, Yunji Chen, "DWM: A Decomposable Winograd Method for Convolution Acceleration", arXiv preprint:2002.00552v1, 2020.

[44] Gagan Kanojia, Sudhakar Kumawat, Shanmuganathan Raman, "Exploring Temporal Differences in 3D Convolutional Neural Networks", arXiv preprint:1909.03309v1, 2019.

[45] Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang, "Selective Kernel Networks", arXiv preprint:1903.06586v2, 2019.

[46] Kai Zhong, Zhao Song, Inderjit S. Dhillon, "Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels", arXiv preprint:1711.03440v1, 2017.

[47] Jin Yang, Peijie Qiu, Yichi Zhang, Daniel S. Marcus, Aristeidis Sotiras, "D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation", arXiv preprint:2403.10674v2, 2024.

[48] Byeong Tak Lee, Yong-Yeon Jo, Joon-Myoung Kwon, "Optimizing Neural Network Scale for ECG Classification", arXiv preprint:2308.12492v1, 2023.

[49] Chenggang Zhao, Genghan Zhang, Ao Shen, Mingyu Gao, "Canvas: End-to-End Kernel Architecture Search in Neural Networks", arXiv preprint:2304.07741v3, 2023.

[50] Gang Wu, Junjun Jiang, Yuanchao Bai, Xianming Liu, "Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution", arXiv preprint:2303.14324v1, 2023.

[51] Hao Liang, Josue Ortega Caro, Vikram Maheshri, Ankit B. Patel, Guha Balakrishnan, "Linking convolutional kernel size to generalization bias in face analysis CNNs", arXiv preprint:2302.03750v2, 2023.

[52] Xin Jin, Hongyu Zhu, Mounîm A. El Yacoubi, Haiyang Li, Hongchao Liao, Huafeng Qin, Yun Jiang, "StarLKNet: Star Mixup with Large Kernel Networks for Palm Vein Identification", arXiv preprint:2405.12721v3, 2024.

[53] Jingbo Jiang, Xizi Chen, Chi-Ying Tsui, "Accelerating Large Kernel Convolutions with Nested Winograd Transformation.pdf", arXiv preprint:2102.13272v2, 2021.

[54] Peng Liu, Xiaoxiao Zhou, Junyiyang Li, El Basha Mohammad D, Ruogu Fang, "KRNET: Image Denoising with Kernel Regulation Network", arXiv preprint:1910.08867v1, 2019.

[55] Honghao Chen, Yurong Zhang, Xiaokun Feng, Xiangxiang Chu, Kaiqi Huang, "Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness", arXiv preprint:2407.08972v1, 2024.

[56] Jinhong Wang, Jintai Chen, Danny Chen, Jian Wu, "LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation", arXiv preprint:2403.07332v2, 2024.

[57] Kun He, Chao Li, Yixiao Yang, Gao Huang, John E. Hopcroft, "Integrating Large Circular Kernels into CNNs through Neural Architecture Search", arXiv preprint:2107.02451v4, 2021.

[58] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, "Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network", arXiv preprint:1703.02719v1, 2017.

[59] Dongheon Lee, Seokju Yun, Youngmin Ro, "Partial Large Kernel CNNs for Efficient Super-Resolution", arXiv preprint:2404.11848v1, 2024.

[60] Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu, "Are Large Kernels Better Teachers than Transformers for ConvNets?", arXiv preprint:2305.19412v1, 2023.

[61] Miaoxin Wang, Xiao Wu, Jun Lin, Zhongfeng Wang, "An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes", arXiv preprint:2402.14307v1, 2024.

[62] Shizheng Wen, Michael W. Lee, Kai M. Kruger Bastos, Earl H. Dowell, "Feature Identification in Complex Fluid Flows by Convolutional Neural Networks", arXiv preprint:2208.09663v1, 2022.

[63] Bing Su, Ji-Rong Wen, "Log-Polar Space Convolution for Convolutional Neural Networks", arXiv preprint:2107.11943v1, 2021.

[64] Meisam Rakhshanfar, "Radius Adaptive Convolutional Neural Network", arXiv preprint:1911.11079v1, 2019.

[65] Mo Zhang, Jie Zhao, Xiang Li, Li Zhang, Quanzheng Li, "ASCNet: Adaptive-Scale Convolutional Neural Networks for Multi-Scale Feature Learning", arXiv preprint:1907.03241v1, 2019.

[66] Tao Lei, Rui Sun, Xuan Wang, Yingbo Wang, Xi He, Asoke Nandi, "CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation", arXiv preprint:2306.03373v2, 2023.

[67] Gongping Chen, Lu Zhou, Jianxun Zhang, Xiaotao Yin, Liang Cui, Yu Dai, "ESKNet-An enhanced adaptive selection kernel convolution for breast tumors segmentation", arXiv preprint:2211.02915v2, 2022.

[68] Rui Sun, Tao Lei, Weichuan Zhang, Yong Wan, Yong Xia, Asoke K. Nandi, "TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation", arXiv preprint:2306.04086v3, 2023.

[69] Haoxiao Wang, Bo Peng, Jianhua Zhang, Xu Cheng, "AdaFSNet: Time Series Classification Based on Convolutional Network with a Adaptive and Effective Kernel Size Configuration", arXiv preprint:2404.18246v1, 2024.

[70] Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericas, "Accelerating CNN inference on long vector architectures via co-design", arXiv preprint:2212.11574v1, 2022.

[71] Zhihang Yuan, Xin Liu, Bingzhe Wu, Guangyu Sun, "ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference", arXiv preprint:2009.09182v1, 2020.

[72] Haiyong Chen, Yue Pang, Qidi Hu, Kun Liu, "Solar Cell Surface Defect Inspection Based on Multispectral Convolutional Neural Network", arXiv preprint:1812.06220v1, 2018.

[73] Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong, "CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks", arXiv preprint:2407.05461v1, 2024.

[74] Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen, "WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs", arXiv preprint:2107.04244v1, 2021.

[75] Zihan Yin, Akhilesh Jaiswal, "FPCA: Field-Programmable Pixel Convolutional Array for Extreme-Edge Intelligence", arXiv preprint:2408.10233v1, 2024.

[76] Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao, "MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution", arXiv preprint:2012.01684v1, 2020.

[77] Junyan Wang, Zhenhong Sun, Yichen Qian, Dong Gong, Xiuyu Sun, Ming Lin, Maurice Pagnucco, Yang Song, "Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition", arXiv preprint:2303.02693v1, 2023.

[78] Jaeseong Lee, Duseok Kang, Soonhoi Ha, "S3NAS: Fast NPU-aware Neural Architecture Search Methodology", arXiv preprint:2009.02009v1, 2020.

[79] Lukas Hahn, Lutz Roese-Koerner, Klaus Friedrichs, Anton Kummert, "Fast and Reliable Architecture Selection for Convolutional Neural Networks", arXiv preprint:1905.01924v1, 2019.

[80] Junhong Shen, Mikhail Khodak, Ameet Talwalkar, "Efficient Architecture Search for Diverse Tasks", arXiv preprint:2204.07554v3, 2022.

[81] Yuke Wang, Boyuan Feng, Xueqiao Peng, Yufei Ding, "An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks", arXiv preprint:2009.05236v4, 2020.

[82] Yu Xue, Ziming Yuan, Adam Slowik, "A Novel Sleep Stage Classification Using CNN Generated by an Efficient Neural Architecture Search with a New Data Processing Trick", arXiv preprint:2110.15277v3, 2021.

[83] Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong, "EfficientTDNN: Efficient Architecture Search for Speaker Recognition", arXiv preprint:2103.13581v5, 2021.

[84] Tamirlan Seidakhmetov, "Question Type Classification Methods Comparison", arXiv preprint:2001.00571v1, 2020.

[85] Jodie Crocker, Krishna Kumar, Brady R. Cox, "Using explainability to design physics-aware CNNs for solving subsurface inverse problems", arXiv preprint:2211.08651v2, 2022.

[86] Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin, "Principled Architecture-aware Scaling of Hyperparameters", arXiv preprint:2402.17440v1, 2024.

[87] Ruinan Wang, Ian Nabney, Mohammad Golbabaee, "Efficient Hyperparameter Importance Assessment for CNNs", arXiv preprint:2410.08920v1, 2024.

[88] Pongpak Manoret, Punnatorn Chotipurk, Sompoom Sunpaweravong, Chanati Jantrachotechatchawan, Kobchai Duangrattanalert, "Automatic Detection of Depression from Stratified Samples of Audio Data", arXiv preprint:2111.10783v1, 2021.

[89] Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim, "Dead Pixel Test Using Effective Receptive Field", arXiv preprint:2108.13576v1, 2021.

[90] Shahaf E. Finder, Roy Amoyal, Eran Treister, Oren Freifeld, "Wavelet Convolutions for Large Receptive Fields", arXiv preprint:2407.05848v2, 2024.

[91] Ho Hin Lee, Quan Liu, Shunxing Bao, Qi Yang, Xin Yu, Leon Y. Cai, Thomas Li, Yuankai Huo, Xenofon Koutsoukos, Bennett A. Landman, "Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation", arXiv preprint:2303.05785v2, 2023.

[92] Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu, "Efficient Image Super-Resolution via Symmetric Visual Attention Network", arXiv preprint:2401.08913v1, 2024.

[93] Serkan Kiranyaz, Junaid Malik, Mehmet Yamac, Mert Duman, Ilke Adalioglu, Esin Guldogan, Turker Ince, Moncef Gabbouj, "Super Neurons", arXiv preprint:2109.01594v2, 2021.

[94] Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie, "MobileMamba: Lightweight Multi-Receptive Visual Mamba Network", arXiv preprint:2411.15941v1, 2024.

[95] Tianyang Wang, Mingxuan Sun, Kaoning Hu, "Dilated Deep Residual Network for Image Denoising", arXiv preprint:1708.05473v3, 2017.

[96] Guoyi Zhang, Guangsheng Xu, Han Wang, Siyang Chen, Yunxiao Shan, Xiaohu Zhang, "Learning Dynamic Local Context Representations for Infrared Small Target Detection", arXiv preprint:2412.17401v1, 2024.

[97] Zhun Sun, Mete Ozay, Takayuki Okatani, "Design of Kernels in Convolutional Neural Networks for Image Classification", arXiv preprint:1511.09231v3, 2015.

[98] Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier, "Dilated convolution with learnable spacings", arXiv preprint:2112.03740v4, 2021.

[99] Junaid Malik, Serkan Kiranyaz, Moncef Gabbouj, "Image denoising by Super Neurons: Why go deep?", arXiv preprint:2111.14948v1, 2021.

[100] Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr, "Vision Transformers: From Semantic Segmentation to Dense Prediction", arXiv preprint:2207.09339v4, 2022.

[101] Yanwen Li, Luyang Luo, Huangjing Lin, Pheng-Ann Heng, Hao Chen, "Scale-aware Super-resolution Network with Dual Affinity Learning for Lesion Segmentation from Medical Images", arXiv preprint:2305.19063v1, 2023.

[102] Chun Bao, Jie Cao, Yaqian Ning, Yang Cheng, Qun Hao, "Rega-Net:Retina Gabor Attention for Deep Convolutional Neural Networks", arXiv preprint:2211.12698v2, 2022.

[103] Xian Lin, Zengqiang Yan, Xianbo Deng, Chuansheng Zheng, Li Yu, "ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation", arXiv preprint:2309.05674v1, 2023.

[104] Ziyu Wang, Jie Yang, Mohamad Sawan, "A Novel Multi-scale Dilated 3D CNN for Epileptic Seizure Prediction", arXiv preprint:2105.02823v1, 2021.

[105] Qingchao Zhang, Coy D. Heldermon, Corey Toler-Franklin, "Multiscale Detection of Cancerous Tissue in High Resolution Slide Scans", arXiv preprint:2010.00641v1, 2020.

[106] Jiangwei Weng, Zhiqiang Yan, Ying Tai, Jianjun Qian, Jian Yang, Jun Li, "MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space", arXiv preprint:2405.16105v1, 2024.

[107] Rulin Shao, Zhouxing Shi, Jinfeng Yi, Pin-Yu Chen, Cho-Jui Hsieh, "On the Adversarial Robustness of Vision Transformers", arXiv preprint:2103.15670v3, 2021.

[108] Lida Li, Shuai Li, Kun Wang, Xiangchu Feng, Lei Zhang, "Towards Robust 2D Convolution for Reliable Visual Recognition", arXiv preprint:2203.09790v1, 2022.

[109] Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie, "Can CNNs Be More Robust Than Transformers?", arXiv preprint:2206.03452v2, 2022.

[110] Xiaojiao Guo, Yihang Dong, Xuhang Chen, Weiwen Chen, Zimeng Li, FuChen Zheng, Chi-Man Pun, "Underwater Image Restoration via Polymorphic Large Kernel CNNs", arXiv preprint:2412.18459v1, 2024.

[111] Janani Suresh, Nancy Nayak, Sheetal Kalyani, "First line of defense: A robust first layer mitigates adversarial attacks", arXiv preprint:2408.11680v1, 2024.

[112] Guandong Li, Chunju Zhang, "Faster hyperspectral image classification based on selective kernel mechanism using deep convolutional networks", arXiv preprint:2202.06458v1, 2022.

[113] Rohan Ghosh, Anupam K. Gupta, Mehul Motani, "Investigating Convolutional Neural Networks using Spatial Orderness", arXiv preprint:1908.06416v2, 2019.

[114] Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu, "Multi-scale Unified Network for Image Classification", arXiv preprint:2403.18294v1, 2024.

[115] Tyler Highlander, Andres Rodriguez, "Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add", arXiv preprint:1601.06815v1, 2016.

[116] Jiabin Ma, Wei Wang, Liang Wang, "Irregular Convolutional Neural Networks", arXiv preprint:1706.07966v1, 2017.

[117] Roman Snytsar, "Accelerating Machine Learning Primitives on Commodity Hardware", arXiv preprint:2310.05218v1, 2023.

[118] Julia Grabinski, Janis Keuper, Margret Keuper, "As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain", arXiv preprint:2307.10001v2, 2023.

[119] Fareed Qararyah, Muhammad Waqar Azhar, Mohammad Ali Maleki, Pedro Trancoso, "Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs", arXiv preprint:2404.19331v2, 2024.

[120] Alexandre Kirchmeyer, Jia Deng, "Convolutional Networks with Oriented 1D Kernels", arXiv preprint:2309.15812v1, 2023.

[121] Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li, "Optimizing Winograd Convolution on ARMv8 processors", arXiv preprint:2411.16152v2, 2024.

[122] Kin Wai Lau, Lai-Man Po, Yasar Abbas Ur Rehman, "Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN", arXiv preprint:2309.01439v3, 2023.

[123] Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang, "LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels", arXiv preprint:2403.15173v1, 2024.

[124] Viacheslav Dudar, Vladimir Semenov, "Use of symmetric kernels for convolutional neural networks", arXiv preprint:1805.09421v1, 2018.

[125] Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, Guo-Jun Qi, "IGCV$2$: Interleaved Structured Sparse Convolutional Neural Networks", arXiv preprint:1804.06202v1, 2018.

[126] Priyadarshini Panda, Gopalakrishnan Srinivasan, Kaushik Roy, "Convolutional Spike Timing Dependent Plasticity based Feature Learning in Spiking Neural Networks", arXiv preprint:1703.03854v2, 2017.

[127] Matthias Fey, Jan Eric Lenssen, Frank Weichert, Heinrich Müller, "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", arXiv preprint:1711.08920v2, 2017.

[128] Chuhui Qiu, Bugao Liang, Matthew L Key, "Effect of Kernel Size on CNN-Vision-Transformer-Based Gaze Prediction Using Electroencephalography Data", arXiv preprint:2408.03478v1, 2024.

[129] Shiyu Li, Edward Hanson, Hai Li, Yiran Chen, "PENNI: Pruned Kernel Sharing for Efficient CNN Inference", arXiv preprint:2005.07133v2, 2020.

[130] Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park, "Decomposed Temporal Dynamic CNN: Efficient Time-Adaptive Network for Text-Independent Speaker Verification Explained with Speaker Activation Map", arXiv preprint:2203.15277v2, 2022.

[131] Mateusz Gabor, Rafał Zdunek, "Reduced storage direct tensor ring decomposition for convolutional neural networks compression", arXiv preprint:2405.10802v2, 2024.

[132] Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E, "Convolutional neural networks with low-rank regularization", arXiv preprint:1511.06067v3, 2015.

[133] Chao Li, Zhun Sun, Jinshi Yu, Ming Hou, Qibin Zhao, "Low-Rank Embedding of Kernels in Convolutional Neural Networks under Random Shuffling", arXiv preprint:1810.13098v1, 2018.

[134] Bianjiang Yang, Zi Hui, Haoji Hu, Xinyi Hu, Lu Yu, "Compressing Facial Makeup Transfer Networks by Collaborative Distillation and Kernel Decomposition", arXiv preprint:2009.07604v1, 2020.

[135] Tobias Engelhardt Rasmussen, Line H Clemmensen, Andreas Baum, "Compressing CNN Kernels for Videos Using Tucker Decompositions: Towards Lightweight CNN Applications", arXiv preprint:2203.07033v1, 2022.

[136] Anh-Huy Phan, Konstantin Sobolev, Konstantin Sozykin, Dmitry Ermilov, Julia Gusak, Petr Tichavsky, Valeriy Glukhov, Ivan Oseledets, Andrzej Cichocki, "Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network", arXiv preprint:2008.05441v1, 2020.

[137] Sukhbinder Singh, Saeed S. Jahromi, Roman Orus, "Tensor network compressibility of convolutional models", arXiv preprint:2403.14379v2, 2024.

[138] Bijiao Wu, Dingheng Wang, Guangshe Zhao, Lei Deng, Guoqi Li, "Hybrid Tensor Decomposition in Neural Network Compression", arXiv preprint:2006.15938v3, 2020.

[139] Mete Ozay, Takayuki Okatani, "Optimization on Product Submanifolds of Convolution Kernels", arXiv preprint:1701.06123v2, 2017.

[140] Rui Lin, Ching-Yun Ko, Zhuolun He, Cong Chen, Yuan Cheng, Hao Yu, Graziano Chesi, Ngai Wong, "HOTCAKE: Higher Order Tucker Articulated Kernels for Deeper CNN Compression", arXiv preprint:2002.12663v1, 2020.

[141] Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos, "Revisiting Dynamic Convolution via Matrix Decomposition", arXiv preprint:2103.08756v1, 2021.

[142] Yash Bhalgat, Yizhe Zhang, Jamie Lin, Fatih Porikli, "Structured Convolutions for Efficient Neural Network Design", arXiv preprint:2008.02454v2, 2020.

[143] Yinan Wang, Weihong "Grace" Guo, Xiaowei Yue, "Tensor decomposition to Compress Convolutional Layers in Deep Learning", arXiv preprint:2005.13746v2, 2020.

[144] Marawan Gamal Abdel Hameed, Ali Mosleh, Marzieh S. Tahaei, Vahid Partovi Nia, "SeKron: A Decomposition Method Supporting Many Factorization Structures", arXiv preprint:2210.06299v1, 2022.

[145] Jason Chun Lok Li, Rui Lin, Jiajun Zhou, Edmund Yin Mun Lam, Ngai Wong, "A Unifying Tensor View for Lightweight CNNs", arXiv preprint:2312.09922v1, 2023.

[146] Jun-Gi Jang, Chun Quan, Hyun Dong Lee, U Kang, "FALCON: Lightweight and Accurate Convolution", arXiv preprint:1909.11321v2, 2019.

[147] Saeed Khaki, Hieu Pham, Ye Han, Andy Kuhl, Wade Kent, Lizhi Wang, "Convolutional Neural Networks for Image-based Corn Kernel Detection and Counting", arXiv preprint:2003.12025v2, 2020.

[148] Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, "Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks", arXiv preprint:1908.04471v2, 2019.

[149] Hugh Perkins, "cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL", arXiv preprint:1606.04884v1, 2016.

[150] Pai-Yu Tan, Po-Yao Chuang, Yen-Ting Lin, Cheng-Wen Wu, Juin-Ming Lu, "A Power-Efficient Binary-Weight Spiking Neural Network Architecture for Real-Time Object Classification", arXiv preprint:2003.06310v1, 2020.

[151] Juping Zhang, Gan Zheng, Toshiaki Koike-Akino, Kai-Kit Wong, Fraser Burton, "Hybrid Quantum-Classical Neural Networks for Downlink Beamforming Optimization", arXiv preprint:2408.04747v1, 2024.

[152] Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram, "Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators", arXiv preprint:2207.00068v1, 2022.

[153] Baptiste Nguyen, Pierre-Alain Moellic, Sylvain Blayac, "Evaluation of Convolution Primitives for Embedded Neural Networks on 32-bit Microcontrollers", arXiv preprint:2303.10702v1, 2023.

[154] Qianhao Chen, Jietao Chen, "Streaming Lossless Volumetric Compression of Medical Images Using Gated Recurrent Convolutional Neural Network", arXiv preprint:2311.16200v1, 2023.

[155] Wei Wei, Lingjie Xu, Lingling Jin, Wei Zhang, Tianjun Zhang, "AI Matrix - Synthetic Benchmarks for DNN", arXiv preprint:1812.00886v1, 2018.

[156] Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, Song Han, "PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution", arXiv preprint:2204.11797v2, 2022.

[157] Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh, "EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search", arXiv preprint:2111.12299v1, 2021.

[158] Azzam Alhussain, Mingjie Lin, "Hardware-Efficient Template-Based Deep CNNs Accelerator Design", arXiv preprint:2207.10723v1, 2022.

[159] Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Ng, Chau Yuen, "Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications", arXiv preprint:2003.02449v1, 2020.

[160] Xiong Jun, "FPGA deep learning acceleration based on convolutional neural network", arXiv preprint:2012.03672v1, 2020.

[161] Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky, "ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration", arXiv preprint:2305.04122v1, 2023.

[162] Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang, "Interleaved Group Convolutions for Deep Neural Networks", arXiv preprint:1707.02725v2, 2017.

[163] Renzo Andri, Beatrice Bussolino, Antonio Cipolletta, Lukas Cavigelli, Zhe Wang, "Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile", arXiv preprint:2209.12982v1, 2022.

[164] Amit Sarkar, "A Novel FPGA-based CNN Hardware Accelerator: Optimization for Convolutional Layers using Karatsuba Ofman Multiplier", arXiv preprint:2412.20393v1, 2024.

[165] Kaiheng Weng, Xiangxiang Chu, Xiaoming Xu, Junshi Huang, Xiaoming Wei, "EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design", arXiv preprint:2302.00386v1, 2023.

[166] Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni, "Optimizing Temporal Convolutional Network inference on FPGA-based accelerators", arXiv preprint:2005.03775v1, 2020.

[167] Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong, "Full-stack Optimization for Accelerating CNNs with FPGA Validation", arXiv preprint:1905.00462v1, 2019.

[168] Juan Zhong, Zheng Liu, Xi Chen, "Transformer-based models and hardware acceleration analysis in autonomous driving: A survey", arXiv preprint:2304.10891v1, 2023.

[169] Daniel T. Speckhard, Karolis Misiunas, Sagi Perel, Tenghui Zhu, Simon Carlile, Malcolm Slaney, "Neural Architecture Search for Energy Efficient Always-on Audio Models", arXiv preprint:2202.05397v2, 2022.

[170] Tse-Wei Chen, Deyu Wang, Wei Tao, Dongchao Wen, Lingxiao Yin, Tadayuki Ito, Kinya Osa, Masami Kato, "CASSOD-Net: Cascaded and Separable Structures of Dilated Convolution for Embedded Vision Systems and Applications", arXiv preprint:2104.14126v1, 2021.

[171] Andrea Mattia Garavagno, Daniele Leonardis, Antonio Frisoli, "Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor", arXiv preprint:2212.07700v2, 2022.

[172] Lin Bai, Yecheng Lyu, Xinming Huang, "A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN", arXiv preprint:2006.00053v1, 2020.

[173] Michal Pinos, Vojtech Mrazek, Lukas Sekanina, "Evolutionary Neural Architecture Search Supporting Approximate Multipliers", arXiv preprint:2101.11883v1, 2021.

[174] Surya Selvam, Vinod Ganesan, Pratyush Kumar, "FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays", arXiv preprint:2105.13434v1, 2021.

[175] Lukas Hedegaard, Alexandros Iosifidis, "Continual 3D Convolutional Neural Networks for Real-time Processing of Videos", arXiv preprint:2106.00050v3, 2021.

[176] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng, "Quantized Convolutional Neural Networks for Mobile Devices", arXiv preprint:1512.06473v3, 2015.

[177] Ruiqi Liang, Shuai Wang, Yiying Dong, Liu Li, Ying Kuang, Bohan Zhang, Yuanmu Yang, "Metasurface-generated large and arbitrary analog convolution kernels for accelerated machine vision", arXiv preprint:2409.18614v1, 2024.

[178] Maurice Yang, Mahmoud Faraj, Assem Hussein, Vincent Gaudet, "Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning", arXiv preprint:1803.05909v1, 2018.

[179] Mincheol Park, Dongjin Kim, Cheonjun Park, Yuna Park, Gyeong Eun Gong, Won Woo Ro, Suhyun Kim, "REPrune: Channel Pruning via Kernel Representative Selection", arXiv preprint:2402.17862v3, 2024.

[180] Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang, "Structured Pruning is All You Need for Pruning CNNs at Initialization", arXiv preprint:2203.02549v2, 2022.

[181] Wei Wang, Liqiang Zhu, "Reliable Identification of Redundant Kernels for Convolutional Neural Network Compression", arXiv preprint:1812.03608v1, 2018.

[182] Po-Hsiang Yu, Sih-Sian Wu, Liang-Gee Chen, "KCP: Kernel Cluster Pruning for Dense Labeling Neural Networks", arXiv preprint:2101.06686v1, 2021.

[183] Svetlana Pavlitska, Oliver Bagge, Federico Peccia, Toghrul Mammadov, J. Marius Zöllner, "Iterative Filter Pruning for Concatenation-based CNN Architectures", arXiv preprint:2405.03715v1, 2024.

[184] Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang, "Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets", arXiv preprint:2407.19394v4, 2024.

[185] Sayan Mandal, Sarthak Yadav, Atul Rai, "End-to-End Bengali Speech Recognition", arXiv preprint:2009.09615v2, 2020.

[186] Junjie Li, Ding Liu, "Information Bottleneck Theory on Convolutional Neural Networks", arXiv preprint:1911.03722v2, 2019.