A collection of tiny machine learning models for semantic image segmentation on IoT devices, written in Julia/Flux.
Besides regular mask outputs, all models deliver their internal feature maps as additional outputs, which are useful for model compression through knowledge distillation.
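For example, a forward pass might be unpacked as below; this is a sketch that assumes the mask comes first, followed by the internal feature maps (check each model's actual return signature):
model = UNet5()
x = rand(Float32, 256, 256, 3, 1)   # WHCN layout: width, height, channels, batch
outputs = model(x)
mask = first(outputs)               # segmentation mask (assumed to come first)
features = outputs[2:end]           # feature maps, usable in distillation losses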
UNet5 is the classic U-Net architecture, with five encoder/decoder levels. UNet4 has four levels.
Reference:
- "U-Net: Convolutional Networks for Biomedical Image Segmentation" (arXiv). Credits: Ronnenberger, Olaf; Fischer, Philipp; and Brox, Thomas.
Mobile-Unet has the same encoder structure as the MobileNetV2 classification model and follows the same U-shape and skip-connection principles as the U-Net.
Reference:
- "Mobile-Unet: An efficient convolutional neural network for fabric defect detection" (doi.org). Credits: Jing, Junfeng; Wang, Zhen; Ratsch, Matthias; and Zhang, Huanhuan.
ESPNet utilizes the Efficient Spatial Pyramid module and the PReLU nonlinearity.
Reference:
- "ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation" (arXiv). Credits: Mehta, Sachin; Rastegari, Mohammad; Caspi, Anat; Shapiro, Linda; and Hajishirzi, Hannaneh.
PReLU is a trainable nonlinearity, which is incorporated in ESPNet.
Reference:
- "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" (arXiv). Credits: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
Credits for the original architectures go to the authors of the references listed above.
Credits for the implementations in Julia/Flux go to Ciro B Rosa.
- GitHub: https://github.com/cirobr
- LinkedIn: https://www.linkedin.com/in/cirobrosa/
With no arguments, all models accept 3-channel Float32 input and deliver a 1-channel mask with sigmoid output activation.
model = UNet5() # sigmoid output
model = UNet4(3,1) # sigmoid output
If ch_out > 1, the output activation becomes softmax. For instance, a model with 3-channel input and 2-channel output is created as:
model = UNet5(3,2) # softmax output
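A short usage sketch for the model above (the 256x256 spatial size is illustrative, and the mask is assumed to be the first output, as noted earlier):
x = rand(Float32, 256, 256, 3, 1)   # WHCN input with 3 channels
mask = first(model(x))              # softmax mask of size (256, 256, 2, 1)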
# UNet5() and UNet() build the same classic U-Net
UNet5(3, 1; # input/output channels
activation = relu, # activation function
)
UNet4(3, 1; # input/output channels
activation = relu, # activation function
)
MobileUNet(3, 1; # input/output channels
activation = relu6, # activation function
)
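The activation keyword presumably accepts any Flux activation function; a hedged example swapping relu6 for hardswish:
model = MobileUNet(3, 1; activation = hardswish)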
# ESPNet() builds the model with alpha2 = 5 and alpha3 = 8, which differ from the constructor defaults
ESPNet(3, 1; # input/output channels
activation = "prelu" # activation function (if "prelu", use between quotes)
)
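With no arguments, the same defaults apply, so the call below presumably builds an identical model:
model = ESPNet()   # equivalent to ESPNet(3, 1; activation = "prelu")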
Constructors are lower-level model builders that expose a multitude of hyperparameters. Each model above has been built with the aid of these constructors, with hyperparameters chosen for performance.
# unet5() and unet() build the same classic U-Net
unet5(3, 1; # input/output channels
activation = relu, # activation function
alpha = 1, # channels divider
edrops = (0.0, 0.0, 0.0, 0.0, 0.0), # dropout rates
ddrops = (0.0, 0.0, 0.0, 0.0), # dropout rates
)
unet4(3, 1; # input/output channels
activation = relu, # activation function
alpha = 1, # channels divider
edrops = (0.0, 0.0, 0.0, 0.0, 0.0), # dropout rates
ddrops = (0.0, 0.0, 0.0, 0.0), # dropout rates
)
Both unet5() and unet() call the same classic U-Net with five encoder/decoder stages, each delivering features with, respectively, 64, 128, 256, 512, and 1024 channels, divided by the alpha argument.
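As a hedged example, alpha = 2 halves every channel width, yielding a lighter model that can serve as a distillation student:
student = unet5(3, 1; alpha = 2)   # feature channels divided by 2 at every stage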
mobileunet(3, 1; # input/output channels
activation = relu6, # activation function
edrops = (0.0, 0.0, 0.0, 0.0, 0.0), # dropout rates
ddrops = (0.0, 0.0, 0.0, 0.0), # dropout rates
)
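A hedged example enabling dropout regularization (the stage ordering of the rates is an assumption):
model = mobileunet(3, 1;
        edrops = (0.0, 0.1, 0.1, 0.2, 0.2),   # encoder dropout rates
        ddrops = (0.2, 0.1, 0.1, 0.0),        # decoder dropout rates
)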
# ConvPReLU is incorporated; there is no need to pass an activation function
espnet(3, 1; # input/output channels
activation = "prelu", # activation function (if "prelu", use between quotes)
alpha2 = 2, # expansion factor in encoder stage 2
alpha3 = 3, # expansion factor in encoder stage 3
edrops = (0.0, 0.0, 0.0), # dropout rates for encoder
ddrops = (0.0, 0.0), # dropout rates for decoder
)
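For reference, the default ESPNet() model can presumably be reproduced through the constructor by passing the expansion factors noted earlier:
model = espnet(3, 1; alpha2 = 5, alpha3 = 8)   # matches the ESPNet() defaults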
PReLU(ch) # number of channels
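A short usage sketch of the standalone PReLU layer (one trainable slope per channel is an assumption based on the reference paper):
act = PReLU(16)                     # trainable nonlinearity for 16 channels
x = rand(Float32, 32, 32, 16, 1)
y = act(x)                          # output has the same size as the input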
