[Semantic Segmentation Series] Related Work in Semantic Segmentation: ENet

2021-10-19 13:02:17  Source: Internet


ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. ArXiv, abs/1606.02147.


  # Initial block of the model:
  #         Input
  #        /     \
  #       /       \
  #maxpool2d    conv2d-3x3
  #       \       /  
  #        \     /
  #      concatenate


  # Upsampling bottleneck:
  #     Bottleneck Input
  #        /        \
  #       /          \
  # conv2d-1x1     convTrans2d-1x1
  #      |             | PReLU
  #      |         convTrans2d-3x3
  #      |             | PReLU
  #      |         convTrans2d-1x1
  #      |             |
  # maxunpool2d    Regularizer
  #       \           /  
  #        \         /
  #      Summing + PReLU
  #
  #  Params: 
  #  projection_ratio - ratio between input and output channels
  #  relu - if True: ReLU is used as the activation function; otherwise PReLU is used



  # Regular|Dilated|Downsampling bottlenecks:
  #
  #     Bottleneck Input
  #        /        \
  #       /          \
  # maxpooling2d   conv2d-1x1
  #      |             | PReLU
  #      |         conv2d-3x3
  #      |             | PReLU
  #      |         conv2d-1x1
  #      |             |
  #  Padding2d     Regularizer
  #       \           /  
  #        \         /
  #      Summing + PReLU
  #
  # Params: 
  #  dilation (bool) - if True: creating dilation bottleneck
  #  down_flag (bool) - if True: creating downsampling bottleneck
  #  projection_ratio - ratio between input and output channels
  #  relu - if True: ReLU is used as the activation function; otherwise PReLU is used
  #  p - dropout ratio



  # Asymmetric bottleneck:
  #
  #     Bottleneck Input
  #        /        \
  #       /          \
  #      |         conv2d-1x1
  #      |             | PReLU
  #      |         conv2d-1x5
  #      |             |
  #      |         conv2d-5x1
  #      |             | PReLU
  #      |         conv2d-1x1
  #      |             |
  #  Padding2d     Regularizer
  #       \           /  
  #        \         /
  #      Summing + PReLU
  #
  # Params:    
  #  projection_ratio - ratio between input and output channels

Paper structure

Abstract
Introduction

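Most current image segmentation networks are built on the VGG16 architecture. This means a large number of parameters and long inference times.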

It is also worth comparing this with Google's more recent EfficientNet architecture.
Related Work

large architectures and numerous parameters
Network architecture
Design choices

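Bottleneck:
Downsampling bottleneck:
- Main branch: three convolutions:
  first, a 2×2 projection with stride 2 that performs the downsampling;
  then a convolution of one of three kinds: a regular convolution (Conv), an asymmetric (factorized) convolution, or a dilated convolution;
  finally, a 1×1 convolution that expands the channels again.
  Every convolution is followed by Batch Normalization. The first two are also followed by PReLU, and the last one by the regularizer (Spatial Dropout).
- Side branch: max pooling followed by a padding layer.
  Max pooling reduces the spatial size of the shortcut to match the main branch and keeps the strongest local responses.
  The padding layer zero-pads the channels so that the two branches can be summed as a residual connection. The sum is followed by a PReLU.
Non-downsampling bottleneck:
- Main branch: three convolutions:
  first, a 1×1 projection;
  then a convolution (regular, asymmetric, or dilated, as above);
  finally, a 1×1 convolution that expands the channels again, with the same Batch Normalization / PReLU / regularizer arrangement.
- Side branch: an identity mapping. Only downsampling bottlenecks increase the number of channels, so no padding layer is needed here.
  The sum is followed by a PReLU.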
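The side branch of the downsampling bottleneck can be checked with a small NumPy sketch. This is an illustration only, not code from the paper or the linked repository. The helper names `max_pool_2x2` and `downsampling_shortcut` are hypothetical. The sketch works on a single (H, W, C) array without a batch axis, and the sizes (8×8 input, 16 to 64 channels) are arbitrary. The Keras `bottleneck_encoder` in this post achieves the same channel padding with `Permute` + `ZeroPadding2D`.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    # Element [i, a, j, b, k] of the reshaped array is x[2i + a, 2j + b, k].
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def downsampling_shortcut(x, out_channels):
    """Side branch of a downsampling bottleneck: max-pool, then zero-pad channels."""
    pooled = max_pool_2x2(x)
    ch_pad = out_channels - x.shape[-1]
    # Zero channels are appended after the existing ones, so the result has the
    # same shape as the main-branch output and the two can be added element-wise.
    return np.pad(pooled, ((0, 0), (0, 0), (0, ch_pad)))


rng = np.random.default_rng(0)
x = rng.random((8, 8, 16))
skip = downsampling_shortcut(x, 64)
print(skip.shape)  # (4, 4, 64)
```

The zero channels carry no information, which is why the paper calls this shortcut cheap. All learned capacity for the new channels comes from the main branch.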

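Feature map resolution: Downsampling loses spatial information, which hurts edge detail. Yet the output must have the same resolution as the input, so aggressive downsampling has to be matched by equally strong upsampling. FCN handles this with skip connections, and SegNet with pooling indices. ENet compresses the input early and feeds only small feature maps to the rest of the network, discarding some of the visual redundancy in the image. Downsampling also has a benefit: it enlarges the receptive field, which gathers more context and helps classification.
FCN's solution is to pass encoder feature maps to the decoder, adding back spatial information.
SegNet's solution is to keep the indices selected by the max-pooling layers in the encoder and reuse them for upsampling in the decoder.
ENet follows SegNet's approach, which keeps memory requirements low. To gather more context, it additionally uses dilated convolutions to enlarge the receptive field.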
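The pooling-indices idea can be sketched in NumPy for a single-channel 2D array with 2×2 windows. This is an illustration, not code from ENet or SegNet, and `max_pool_with_indices` / `max_unpool` are hypothetical helper names. Note that the Keras implementation later in this post approximates this step with a 1×1 convolution plus `UpSampling2D`, rather than index-based unpooling.

```python
import numpy as np


def max_pool_with_indices(x):
    """2x2/stride-2 max pooling on an (H, W) array, also returning argmax positions."""
    h, w = x.shape
    # Rearrange into (h//2, w//2, 4): the last axis holds one 2x2 window (row-major).
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    idx = windows.argmax(axis=-1)  # position 0..3 of the maximum inside each window
    pooled = windows.max(axis=-1)
    return pooled, idx


def max_unpool(pooled, idx):
    """Put each pooled value back where it came from; every other entry becomes 0."""
    ph, pw = pooled.shape
    out = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(out, idx[..., None], pooled[..., None], axis=-1)
    # Undo the window rearrangement to get back a (2*ph, 2*pw) map.
    return out.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)


x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 8]], dtype=float)
pooled, idx = max_pool_with_indices(x)
unpooled = max_unpool(pooled, idx)
print(pooled)    # [[4. 5.] [6. 8.]]
print(unpooled)  # maxima restored at their original positions, zeros elsewhere
```

Only the small integer `idx` map has to be kept from the encoder, not the full encoder feature map. This is the memory saving the text refers to.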

Post-processing modules: CRFs and RNNs can both be used to improve accuracy.
Stages

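The ENet model is roughly divided into 5 stages:
- initial: the initial block. The left branch is a 3×3 convolution with stride 2, and the right branch is max pooling. The two outputs are concatenated along the channel axis. This halves the spatial resolution right at the start, which considerably reduces memory use.
- stage1: encoder. It contains 5 bottlenecks: the first one downsamples, and it is followed by 4 identical bottlenecks.
- stage2-3: encoder. In stage 2, bottleneck 2.0 downsamples, and the following bottlenecks alternate between regular, dilated, and asymmetric (factorized) convolutions. Stage 3 is the same, except that it has no downsampling bottleneck.
- stage4-5: decoder, which is much simpler. Each stage has one upsampling bottleneck followed by regular bottlenecks: two in stage 4 and one in stage 5.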
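To make the stage list concrete, the hypothetical helper below tabulates the feature map size after each stage. It assumes the channel widths (16, 64, 128) and the default `nclasses=11` used by the Keras `ENET` function in this post, and an input whose height and width are divisible by 8. The sizes follow only from the stride-2 steps (initial block, bottlenecks 1.0 and 2.0) and the ×2 upsampling steps (bottlenecks 4.0, 5.0 and the final transposed convolution).

```python
def enet_stage_shapes(height, width, nclasses=11):
    """(stage, H, W, C) after each ENet stage for an H x W input (illustrative)."""
    if height % 8 or width % 8:
        raise ValueError("height and width must be divisible by 8")
    # (stage, total downsampling factor at the end of the stage, output channels)
    layout = [
        ("initial", 2, 16),       # 3x3/stride-2 conv (13 ch) concatenated with max pooling (3 ch)
        ("stage1", 4, 64),        # bottleneck 1.0 downsamples
        ("stage2", 8, 128),       # bottleneck 2.0 downsamples
        ("stage3", 8, 128),       # no resolution change
        ("stage4", 4, 64),        # bottleneck 4.0 upsamples
        ("stage5", 2, 16),        # bottleneck 5.0 upsamples
        ("output", 1, nclasses),  # final 2x2/stride-2 transposed convolution
    ]
    return [(name, height // f, width // f, ch) for name, f, ch in layout]


for row in enet_stage_shapes(512, 512):
    print(row)
```

For a 512×512 input, the encoder works at 64×64 for most of its depth (stages 2 and 3). This is where almost all of the bottlenecks sit.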
Network architecture

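Early downsampling: Processing large input frames in the first layers is very expensive, and ENet's initial block greatly reduces the input size. The reasoning is that visual information is highly spatially redundant and can be compressed into a more efficient representation.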
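A back-of-the-envelope calculation shows why early downsampling pays off. The numbers are illustrative, not taken from the paper. The multiply-accumulate count of a convolution scales with the number of output pixels, so each stride-2 step cuts it by 4×. Here, a 3×3 convolution with 16 input and 16 output channels is compared at 512×512 and at 128×128, i.e. after two stride-2 steps such as the initial block and bottleneck 1.0. `conv_macs` is a hypothetical helper.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates of a k x k convolution producing an h x w x c_out map."""
    return h * w * c_out * c_in * k * k


full_res = conv_macs(512, 512, 16, 16)
after_two_downsamples = conv_macs(128, 128, 16, 16)
print(full_res, after_two_downsamples, full_res // after_two_downsamples)
```

The ratio is 16 regardless of the channel counts, because only the spatial size changes.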

Decoder size: SegNet's encoder and decoder mirror each other. ENet's encoder and decoder are asymmetric instead: the network consists of a large encoder and a small decoder.

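Nonlinear operations: ReLU is not necessarily beneficial here. Applying ReLU lowered accuracy, and the authors attribute this to the network being too shallow (too few layers) to afford filtering out information quickly. PReLU is used instead.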
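PReLU can be written in one line: it is the identity for positive inputs and multiplies negative inputs by a learnable slope, with plain ReLU as the special case slope = 0. The NumPy function below is an illustrative sketch, not a library API. It uses one slope per channel (last axis). This mirrors what `PReLU(shared_axes=[1, 2])` does in the Keras code in this post: the slope is shared across the two spatial axes.

```python
import numpy as np


def prelu(x, alpha):
    """PReLU: x for x > 0, alpha * x otherwise (alpha broadcasts over the last axis)."""
    return np.where(x > 0, x, alpha * x)


x = np.array([[-2.0, 3.0],
              [0.5, -1.0]])   # 2 positions x 2 channels
alpha = np.array([0.25, 0.1])  # one learnable slope per channel
print(prelu(x, alpha))         # [[-0.5  3. ] [ 0.5 -0.1]]
```

Because negative activations are scaled rather than zeroed, a shallow network keeps more information from layer to layer. This is the motivation given above.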

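Factorizing filters: an n×n kernel is decomposed into an n×1 and a 1×n kernel, an idea from Inception v3. This substantially reduces the number of parameters. With n = 5 (as in ENet's asymmetric bottlenecks), the pair costs about as much as a single 3×3 convolution while covering a 5×5 receptive field.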
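The parameter savings are simple arithmetic. The sketch below counts weights (biases ignored) for a full 5×5 convolution, the 5×1 + 1×5 pair, and a regular 3×3 convolution. It assumes 32 channels in and out, which matches `nfilters // 4` for the 128-channel bottlenecks in the code in this post. `conv_params` is a hypothetical helper.

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution (bias ignored)."""
    return kh * kw * c_in * c_out


c = 32  # channels inside a 128-channel bottleneck (128 // 4)
full_5x5 = conv_params(5, 5, c, c)
factorized = conv_params(5, 1, c, c) + conv_params(1, 5, c, c)
regular_3x3 = conv_params(3, 3, c, c)
print(full_5x5, factorized, regular_3x3)  # 25600 10240 9216
```

The factorized pair needs 10·c² weights instead of 25·c², close to the 9·c² of a 3×3 convolution.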

Dilated convolutions: these boosted accuracy markedly. Dilated convolutions enlarge the receptive field efficiently, and using them raised IoU by about 4%. They are interleaved with other bottlenecks rather than stacked one after another.
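The receptive-field gain can be computed directly. For stride-1 convolutions, each k×k layer with dilation d widens the receptive field by (k - 1)·d pixels. The sketch compares four plain 3×3 convolutions with four 3×3 convolutions at the dilation rates 2, 4, 8, 16 used in stage 2. It is a simplification: it counts only those 3×3 layers and ignores the 1×1 convolutions (which do not change the receptive field) and the other bottlenecks interleaved between them. The result is in units of the stage-2 feature map (1/8 of the input resolution). `receptive_field` is a hypothetical helper.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in pixels) of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * d
    return rf


plain = receptive_field([3] * 4, [1] * 4)
dilated = receptive_field([3] * 4, [2, 4, 8, 16])
print(plain, dilated)  # 9 61
```

The dilated layers have the same parameter count as the plain ones, so the larger context comes at no extra cost in weights.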

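Regularization: the segmentation datasets are small, so the network overfits quickly. L2 weight decay worked poorly, and stochastic depth helped somewhat. Since stochastic depth can be viewed as a special case of Spatial Dropout, Spatial Dropout was chosen in the end and gave somewhat better results.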
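Spatial Dropout differs from ordinary dropout in that it draws one keep/drop decision per channel and zeroes whole feature maps, rather than individual activations. The NumPy function below is an illustrative sketch of the training-time behaviour, not the Keras implementation. It uses the usual inverted-dropout scaling by 1/(1 - rate), and the name `spatial_dropout` is hypothetical. If a single draw were shared by all channels of a branch, the whole branch would be dropped, which is the link to stochastic depth mentioned above.

```python
import numpy as np


def spatial_dropout(x, rate, rng):
    """Training-time Spatial Dropout on an (H, W, C) array: drop entire channels."""
    keep = rng.random(x.shape[-1]) >= rate  # one Bernoulli draw per channel
    # Inverted dropout: scale the surviving channels so the expected value is unchanged.
    return x * keep / (1.0 - rate), keep


rng = np.random.default_rng(0)
x = np.ones((4, 4, 8))
y, keep = spatial_dropout(x, rate=0.5, rng=rng)
print(keep)  # which of the 8 channels survived
```

Each channel of `y` is either entirely zero or entirely scaled by 2. Neighbouring pixels within a feature map are strongly correlated, so dropping them one by one would regularize much less.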

Regularization
Results
Conclusion

Reference link: https://github.com/srihari-humbarwadi/ENet-A-Deep-Neural-Network-Architecture-for-Real-Time-Semantic-Segmentation/blob/master/batch_training.py


Code

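# Imports needed by the model code (TensorFlow 2.x / tf.keras).
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Activation, Add, BatchNormalization, Conv2D, Conv2DTranspose, Input,
                                     MaxPooling2D, Permute, PReLU, ReLU, SpatialDropout2D, UpSampling2D,
                                     ZeroPadding2D, concatenate)
from tensorflow.keras.models import Model


def initial_block(tensor):
    # Main path: 13 filters, 3x3, stride 2. Side path: 2x2 max pooling of the input channels.
    # For an RGB input, concatenating them gives 13 + 3 = 16 channels at half the resolution.
    conv = Conv2D(filters=13, kernel_size=(3, 3), strides=(2, 2), padding="same", name="initial_block_conv",
                  kernel_initializer="he_normal")(tensor)
    pool = MaxPooling2D(pool_size=(2, 2), name="initial_block_pool")(tensor)
    concat = concatenate([conv, pool], axis=-1, name="initial_block_concat")
    return concat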

def bottleneck_encoder(tensor, nfilters, downsampling=False, dilated=False, asymmetric=False, normal=False, drate=0.1,
                       name=''):
    y = tensor
    skip = tensor
    stride = 1
    ksize = 1
    if downsampling:
        stride = 2
        ksize = 2
        skip = MaxPooling2D(pool_size=(2, 2), name=f'max_pool_{name}')(skip)
        skip = Permute((1, 3, 2), name=f'permute_1_{name}')(skip)  # (B, H, W, C) -> (B, H, C, W)
        ch_pad = nfilters - K.int_shape(tensor)[-1]
        skip = ZeroPadding2D(padding=((0, 0), (0, ch_pad)), name=f'zeropadding_{name}')(skip)
        skip = Permute((1, 3, 2), name=f'permute_2_{name}')(skip)  # (B, H, C, W) -> (B, H, W, C)

    y = Conv2D(filters=nfilters // 4, kernel_size=(ksize, ksize), kernel_initializer='he_normal',
               strides=(stride, stride), padding='same', use_bias=False, name=f'1x1_conv_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_1x1_{name}')(y)
    y = PReLU(shared_axes=[1, 2], name=f'prelu_1x1_{name}')(y)

    if normal:
        y = Conv2D(filters=nfilters // 4, kernel_size=(3, 3), kernel_initializer='he_normal', padding='same',
                   name=f'3x3_conv_{name}')(y)
    elif asymmetric:
        y = Conv2D(filters=nfilters // 4, kernel_size=(5, 1), kernel_initializer='he_normal', padding='same',
                   use_bias=False, name=f'5x1_conv_{name}')(y)
        y = Conv2D(filters=nfilters // 4, kernel_size=(1, 5), kernel_initializer='he_normal', padding='same',
                   name=f'1x5_conv_{name}')(y)
    elif dilated:
        y = Conv2D(filters=nfilters // 4, kernel_size=(3, 3), kernel_initializer='he_normal',
                   dilation_rate=(dilated, dilated), padding='same', name=f'dilated_conv_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_main_{name}')(y)
    y = PReLU(shared_axes=[1, 2], name=f'prelu_{name}')(y)

    y = Conv2D(filters=nfilters, kernel_size=(1, 1), kernel_initializer='he_normal', use_bias=False,
               name=f'final_1x1_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_final_{name}')(y)
    y = SpatialDropout2D(rate=drate, name=f'spatial_dropout_final_{name}')(y)

    y = Add(name=f'add_{name}')([y, skip])
    y = PReLU(shared_axes=[1, 2], name=f'prelu_out_{name}')(y)

    return y

def bottleneck_decoder(tensor, nfilters, upsampling=False, normal=False, name=''):
    y = tensor
    skip = tensor
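    if upsampling:
        # Shortcut: 1x1 conv to match the channel count, then nearest-neighbour UpSampling2D
        # (a simplification of ENet's max unpooling with stored pooling indices).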
        skip = Conv2D(filters=nfilters, kernel_size=(1, 1), kernel_initializer='he_normal', strides=(1, 1),
                      padding='same', use_bias=False, name=f'1x1_conv_skip_{name}')(skip)
        skip = UpSampling2D(size=(2, 2), name=f'upsample_skip_{name}')(skip)

    y = Conv2D(filters=nfilters // 4, kernel_size=(1, 1), kernel_initializer='he_normal', strides=(1, 1),
               padding='same', use_bias=False, name=f'1x1_conv_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_1x1_{name}')(y)
    y = PReLU(shared_axes=[1, 2], name=f'prelu_1x1_{name}')(y)

    if upsampling:
        y = Conv2DTranspose(filters=nfilters // 4, kernel_size=(3, 3), kernel_initializer='he_normal', strides=(2, 2),
                            padding='same', name=f'3x3_deconv_{name}')(y)
    elif normal:
        # Regular 3x3 convolution; its output must be assigned back to y to take effect.
        y = Conv2D(filters=nfilters // 4, kernel_size=(3, 3), strides=(1, 1), kernel_initializer='he_normal',
                   padding='same', name=f'3x3_conv_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_main_{name}')(y)
    y = PReLU(shared_axes=[1, 2], name=f'prelu_{name}')(y)

    y = Conv2D(filters=nfilters, kernel_size=(1, 1), kernel_initializer='he_normal', use_bias=False,
               name=f'final_1x1_{name}')(y)
    y = BatchNormalization(momentum=0.1, name=f'bn_final_{name}')(y)

    y = Add(name=f'add_{name}')([y, skip])
    y = ReLU(name=f'relu_out_{name}')(y)

    return y

def ENET(input_shape=(None, None, 3), nclasses=11):
    print('. . . . .Building ENet. . . . .')
    img_input = Input(input_shape)

    x = initial_block(img_input)

    x = bottleneck_encoder(x, 64, downsampling=True, normal=True, name='1.0', drate=0.01)
    for _ in range(1, 5):
        x = bottleneck_encoder(x, 64, normal=True, name=f'1.{_}', drate=0.01)

    x = bottleneck_encoder(x, 128, downsampling=True, normal=True, name=f'2.0')
    x = bottleneck_encoder(x, 128, normal=True, name=f'2.1')
    x = bottleneck_encoder(x, 128, dilated=2, name=f'2.2')
    x = bottleneck_encoder(x, 128, asymmetric=True, name=f'2.3')
    x = bottleneck_encoder(x, 128, dilated=4, name=f'2.4')
    x = bottleneck_encoder(x, 128, normal=True, name=f'2.5')
    x = bottleneck_encoder(x, 128, dilated=8, name=f'2.6')
    x = bottleneck_encoder(x, 128, asymmetric=True, name=f'2.7')
    x = bottleneck_encoder(x, 128, dilated=16, name=f'2.8')

    x = bottleneck_encoder(x, 128, normal=True, name=f'3.0')
    x = bottleneck_encoder(x, 128, dilated=2, name=f'3.1')
    x = bottleneck_encoder(x, 128, asymmetric=True, name=f'3.2')
    x = bottleneck_encoder(x, 128, dilated=4, name=f'3.3')
    x = bottleneck_encoder(x, 128, normal=True, name=f'3.4')
    x = bottleneck_encoder(x, 128, dilated=8, name=f'3.5')
    x = bottleneck_encoder(x, 128, asymmetric=True, name=f'3.6')
    x = bottleneck_encoder(x, 128, dilated=16, name=f'3.7')

    x = bottleneck_decoder(x, 64, upsampling=True, name='4.0')
    x = bottleneck_decoder(x, 64, normal=True, name='4.1')
    x = bottleneck_decoder(x, 64, normal=True, name='4.2')

    x = bottleneck_decoder(x, 16, upsampling=True, name='5.0')
    x = bottleneck_decoder(x, 16, normal=True, name='5.1')

    img_output = Conv2DTranspose(nclasses, kernel_size=(2, 2), strides=(2, 2), kernel_initializer='he_normal',
                                 padding='same', name='image_output')(x)
    img_output = Activation('softmax')(img_output)

    model = Model(inputs=img_input, outputs=img_output, name='ENET')
    print('. . . . .Build Completed. . . . .')
    return model

Loss definition

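from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy


# Soft Dice coefficient over all pixels; `smooth` avoids division by zero on empty masks.
def dice_coeff(y_true, y_pred):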
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    loss = 1 - dice_coeff(y_true, y_pred)
    return loss

def total_loss(y_true, y_pred):
    loss = binary_crossentropy(y_true, y_pred) + (3*dice_loss(y_true, y_pred))
    return loss

Source: https://blog.csdn.net/Mind_programmonkey/article/details/120843946
