
[ResNet] Residual Neural Network

2022-06-09 16:02:53 Source: Internet

Tags: nn, convolution, self, ResNet, residual neural network, block, channel, out

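The ResNet Network

Paper: Deep Residual Learning for Image Recognition

Highlights of the network:

1 A very deep network structure (more than 1000 layers)

The figure above shows the training-set and test-set performance of deep networks built by simply stacking convolutional and pooling layers. The 56-layer network performs worse than the 20-layer one. Possible causes:

1. Vanishing or exploding gradients

Suppose the error gradient of each layer is a number smaller than 1. During backpropagation, every layer the error passes through multiplies it by another factor smaller than 1, so the deeper the network, the closer the gradient gets to 0. This is the vanishing gradient problem. In the same way, if the per-layer factors are larger than 1, the gradient keeps growing and explodes.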
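The compounding effect can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes every layer scales the gradient by the same constant factor (0.9 or 1.1, both made-up values), which real networks do not do.

```python
# Illustrative only: treat each layer as scaling the gradient by one constant factor.
def gradient_after(depth, per_layer_factor):
    """Return the gradient magnitude after passing back through `depth` layers."""
    return per_layer_factor ** depth


for depth in (20, 56):
    shrinking = gradient_after(depth, 0.9)   # factor below 1: gradient vanishes
    growing = gradient_after(depth, 1.1)     # factor above 1: gradient explodes
    print(f"{depth} layers: factor 0.9 -> {shrinking:.4f}, factor 1.1 -> {growing:.1f}")
```

With a factor below 1 the 56-layer value is far smaller than the 20-layer one, and with a factor above 1 it is far larger.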

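2. The degradation problem

2 The residual module

The residual structure on the left is mainly used in the shallower ResNet-34, and the one on the right in the deeper ResNet-50/101/152. Taking the left one as an example: input data with a depth of 256 is convolved with 256 3x3 kernels and activated with ReLU, then convolved with another 256 3x3 kernels and added to the original input, and finally activated with ReLU again. The output feature maps of the main branch and of the shortcut must have the same shape, that is, the same height, width, and depth.

The addition here is element-wise, unlike the concatenation (cat) used in GoogLeNet.

In the right-hand structure, the 1x1 convolutions reduce the channel dimension and then restore it.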
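To see why the 1x1 convolutions are worth having, the weight counts of the two structures can be compared for a 256-channel input. This is a rough sketch: the intermediate width of 64 is an assumption derived from `expansion = 4` in the code further down (256 / 4), and biases and BatchNorm parameters are ignored.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weight count of a bias-free 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels


# Left structure: two 3x3 convolutions, 256 -> 256 -> 256.
basic = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Right structure: 1x1 reduces 256 -> 64, 3x3 keeps 64, 1x1 restores 64 -> 256.
# The width 64 is an assumption (256 / expansion, with expansion = 4).
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)

print(basic, bottleneck)
```

Under these assumptions the right-hand structure needs well under a tenth of the weights of the left-hand one for the same input and output depth.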

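Figure 1

Taking the 34-layer network as an example, the residual structures are grouped into four parts, each containing a certain number of residual blocks. The first part, for instance, has three residual blocks, each made of two convolutional layers. This corresponds to the purple section of the figure below.

Figure 2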
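The "34" in the name can be checked by counting weighted layers. The sketch below uses the block counts `[3, 4, 6, 3]` that the code later passes to `ResNet34`; it assumes the usual counting convention in which the 1x1 convolutions on the shortcuts are not counted.

```python
blocks_per_stage = [3, 4, 6, 3]   # same list passed to ResNet34 in model.py
convs_per_block = 2               # each BasicBlock has two 3x3 conv layers

stem_conv = 1                     # the initial 7x7 convolution
fc = 1                            # the final fully connected layer
total = stem_conv + convs_per_block * sum(blocks_per_stage) + fc
print(total)
```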

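Solid-line and dashed-line residual structures

Figure 2 shows that in the 34-layer network the first residual block of each of conv_3, conv_4, and conv_5 is drawn with a dashed line.

Residual structure for the shallower networks

How the solid-line and dashed-line residual structures differ:

In a solid-line residual structure the input and output shapes are the same, so the main branch and the shortcut can be added directly. A dashed-line structure is different. In conv_3 of Figure 3, for example, the input is 56x56x64 and the required output is 28x28x128, so the shortcut applies 128 1x1 kernels with a stride of 2. This gives the main branch and the shortcut the same shape so that they can be added.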

floor((56 - 1) / 2) + 1 = 28
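The same output-size rule applies to every convolution and pooling layer in the network. A minimal helper, assuming square inputs and the floor rounding that PyTorch uses by default (the name `conv_output_size` is made up for this sketch):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# Shortcut of a dashed-line block: 1x1 kernel, stride 2, no padding.
print(conv_output_size(56, kernel=1, stride=2))
# Main branch of the same block: 3x3 kernel, stride 2, padding 1.
print(conv_output_size(56, kernel=3, stride=2, padding=1))
```

Both branches map 56 to 28, which is what allows the element-wise addition.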

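Residual structure for the deeper networks

In the original paper, the dashed-line residual structure uses a stride of 2 in the first 1x1 convolution and a stride of 1 in the following 3x3 convolution. According to the official PyTorch documentation, as in the figure above, the first 1x1 convolution has a stride of 1 and the 3x3 convolution has a stride of 2, which is reported to improve ImageNet top-1 accuracy by about 0.5%.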
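One commonly given explanation, which the source does not state, is that a 1x1 kernel with stride 2 never looks at three quarters of the input positions, while a 3x3 kernel with stride 2 and padding 1 still covers all of them. The sketch below checks this along one spatial dimension; the helper name `sampled_positions` is made up for illustration.

```python
def sampled_positions(size, kernel, stride, padding):
    """Set of input positions (1D) covered by at least one kernel window."""
    out = (size + 2 * padding - kernel) // stride + 1
    covered = set()
    for o in range(out):
        start = o * stride - padding
        for k in range(kernel):
            pos = start + k
            if 0 <= pos < size:        # skip positions that fall in the padding
                covered.add(pos)
    return covered


size = 56
one_by_one = sampled_positions(size, kernel=1, stride=2, padding=0)
three_by_three = sampled_positions(size, kernel=3, stride=2, padding=1)
print(len(one_by_one) / size, len(three_by_three) / size)
```

The 1x1 kernel touches half of the positions per dimension, so only a quarter of the 2D feature map, whereas the 3x3 kernel touches every position.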

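3 Batch Normalization (BN) layers to speed up training (dropout is discarded)

Goal: make the feature maps of a whole batch, rather than of a single image, follow a distribution with mean 0 and variance 1.

For an input x with d dimensions, each dimension is standardized separately. If the input is an RGB color image with three channels, then d corresponds to channel = 3 and x = (x1, x2, x3), where x1, x2, and x3 are the feature matrices of the three channels. Standardization is applied to the R, G, and B channels separately.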
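The per-channel standardization can be sketched in plain Python. This is a simplified illustration, not what `nn.BatchNorm2d` does internally: it uses made-up pixel values, pools each channel over the whole batch, and leaves out the learnable scale and shift as well as the running statistics used at inference time.

```python
from statistics import fmean, pvariance


def batch_norm_channel(values, eps=1e-5):
    """Standardize all values of one channel, pooled over the batch and spatial positions."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Toy batch: 2 images, 3 channels (R, G, B), 2 pixels each; batch[n][c] is a list of pixels.
batch = [
    [[0.1, 0.3], [0.5, 0.7], [0.2, 0.4]],
    [[0.9, 0.5], [0.1, 0.3], [0.6, 0.8]],
]

normalized_channels = []
for c in range(3):
    pooled = [v for image in batch for v in image[c]]  # every value of channel c in the batch
    normalized_channels.append(batch_norm_channel(pooled))

for c, values in enumerate(normalized_channels):
    print(c, round(fmean(values), 6), round(pvariance(values), 3))
```

Each channel ends up with a mean of about 0 and a variance just under 1 (the small `eps` in the denominator keeps it from being exactly 1).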

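Transfer learning

Overview

Use the parameters of someone else's pretrained model to train on your own, smaller dataset (one that is too small to train the model from scratch).

Advantages:

1. A good result can be trained quickly

2. A good result can be obtained even when the dataset is small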


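The shallow convolutional layers pick up simple, specific patterns. As the network gets deeper, the information it learns becomes more complex and abstract, to the point of recognizing eyes, noses, and mouths. Finally, the fully connected layers combine these features and output the probability of each class.

These shallow convolutional structures are general: they also work in other networks, so their trained parameters can be transferred to other networks.

Common transfer learning approaches

1. Load the weights, then train all of the parameters

2. Load the weights, then train only the parameters of the last few fully connected layers

3. Load the weights, add one more fully connected layer on top of the original network, and train only that last fully connected layer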


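The first two approaches require changing the output size of the final fully connected layer so that it matches the number of classes.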
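The second approach might look roughly like the following in PyTorch. This is a sketch under stated assumptions: `pretrained` is a tiny hypothetical stand-in for a network whose weights have already been loaded, and 5 is simply the class count used in train.py.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a feature extractor plus a 1000-class head.
pretrained = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 1000),
)

# Freeze every loaded parameter so it is not updated during training.
for param in pretrained.parameters():
    param.requires_grad = False

# Replace the final fully connected layer so its output matches the new class count.
num_classes = 5
in_features = pretrained[-1].in_features
pretrained[-1] = nn.Linear(in_features, num_classes)  # new layers are trainable by default

trainable = [name for name, p in pretrained.named_parameters() if p.requires_grad]
print(trainable)
```

Only the new layer's weight and bias remain trainable, so an optimizer built from the parameters with `requires_grad` set, as train.py does, updates nothing else.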

ResNeXt network structure

Group Convolution


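The channels are split into groups, each group is convolved separately, and the results are concatenated.

Parameter comparison (kernel size k, Cin input channels, n kernels, g groups):

Standard convolution: k*k*Cin*n

Group convolution: (k*k*Cin/g*n/g)*g = k*k*Cin*n/g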
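The two formulas can be checked numerically. The sizes below (3x3 kernel, 128 channels in and out, 32 groups) are arbitrary example values, and `grouped_conv_params` is a made-up helper name.

```python
def grouped_conv_params(k, c_in, n, groups=1):
    """Weight count of a bias-free convolution split into `groups` groups."""
    assert c_in % groups == 0 and n % groups == 0, "channels must divide evenly"
    per_group = k * k * (c_in // groups) * (n // groups)
    return per_group * groups


standard = grouped_conv_params(3, 128, 128)             # groups=1 is a standard convolution
grouped = grouped_conv_params(3, 128, 128, groups=32)
print(standard, grouped, standard // grouped)
```

The grouped version uses exactly 1/g of the weights of the standard one.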


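ResNeXt-50 (32x4d): 32 is the number of groups in the group convolution, and 4d means each group has 4 kernels (a width of 4 channels per group) in the first stage.

Code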

model.py

import torch
import torch.nn as nn


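# Residual block for the shallower 18- and 34-layer networks
class BasicBlock(nn.Module):
    # Ratio between the number of kernels in the block's last conv layer and in its earlier conv layers
    expansion = 1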

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # With stride 1: output = (input - 3 + 2 * 1) / 1 + 1 = input; with stride 2 the size is halved
        self.conv1 = nn.Conv2d(in_channels=in_channel,
                               out_channels=out_channel,
                               kernel_size=3,
                               stride=stride,
                               padding=1,
                               bias=False)
        # BatchNorm normalizes the feature maps to speed up training
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel,
                               out_channels=out_channel,
                               kernel_size=3,
                               padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # Downsampling module used on the shortcut of dashed-line residual structures
        self.downsample = downsample

    def forward(self, x):
        # Shortcut branch
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        # Main branch
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


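# Residual block for the deeper 50-, 101- and 152-layer networks
class Bottleneck(nn.Module):
    # The last conv layer has 4x as many kernels as the first two
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv reduces the channel dimension
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1,
                               stride=1,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()

        # 3x3 conv carries the stride, so this is where downsampling happens
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3,
                               stride=stride,
                               bias=False,
                               padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)

        # 1x1 conv restores the channel dimension to out_channel * expansion
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1,
                               stride=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.downsample = downsample

    def forward(self, x):
        # Shortcut branch
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        # Main branch
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        out += identity
        out = self.relu(out)

        return out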


class ResNet(nn.Module):

    def __init__(self, block, block_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64
        # Input is 224x224 and output is 112x112, so padding must be 3: floor((224 - 7 + 2 * 3) / 2) + 1 = 112
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        # With inplace=True the input tensor is overwritten; otherwise the input is left unchanged and a new output is created
        self.relu = nn.ReLU(inplace=True)
        # floor((112 - 3 + 2 * 1) / 2) + 1 = 56
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64,  block_num[0])
        self.layer2 = self._make_layer(block, 128,  block_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256,  block_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512,  block_num[3], stride=2)

        if self.include_top:
            # Adaptive average pooling: the output is 1 x 1 whatever the input size
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        # Initialize the convolution weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

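    # block selects BasicBlock or Bottleneck
    # channel is the number of kernels in the first conv layer of each block in this layer
    # block_num is the number of residual blocks in this layer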
    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
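        # Per the ResNet architecture, only layer1 of ResNet-18/34 has equal input and output depth;
        # in every other case the first block of a layer needs the dashed-line residual structure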
        if stride != 1 or self.in_channel != block.expansion * channel:
            downsample = nn.Sequential(
                # 1x1 kernel, no padding: output = floor((input - 1) / stride) + 1
                nn.Conv2d(self.in_channel, block.expansion * channel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(block.expansion * channel)
            )
        layers = []
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        self.in_channel = channel * block.expansion
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def ResNet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)

train.py

import json
import os
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchvision import datasets
from torchvision.transforms import transforms
import torch.utils.data

from model import ResNet34

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# print(device)

data_transform = {
    'train': transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([.485, .456, .406], [.229, .224, .225])]),
    'val': transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([.485, .456, .406], [.229, .224, .225])])
}

data_root = os.path.abspath(os.getcwd())
image_path = os.path.join(data_root, "data", "flower_data")

batch_size = 16
train_dataset = datasets.ImageFolder(root=image_path + r"/train",
                                     transform=data_transform['train'])
train_num = len(train_dataset)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=0)
train_steps = len(train_loader)
# Display an image after the transforms have been applied
# image,label = train_dataset.__getitem__(1001)
# toPIL = transforms.ToPILImage()
# image = toPIL(image)
# plt.imshow(image)
# plt.show()

flower_dict = train_dataset.class_to_idx
cla_dict = dict((val, key) for (key, val) in flower_dict.items())
# indent pretty-prints the JSON so it is easier to read
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

val_dataset = datasets.ImageFolder(root=image_path + r'/val',
                                   transform=data_transform['val'])
val_num = len(val_dataset)
val_loader = torch.utils.data.DataLoader(val_dataset,
                                         batch_size=batch_size,
                                         shuffle=False,
                                         num_workers=0)

net = ResNet34(num_classes=5)
net.to(device)
# Cross-entropy loss
loss_function = nn.CrossEntropyLoss()

params = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.Adam(params, lr=.0001)

best_acc = .0
save_path = './resNet34.pth'
for epoch in range(3):
    net.train()
    running_loss = .0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        logits = net(images.to(device))
        loss = loss_function(logits, labels.to(device))
        # Backpropagate the error
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # validate
    net.eval()
    acc = .0
    with torch.no_grad():
        for step, data in enumerate(val_loader, start=0):
            images, labels = data
            outputs = net(images.to(device))

            predict_y = torch.max(outputs, dim=1)[1]
            acc += torch.eq(predict_y, labels.to(device)).sum().item()
    val_acc = acc / val_num
    print('[epoch %d] train loss: %.3f val_acc: %.3f' %
          (epoch + 1, running_loss / train_steps, val_acc ))
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(net.state_dict(), save_path)

predict.py

import json

import torch
from torchvision import transforms
from PIL import Image

from model import ResNet34

data_transform = transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([.485, .456, .406], [.229, .224, .225])])

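# Convert to RGB so a PNG with an alpha channel still yields the 3 channels Normalize expects
img = Image.open('./test.png').convert('RGB')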
img = data_transform(img)
img = torch.unsqueeze(img, dim=0)

with open('./class_indices.json') as json_file:
    class_indict = json.load(json_file)

model = ResNet34(num_classes=5)
model_weight_path = './resNet34.pth'
# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
model.eval()

with torch.no_grad():
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).item()

print(class_indict[str(predict_cla)], predict[predict_cla].item())
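predict.py looks the class up with `str(predict_cla)` because JSON object keys are always strings: the integer indices written by train.py come back as text. A small stdlib-only illustration follows; the class names are hypothetical, since the real ones come from the dataset's folder names.

```python
import json

# Hypothetical class names; the real ones come from the dataset's folder names.
class_to_idx = {"daisy": 0, "rose": 1}
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# Round-trip through JSON the way train.py writes and predict.py reads the mapping.
restored = json.loads(json.dumps(idx_to_class, indent=4))
print(list(restored.keys()))

predicted_index = 1
print(restored[str(predicted_index)])   # an int key would raise KeyError here
```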

Source: https://www.cnblogs.com/tod4/p/16359628.html
