PyTorch Official Tutorials: Autograd



These notes follow the official PyTorch tutorial. The code is slightly modified and annotated to help my own understanding; it was run in a Jupyter notebook.

Link: Autograd

AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradient for any computational graph.

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weight parameters
b = torch.randn(3, requires_grad=True)     # bias parameters
z = torch.matmul(x, w) + b                 # forward pass: the logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

Tensors, Functions and Computational graph

This code defines the following computational graph (drawn as a diagram in the original tutorial): x, w and b feed into z through the matmul and add operations, and z together with y feeds into loss through the binary cross-entropy function.

In this network, w and b are parameters, which we need to optimize. Thus, we need to be able to compute the gradients of loss function with respect to those variables. In order to do that, we set the requires_grad property of those tensors.


You can set the value of requires_grad when creating a tensor, or later by using x.requires_grad_(True) method.
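A quick sketch of both options, using a hypothetical tensor t:

t = torch.zeros(3)                         # created without gradient tracking
print(t.requires_grad)                     # False
t.requires_grad_(True)                     # enable tracking in place, after creation
print(t.requires_grad)                     # True
t2 = torch.zeros(3, requires_grad=True)    # or set it at creation time
print(t2.requires_grad)                    # True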


A function that we apply to tensors to construct computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in grad_fn property of a tensor. You can find more information of Function in the documentation.

print("Gradient function for z=",z.grad_fn)
print("Gradient function for loss=",loss.grad_fn)
Gradient function for z= <AddBackward0 object at 0x0000021D3BB0B5C8>
Gradient function for loss= <BinaryCrossEntropyWithLogitsBackward object at 0x0000021D3BB0B588>

Computing Gradients

To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to parameters, namely, we need $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$ under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

loss.backward()
print(w.grad)
print(b.grad)
tensor([[0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066]])
tensor([0.0675, 0.1460, 0.0066])

We can only obtain the grad properties for the leaf nodes of the computational graph, which have requires_grad property set to True. For all other nodes in our graph, gradients will not be available.

We can only perform gradient calculations using backward once on a given graph, for performance reasons. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call.
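A minimal sketch of both notes, using a hypothetical leaf tensor a: only the leaf gets a populated .grad, and a second backward call on the same graph requires retain_graph=True on the earlier call.

a = torch.tensor([2.0, 3.0], requires_grad=True)   # leaf tensor
out = (a * a).sum()                                # non-leaf result
out.backward(retain_graph=True)    # keep the graph so backward can run again
print(a.grad)                      # tensor([4., 6.])
out.backward()                     # allowed because the graph was retained
print(a.grad)                      # gradients accumulate: tensor([8., 12.])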


Disabling Gradient Tracking

By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation. However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with torch.no_grad() block:

z = torch.matmul(x, w) + b
print(z.requires_grad)  # True, because w and b have requires_grad=True

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)  # False, because z was computed inside the no_grad block
True
False

Another way to achieve the same result is to use the detach() method on the tensor:

# z_det shares its data with z, but z_det has requires_grad=False
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)
False

There are reasons you might want to disable gradient tracking:

To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (a sketch follows after the next point).

To speed up computations when you are only doing forward pass, because computations on tensors that do not track gradients would be more efficient.
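A minimal sketch of the frozen-parameters scenario, using a small hypothetical nn.Sequential model as a stand-in for a real pretrained network: every existing parameter is frozen with requires_grad_(False), and only a freshly created head (whose parameters track gradients by default) is handed to the optimizer.

import torch
from torch import nn

# hypothetical stand-in for a pretrained network
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 3))

# freeze everything: no gradients will be computed for these parameters
for param in model.parameters():
    param.requires_grad_(False)

# replace the head with a fresh layer; its parameters require grad by default
model[2] = nn.Linear(16, 3)

# only parameters that still require gradients are passed to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)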

More on Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In a forward pass, autograd does two things simultaneously:

  • run the requested operation to compute a resulting tensor

  • maintain the operation’s gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn,

  • accumulates them in the respective tensor’s .grad attribute

  • using the chain rule, propagates all the way to the leaf tensors.
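A small sketch of this DAG with a hypothetical leaf tensor a: the root's grad_fn links back toward the leaves via its next_functions attribute, and backward() walks that chain and fills in a.grad.

a = torch.tensor([1.0, 2.0], requires_grad=True)   # leaf (input) tensor
s = (a * 3).sum()                                  # root (output) tensor
print(s.grad_fn)                   # a SumBackward0 node
print(s.grad_fn.next_functions)    # links to the MulBackward0 node that produced a * 3
s.backward()                       # traverse the DAG from root to leaves
print(a.grad)                      # tensor([3., 3.])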


DAGs are dynamic in PyTorch. An important thing to note is that the graph is recreated from scratch: after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.
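A small sketch of this dynamism, with a hypothetical parameter w and random inputs: the operations recorded can differ from one iteration to the next, and each backward() call uses only the graph built in that iteration.

w = torch.randn(3, requires_grad=True)
for step in range(3):
    x = torch.randn(3)
    if x.sum() > 0:                    # control flow decides which ops get recorded
        out = (w * x).sum()
    else:
        out = (w * x).pow(2).sum()
    out.backward()                     # backward runs on the graph built this iteration
    print(w.grad)
    w.grad.zero_()                     # clear accumulated gradients before the next step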


Optional Reading: Tensor Gradients and Jacobian Products

In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute so-called Jacobian product, and not the actual gradient.

For a vector function $\vec{y} = f(\vec{x})$, where $\vec{x} = \langle x_1, \dots, x_n \rangle$ and $\vec{y} = \langle y_1, \dots, y_m \rangle$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is given by the Jacobian matrix

$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$

Instead of computing the Jacobian matrix itself, PyTorch allows you to compute the Jacobian product $v^T \cdot J$ for a given input vector $v = (v_1, \dots, v_m)$. This is achieved by calling backward with $v$ as an argument; the size of $v$ should be the same as the size of the original tensor with respect to which we want to compute the product:

# torch.eye(n, m=None, out=None) creates a 2-D tensor with ones on the diagonal and zeros elsewhere
inp = torch.eye(5, requires_grad=True)
print("inp=", inp)
out = (inp + 1).pow(2)
print("\nout=", out)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nFirst call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
# zero the accumulated gradients before the next backward call
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
inp= tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]], requires_grad=True)

out= tensor([[4., 1., 1., 1., 1.],
        [1., 4., 1., 1., 1.],
        [1., 1., 4., 1., 1.],
        [1., 1., 1., 4., 1.],
        [1., 1., 1., 1., 4.]], grad_fn=<PowBackward0>)

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Notice that when we call backward for the second time with the same argument, the value of the gradient is different. This happens because when doing backward propagation, PyTorch accumulates the gradients, i.e. the value of computed gradients is added to the grad property of all leaf nodes of computational graph. If you want to compute the proper gradients, you need to zero out the grad property before. In real-life training an optimizer helps us to do this.
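A minimal sketch, with a hypothetical model and dummy data, of how the optimizer takes care of this zeroing during real training:

model = torch.nn.Linear(5, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(3):
    pred = model(torch.ones(5))
    step_loss = torch.nn.functional.binary_cross_entropy_with_logits(pred, torch.zeros(3))
    optimizer.zero_grad()      # clear gradients accumulated in the previous step
    step_loss.backward()       # populate .grad on the model parameters
    optimizer.step()           # update the parameters using the gradients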


Previously we were calling backward() function without parameters. This is essentially equivalent to calling backward(torch.tensor(1.0)), which is a useful way to compute the gradients in case of a scalar-valued function, such as loss during neural network training.
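A quick check of this equivalence with a hypothetical scalar t:

t = torch.tensor(2.0, requires_grad=True)
scalar_loss = t * t
scalar_loss.backward(torch.tensor(1.0))   # same as scalar_loss.backward() for a scalar output
print(t.grad)                             # tensor(4.)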


Translating all of this is exhausting... but if I don't translate it, I can't remember the English. It's just hard.

Source: https://blog.csdn.net/qq_43209726/article/details/118368642
