PyTorch Official Tutorials: Autograd



These notes follow the official PyTorch tutorial. The code is slightly modified and annotated to help my own understanding; it was run in a Jupyter notebook.

Link: Autograd

AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradient for any computational graph.

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weight parameters
b = torch.randn(3, requires_grad=True)     # bias parameters
z = torch.matmul(x, w) + b                 # forward pass: the logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

Tensors, Functions and Computational graph

This code defines the following computational graph (drawn as a diagram in the original tutorial): x, w and b feed into z through the matmul and add operations, and z together with y feeds into loss through the binary cross-entropy function.

In this network, w and b are parameters, which we need to optimize. Thus, we need to be able to compute the gradients of loss function with respect to those variables. In order to do that, we set the requires_grad property of those tensors.


You can set the value of requires_grad when creating a tensor, or later by using x.requires_grad_(True) method.
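A quick sketch of both options, using a hypothetical tensor t:

t = torch.zeros(3)                         # created without gradient tracking
print(t.requires_grad)                     # False
t.requires_grad_(True)                     # enable tracking in place, after creation
print(t.requires_grad)                     # True
t2 = torch.zeros(3, requires_grad=True)    # or set it at creation time
print(t2.requires_grad)                    # True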


A function that we apply to tensors to construct computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in grad_fn property of a tensor. You can find more information of Function in the documentation.

print("Gradient function for z=",z.grad_fn)
print("Gradient function for loss=",loss.grad_fn)
Gradient function for z= <AddBackward0 object at 0x0000021D3BB0B5C8>
Gradient function for loss= <BinaryCrossEntropyWithLogitsBackward object at 0x0000021D3BB0B588>

Computing Gradients

To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to parameters, namely, we need $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$ under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

loss.backward()
print(w.grad)
print(b.grad)
tensor([[0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066],
        [0.0675, 0.1460, 0.0066]])
tensor([0.0675, 0.1460, 0.0066])

We can only obtain the grad properties for the leaf nodes of the computational graph, which have requires_grad property set to True. For all other nodes in our graph, gradients will not be available.

We can only perform gradient calculations using backward once on a given graph, for performance reasons. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call.
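A minimal sketch of both notes, using a hypothetical leaf tensor a: only the leaf gets a populated .grad, and a second backward call on the same graph requires retain_graph=True on the earlier call.

a = torch.tensor([2.0, 3.0], requires_grad=True)   # leaf tensor
out = (a * a).sum()                                # non-leaf result
out.backward(retain_graph=True)    # keep the graph so backward can run again
print(a.grad)                      # tensor([4., 6.])
out.backward()                     # allowed because the graph was retained
print(a.grad)                      # gradients accumulate: tensor([8., 12.])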


Disabling Gradient Tracking

By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation. However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with torch.no_grad() block:

z = torch.matmul(x, w) + b
print(z.requires_grad)  # True, because w and b have requires_grad=True

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)  # False, because z was computed inside the no_grad block
True
False

Another way to achieve the same result is to use the detach() method on the tensor:

# z_det shares its data with z, but z_det has requires_grad=False
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)
False

There are reasons you might want to disable gradient tracking:

To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (a sketch follows after the next point).

To speed up computations when you are only doing forward pass, because computations on tensors that do not track gradients would be more efficient.
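A minimal sketch of the frozen-parameters scenario, using a small hypothetical nn.Sequential model as a stand-in for a real pretrained network: every existing parameter is frozen with requires_grad_(False), and only a freshly created head (whose parameters track gradients by default) is handed to the optimizer.

import torch
from torch import nn

# hypothetical stand-in for a pretrained network
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 3))

# freeze everything: no gradients will be computed for these parameters
for param in model.parameters():
    param.requires_grad_(False)

# replace the head with a fresh layer; its parameters require grad by default
model[2] = nn.Linear(16, 3)

# only parameters that still require gradients are passed to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)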

More on Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In a forward pass, autograd does two things simultaneously:

  • run the requested operation to compute a resulting tensor

  • maintain the operation’s gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn,

  • accumulates them in the respective tensor’s .grad attribute

  • using the chain rule, propagates all the way to the leaf tensors.
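A small sketch of this DAG with a hypothetical leaf tensor a: the root's grad_fn links back toward the leaves via its next_functions attribute, and backward() walks that chain and fills in a.grad.

a = torch.tensor([1.0, 2.0], requires_grad=True)   # leaf (input) tensor
s = (a * 3).sum()                                  # root (output) tensor
print(s.grad_fn)                   # a SumBackward0 node
print(s.grad_fn.next_functions)    # links to the MulBackward0 node that produced a * 3
s.backward()                       # traverse the DAG from root to leaves
print(a.grad)                      # tensor([3., 3.])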


DAGs are dynamic in PyTorch. An important thing to note is that the graph is recreated from scratch: after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.
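A small sketch of this dynamism, with a hypothetical parameter w and random inputs: the operations recorded can differ from one iteration to the next, and each backward() call uses only the graph built in that iteration.

w = torch.randn(3, requires_grad=True)
for step in range(3):
    x = torch.randn(3)
    if x.sum() > 0:                    # control flow decides which ops get recorded
        out = (w * x).sum()
    else:
        out = (w * x).pow(2).sum()
    out.backward()                     # backward runs on the graph built this iteration
    print(w.grad)
    w.grad.zero_()                     # clear accumulated gradients before the next step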


Optional Reading: Tensor Gradients and Jacobian Products

In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute so-called Jacobian product, and not the actual gradient.

For a vector function $\vec{y} = f(\vec{x})$, where $\vec{x} = \langle x_1, \dots, x_n \rangle$ and $\vec{y} = \langle y_1, \dots, y_m \rangle$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is given by the Jacobian matrix

$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$

Instead of computing the Jacobian matrix itself, PyTorch allows you to compute the Jacobian product $v^T \cdot J$ for a given input vector $v = (v_1, \dots, v_m)$. This is achieved by calling backward with $v$ as an argument; the size of $v$ should be the same as the size of the original tensor with respect to which we want to compute the product:

# torch.eye(n, m=None, out=None) creates a 2-D tensor with ones on the diagonal and zeros elsewhere
inp = torch.eye(5, requires_grad=True)
print("inp=", inp)
out = (inp + 1).pow(2)
print("\nout=", out)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nFirst call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
# zero the accumulated gradients before the next backward call
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
inp= tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]], requires_grad=True)

out= tensor([[4., 1., 1., 1., 1.],
        [1., 4., 1., 1., 1.],
        [1., 1., 4., 1., 1.],
        [1., 1., 1., 4., 1.],
        [1., 1., 1., 1., 4.]], grad_fn=<PowBackward0>)

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Notice that when we call backward for the second time with the same argument, the value of the gradient is different. This happens because when doing backward propagation, PyTorch accumulates the gradients, i.e. the value of computed gradients is added to the grad property of all leaf nodes of computational graph. If you want to compute the proper gradients, you need to zero out the grad property before. In real-life training an optimizer helps us to do this.
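A minimal sketch, with a hypothetical model and dummy data, of how the optimizer takes care of this zeroing during real training:

model = torch.nn.Linear(5, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(3):
    pred = model(torch.ones(5))
    step_loss = torch.nn.functional.binary_cross_entropy_with_logits(pred, torch.zeros(3))
    optimizer.zero_grad()      # clear gradients accumulated in the previous step
    step_loss.backward()       # populate .grad on the model parameters
    optimizer.step()           # update the parameters using the gradients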


Previously we were calling backward() function without parameters. This is essentially equivalent to calling backward(torch.tensor(1.0)), which is a useful way to compute the gradients in case of a scalar-valued function, such as loss during neural network training.
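A quick check of this equivalence with a hypothetical scalar t:

t = torch.tensor(2.0, requires_grad=True)
scalar_loss = t * t
scalar_loss.backward(torch.tensor(1.0))   # same as scalar_loss.backward() for a scalar output
print(t.grad)                             # tensor(4.)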


Translating all of this is exhausting... but if I don't translate it, I can't remember the English. It's just hard.

Source: https://blog.csdn.net/qq_43209726/article/details/118368642
