pytorch retain_grad vs requires_grad

Most people are already familiar with requires_grad, so it will only be explained in passing alongside the retain_grad examples. Let's look at a code snippet:

import torch

# Create a scalar tensor and enable gradient tracking
x = torch.tensor(2.0, requires_grad=True)

# Intermediate computation: y depends on x and is a non-leaf node
y = x * 3

# Continue the computation to get z
z = y * 4

# Backward pass
z.backward()

# Inspect the gradients
print("x.grad:", x.grad)  
print("y.grad:", y.grad)  

The output is:

x.grad: tensor(12.)
y.grad: None
/tmp/ipykernel_219007/1060175670.py:17: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:489.)
  print("y.grad:", y.grad)

The warning roughly means: we accessed the .grad attribute of a non-leaf node, but the .grad attribute of a non-leaf node is not automatically populated during the backward pass. (This is done to save memory; after all, we normally only need the gradients of the tensors whose .requires_grad we manually set to True, since those are the ones we update, right?)
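To make the leaf vs. non-leaf distinction concrete, here is a minimal sketch (the names simply mirror the snippet above) that checks the relevant attributes:

import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf: created directly by the user
y = x * 3                                  # non-leaf: result of an operation

print(x.is_leaf, y.is_leaf)              # True False
print(x.requires_grad, y.requires_grad)  # True True: requires_grad propagates to y

Even though y.requires_grad is True (so gradients flow through it), only leaf tensors have their .grad populated by default.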

So all we need to do is add one line, y.retain_grad(). The modified code is as follows:

import torch

# Create a scalar tensor and enable gradient tracking
x = torch.tensor(2.0, requires_grad=True)

# Intermediate computation: y depends on x and is a non-leaf node
y = x * 3
y.retain_grad()

# Continue the computation to get z
z = y * 4

# Backward pass
z.backward()

# Inspect the gradients
print("x.grad:", x.grad)  
print("y.grad:", y.grad)  

The output is:

x.grad: tensor(12.)
y.grad: tensor(4.)

As you can see, after the backward pass the gradient of the non-leaf node y is now retained as well! The numbers also check out: z = 4y, so dz/dy = 4, and since y = 3x, dz/dx = 4 × 3 = 12.
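Incidentally, if you only need a non-leaf gradient once and would rather not populate any .grad field at all, torch.autograd.grad can return it directly. A minimal sketch along the same lines as the example above:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * 3
z = y * 4

# Compute dz/dx and dz/dy in one call; nothing is written to .grad
grads = torch.autograd.grad(z, (x, y))
print(grads)

This prints (tensor(12.), tensor(4.)), matching the gradients above.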