梯度计算中常用的矩阵微积分公式
梯度计算中常用的矩阵微积分公式
标量对向量求导的常用数学公式
设标量函数
y
f ( x ) y = f(\boldsymbol{x})
y
=
f
(
x
) ,其中
x
( x 1 , x 2 , ⋯ , x n ) T \boldsymbol{x} = (x_1, x_2, \cdots, x_n)^{\rm T}
x
=
(
x
1
,
x
2
,
⋯
,
x
n
)
T 是一个
n n
n 维列向量。标量
y y
y 对向量
x \boldsymbol{x}
x 的导数为一个
n n
n 维列向量:
∂ y ∂ x
[ ∂ y ∂ x 1 ∂ y ∂ x 2 ⋮ ∂ y ∂ x n ] \frac{\partial y}{\partial \boldsymbol{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \ \dfrac{\partial y}{\partial x_2} \ \vdots \ \dfrac{\partial y}{\partial x_n} \end{bmatrix}
∂
x
∂
y
=
∂
x
1
∂
y
∂
x
2
∂
y
⋮
∂
x
n
∂
y
线性函数 :若
y
a T x y = \boldsymbol{a}^{\rm T} \boldsymbol{x}
y
=
a
T
x ,其中
a \boldsymbol{a}
a 是一个
n n
n 维列向量,则
∂ y ∂ x
a \frac{\partial y}{\partial \boldsymbol{x}} = \boldsymbol{a}
∂
x
∂
y
=
a
二次型函数 :若
y
x T A x y = \boldsymbol{x}^{\rm T} {\bm A} \boldsymbol{x}
y
=
x
T
A
x ,其中
A {\bm A}
A 是一个
n × n n \times n
n
×
n 的矩阵,则
∂ y ∂ x
( A + A T ) x \frac{\partial y}{\partial \boldsymbol{x}} = ({\bm A} + {\bm A}^{\rm T}) \boldsymbol{x}
∂
x
∂
y
=
(
A
A
T
)
x
当
A {\bm A}
A 为对称矩阵时,
A T
A {\bm A}^{\rm T} = {\bm A}
A
T
=
A ,则
∂ y ∂ x
2 A x \frac{\partial y}{\partial \boldsymbol{x}} = 2{\bm A} \boldsymbol{x}
∂
x
∂
y
=
2
A
x
当
A {\bm A}
A 为单位矩阵时,
y
x T x y = \boldsymbol{x}^{\rm T} \boldsymbol{x}
y
=
x
T
x ,则
∂ y ∂ x
∂ ∥ x ∥ 2 ∂ x
∂ x T x ∂ x
2 x \frac{\partial y}{\partial \boldsymbol{x}} = \frac{\partial |{\bm x}|^2}{\partial {\bm x}} = \frac{\partial {\bm x}^{\rm T} {\bm x}}{\partial {\bm x}} =2{\bm x}
∂
x
∂
y
=
∂
x
∂
∥
x
∥
2
=
∂
x
∂
x
T
x
=
2
x
∥ x ∥ 2 |{\bm x}|^2
∥
x
∥
2 表示向量
x {\bm x}
x 的范数(长度)的平方。
向量对向量求导的常用数学公式
若
y
A x {\bm y}= {\bm A} \boldsymbol{x}
y
=
A
x ,其中
A {\bm A}
A 是一个
n × n n \times n
n
×
n 的矩阵,则
∂ y ∂ x
∂ A x ∂ x
A T \frac{\partial {\bm y}}{\partial \boldsymbol{x}} = \frac{\partial {\bm A}{\bm x}}{\partial {\bm x}} = {\bm A}^{\rm T}
∂
x
∂
y
=
∂
x
∂
A
x
=
A
T
A {\bm A}
A 是一个矩阵,
x {\bm x}
x 是一个向量。
对
x {\bm x}
x 求导的结果是矩阵
A {\bm A}
A 的转置
A T {\bm A}^{\rm T}
A
T 。
复合函数的导数
给定函数
g ( u ( x ) ) g(u(x))
g
(
u
(
x
)) ,其中
u
u ( x )
b − A x {\bm u}=u({\bm x}) = {\bm b} - {\bm A}{\bm x}
u
=
u
(
x
)
=
b
−
A
x ,且
g ( u )
∥ u ∥ 2 g({\bm u}) = |{\bm u}|^2
g
(
u
)
=
∥
u
∥
2 。
链式法则
根据链式法则(Chain Rule),有:
∂ g ( u ( x ) ) ∂ x
∂ g ∂ u ⋅ ∂ u ∂ x \frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial g}{\partial {\bm u}} \cdot \frac{\partial {\bm u}}{\partial {\bm x}}
∂
x
∂
g
(
u
(
x
))
=
∂
u
∂
g
⋅
∂
x
∂
u
具体步骤
**计算
∂ u ∂ x \dfrac{\partial {\bm u}}{\partial {\bm x}}
∂
x
∂
u
** :
u ( x )
b − A x {\bm u}({\bm x}) = {\bm b} - {\bm A}{\bm x}
u
(
x
)
=
b
−
A
x
对
x {\bm x}
x 求导得到:
∂ u ∂ x
− A \frac{\partial {\bm u}}{\partial {\bm x}} = -{\bm A}
∂
x
∂
u
=
−
A
**计算
∂ g ( u ) ∂ u \dfrac{\partial g({\bm u})}{\partial {\bm u}}
∂
u
∂
g
(
u
)
** :
g ( u )
∥ u ∥ 2
u T u g({\bm u}) = |{\bm u}|^2 = {\bm u}^{\rm T} {\bm u}
g
(
u
)
=
∥
u
∥
2
=
u
T
u
对
u {\bm u}
u 求导得到:
∂ g ( u ) ∂ u
2 u \frac{\partial g({\bm u})}{\partial {\bm u}} = 2{\bm u}
∂
u
∂
g
(
u
)
=
2
u
应用链式法则 :
∂ g ( u ( x ) ) ∂ x
∂ g ( u ) ∂ u ⋅ ∂ u ∂ x \frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial g({\bm u})}{\partial {\bm u}} \cdot \frac{\partial {\bm u}}{\partial {\bm x}}
∂
x
∂
g
(
u
(
x
))
=
∂
u
∂
g
(
u
)
⋅
∂
x
∂
u
将上面的结果代入:
∂ g ( u ( x ) ) ∂ x
2 u ⋅ ( − A ) \frac{\partial g({\bm u}({\bm x}))}{\partial {\bm x}} = 2{\bm u} \cdot (-{\bm A})
∂
x
∂
g
(
u
(
x
))
=
2
u
⋅
(
−
A
)
由于
u
b − A x {\bm u} = {\bm b} - {\bm A}{\bm x}
u
=
b
−
A
x ,代入得到:
∂ g ( u ( x ) ) ∂ x
− 2 A T ( b − A x ) \frac{\partial g({u}({\bm x}))}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x})
∂
x
∂
g
(
u
(
x
))
=
−
2
A
T
(
b
−
A
x
)
最终结果是:
∂ ∥ b − A x ∥ 2 ∂ x
− 2 A T ( b − A x ) \frac{\partial |{\bm b} - {\bm A}{\bm x}|^2}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x})
∂
x
∂
∥
b
−
A
x
∥
2
=
−
2
A
T
(
b
−
A
x
)