Common Matrix Calculus Formulas for Gradient Computation

Common formulas for the derivative of a scalar with respect to a vector

Let the scalar function be $y = f(\boldsymbol{x})$, where $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^{\rm T}$ is an $n$-dimensional column vector. The derivative of the scalar $y$ with respect to the vector $\boldsymbol{x}$ is the $n$-dimensional column vector of its partial derivatives (the column-vector, or denominator-layout, convention):

$$
\frac{\partial y}{\partial \boldsymbol{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}
$$
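The definition can be made concrete with a numerical approximation. The sketch below uses only the standard library. It approximates each component $\partial y / \partial x_i$ by a central difference and stacks the results into one vector. The helper name `numerical_gradient` and the step size `h` are illustrative choices of ours and do not come from the text.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Hypothetical helper: approximate [dy/dx_1, ..., dy/dx_n] by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # i-th entry of the gradient: partial derivative of f with respect to x_i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x: Vector) -> float:
    # Example function (our own): y = x1^2 + 3*x1*x2, so dy/dx = [2*x1 + 3*x2, 3*x1]
    return x[0] ** 2 + 3 * x[0] * x[1]


print(numerical_gradient(f, [1.0, 2.0]))  # approximately [8.0, 3.0]
```

The same helper is reused, redefined each time so every block stays self-contained, to check the formulas that follow.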

  1. Linear function: if $y = \boldsymbol{a}^{\rm T} \boldsymbol{x}$, where $\boldsymbol{a}$ is an $n$-dimensional column vector, then

$$
\frac{\partial y}{\partial \boldsymbol{x}} = \boldsymbol{a}
$$

  2. Quadratic form: if $y = \boldsymbol{x}^{\rm T} \boldsymbol{A} \boldsymbol{x}$, where $\boldsymbol{A}$ is an $n \times n$ matrix, then

$$
\frac{\partial y}{\partial \boldsymbol{x}} = (\boldsymbol{A} + \boldsymbol{A}^{\rm T}) \boldsymbol{x}
$$

When $\boldsymbol{A}$ is a symmetric matrix, $\boldsymbol{A}^{\rm T} = \boldsymbol{A}$, so

$$
\frac{\partial y}{\partial \boldsymbol{x}} = 2\boldsymbol{A} \boldsymbol{x}
$$

When $\boldsymbol{A}$ is the identity matrix, $y = \boldsymbol{x}^{\rm T} \boldsymbol{x}$, so

$$
\frac{\partial y}{\partial \boldsymbol{x}} = \frac{\partial \|\boldsymbol{x}\|^2}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{x}^{\rm T} \boldsymbol{x}}{\partial \boldsymbol{x}} = 2\boldsymbol{x}
$$

where $\|\boldsymbol{x}\|^2$ denotes the square of the (Euclidean) norm, that is, the length, of the vector $\boldsymbol{x}$.
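The four scalar-by-vector formulas above can be spot-checked numerically. The sketch below assumes arbitrary test values for $\boldsymbol{x}$, $\boldsymbol{a}$ and $\boldsymbol{A}$; none of them come from the text. $\boldsymbol{A}$ is deliberately non-symmetric, which shows why the general quadratic-form result is $(\boldsymbol{A} + \boldsymbol{A}^{\rm T})\boldsymbol{x}$ rather than $2\boldsymbol{A}\boldsymbol{x}$. All helper names (`numerical_gradient`, `dot`, `matvec`, `close`, `quad`) are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Hypothetical helper: central-difference approximation of dy/dx (column vector)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in M]


def close(u: Vector, v: Vector, tol: float = 1e-5) -> bool:
    return all(abs(ui - vi) < tol for ui, vi in zip(u, v))


# Arbitrary test data (assumption, not from the text); A is NOT symmetric.
x = [1.0, -2.0, 0.5]
a = [3.0, 1.0, -4.0]
A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0],
     [4.0, -1.0, 2.0]]
AT = [list(col) for col in zip(*A)]


def quad(v: Vector) -> float:
    # y = x^T A x
    return dot(v, matvec(A, v))


# 1. Linear function: d(a^T x)/dx = a
print(close(numerical_gradient(lambda v: dot(a, v), x), a))  # True

# 2. Quadratic form: d(x^T A x)/dx = (A + A^T) x
expected = [p + q for p, q in zip(matvec(A, x), matvec(AT, x))]
print(close(numerical_gradient(quad, x), expected))  # True
# ...whereas 2 A x is wrong when A is not symmetric
print(close(numerical_gradient(quad, x), [2 * v for v in matvec(A, x)]))  # False

# Symmetric case: S = A + A^T satisfies S^T = S, so d(x^T S x)/dx = 2 S x
S = [[A[i][j] + A[j][i] for j in range(3)] for i in range(3)]
print(close(numerical_gradient(lambda v: dot(v, matvec(S, v)), x),
            [2 * v for v in matvec(S, x)]))  # True

# Identity case: d(x^T x)/dx = d||x||^2/dx = 2 x
print(close(numerical_gradient(lambda v: dot(v, v), x), [2 * v for v in x]))  # True
```

Central differences are exact for quadratics up to floating-point rounding, so a tolerance of `1e-5` is generous here.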

Common formulas for the derivative of a vector with respect to a vector

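If $\boldsymbol{y} = \boldsymbol{A} \boldsymbol{x}$, where $\boldsymbol{A}$ is an $n \times n$ matrix, then

$$
\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{A}\boldsymbol{x}}{\partial \boldsymbol{x}} = \boldsymbol{A}^{\rm T}
$$

Here $\boldsymbol{A}$ is a matrix and $\boldsymbol{x}$ is a vector. Differentiating with respect to $\boldsymbol{x}$ gives the transpose $\boldsymbol{A}^{\rm T}$ of $\boldsymbol{A}$. The reason is that, in the column-vector (denominator-layout) convention, entry $(i, j)$ of $\partial \boldsymbol{y} / \partial \boldsymbol{x}$ is $\partial y_j / \partial x_i = A_{ji}$.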
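The entry-by-entry argument can be checked numerically. The sketch below builds the Jacobian so that row $i$ holds $\partial y_j / \partial x_i$ for every $j$. It then compares the result with $\boldsymbol{A}^{\rm T}$. The non-symmetric $2 \times 2$ matrix, the point $\boldsymbol{x}$ and the helper name `numerical_jacobian_denominator` are our own illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_jacobian_denominator(f: Callable[[Vector], Vector], x: Vector,
                                   h: float = 1e-6) -> Matrix:
    """Hypothetical helper: J[i][j] ~ d f_j / d x_i (one row per input component x_i)."""
    J = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        J.append([(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))])
    return J


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


# Illustrative data (assumption): non-symmetric, so A and A^T are distinguishable.
A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [0.5, -1.0]

J = numerical_jacobian_denominator(lambda v: matvec(A, v), x)
AT = [list(col) for col in zip(*A)]
for row in J:
    print([round(v, 6) for v in row])
# [1.0, 3.0]
# [2.0, 4.0]
print(AT)  # [[1.0, 3.0], [2.0, 4.0]]
```

With the numerator-layout convention (one row per output $y_j$), the same computation would return $\boldsymbol{A}$ itself. The transpose appears because of the column-vector convention chosen at the start.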

Derivative of a composite function

Consider the function $g(u(\boldsymbol{x}))$, where $\boldsymbol{u} = u(\boldsymbol{x}) = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}$ and $g(\boldsymbol{u}) = \|\boldsymbol{u}\|^2$.
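The original derivation does not take this route, but expanding the square first gives an independent check. It uses only the formulas from the previous sections. The middle term is a linear function with $\boldsymbol{a} = \boldsymbol{A}^{\rm T}\boldsymbol{b}$. The last term is a quadratic form whose matrix $\boldsymbol{A}^{\rm T}\boldsymbol{A}$ is symmetric, so its gradient is $2\boldsymbol{A}^{\rm T}\boldsymbol{A}\boldsymbol{x}$:

$$
\begin{aligned}
g(u(\boldsymbol{x})) &= (\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x})^{\rm T}(\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x})
= \boldsymbol{b}^{\rm T}\boldsymbol{b} - 2(\boldsymbol{A}^{\rm T}\boldsymbol{b})^{\rm T}\boldsymbol{x} + \boldsymbol{x}^{\rm T}(\boldsymbol{A}^{\rm T}\boldsymbol{A})\boldsymbol{x} \\
\frac{\partial g(u(\boldsymbol{x}))}{\partial \boldsymbol{x}} &= \boldsymbol{0} - 2\boldsymbol{A}^{\rm T}\boldsymbol{b} + 2\boldsymbol{A}^{\rm T}\boldsymbol{A}\boldsymbol{x}
= -2\boldsymbol{A}^{\rm T}(\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x})
\end{aligned}
$$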

Chain rule

By the chain rule, keeping the column-vector (denominator-layout) convention used above:

$$
\frac{\partial g(u(\boldsymbol{x}))}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}} \cdot \frac{\partial g}{\partial \boldsymbol{u}}
$$

If $\boldsymbol{u}$ has $m$ components, then $\dfrac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}}$ is $n \times m$ and $\dfrac{\partial g}{\partial \boldsymbol{u}}$ is $m \times 1$. The factors must therefore be multiplied in this order to give an $n \times 1$ gradient.

Step by step
  1. **Compute $\dfrac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}}$**:

    Start from $\boldsymbol{u}(\boldsymbol{x}) = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}$. The constant $\boldsymbol{b}$ drops out, and by the vector-by-vector formula above $\partial (\boldsymbol{A}\boldsymbol{x}) / \partial \boldsymbol{x} = \boldsymbol{A}^{\rm T}$. Differentiating with respect to $\boldsymbol{x}$ therefore gives:

    $$
    \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}} = -\boldsymbol{A}^{\rm T}
    $$

  2. **Compute $\dfrac{\partial g(\boldsymbol{u})}{\partial \boldsymbol{u}}$**:

    Start from $g(\boldsymbol{u}) = \|\boldsymbol{u}\|^2 = \boldsymbol{u}^{\rm T} \boldsymbol{u}$. Differentiating with respect to $\boldsymbol{u}$ gives:

    $$
    \frac{\partial g(\boldsymbol{u})}{\partial \boldsymbol{u}} = 2\boldsymbol{u}
    $$

  3. **Apply the chain rule**:

    $$
    \frac{\partial g(u(\boldsymbol{x}))}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{x}} \cdot \frac{\partial g(\boldsymbol{u})}{\partial \boldsymbol{u}}
    $$

    Substituting the results above:

    $$
    \frac{\partial g(u(\boldsymbol{x}))}{\partial \boldsymbol{x}} = (-\boldsymbol{A}^{\rm T}) \cdot 2\boldsymbol{u}
    $$

    Since $\boldsymbol{u} = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}$, substituting gives:

    $$
    \frac{\partial g(u(\boldsymbol{x}))}{\partial \boldsymbol{x}} = -2\boldsymbol{A}^{\rm T}(\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x})
    $$
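The three steps can be reproduced numerically. The sketch below computes $\partial \boldsymbol{u}/\partial \boldsymbol{x}$ with a finite-difference Jacobian (one row per $x_i$) and $\partial g/\partial \boldsymbol{u} = 2\boldsymbol{u}$. It multiplies them in the order given above and compares the product with the closed form and with a direct finite-difference gradient of $g(u(\boldsymbol{x}))$. As an assumption, $\boldsymbol{A}$ is taken to be $3 \times 2$ so the two factors have visibly different shapes; the text does not fix the shape of $\boldsymbol{A}$ here. The data values and helper names are illustrative.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def jacobian_denominator(f: Callable[[Vector], Vector], x: Vector, h: float = 1e-6) -> Matrix:
    """Hypothetical helper: J[i][j] ~ d f_j / d x_i, shape n x m."""
    J = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        J.append([(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))])
    return J


# Illustrative data (assumption): A is 3 x 2, so u has m = 3 components and x has n = 2.
A = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 1.0]]
b = [1.0, 0.0, 2.0]
x = [0.5, -1.0]


def u_of(v: Vector) -> Vector:
    # u(x) = b - A x
    return [bi - ai for bi, ai in zip(b, matvec(A, v))]


def g_of(v: Vector) -> float:
    # g(u(x)) = ||b - A x||^2
    return sum(ui * ui for ui in u_of(v))


u = u_of(x)
du_dx = jacobian_denominator(u_of, x)  # step 1: approximately -A^T, shape 2 x 3
dg_du = [2 * ui for ui in u]           # step 2: 2u, shape 3 x 1
grad = matvec(du_dx, dg_du)            # step 3: (2 x 3)(3 x 1) -> 2 x 1

AT = [list(col) for col in zip(*A)]
closed_form = [-2 * v for v in matvec(AT, u)]  # -2 A^T (b - A x)
direct = [row[0] for row in jacobian_denominator(lambda v: [g_of(v)], x)]

print([round(v, 6) for v in grad])         # [-14.0, -15.0]
print([round(v, 6) for v in closed_form])  # [-14.0, -15.0]
print([round(v, 6) for v in direct])       # [-14.0, -15.0]
```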

The final result is:

$$
\frac{\partial \|\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}\|^2}{\partial \boldsymbol{x}} = -2\boldsymbol{A}^{\rm T}(\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x})
$$
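A typical use of this gradient is gradient descent on the least-squares objective $\|\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}\|^2$. Below is a minimal sketch under the following assumptions, none of which come from the text:

- a small square, invertible $\boldsymbol{A}$, so the minimizer solves $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ exactly;
- a fixed learning rate of `0.05`, chosen below the stability limit for this particular $\boldsymbol{A}$;
- 200 iterations.

The helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def least_squares_gradient(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Gradient of ||b - A x||^2 with respect to x, i.e. -2 A^T (b - A x)."""
    residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return [-2.0 * v for v in matvec(transpose(A), residual)]


def gradient_descent(A: Matrix, b: Vector, lr: float = 0.05, steps: int = 200) -> Vector:
    # Illustrative settings (assumption): start at zero, fixed step size, fixed iteration count.
    x = [0.0] * len(A[0])
    for _ in range(steps):
        g = least_squares_gradient(A, b, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Illustrative system (assumption): 2x + y = 3, x + 3y = 5, whose solution is x = 0.8, y = 1.4.
A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [3.0, 5.0]
x = gradient_descent(A, b)
print([round(v, 6) for v in x])  # [0.8, 1.4]
```

The error contracts by a factor of $\boldsymbol{I} - 2\,\text{lr}\,\boldsymbol{A}^{\rm T}\boldsymbol{A}$ at each step. The step size must therefore stay below $1/\lambda_{\max}(\boldsymbol{A}^{\rm T}\boldsymbol{A})$, which is about $0.076$ for this matrix. A larger step would make the iteration diverge.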