Differentiating the Summation Formula in Neural-Network Gradient Computation
The following shows how Expression 2 is derived from Expression 1.
Expression 1
$$\frac{\partial E}{\partial w_{j,k}} = -2(t_k - o_k) \cdot \text{sigmoid}\left(\sum_j w_{j,k} \cdot o_j\right) \cdot \left(1 - \text{sigmoid}\left(\sum_j w_{j,k} \cdot o_j\right)\right) \cdot \frac{\partial}{\partial w_{j,k}}\left(\sum_j w_{j,k} \cdot o_j\right)$$
Expression 2
$$\frac{\partial E}{\partial w_{j,k}} = -2(t_k - o_k) \cdot \text{sigmoid}\left(\sum_j w_{j,k} \cdot o_j\right) \cdot \left(1 - \text{sigmoid}\left(\sum_j w_{j,k} \cdot o_j\right)\right) \cdot o_j$$
This is a derivation problem in neural-network gradient computation; the main tool is the chain rule. The detailed steps are as follows.
Given conditions
We need the derivative $\frac{\partial E}{\partial w_{j,k}}$, whose initial form is:
$$\frac{\partial E}{\partial w_{j,k}} = -2(t_{k} - o_{k}) \cdot \text{sigmoid}\left(\sum_{j} w_{j,k} \cdot o_{j}\right)\left(1 - \text{sigmoid}\left(\sum_{j} w_{j,k} \cdot o_{j}\right)\right) \cdot \frac{\partial \left(\sum_{j} w_{j,k} \cdot o_{j}\right)}{\partial w_{j,k}}$$
Here $E$ denotes the error, $t_{k}$ is the target value, $o_{k}$ is the output value, $w_{j,k}$ is a weight, $o_{j}$ is the output of a neuron in the previous layer, and $\text{sigmoid}$ is the activation function.
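To ground these symbols, here is a minimal forward-pass sketch in Python; the values and the names `o_prev`, `w_k`, `t_k` are illustrative assumptions, not taken from the original text. It builds the weighted sum $\sum_j w_{j,k} \cdot o_j$, the output $o_k$, and the squared error $E$ behind the $-2(t_k - o_k)$ factor.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values (assumptions): three previous-layer outputs o_j,
# the weights w_{j,k} feeding one output neuron k, and the target t_k.
o_prev = np.array([0.3, 0.7, 0.5])   # o_j
w_k    = np.array([0.2, -0.4, 0.9])  # w_{j,k} for a fixed k
t_k    = 0.8

z_k = np.dot(w_k, o_prev)  # sum_j w_{j,k} * o_j
o_k = sigmoid(z_k)         # output of neuron k
E   = (t_k - o_k) ** 2     # squared error, whose derivative carries -2(t_k - o_k)
print(E)
```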
Derivation
Focus on the term $\frac{\partial \left(\sum_{j} w_{j,k} \cdot o_{j}\right)}{\partial w_{j,k}}$.
By the rule for differentiating a sum, consider $\sum_{j} w_{j,k} \cdot o_{j}$: only the term whose summation index matches the specific $j$ of the weight we are differentiating with respect to involves $w_{j,k}$ as a variable; in every other term, the weight $w_{i,k}$ with $i \neq j$ is a constant for the current derivative.
So when $\sum_{j} w_{j,k} \cdot o_{j}$ is expanded and differentiated with respect to $w_{j,k}$, every term other than the one containing $w_{j,k}$ differentiates to 0, because those terms are constants with respect to $w_{j,k}$.
The term that does contain $w_{j,k}$ is $w_{j,k} \cdot o_{j}$. By the differentiation rule $(ax)' = a$ (where $a$ is a constant and $x$ is the variable), differentiating $w_{j,k} \cdot o_{j}$ with respect to $w_{j,k}$ gives $o_{j}$.
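For instance, with three previous-layer neurons and the derivative taken with respect to $w_{2,k}$ (the index 2 is chosen here purely for illustration), the expansion makes this explicit:

$$\frac{\partial}{\partial w_{2,k}}\left(w_{1,k} \cdot o_1 + w_{2,k} \cdot o_2 + w_{3,k} \cdot o_3\right) = 0 + o_2 + 0 = o_2$$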
Substituting $\frac{\partial \left(\sum_{j} w_{j,k} \cdot o_{j}\right)}{\partial w_{j,k}} = o_{j}$ into the original expression yields the second expression:
$$\frac{\partial E}{\partial w_{j,k}} = -2(t_{k} - o_{k}) \cdot \text{sigmoid}\left(\sum_{j} w_{j,k} \cdot o_{j}\right)\left(1 - \text{sigmoid}\left(\sum_{j} w_{j,k} \cdot o_{j}\right)\right) \cdot o_{j}$$
In summary: evaluating $\frac{\partial \left(\sum_{j} w_{j,k} \cdot o_{j}\right)}{\partial w_{j,k}}$ and substituting the result into the original expression takes us from the first expression to the second.
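As a sanity check on the derivation, the closed-form gradient of Expression 2 can be compared against a numerical finite-difference estimate of $\partial E / \partial w_{j,k}$. The sketch below is illustrative only (NumPy, with the same assumed values as above) and is not part of the original derivation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values (assumptions): previous-layer outputs o_j,
# weights w_{j,k} into one output neuron k, and the target t_k.
o_prev = np.array([0.3, 0.7, 0.5])   # o_j
w_k    = np.array([0.2, -0.4, 0.9])  # w_{j,k} for a fixed k
t_k    = 0.8

def error(w):
    """E = (t_k - o_k)^2 with o_k = sigmoid(sum_j w_{j,k} * o_j)."""
    o_k = sigmoid(np.dot(w, o_prev))
    return (t_k - o_k) ** 2

# Analytic gradient, Expression 2:
# dE/dw_{j,k} = -2 (t_k - o_k) * sigmoid(z) * (1 - sigmoid(z)) * o_j
o_k = sigmoid(np.dot(w_k, o_prev))
grad_analytic = -2 * (t_k - o_k) * o_k * (1 - o_k) * o_prev

# Central finite-difference estimate of the same gradient.
eps = 1e-6
grad_numeric = np.zeros_like(w_k)
for j in range(len(w_k)):
    w_plus, w_minus = w_k.copy(), w_k.copy()
    w_plus[j]  += eps
    w_minus[j] -= eps
    grad_numeric[j] = (error(w_plus) - error(w_minus)) / (2 * eps)

print(grad_analytic)  # the two vectors should agree to roughly 1e-9
print(grad_numeric)
```

If the two gradients match, the substitution $\frac{\partial \left(\sum_j w_{j,k} \cdot o_j\right)}{\partial w_{j,k}} = o_j$ is doing exactly the work the derivation claims.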