Paper Reading Notes: Deep Unsupervised Learning using Nonequilibrium Thermodynamics
1. Source
Paper link 1:
Paper link 2 (with appendix):
Code link:
Environment setup for the code (Theano-based), see:
2. Derivation in the paper
The workflow of the diffusion model is shown in the figure below. Here $q(x^{0,1,2\cdots,T-1,T})$ is the forward noising process and $p(x^{0,1,2\cdots,T-1,T})$ is the reverse denoising process; for the details of the procedure, see the reference. Note that the image obtained at the end of reverse denoising still has some noise points scattered over it.
2.1 Terminology
$q(x^0)$: $x^0$ denotes an image drawn from the dataset. For example, when MNIST is used, $x^0$ is an image from MNIST, and $q(x^0)$ is the distribution of the images in MNIST.
$p(x^T)$: $x^T$ is the result of adding noise to $x^0$, and it is also the starting point of reverse denoising. $p(x^T)$ is therefore the distribution of the denoising starting point, and it is the same as $\pi(x^T)$.
Note that $p(x^t)$ and $q(x^t)$ are taken to be the same distribution in these notes.
2.2 Derivation
The forward noising process satisfies the Markov property, which gives Equation 1.

$$
\begin{equation}
\begin{split}
q(x^{0,1,2\cdots,T-1,T})&=q(x^0)\cdot \prod_{t=1}^{T}{q(x^t|x^{t-1})}=q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1}), \\
q(x^{1,2 \cdots T}|x^0)&=q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1}).
\end{split}
\end{equation}
$$
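As a minimal sketch of the forward chain in Equation 1, the snippet below samples a trajectory $x^0 \to x^T$ one Markov step at a time. The Gaussian kernel $q(x^t|x^{t-1})=\mathcal{N}(\sqrt{1-\beta}\,x^{t-1}, \beta I)$, the constant `beta`, the toy data, and the function name `forward_trajectory` are assumptions made for illustration; the notes above do not fix a particular kernel.

```python
import numpy as np

def forward_trajectory(x0, num_steps, beta, rng):
    """Sample x^1..x^T from an assumed Gaussian kernel q(x^t | x^{t-1})."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(num_steps):
        prev = xs[-1]
        noise = rng.standard_normal(prev.shape)
        # Each step depends only on the previous state (Markov property).
        xs.append(np.sqrt(1.0 - beta) * prev + np.sqrt(beta) * noise)
    return xs

rng = np.random.default_rng(0)
x0 = np.full(10000, 3.0)   # toy "data": every coordinate starts at 3 (assumption)
traj = forward_trajectory(x0, num_steps=200, beta=0.05, rng=rng)
x_T = traj[-1]
# After enough steps the samples should look close to N(0, 1).
print(len(traj), x_T.mean(), x_T.var())
```

Under these assumptions the starting value is gradually forgotten, so the mean of `x_T` drifts toward 0 and its variance toward 1, which matches the idea that $x^T$ is the noised version of $x^0$.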
The reverse denoising process is given by Equation 2.

$$
\begin{equation}
p_{\theta}(x^{0,1,2\cdots,T-1,T})=p_{\theta}(x^T)\cdot \prod_{t=1}^{T}{p_{\theta}(x^{t-1}|x^{t})}=p_{\theta}(x^T)\cdot p_{\theta}(x^{T-1}|x^T)\cdot p_{\theta}(x^{T-2}|x^{T-1})\cdots p_{\theta}(x^{0}|x^{1}).
\end{equation}
$$
The parameter $\theta$ in Equation 2 is the set of parameters that the deep learning model has to learn. For convenience, $\theta$ is omitted from here on, so Equation 2 is rewritten as Equation 3.

$$
\begin{equation}
p(x^{0,1,2\cdots,T-1,T})=p(x^T)\cdot \prod_{t=1}^{T}{p(x^{t-1}|x^{t})}=p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1}).
\end{equation}
$$
The goal of reverse denoising is for its end point to match the starting point of forward noising. In other words, we want to maximize $p(x^0)$, the probability that the reverse denoising process produces $x^0$.

$$
\begin{equation}
\begin{split}
p(x^0)&=\int p(x^0,x^1)\,dx^{1} \quad \text{(marginalizing the joint distribution)} \\
&=\int p(x^1)\cdot p(x^0|x^1)\,dx^1 \quad \text{(product rule)} \\
&=\int \bigg( \int p(x^1,x^2)\,dx^2 \bigg) \cdot p(x^0|x^1)\,dx^1 \quad \text{(nested integral)} \\
&=\int \int p(x^2)\cdot p(x^1|x^2) \cdot p(x^0|x^1)\,dx^1 dx^2 \quad \text{(rewritten as a double integral)} \\
&\;\;\vdots \\
&= \int \int \cdots \int p(x^T)\cdot p(x^{T-1}|x^{T})\cdot p(x^{T-2}|x^{T-1})\cdots p(x^0|x^1) \cdot dx^1 dx^2 \cdots dx^T \\
&= \int p(x^{0,1,2 \cdots T})\,dx^{1,2\cdots T} \quad \text{(}T\text{-fold integral)} \\
&= \int dx^{1,2\cdots T} \cdot p(x^{0,1,2 \cdots T}) \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^{0,1,2 \cdots T}) }{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \frac{ p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \\
&= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg[ p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \bigg] \quad \text{(rewritten as an expectation)}
\end{split}
\end{equation}
$$
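The identity between the marginal $p(x^0)$ and its expectation form can be checked exactly on a tiny discrete chain, where the integrals become finite sums. The sketch below does this for binary states and $T=2$; all probability tables and the function names `marginal_p` and `expectation_form` are hypothetical values chosen only for illustration.

```python
import itertools

# Toy setup (all numbers are illustrative assumptions): binary states, T = 2.
T = 2
states = (0, 1)
p_T = {0: 0.5, 1: 0.5}                       # p(x^T)
# p_rev[t][(a, b)] = p(x^{t-1} = a | x^t = b), for t = 1..T
p_rev = {1: {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8},
         2: {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}}
# q_fwd[t][(b, a)] = q(x^t = b | x^{t-1} = a), for t = 1..T
q_fwd = {1: {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.3, (1, 1): 0.7},
         2: {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.5}}

def marginal_p(x0):
    """p(x^0) obtained by summing the reverse joint over x^1..x^T."""
    total = 0.0
    for tail in itertools.product(states, repeat=T):
        xs = (x0,) + tail
        prob = p_T[xs[T]]
        for t in range(1, T + 1):
            prob *= p_rev[t][(xs[t - 1], xs[t])]
        total += prob
    return total

def expectation_form(x0):
    """E over q(x^{1..T}|x^0) of p(x^T) * prod_t p(x^{t-1}|x^t) / q(x^t|x^{t-1})."""
    total = 0.0
    for tail in itertools.product(states, repeat=T):
        xs = (x0,) + tail
        q_prob, ratio = 1.0, p_T[xs[T]]
        for t in range(1, T + 1):
            q_step = q_fwd[t][(xs[t], xs[t - 1])]
            q_prob *= q_step
            ratio *= p_rev[t][(xs[t - 1], xs[t])] / q_step
        total += q_prob * ratio      # weight each trajectory by its q-probability
    return total

for x0 in states:
    print(x0, marginal_p(x0), expectation_form(x0))
```

The two columns agree for every $x^0$ regardless of which forward tables are used, as long as every $q$ entry is nonzero; the choice of $q$ only matters once the expectation is estimated by sampling.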
Therefore the parameter $\theta$ in Equation 3 should satisfy

$$
\begin{equation}
\theta= \arg \underset {\theta}{\max}\; p(x^0).
\end{equation}
$$
Equation 4 is solved for a single image in the dataset, but a dataset usually contains many thousands of images. Suppose the dataset has $N$ images; this gives Equation 6, whose goal is to find parameters $\theta$ that maximize $L$. Note that $q(x^0)$ is the probability with which each image in the dataset is sampled.

$$
\begin{equation}
\begin{split}
L&=\sum_{n=1}^{N} q(x_n^0)\cdot \log p(x_n^0) \\
&=\int dx^0\cdot q(x^0)\cdot \log p(x^0) \\
&=\int dx^0\cdot q(x^0)\cdot \log \bigg[ E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg[ p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \bigg] \bigg] \\
& \geq \int dx^0\cdot q(x^0)\cdot E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \log \bigg[p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \quad \text{(Jensen's inequality)} \\
&= \int dx^0\cdot q(x^0) \int q(x^{1,2 \cdots T}| x^0) \cdot \log \bigg[p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \cdot dx^{1,2\cdots T} \\
&= \int dx^{0,1,2\cdots T} \cdot q(x^0) \cdot q(x^{1,2 \cdots T}| x^0) \cdot \log \bigg[p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \big[p(x^T)\big] \\
&= K
\end{split}
\end{equation}
$$
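The inequality step in the derivation of $L$ is Jensen's inequality for the concave logarithm: the log of an average is at least the average of the logs. A minimal numeric sketch follows; the four trajectory probabilities and weights are made-up numbers standing in for $q(x^{1,2\cdots T}|x^0)$ and $p(x^T)\prod_t p/q$.

```python
import math

# Illustrative numbers only: q_probs plays the role of q(x^{1..T} | x^0) over
# four trajectories, weights plays the role of p(x^T) * prod_t p / q.
q_probs = [0.1, 0.2, 0.3, 0.4]
weights = [0.05, 0.40, 0.25, 0.10]

# log E_q[w]: the exact log-likelihood term before applying Jensen.
log_of_mean = math.log(sum(q * w for q, w in zip(q_probs, weights)))
# E_q[log w]: the lower bound that the derivation optimizes instead.
mean_of_log = sum(q * math.log(w) for q, w in zip(q_probs, weights))

gap = log_of_mean - mean_of_log   # non-negative by Jensen's inequality
print(log_of_mean, mean_of_log, gap)
```

The gap shrinks to zero when all weights are equal, which is why the bound $K$ becomes tight when the reverse process matches the forward one.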
This gives the following equation.

$$
\begin{equation}
\begin{split}
K &= \underbrace{\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg]}_{K_1} + \underbrace{\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \big[p(x^T)\big]}_{K_2} \\
&=K_1 + K_2
\end{split}
\end{equation}
$$
First consider $K_2$, the second term of $K$.

$$
\begin{equation}
\begin{split}
K_2 &= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \big[p(x^T)\big] \\
&= \int q(x^0)\cdot q(x^1|x^0) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot \log \big[p(x^T)\big] \cdot dx^0 dx^1\cdots dx^{T} \\
&= \int \bigg( \int q(x^1, x^0) \cdot dx^0 \bigg) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot \log \big[p(x^T)\big] \cdot dx^1\cdots dx^{T} \\
&= \int q(x^1) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot \log \big[p(x^T)\big] \cdot dx^1\cdots dx^{T} \\
&= \int q(x^T) \cdot \log \big[p(x^T)\big] \cdot dx^{T} \quad \text{(integrating out } x^1, \cdots, x^{T-1} \text{ in the same way)} \\
&= \int p(x^T) \cdot \log \big[p(x^T)\big] \cdot dx^{T} \quad \text{(since } q(x^T)=p(x^T) \text{)} \\
&=-H_p(x^T)
\end{split}
\end{equation}
$$
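The marginalization used for $K_2$ (integrating out $x^0, x^1, \cdots, x^{T-1}$ one at a time until only $q(x^T)$ remains) can be verified by brute force on a small discrete chain. In this sketch the number of states, the random transition matrices `A`, and the vectors `q0` and `p_T` are all illustrative assumptions; it checks the step up to $\int q(x^T)\log p(x^T)\,dx^T$, not the final substitution $q(x^T)=p(x^T)$.

```python
import itertools
import numpy as np

# Illustrative assumptions: 3 discrete states, T = 3, random column-stochastic
# transition matrices with A[t][b, a] = q(x^{t+1} = b | x^t = a).
rng = np.random.default_rng(1)
n_states, T = 3, 3
q0 = np.array([0.2, 0.5, 0.3])                       # q(x^0)
A = [rng.random((n_states, n_states)) for _ in range(T)]
A = [m / m.sum(axis=0, keepdims=True) for m in A]    # make columns sum to 1
p_T = np.array([0.3, 0.3, 0.4])                      # p(x^T)

# Left-hand side: sum over every full trajectory x^0..x^T.
lhs = 0.0
for xs in itertools.product(range(n_states), repeat=T + 1):
    prob = q0[xs[0]]
    for t in range(T):
        prob *= A[t][xs[t + 1], xs[t]]
    lhs += prob * np.log(p_T[xs[T]])

# Right-hand side: integrate out x^0, x^1, ... one step at a time to get q(x^T).
q_t = q0
for t in range(T):
    q_t = A[t] @ q_t
rhs = float(np.sum(q_t * np.log(p_T)))
print(lhs, rhs)
```

The sum over all trajectories and the sum over $x^T$ alone give the same number, because $\log p(x^T)$ depends only on the last state.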
$p(x^T)$ is a Gaussian distribution with mean 0 and variance 1. Following the reference, $K_2$ can be computed as shown below.

$$
\begin{equation}
\begin{split}
K_2 &=-H_p(x^T) \\
&=-\bigg( \frac{1}{2} \log[2 \pi \sigma^2] + \frac{1}{2} \bigg) \\
&=-\bigg( \frac{1}{2} \log[2 \pi ] + \frac{1}{2} \bigg) \quad (\sigma^2 = 1)
\end{split}
\end{equation}
$$
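As a quick sanity check of this closed form, the sketch below compares $\frac{1}{2}\log[2\pi]+\frac{1}{2}$ with a numerical estimate of $-\int p(x)\log p(x)\,dx$ for a one-dimensional standard Gaussian. The integration range, the grid size, and the helper names are arbitrary choices for illustration and are not taken from the paper's code.

```python
import math

def standard_normal_pdf(x):
    """Density of N(0, 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def entropy_numeric(lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal estimate of the differential entropy of N(0, 1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        p = standard_normal_pdf(x)
        term = -p * math.log(p)
        # End points get half weight in the trapezoidal rule.
        total += term * (0.5 if i in (0, n) else 1.0)
    return total * h

closed_form = 0.5 * math.log(2.0 * math.pi) + 0.5   # entropy H_p
k2 = -closed_form                                   # K_2 = -H_p(x^T)
print(closed_form, entropy_numeric(), k2)
```

This value is for a single dimension. For an isotropic unit Gaussian over $D$ pixels the entropy is additive, so the per-dimension value would be multiplied by $D$.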
The corresponding computation in the code is marked by the red box in the figure below.
Next consider $K_1$. Note that, as the paper states, to avoid edge effects the final reverse step is forced to satisfy $p(x^{0}|x^1)=q(x^1|x^{0})$, so the $t=1$ term below vanishes.

$$
\begin{equation}
\begin{split}
K_1 &=\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=1}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{0}|x^1)}{q(x^1|x^{0})}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t})}\cdot \frac{q(x^{t-1})}{q(x^t)}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\cdot \frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\prod_{t=2}^{T} \frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{q(x^{1}|x^0)}{q(x^T|x^0)}\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[q(x^T|x^0)\bigg] \\
&= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0}dx^{1} \cdots dx^{T} \cdot q(x^{0}) \cdot q(x^1|x^0) \cdot q(x^2|x^1) \cdots q(x^T|x^{T-1}) \cdot \log \bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot \log \bigg[q(x^T|x^0)\bigg]
\end{split}
\end{equation}
$$
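Two steps in the $K_1$ derivation are easy to get wrong: rewriting $q(x^t|x^{t-1})$ as $q(x^{t-1}|x^t,x^0)\cdot q(x^t|x^0)/q(x^{t-1}|x^0)$, and collapsing the product over $t=2,\cdots,T$ into $q(x^1|x^0)/q(x^T|x^0)$. The sketch below checks both by brute force on a small discrete Markov chain; the chain size, the random transition matrices, and the helper names `marginal` and `cond_on_x0` are assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_states, T = 2, 3
q0 = np.array([0.4, 0.6])                            # q(x^0)
# A[t-1][b, a] = q(x^t = b | x^{t-1} = a), for t = 1..T (random, strictly positive)
A = [rng.random((n_states, n_states)) + 0.1 for _ in range(T)]
A = [m / m.sum(axis=0, keepdims=True) for m in A]

# Full joint q(x^0, ..., x^T) as an array with one axis per time step.
joint = np.zeros((n_states,) * (T + 1))
for xs in itertools.product(range(n_states), repeat=T + 1):
    prob = q0[xs[0]]
    for t in range(1, T + 1):
        prob *= A[t - 1][xs[t], xs[t - 1]]
    joint[xs] = prob

def marginal(axes_kept):
    """Marginal of the joint over the listed time indices (kept in sorted order)."""
    drop = tuple(i for i in range(T + 1) if i not in axes_kept)
    return joint.sum(axis=drop)

def cond_on_x0(t):
    """q(x^t | x^0) as an array indexed [x0, xt]."""
    return marginal((0, t)) / q0[:, None]

# Bayes rewrite, checked for t = 2..T on every (x0, x_{t-1}, x_t).
max_err = 0.0
for t in range(2, T + 1):
    triple = marginal((0, t - 1, t))                 # indexed [x0, x_{t-1}, x_t]
    post = triple / marginal((0, t))[:, None, :]     # q(x^{t-1} | x^t, x^0)
    prev_c, cur_c = cond_on_x0(t - 1), cond_on_x0(t)
    for x0, a, b in itertools.product(range(n_states), repeat=3):
        rhs_val = post[x0, a, b] * cur_c[x0, b] / prev_c[x0, a]
        max_err = max(max_err, abs(A[t - 1][b, a] - rhs_val))

# Telescoping: the sum of log ratios along one trajectory collapses to two terms.
xs = (0, 1, 0, 1)
lhs = sum(np.log(cond_on_x0(t - 1)[xs[0], xs[t - 1]] / cond_on_x0(t)[xs[0], xs[t]])
          for t in range(2, T + 1))
rhs = np.log(cond_on_x0(1)[xs[0], xs[1]] / cond_on_x0(T)[xs[0], xs[T]])
print(max_err, lhs, rhs)
```

The Bayes rewrite relies on the Markov property, $q(x^t|x^{t-1},x^0)=q(x^t|x^{t-1})$, which is why the check compares against the one-step transition matrix directly.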