目录

论文阅读笔记Deep-Unsupervised-Learning-using-Nonequilibrium-Thermodynamics

论文阅读笔记:Deep Unsupervised Learning using Nonequilibrium Thermodynamics

1、来源

论文连接1:

论文连接2(带appendix):

代码链接:

代码的环境配置(基于theano)参考:

2、论文推理过程

扩散模型的流程如下图所示,可以看出

q ( x 0 , 1 , 2 ⋯   , T − 1 , T ) q(x^{0,1,2\cdots ,T-1, T})

q

(

x

0

,

1

,

2

,

T

1

,

T

) 为正向加噪音过程,

p ( x 0 , 1 , 2 ⋯   , T − 1 , T ) p(x^{0,1,2\cdots ,T-1, T})

p

(

x

0

,

1

,

2

,

T

1

,

T

) 为逆向去噪音过程,具体过程参考 。可以看出,逆向去噪的末端得到的图上还散布一些噪点。

https://i-blog.csdnimg.cn/direct/b50f0489a8124a8b8e33249138b25f4f.jpeg

2.1、名词解释

q ( x 0 ) q(x^0)

q

(

x

0

) :

x 0 x^0

x

0 表示数据集的图像分布,例如在使用MNIST数据集时,

x 0 x^0

x

0 就表示MNIST数据集中的图像,而

q ( x 0 ) q(x^0)

q

(

x

0

) 就表示数据集MNIST中数据集的分布情况。

p ( x T ) p(x^T)

p

(

x

T

) :

x T x^T

x

T 表示

x 0 x^0

x

0 的加噪结果,

x T x^T

x

T 是逆向去噪的起点,因此

p ( x T ) p(x^T)

p

(

x

T

) 是去噪起点的分布情况。与

π ( x T ) \pi(x^T)

π

(

x

T

) 相同。

值得注意的是

p ( x t ) p(x^t)

p

(

x

t

) 与

q ( x t ) q(x^t)

q

(

x

t

) 是相同的。

2.2、推理过程

正向加噪过程满足马尔可夫性质,因此有公式1。

q ( x 0 , 1 , 2 ⋯   , T − 1 , T )

q ( x 0 ) ⋅ ∏ t

1 T q ( x t ∣ x t − 1 )

q ( x 0 ) ⋅ q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) … q ( x T ∣ x T − 1 ) . q ( x 1 , 2 ⋯ T ∣ x 0 )

q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) … q ( x T ∣ x T − 1 ) ) . \begin{equation} \begin{split} q(x^{0,1,2\cdots,T-1,T})&=q(x^0)\cdot \prod_{t=1}^{T}{q(x^t|x^{t-1})}=q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1}). \ q(x^{1,2 \cdots T}|x^0)&=q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})). \end{split} \end{equation}

q

(

x

0

,

1

,

2

,

T

1

,

T

)

q

(

x

1

,

2

T

x

0

)

=

q

(

x

0

)

t

=

1

T

q

(

x

t

x

t

1

)

=

q

(

x

0

)

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

.

=

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

))

.

逆向去噪过程如公式2。

p θ ( x 0 , 1 , 2 ⋯   , T − 1 , T )

p θ ( x T ) ⋅ ∏ t

1 T p θ ( x t − 1 ∣ x t )

p θ ( x T ) ⋅ p θ ( x T − 1 ∣ x T ) ⋅ p θ ( x T − 2 ∣ x T − 1 ) … p θ ( x 0 ∣ x 1 ) . \begin{equation} p_{\theta}(x^{0,1,2\cdots,T-1,T})=p_{\theta}(x^T)\cdot \prod_{t=1}^{T}{p_{\theta}(x^{t-1}|x^{t})}=p_{\theta}(x^T)\cdot p_{\theta}(x^{T-1}|x^T)\cdot p_{\theta}(x^{T-2}|x^{T-1})\dots p_{\theta}(x^{0}|x^{1}). \end{equation}

p

θ

(

x

0

,

1

,

2

,

T

1

,

T

)

=

p

θ

(

x

T

)

t

=

1

T

p

θ

(

x

t

1

x

t

)

=

p

θ

(

x

T

)

p

θ

(

x

T

1

x

T

)

p

θ

(

x

T

2

x

T

1

)

p

θ

(

x

0

x

1

)

.

公式2中的参数

θ \theta

θ 就是深度学习模型中需要学习的参数。为了方便,省略公式2中的

θ \theta

θ ,因此公式2被重写为公式3。

p ( x 0 , 1 , 2 ⋯   , T − 1 , T )

p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t )

p ( x T ) ⋅ p ( x T − 1 ∣ x T ) ⋅ p ( x T − 2 ∣ x T − 1 ) … p ( x 0 ∣ x 1 ) . \begin{equation} p(x^{0,1,2\cdots,T-1,T})=p(x^T)\cdot \prod_{t=1}^{T}{p(x^{t-1}|x^{t})}=p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1}). \end{equation}

p

(

x

0

,

1

,

2

,

T

1

,

T

)

=

p

(

x

T

)

t

=

1

T

p

(

x

t

1

x

t

)

=

p

(

x

T

)

p

(

x

T

1

x

T

)

p

(

x

T

2

x

T

1

)

p

(

x

0

x

1

)

.

逆向去噪的目标是使得其终点与正向加噪的起点相同。也就是使得

p ( x 0 ) p(x^0)

p

(

x

0

) 最大,即使得 逆向去噪过程为

x 0 x^0

x

0 的概率最大。

p ( x 0 )

∫ p ( x 0 , x 1 ) d x 1 ( 联合分布概率公式 )

∫ p ( x 1 ) ⋅ p ( x 0 ∣ x 1 ) d x 1 ( 贝叶斯概率公式 )

∫ ∫ p ( x 1 , x 2 ) d x 2 ⋅ p ( x 0 ∣ x 1 ) d x 1 ( 积分套积分 )

∫ ∫ p ( x 2 ) ⋅ p ( x 1 ∣ x 2 ) ⋅ p ( x 0 ∣ x 1 ) d x 1 d x 2 ( 改写为二重积分 )

∫ ∫ p ( x 2 ) ⋅ p ( x 1 ∣ x 2 ) ⋅ p ( x 0 ∣ x 1 ) d x 1 d x 2

∫ ∫ ⋯ ∫ p ( x T ) ⋅ p ( x T − 1 ∣ x T ) ⋅ p ( x T − 2 ∣ x − 1 ) ⋯ p ( x 0 ∣ x 1 ) ⋅ d x 1 d x 2 ⋯ d x T

∫ p ( x 0 , 1 , 2 ⋯ T ) d x 1 , 2 ⋯ T ( T − 1 重积分 )

∫ d x 1 , 2 ⋯ T ⋅ p ( x 0 , 1 , 2 ⋯ T ) ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) q ( x 1 , 2 ⋯ T ∣ x 0 )

∫ d x 1 , 2 ⋯ T ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ p ( x 0 , 1 , 2 ⋯ T ) q ( x 1 , 2 ⋯ T ∣ x 0 )

∫ d x 1 , 2 ⋯ T ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ p ( x T ) ⋅ p ( x T − 1 ∣ x T ) ⋅ p ( x T − 2 ∣ x T − 1 ) … p ( x 0 ∣ x 1 ) q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) … q ( x T ∣ x T − 1 )

∫ d x 1 , 2 ⋯ T ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ p ( x T ) ⋅ p ( x T − 1 ∣ x T ) ⋅ p ( x T − 2 ∣ x T − 1 ) … p ( x 0 ∣ x 1 ) q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) … q ( x T ∣ x T − 1 )

∫ d x 1 , 2 ⋯ T ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 )

E x 1 , 2 , ⋯ T ∼ q ( x 1 , 2 ⋯ T ∣ x 0 ) p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ( 改写为期望的形式 ) \begin{equation} \begin{split} p(x^0)&=\int p(x^0,x^1)dx^{1} (联合分布概率公式)\ &=\int p(x^1)\cdot p(x^0|x^1)dx^1 (贝叶斯概率公式) \ &=\int \int p(x1,x2)dx^2 \cdot p(x^0|x^1)dx^1 (积分套积分)\ &=\int \int p(x^2)\cdot p(x^1|x^2) \cdot p(x^0|x^1)dx^1 dx^2(改写为二重积分)\ &= \int \int p(x^2) \cdot p(x^1|x^2) \cdot p(x^0|x^1) dx^1 dx^2 \ &= \vdots \ &= \int \int \cdots \int p(x^T)\cdot p(x^{T-1}|x^{T})\cdot p(x^{T-2}|x^{-1})\cdots p(x^0|x^1) \cdot dx^1 dx^2 \cdots dx^T \ &= \int p(x^{0,1,2 \cdots T})dx^{1,2\cdots T} (T-1重积分) \ &= \int dx^{1,2\cdots T} \cdot p(x^{0,1,2 \cdots T}) \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^{1,2 \cdots T}|x^0)} \ &= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^{0,1,2 \cdots T}) }{q(x^{1,2 \cdots T}|x^0)} \ &= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \ &= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \frac{ p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \ &= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \ &= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} (改写为期望的形式)\ \end{split} \end{equation}

p

(

x

0

)

=

p

(

x

0

,

x

1

)

d

x

1

(

联合分布概率公式

)

=

p

(

x

1

)

p

(

x

0

x

1

)

d

x

1

(

贝叶斯概率公式

)

=

∫∫

p

(

x

1

,

x

2

)

d

x

2

p

(

x

0

x

1

)

d

x

1

(

积分套积分

)

=

∫∫

p

(

x

2

)

p

(

x

1

x

2

)

p

(

x

0

x

1

)

d

x

1

d

x

2

(

改写为二重积分

)

=

∫∫

p

(

x

2

)

p

(

x

1

x

2

)

p

(

x

0

x

1

)

d

x

1

d

x

2

=

=

∫∫

p

(

x

T

)

p

(

x

T

1

x

T

)

p

(

x

T

2

x

1

)

p

(

x

0

x

1

)

d

x

1

d

x

2

d

x

T

=

p

(

x

0

,

1

,

2

T

)

d

x

1

,

2

T

(

T

1

重积分

)

=

d

x

1

,

2

T

p

(

x

0

,

1

,

2

T

)

q

(

x

1

,

2

T

x

0

)

q

(

x

1

,

2

T

x

0

)

=

d

x

1

,

2

T

q

(

x

1

,

2

T

x

0

)

q

(

x

1

,

2

T

x

0

)

p

(

x

0

,

1

,

2

T

)

=

d

x

1

,

2

T

q

(

x

1

,

2

T

x

0

)

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

p

(

x

T

)

p

(

x

T

1

x

T

)

p

(

x

T

2

x

T

1

)

p

(

x

0

x

1

)

=

d

x

1

,

2

T

q

(

x

1

,

2

T

x

0

)

p

(

x

T

)

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

p

(

x

T

1

x

T

)

p

(

x

T

2

x

T

1

)

p

(

x

0

x

1

)

=

d

x

1

,

2

T

q

(

x

1

,

2

T

x

0

)

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

=

E

x

1

,

2

,

T

q

(

x

1

,

2

T

x

0

)

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

(

改写为期望的形式

)

因此公式3中的参数

θ \theta

θ 应满足

θ

a r g max θ p ( x 0 ) . \begin{equation} \theta= arg \underset {\theta}{\text{max}} p(x^0). \end{equation}

θ

=

a

r

g

θ

max

p

(

x

0

)

.

公式4是对数据集中的一张图片进行求解,然而数据集中通常是有成千上万张图像的。假设数据集中有

N N

N 张图像,因此有公式6,其目的是求得一组参数

θ \theta

θ ,使得

L L

L 取得最大值。值得注意的是

q ( x 0 ) q(x^0)

q

(

x

0

) 表示数据集中每张图片被采样出来的概率。

L

∑ n

0 N q ( x 0 ) ⋅ l o g ( p ( x 0 ) )

∫ d x 0 ⋅ q ( x 0 ) ⋅ l o g ( p ( x 0 ) )

∫ d x 0 ⋅ q ( x 0 ) ⋅ l o g [ E x 1 , 2 , ⋯ T ∼ q ( x 1 , 2 ⋯ T ∣ x 0 ) p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ] ≥ ∫ d x 0 ⋅ q ( x 0 ) ⋅ E x 1 , 2 , ⋯ T ∼ q ( x 1 , 2 ⋯ T ∣ x 0 ) l o g [ p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∫ d x 0 ⋅ q ( x 0 ) ∫ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ l o g [ p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ] ⋅ d x 1 , 2 ⋯ T

∫ d x 0 , 1 , 2 ⋯ T q ( x 0 ) ⋅ q ( x 1 , 2 ⋯ T ∣ x 0 ) ⋅ l o g [ p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x T ) ⋅ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ] + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x T ) ]

K \begin{equation} \begin{split} L&=\sum_{n=0}^{N} q(x^0)\cdot log(p(x^0)) \ &=\int dx^0\cdot q(x^0)\cdot log(p(x^0)) \ &=\int dx^0\cdot q(x^0)\cdot log [ E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}] \ & \geq \int dx^0\cdot q(x^0)\cdot E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} log [p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}]\ &= \int dx^0\cdot q(x^0) \int q(x^{1,2 \cdots T}| x^0) \cdot log [p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}] \cdot dx^{1,2\cdots T}\ &= \int dx^{0,1,2\cdots T} q(x^0) \cdot q(x^{1,2 \cdots T}| x^0) \cdot log [p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}] \ &= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}] \ &= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [p(x^T)] \ &= K \ \end{split} \end{equation}

L

=

n

=

0

N

q

(

x

0

)

l

o

g

(

p

(

x

0

))

=

d

x

0

q

(

x

0

)

l

o

g

(

p

(

x

0

))

=

d

x

0

q

(

x

0

)

l

o

g

[

E

x

1

,

2

,

T

q

(

x

1

,

2

T

x

0

)

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

d

x

0

q

(

x

0

)

E

x

1

,

2

,

T

q

(

x

1

,

2

T

x

0

)

l

o

g

[

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

d

x

0

q

(

x

0

)

q

(

x

1

,

2

T

x

0

)

l

o

g

[

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

d

x

1

,

2

T

=

d

x

0

,

1

,

2

T

q

(

x

0

)

q

(

x

1

,

2

T

x

0

)

l

o

g

[

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

p

(

x

T

)

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

p

(

x

T

)]

=

K

因此有公式

K

∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ] ⏟ K 1 + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x T ) ] ⏟ K 2

K 1 + K 2 \begin{equation} \begin{split} K &= \underbrace{\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}]}{K1} + \underbrace{\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [p(x^T)]}{K_2} \ &=K_1 + K_2 \end{split} \end{equation}

K

=

K

1

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

K

2

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

p

(

x

T

)]

=

K

1

K

2

首先考虑

K K

K 中的第二项

K 2 K_2

K

2

K 2

∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x T ) ]

∫ q ( x 0 ) ⋅ q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) ⋯ q ( x T ∣ x T − 1 ) ⋅ l o g [ p ( x T ) ] ⋅ d x 0 d x 1 ⋯ d x T

∫ ( ∫ q ( x 1 , x 0 ) ⋅ d x 0 ) ⋅ q ( x 2 ∣ x 1 ) ⋯ q ( x T ∣ x T − 1 ) ⋅ l o g [ p ( x T ) ] ⋅ d x 1 ⋯ d x T

∫ q ( x 1 ) ⋅ q ( x 2 ∣ x 1 ) ⋯ q ( x T ∣ x T − 1 ) ⋅ l o g [ p ( x T ) ] ⋅ d x 1 ⋯ d x T

∫ q ( x T ) ⋅ l o g [ p ( x T ) ] ⋅ d x T

∫ p ( x T ) ⋅ l o g [ p ( x T ) ] ⋅ d x T

− H p ( x T ) \begin{equation} \begin{split} K_2 &= \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log [p(x^T)] \ &= \int q(x^0)\cdot q(x^1|x^0) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot log [p(x^T)] \cdot dx^0 dx^1\cdots dx^{T} \ &= \int \bigg( \int q(x^1, x^0) \cdot dx^0 \bigg) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot log [p(x^T)] \cdot dx^1\cdots dx^{T} \ &= \int q(x^1) \cdot q(x^2|x^1) \cdots q(x^{T}|x^{T-1})\cdot log [p(x^T)] \cdot dx^1\cdots dx^{T} \ &= \int q(x^T) \cdot log [p(x^T)] \cdot dx^{T} \ &= \int p(x^T) \cdot log [p(x^T)] \cdot dx^{T} \ &=-H_p(x^T) \end{split} \end{equation}

K

2

=

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

p

(

x

T

)]

=

q

(

x

0

)

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

l

o

g

[

p

(

x

T

)]

d

x

0

d

x

1

d

x

T

=

(

q

(

x

1

,

x

0

)

d

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

l

o

g

[

p

(

x

T

)]

d

x

1

d

x

T

=

q

(

x

1

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

l

o

g

[

p

(

x

T

)]

d

x

1

d

x

T

=

q

(

x

T

)

l

o

g

[

p

(

x

T

)]

d

x

T

=

p

(

x

T

)

l

o

g

[

p

(

x

T

)]

d

x

T

=

H

p

(

x

T

)

p ( x T ) p(x^T)

p

(

x

T

) 是一个均值为0,方差为1的高斯分布。参考 ,可以计算出

K 2 K_2

K

2

如下所示。

K 2

− H p ( x T )

− ( 1 2 l o g [ 2 π σ 2 ] + 1 2 )

− ( 1 2 l o g [ 2 π ] + 1 2 ) \begin{equation} \begin{split} K_2 &=-H_p(x^T) \ &=-\bigg( \frac{1}{2} log[2 \pi \sigma^2] + \frac{1}{2} \bigg)\ &=-\bigg( \frac{1}{2} log[2 \pi ] + \frac{1}{2} \bigg) \end{split} \end{equation}

K

2

=

H

p

(

x

T

)

=

(

2

1

l

o

g

[

2

π

σ

2

]

2

1

)

=

(

2

1

l

o

g

[

2

π

]

2

1

)

在代码中的计算过程如下图红框所示。

https://i-blog.csdnimg.cn/direct/794796ca446b4619a7d3c2f2537a60a0.png

接下来考虑

K 1 K_1

K

1

。值得注意的是,论文中说明,为了避免边界效应,因此强迫

p ( x 0 ∣ x 1 )

q ( x 1 ∣ x 0 ) p(x^{0}|x^1)=q(x^1|x^{0})

p

(

x

0

x

1

)

=

q

(

x

1

x

0

) 。

K 1

∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ ∏ t

1 T p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∑ t

1 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ] + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x 0 ∣ x 1 ) q ( x 1 ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t ∣ x t − 1 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t ) ⋅ q ( x t − 1 ) q ( x t ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ⋅ q ( x t − 1 ∣ x 0 ) q ( x t ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ] + ∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ q ( x t − 1 ∣ x 0 ) q ( x t ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ] + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ ∏ t

2 T q ( x t − 1 ∣ x 0 ) q ( x t ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ] + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ q ( x 1 ∣ x 0 ) q ( x T ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ] + ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ q ( x 1 ∣ x 0 ) ] − ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ q ( x T ∣ x 0 ) ]

∑ t

2 T ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) ] + ∫ d x 0 d x 1 ⋯ d x T ⋅ q ( x 0 ) ⋅ q ( x 1 ∣ x 0 ) ⋅ q ( x 2 ∣ x 1 ) ⋯ q ( x T ∣ x T − 1 ) ⋅ l o g [ q ( x 1 ∣ x 0 ) ] − ∫ d x 0 , 1 , 2 ⋯ T ⋅ q ( x 0 , 1 , 2 ⋯ T ) ⋅ l o g [ q ( x T ∣ x 0 ) ] \begin{equation} \begin{split} K_1 &=\int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \ &= \sum_{t=1}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{0}|x^1)}{q(x^1|x^{0})}\bigg] \ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t})}\cdot \frac{q(x^{t-1})}{q(x^t)}\bigg] \ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\cdot \frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg]\ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\prod_{t=2}^{T} \frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg]\ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{q(x^{1}|x^0)}{q(x^T|x^0)}\bigg]\ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[q(x^T|x^0)\bigg]\ &= \sum_{t=2}^{T} \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t}, x^0)}\bigg] + \int dx^{0}dx^{1} \cdots dx^{T} \cdot q(x^{0}) \cdot q(x^1|x^0) \cdot q(x^2|x^1) \cdots q(x^T|x^{T-1}) \cdot log \bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T} \cdot q(x^{0,1,2 \cdots T}) \cdot log \bigg[q(x^T|x^0)\bigg]\ \end{split} \end{equation}

K

1

=

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

t

=

1

T

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

t

=

1

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

1

x

0

)

p

(

x

0

x

1

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

x

t

1

)

p

(

x

t

1

x

t

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

)

p

(

x

t

1

x

t

)

q

(

x

t

)

q

(

x

t

1

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

q

(

x

t

x

0

)

q

(

x

t

1

x

0

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

]

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

x

0

)

q

(

x

t

1

x

0

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

t

=

2

T

q

(

x

t

x

0

)

q

(

x

t

1

x

0

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

T

x

0

)

q

(

x

1

x

0

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

1

x

0

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

T

x

0

)

]

=

t

=

2

T

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

t

1

x

t

,

x

0

)

p

(

x

t

1

x

t

)

]

d

x

0

d

x

1

d

x

T

q

(

x

0

)

q

(

x

1

x

0

)

q

(

x

2

x

1

)

q

(

x

T

x

T

1

)

l

o

g

[

q

(

x

1

x

0

)

]

d

x

0

,

1

,

2

T

q

(

x

0

,

1

,

2

T

)

l

o

g

[

q

(

x

T

x

0

)

]