Deep Learning Basics Lecture 8: Autoencoder & DBM
Princeton University COS 495 Instructor: Yingyu Liang
Autoencoder: a neural network trained to attempt to copy its input to its output. It contains two parts: an encoder that maps the input to a hidden code, and a decoder that maps the code back to a reconstruction.
Input x → Encoder f(·) → hidden representation (the code) h = f(x) → Decoder g(·) → reconstruction r = g(h) = g(f(x))
Hopefully the autoencoder can learn useful properties of the data
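The encoder/decoder pipeline above can be sketched in a few lines of numpy. This is a minimal illustrative example, not the lecture's implementation; the layer sizes, the sigmoid nonlinearity, and the squared-error loss are all assumptions chosen for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Input dimension 6, code dimension 2: the bottleneck forces a compressed code.
W_enc = rng.normal(scale=0.1, size=(2, 6))
b_enc = np.zeros(2)
W_dec = rng.normal(scale=0.1, size=(6, 2))
b_dec = np.zeros(6)

def f(x):   # encoder: h = f(x)
    return sigmoid(W_enc @ x + b_enc)

def g(h):   # decoder: r = g(h)
    return sigmoid(W_dec @ h + b_dec)

x = rng.random(6)
h = f(x)                        # the code
r = g(h)                        # the reconstruction
loss = np.sum((x - r) ** 2)     # squared-error reconstruction loss ℓ(x, g(f(x)))
```

Training would adjust the four parameter arrays by gradient descent on this loss, exactly as for any feedforward network.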
(Hinton and Zemel, 1994)
Training: minimize the reconstruction loss
L(x, θ) = ℓ(x, g(f(x)))
Regularized autoencoder: add a regularization term that encourages the model to have other properties
L_R = ℓ(x, g(f(x))) + Ω(h)
Probabilistic view: maximize the log-likelihood
log p(x) = log Σ_{h'} p(h', x)
max log p(x) = max log Σ_{h'} p(h', x)
Since h is the induced representation, Σ_{h'} p(h', x) can be approximated by p(h, x), giving
max log p(h, x) = max [log p(x|h) + log p(h)]
where log p(x|h) corresponds to the reconstruction loss and log p(h) corresponds to the regularization term
Sparse autoencoder: e.g., a Laplace prior p(h_i) = (λ/2) exp(−λ|h_i|) turns log p(h) into an ℓ1 penalty (up to a constant), giving
L_R = ℓ(x, g(f(x))) + λ‖h‖₁
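A quick numerical sketch of the sparse objective. The tiny tied-weight encoder/decoder and the value of λ (`lam`) are illustrative assumptions, not from the slides; the point is only how the ℓ1 term enters the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(scale=0.1, size=(4, 8))
x = rng.random(8)

h = sigmoid(W @ x)                 # code h = f(x)
r = sigmoid(W.T @ h)               # tied-weight decoder g(h) (an assumption)
lam = 0.1                          # sparsity weight λ (illustrative value)
recon = np.sum((x - r) ** 2)       # ℓ(x, g(f(x)))
penalty = lam * np.sum(np.abs(h))  # λ‖h‖₁, pushes code units toward 0
loss = recon + penalty
```

Because the penalty grows with every nonzero code unit, gradient descent on `loss` trades reconstruction accuracy for a sparser h.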
Denoising autoencoder: corrupt the input and train the network to reconstruct the clean input, which prevents the learned mapping from simply being the identity
L(x, θ) = ℓ(x, g(f(x̃))), where x̃ is x + noise
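The corruption step can be made concrete as below. The Gaussian noise scale is an illustrative assumption; note that the loss compares the reconstruction against the clean x, so even a perfect identity map (used here as a stand-in for g∘f) cannot reach zero loss.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(8)                                    # clean input
x_tilde = x + rng.normal(scale=0.3, size=x.shape)    # corrupted input x̃ = x + noise

# Identity stand-in for the autoencoder: copying x̃ straight through
# still incurs loss against the clean x, so identity is no longer optimal.
r = x_tilde
loss = np.sum((x - r) ** 2)   # ℓ(x, g(f(x̃))) measured against the clean input
```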
Boltzmann machine: a probability distribution over binary vectors
p(x) = exp(−E(x)) / Z
with energy function E(x) = −xᵀUx − bᵀx, where U is the weight matrix, b is the bias parameter, and Z is the partition function
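On a tiny model the definition can be verified by brute force: enumerate all binary vectors, compute the energies, and normalize by Z. The dimension and parameter values below are made up for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d = 3
U = rng.normal(scale=0.5, size=(d, d))
U = (U + U.T) / 2          # symmetric weight matrix
b = rng.normal(size=d)

def energy(x):
    # E(x) = -x^T U x - b^T x
    return -x @ U @ x - b @ x

states = [np.array(s) for s in product([0, 1], repeat=d)]
Z = sum(np.exp(-energy(s)) for s in states)        # partition function
p = {tuple(s): np.exp(-energy(s)) / Z for s in states}
```

Low-energy configurations receive high probability; the exponential cost of summing over all 2^d states is exactly why Z is intractable for realistic d.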
With hidden units, split x = (v, h): v visible, h hidden
E(x) = −vᵀRv − vᵀWh − hᵀSh − bᵀv − cᵀh
Given training data v^(1), v^(2), …, v^(n), maximum likelihood training:
max log L(θ) = Σ_i log p(v^(i))
where p(v) = Σ_h p(v, h) = Σ_h (1/Z) exp(−E(v, h))
Restricted Boltzmann machine
p(v, h) = exp(−E(v, h)) / Z, where the energy function is E(v, h) = −vᵀWh − bᵀv − cᵀh, with the weight matrix W and the biases b, c
Z = Σ_v Σ_h exp(−E(v, h))
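The RBM joint and the marginal p(v) = Σ_h p(v, h) can again be checked by brute force on a toy model (2 visible, 2 hidden units; parameter values are illustrative).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
nv, nh = 2, 2
W = rng.normal(scale=0.5, size=(nv, nh))   # visible-hidden weights
b = rng.normal(size=nv)                    # visible bias
c = rng.normal(size=nh)                    # hidden bias

def energy(v, h):
    # E(v, h) = -v^T W h - b^T v - c^T h (no visible-visible or hidden-hidden terms)
    return -v @ W @ h - b @ v - c @ h

vs = [np.array(s) for s in product([0, 1], repeat=nv)]
hs = [np.array(s) for s in product([0, 1], repeat=nh)]
Z = sum(np.exp(-energy(v, h)) for v in vs for h in hs)

def p_joint(v, h):
    return np.exp(-energy(v, h)) / Z

def p_v(v):    # marginal p(v) = sum over h of p(v, h)
    return sum(p_joint(v, h) for h in hs)

total = sum(p_v(v) for v in vs)
```

The "restricted" structure (no within-layer connections) is what makes the conditionals on the next slides factorize.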
Figure from Deep Learning, Goodfellow, Bengio and Courville
p(h | v) = p(v, h) / p(v) = Π_j p(h_j | v), with p(h_j = 1 | v) = σ(c_j + vᵀW_{:,j}), where σ is the logistic function
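The closed-form conditional can be verified against brute-force enumeration of the joint. The tiny RBM below is an illustrative assumption; the check confirms that p(h_j = 1 | v) from the logistic formula matches the ratio of summed Boltzmann weights.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
nv, nh = 3, 2
W = rng.normal(scale=0.5, size=(nv, nh))
b = rng.normal(size=nv)
c = rng.normal(size=nh)

def energy(v, h):
    return -v @ W @ h - b @ v - c @ h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v = np.array([1.0, 0.0, 1.0])   # an arbitrary visible configuration
hs = [np.array(s, dtype=float) for s in product([0, 1], repeat=nh)]
weights = np.array([np.exp(-energy(v, h)) for h in hs])

for j in range(nh):
    # Brute force: sum weights of hidden states with h_j = 1, normalized.
    brute = sum(w for w, h in zip(weights, hs) if h[j] == 1) / weights.sum()
    # Closed form: sigma(c_j + v^T W[:, j])
    closed = sigmoid(c[j] + v @ W[:, j])
    assert abs(brute - closed) < 1e-9
```

This factorization is what makes block Gibbs sampling in an RBM cheap: all hidden units can be sampled in parallel given v, and vice versa.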
p(v | h) = p(v, h) / p(h) = Π_i p(v_i | h), with p(v_i = 1 | h) = σ(b_i + W_{i,:}h), where σ is the logistic function
Deep Boltzmann machine
p(v, h^1, h^2, h^3) = exp(−E(v, h^1, h^2, h^3)) / Z
E(v, h^1, h^2, h^3) = −vᵀW^1h^1 − (h^1)ᵀW^2h^2 − (h^2)ᵀW^3h^3, with the weight matrices W^1, W^2, W^3
Z = Σ_{v, h^1, h^2, h^3} exp(−E(v, h^1, h^2, h^3))
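The DBM energy is a direct sum of adjacent-layer interaction terms, which the sketch below computes for one configuration. Layer sizes and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# One visible layer and three hidden layers (sizes chosen for illustration).
v  = rng.integers(0, 2, size=4).astype(float)
h1 = rng.integers(0, 2, size=3).astype(float)
h2 = rng.integers(0, 2, size=3).astype(float)
h3 = rng.integers(0, 2, size=2).astype(float)
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(3, 3))
W3 = rng.normal(scale=0.5, size=(3, 2))

def energy(v, h1, h2, h3):
    # Only adjacent layers interact; there are no within-layer terms,
    # so each layer's conditionals factorize given its neighbors.
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2) - (h2 @ W3 @ h3)

E = energy(v, h1, h2, h3)
```

As with the RBM, normalizing requires summing exp(−E) over every joint configuration of (v, h^1, h^2, h^3), which is why Z is intractable for realistic layer sizes.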
Figure from Deep Learning, Goodfellow, Bengio and Courville