Learning Light Transport the Reinforced Way
Ken Dahm and Alexander Keller
Light Transport Simulation

How to do importance sampling: compute functionals of a Fredholm integral equation of the 2nd kind

    L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} L(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ dωᵢ
Light Transport Simulation

How to do importance sampling, example: direct illumination

    L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ dωᵢ

           ≈ Le(x,ω) + (1/N) ∑_{i=0}^{N−1} Le(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ / p(ωᵢ)

candidate densities: p ∼ fr cosθ (BRDF importance sampling) and p ∼ Le fr cosθ
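The estimator above divides each sample's contribution by its density p(ωᵢ). A minimal sketch of the idea, assuming a toy setting not taken from the slides: a constant emitter radiance Le over the whole hemisphere and a diffuse BRDF fr = albedo/π, so the exact integral is Le·albedo. It contrasts uniform hemisphere sampling with p ∼ cosθ importance sampling; all names are illustrative.

    # Toy Monte Carlo estimate of the direct-illumination integral
    # over the hemisphere (constant Le, diffuse BRDF); exact = Le * albedo.
    import math, random

    Le, albedo = 1.0, 0.75
    N = 100_000

    def estimate(sample):
        total = 0.0
        for _ in range(N):
            cos_theta, pdf = sample()
            fr = albedo / math.pi
            total += Le * fr * cos_theta / pdf   # f / p, as in the estimator above
        return total / N

    def uniform():
        # uniform direction on the hemisphere: cos(theta) is uniform, p = 1/(2*pi)
        cos_theta = random.random()
        return cos_theta, 1.0 / (2.0 * math.pi)

    def cosine_weighted():
        # p ~ cos(theta): cos(theta) = sqrt(u), p = cos(theta)/pi
        cos_theta = math.sqrt(random.random())
        return cos_theta, cos_theta / math.pi

    print("exact   :", Le * albedo)
    print("uniform :", estimate(uniform))
    print("cosine  :", estimate(cosine_weighted))   # zero variance in this toy case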
Machine Learning
Taxonomy
– supervised learning: learning from labeled data; goal: extrapolate/generalize the response to unseen data; example: artificial neural networks
– unsupervised learning: learning from unlabeled data; goal: identify structure in the data; example: autoencoder networks
– semi-supervised learning: reward-based supervision; goal: maximize reward; example: reinforcement learning
The Reinforcement Learning Problem

Maximize reward
– policy π_t : S → A(S_t) to select an action A_t ∈ A(S_t) given the current state S_t ∈ S
– a state transition yields the reward R_{t+1}(A_t | S_t) ∈ ℝ

[Diagram: agent–environment loop; the agent in state S_t takes action A_t, the environment responds with reward R_{t+1}(A_t | S_t) and next state S_{t+1}]

classic goal: find a policy that maximizes the discounted cumulative reward

    V^π(S_t) ≡ ∑_{k=0}^{∞} γ^k R_{t+1+k}(A_{t+k} | S_{t+k}), where 0 < γ < 1
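As a tiny illustration, the discounted cumulative reward is just a γ-weighted sum; the reward sequence and γ below are made up:

    # hypothetical reward sequence R_{t+1}, R_{t+2}, ...; gamma chosen arbitrarily
    gamma = 0.9
    rewards = [0.0, 0.0, 1.0, 0.5, 2.0]
    V = sum(gamma**k * r for k, r in enumerate(rewards))  # sum_k gamma^k * R_{t+1+k}
    print(V)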
The Reinforcement Learning Problem

Q-Learning [Watkins 1989] finds an optimal action-selection policy for any given Markov decision process:

    Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1], with the following options for the discounted cumulative reward V(s′):
– V(s′) ≡ max_{a′∈A} Q(s′,a′): consider the best action
– V(s′) ≡ ∑_{a′∈A} π(s′,a′) Q(s′,a′): policy-weighted average over a discrete action space
– V(s′) ≡ ∫_A π(s′,a′) Q(s′,a′) da′: policy-weighted average over a continuous action space
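A minimal tabular Q-learning sketch using the first option, V(s′) = max_{a′} Q(s′,a′); the two-state toy MDP and all constants are made up for illustration:

    import random

    n_states, n_actions = 2, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.1, 0.9

    def step(s, a):
        # made-up environment: action 1 in state 0 pays off
        r = 1.0 if (s == 0 and a == 1) else 0.0
        return r, (s + a) % n_states  # reward, next state

    s = 0
    for _ in range(10_000):
        a = random.randrange(n_actions)            # pure exploration
        r, s_next = step(s, a)
        V = max(Q[s_next])                         # V(s') = max_a' Q(s', a')
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * V)
        s = s_next

    print(Q)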
Light Transport Simulation and Reinforcement Learning

Structural equivalence of the integral equations, matching terms:

    L(x,ω)  = Le(x,ω) + ∫_{S²₊(x)} fr(ωᵢ,x,ω) cosθᵢ L(h(x,ωᵢ),−ωᵢ) dωᵢ

    Q′(s,a) = (1−α) Q(s,a) + α ( r(s,a) + γ ∫_A π(s′,a′) Q(s′,a′) da′ )

This hints at learning the incident radiance

    Q′(x,ω) = (1−α) Q(x,ω) + α ( Le(y,−ω) + ∫_{S²₊(y)} fr(ωᵢ,y,−ω) cosθᵢ Q(y,ωᵢ) dωᵢ )

as a policy for selecting an action ω in state x to reach the next state y := h(x,ω); the learning rate α is the only parameter left.
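A sketch of this update with Q discretized over equally sized hemisphere patches, as introduced on the next slide: the integral becomes a sum over patches weighted by their solid angle. The helper names and per-patch arguments (fr, cos_theta) are placeholders for quantities evaluated at the hit point y = h(x,ω), not code from the paper:

    import math

    n_patches = 8
    alpha = 0.2
    patch_solid_angle = 2.0 * math.pi / n_patches  # equal-area patches

    def update_q(Q_x, patch, Le_y, Q_y, fr, cos_theta):
        """Q'(x,w) = (1-a) Q(x,w) + a (Le(y,-w) + sum_i fr_i cos(theta_i) Q(y,w_i) dw)."""
        integral = sum(fr[i] * cos_theta[i] * Q_y[i] for i in range(n_patches))
        integral *= patch_solid_angle
        Q_x[patch] = (1.0 - alpha) * Q_x[patch] + alpha * (Le_y + integral)

    # illustrative values only
    Q_x = [0.0] * n_patches                  # Q(x, .) for the cell containing x
    Q_y = [0.3] * n_patches                  # learned values at the next state's cell
    fr = [1.0 / math.pi] * n_patches         # diffuse BRDF, placeholder
    cos_theta = [0.5] * n_patches            # patch-center cosines, placeholder
    update_q(Q_x, patch=3, Le_y=0.0, Q_y=Q_y, fr=fr, cos_theta=cos_theta)
    print(Q_x[3])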
Light Transport Simulation and Reinforcement Learning

Discretization of Q in analogy to irradiance representations
– action space: a ∈ S²₊(y), partitioned into equally sized patches via the area-preserving mapping

      (x, y, z) = ( √(1−u²) cos(2πv), √(1−u²) sin(2πv), u )

– state space: s ∈ ∂V, Voronoi diagram of a low-discrepancy sequence with points x_i = (Φ₂(i), i/N) for i = 0,...,N−1, where Φ₂ is the radical inverse in base 2; lookup by nearest neighbor search including the surface normal
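A sketch of this parameterization, assuming an nu × nv patch grid (the resolution is an arbitrary choice for illustration): (u,v) ∈ [0,1)² maps to a direction with z = u, and a direction maps back to its patch index:

    import math

    nu, nv = 4, 8  # patches along u (elevation) and v (azimuth)

    def patch_to_dir(i, j, su=0.5, sv=0.5):
        # (su, sv) picks a point inside patch (i, j); 0.5 is the patch center
        u = (i + su) / nu
        v = (j + sv) / nv
        r = math.sqrt(1.0 - u * u)
        return (r * math.cos(2.0 * math.pi * v),
                r * math.sin(2.0 * math.pi * v),
                u)

    def dir_to_patch(d):
        x, y, z = d
        u = z                                        # inverse of the mapping above
        v = (math.atan2(y, x) / (2.0 * math.pi)) % 1.0
        return min(int(u * nu), nu - 1), min(int(v * nv), nv - 1)

    d = patch_to_dir(2, 5)
    print(dir_to_patch(d))  # -> (2, 5)

Because dω = d(cosθ) dφ on the hemisphere, a uniform grid in (u,v) with u = cosθ yields patches of equal solid angle.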
Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Baseline path tracer with BSDF importance sampling:

    Function trace(camera, pixel, scene)
        throughput ← 1
        color ← 0
        ray ← setupPrimaryRay(camera, pixel)
        for i ← 0 to ∞ do
            h, n ← intersect(scene, ray)
            if isAreaLight(h) then
                color ← throughput * getEmission(h)
                break
            fs, ω, pω ← sampleBsdf(h, n)
            throughput ← throughput * fs * cos(n, ω) / pω
            ray ← (h, ω)
        write(pixel, color)
Guided version: the scattering direction is drawn from the learned Q-table, which is updated online while paths are traced:

    Function trace(camera, pixel, scene)
        throughput ← 1
        color ← 0
        ray ← setupPrimaryRay(camera, pixel)
        for i ← 0 to ∞ do
            h, n ← intersect(scene, ray)
            if i > 0 then
                updateQtable(ray, h)      // ray.o is the previous hit point, h the current one
            if isAreaLight(h) then
                color ← throughput * getEmission(h)
                break
            fs, ω, pω ← sampleScatteringDir(h)
            throughput ← throughput * fs * cos(n, ω) / pω
            ray ← (h, ω)
        write(pixel, color)

    Function updateQtable(ray, h)
        idxPrev ← findCell(ray.o)
        idxCurr ← findCell(h)
        update ← 0
        if isAreaLight(h) then
            update ← max(getEmission(h))                          // reward: emission, reduced to a scalar via max
        else
            update ← max(getLastAttenuation(ray) * qmax[idxCurr])
        idxQ ← findIndex(idxPrev, ray)
        qtable[idxQ] ← (1−α) * qtable[idxQ] + α * update

    Function sampleScatteringDir(h)
        idx ← findCell(h)
        idxPatch, ps ← sampleCdf(idx)                 // pick a patch proportionally to its Q-value
        ω ← uniformSamplePatch(idxPatch)
        fs ← evalBsdf(h, ω)                           // BSDF evaluation implied by the return value; evalBsdf is an assumed helper name
        return fs, ω, ps * numPatches / (2π)
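The sampleCdf step can be sketched as inverse CDF sampling over the Q-values of one state cell; the returned density follows because each of the numPatches patches covers 2π/numPatches steradians and is sampled uniformly inside. The Q-values below are made up:

    import math, random, bisect, itertools

    q_row = [0.1, 0.4, 0.05, 0.9, 0.2, 0.3, 0.6, 0.15]  # Q(s, .) for one state cell
    n_patches = len(q_row)

    def sample_cdf(q):
        cdf = list(itertools.accumulate(q))
        total = cdf[-1]
        idx = bisect.bisect_left(cdf, random.random() * total)
        return idx, q[idx] / total   # patch index, discrete probability p_patch

    idx, p_patch = sample_cdf(q_row)
    pdf = p_patch * n_patches / (2.0 * math.pi)  # solid-angle density over the chosen patch
    print(idx, pdf)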
Results

Image comparisons in a scene with challenging visibility:
– 128 paths traced with BRDF importance sampling, vs. path tracing with online reinforcement learning, vs. Metropolis light transport, each at the same number of paths
– 2048 paths traced with BRDF importance sampling, vs. path tracing with online reinforcement learning, vs. Metropolis light transport, each at the same number of paths
– path tracing with next event estimation and BRDF importance sampling, vs. path tracing with next event estimation and online reinforcement learning at the same number of samples
Learning Light Transport the Reinforced Way
Summary
– structural identity of reinforcement learning and a Fredholm integral equation of the 2nd kind
– dramatic variance reduction
  – simple to implement using data structures common in real-time games
  – shorter path length
  – more coherent path generation/traversal
– investigating interaction with Russian roulette absorption and splitting, as the average path length is shortened already
– investigating high-dimensional function approximation via neural networks

https://arxiv.org/abs/1701.07403