Learning Light Transport the Reinforced Way
Ken Dahm and Alexander Keller
Light Transport Simulation

How to do importance sampling: compute functionals of a Fredholm integral equation of the 2nd kind

    L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} L(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ dωᵢ
Light Transport Simulation

How to do importance sampling, example: direct illumination

    L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ dωᵢ

           ≈ Le(x,ω) + (1/N) ∑_{i=0}^{N−1} Le(h(x,ωᵢ),−ωᵢ) fr(ωᵢ,x,ω) cosθᵢ / p(ωᵢ)

candidate densities: p ∼ fr cosθ (BRDF importance sampling) and p ∼ Le fr cosθ
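The estimator above divides each sample's contribution by its density p(ωᵢ). A minimal sketch of the idea, assuming a toy setting not taken from the slides: a constant emitter radiance Le over the whole hemisphere and a diffuse BRDF fr = albedo/π, so the exact integral is Le·albedo. It contrasts uniform hemisphere sampling with p ∼ cosθ importance sampling; all names are illustrative.

    # Toy Monte Carlo estimate of the direct-illumination integral
    # over the hemisphere (constant Le, diffuse BRDF); exact = Le * albedo.
    import math, random

    Le, albedo = 1.0, 0.75
    N = 100_000

    def estimate(sample):
        total = 0.0
        for _ in range(N):
            cos_theta, pdf = sample()
            fr = albedo / math.pi
            total += Le * fr * cos_theta / pdf   # f / p, as in the estimator above
        return total / N

    def uniform():
        # uniform direction on the hemisphere: cos(theta) is uniform, p = 1/(2*pi)
        cos_theta = random.random()
        return cos_theta, 1.0 / (2.0 * math.pi)

    def cosine_weighted():
        # p ~ cos(theta): cos(theta) = sqrt(u), p = cos(theta)/pi
        cos_theta = math.sqrt(random.random())
        return cos_theta, cos_theta / math.pi

    print("exact   :", Le * albedo)
    print("uniform :", estimate(uniform))
    print("cosine  :", estimate(cosine_weighted))   # zero variance in this toy case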
Machine Learning
Taxonomy
– supervised learning: learning from labeled data; goal: extrapolate/generalize the response to unseen data; example: artificial neural networks
– unsupervised learning: learning from unlabeled data; goal: identify structure in the data; example: autoencoder networks
– semi-supervised learning: reward-based supervision; goal: maximize reward; example: reinforcement learning
The Reinforcement Learning Problem

Maximize reward
– policy π_t : S → A(S_t) to select an action A_t ∈ A(S_t) given the current state S_t ∈ S
– a state transition yields the reward R_{t+1}(A_t | S_t) ∈ ℝ

[Diagram: agent–environment loop; the agent in state S_t takes action A_t, the environment responds with reward R_{t+1}(A_t | S_t) and next state S_{t+1}]

classic goal: find a policy that maximizes the discounted cumulative reward

    V^π(S_t) ≡ ∑_{k=0}^{∞} γ^k R_{t+1+k}(A_{t+k} | S_{t+k}), where 0 < γ < 1
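As a tiny illustration, the discounted cumulative reward is just a γ-weighted sum; the reward sequence and γ below are made up:

    # hypothetical reward sequence R_{t+1}, R_{t+2}, ...; gamma chosen arbitrarily
    gamma = 0.9
    rewards = [0.0, 0.0, 1.0, 0.5, 2.0]
    V = sum(gamma**k * r for k, r in enumerate(rewards))  # sum_k gamma^k * R_{t+1+k}
    print(V)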
The Reinforcement Learning Problem

Q-Learning [Watkins 1989] finds an optimal action-selection policy for any given Markov decision process:

    Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1], with the following options for the discounted cumulative reward V(s′):
– V(s′) ≡ max_{a′∈A} Q(s′,a′): consider the best action
– V(s′) ≡ ∑_{a′∈A} π(s′,a′) Q(s′,a′): policy-weighted average over a discrete action space
– V(s′) ≡ ∫_A π(s′,a′) Q(s′,a′) da′: policy-weighted average over a continuous action space
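A minimal tabular Q-learning sketch using the first option, V(s′) = max_{a′} Q(s′,a′); the two-state toy MDP and all constants are made up for illustration:

    import random

    n_states, n_actions = 2, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.1, 0.9

    def step(s, a):
        # made-up environment: action 1 in state 0 pays off
        r = 1.0 if (s == 0 and a == 1) else 0.0
        return r, (s + a) % n_states  # reward, next state

    s = 0
    for _ in range(10_000):
        a = random.randrange(n_actions)            # pure exploration
        r, s_next = step(s, a)
        V = max(Q[s_next])                         # V(s') = max_a' Q(s', a')
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * V)
        s = s_next

    print(Q)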
Light Transport Simulation and Reinforcement Learning

Structural equivalence of the integral equations, matching terms:

    L(x,ω)  = Le(x,ω) + ∫_{S²₊(x)} fr(ωᵢ,x,ω) cosθᵢ L(h(x,ωᵢ),−ωᵢ) dωᵢ

    Q′(s,a) = (1−α) Q(s,a) + α ( r(s,a) + γ ∫_A π(s′,a′) Q(s′,a′) da′ )

This hints at learning the incident radiance

    Q′(x,ω) = (1−α) Q(x,ω) + α ( Le(y,−ω) + ∫_{S²₊(y)} fr(ωᵢ,y,−ω) cosθᵢ Q(y,ωᵢ) dωᵢ )

as a policy for selecting an action ω in state x to reach the next state y := h(x,ω); the learning rate α is the only parameter left.
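A sketch of this update with Q discretized over equally sized hemisphere patches, as introduced on the next slide: the integral becomes a sum over patches weighted by their solid angle. The helper names and per-patch arguments (fr, cos_theta) are placeholders for quantities evaluated at the hit point y = h(x,ω), not code from the paper:

    import math

    n_patches = 8
    alpha = 0.2
    patch_solid_angle = 2.0 * math.pi / n_patches  # equal-area patches

    def update_q(Q_x, patch, Le_y, Q_y, fr, cos_theta):
        """Q'(x,w) = (1-a) Q(x,w) + a (Le(y,-w) + sum_i fr_i cos(theta_i) Q(y,w_i) dw)."""
        integral = sum(fr[i] * cos_theta[i] * Q_y[i] for i in range(n_patches))
        integral *= patch_solid_angle
        Q_x[patch] = (1.0 - alpha) * Q_x[patch] + alpha * (Le_y + integral)

    # illustrative values only
    Q_x = [0.0] * n_patches                  # Q(x, .) for the cell containing x
    Q_y = [0.3] * n_patches                  # learned values at the next state's cell
    fr = [1.0 / math.pi] * n_patches         # diffuse BRDF, placeholder
    cos_theta = [0.5] * n_patches            # patch-center cosines, placeholder
    update_q(Q_x, patch=3, Le_y=0.0, Q_y=Q_y, fr=fr, cos_theta=cos_theta)
    print(Q_x[3])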
Light Transport Simulation and Reinforcement Learning

Discretization of Q in analogy to irradiance representations
– action space: a ∈ S²₊(y), partitioned into equally sized patches via the area-preserving mapping

      (x, y, z) = ( √(1−u²) cos(2πv), √(1−u²) sin(2πv), u )

– state space: s ∈ ∂V, Voronoi diagram of a low-discrepancy sequence with points x_i = (Φ₂(i), i/N) for i = 0,...,N−1, where Φ₂ is the radical inverse in base 2; lookup by nearest neighbor search including the surface normal
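A sketch of this parameterization, assuming an nu × nv patch grid (the resolution is an arbitrary choice for illustration): (u,v) ∈ [0,1)² maps to a direction with z = u, and a direction maps back to its patch index:

    import math

    nu, nv = 4, 8  # patches along u (elevation) and v (azimuth)

    def patch_to_dir(i, j, su=0.5, sv=0.5):
        # (su, sv) picks a point inside patch (i, j); 0.5 is the patch center
        u = (i + su) / nu
        v = (j + sv) / nv
        r = math.sqrt(1.0 - u * u)
        return (r * math.cos(2.0 * math.pi * v),
                r * math.sin(2.0 * math.pi * v),
                u)

    def dir_to_patch(d):
        x, y, z = d
        u = z                                        # inverse of the mapping above
        v = (math.atan2(y, x) / (2.0 * math.pi)) % 1.0
        return min(int(u * nu), nu - 1), min(int(v * nv), nv - 1)

    d = patch_to_dir(2, 5)
    print(dir_to_patch(d))  # -> (2, 5)

Because dω = d(cosθ) dφ on the hemisphere, a uniform grid in (u,v) with u = cosθ yields patches of equal solid angle.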
Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Baseline path tracer with BSDF importance sampling:

    Function trace(camera, pixel, scene)
        throughput ← 1
        color ← 0
        ray ← setupPrimaryRay(camera, pixel)
        for i ← 0 to ∞ do
            h, n ← intersect(scene, ray)
            if isAreaLight(h) then
                color ← throughput * getEmission(h)
                break
            fs, ω, pω ← sampleBsdf(h, n)
            throughput ← throughput * fs * cos(n, ω) / pω
            ray ← (h, ω)
        write(pixel, color)
Guided version: the scattering direction is drawn from the learned Q-table, which is updated online while paths are traced:

    Function trace(camera, pixel, scene)
        throughput ← 1
        color ← 0
        ray ← setupPrimaryRay(camera, pixel)
        for i ← 0 to ∞ do
            h, n ← intersect(scene, ray)
            if i > 0 then
                updateQtable(ray, h)      // ray.o is the previous hit point, h the current one
            if isAreaLight(h) then
                color ← throughput * getEmission(h)
                break
            fs, ω, pω ← sampleScatteringDir(h)
            throughput ← throughput * fs * cos(n, ω) / pω
            ray ← (h, ω)
        write(pixel, color)

    Function updateQtable(ray, h)
        idxPrev ← findCell(ray.o)
        idxCurr ← findCell(h)
        update ← 0
        if isAreaLight(h) then
            update ← max(getEmission(h))                          // reward: emission, reduced to a scalar via max
        else
            update ← max(getLastAttenuation(ray) * qmax[idxCurr])
        idxQ ← findIndex(idxPrev, ray)
        qtable[idxQ] ← (1−α) * qtable[idxQ] + α * update

    Function sampleScatteringDir(h)
        idx ← findCell(h)
        idxPatch, ps ← sampleCdf(idx)                 // pick a patch proportionally to its Q-value
        ω ← uniformSamplePatch(idxPatch)
        fs ← evalBsdf(h, ω)                           // BSDF evaluation implied by the return value; evalBsdf is an assumed helper name
        return fs, ω, ps * numPatches / (2π)
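The sampleCdf step can be sketched as inverse CDF sampling over the Q-values of one state cell; the returned density follows because each of the numPatches patches covers 2π/numPatches steradians and is sampled uniformly inside. The Q-values below are made up:

    import math, random, bisect, itertools

    q_row = [0.1, 0.4, 0.05, 0.9, 0.2, 0.3, 0.6, 0.15]  # Q(s, .) for one state cell
    n_patches = len(q_row)

    def sample_cdf(q):
        cdf = list(itertools.accumulate(q))
        total = cdf[-1]
        idx = bisect.bisect_left(cdf, random.random() * total)
        return idx, q[idx] / total   # patch index, discrete probability p_patch

    idx, p_patch = sample_cdf(q_row)
    pdf = p_patch * n_patches / (2.0 * math.pi)  # solid-angle density over the chosen patch
    print(idx, pdf)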
Results

Image comparisons in a scene with challenging visibility:
– 128 paths traced with BRDF importance sampling, vs. path tracing with online reinforcement learning, vs. Metropolis light transport, each at the same number of paths
– 2048 paths traced with BRDF importance sampling, vs. path tracing with online reinforcement learning, vs. Metropolis light transport, each at the same number of paths
– path tracing with next event estimation and BRDF importance sampling, vs. path tracing with next event estimation and online reinforcement learning at the same number of samples
Learning Light Transport the Reinforced Way
Summary
– structural identity of reinforcement learning and a Fredholm integral equation of the 2nd kind
– dramatic variance reduction
  – simple to implement using data structures common in real-time games
  – shorter path length
  – more coherent path generation/traversal
– investigating interaction with Russian roulette absorption and splitting, as the average path length is shortened already
– investigating high-dimensional function approximation via neural networks

https://arxiv.org/abs/1701.07403