SLIDE 1

Learning Light Transport the Reinforced Way

Ken Dahm and Alexander Keller

SLIDE 2

Light Transport Simulation

How to do importance sampling

compute functionals of a Fredholm integral equation of the 2nd kind

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} L(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi dωi

SLIDE 3

Light Transport Simulation

How to do importance sampling

example: direct illumination

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi dωi

SLIDE 4

Light Transport Simulation

How to do importance sampling

example: direct illumination

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi dωi

       ≈ Le(x,ω) + (1/N) ∑_{i=0}^{N−1} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi / p(ωi)

SLIDE 5

Light Transport Simulation

How to do importance sampling

example: direct illumination

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi dωi

       ≈ Le(x,ω) + (1/N) ∑_{i=0}^{N−1} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi / p(ωi)

with p ∼ fr cosθ

SLIDE 6

Light Transport Simulation

How to do importance sampling

example: direct illumination

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi dωi

       ≈ Le(x,ω) + (1/N) ∑_{i=0}^{N−1} Le(h(x,ωi),−ωi) fr(ωi,x,ω) cosθi / p(ωi)

with p ∼ fr cosθ or, even better, p ∼ Le fr cosθ
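
To make the role of p concrete, here is a minimal importance sampling sketch in Python (a 1D toy integrand stands in for the rendering integrand; all names are illustrative): the closer p is to being proportional to the integrand, the lower the variance, and for p exactly proportional the variance vanishes.

import random

# Toy estimate of I = \int_0^1 f(u) du with N samples U ~ p,
# using the Monte Carlo estimator (1/N) * sum f(U)/p(U).
def f(u):                        # toy integrand, peaked near u = 1
    return 3.0 * u * u

def estimate(N, sample, pdf):
    total = 0.0
    for _ in range(N):
        u = sample()
        total += f(u) / pdf(u)
    return total / N

# uniform sampling: p(u) = 1
uniform = estimate(1000, random.random, lambda u: 1.0)

# importance sampling with p(u) = 3u^2 (proportional to f):
# invert the CDF u^3, i.e. U = X^(1/3); 1 - random() keeps U > 0
importance = estimate(1000,
                      lambda: (1.0 - random.random()) ** (1.0 / 3.0),
                      lambda u: 3.0 * u * u)

print(uniform, importance)       # both near 1; the second has zero variance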

SLIDE 7

Machine Learning

SLIDE 8

Machine Learning

Taxonomy

supervised learning: learning from labeled data
– goal: extrapolate/generalize response to unseen data
– example: artificial neural networks

SLIDE 9

Machine Learning

Taxonomy

supervised learning: learning from labeled data
– goal: extrapolate/generalize response to unseen data
– example: artificial neural networks

unsupervised learning: learning from unlabeled data
– goal: identify structure in data
– example: autoencoder networks

SLIDE 10

Machine Learning

Taxonomy

supervised learning: learning from labeled data
– goal: extrapolate/generalize response to unseen data
– example: artificial neural networks

unsupervised learning: learning from unlabeled data
– goal: identify structure in data
– example: autoencoder networks

semi-supervised learning: reward-based supervision
– goal: maximize reward
– example: reinforcement learning

SLIDE 11

The Reinforcement Learning Problem

Maximize reward

policy πt : S → A(St)
– to select an action At ∈ A(St)
– given current state St ∈ S

[Diagram: agent-environment loop: in state St the agent takes action At; the environment returns the reward Rt+1(At | St) and the next state St+1]

SLIDE 12

The Reinforcement Learning Problem

Maximize reward

policy πt : S → A(St)
– to select an action At ∈ A(St)
– given current state St ∈ S

state transition yields reward Rt+1(At | St) ∈ ℝ

[Diagram: agent-environment loop: in state St the agent takes action At; the environment returns the reward Rt+1(At | St) and the next state St+1]

SLIDE 13

The Reinforcement Learning Problem

Maximize reward

policy πt : S → A(St)
– to select an action At ∈ A(St)
– given current state St ∈ S

state transition yields reward Rt+1(At | St) ∈ ℝ

[Diagram: agent-environment loop: in state St the agent takes action At; the environment returns the reward Rt+1(At | St) and the next state St+1]

classic goal: find a policy that maximizes the discounted cumulative reward

V^π(St) ≡ ∑_{k=0}^{∞} γ^k Rt+1+k(At+k | St+k), where 0 < γ < 1
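
For illustration, a minimal Python sketch of the discounted return, truncated to a finite reward sequence (the numbers are made up):

# Discounted cumulative reward V = sum_k gamma^k * R_{t+1+k}, truncated to
# a finite reward list; gamma in (0,1) makes the infinite sum converge.
def discounted_return(rewards, gamma=0.9):
    value = 0.0
    for k, r in enumerate(rewards):
        value += (gamma ** k) * r
    return value

print(discounted_return([1.0, 0.0, 0.5, 2.0]))   # 1 + 0 + 0.5*0.81 + 2*0.729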

SLIDE 14

The Reinforcement Learning Problem

Q-Learning [Watkins 1989]

finds optimal action-selection policy for any given Markov decision process

Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1]

SLIDE 15

The Reinforcement Learning Problem

Q-Learning [Watkins 1989]

finds optimal action-selection policy for any given Markov decision process

Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1]

with the following options for the discounted cumulative reward:

V(s′) ≡ max_{a′∈A} Q(s′,a′)   (consider best action)

SLIDE 16

The Reinforcement Learning Problem

Q-Learning [Watkins 1989]

finds optimal action-selection policy for any given Markov decision process

Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1]

with the following options for the discounted cumulative reward:

V(s′) ≡ max_{a′∈A} Q(s′,a′)   (consider best action)
     or ∑_{a′∈A} π(s′,a′) Q(s′,a′)   (policy-weighted average over a discrete action space)

SLIDE 17

The Reinforcement Learning Problem

Q-Learning [Watkins 1989]

finds optimal action-selection policy for any given Markov decision process

Q′(s,a) = (1−α) · Q(s,a) + α · ( r(s,a) + γ V(s′) )

for a learning rate α ∈ [0,1]

with the following options for the discounted cumulative reward:

V(s′) ≡ max_{a′∈A} Q(s′,a′)   (consider best action)
     or ∑_{a′∈A} π(s′,a′) Q(s′,a′)   (policy-weighted average over a discrete action space)
     or ∫_A π(s′,a′) Q(s′,a′) da′   (policy-weighted average over a continuous action space)
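
To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python (the toy chain environment, reward, and ε-greedy policy are illustrative assumptions, not part of the talk):

import random

# Tabular Q-learning on a toy chain: states 0..4, actions -1/+1,
# reward 1 on reaching state 4, which ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection with random tie-breaking
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            best = max(Q[(s, b)] for b in ACTIONS)
            a = random.choice([b for b in ACTIONS if Q[(s, b)] == best])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # V(s') = max_a' Q(s', a'): the "consider best action" option
        V = 0.0 if s_next == GOAL else max(Q[(s_next, b)] for b in ACTIONS)
        # Q'(s,a) = (1 - alpha) Q(s,a) + alpha (r + gamma V(s'))
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * V)
        s = s_next

print(Q[(0, 1)], Q[(0, -1)])   # moving toward the goal has the higher value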

SLIDE 18

Light Transport Simulation and Reinforcement Learning

SLIDE 19

Light Transport Simulation and Reinforcement Learning

Structural equivalence of the integral equations

matching terms

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} fr(ωi,x,ω) cosθi L(h(x,ωi),−ωi) dωi

Q′(s,a) = (1−α) Q(s,a) + α ( r(s,a) + γ ∫_A π(s′,a′) Q(s′,a′) da′ )

SLIDE 24

Light Transport Simulation and Reinforcement Learning

Structural equivalence of the integral equations

matching terms

L(x,ω) = Le(x,ω) + ∫_{S²₊(x)} fr(ωi,x,ω) cosθi L(h(x,ωi),−ωi) dωi

Q′(s,a) = (1−α) Q(s,a) + α ( r(s,a) + γ ∫_A π(s′,a′) Q(s′,a′) da′ )

hints at learning the incident radiance

Q′(x,ω) = (1−α) Q(x,ω) + α ( Le(y,−ω) + ∫_{S²₊(y)} fr(ωi,y,−ω) cosθi Q(y,ωi) dωi )

as a policy for selecting an action ω in state x to reach the next state y := h(x,ω)

the learning rate α is the only parameter left
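
Anticipating the discretization introduced on the next slides, here is a minimal Python sketch of this update for a single table entry (the per-cell Q rows, the patch count, and all names are illustrative assumptions, not the authors' code):

import math

# One Q-table row per state cell, one entry per hemisphere patch.
# The integral over the hemisphere is approximated by a sum over the
# patches of the next state's row; each of the equally sized patches
# subtends 2*pi / n_patches steradians.
def update_q(q_row_x, patch, alpha, Le_y, fr_cos_y, q_row_y):
    n_patches = len(q_row_y)
    patch_solid_angle = 2.0 * math.pi / n_patches
    # \int fr * cos(theta_i) * Q(y, w_i) dw_i  ~=  sum over patches
    scattered = sum(f * q for f, q in zip(fr_cos_y, q_row_y)) * patch_solid_angle
    q_row_x[patch] = (1.0 - alpha) * q_row_x[patch] + alpha * (Le_y + scattered)

q_x = [0.0] * 8                      # Q row for the cell containing x
q_y = [0.1] * 8                      # Q row for the cell containing y
fr_cos = [0.5 / math.pi] * 8         # fr * cos(theta_i) per patch at y
update_q(q_x, patch=3, alpha=0.2, Le_y=0.0, fr_cos_y=fr_cos, q_row_y=q_y)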

SLIDE 25

Light Transport Simulation and Reinforcement Learning

Discretization of Q in analogy to irradiance representations

action space: a ∈ S²₊(y)
– equally sized patches, generated by the mapping

(x, y, z) = ( √(1−u²) cos(2πv), √(1−u²) sin(2πv), u )
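
A short Python sketch of this parameterization (the 4×8 grid resolution is an assumed example); since the map from the unit square to the hemisphere preserves area, an equidistant grid in (u, v) yields the equally sized patches:

import math

# Equal-area mapping of the unit square onto the hemisphere: u is the
# z-coordinate, so equal steps in (u, v) cover equal solid angles.
def patch_direction(u, v):
    r = math.sqrt(max(0.0, 1.0 - u * u))
    return (r * math.cos(2.0 * math.pi * v),
            r * math.sin(2.0 * math.pi * v),
            u)

# hypothetical 4x8 patch grid: direction at the center of patch (j, k)
NU, NV = 4, 8
def patch_center(j, k):
    return patch_direction((j + 0.5) / NU, (k + 0.5) / NV)

print(patch_center(0, 0))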

SLIDE 26

Light Transport Simulation and Reinforcement Learning

Discretization of Q in analogy to irradiance representations

action space: a ∈ S²₊(y)
– equally sized patches, generated by the mapping

(x, y, z) = ( √(1−u²) cos(2πv), √(1−u²) sin(2πv), u )

state space: s ∈ V
– Voronoi diagram of the low-discrepancy points xi = (Φ2(i), i/N) for i = 0,…,N−1
– nearest neighbor search

SLIDE 27

Light Transport Simulation and Reinforcement Learning

Discretization of Q in analogy to irradiance representations

action space: a ∈ S²₊(y)
– equally sized patches, generated by the mapping

(x, y, z) = ( √(1−u²) cos(2πv), √(1−u²) sin(2πv), u )

state space: s ∈ V
– Voronoi diagram of the low-discrepancy points xi = (Φ2(i), i/N) for i = 0,…,N−1
– nearest neighbor search including the surface normal
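
A minimal Python sketch of the state lookup under these assumptions (2D stand-in coordinates, brute-force nearest neighbor; a real implementation would use a spatial index and, per the slide, include the surface normal in the metric):

# Points x_i = (Phi_2(i), i/N) of the Hammersley set induce a Voronoi
# partition; find_cell is then a nearest-neighbor query.
def radical_inverse_base2(i):
    result, f = 0.0, 0.5
    while i > 0:
        if i & 1:
            result += f
        i >>= 1
        f *= 0.5
    return result

N = 64
points = [(radical_inverse_base2(i), i / N) for i in range(N)]

def find_cell(x):
    # index of the Voronoi cell containing x = index of the nearest point
    def dist2(p):
        return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    return min(range(N), key=lambda i: dist2(points[i]))

print(find_cell((0.3, 0.7)))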

SLIDE 28
SLIDE 29

Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Function trace(camera, pixel, scene)
    throughput ← 1
    color ← 0
    ray ← setupPrimaryRay(camera, pixel)
    for i ← 0 to ∞ do
        h, n ← intersect(scene, ray)
        if isAreaLight(h) then
            color ← throughput * getEmission(h)
            break
        fs, ω, pω ← sampleBsdf(h, n)
        throughput ← throughput * fs * cos(n, ω) / pω
        ray ← (h, ω)
    write(pixel, color)

SLIDE 30

Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Function trace(camera, pixel, scene)
    throughput ← 1
    color ← 0
    ray ← setupPrimaryRay(camera, pixel)
    for i ← 0 to ∞ do
        h, n ← intersect(scene, ray)
        if i > 0 then
            updateQtable(ray, h)
        if isAreaLight(h) then
            color ← throughput * getEmission(h)
            break
        fs, ω, pω ← sampleScatteringDir(h)
        throughput ← throughput * fs * cos(n, ω) / pω
        ray ← (h, ω)
    write(pixel, color)

SLIDE 31

Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Function trace(camera, pixel, scene)
    throughput ← 1
    color ← 0
    ray ← setupPrimaryRay(camera, pixel)
    for i ← 0 to ∞ do
        h, n ← intersect(scene, ray)
        if i > 0 then
            updateQtable(ray, h)
        if isAreaLight(h) then
            color ← throughput * getEmission(h)
            break
        fs, ω, pω ← sampleScatteringDir(h)
        throughput ← throughput * fs * cos(n, ω) / pω
        ray ← (h, ω)
    write(pixel, color)

Function updateQtable(ray, h)
    idxPrev ← findCell(ray.o)
    idxCurr ← findCell(h)
    update ← 0
    if isAreaLight(h) then
        update ← max(getEmission(h))
    else
        update ← max(getLastAttenuation(ray) * qmax[idxCurr])
    idxQ ← findIndex(idxPrev, ray)
    qtable[idxQ] ← (1−α) * qtable[idxQ] + α * update

SLIDE 32

Learning Light Transport the Reinforced Way

Online reinforcement learning algorithm for guiding light transport paths

Function trace(camera, pixel, scene)
    throughput ← 1
    color ← 0
    ray ← setupPrimaryRay(camera, pixel)
    for i ← 0 to ∞ do
        h, n ← intersect(scene, ray)
        if i > 0 then
            updateQtable(ray, h)
        if isAreaLight(h) then
            color ← throughput * getEmission(h)
            break
        fs, ω, pω ← sampleScatteringDir(h)
        throughput ← throughput * fs * cos(n, ω) / pω
        ray ← (h, ω)
    write(pixel, color)

Function updateQtable(ray, h)
    idxPrev ← findCell(ray.o)
    idxCurr ← findCell(h)
    update ← 0
    if isAreaLight(h) then
        update ← max(getEmission(h))
    else
        update ← max(getLastAttenuation(ray) * qmax[idxCurr])
    idxQ ← findIndex(idxPrev, ray)
    qtable[idxQ] ← (1−α) * qtable[idxQ] + α * update

Function sampleScatteringDir(h)
    idx ← findCell(h)
    idxPatch, ps ← sampleCdf(idx)
    ω ← uniformSamplePatch(idxPatch)
    fs ← evalBsdf(h, ω)    // evalBsdf assumed: yields the BSDF value fs, otherwise unbound
    return fs, ω, ps * numpatches / (2π)
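
The deck leaves sampleCdf unspecified; a hedged Python sketch of one plausible realization (the uniform fallback for an all-zero row is an assumption) selects a patch with probability proportional to its Q entry:

import bisect, random

# Pick a hemisphere patch with probability proportional to its Q value,
# via the cumulative distribution of the cell's Q-table row.
def sample_cdf(q_row):
    total = sum(q_row)
    if total == 0.0:
        # nothing learned yet: fall back to uniform patch selection
        k = random.randrange(len(q_row))
        return k, 1.0 / len(q_row)
    cdf, acc = [], 0.0
    for q in q_row:
        acc += q / total
        cdf.append(acc)
    k = min(bisect.bisect_left(cdf, random.random()), len(q_row) - 1)
    return k, q_row[k] / total     # patch index and its discrete probability

idx, ps = sample_cdf([0.2, 0.5, 0.1, 0.2])

The discrete probability ps is then converted to a solid-angle density via ps * numpatches / (2π), as in sampleScatteringDir above, because each of the equally sized patches covers 2π/numpatches steradians of the hemisphere.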

SLIDE 33

Results

SLIDE 34

128 paths traced with BRDF importance sampling in a scene with challenging visibility

SLIDE 35

Path tracing with online reinforcement learning at the same number of paths

SLIDE 36

Metropolis light transport at the same number of paths

SLIDE 37

2048 paths traced with BRDF importance sampling in a scene with challenging visibility

SLIDE 38

Path tracing with online reinforcement learning at the same number of paths

SLIDE 39

Metropolis light transport at the same number of paths

SLIDE 40

Path tracing with next event estimation and BRDF importance sampling

SLIDE 41

Path tracing with next event estimation and online reinforcement learning at the same number of samples

SLIDE 42

Learning Light Transport the Reinforced Way

Summary

structural identity of reinforcement learning and a Fredholm integral equation of the 2nd kind

dramatic variance reduction
– simple to implement using data structures common in real-time games
– shorter path length
– more coherent path generation/traversal

investigating interaction with Russian roulette absorption and splitting
– as average path length is shortened already

investigating high-dimensional function approximation via neural networks

https://arxiv.org/abs/1701.07403