A Bayesian Approach to Generative Adversarial Imitation Learning

SLIDE 1

A Bayesian Approach to Generative Adversarial Imitation Learning

NeurIPS 2018
Presenter: Wonseok Jeon @ KAIST
Joint work with Seokin Seo @ KAIST and Kee-Eung Kim @ KAIST & PROWLER.io

SLIDE 3

Imitation Learning

  • A Markov decision process (MDP) without a cost function
  • A policy
  • Instead, we are given a set of expert demonstrations.
  • Goal: learn a policy that mimics the expert well.

Wonseok Jeon @ KAIST | A Bayesian Approach to Generative Adversarial Imitation Learning

SLIDE 4

Generative Adversarial Imitation Learning (GAIL)

  • Use generative adversarial networks (GANs) for imitation learning:
    1. Sample trajectories using the current policy and the expert demonstrations.
    2. Train the discriminator.
    3. Update the policy using reinforcement learning (RL), e.g., TRPO or PPO.
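The discriminator step above can be sketched as a toy binary classifier. This is a minimal sketch, not the paper's implementation: the arrays `expert_sa` and `agent_sa` are hypothetical stand-ins for state-action features, the discriminator is plain logistic regression, and the RL policy update of step 3 is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical state-action features for expert and agent (step 1).
expert_sa = rng.normal(loc=1.0, size=(256, 4))
agent_sa = rng.normal(loc=-1.0, size=(256, 4))

# Step 2: train a logistic-regression discriminator D(s,a) = sigmoid(w.x + b)
# to output 1 on expert pairs and 0 on agent pairs.
w, b = np.zeros(4), 0.0
for _ in range(500):
    for x, y in ((expert_sa, 1.0), (agent_sa, 0.0)):
        p = sigmoid(x @ w + b)
        grad = y - p                      # gradient of the log-likelihood w.r.t. logits
        w = w + 0.01 * (x.T @ grad) / len(x)
        b = b + 0.01 * grad.mean()

# Surrogate cost handed to the RL step (step 3): low where the
# discriminator thinks a pair looks expert-like.
def cost(sa):
    return -np.log(sigmoid(sa @ w + b) + 1e-8)

print(cost(expert_sa).mean() < cost(agent_sa).mean())  # expert pairs get lower cost
```

In the full algorithm this discriminator update and the policy update alternate every iteration; here only the classification half is shown.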

SLIDE 6

Generative Adversarial Imitation Learning (GAIL)

  • GAIL requires model-free RL inner loops.
  • Environment simulation is required.
  • Sample-efficiency issues: obtaining trajectory samples from the environment is often very costly, e.g., for physical robots in the real world.

  • Motivation
  • At each iteration, the discriminator is updated using minibatches.
  • What about using Bayesian classification to train the discriminator?
  • This is expected to yield a more refined cost function for imitation learning!

“I don’t want to move a lot…”

SLIDE 8

Bayesian Framework for GAIL

  • Probabilistic model for trajectories: each trajectory, a sequence of state-action pairs, satisfies the Markov property.
  • Two policies: the agent’s policy and the expert’s policy.

(figure: agent’s trajectory vs. expert’s trajectory)
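The slide's equation image did not survive extraction; the standard MDP trajectory factorization it refers to is:

```latex
p(\tau \mid \pi)
  = p(s_0) \prod_{t \ge 0} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t),
\qquad \tau = (s_0, a_0, s_1, a_1, \ldots)
```

The same factorization is used for both trajectory distributions, with the agent's policy or the expert's policy in place of \(\pi\).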

SLIDE 9

Bayesian Framework for GAIL

  • Role of the discriminator: the probability that models whether a trajectory comes from the expert or the agent.

(figure: a trajectory discriminator separating the agent’s and the expert’s trajectories)
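For reference, the discriminator enters GAIL through the minimax objective of Ho & Ermon (2016); labeling conventions for \(D\) vary between papers, and one common form is:

```latex
\min_{\pi} \max_{D}\;
\mathbb{E}_{\pi}\!\left[\log D(s,a)\right]
+ \mathbb{E}_{\pi_E}\!\left[\log\bigl(1 - D(s,a)\bigr)\right]
- \lambda H(\pi)
```

Here \(H(\pi)\) is a causal-entropy regularizer and \(\pi_E\) denotes the expert policy.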

SLIDE 11

Bayesian Framework for GAIL

  • Posterior distributions
  • Posterior for the discriminator (conditioned on perfect trajectory discrimination)
  • Posterior for the policy (conditioned on preventing perfect discrimination)

GAIL uses maximum likelihood estimation (MLE) for both policy and discriminator updates!
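In symbols (not necessarily the paper's exact notation), a discriminator posterior conditioned on expert pairs carrying label 1 and agent pairs label 0 is the usual Bayesian classification posterior over the discriminator parameters \(\phi\):

```latex
p(\phi \mid \tau_E, \tau_A)
\;\propto\;
p(\phi)
\prod_{(s,a) \in \tau_E} D_{\phi}(s,a)
\prod_{(s,a) \in \tau_A} \bigl(1 - D_{\phi}(s,a)\bigr)
```

MLE keeps only the point \(\phi\) maximizing the product terms; the Bayesian treatment keeps the whole distribution.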

SLIDE 14

Bayesian GAIL: GAIL with Posterior-Predictive Cost

  • The objective combines reinforcement learning with a posterior-predictive cost.
  • Learning curves for 5 MuJoCo tasks!

For more information, please come to our poster session! Wed Dec 5th, 5–7 PM @ Room 210 & 230 AB, #129. Thanks!
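As a sketch of how a posterior-predictive cost could be computed, averaging a per-sample discriminator cost over posterior draws. The Gaussian `posterior_w` is a hypothetical stand-in; the paper's actual posterior approximation and cost form will differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical posterior samples over the weights of a logistic
# discriminator (50 draws, 4 features per state-action pair).
posterior_w = rng.normal(loc=1.0, scale=0.3, size=(50, 4))

def posterior_predictive_cost(sa):
    """Average the per-draw cost -log(1 - D_w(s,a)) over posterior samples."""
    logits = sa @ posterior_w.T            # shape (batch, n_samples)
    d = sigmoid(logits)
    return -np.log(1.0 - d + 1e-8).mean(axis=1)

batch = rng.normal(size=(8, 4))
print(posterior_predictive_cost(batch).shape)  # (8,)
```

A point-estimate (MLE) discriminator corresponds to collapsing `posterior_w` to a single row; the averaging over draws is what makes the cost posterior-predictive.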