SLIDE 22 E3: Performance on Complicated Tasks
[Plot: episode return vs. training steps. Panels: Empty Room (up to 200k steps), Four Rooms (up to 500k steps), FetchReach (up to 1M steps). Legend: novelty-pursuit, bonus, vanilla.]
Figure 5: Training curves of learned policies over 5 random seeds on the Empty Room, Four Rooms, and FetchReach environments.
[Plot: episode return vs. training steps (up to 18M steps). Panels: SuperMarioBros-1-1, SuperMarioBros-1-2, SuperMarioBros-1-3. Legend: novelty-pursuit, bonus, vanilla.]
Figure 6: Training curves of learned policies over 3 random seeds on the game of SuperMarioBros.
Ziniu Li (CUHKSZ & Polixir) Efficient Exploration by Novelty Pursuit October 12, 2020 22 / 30