osu!mania Reinforcement Learning Agent - PowerPoint PPT Presentation



SLIDE 1

osu!mania Reinforcement Learning Agent

Χρυσομάλλης Ιάσων (Iason Chrysomallis), ichrysomallis@isc.tuc.gr, 2014030078

SLIDE 2

Contents

  • Introduction
  • osu!mania game
  • Graphical User Interface customization
  • Agent’s environment
  • Approach and variable definition
  • Q-learning
  • Deep reinforcement learning
  • Future plans

SLIDE 3

Introduction

Topic: develop an agent able to learn how to play the video game osu!mania through reinforcement learning. Two agents:

  • Q-learning agent
  • Deep reinforcement learning agent

SLIDE 4

osu!mania game

A rhythm game in which notes fall toward a judgment bar. The numbered elements of the interface:

1. Single-tap notes
2. Hold notes
3. Judgment bar
4. Player keys
5. Combo
6. Hitburst
7. Overall accuracy
8. Score

SLIDE 5

Graphical User Interface customization

The environment is fully customizable: every element can be changed. Each element is painted with a solid color RGB = [X, 100, 100], where X is set in accordance with the element’s identity (see the numbers on the previous slide).
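As an illustration, an element can be identified from a single pixel by checking the fixed G and B channels and looking the red value up in a table. The specific X values below are hypothetical; the slides only say that X matches the element's number:

```python
# Hypothetical mapping from the red-channel value X to the element it
# marks; the slides state the scheme RGB = [X, 100, 100] but not the
# concrete X values.
ELEMENT_BY_RED = {
    1: "single_tap_note",
    2: "hold_note",
    3: "judgment_bar",
    4: "player_key",
}

def identify_element(pixel):
    """Return the element name for an (R, G, B) pixel, or None if the
    pixel is not one of the solid-colour GUI elements."""
    r, g, b = pixel
    if (g, b) != (100, 100):
        return None  # not part of the customised skin palette
    return ELEMENT_BY_RED.get(r)
```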

SLIDE 6

Agent’s environment

The agent records screenshots and translates them into information based on the RGB values assigned above. Only a small fraction of the screen contains relevant information, so only specific boxes are recorded.
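A minimal sketch of the box-recording idea, with hypothetical coordinates (the real regions depend on screen resolution and skin layout, which the slides do not specify):

```python
import numpy as np

# Hypothetical box coordinates as (top, left, height, width).
BOXES = {
    "column": (0, 100, 200, 10),
    "combo":  (300, 50, 20, 40),
}

def extract_boxes(frame):
    """Crop the relevant regions from a full H x W x 3 screenshot, so
    later processing only touches a small fraction of the screen."""
    return {name: frame[t:t + h, l:l + w]
            for name, (t, l, h, w) in BOXES.items()}
```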

SLIDE 7

Approach and variable definition (1)

The agent behaves identically on each column, so the problem can be narrowed down to learning a single column.

Agent’s actions:

1. Instantaneous key tap
2. Key press (no release)
3. Key release
4. Do nothing
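The four actions above can be sketched as an enum (the identifier names are mine, not from the slides):

```python
from enum import Enum

class Action(Enum):
    """The four per-column actions."""
    TAP = 0      # instantaneous key tap: press and release in one step
    PRESS = 1    # key press with no release (start holding)
    RELEASE = 2  # key release (stop holding)
    NOOP = 3     # do nothing
```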

SLIDE 8

Approach and variable definition (2)

Rewards:

Epsilon:

  • Initial value = 1
  • Decay value = 0.9977
  • Minimum value = 0.01
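The epsilon schedule above can be sketched as a multiplicative decay with a floor:

```python
# Values from the slides: start fully exploratory, decay per episode,
# and never drop below 1% exploration.
EPSILON_INITIAL = 1.0
EPSILON_DECAY = 0.9977
EPSILON_MIN = 0.01

def decay_epsilon(epsilon):
    """Multiplicative decay clipped at the minimum value."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```

With these values, epsilon falls from 1 to the 0.01 floor after about ln(0.01)/ln(0.9977) ≈ 2000 decay steps.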

SLIDE 9

Approach and variable definition (3)

State:

  • One column of 200 pixels
  • Only the red (R) layer
  • Three possible values (no note, single-tap note, hold note)

Deep reinforcement learning:

  • Raw input of the column

Q-learning:

  • Only 8 pixels due to state complexity, taking one pixel every 15 pixels of the recorded column
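One plausible reading of this sampling, as a sketch (the exact offsets used in the project are not on the slides):

```python
def downsample_column(red_column, step=15, n_pixels=8):
    """Coarse Q-learning state: take one pixel every `step` pixels of
    the recorded red-channel column and keep the first `n_pixels`
    samples, returned as a hashable tuple for Q-table lookup."""
    return tuple(red_column[::step][:n_pixels])
```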

SLIDE 10

Q-learning

Algorithm:

Steps:

  • Receive current state
  • Choose an action based on epsilon
  • Execute the action
  • Receive new state
  • Check if song is over
  • Update Q-table
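The steps above can be sketched as standard tabular Q-learning with epsilon-greedy action selection; the learning rate and discount factor below are hypothetical, since the slides do not give them:

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # hypothetical learning rate / discount factor
N_ACTIONS = 4

# Q-table: state -> list of action values, zero-initialised on demand.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))

def q_update(state, action, reward, next_state, done):
    """Tabular update: Q(s,a) += alpha * (r + gamma*max Q(s',·) - Q(s,a))."""
    target = reward if done else reward + GAMMA * max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])
```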

SLIDE 11

Deep reinforcement learning

Neural network model (Keras).

Steps: identical to the Q-learning steps apart from the last one. Instead of updating a Q-table, transitions are saved in a temporary memory and the model is trained on a smaller, randomly selected sample group (batch).
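The temporary memory and batch sampling can be sketched as a simple replay buffer (the capacity and batch size below are hypothetical):

```python
import random
from collections import deque

class ReplayMemory:
    """Temporary memory of (state, action, reward, next_state, done)
    transitions; training draws a small random batch from it instead
    of learning from each transition immediately."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall out

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        """Randomly selected sample group used to train the model."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```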

SLIDE 12

Results

[Plots on slide: Q-learning agent vs. DQN agent]

SLIDE 13

Future plans

  • Try different combinations of neural network model layers
  • Design the neural network model in TensorFlow
  • Run the agent on the GPU instead of the CPU
  • Make use of a high-end computer