CSC321 Lecture 22: Q-Learning
Roger Grosse
Roger Grosse CSC321 Lecture 22: Q-Learning 1 / 21
Overview
Second of 3 lectures on reinforcement learning
Last time: policy gradient (e.g. REINFORCE)
Optimize a policy directly, don't represent
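To make "optimize a policy directly" concrete, here is a minimal REINFORCE sketch in Python on a toy multi-armed bandit, which is a single-state problem chosen only to keep the example short. The function names, the reward means, the Gaussian reward noise, and the learning rate are all hypothetical assumptions, not taken from the lecture. The agent never estimates the rewards. It only nudges the softmax logits theta in the direction r * grad log pi(a), using the fact that for a softmax policy the gradient of log pi(a) with respect to theta_k is 1[k == a] - pi(k).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE (no baseline) on a hypothetical multi-armed bandit.

    The policy is a softmax over per-arm logits theta. The agent never
    looks at reward_means directly; it only updates theta from sampled
    rewards, i.e. it optimizes the policy without modeling the environment.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_means)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        # Hypothetical environment: noisy reward with unit-variance Gaussian noise.
        r = rng.gauss(reward_means[a], 1.0)
        # Policy-gradient step: theta_k += lr * r * (1[k == a] - pi(k)).
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * r * grad_log
    return softmax(theta)


probs = reinforce_bandit([0.0, 0.5, 1.0])
print([round(p, 3) for p in probs])
```

With these settings the policy should shift most of its probability toward the highest-reward arm, although plain REINFORCE without a baseline is noisy and can be slow to converge.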