CS 343H: Honors AI
Lecture 14: Reinforcement Learning, part 3
3/3/2014, Kristen Grauman, UT Austin
Slides courtesy of Dan Klein, UC Berkeley
Announcements: Midterm this Thursday in class. You may bring one sheet (two-sided) of notes.
How to explore? Simplest: random actions (ε-greedy): with small probability ε, act randomly; otherwise, act on the current exploration policy. Problem: random exploration keeps thrashing around even once learning is done. One solution: lower ε over time. Another: exploration functions.

Exploration functions: explore areas whose badness is not (yet) established, and eventually stop exploring. Take a value estimate u and a visit count n, and return an optimistic utility, e.g. f(u, n) = u + k/n.

Regular Q-update:   Q(s,a) ←_α R(s,a,s') + γ max_{a'} Q(s',a')
Modified Q-update:  Q(s,a) ←_α R(s,a,s') + γ max_{a'} f(Q(s',a'), N(s',a'))

Note: the modified update propagates the exploration "bonus" back to states that lead to unknown states as well!
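The modified update above can be sketched as a small tabular learner. This is an illustrative sketch, not code from the slides: the class name and constants are made up, and f(u, n) = u + k/(n + 1) is used instead of u + k/n so unvisited pairs do not divide by zero.

```python
from collections import defaultdict

# Hypothetical sketch of Q-learning with an exploration function.
# f(u, n) = u + k/(n + 1): same idea as the slides' u + k/n, but the
# + 1 avoids division by zero for unvisited (state, action) pairs.

class ExploringQLearner:
    def __init__(self, actions, alpha=0.5, gamma=0.9, k=2.0):
        self.Q = defaultdict(float)   # Q-value estimates
        self.N = defaultdict(int)     # visit counts per (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.k = alpha, gamma, k

    def f(self, u, n):
        # Optimistic utility: rarely visited pairs get a large bonus.
        return u + self.k / (n + 1)

    def choose(self, s):
        # Act greedily with respect to the *optimistic* utilities.
        return max(self.actions, key=lambda a: self.f(self.Q[(s, a)], self.N[(s, a)]))

    def update(self, s, a, r, s2):
        self.N[(s, a)] += 1
        # Modified Q-update: back up optimistic values, so the exploration
        # bonus propagates to states that lead to unknown states as well.
        best = max(self.f(self.Q[(s2, a2)], self.N[(s2, a2)]) for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])
```

Because `choose` ranks actions by f rather than by Q alone, an action that has rarely been tried can beat a known-good one, which is exactly the "optimism under uncertainty" the slide describes.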
Generalizing across states: basic Q-learning keeps a table of all Q-values. In realistic situations, we cannot possibly learn about every single state: there are too many states to visit them all in training, and too many to hold the Q-table in memory. Instead, we want to generalize: learn about some small number of training states from experience, and generalize that experience to new, similar situations.
Feature-based representations: describe a state using a vector of features (properties). Features are functions from states to real numbers (often 0/1) that capture important properties of the state. Examples: distance to the closest ghost, distance to the closest dot, number of ghosts. We can also describe a q-state (s, a) with features (e.g. action moves closer to food).

Linear value functions: using a feature representation, we can write a Q-function (or value function) for any state with just a few weights:
Q(s,a) = w1 f1(s,a) + w2 f2(s,a) + … + wn fn(s,a)
Advantage: our experience is summed up in a few powerful numbers. Disadvantage: states may share features but actually be very different in value, since learning about one state affects more or less all states with that state's features.
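A linear Q-function is just a dot product between weights and features. A minimal sketch, where the feature names and values are hypothetical placeholders for the Pacman-style features above:

```python
# Minimal sketch of a linear Q-function over features.
# Feature names and weights here are illustrative, not from the slides.

def q_value(weights, features):
    """Q(s,a) = w1*f1(s,a) + ... + wn*fn(s,a): a dot product."""
    return sum(weights[name] * value for name, value in features.items())

# Features extracted for one hypothetical (state, action) pair:
features = {'dist_to_dot': 0.5, 'ghost_one_step_away': 1.0}
weights = {'dist_to_dot': 4.0, 'ghost_one_step_away': -1.0}
print(q_value(weights, features))  # 4.0*0.5 + (-1.0)*1.0 = 1.0
```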
Approximate Q-learning: adjust the feature weights instead of table entries.

Exact Q's:        Q(s,a) ← Q(s,a) + α · [difference]
Approximate Q's:  wi ← wi + α · [difference] · fi(s,a)
where difference = [r + γ max_{a'} Q(s',a')] − Q(s,a)

Intuitive interpretation: adjust the weights of active features. If something unexpectedly bad happens, blame the features that were on: disprefer all states with that state's features.

Example: with Q(s,a) = 4.0·f_DOT(s,a) − 1.0·f_GST(s,a), suppose f_DOT(s, NORTH) = 0.5 and f_GST(s, NORTH) = 1.0, so Q(s, NORTH) = +1. Pacman moves north and is eaten: r = −500, and Q(s', ·) = 0 for every action in the terminal state, so difference = −501. Both active features get blamed: w_DOT ← 4.0 + α[−501]·0.5 and w_GST ← −1.0 + α[−501]·1.0.
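The weight update can be sketched in a few lines. The function names, the learning rate, and the feature/reward values below are illustrative choices in the style of the example above, not fixed by the slides:

```python
# Sketch of the approximate Q-learning weight update:
# w_i <- w_i + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)] * f_i(s,a)

def q_value(w, f):
    # Linear Q: dot product of weights and features.
    return sum(w[k] * f[k] for k in f)

def update_weights(w, f_sa, r, max_q_next, alpha=0.004, gamma=1.0):
    # "difference" is the TD error between target and current prediction.
    difference = (r + gamma * max_q_next) - q_value(w, f_sa)
    return {k: w[k] + alpha * difference * f_sa[k] for k in w}

# Hypothetical episode: Pacman steps next to a ghost and dies (r = -500),
# and all successor Q-values are 0.
w = {'dot': 4.0, 'gst': -1.0}
f = {'dot': 0.5, 'gst': 1.0}   # Q(s, NORTH) = 4.0*0.5 - 1.0*1.0 = +1
w = update_weights(w, f, r=-500.0, max_q_next=0.0)  # difference = -501
```

Note that every active feature shares the blame in proportion to its value f_i(s,a), which is exactly the "disprefer all states with these features" intuition.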
Approximate Q-update explained: the update is gradient descent on a least-squares regression problem. Imagine we had only one point x with features f(x), target value y, and weights w. The "prediction" is Σ_k w_k f_k(x), and the squared error is
error(w) = ½ (y − Σ_k w_k f_k(x))²
Differentiating with respect to one weight w_m:
∂error/∂w_m = −(y − Σ_k w_k f_k(x)) f_m(x)
so a gradient-descent step gives
w_m ← w_m + α (y − Σ_k w_k f_k(x)) f_m(x)
The approximate Q-update is exactly this, with "target" y = r + γ max_{a'} Q(s',a') and "prediction" Q(s,a):
w_m ← w_m + α [r + γ max_{a'} Q(s',a') − Q(s,a)] f_m(s,a)
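A quick numeric sketch of this gradient step, under assumed values for the single point (the names and numbers below are made up for illustration): repeated steps drive the prediction toward the target.

```python
# Numeric check that gradient steps on the squared error
#   error(w) = 0.5 * (y - sum_k w_k f_k(x))**2
# follow the update w_m <- w_m + alpha * (y - prediction) * f_m(x).

def prediction(w, f):
    return sum(wk * fk for wk, fk in zip(w, f))

def grad_step(w, f, y, alpha):
    err = y - prediction(w, f)           # "target" minus "prediction"
    return [wk + alpha * err * fk for wk, fk in zip(w, f)]

w = [0.0, 0.0]        # initial weights
f = [1.0, 2.0]        # features of the single point x (assumed values)
y = 5.0               # target value (assumed)
for _ in range(100):  # repeated steps shrink the error geometrically
    w = grad_step(w, f, y, alpha=0.1)
```

With these values each step multiplies the error by 1 − α·(f1² + f2²) = 0.5, so the prediction converges to y.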
Exercise: approximate Q-learning. [Figure: a Pacman grid in which Pacman and a ghost are both sitting on top of a dot.]
Q(s, West) = ? Q(s, South) = ? Based on this approximate Q-function, which action would be chosen?
[Figure: the resulting state s', with Pacman sitting on top of a dot.]
Q(s', West) = ? Q(s', East) = ? What is the sample value (assuming γ = 1)?
Policy search: often the feature-based policies that work well (win games, maximize utilities) aren't the ones that approximate V / Q best. We will see this distinction between modeling and prediction again later in the course. Solution: learn policies that maximize rewards, not the value that predicts rewards. Simplest policy search: start with an initial linear value function or Q-function, then nudge each feature weight up and down and see if the resulting policy is better than before; that is, tune by hill climbing on feature weights.
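The hill-climbing loop above can be sketched as follows. `evaluate_policy` is an assumed stand-in for running the policy and measuring average reward, and the step size and acceptance rule are illustrative assumptions:

```python
import random

# Hypothetical sketch of the simplest policy search: nudge each feature
# weight up and down, and keep a change only if the resulting policy
# scores better. evaluate_policy(weights) is assumed to return the
# average reward of the policy induced by those weights.

def hill_climb(weights, evaluate_policy, step=0.1, iters=50):
    weights = list(weights)
    best_score = evaluate_policy(weights)
    for _ in range(iters):
        i = random.randrange(len(weights))   # pick a feature weight to nudge
        for delta in (+step, -step):
            trial = list(weights)
            trial[i] += delta
            score = evaluate_policy(trial)
            if score > best_score:           # keep the nudge only if it helps
                weights, best_score = trial, score
                break
    return weights
```

Note the objective being climbed is policy performance (reward), not value-prediction accuracy, which is exactly the point of the slide.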