SLIDE 1

AlphaGo, etc.

SLIDE 2

Lab 4

  • Due Feb. 29 (you have two weeks … 1.5 remaining)
  • new game0.py with show_values for debugging
SLIDE 3

Exam on Tuesday in lab

  • I sent out a topics list last night.
  • On Monday in lecture, we’ll be doing review problems, plus Q&A.

○ We’ll also do Q&A at the end today if there’s time.
○ I plan to send out review problems over the weekend.

What sorts of questions will be on the exam?

  • selecting an appropriate algorithm for various problems

○ state space search vs. local search; BFS vs. A*; minimax vs. MCTS...

  • setting up an appropriate model for the problem and algorithm

○ generating neighbors; identifying a goal; describing utilities; choosing a heuristic...

  • stepping through algorithms

○ identify the next state; list the order nodes are expanded; eliminate dominated strategies...
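For the "list the order nodes are expanded" style of question, a minimal sketch (the graph here is a made-up practice example, not one from the slides):

```python
from collections import deque

# Hypothetical practice graph: S is the start, G is the goal.
graph = {
    'S': ['A', 'B'],
    'A': ['C'],
    'B': ['C', 'G'],
    'C': ['G'],
    'G': [],
}

def bfs_expansion_order(start, goal):
    """Return the order in which BFS expands nodes, stopping once the
    goal itself is expanded."""
    frontier = deque([start])
    visited = {start}
    order = []
    while frontier:
        node = frontier.popleft()
        order.append(node)
        if node == goal:
            return order
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(nbr)
    return order

print(bfs_expansion_order('S', 'G'))  # expansion order: S, A, B, C, G
```

Tracing a run like this by hand (frontier contents at each step) is exactly the skill the exam question asks for.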

SLIDE 4

SLIDE 5

AlphaGo

  • neural networks
  • normal MCTS

SLIDE 6

AlphaGo neural networks

[diagram: where the networks plug into MCTS — selection and evaluation]

SLIDE 7
Step 1: learn to predict human moves

  • used a large database of online expert games
  • learned two versions of the neural network

○ a fast network P for use in evaluation
○ an accurate network P for use in selection

CS63 topic: neural networks (week 7, 14?)
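A minimal sketch of this supervised step, assuming a toy linear softmax policy standing in for the real deep network: each board is a feature vector x, and the training target is the move the expert actually played.

```python
import math

# Toy dimensions (assumptions for illustration only).
N_MOVES = 3
N_FEATS = 2
weights = [[0.0] * N_FEATS for _ in range(N_MOVES)]

def move_probs(x):
    """Softmax over moves: P(move | position)."""
    scores = [sum(w * xi for w, xi in zip(weights[m], x)) for m in range(N_MOVES)]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sl_update(x, expert_move, lr=0.5):
    """One gradient-ascent step on log P(expert_move | x)."""
    probs = move_probs(x)
    for m in range(N_MOVES):
        grad = (1.0 if m == expert_move else 0.0) - probs[m]
        for i in range(N_FEATS):
            weights[m][i] += lr * grad * x[i]

# Stand-in "expert database": in positions like (1, 0) the expert plays
# move 0; in positions like (0, 1), move 2.
for _ in range(200):
    sl_update([1.0, 0.0], 0)
    sl_update([0.0, 1.0], 2)
```

After training, `move_probs` assigns most of its probability mass to the move the expert would have played in each kind of position.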

SLIDE 8

Step 2: improve the accurate network

  • run large numbers of self-play games
  • update the network using reinforcement learning

○ weights updated by stochastic gradient ascent

CS63 topic: reinforcement learning (weeks 9-10)
CS63 topic: stochastic gradient ascent (week 3)
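The self-play update can be sketched as a REINFORCE-style policy-gradient step. This is a toy version, assuming a single scalar parameter `theta` for a two-move policy in place of the real network's weights; the game outcome z is +1 for a win, -1 for a loss.

```python
import math
import random

theta = 0.0  # preference for move A over move B

def prob_a(t):
    """Sigmoid policy: probability of choosing move A."""
    return 1.0 / (1.0 + math.exp(-t))

def reinforce_update(move_was_a, outcome, lr=0.1):
    """Stochastic gradient ascent on expected outcome:
    theta += lr * z * d/dtheta log pi(move taken)."""
    global theta
    p = prob_a(theta)
    grad_log = (1 - p) if move_was_a else -p
    theta += lr * outcome * grad_log

# Self-play stand-in: move A always wins (+1), move B always loses (-1).
random.seed(0)
for _ in range(500):
    a = random.random() < prob_a(theta)
    z = 1 if a else -1
    reinforce_update(a, z)
```

Every update shifts `theta` toward the winning move, which is the sense in which self-play plus gradient ascent "improves" the network.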

SLIDE 9

Step 3: learn a board evaluation network, V

  • use random samples from the self-play database
  • prediction target: probability that black wins from a given board
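A sketch of that regression, assuming a toy linear model with a sigmoid output in place of the real value network: each training sample pairs one randomly sampled position with whether black ultimately won that game.

```python
import math
import random

w = [0.0, 0.0]  # toy weights over two board features (assumption)

def value(s):
    """Estimated probability that black wins from board features s."""
    return 1.0 / (1.0 + math.exp(-sum(wi * si for wi, si in zip(w, s))))

def value_update(s, black_won, lr=0.5):
    """Gradient ascent on the log-likelihood of the outcome (0 or 1):
    the gradient with respect to the logit is simply (target - v)."""
    err = black_won - value(s)
    for i in range(len(w)):
        w[i] += lr * err * s[i]

# One random position per self-play game, to reduce correlation between
# training samples drawn from the same game.
random.seed(1)
for _ in range(300):
    if random.random() < 0.5:
        value_update([1.0, 0.0], 1)   # positions like this: black won
    else:
        value_update([0.0, 1.0], 0)   # positions like this: white won
```

After training, `value` maps board features to an outcome probability, which is what V supplies during search.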

SLIDE 10

AlphaGo tree policy

Select nodes randomly according to weight; the prior weight is determined by the improved policy network P.
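A sketch of prior-weighted selection (a PUCT-like rule; the constant and exact form are simplifying assumptions): each child's score combines its mean value Q with an exploration bonus proportional to the network's prior P that shrinks as the child is visited.

```python
import math

def select_child(children, c_puct=1.0):
    """children: list of dicts with Q (mean value), P (prior from the
    policy network), N (visit count). Return the child to descend into."""
    total_n = sum(ch['N'] for ch in children)

    def score(ch):
        u = c_puct * ch['P'] * math.sqrt(total_n) / (1 + ch['N'])
        return ch['Q'] + u

    return max(children, key=score)

# Hypothetical node: the middle child has the same Q as the first but far
# fewer visits, so its prior-weighted bonus makes it the one selected.
children = [
    {'Q': 0.5, 'P': 0.6, 'N': 10},
    {'Q': 0.5, 'P': 0.3, 'N': 2},
    {'Q': 0.4, 'P': 0.1, 'N': 1},
]
best = select_child(children)
```

The prior biases early search toward moves the network likes, without permanently locking out the others.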

SLIDE 11

AlphaGo default policy

When expanding a node, its initial value combines:

  • an evaluation from value network V
  • a rollout using fast policy P

A rollout according to P selects moves randomly, with the estimated probability that a human would select them, instead of uniformly at random.
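The combination above can be sketched as a weighted mix (the mixing weight `lam` here is an assumption for illustration):

```python
import random

def evaluate_leaf(v_estimate, rollout_outcome, lam=0.5):
    """Initial value of a newly expanded node: mix the value network V's
    estimate with the result of one fast rollout (0 = loss, 1 = win)."""
    return (1 - lam) * v_estimate + lam * rollout_outcome

def sample_move(moves, human_probs):
    """One rollout step under fast policy P: pick each move with the
    estimated probability a human would play it, not uniformly."""
    return random.choices(moves, weights=human_probs, k=1)[0]

print(evaluate_leaf(0.7, 1.0))  # 0.85
```

With `lam = 0`, the leaf value is pure network evaluation; with `lam = 1`, pure rollout.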

SLIDE 12

AlphaGo results

  • Beat a low-rank professional player (Fan Hui) 5 games to 0.
  • Will take on a top professional player (Lee Sedol) March 8-15 in Seoul.
  • There are good reasons to think AlphaGo may lose:

○ AlphaGo’s estimated Elo rating is lower than Lee’s.
○ Professionals who analyzed AlphaGo’s moves don’t think it can win.
○ Deep Blue lost to Kasparov on its first attempt, after beating lower-ranked grandmasters.

SLIDE 13

Transforming normal to extensive form

Key idea: represent simultaneous moves with information sets.

         A      B
    A   5,5    2,8
    B   1,3    3,0

[game tree: player 1 chooses A or B; player 2, inside a single information set, chooses A or B; leaf payoffs (5,5), (2,8), (1,3), (3,0)]
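Using the matrix above, a quick sketch of the "no profitable unilateral deviation" test for pure-strategy equilibria (row player is player 1):

```python
# Payoffs from the slide's 2x2 matrix: (player 1 utility, player 2 utility).
payoffs = {
    ('A', 'A'): (5, 5), ('A', 'B'): (2, 8),
    ('B', 'A'): (1, 3), ('B', 'B'): (3, 0),
}
moves = ['A', 'B']

def pure_nash():
    """A profile is a pure-strategy Nash eq. iff each player's move is a
    best response to the other's: no one gains by deviating alone."""
    eqs = []
    for r in moves:
        for c in moves:
            u1, u2 = payoffs[(r, c)]
            best1 = all(payoffs[(r2, c)][0] <= u1 for r2 in moves)
            best2 = all(payoffs[(r, c2)][1] <= u2 for c2 in moves)
            if best1 and best2:
                eqs.append((r, c))
    return eqs

print(pure_nash())  # [] -- this particular game has no pure-strategy eq.
```

Checking each cell by hand this way (would either player switch?) is the same procedure run on paper.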

SLIDE 14

Transforming extensive to normal form

Key idea: strategies are complete policies, specifying an action for every information set.

          L      R
   LLL   1,2    4,4
   LLR   1,2    4,4
   LRL   0,3    4,4
   LRR   0,3    4,4
   RLL   1,4    3,2
   RLR   1,4    0,0
   RRL   1,4    3,2
   RRR   1,4    0,0

[game tree: players 1 and 2 choose L or R at alternating nodes; leaf payoffs 1,2  0,3  4,4  3,2  0,0  1,4]
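A sketch of why player 1 has exactly eight rows in the table above: a complete strategy names an action at every one of player 1's decision nodes (three here), even nodes a given line of play never reaches.

```python
from itertools import product

# Player 1 moves at three decision nodes (names are placeholders).
info_sets = ['node1', 'node2', 'node3']
actions = ['L', 'R']

# One strategy = one action choice per information set.
strategies = [''.join(s) for s in product(actions, repeat=len(info_sets))]
print(strategies)
# ['LLL', 'LLR', 'LRL', 'LRR', 'RLL', 'RLR', 'RRL', 'RRR']
```

These are exactly the row labels of the normal-form table: |actions|^|information sets| = 2^3 = 8 strategies.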

SLIDE 15

Improvements

  • iterative deepening
  • branch and bound, IDA*
  • multiple searches

LOCAL SEARCH

  • state spaces
  • cost functions
  • neighbor generation

Hill-Climbing

  • random restarts
  • random moves
  • simulated annealing
  • temperature, decay rate
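The simulated-annealing items above can be sketched in a few lines (assuming maximization and a geometric temperature decay, one common choice):

```python
import math
import random

def accept(delta, temperature):
    """Annealing accept rule: always take improving moves; take a
    worsening move (delta < 0) with probability e^(delta / T)."""
    if delta >= 0:
        return True
    return random.random() < math.exp(delta / temperature)

def anneal_schedule(t0=10.0, decay_rate=0.95, steps=5):
    """Geometric cooling: T starts at t0 and shrinks by decay_rate
    each step, so worsening moves become ever less likely."""
    t = t0
    for _ in range(steps):
        yield t
        t *= decay_rate

sched = list(anneal_schedule())
```

High temperature makes the search nearly random (escaping local optima); low temperature makes it nearly pure hill-climbing.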

Population Search

  • (stochastic) beam search
  • Gibbs sampling
  • genetic algorithms
  • select/crossover/mutate
  • state representation
  • satisfiability
  • gradient ascent

DESIGN DIMENSIONS

  • modularity
  • representation scheme
  • discreteness
  • planning horizon
  • uncertainty
  • dynamic environment
  • number of agents
  • learning
  • computational limitations

STATE SPACE SEARCH

  • state space modeling
  • completeness
  • optimality
  • time/space complexity

Uninformed Search

  • depth-first
  • breadth-first
  • uniform cost

Informed Search

  • greedy
  • A*
  • heuristics, admissibility

GAME THEORY

Utility

  • preferences
  • expected utility maximizing

Extensive-Form Games

  • game tree representation
  • backwards induction
  • minimax
  • alpha-beta pruning
  • heuristic evaluation

Normal Form Games

  • payoff matrix repr.
  • removing dominated strats
  • pure-strategy Nash eq.

○ find one

  • mixed-strategy Nash eq.

○ verify one
  • matrix/tree equivalence

MONTE CARLO SEARCH

  • random sampling evaluation
  • explore/exploit tradeoff

Monte Carlo Tree Search

  • tree policy
  • default policy
  • UCT/UCB
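The UCT/UCB item boils down to one formula; a sketch of the standard UCB1 score (assuming rewards in [0, 1] and the classic constant c = sqrt(2)):

```python
import math

def ucb1(mean_reward, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1: exploitation term plus an exploration bonus that grows
    slowly with parent visits and shrinks as the child is visited."""
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)

# The explore/exploit tradeoff in action: a rarely visited child can
# outscore a better-performing but well-explored one.
well_explored = ucb1(0.6, child_visits=90, parent_visits=100)
under_explored = ucb1(0.4, child_visits=5, parent_visits=100)
```

UCT is just MCTS using this score as its tree policy at every internal node.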