Computer Poker Research at The University of Alberta, by Richard Gibson



slide-1
SLIDE 1

Computer Poker Research at The University of Alberta

Richard Gibson Computing Science Honours Seminar February 25, 2013

slide-2
SLIDE 2

Games have been used to showcase advances in artificial intelligence...

slide-3
SLIDE 3

Checkers

Source: spectrum.ieee.org

slide-4
SLIDE 4

Chess

VS

Source: robertamsterdam.com Source: Wikipedia

slide-5
SLIDE 5

Goal: Build a computer poker program capable of defeating the world's best human players!

slide-6
SLIDE 6

Overview

  • Texas Hold'em
– Why is poker research interesting?
– Computer Poker Research Group
  • Creating Polaris, a poker-playing program
– Nash equilibrium
– Abstraction
  • Polaris in Action
– Annual Computer Poker Competition (Programs vs. Programs)
– Man vs. Machine Competitions
  • Future Directions
slide-8
SLIDE 8

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Source: Wikipedia

slide-9
SLIDE 9

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Raise!

slide-10
SLIDE 10

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Call.

slide-11
SLIDE 11

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Flop Pot

slide-12
SLIDE 12

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Check.

slide-13
SLIDE 13

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Check.

slide-14
SLIDE 14

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Turn

slide-15
SLIDE 15

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Bet!

slide-16
SLIDE 16

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Call.

slide-17
SLIDE 17

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

River

slide-18
SLIDE 18

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Check.

slide-19
SLIDE 19

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Bet!

slide-20
SLIDE 20

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Raise!

slide-21
SLIDE 21

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Call.

slide-22
SLIDE 22

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

slide-23
SLIDE 23

Texas Hold'em Poker

Source: ebaumsworld.com

Dealer

Winner! Loser.

slide-24
SLIDE 24

Why is Poker Interesting?

  • Poker is challenging, thought-provoking, and most importantly, fun!

  • ... but is that enough?

Source: maps.google.com

slide-25
SLIDE 25

Why is Poker Interesting?

  • Card deals introduce elements of chance.

[Figure: several possible flops]

slide-26
SLIDE 26

Why is Poker Interesting?

  • Degree of winnings can vary.

[Figure: three pots, labelled Pot 1, Pot 2, and Pot 3]

slide-27
SLIDE 27

Why is Poker Interesting?

? ?

  • Imperfect information!

Source: Wikipedia

slide-28
SLIDE 28

Why is Poker Interesting?

  • Poker decisions are analogous to real-life decisions.

Example: Driving a car.

Source: clker.com

slide-29
SLIDE 29

Why is Poker Interesting?

  • Poker decisions are analogous to real-life decisions.

Example: Online Advertisement Auctions.

Source: blog.revizzit.com

slide-30
SLIDE 30

Why is Poker Interesting?

  • Poker decisions are analogous to real-life decisions.

Example: Sequential Auctions.

Source: wikipedia.com

slide-31
SLIDE 31

Why is Poker Interesting?

  • Poker decisions are analogous to real-life decisions.

Example: “Adaptive Treatment Strategies”

– For instance: Insulin for diabetes patients

Source: clker.com

[Chen and Bowling, NIPS 2012]

slide-32
SLIDE 32

Computer Poker Research Group (CPRG)

slide-33
SLIDE 33

Computer Poker Research Group (CPRG)

  • Some of our old programs include:

– Loki (1997): Limit Texas Hold'em
– Poki (1999): Limit Texas Hold'em
– PsOpti / Sparbot (2002): Heads-up (2-player) Limit Texas Hold'em
– Vexbot (2003): Heads-up (2-player) Limit Texas Hold'em

slide-34
SLIDE 34

Computer Poker Research Group (CPRG)

  • Our current programs:

– Polaris (vs. Humans)
– Hyperborean (vs. Programs)

  • Games we play:

– Heads-up Limit Texas Hold'em
– Heads-up No-limit Texas Hold'em
– Three-player Limit Texas Hold'em

slide-36
SLIDE 36

Overview

  • Texas Hold'em
– Why is poker research interesting?
– Computer Poker Research Group
  • Creating Polaris, a poker-playing program
– Nash equilibrium
– Abstraction
  • Polaris in Action
– Annual Computer Poker Competition (Programs vs. Programs)
– Man vs. Machine Competitions
  • Future Directions
slide-37
SLIDE 37

Creating Polaris

  • Model Texas Hold'em as an extensive-form game
[Figure: part of the game tree, with actions labelled f, c, r, and k and example leaf payoffs of -1 and +2; the tree continues beyond what is shown.]

slide-38
SLIDE 38

Creating Polaris

Extensive-Form Game

Strategy Profile

slide-39
SLIDE 39

Creating Polaris

  • A strategy profile provides probabilities for each action
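[Figure: the same game tree, with each action labelled by a probability such as 0.2, 0.8, 0.9, or 0.1; leaf payoffs of -1 and +2 are shown and the tree continues beyond what is shown.]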
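A minimal Python sketch of how such a strategy profile might be stored: a mapping from each decision point to a probability distribution over actions. The keys (betting histories written as strings), the reading of f, c, and r as fold, call, and raise, and the probabilities are all assumptions made for illustration, not the representation Polaris uses. In real poker the key would also have to include the acting player's private cards.

```python
import random

# Hypothetical strategy profile: decision point -> {action: probability}.
# Keys are illustrative betting histories; f/c/r are assumed to mean fold/call/raise.
strategy_profile = {
    "": {"f": 0.0, "c": 0.2, "r": 0.8},
    "r": {"f": 0.1, "c": 0.6, "r": 0.3},
}

def is_valid(profile, tol=1e-9):
    """Check that every decision point holds a probability distribution."""
    return all(
        all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1) < tol
        for dist in profile.values()
    )

def sample_action(profile, history, rng=random):
    """Draw one action at `history` according to the stored probabilities."""
    dist = profile[history]
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]
```

Playing a hand then amounts to calling `sample_action` at each decision point reached.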

slide-40
SLIDE 40

Creating Polaris

  • What type of strategy profile do we want?

– Nash equilibrium

  • Example: Rock-Paper-Scissors

Source: clker.com

slide-41
SLIDE 41

Creating Polaris

[Figure: Rock-Paper-Scissors drawn as a game tree; each player chooses r, p, or s, and the leaf payoffs shown include +1 and -1.]

slide-42
SLIDE 42

Creating Polaris

  • A Nash equilibrium strategy profile for Rock-Paper-Scissors.

– “No one can change their strategy and do better.”

[Figure: the Rock-Paper-Scissors game tree with every action, for both players, labelled with probability 1/3.]

slide-43
SLIDE 43

Creating Polaris

  • A Nash equilibrium is a defensive strategy:

– “I can't lose no matter what my opponent does.”

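[Figure: the Rock-Paper-Scissors game tree; one player plays each action with probability 1/3, while the other player's probabilities are shown as "?".]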
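The defensive property can be checked directly for Rock-Paper-Scissors. The sketch below computes a strategy's expected payoff against the opponent's best pure response; the function names and the `rock_heavy` strategy are illustrative. Checking pure responses is enough, because the payoff against any mixed response is an average of the payoffs against pure ones. Note that "can't lose" holds in expectation, not on every single round.

```python
from fractions import Fraction

ACTIONS = ("r", "p", "s")
BEATS = {"r": "s", "p": "r", "s": "p"}  # key beats value

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (from my point of view)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def worst_case_value(strategy):
    """My expected payoff against the opponent's best pure response."""
    return min(
        sum(prob * payoff(mine, theirs) for mine, prob in strategy.items())
        for theirs in ACTIONS
    )

uniform = {a: Fraction(1, 3) for a in ACTIONS}
rock_heavy = {"r": Fraction(1, 2), "p": Fraction(1, 4), "s": Fraction(1, 4)}

print(worst_case_value(uniform))     # 0: no opponent strategy beats it on average
print(worst_case_value(rock_heavy))  # -1/4: always playing paper exploits it
```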

slide-44
SLIDE 44

Creating Polaris

  • But wait, you said we want to win as much as possible!

[Figure: three pots, labelled Pot 1, Pot 2, and Pot 3]

slide-45
SLIDE 45

Creating Polaris

  • But wait, you said we want to win as much as possible!
  • Requires opponent modelling.
  • Some progress made:

– [Bard and Bowling, AAAI 2007]
– [Johanson, Zinkevich, and Bowling, NIPS 2007]
– [Johanson and Bowling, AISTATS 2009]

but still lots of work to be done!

slide-46
SLIDE 46

Creating Polaris

Extensive-Form Game

Nash Equilibrium Strategy Profile

slide-47
SLIDE 47

Creating Polaris

  • Use minimax (alpha-beta) search to compute Nash?

Source: clker.com

slide-48
SLIDE 48

Creating Polaris

  • Use minimax (alpha-beta) search?
[Figure: part of the game tree, with actions labelled f, c, r, and k and example leaf payoffs of -1 and +2; the tree continues beyond what is shown.]

Source: clker.com

slide-49
SLIDE 49

Creating Polaris

  • Instead, use Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007].

[Figure: a cycle of three steps: Deal Cards, “Play” Poker, Update Strategy Profile.]

slide-50
SLIDE 50

Creating Polaris

  • Instead, use Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007].
  • Repeat billions of times

[Figure: a cycle of three steps (Deal Cards, “Play” Poker, Update Strategy Profile), with an arrow labelled “Limit” leading to a Nash Equilibrium Strategy Profile.]
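The "play, then update the strategy" loop can be illustrated on Rock-Paper-Scissors with regret matching, the update rule that CFR applies at each decision point. This is a simplified sketch, not the CFR implementation used for Polaris: there are no cards or game tree, expected payoffs are computed exactly instead of being sampled, and the starting bias towards rock is an arbitrary choice made so the run does not sit at the uniform strategy from the first step.

```python
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: payoff to the player choosing a against an opponent choosing b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / ACTIONS] * ACTIONS
    return [p / total for p in positive]

def self_play(iterations):
    """Return both players' average strategies after regret-matching self-play."""
    # Player 0 starts biased towards rock (an arbitrary choice for this sketch).
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in range(2):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current strategy.
            action_values = [
                sum(opponent[b] * PAYOFF[a][b] for b in range(ACTIONS))
                for a in range(ACTIONS)
            ]
            expected = sum(
                strategies[player][a] * action_values[a] for a in range(ACTIONS)
            )
            for a in range(ACTIONS):
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

It is the average strategy over all iterations, not the final one, that approaches the equilibrium; for Rock-Paper-Scissors that means each action's average probability approaches 1/3.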

slide-51
SLIDE 51

Creating Polaris

  • “Huge” problem (no pun intended):

Extensive-Form Game: 10^18

Strategy Profile: 5 million GB

slide-53
SLIDE 53

Creating Polaris

Extensive-Form Game

Nash Equilibrium Strategy Profile

?

slide-54
SLIDE 54

Creating Polaris

Extensive-Form Game Abstract Game

slide-55
SLIDE 55

Creating Polaris

Abstract Game

  • Merge card deals into buckets.

Extensive-Form Game

slide-57
SLIDE 57

Creating Polaris

  • Old technique: Percentile Hand Strength

– Rank hands from best to worst.

[Figure: hands laid out along a line from Best to Worst.]

slide-58
SLIDE 58

Creating Polaris

  • Old technique: Percentile Hand Strength

– Rank hands from best to worst.
– For 10 buckets, put top 10% into bucket 1, next 10% into bucket 2, etc.

[Figure: hands laid out along a line from Best to Worst and divided into buckets, with Bucket 1, Bucket 5, and Bucket 10 labelled.]
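The percentile rule can be sketched in a few lines of Python. The function name, the hand labels, and the hand-strength values are hypothetical; only the rule itself (rank hands, then cut the ranking into equal-sized slices) comes from the slide.

```python
def percentile_buckets(hand_strengths, num_buckets=10):
    """Map each hand to a bucket: 1 holds the strongest hands, num_buckets the weakest."""
    ranked = sorted(hand_strengths, key=hand_strengths.get, reverse=True)
    return {
        hand: rank * num_buckets // len(ranked) + 1
        for rank, hand in enumerate(ranked)
    }

# Hypothetical hand strengths in [0, 1); higher means stronger.
strengths = {f"hand{i}": i / 20 for i in range(20)}
buckets = percentile_buckets(strengths)
print(buckets["hand19"], buckets["hand0"])  # strongest hand -> 1, weakest -> 10
```

Hands that share a bucket become indistinguishable in the abstract game, which is what shrinks it.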

slide-59
SLIDE 59

Creating Polaris

  • New technique: Hand Strength Distribution Clustering
slide-60
SLIDE 60

Creating Polaris

  • New technique: Hand Strength Distribution Clustering

– Old bucketing technique

slide-61
SLIDE 61

Creating Polaris

  • New technique: Hand Strength Distribution Clustering

– New bucketing technique
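The slides do not spell out how the clustering works. As a rough illustration of why a distribution can separate hands that a single hand-strength number cannot, the sketch below compares two hypothetical hands whose histograms of eventual hand strength have the same mean but very different shapes. The histograms, the function names, and the choice of earth mover's distance as the comparison are assumptions made for this example.

```python
from itertools import accumulate

def mean_strength(histogram):
    """Expected hand strength, taking each bin's midpoint as its value."""
    n = len(histogram)
    return sum(p * (i + 0.5) / n for i, p in enumerate(histogram))

def earth_movers_distance(hist_a, hist_b):
    """1-D earth mover's distance between two histograms over the same bins."""
    width = 1.0 / len(hist_a)
    cdf_gaps = accumulate(a - b for a, b in zip(hist_a, hist_b))
    return width * sum(abs(gap) for gap in cdf_gaps)

# Hypothetical probabilities of ending in each fifth of the strength range.
made_hand = [0.0, 0.0, 1.0, 0.0, 0.0]  # always ends up medium-strength
draw_hand = [0.5, 0.0, 0.0, 0.0, 0.5]  # ends up very weak or very strong
```

Both hands have a mean strength of 0.5, so a ranking by a single number would place them side by side, while the distance between their distributions is clearly non-zero. A clustering method could group hands by such a distance instead.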

slide-62
SLIDE 62

Creating Polaris

Extensive-Form Game: 10^18

Abstract Game: 10^9 to 10^12

slide-63
SLIDE 63

Creating Polaris

Extensive-Form Game: 10^18

Abstract Game: 10^9 to 10^12

[Figure: CFR on the abstract game: Deal Buckets, “Play” “Poker”, and Update Abstract Strategy Profile, repeated billions of times, yield an Abstract Game Equilibrium Strategy.]

slide-64
SLIDE 64

Creating Polaris

Extensive-Form Game: 10^18

Abstract Game: 10^9 to 10^12

Abstract Game Equilibrium Strategy

Approximate Full Game Equilibrium Strategy

<100 GB

slide-65
SLIDE 65

Creating Polaris

  • How are these numbers still manageable?

– We use Compute Canada's largest supercomputers.
– Parallel implementations of abstraction, CFR.

Source: rqchp.ca

slide-66
SLIDE 66

Creating Polaris

  • So how close to equilibrium are we?

[Figure: chart annotated with the labels Old abstraction, CFR, New abstraction, Supercomputers, and Fancy new CFR variant.]

slide-67
SLIDE 67

Overview

  • Texas Hold'em
– Why is poker research interesting?
– Computer Poker Research Group
  • Creating Polaris, a poker-playing program
– Nash equilibrium
– Abstraction
  • Polaris in Action
– Annual Computer Poker Competition (Programs vs. Programs)
– Man vs. Machine Competitions
  • Future Directions
slide-68
SLIDE 68

Polaris / Hyperborean in Action

  • Annual Computer Poker Competition

– Programs vs. Programs.

slide-69
SLIDE 69

Polaris / Hyperborean in Action

  • Annual Computer Poker Competition

– Programs vs. Programs.
– Three different Texas Hold'em games:

  • Heads-up limit
  • Heads-up no-limit
  • Three-player limit
slide-70
SLIDE 70

Polaris / Hyperborean in Action

  • Two divisions per game:

– Total Bankroll [figure: Pot 1, Pot 2, Pot 3]
– Bankroll Instant Run-off [figure: Nash Equilibrium Strategy Profile]
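The slides name the two divisions without defining them. The sketch below assumes the usual reading: total bankroll ranks programs by their summed winnings against the whole field, while instant run-off repeatedly removes the lowest-scoring program and recomputes the totals among those that remain. The agent names and winnings are hypothetical.

```python
def total_bankroll(results, agents):
    """Sum each agent's winnings against every other agent in `agents`."""
    return {a: sum(results[a][b] for b in agents if b != a) for a in agents}

def total_bankroll_ranking(results):
    """Rank agents by total winnings against the whole field."""
    totals = total_bankroll(results, list(results))
    return sorted(totals, key=totals.get, reverse=True)

def instant_runoff_ranking(results):
    """Repeatedly drop the agent with the lowest bankroll among those remaining."""
    remaining = list(results)
    eliminated = []
    while len(remaining) > 1:
        totals = total_bankroll(results, remaining)
        loser = min(remaining, key=totals.get)
        remaining.remove(loser)
        eliminated.append(loser)
    return remaining + eliminated[::-1]

# Hypothetical average winnings of the row agent against the column agent.
results = {
    "Solid":     {"Exploiter": 1,  "Weak": 2},
    "Exploiter": {"Solid": -1,     "Weak": 10},
    "Weak":      {"Solid": -2,     "Exploiter": -10},
}
```

With these numbers "Exploiter" tops the total bankroll ranking by winning heavily from "Weak", but "Solid" wins the run-off because it beats "Exploiter" head to head once "Weak" is eliminated. This is why an equilibrium-style program tends to suit the run-off division.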

slide-71
SLIDE 71

Polaris / Hyperborean in Action

  • Between 2006 and 2012, placed in the top 3 in 34 out of 35 events.

– Finished 4th in 2012 Heads-up limit total bankroll.

[Figure: podium with the counts 21, 8, and 5, which sum to the 34 top-3 finishes.]

Source: clker.com

slide-72
SLIDE 72

Polaris / Hyperborean in Action

  • 2007 Man vs. Machine Poker Competition:

– Heads-up limit only
– Opponents: Phil “The Unabomber” Laak and Ali Eslami.

vs.

slide-73
SLIDE 73

Polaris / Hyperborean in Action

  • Phil Laak during his second session against Polaris:

– YouTube video

slide-74
SLIDE 74

Polaris / Hyperborean in Action

  • The humans won by a narrow margin.

– 500 duplicate hands per session, $10/$20 blinds

             Ali Eslami   Phil Laak   Combined Human Score   Result
Session 1    +$395        -$465       -$70                   Statistical Draw
Session 2    -$2495       +$1570      -$925                  Polaris Wins
Session 3    -$635        +$1455      +$820                  Humans Win
Session 4    +$460        +$110       +$570                  Humans Win
Overall      -$2275       +$2670      +$395                  Humans Win
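As a quick arithmetic check, the combined human score in each session is the sum of the two players' results, and the overall score is the sum over the four sessions. The variable names below are my own; the dollar amounts are the ones reported for the 2007 match.

```python
# (Ali Eslami, Phil Laak) winnings per session, in dollars, from the 2007 results.
sessions = {
    1: (395, -465),
    2: (-2495, 1570),
    3: (-635, 1455),
    4: (460, 110),
}

combined = {s: eslami + laak for s, (eslami, laak) in sessions.items()}
overall = sum(combined.values())

print(combined)  # {1: -70, 2: -925, 3: 820, 4: 570}
print(overall)   # 395
```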

slide-75
SLIDE 75

Polaris / Hyperborean in Action

  • 2008 Man vs. Machine Poker Competition

– Again, just heads-up limit
– Opponents: Matt Hawrilenko, Ijay Palansky, Nick Grudzien, Kyle Hendon, Rich McRoberts, Victor Acosta, Mark Newhouse

vs.

slide-76
SLIDE 76

Polaris / Hyperborean in Action

  • Polaris wins in rematch against humans!

– 500 duplicate hands per session, $1000/$2000 blinds

             Human 1      Human 2      Combined Human Score   Result
Session 1    +$199,500    -$174,000    +$25,500               Humans Win
Session 2    -$2000       -$118,000    -$120,000              Polaris Wins
Session 3    -$42,000     +$37,000     -$5000                 Statistical Draw
Session 4    +$89,500     -$39,500     +$50,000               Humans Win
Session 5    +$251,500    -$307,500    -$56,000               Polaris Wins
Session 6    -$60,500     -$29,000     -$89,500               Polaris Wins
Overall                                -$195,000              Polaris Wins

slide-77
SLIDE 77

Polaris / Hyperborean in Action

  • Lost to humans in 2007 – beat humans in 2008!

Lose Win

slide-78
SLIDE 78

Overview

  • Texas Hold'em
– Why is poker research interesting?
– Computer Poker Research Group
  • Creating Polaris, a poker-playing program
– Nash equilibrium
– Abstraction
  • Polaris in Action
– Annual Computer Poker Competition (Programs vs. Programs)
– Man vs. Machine Competitions
  • Future Directions
slide-79
SLIDE 79

Future Directions

  • Official heads-up no-limit Man vs. Machine match?

– We are still far from equilibrium in no-limit.

  • Is there a better approach for three-player?
  • Can we extend our techniques to games with up to ten players?
  • Tournament poker?
  • Better abstraction techniques?
  • Can we “solve” heads-up limit Texas Hold'em?
slide-80
SLIDE 80

Future Directions

  • We need more students!
slide-82
SLIDE 82

Thanks for Listening!

  • Computer Poker Research Group:

– Website: http://cs.ualberta.ca/~poker
– Twitter: @PolarisPoker

  • My information:

– Email: rggibson@cs.ualberta.ca
– Website: http://cs.ualberta.ca/~rggibson
– Twitter: @RichardGGibson