Learning From Data, Lecture 5: Training Versus Testing (PowerPoint PPT Presentation)



SLIDE 1

Learning From Data Lecture 5 Training Versus Testing

The Two Questions of Learning Theory of Generalization (Ein ≈ Eout) An Effective Number of Hypotheses A Combinatorial Puzzle

  • M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

recap: The Two Questions of Learning

  • 1. Can we make sure that Eout(g) is close enough to Ein(g)?
  • 2. Can we make Ein(g) small enough?

The Hoeffding generalization bound:

Eout(g) ≤ Ein(g) + √( (1/(2N)) ln(2|H|/δ) )

out-of-sample error ≤ in-sample error + model complexity; the square-root term is the generalization error bar.

[Figure: Error versus |H|; the best tradeoff is at some |H|∗.]

Ein: training (e.g. the practice exam); Eout: testing (e.g. the real exam)

There is a tradeoff when picking |H|.
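The tradeoff can be made concrete numerically; a small sketch (the function name and sample sizes are illustrative choices, not from the lecture) evaluating the error bar for several sizes of H:

```python
import math

def hoeffding_error_bar(N, M, delta=0.05):
    """The generalization error bar: sqrt((1/(2N)) * ln(2M/delta)) for |H| = M."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# With N = 1000 examples, a richer hypothesis set (larger |H| = M)
# widens the error bar, even though it may let Ein be smaller.
bars = {M: hoeffding_error_bar(1000, M) for M in (1, 100, 10**6)}
```

The bar grows only logarithmically in M, which is why a moderately larger H is often worth the price.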

© AML Creator: Malik Magdon-Ismail

Training Versus Testing: 2/18

Goal of generalization theory →

SLIDE 3

What Will The Theory of Generalization Achieve?

The finite-H bound:

Eout(g) ≤ Ein(g) + √( (1/(2N)) ln(2|H|/δ) )

The new bound:

Eout(g) ≤ Ein(g) + √( (8/N) ln(4 mH(2N)/δ) )

In both, out-of-sample error ≤ in-sample error + model complexity.

[Figure: Error versus model complexity for each bound.]

The new bound will be applicable to infinite H.
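A sketch of why the new bound is useful, assuming the growth-function argument is mH(2N) as in the VC bound and taking a polynomial growth function such as mH(N) = N + 1 (the positive ray model later in the lecture):

```python
import math

def new_error_bar(N, mH, delta=0.05):
    """The new bound's error bar: sqrt((8/N) * ln(4 * mH(2N) / delta))."""
    return math.sqrt(8 / N * math.log(4 * mH(2 * N) / delta))

mH = lambda n: n + 1          # a polynomial growth function (positive rays)
bars = [new_error_bar(N, mH) for N in (100, 10_000, 1_000_000)]
# The bar shrinks toward 0 as N grows, even though |H| is infinite.
```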

|H| is overkill →

SLIDE 4

Why is |H| an Overkill

How did |H| come in? Bad events:

Bg = {|Eout(g) − Ein(g)| > ε},  Bm = {|Eout(hm) − Ein(hm)| > ε}

We do not know which g, so use a worst case union bound.

P[Bg] ≤ P[any Bm] ≤ Σ_{m=1}^{|H|} P[Bm].

[Figure: Venn diagram of overlapping events B1, B2, B3.]

  • Bm are events (sets of outcomes); they can overlap.
  • If the Bm overlap, the union bound is loose.
  • If many hm are similar, the Bm overlap.
  • There are “effectively” fewer than |H| hypotheses.
  • We can replace |H| by something smaller.

|H| fails to account for similarity between hypotheses.
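The looseness is easy to see in simulation. A hypothetical setup of my own construction: M copies of one and the same hypothesis, whose Ein averages N fair coin flips against Eout = 0.5:

```python
import random

random.seed(0)
N, eps, trials, M = 100, 0.1, 2000, 1000

def bad_event():
    """Bm occurs when |Ein - Eout| > eps, with Eout = 0.5 and Ein from N flips."""
    ein = sum(random.random() < 0.5 for _ in range(N)) / N
    return abs(ein - 0.5) > eps

p_bad = sum(bad_event() for _ in range(trials)) / trials

# All M hypotheses are identical, so P[any Bm] = p_bad; the union
# bound charges M * p_bad instead, which here exceeds 1 -- hopelessly loose.
union_bound = M * p_bad
```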

Measuring diversity on N points →

SLIDE 5

Measuring the Diversity (Size) of H

We need a way to measure the diversity of H. A simple idea: Fix any set of N data points. If H is diverse it should be able to implement all functions . . . on these N points.

Example: large H →

SLIDE 6

A Data Set Reveals the True Colors of an H

[Figure: the hypothesis set H.]

. . . through the eyes of D →

SLIDE 7

A Data Set Reveals the True Colors of an H

[Figure: H through the eyes of D.]

Just one dichotomy →

SLIDE 8

A Data Set Reveals the True Colors of an H

From the point of view of D, the entire H is just one dichotomy.

An effective number of hypotheses →

SLIDE 9

An Effective Number of Hypotheses

If H is diverse it should be able to implement many dichotomies. |H| only captures the maximum possible diversity of H. Consider an h ∈ H and a data set x1, . . . , xN. h gives us an N-tuple of ±1’s:

(h(x1), . . . , h(xN)), a dichotomy of the inputs.

If H is diverse, we get many different dichotomies. If H contains similar functions, we only get a few dichotomies.

The growth function quantifies this.
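For instance (a toy 1-D example of my own, with hypothetical threshold hypotheses), two nearby thresholds induce the identical dichotomy while a distant one differs:

```python
xs = [1.0, 2.0, 3.0, 4.0]                # a fixed data set x1, ..., xN

def dichotomy(h, xs):
    """The N-tuple (h(x1), ..., h(xN)) of +/-1 values."""
    return tuple(1 if h(x) > 0 else -1 for x in xs)

h1 = lambda x: x - 2.5                   # threshold at 2.5
h2 = lambda x: x - 2.6                   # a *similar* hypothesis
h3 = lambda x: x - 3.5                   # a genuinely different one

d1, d2, d3 = (dichotomy(h, xs) for h in (h1, h2, h3))
# d1 == d2: similar hypotheses collapse into one dichotomy; d3 is new.
```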

Growth function →

SLIDE 10

The Growth Function mH(N)

Define the restriction of H to the inputs x1, x2, . . . , xN:

H(x1, . . . , xN) = {(h(x1), . . . , h(xN)) | h ∈ H}

(the set of dichotomies induced by H)

The Growth Function mH(N): the largest number of dichotomies induced by H on any N points,

mH(N) = max over x1, . . . , xN of |H(x1, . . . , xN)|.

mH(N) ≤ 2^N. Can we replace |H| by mH, an effective number of hypotheses?

  • Replacing |H| with 2^N is no help in the bound. (why?)
  • The error bar is √( (1/(2N)) ln(2|H|/δ) ).
  • We want mH(N) ≤ poly(N) to get a useful error bar.
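Why 2^N is no help: plugging mH(N) = 2^N into the bar leaves it bounded below by √(ln 2 / 2) ≈ 0.59, while a polynomial mH sends it to zero. A quick check (function and variable names are mine):

```python
import math

def error_bar(N, m, delta=0.05):
    """sqrt((1/(2N)) * ln(2m/delta)) with m in place of |H|."""
    return math.sqrt(math.log(2 * m / delta) / (2 * N))

exp_bars = [error_bar(N, 2 ** N) for N in (10, 100, 1000)]   # stuck above 0.59
poly_bars = [error_bar(N, N + 1) for N in (10, 100, 1000)]   # shrinks toward 0
```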

Example: 2-d perceptron →

slide-11
SLIDE 11

Example: 2-D Perceptron Model

[Figure: dichotomies of 4 points the perceptron cannot implement; all 8 dichotomies of 3 points it can implement; of the dichotomies of 4 points it can implement at most 14.]

mH(3) = 8 = 2^3.  mH(4) = 14 < 2^4.  What is mH(5)?
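A Monte Carlo check (my own sketch; strictly it only lower-bounds the dichotomy count) that three points in general position are shattered by the 2-D perceptron:

```python
import random

random.seed(0)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # 3 points in general position

# Sample random perceptrons h(x) = sign(w . x + b) and record the
# dichotomies they induce on the 3 points.
dichotomies = set()
for _ in range(20000):
    w1, w2, b = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    dichotomies.add(tuple(1 if w1 * x + w2 * y + b > 0 else -1
                          for x, y in pts))
# All 8 = 2^3 labelings show up, so mH(3) = 8 for these points.
```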

Example: 1-d positive ray →

SLIDE 12

Example: 1-D Positive Ray Model

[Figure: points x1, x2, . . . , xN on a line, with h(x) = +1 to the right of the threshold w0.]

  • h(x) = sign(x − w0)
  • Consider N points.
  • There are N + 1 dichotomies depending on where you put w0.
  • mH(N) = N + 1.
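The count mH(N) = N + 1 can be verified by enumeration; a sketch (w0 need only be tried once per gap between the sorted points):

```python
def ray_dichotomies(xs):
    """All dichotomies of h(x) = sign(x - w0) over choices of w0."""
    xs = sorted(xs)
    # One candidate w0 per gap: left of all points, between neighbors, right of all.
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x > c else -1 for x in xs) for c in cuts}

counts = [len(ray_dichotomies(list(range(n)))) for n in range(1, 7)]
# counts == [2, 3, 4, 5, 6, 7], i.e. mH(N) = N + 1.
```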

Example: 2-d positive rectangle →

SLIDE 13

Example: Positive Rectangles in 2-D

[Figure: N = 4: H implements all dichotomies. N = 5: some point will be inside a rectangle defined by the others.]

mH(4) = 2^4 = 16.  mH(5) < 2^5.  We have not computed mH(5); not impossible, but tricky.
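Brute force confirms both claims; a sketch of my own (the diamond layout is my choice of 4 points, and candidate rectangle edges need only pass just beside each coordinate):

```python
from itertools import combinations

def rectangle_dichotomies(pts):
    """Dichotomies induced by positive axis-aligned rectangles (+1 inside)."""
    xs = sorted({p[0] + d for p in pts for d in (-0.5, 0.5)})
    ys = sorted({p[1] + d for p in pts for d in (-0.5, 0.5)})
    dich = set()
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            dich.add(tuple(1 if x1 <= px <= x2 and y1 <= py <= y2 else -1
                           for px, py in pts))
    return dich

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
n4 = len(rectangle_dichotomies(diamond))             # 16 = 2^4: shattered
n5 = len(rectangle_dichotomies(diamond + [(0, 0)]))  # fewer than 2^5 = 32
```

The added fifth point sits inside every rectangle that contains the four outer points, so at least one dichotomy is unachievable.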

The growth functions summarized →

SLIDE 14

Example Growth Functions

N                     1   2   3   4    5     · · ·
2-D perceptron        2   4   8   14   · · ·
1-D pos. ray          2   3   4   5    · · ·
2-D pos. rectangles   2   4   8   16   <2^5  · · ·

  • mH(N) drops below 2^N – there is hope for the generalization bound.
  • A break point is any n for which mH(n) < 2^n.
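A minimal check of the definition, using the growth-function values from the table (the helper name is mine):

```python
def break_point(mH_values):
    """Smallest n with mH(n) < 2^n, given mH(1), mH(2), ... in order."""
    for n, m in enumerate(mH_values, start=1):
        if m < 2 ** n:
            return n
    return None   # no break point among the listed values

ray_bp = break_point([n + 1 for n in range(1, 6)])   # mH(n) = n + 1
perceptron_bp = break_point([2, 4, 8, 14])           # table values
```

The positive ray already breaks at n = 2 (3 < 4); the 2-D perceptron first breaks at n = 4 (14 < 16).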

Combinatorial puzzle: dichotomies on 3 points →

SLIDE 15

A Combinatorial Puzzle

[Figure: a table of dichotomies on x1, x2, x3.]

  • A set of dichotomies

Two points shattered →

SLIDE 16

A Combinatorial Puzzle

[Figure: a set of dichotomies on x1, x2, x3.]

  • Two points are shattered

Another set of dichotomies →

SLIDE 17

A Combinatorial Puzzle

[Figure: a set of dichotomies on x1, x2, x3.]

  • No pair of points is shattered

What about N = 4? →

SLIDE 18

A Combinatorial Puzzle

[Figure: for N = 3 (x1, x2, x3), 4 dichotomies is the max with no 2 points shattered; an empty table for x1, x2, x3, x4 remains to be filled in.]

If N = 4, how many dichotomies are possible with no 2 points shattered?
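The puzzle yields to brute force; a sketch of my own that searches all subsets of the 16 possible dichotomies (and so spoils the answer for N = 4):

```python
from itertools import combinations, product

N = 4
candidates = list(product((-1, 1), repeat=N))      # all 2^4 = 16 dichotomies

def pair_shattered(dset, i, j):
    """True if points i and j take all four +/-1 patterns within dset."""
    return len({(d[i], d[j]) for d in dset}) == 4

def valid(dset):
    """No pair of the N points is shattered by dset."""
    return not any(pair_shattered(dset, i, j)
                   for i, j in combinations(range(N), 2))

# Try subset sizes from largest down; the first size with a valid set wins.
best = next(r for r in range(2 ** N, 0, -1)
            if any(valid(s) for s in combinations(candidates, r)))
```

One maximal set is the all-(−1) dichotomy plus the four dichotomies with a single +1: no pair of points ever shows the (+1, +1) pattern.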
