
Midterm review

CS 446

  • 1. Lecture review

(Lec1.) Basic setting: supervised learning

Training data: labeled examples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where
◮ each input $x_i$ is a machine-readable description of an instance (e.g., image, sentence), and
◮ each corresponding label $y_i$ is an annotation relevant to the task (typically not easy to obtain automatically).

Goal: learn a function $\hat{f}$ from labeled examples that accurately "predicts" the labels of new (previously unseen) inputs. (Note: 0 training error is easy; test/population error is what matters.)

[Diagram: past labeled examples → learning algorithm → learned predictor; new (unlabeled) example → learned predictor → predicted label.]

(Lec2.) k-nearest neighbors classifier

Given: labeled examples $D := \{(x_i, y_i)\}_{i=1}^n$.

Predictor $\hat{f}_{D,k} : \mathcal{X} \to \mathcal{Y}$: on input $x$,
  • 1. Find the $k$ points $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ among $\{x_i\}_{i=1}^n$ "closest" to $x$ (the $k$ nearest neighbors).
  • 2. Return the plurality label of $y_{i_1}, y_{i_2}, \ldots, y_{i_k}$.

(Break ties in both steps arbitrarily.)
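A minimal NumPy sketch of this predictor (the helper name knn_predict is mine, not from the lecture):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by plurality vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest (ties arbitrary)
    return Counter(y_train[nearest]).most_common(1)[0][0]
```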

(Lec2.) Choosing k

The hold-out set approach:
  • 1. Pick a subset V ⊂ S (hold-out set, a.k.a. validation set).
  • 2. For each k ∈ {1, 3, 5, . . . }:
       ◮ Construct the k-NN classifier $\hat{f}_{S \setminus V, k}$ using S \ V.
       ◮ Compute the error rate of $\hat{f}_{S \setminus V, k}$ on V (the "hold-out error rate").
  • 3. Pick the k that gives the smallest hold-out error rate.

(There are many other approaches.)
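A sketch of the hold-out procedure, reusing the knn_predict helper above (an illustration, not lecture code):

```python
import numpy as np

def choose_k(X, y, ks=(1, 3, 5, 7, 9), val_frac=0.2, seed=0):
    """Pick k by minimizing the error rate on a random hold-out set V."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    val, train = idx[:n_val], idx[n_val:]        # V and S \ V

    def holdout_err(k):
        preds = [knn_predict(X[train], y[train], x, k) for x in X[val]]
        return np.mean(np.array(preds) != y[val])

    return min(ks, key=holdout_err)
```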

(Lec2.) Decision trees

Directly optimize tree structure for good classification. A decision tree is a function f : X → Y, represented by a binary tree in which:
◮ each tree node is associated with a splitting rule g : X → {0, 1}, and
◮ each leaf node is associated with a label ŷ ∈ Y.

When $X = \mathbb{R}^d$, typically only consider splitting rules of the form $g(x) = \mathbf{1}\{x_i > t\}$ for some $i \in [d]$ and $t \in \mathbb{R}$; these are called axis-aligned or coordinate splits. (Notation: [d] := {1, 2, . . . , d}.)

[Tree diagram: root splits on x1 > 1.7; one branch is the leaf ŷ = 1, the other splits on x2 > 2.8 into leaves ŷ = 2 and ŷ = 3.]

(Lec2.) Decision tree example

[Scatter plot: petal length/width vs. sepal length/width for the iris data, built up over several slides with the splits x1 > 1.7 and x2 > 2.8 overlaid.]

Classifying irises by sepal and petal measurements:
◮ X = R², Y = {1, 2, 3}
◮ x1 = ratio of sepal length to width
◮ x2 = ratio of petal length to width

The tree is grown incrementally: first the split x1 > 1.7 (leaves ŷ = 1 and ŷ = 3), then the split x2 > 2.8 is added, giving leaves ŷ = 1, ŷ = 2, ŷ = 3.

(Lec2.) Nearest neighbors and decision trees

Today we covered two standard machine learning methods.

[Figure: decision boundaries of both methods on a 2-D dataset with axes x1, x2.]

Nearest neighbors. Training/fitting: memorize data. Testing/predicting: find the k closest memorized points, return the plurality label. Overfitting? Vary k.

Decision trees. Training/fitting: greedily partition space, reducing "uncertainty". Testing/predicting: traverse the tree, output the leaf label. Overfitting? Limit or prune the tree.

Note: both methods can also output real numbers (regression, not classification); return the median/mean of the neighbors (resp. the points reaching the leaf).

(Lec3-4.) ERM setup for least squares

◮ Predictors/model: $\hat{f}(x) = w^\top x$, a linear predictor/regressor. (For linear classification: $x \mapsto \mathrm{sgn}(w^\top x)$.)
◮ Loss/penalty: the least squares loss $\ell_{\mathrm{ls}}(y, \hat{y}) = (y - \hat{y})^2$. (Some conventions scale this by 1/2.)
◮ Goal: minimize the least squares empirical risk
$$\hat{R}_{\mathrm{ls}}(\hat{f}) = \frac{1}{n}\sum_{i=1}^n \ell_{\mathrm{ls}}(y_i, \hat{f}(x_i)) = \frac{1}{n}\sum_{i=1}^n \left(y_i - \hat{f}(x_i)\right)^2.$$
◮ Specifically, we choose $w \in \mathbb{R}^d$ according to
$$\arg\min_{w \in \mathbb{R}^d} \hat{R}_{\mathrm{ls}}\left(x \mapsto w^\top x\right) = \arg\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \left(y_i - w^\top x_i\right)^2.$$
◮ More generally, this is the ERM approach: pick a model and minimize empirical risk over the model parameters.

(Lec3-4.) ERM in general

◮ Pick a family of models/predictors F. (For today, linear predictors.)
◮ Pick a loss function ℓ. (For today, squared loss.)
◮ Minimize the empirical risk over the model parameters.

We haven't discussed: true risk and overfitting; how to minimize; why this is a good idea.

Remark: ERM is convenient in pytorch: just pick a model, a loss, and an optimizer, and tell it to minimize.
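A minimal PyTorch sketch of that recipe for least squares (toy random data, purely illustrative):

```python
import torch

X = torch.randn(100, 5)                     # n = 100 examples, d = 5
y = torch.randn(100, 1)

model = torch.nn.Linear(5, 1, bias=False)   # model: x -> w^T x
loss_fn = torch.nn.MSELoss()                # loss: squared error
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):                        # minimize empirical risk
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()
```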

(Lec3-4.) Empirical risk minimization in matrix notation

Define the n × d matrix A and n × 1 column vector b by
$$A := \frac{1}{\sqrt{n}} \begin{pmatrix} \leftarrow x_1^\top \rightarrow \\ \vdots \\ \leftarrow x_n^\top \rightarrow \end{pmatrix}, \qquad b := \frac{1}{\sqrt{n}} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$

Can write the empirical risk as
$$\hat{R}(w) = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i^\top w\right)^2 = \|Aw - b\|_2^2.$$

Necessary condition for w to be a minimizer of $\hat{R}$: $\nabla \hat{R}(w) = 0$, i.e., w is a critical point of $\hat{R}$. This translates to
$$(A^\top A)\, w = A^\top b,$$
a system of linear equations called the normal equations.

In an upcoming lecture we'll prove every critical point of $\hat{R}$ is a minimizer of $\hat{R}$.
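A quick NumPy check (toy data following the scaling above): solving via the pseudoinverse satisfies the normal equations and agrees with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = rng.standard_normal(n) / np.sqrt(n)

w_pinv = np.linalg.pinv(A) @ b                    # w = A^+ b
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # least squares solver
assert np.allclose(w_pinv, w_lstsq)

# Both satisfy the normal equations (A^T A) w = A^T b.
assert np.allclose(A.T @ A @ w_pinv, A.T @ b)
```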

(Lec3-4.) Full (factorization) SVD (new slide)

Given $M \in \mathbb{R}^{n \times d}$, let $M = USV^\top$ denote the singular value decomposition (SVD), where
◮ $U \in \mathbb{R}^{n \times n}$ is orthonormal, thus $U^\top U = UU^\top = I$,
◮ $V \in \mathbb{R}^{d \times d}$ is orthonormal, thus $V^\top V = VV^\top = I$,
◮ $S \in \mathbb{R}^{n \times d}$ has singular values $s_1 \ge s_2 \ge \cdots \ge s_{\min\{n,d\}}$ along the diagonal and zeros elsewhere, where the number of positive singular values equals the rank of $M$.

Some facts:
◮ The SVD is not unique when the singular values are not distinct; e.g., we can write $I = UIU^\top$ where $U$ is any orthonormal matrix.
◮ The pseudoinverse $S^+ \in \mathbb{R}^{d \times n}$ of $S$ is obtained by starting with $S^\top$ and taking the reciprocal of each positive entry.
◮ The pseudoinverse of $M$ is $VS^+U^\top$.
◮ If $M^{-1}$ exists, then $M^{-1} = M^+$.

(Lec3-4.) Thin (decomposition) SVD (new slide)

Given $M \in \mathbb{R}^{n \times d}$, $(s, u, v)$ is a singular value with corresponding left and right singular vectors if $Mv = su$ and $M^\top u = sv$. The thin SVD of $M$ is $M = \sum_{i=1}^r s_i u_i v_i^\top$, where $r$ is the rank of $M$, and
◮ the left singular vectors $(u_1, \ldots, u_r)$ are orthonormal (but we might have $r < \min\{n, d\}$!) and span the column space of $M$,
◮ the right singular vectors $(v_1, \ldots, v_r)$ are orthonormal (but we might have $r < \min\{n, d\}$!) and span the row space of $M$,
◮ the singular values satisfy $s_1 \ge \cdots \ge s_r > 0$.

Some facts:
◮ Pseudoinverse $M^+ = \sum_{i=1}^r \frac{1}{s_i} v_i u_i^\top$.
◮ $(u_i)_{i=1}^r$ span the column space of $M$.
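A NumPy check of these identities on a toy matrix (full_matrices=False gives the thin SVD):

```python
import numpy as np

M = np.random.default_rng(0).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD

# M equals the sum of rank-one terms s_i u_i v_i^T.
assert np.allclose(M, (U * s) @ Vt)

# Pseudoinverse M^+ = sum_i (1/s_i) v_i u_i^T.
M_pinv = (Vt.T / s) @ U.T
assert np.allclose(M_pinv, np.linalg.pinv(M))
```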

(Lec3-4.) SVD and least squares

Recall: we'd like to find w such that $A^\top A w = A^\top b$. If $w = A^+ b$, then
$$A^\top A w = \left(\sum_{i=1}^r s_i v_i u_i^\top\right)\left(\sum_{i=1}^r s_i u_i v_i^\top\right)\left(\sum_{i=1}^r \frac{1}{s_i} v_i u_i^\top\right) b = \left(\sum_{i=1}^r s_i v_i u_i^\top\right)\left(\sum_{i=1}^r u_i u_i^\top\right) b = A^\top b.$$

Henceforth, define $\hat{w}_{\mathrm{ols}} = A^+ b$ as the OLS solution. (OLS = "ordinary least squares".)

Note: in general, $AA^+ = \sum_{i=1}^r u_i u_i^\top \ne I$.

(Lec3-4.) Normal equations imply optimality

Consider w with $A^\top A w = A^\top y$, and any w′; then
$$\|Aw' - y\|^2 = \|Aw' - Aw + Aw - y\|^2 = \|Aw' - Aw\|^2 + 2(Aw' - Aw)^\top (Aw - y) + \|Aw - y\|^2.$$
Since
$$(Aw' - Aw)^\top (Aw - y) = (w' - w)^\top (A^\top A w - A^\top y) = 0,$$
we get $\|Aw' - y\|^2 = \|Aw' - Aw\|^2 + \|Aw - y\|^2 \ge \|Aw - y\|^2$. This means w is optimal.

Moreover, writing $A = \sum_{i=1}^r s_i u_i v_i^\top$,
$$\|Aw' - Aw\|^2 = (w' - w)^\top (A^\top A)(w' - w) = (w' - w)^\top \left(\sum_{i=1}^r s_i^2\, v_i v_i^\top\right)(w' - w),$$
so w′ is optimal iff w′ − w is in the right nullspace of A.

(We'll revisit all this with convexity later.)

(Lec3-4.) Regularized ERM

Combine the two concerns: for a given λ ≥ 0, find the minimizer of
$$\hat{R}(w) + \lambda \|w\|_2^2$$
over $w \in \mathbb{R}^d$.

Fact: If λ > 0, then the solution is always unique (even if n < d)!

◮ This is called ridge regression. (λ = 0 is ERM / ordinary least squares.) Explicit solution: $(A^\top A + \lambda I)^{-1} A^\top b$.
◮ The parameter λ controls how much attention is paid to the regularizer $\|w\|_2^2$ relative to the data-fitting term $\hat{R}(w)$.
◮ Choose λ using cross-validation.

Note: in deep networks, this regularization is called "weight decay". (Why?)
Note: another popular regularizer for linear regression is ℓ1.
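The closed-form ridge solution in NumPy (toy data; note it exists even when n < d):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                      # underdetermined: n < d
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

lam = 0.1
# (A^T A + lam I)^{-1} A^T b; unique for any lam > 0.
w_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
```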

(Lec5-6.) Geometry of linear classifiers

[Figure: hyperplane H in R² with normal vector w.]

A hyperplane in $\mathbb{R}^d$ is a linear subspace of dimension d − 1.
◮ A hyperplane in R² is a line.
◮ A hyperplane in R³ is a plane.
◮ As a linear subspace, a hyperplane always contains the origin.

A hyperplane H can be specified by a (non-zero) normal vector $w \in \mathbb{R}^d$. The hyperplane with normal vector w is the set of points orthogonal to w:
$$H = \left\{x \in \mathbb{R}^d : x^\top w = 0\right\}.$$

Given w and its corresponding H: H splits the points labeled positive $\{x : w^\top x > 0\}$ from those labeled negative $\{x : w^\top x < 0\}$.

(Lec5-6.) Classification with a hyperplane

[Figure: point x at angle θ to w, projected onto span{w}.]

The projection of x onto span{w} (a line) has coordinate $\|x\|_2 \cos\theta$, where
$$\cos\theta = \frac{x^\top w}{\|w\|_2 \|x\|_2}.$$
(The distance to the hyperplane is $\|x\|_2\, |\cos\theta|$.)

The decision boundary is the hyperplane (oriented by w):
$$x^\top w > 0 \iff \|x\|_2 \cos\theta > 0 \iff x \text{ on the same side of } H \text{ as } w.$$

What should we do if we want a hyperplane decision boundary that doesn't (necessarily) go through the origin?

(Lec5-6.) Linear separability

Is it always possible to find w with sign(wᵀxᵢ) = yᵢ? Is it always possible to find a hyperplane separating the data? (Appending 1 means it need not go through the origin.)

[Figure: two 2-D datasets, one linearly separable, one not linearly separable.]

(Lec5-6.) Cauchy-Schwarz (new slide)

Cauchy-Schwarz inequality. $|a^\top b| \le \|a\| \cdot \|b\|$.

Proof. If $\|a\| = \|b\|$, then
$$0 \le \|a - b\|^2 = \|a\|^2 - 2a^\top b + \|b\|^2 = 2\|a\| \cdot \|b\| - 2a^\top b,$$
which rearranges to give $a^\top b \le \|a\| \cdot \|b\|$. If $\|a\| \ne \|b\|$ (both nonzero), apply the preceding to the equal-norm pair $a\sqrt{\|b\|/\|a\|}$ and $b\sqrt{\|a\|/\|b\|}$, whose inner product is again $a^\top b$. For the absolute value, apply the preceding to $(a, -b)$.
(Lec5-6.) Logistic loss

Let's state our classification goal with a generic margin loss ℓ:
$$\hat{R}(w) = \frac{1}{n}\sum_{i=1}^n \ell\left(y_i w^\top x_i\right);$$
the key properties we want:
◮ ℓ is continuous;
◮ $\ell(z) \ge c\,\mathbf{1}[z \le 0] = c\,\ell_{\mathrm{zo}}(z)$ for some c > 0 and any z ∈ R, which implies $\hat{R}_\ell(w) \ge c\,\hat{R}_{\mathrm{zo}}(w)$;
◮ ℓ′(0) < 0 (pushes stuff from wrong to right).

Examples.
◮ Squared loss, written in margin form: $\ell_{\mathrm{ls}}(z) := (1 - z)^2$; note $\ell_{\mathrm{ls}}(y\hat{y}) = (1 - y\hat{y})^2 = y^2(1 - y\hat{y})^2 = (y - \hat{y})^2$ (using y ∈ {−1, +1}, so y² = 1).
◮ Logistic loss: $\ell_{\log}(z) = \ln(1 + \exp(-z))$.

(Lec5-6.) Squared and logistic losses on linearly separable data (I, II)

[Figures (two slides): contour plots of the empirical logistic loss and squared loss in weight space, on linearly separable data.]

(Lec5-6.) Logistic risk and separation

If there exists a perfect linear separator, empirical logistic risk minimization should find it.

Theorem. If there exists $\bar{w}$ with $y_i \bar{w}^\top x_i > 0$ for all i, then every w with
$$\hat{R}_{\log}(w) < \frac{\ln 2}{2n} + \inf_v \hat{R}_{\log}(v)$$
also satisfies $y_i w^\top x_i > 0$.

Proof. Omitted.

(Lec5-6.) Least squares and logistic ERM

Least squares:
◮ Take the gradient of $\|Aw - b\|^2$, set to 0; obtain the normal equations $A^\top A w = A^\top b$.
◮ One choice is the minimum norm solution $A^+ b$.

Logistic loss:
◮ Take the gradient of $\hat{R}_{\log}(w) = \frac{1}{n}\sum_{i=1}^n \ln\left(1 + \exp(-y_i w^\top x_i)\right)$ and set to 0 ???

Remark. Is $A^+ b$ a "closed form expression"?

(Lec5-6.) Gradient descent

Given a function $F : \mathbb{R}^d \to \mathbb{R}$, gradient descent is the iteration
$$w_{i+1} := w_i - \eta_i \nabla_w F(w_i),$$
where $w_0$ is given, and $\eta_i$ is a learning rate / step size.

[Figure: gradient descent path on the contour plot of a quadratic.]

Does this work for least squares? Later we'll show it works for least squares and logistic regression due to convexity.
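A minimal sketch of this iteration for the least squares objective ‖Aw − b‖² (constant step size; it must be small enough relative to A's largest singular value to converge):

```python
import numpy as np

def gradient_descent(A, b, eta=0.01, steps=500):
    """Minimize ||Aw - b||^2 by gradient descent with a constant step size."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = 2 * A.T @ (A @ w - b)   # gradient of the squared error
        w -= eta * grad
    return w
```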

(Lec5-6.) Multiclass?

All our methods so far handle multiclass:
◮ k-NN and decision trees: plurality label.
◮ Least squares: $\arg\min_{W \in \mathbb{R}^{d \times k}} \|AW - B\|_F^2$ with $B \in \mathbb{R}^{n \times k}$; $W \in \mathbb{R}^{d \times k}$ is k separate linear regressors in $\mathbb{R}^d$.

How about linear classifiers?
◮ At prediction time, $x \mapsto \arg\max_y \hat{f}(x)_y$.
◮ As in the binary case: interpretation $f(x)_y = \Pr[Y = y \mid X = x]$.

What is a good loss function?

(Lec5-6.) Cross-entropy

Given two probability vectors $p, q \in \Delta_k = \{p \in \mathbb{R}^k_{\ge 0} : \sum_i p_i = 1\}$,
$$H(p, q) = -\sum_{i=1}^k p_i \ln q_i \qquad \text{(cross-entropy)}.$$
◮ If p = q, then H(p, q) = H(p) (entropy); indeed
$$H(p, q) = -\sum_{i=1}^k p_i \ln\left(p_i \cdot \frac{q_i}{p_i}\right) = \underbrace{H(p)}_{\text{entropy}} + \underbrace{\mathrm{KL}(p, q)}_{\text{KL divergence}}.$$
Since KL ≥ 0, and moreover is 0 iff p = q, this is the cost/entropy of p plus a penalty for differing.
◮ Choose the encoding $\tilde{y} = e_y$ for y ∈ {1, . . . , k}, and $\hat{y} \propto \exp(f(x))$ with $f : \mathbb{R}^d \to \mathbb{R}^k$; then
$$\ell_{\mathrm{ce}}(\tilde{y}, f(x)) = H(\tilde{y}, \hat{y}) = -\sum_{i=1}^k \tilde{y}_i \ln\frac{\exp(f(x)_i)}{\sum_{j=1}^k \exp(f(x)_j)} = -\ln\frac{\exp(f(x)_y)}{\sum_{j=1}^k \exp(f(x)_j)} = -f(x)_y + \ln\sum_{j=1}^k \exp(f(x)_j).$$
(In pytorch, use torch.nn.CrossEntropyLoss()(f(x), y).)
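A quick check that torch.nn.CrossEntropyLoss matches the last expression above (one example, k = 3):

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.5]])  # f(x) for one example
y = torch.tensor([0])                      # the true class index

builtin = torch.nn.CrossEntropyLoss()(logits, y)
manual = -logits[0, y.item()] + torch.logsumexp(logits[0], dim=0)
assert torch.allclose(builtin, manual)
```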

(Lec5-6.) Cross-entropy, classification, and margins

The zero-one loss for classification is
$$\ell_{\mathrm{zo}}(y, f(x)) = \mathbf{1}\left[y \ne \arg\max_j f(x)_j\right].$$
In the multiclass case, can define the margin as
$$f(x)_y - \max_{j \ne y} f(x)_j,$$
interpreted as "the distance by which f is correct". (Can be negative!)

Since $\ln\sum_j \exp(z_j) \approx \max_j z_j$, cross-entropy satisfies
$$\ell_{\mathrm{ce}}(\tilde{y}, f(x)) = -f(x)_y + \ln\sum_j \exp(f(x)_j) \approx -f(x)_y + \max_j f(x)_j,$$
thus minimizing cross-entropy maximizes margins.

(Lec7-8.) The ERM perspective

These lectures will follow an ERM perspective on deep networks:
◮ Pick a model/predictor class (network architecture). (We will spend most of our time on this!)
◮ Pick a loss/risk. (We will almost always use cross-entropy!)
◮ Pick an optimizer. (We will mostly treat this as a black box!)

The goal is low test error, whereas the above only gives low training error; we will briefly discuss this as well.

(Lec7-8.) Basic deep networks

A self-contained expression is
$$x \mapsto \sigma_L\left(W_L\, \sigma_{L-1}\left(\cdots W_2\, \sigma_1(W_1 x + b_1) + b_2 \cdots\right) + b_L\right),$$
with equivalent "functional form" $x \mapsto (f_L \circ \cdots \circ f_1)(x)$, where $f_i(z) = \sigma_i(W_i z + b_i)$.

Some further details (many more to come!):
◮ $(W_i)_{i=1}^L$ with $W_i \in \mathbb{R}^{d_i \times d_{i-1}}$ are the weights, and $(b_i)_{i=1}^L$ are the biases.
◮ $(\sigma_i)_{i=1}^L$ with $\sigma_i : \mathbb{R}^{d_i} \to \mathbb{R}^{d_i}$ are called nonlinearities, or activations, or transfer functions, or link functions.
◮ This is only the basic setup; many things can and will change, please ask many questions!
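This basic form in PyTorch (a sketch; the layer widths here are arbitrary):

```python
import torch

# x -> W3 σ2(W2 σ1(W1 x + b1) + b2) + b3, with ReLU activations
# and an identity last layer (typical when paired with cross-entropy).
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(),   # f1: σ1(W1 x + b1)
    torch.nn.Linear(32, 32), torch.nn.ReLU(),   # f2
    torch.nn.Linear(32, 3),                     # f3: identity activation
)
logits = model(torch.randn(4, 10))              # batch of 4 inputs
```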

(Lec7-8.) Choices of activation

Basic form: $x \mapsto \sigma_L(W_L\, \sigma_{L-1}(\cdots W_2\, \sigma_1(W_1 x + b_1) + b_2 \cdots) + b_L)$.

Choices of activation (univariate, coordinate-wise):
◮ Indicator/step/Heaviside/threshold $z \mapsto \mathbf{1}[z \ge 0]$. This was the original choice (1940s!).
◮ Sigmoid $\sigma_s(z) := \frac{1}{1 + \exp(-z)}$. This was popular roughly 1970s-2005?
◮ Hyperbolic tangent $z \mapsto \tanh(z)$. Similar to sigmoid, used during the same interval.
◮ Rectified Linear Unit (ReLU) $\sigma_r(z) = \max\{0, z\}$. It (and slight variants, e.g., Leaky ReLU, ELU, . . . ) is the dominant choice now; popularized in the "ImageNet/AlexNet" paper (Krizhevsky-Sutskever-Hinton, 2012).
◮ Identity $z \mapsto z$; we'll often use this as the last layer when we use cross-entropy loss.
◮ NON-coordinate-wise choices: we will discuss "softmax" and "pooling" a bit later.

(Lec7-8.) "Architectures" and "models"

Basic form: $x \mapsto \sigma_L(W_L\, \sigma_{L-1}(\cdots W_2\, \sigma_1(W_1 x + b_1) + b_2 \cdots) + b_L)$.

$((W_i, b_i))_{i=1}^L$, the weights and biases, are the parameters. Let's roll them into $\mathcal{W} := ((W_i, b_i))_{i=1}^L$ and consider the network as a two-parameter function $F_{\mathcal{W}}(x) = F(x; \mathcal{W})$.
◮ The model or class of functions is $\{F_{\mathcal{W}} : \text{all possible } \mathcal{W}\}$. F (both arguments unset) is also called an architecture.
◮ When we fit/train/optimize, typically we leave the architecture fixed and vary $\mathcal{W}$ to minimize risk. (More on this in a moment.)

(Lec7-8.) ERM recipe for basic networks

Standard ERM recipe:
◮ First we pick a class of functions/predictors; for deep networks, that means an F(·, ·).
◮ Then we pick a loss function and write down an empirical risk minimization problem; in these lectures we will pick cross-entropy:
$$\arg\min_{\mathcal{W}} \frac{1}{n}\sum_{i=1}^n \ell_{\mathrm{ce}}\left(y_i, F(x_i; \mathcal{W})\right) = \arg\min_{\substack{W_1 \in \mathbb{R}^{d_1 \times d},\ b_1 \in \mathbb{R}^{d_1} \\ \cdots \\ W_L \in \mathbb{R}^{d_L \times d_{L-1}},\ b_L \in \mathbb{R}^{d_L}}} \frac{1}{n}\sum_{i=1}^n \ell_{\mathrm{ce}}\left(y_i, \sigma_L(\cdots \sigma_1(W_1 x_i + b_1) \cdots)\right).$$
◮ Then we pick an optimizer. In this class, we only use gradient descent variants. It is a miracle that this works.

(Lec7-8.) Sometimes, linear just isn't enough

[Figure: decision surfaces of a linear predictor vs. a ReLU network on a 2-D dataset.]

Linear predictor: $x \mapsto w^\top \begin{pmatrix} x \\ 1 \end{pmatrix}$. Some blue points misclassified.

ReLU network: $x \mapsto W_2\, \sigma_r(W_1 x + b_1) + b_2$. 0 misclassifications!

(Lec7-8.) Classical example: XOR

Classical "XOR problem" (Minsky-Papert '69). (Check Wikipedia for "AI Winter".)

Theorem. On this data, any linear classifier (with affine expansion) makes at least one mistake.

Picture proof. Recall: linear classifiers correspond to separating hyperplanes.
◮ If the hyperplane splits the blue points, it's incorrect on one of them.
◮ If it doesn't split the blue points, then one halfspace contains the common midpoint, and is therefore wrong on at least one red point.

(Lec7-8.) One layer was not enough. How about two?

Theorem (Cybenko '89, Hornik-Stinchcombe-White '89, Funahashi '89, Leshno et al. '92, . . . ). Given any continuous function $f : \mathbb{R}^d \to \mathbb{R}$ and any ε > 0, there exist parameters $(W_1, b_1, W_2)$ so that
$$\sup_{x \in [0,1]^d} \left|f(x) - W_2\, \sigma(W_1 x + b_1)\right| \le \varepsilon,$$
as long as σ is "reasonable" (e.g., ReLU or sigmoid or threshold).

Remarks.
◮ Together with the XOR example, this justifies using nonlinearities.
◮ It does not justify (very) deep networks.
◮ It only says these networks exist, not that we can optimize for them!

(Lec7-8.) General graph-based view

Classical graph-based perspective.
◮ The network is a directed acyclic graph; sources are inputs, sinks are outputs, intermediate nodes compute $z \mapsto \sigma(w^\top z + b)$ (with their own (σ, w, b)).
◮ Nodes at distance 1 from the inputs are the first layer, distance 2 is the second layer, and so on.

"Modern" graph-based perspective.
◮ Edges in the graph can be multivariate, meaning vectors or general tensors, and not just scalars.
◮ Edges will often "skip" layers; "layer" is therefore ambiguous.
◮ Diagram conventions differ; e.g., tensorflow graphs include nodes for parameters.

(Lec7-8.) 2-D convolution in deep networks (pictures)

[Animated convolution figures omitted. Taken from https://github.com/vdumoulin/conv_arithmetic by Vincent Dumoulin, Francesco Visin.]

(Lec7-8.) Softmax

Replace the vector input z with z′ ∝ eᶻ, meaning
$$z \mapsto \left(\frac{e^{z_1}}{\sum_j e^{z_j}}, \ldots, \frac{e^{z_k}}{\sum_j e^{z_j}}\right).$$
◮ Converts the input into a probability vector; useful for interpreting the network output as $\Pr[Y = y \mid X = x]$.
◮ We have baked it into our cross-entropy definition; last lectures' networks with cross-entropy training had an implicit softmax.
◮ If some coordinate j of z dominates the others, then the softmax is close to $e_j$.

(Lec7-8.) Max pooling

[Figure: 2-D max pooling, built up over several slides; each output entry is the maximum over a sliding window of the input grid. Taken from https://github.com/vdumoulin/conv_arithmetic by Vincent Dumoulin, Francesco Visin.]

◮ Often used together with convolution layers; shrinks/downsamples the input.
◮ Another variant is average pooling.
◮ Implementation: torch.nn.MaxPool2d.
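Usage sketch (shapes follow torch's (batch, channels, height, width) convention):

```python
import torch

pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 3, 8, 8)   # (batch, channels, height, width)
out = pool(x)                 # each output entry is the max over a 2x2 window
print(out.shape)              # torch.Size([1, 3, 4, 4])
```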

(Lec9-10.) Multivariate network single-example gradients

Define $G_j(\mathcal{W}) = \sigma_j(W_j \cdots \sigma_1(W_1 x + b_1) \cdots)$. The multivariate chain rule tells us
$$\nabla_W F(Wx) = J^\top x^\top,$$
where $J \in \mathbb{R}^{l \times k}$ is the Jacobian matrix of $F : \mathbb{R}^k \to \mathbb{R}^l$ at $Wx$, the matrix of all coordinate-wise derivatives. Applied layer by layer:
$$\frac{dG_L}{dW_L} = J_L^\top\, G_{L-1}(\mathcal{W})^\top, \qquad \frac{dG_L}{db_L} = J_L^\top,$$
$$\vdots$$
$$\frac{dG_L}{dW_j} = \left(J_L W_L J_{L-1} W_{L-1} \cdots J_j\right)^\top G_{j-1}(\mathcal{W})^\top, \qquad \frac{dG_L}{db_j} = \left(J_L W_L J_{L-1} W_{L-1} \cdots J_j\right)^\top,$$
with $J_j$ the Jacobian of $\sigma_j$ at $W_j G_{j-1}(\mathcal{W}) + b_j$. For example, with $\sigma_j$ that is a coordinate-wise $\sigma : \mathbb{R} \to \mathbb{R}$, $J_j$ is
$$\mathrm{diag}\left(\sigma'\left(\left(W_j G_{j-1}(\mathcal{W}) + b_j\right)_1\right), \ldots, \sigma'\left(\left(W_j G_{j-1}(\mathcal{W}) + b_j\right)_{d_j}\right)\right).$$

(Lec9-10.) Initialization

Recall
$$\frac{dG_L}{dW_j} = \left(J_L W_L J_{L-1} W_{L-1} \cdots J_j\right)^\top G_{j-1}(\mathcal{W})^\top.$$
◮ What if we set $\mathcal{W} = 0$? What if σ = σᵣ is a ReLU?
◮ What if we set two rows of $W_j$ (two nodes) identically?
◮ Resolving this issue is called symmetry breaking.
◮ Standard linear/dense layer initializations: $N(0, \frac{2}{d_{\mathrm{in}}})$ ("He et al."), $N(0, \frac{2}{d_{\mathrm{in}} + d_{\mathrm{out}}})$ (Glorot/Xavier), $U(-\frac{1}{\sqrt{d_{\mathrm{in}}}}, \frac{1}{\sqrt{d_{\mathrm{in}}}})$ (torch default). (Convolution layers are adjusted to have similar distributions.)

Random initialization is emerging as a key story in deep networks!

(Lec9-10; SGD slide.) Minibatches

We used the linearity of gradients to write
$$\nabla_w \hat{R}(w) = \frac{1}{n}\sum_{i=1}^n \nabla_w \ell(F(x_i; w), y_i).$$
What happens if we replace $((x_i, y_i))_{i=1}^n$ with a minibatch $((x'_i, y'_i))_{i=1}^b$?
◮ Random minibatch ⟹ the two gradients are equal in expectation.
◮ Most torch layers take minibatch input:
    ◮ torch.nn.Linear has input shape (b, d), output (b, d′).
    ◮ torch.nn.Conv2d has input shape (b, c, h, w), output (b, c′, h′, w′).
◮ This is used heavily outside deep learning as well. It is an easy way to use parallel floating point operations (as on GPUs and CPUs).
◮ Setting the batch size is black magic and depends on many things (prediction problem, GPU characteristics, . . . ).
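One pass of minibatch SGD as a sketch (model, loss_fn, and opt are assumed to be objects like those in the earlier ERM sketch):

```python
import torch

def sgd_epoch(model, loss_fn, opt, X, y, batch_size=32):
    """One pass over the data in shuffled minibatches."""
    perm = torch.randperm(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        opt.zero_grad()
        loss_fn(model(X[idx]), y[idx]).backward()  # minibatch gradient
        opt.step()
```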

(Lec9-10.) Convex sets

A set S is convex if, for every pair of points {x, x′} in S, the line segment between x and x′ is also contained in S. ($\{x, x'\} \subseteq S \implies [x, x'] \subseteq S$.)

[Figure: four example sets, labeled convex / not convex / convex / convex.]

Examples:
◮ All of $\mathbb{R}^d$.
◮ The empty set.
◮ Half-spaces: $\{x \in \mathbb{R}^d : a^\top x \le b\}$.
◮ Intersections of convex sets.
◮ Polyhedra: $\{x \in \mathbb{R}^d : Ax \le b\} = \bigcap_{i=1}^m \{x \in \mathbb{R}^d : a_i^\top x \le b_i\}$.
◮ Convex hulls: $\mathrm{conv}(S) := \{\sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N},\, x_i \in S,\, \alpha_i \ge 0,\, \sum_{i=1}^k \alpha_i = 1\}$. (Infinite convex hulls: intersection of all convex supersets.)

(Lec9-10.) Convex functions from convex sets

The epigraph of a function f is the area above the curve:
$$\mathrm{epi}(f) := \left\{(x, y) \in \mathbb{R}^{d+1} : y \ge f(x)\right\}.$$
A function is convex if its epigraph is convex.

[Figure: a non-convex f and a convex f, with their epigraphs shaded.]

(Lec9-10.) Convex functions (standard definition)

A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for any $x, x' \in \mathbb{R}^d$ and $\alpha \in [0, 1]$,
$$f\left((1 - \alpha)x + \alpha x'\right) \le (1 - \alpha) f(x) + \alpha f(x').$$

[Figure: the chord from (x, f(x)) to (x′, f(x′)) dips below a non-convex curve, and lies above a convex one.]

Examples:
◮ $f(x) = cx$ for any $c > 0$ (on $\mathbb{R}$).
◮ $f(x) = |x|^c$ for any $c \ge 1$ (on $\mathbb{R}$).
◮ $f(x) = b^\top x$ for any $b \in \mathbb{R}^d$.
◮ $f(x) = \|x\|$ for any norm $\|\cdot\|$.
◮ $f(x) = x^\top A x$ for symmetric positive semidefinite $A$.
◮ $f(x) = \ln\left(\sum_{i=1}^d \exp(x_i)\right)$, which approximates $\max_i x_i$.

(Lec9-10.) Convexity of differentiable functions

Differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is differentiable, then f is convex if and only if
$$f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0)$$
for all $x, x_0 \in \mathbb{R}^d$. Note: this implies increasing slopes:
$$\left(\nabla f(x) - \nabla f(y)\right)^\top (x - y) \ge 0.$$

[Figure: the tangent line $a(x) = f(x_0) + f'(x_0)(x - x_0)$ lying below the graph of f.]

Twice-differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is twice-differentiable, then f is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathbb{R}^d$ (i.e., the Hessian, or matrix of second derivatives, is positive semi-definite for all x).

(Lec9-10.) Convex optimization problems

Standard form of a convex optimization problem:
$$\min_{x \in \mathbb{R}^d} f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0 \ \text{ for all } i = 1, \ldots, n,$$
where $f_0, f_1, \ldots, f_n : \mathbb{R}^d \to \mathbb{R}$ are convex functions.

Fact: the feasible set $\mathcal{A} := \{x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, 2, \ldots, n\}$ is a convex set.

(SVMs next week will give us an example.)

(Lec9-10.) Subgradients: Jensen's inequality

If $f : \mathbb{R}^d \to \mathbb{R}$ is convex, then $\mathbb{E} f(X) \ge f(\mathbb{E} X)$.

Proof. Set $y := \mathbb{E} X$, and pick any $s \in \partial f(\mathbb{E} X)$. Then
$$\mathbb{E} f(X) \ge \mathbb{E}\left[f(y) + s^\top (X - y)\right] = f(y) + s^\top\, \mathbb{E}(X - y) = f(y).$$

Note. This inequality comes up often!

(Lec11-12.) Maximum margin linear classifier

The solution ŵ to the following mathematical optimization problem:
$$\min_{w \in \mathbb{R}^d} \frac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad y\, x^\top w \ge 1 \ \text{ for all } (x, y) \in S$$
gives the linear classifier with the largest minimum margin on S, i.e., the maximum margin linear classifier or support vector machine (SVM) classifier.

This is a convex optimization problem; it can be solved in polynomial time. If there is a solution (i.e., S is linearly separable), then the solution is unique. We can solve this in a variety of ways (e.g., projected gradient descent); we will work with the dual.

Note: can also explicitly include the affine expansion, so the decision boundary need not pass through the origin. We'll do our derivations without it.

(Lec11-12.) SVM duality summary

Lagrangian:
$$L(w, \alpha) = \frac{1}{2}\|w\|_2^2 + \sum_{i=1}^n \alpha_i\left(1 - y_i x_i^\top w\right).$$
The primal maximum margin problem was
$$P(w) = \max_{\alpha \ge 0} L(w, \alpha) = \max_{\alpha \ge 0}\left[\frac{1}{2}\|w\|_2^2 + \sum_{i=1}^n \alpha_i\left(1 - y_i x_i^\top w\right)\right].$$
Dual problem:
$$D(\alpha) = \min_{w \in \mathbb{R}^d} L(w, \alpha) = L\left(\sum_{i=1}^n \alpha_i y_i x_i,\ \alpha\right) = \sum_{i=1}^n \alpha_i - \frac{1}{2}\left\|\sum_{i=1}^n \alpha_i y_i x_i\right\|_2^2 = \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j.$$
Given a dual optimum $\hat{\alpha}$:
◮ the corresponding primal optimum is $\hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i$;
◮ strong duality: $P(\hat{w}) = D(\hat{\alpha})$;
◮ $\hat{\alpha}_i > 0$ implies $y_i x_i^\top \hat{w} = 1$, and these $y_i x_i$ are support vectors.

(Lec11-12.) Nonseparable case: bottom line

Unconstrained primal:
$$\min_{w \in \mathbb{R}^d} \frac{1}{2}\|w\|^2 + C\sum_{i=1}^n \left[1 - y_i x_i^\top w\right]_+.$$
Dual:
$$\max_{\substack{\alpha \in \mathbb{R}^n \\ 0 \le \alpha_i \le C}} \left[\sum_{i=1}^n \alpha_i - \frac{1}{2}\left\|\sum_{i=1}^n \alpha_i y_i x_i\right\|^2\right].$$
The dual solution $\hat{\alpha}$ gives the primal solution $\hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i$.

Remarks.
◮ Can take C → ∞ to recover the separable case.
◮ The dual is a constrained convex quadratic (can be solved with projected gradient descent).
◮ Some presentations include a bias in the primal ($x_i^\top w + b$); this introduces a constraint $\sum_{i=1}^n y_i \alpha_i = 0$ in the dual.
◮ Some presentations replace $\frac{1}{2}$ and $C$ with $\frac{\lambda}{2}$ and $\frac{1}{n}$, respectively.

(Lec11-12.) Looking at the dual again

The SVM dual problem only depends on the $x_i$ through inner products $x_i^\top x_j$:
$$\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j.$$
If we use a feature expansion (e.g., quadratic expansion) $x \mapsto \phi(x)$, this becomes
$$\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j \phi(x_i)^\top \phi(x_j).$$
The solution $\hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \phi(x_i)$ is used in the following way:
$$x \mapsto \phi(x)^\top \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i\, \phi(x)^\top \phi(x_i).$$

Key insight:
◮ Training and prediction only use $\phi(x)^\top \phi(x')$, never an isolated $\phi(x)$;
◮ sometimes computing $\phi(x)^\top \phi(x')$ is much easier than computing $\phi(x)$.

(Lec11-12.) Quadratic expansion

◮ $\phi : \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}$, where
$$\phi(x) = \left(1,\ \sqrt{2}x_1, \ldots, \sqrt{2}x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2}x_1 x_2, \ldots, \sqrt{2}x_1 x_d, \ldots, \sqrt{2}x_{d-1} x_d\right).$$
(Don't mind the $\sqrt{2}$'s. . . )
◮ Computing $\phi(x)^\top \phi(x')$ in O(d) time: $\phi(x)^\top \phi(x') = (1 + x^\top x')^2$.
◮ Much better than $d^2$ time.
◮ What if we change the exponent "2"?
◮ What if we replace the additive "1" with 0?
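A quick NumPy check of the identity φ(x)ᵀφ(x′) = (1 + xᵀx′)² (the explicit phi helper is mine, following the ordering above):

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Explicit quadratic feature expansion matching (1 + x.x')^2."""
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate(([1.0], np.sqrt(2) * x, x**2, cross))

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(4), rng.standard_normal(4)
assert np.isclose(phi(x) @ phi(xp), (1 + x @ xp) ** 2)
```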

(Lec11-12; RBF kernel.) Infinite dimensional feature expansion

For any σ > 0, there is an infinite feature expansion $\phi : \mathbb{R}^d \to \mathbb{R}^\infty$ such that
$$\phi(x)^\top \phi(x') = \exp\left(-\frac{\|x - x'\|_2^2}{2\sigma^2}\right),$$
which can be computed in O(d) time. (This is called the Gaussian kernel with bandwidth σ.)
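The kernel as a function (a one-liner; the infinite φ never needs to be materialized):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Gaussian (RBF) kernel with bandwidth sigma; O(d) per evaluation."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))
```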

(Lec11-12.) Kernels

A (positive definite) kernel function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a symmetric function satisfying: for any $x_1, x_2, \ldots, x_n \in \mathcal{X}$, the n × n matrix whose (i, j)-th entry is $K(x_i, x_j)$ is positive semidefinite. (This matrix is called the Gram matrix.)

For any kernel K, there is a feature mapping $\phi : \mathcal{X} \to \mathcal{H}$ such that $\phi(x)^\top \phi(x') = K(x, x')$. $\mathcal{H}$ is a Hilbert space, i.e., a special kind of inner product space, called the Reproducing Kernel Hilbert Space corresponding to K.

(Lec11-12.) Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve
$$\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).$$
The solution $\hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \phi(x_i)$ is used in the following way:
$$x \mapsto \phi(x)^\top \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i K(x, x_i).$$
◮ To represent the classifier, need to keep the support vector examples $(x_i, y_i)$ and the corresponding $\hat{\alpha}_i$'s.
◮ To compute a prediction on x, iterate through the support vector examples and compute $K(x, x_i)$ for each support vector $x_i$ . . .

Very similar to the nearest neighbor classifier: the predictor is represented using (a subset of) the training data.
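Prediction with a kernel SVM, as a sketch (alpha_hat is assumed to come from some dual solver; kernel could be e.g. gaussian_kernel above):

```python
import numpy as np

def kernel_svm_predict(x, support_X, support_y, alpha_hat, kernel):
    """Sign of sum_i alpha_i y_i K(x, x_i) over the support vectors."""
    scores = np.array([kernel(x, xi) for xi in support_X])
    return np.sign(np.sum(alpha_hat * support_y * scores))
```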

(Lec11-12.) Nonlinear support vector machines (again)

[Figure: decision surfaces on the same 2-D dataset for a ReLU network, a quadratic SVM, an RBF SVM (σ = 1), and an RBF SVM (σ = 0.1).]

(Lec13.) The Perceptron Algorithm

Perceptron update (Rosenblatt '58): initialize w := 0, and thereafter
$$w \leftarrow w + \mathbf{1}[y\, w^\top x \le 0]\, yx.$$

Remarks.
◮ Can interpret the algorithm as: either we are correct with a margin ($y w^\top x > 0$) and we do nothing, or we are not, and we update $w \leftarrow w + yx$.
◮ Therefore: if we update, we do so by rotating towards yx.
◮ This makes sense: $(w + yx)^\top (yx) = w^\top(yx) + \|x\|^2$; i.e., we increase $w^\top(yx)$.

Scenario 1. [Figure: current vector ŵₜ comparable to xₜ in length, with yₜ = 1 and $x_t^\top \hat{w}_t \le 0$.] The updated vector ŵₜ₊₁ now correctly classifies (xₜ, yₜ).

Scenario 2. [Figure: current vector ŵₜ much longer than xₜ.] The updated vector ŵₜ₊₁ does not correctly classify (xₜ, yₜ).

It is not obvious that Perceptron will eventually terminate! (We'll return to this.)
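The update as code (a minimal sketch; it loops until a full pass makes no mistakes or an epoch cap is hit, since termination is exactly the question above):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Rosenblatt's perceptron: w <- w + 1[y w.x <= 0] y x."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # wrong (or zero margin): update
                w += yi * xi
                mistakes += 1
        if mistakes == 0:            # a perfect pass: done
            return w
    return w
```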

  • 2. Homework review

Nice homework problems

◮ HW1-P3: SVD practice.
◮ HW2-P1: SVD practice; definitions of the norms $\|A\|_2$ and $\|A\|_F$.
◮ HW3-P1: convexity practice.
◮ HW3-P2: deep network and probability practice.
◮ HW3-P4, HW3-P5: kernel practice.

  • 3. Stuff added 3/12/2019

Belaboring Cauchy-Schwarz

Theorem (Cauchy-Schwarz). $|a^\top b| \le \|a\| \cdot \|b\|$.

Proof (another one. . . ). If either a or b is zero, then $a^\top b = 0 = \|a\| \cdot \|b\|$. If both a and b are nonzero, note for any r > 0 that
$$0 \le \left\|ra - \frac{b}{r}\right\|^2 = r^2\|a\|^2 - 2a^\top b + \frac{\|b\|^2}{r^2},$$
which can be rearranged into
$$a^\top b \le \frac{r^2\|a\|^2}{2} + \frac{\|b\|^2}{2r^2}.$$
Choosing $r = \sqrt{\|b\|/\|a\|}$,
$$a^\top b \le \frac{\|a\|^2}{2} \cdot \frac{\|b\|}{\|a\|} + \frac{\|b\|^2}{2} \cdot \frac{\|a\|}{\|b\|} = \|a\| \cdot \|b\|.$$
Lastly, applying the bound to $(a, -b)$ gives
$$-a^\top b = a^\top(-b) \le \|a\| \cdot \|-b\| = \|a\| \cdot \|b\|,$$
so together it follows that $|a^\top b| \le \|a\| \cdot \|b\|$. ∎

SVD again

  • 1. SV triples: (s, u, v) satisfies $Mv = su$ and $M^\top u = sv$.
  • 2. Thin decomposition SVD: $M = \sum_{i=1}^r s_i u_i v_i^\top$.
  • 3. Full factorization SVD: $M = USV^\top$.
  • 4. "Operational" view of the SVD: for $M \in \mathbb{R}^{n \times d}$,
$$M = \begin{pmatrix} \uparrow & & \uparrow & \uparrow & & \uparrow \\ u_1 & \cdots & u_r & u_{r+1} & \cdots & u_n \\ \downarrow & & \downarrow & \downarrow & & \downarrow \end{pmatrix} \cdot \begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix} \cdot \begin{pmatrix} \uparrow & & \uparrow & \uparrow & & \uparrow \\ v_1 & \cdots & v_r & v_{r+1} & \cdots & v_d \\ \downarrow & & \downarrow & \downarrow & & \downarrow \end{pmatrix}^\top.$$
The first parts of U, V span the column / row space (respectively); the second parts span the left / right nullspaces (respectively).

Personally: I internalize the SVD and use it to reason about matrices. E.g., the "rank-nullity theorem".
