Introduction to Big Data and Machine Learning Classification



slide-1
SLIDE 1

Introduction to Big Data and Machine Learning Classification

  • Dr. Mihail

September 19, 2019

(Dr. Mihail) Intro Big Data September 19, 2019 1 / 3

slide-2
SLIDE 2

Linear models for classification

Goal

  • Goal of classification: take an input vector x and assign it to one of K discrete classes $C_k$, where k = 1, ..., K.
  • The input space is therefore divided into decision regions whose boundaries are called "decision boundaries" or "decision surfaces".
  • Here, we consider linear models, in which the decision boundaries are linear functions of the input vector x and hence are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space.
  • Data sets whose classes can be separated exactly by linear decision surfaces are said to be "linearly separable".
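As a minimal sketch of a linear decision boundary for K = 2 classes: the function name `linear_classify` and the weights `w` and bias `b` are made-up values for illustration, not parameters taken from the slides.

```python
# Hypothetical 2-D example: w and b are assumed values chosen for illustration.

def linear_classify(x, w, b):
    """Return 1 if x lies on the non-negative side of the hyperplane w.x + b = 0, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else 0

w = [1.0, -1.0]  # normal vector of the decision boundary (assumed)
b = 0.0          # offset of the boundary (assumed)

# In D = 2 dimensions the boundary x1 - x2 = 0 is a (D - 1) = 1-dimensional line.
print(linear_classify([2.0, 1.0], w, b))  # point with x1 > x2
print(linear_classify([1.0, 3.0], w, b))  # point with x1 < x2
```

The two decision regions here are the half-planes on either side of the line x1 = x2.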


slide-3
SLIDE 3

Probabilistic models

  • For probabilistic models, the most convenient representation in the case of two-class problems is the binary one, in which there is a single target variable t ∈ {0, 1}.
  • For K > 2 classes, it is convenient to use a 1-of-K coding scheme, in which t is a vector of length K such that, if the class is $C_j$, all elements $t_k$ of t are zero except $t_j$, which equals 1.
  • For instance, if we have 5 classes, then a pattern from class 2 would be given by the target vector $t = (0, 1, 0, 0, 0)^T$.
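The 1-of-K coding scheme can be sketched in a few lines. The helper name `one_of_k` is hypothetical, and class indices are assumed to be 1-based, as on the slide.

```python
def one_of_k(class_index, num_classes):
    """Encode a 1-based class index j as a length-K target vector with t_j = 1."""
    if not 1 <= class_index <= num_classes:
        raise ValueError("class_index must be in 1..num_classes")
    return [1 if k == class_index else 0 for k in range(1, num_classes + 1)]

# A pattern from class 2 out of 5 classes, as in the slide's example.
print(one_of_k(2, 5))  # [0, 1, 0, 0, 0]
```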


slide-4
SLIDE 4

Using Bayes Theorem

Model the posterior class probability:

$$p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$$

  • Notice that the denominator is not a function of $C_k$.
  • Prior class distribution: $p(C_k)$
  • Class-conditional density: $p(x \mid C_k)$
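A small numeric sketch of Bayes' theorem for two classes follows. The priors and likelihood values are hypothetical, and the denominator is computed as p(x) = Σ_k p(x|C_k) p(C_k), the standard expansion, which the slide does not write out.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem: p(C_k|x) = p(x|C_k) p(C_k) / p(x)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(x), the same value for every class
    return {c: joint[c] / evidence for c in joint}

priors = {"C1": 0.5, "C2": 0.5}       # hypothetical p(C_k)
likelihoods = {"C1": 0.2, "C2": 0.6}  # hypothetical p(x|C_k) for one fixed x

post = posterior(priors, likelihoods)
print(post)
```

Because p(x) is shared by all classes, it only rescales the scores so that the posteriors sum to one.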


slide-5
SLIDE 5

Discriminative models

Discriminative model

  • Models the posterior P(c|x) directly.
  • To train a discriminative classifier, the training examples of all classes must be used jointly to build a single classifier.
  • A probabilistic classifier outputs K probabilities for the K class labels, while a non-probabilistic classifier produces a single label.


slide-6
SLIDE 6

Discriminative classifier


slide-7
SLIDE 7

Generative classifier

  • Models the class-conditional distribution P(x|c), for $c = c_1, \ldots, c_K$ and $x = (x_1, \ldots, x_n)$.
  • K probabilistic models have to be trained independently.
  • Each model is trained only on the examples that carry the same label.
  • For a given input, the K models output K probabilities.
  • "Generative" means that the model can produce data by sampling from the learned distribution.
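The points above can be sketched with one categorical model per class. Everything here is illustrative: the single-feature data, the class names, and the helpers `fit_class_model` and `sample` are assumptions, not material from the slides.

```python
import random
from collections import Counter

def fit_class_model(values):
    """Estimate P(x|c) for one class from that class's examples only."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}

def sample(model, rng, size):
    """Draw new feature values from the learned distribution P(x|c)."""
    values = list(model)
    weights = [model[v] for v in values]
    return rng.choices(values, weights=weights, k=size)

# Hypothetical single-feature training data, grouped by label.
data_by_class = {
    "c1": ["a", "a", "b", "a"],
    "c2": ["b", "b", "b", "a", "b"],
}

# K independent models, each fitted on the examples of one label.
models = {c: fit_class_model(xs) for c, xs in data_by_class.items()}

rng = random.Random(0)
generated = sample(models["c2"], rng, 5)
print(models["c1"])
print(generated)
```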


slide-8
SLIDE 8

Generative classifier


slide-9
SLIDE 9

Maximum a-posteriori (MAP)

  • For an input x, find the largest of the K probabilities output by a discriminative probabilistic classifier: $P(c_1 \mid x), \ldots, P(c_K \mid x)$.
  • Assign x to label $c^*$ if $P(c^* \mid x)$ is the largest.
  • Generative classification with the MAP rule:

$$P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{P(x)} \propto P(x \mid c_i)\, P(c_i) \qquad (1)$$
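A short sketch of the MAP rule in its generative form, using hypothetical numbers chosen so that the prior changes the decision. The function name `map_label` is an assumption for illustration.

```python
def map_label(priors, likelihoods):
    """Return the class maximising P(x|c) P(c), together with all scores.

    P(x) is omitted because it is identical for every class and so
    cannot change which score is largest.
    """
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    return max(scores, key=scores.get), scores

priors = {"c1": 0.2, "c2": 0.8}       # hypothetical P(c)
likelihoods = {"c1": 0.6, "c2": 0.3}  # hypothetical P(x|c) for one fixed x

label, scores = map_label(priors, likelihoods)
print(label, scores)
```

With these numbers the likelihood alone favours c1, but the larger prior of c2 gives it the higher score.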


slide-12
SLIDE 12

Naïve Bayes

Bayes classification

$$P(c \mid x) \propto P(x \mid c)\, P(c) = P(x_1, \ldots, x_n \mid c)\, P(c) \qquad (2)$$

for $c = c_1, \ldots, c_K$

Problem

The joint probability $P(x_1, \ldots, x_n \mid c)$ is not feasible to learn directly, because the number of feature-value combinations grows exponentially with n.

Solution

Assume all input features are class conditionally independent!


slide-13
SLIDE 13

Bayes model

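$$\begin{aligned}
P(x_1, x_2, \ldots, x_n \mid c) &= P(x_1 \mid x_2, \ldots, x_n, c)\, P(x_2, \ldots, x_n \mid c) \\
&= P(x_1 \mid c)\, P(x_2, \ldots, x_n \mid c) \\
&= P(x_1 \mid c)\, P(x_2 \mid c) \cdots P(x_n \mid c)
\end{aligned} \qquad (3)$$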
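To make the benefit of the factorization in (3) concrete, here is a rough parameter count. It assumes, purely for illustration, that all n features are binary, which the slides do not state.

```latex
% Free parameters needed per class c, assuming n binary features (illustrative assumption)
\begin{align*}
\text{full joint } P(x_1, \ldots, x_n \mid c) &: \quad 2^n - 1 \\
\text{factorized } \prod_{j=1}^{n} P(x_j \mid c) &: \quad n \\
\text{e.g. } n = 10 &: \quad 2^{10} - 1 = 1023 \ \text{versus} \ 10
\end{align*}
```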


slide-14
SLIDE 14

Algorithm

Discrete-valued features

Learning phase: given a training set S with F features and K classes,

  • For each target value $c_i$ ($c_i = c_1, \ldots, c_K$):
      • $\hat{P}(c_i)$ ← estimate $P(c_i)$ with the examples in S
      • For every feature value $x_{jk}$ of each feature $x_j$ ($j = 1, \ldots, F$; $k = 1, \ldots, N$):
        $\hat{P}(x_j = x_{jk} \mid c_i)$ ← estimate $P(x_{jk} \mid c_i)$ with the examples in S
  • Output: F × K conditional probabilistic (generative) models.

Test phase: given an unknown instance $x' = (a'_1, \ldots, a'_n)$, assign label $c^*$ to $x'$ if

$$[\hat{P}(a'_1 \mid c^*) \cdots \hat{P}(a'_n \mid c^*)]\, \hat{P}(c^*) > [\hat{P}(a'_1 \mid c_i) \cdots \hat{P}(a'_n \mid c_i)]\, \hat{P}(c_i) \qquad (4)$$

for $c_i \neq c^*$, $c_i = c_1, \ldots, c_K$.
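The learning and test phases can be sketched as follows. This is one possible reading, assuming that "estimate" means relative-frequency counting with no smoothing. The toy data set and the function names `learn` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

def learn(samples, labels):
    """Learning phase: estimate P(c) and P(x_j = v | c) by relative frequencies."""
    class_counts = Counter(labels)
    total = len(labels)
    priors = {c: n / total for c, n in class_counts.items()}
    # value_counts[(j, c)][v] = number of class-c examples whose feature j equals v
    value_counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[(j, c)][v] += 1
    cond = {
        key: {v: n / class_counts[key[1]] for v, n in counts.items()}
        for key, counts in value_counts.items()
    }
    return priors, cond

def predict(x, priors, cond):
    """Test phase: return the label with the largest product in rule (4)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            # A value never seen with class c gets estimated probability 0.
            score *= cond[(j, c)].get(v, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: two features (colour, size) and two classes.
samples = [("red", "big"), ("red", "small"), ("blue", "big"),
           ("blue", "small"), ("blue", "small"), ("red", "big")]
labels = ["pos", "pos", "neg", "neg", "neg", "pos"]

priors, cond = learn(samples, labels)
label, scores = predict(("red", "small"), priors, cond)
print(label, scores)
```

In this sketch a feature value that never occurs with a class drives that class's whole product to zero. Smoothing the counts is a common remedy, but it is not part of the algorithm as stated on the slide.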


slide-15
SLIDE 15

Example


slide-16
SLIDE 16

Learning phase


slide-17
SLIDE 17

Test phase

  • Given a new instance, predict its label:
    x′ = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)
  • Look up the probability tables built in the learning phase.
  • Make the decision with the MAP rule.
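The slide's lookup tables did not survive in this transcript. The sketch below therefore assumes that the example uses the widely circulated 14-row "play tennis" data set, whose attribute names match the instance above. That is an assumption, not something the source confirms. The function name `map_scores` is hypothetical, and probabilities are estimated by plain counting.

```python
from collections import Counter
from fractions import Fraction

# ASSUMPTION: the common 14-row "play tennis" data set (the slide's own tables are missing).
# Columns: Outlook, Temperature, Humidity, Wind, label.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def map_scores(instance):
    """Return P(a_1|c) ... P(a_n|c) P(c) for each class as exact fractions."""
    class_counts = Counter(row[-1] for row in ROWS)
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, len(ROWS))  # estimated prior P(c)
        for j, value in enumerate(instance):
            matches = sum(1 for row in ROWS if row[-1] == c and row[j] == value)
            score *= Fraction(matches, n_c)  # estimated P(a_j | c)
        scores[c] = score
    return scores

x_new = ("Sunny", "Cool", "High", "Strong")
scores = map_scores(x_new)
for c, s in scores.items():
    print(c, s, float(s))
print("MAP decision:", max(scores, key=scores.get))
```

Under that assumed data set the score for "No" is the larger of the two, so the MAP rule labels the new instance "No".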
