Association Rules: Extracting Patterns from Large Data Sets
Content

- Introduction to Pattern and Rule Analysis
- A-priori Algorithm
- Generalized Rule Induction
- Sequential Patterns
- Other WEKA algorithms
- Outlook
Introduction
Finding unusual patterns and rules from large data sets.

Examples:

- 10% of the customers buy wine and cheese
- If someone buys wine and cheese today, tomorrow they will buy sparkling water
- If alarms A and B occur within 30 seconds, then alarm C occurs within 60 seconds with probability 0.5
- If someone visits derstandard.at, there is a 60% chance that the person will visit faz.net as well
- If players X and Y were playing together as strikers, the team won 90% of the games

Application areas: unlimited. Question: how can we find such patterns?
General Considerations

Rule representation:
- Left-hand side proposition (antecedent)
- Right-hand side proposition (consequent)

Probabilistic rule:
- The consequent is true with probability p given that the antecedent is true (a conditional probability)

Scale level:
- Especially suited for categorical data; continuous data requires setting thresholds

Advantages:
- Easy to compute
- Easy to understand
Example
[Table: 10 market baskets (Basket IDs 1-10) with binary indicators (1 = purchased) for Milk, Bread, Water, Coffee, and Kleenex]
Example of a market basket: the aim is to find itemsets in order to predict a consequent accurately (i.e. with high confidence) from one or more antecedents.
Algorithms: A-Priori, Tertius and GRI
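Before looking at the individual algorithms, the two measures they all build on can be computed directly. A minimal Python sketch, using a small hypothetical set of baskets (not the table above):

```python
# Hypothetical market baskets, each a set of purchased items.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "water"},
    {"bread", "coffee"},
    {"milk", "water"},
    {"milk", "bread", "coffee"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Conditional frequency of the consequent given the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

print(support({"milk", "bread"}, baskets))       # joint support of the rule
print(confidence({"milk"}, {"bread"}, baskets))  # rule confidence
```

A rule milk => bread would here be supported by 3 of 5 baskets and hold in 3 of the 4 baskets containing milk.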
Mathematical Notations

General notations:
- $p$ variables $X_1, X_2, \ldots, X_p$
- $N$ persons
- Itemset ($k < p$): $\theta^{(k)} = (X^{(1)} = 1 \wedge \ldots \wedge X^{(k)} = 1)$
- Rule: $\theta^{(k)} \Rightarrow \varphi$, i.e. $(X^{(1)} = 1 \wedge \ldots \wedge X^{(k)} = 1) \Rightarrow (X^{(k+1)} = 1)$

Identification of frequent itemsets:
- Itemset frequency: $fr(\theta^{(k)})$
- Support: $s = fr(\theta^{(k)} \wedge \varphi)$
- Accuracy (confidence): $c(\theta^{(k)} \Rightarrow \varphi) = \dfrac{fr(\theta^{(k)} \wedge \varphi)}{fr(\theta^{(k)})} = p(\varphi = 1 \mid \theta^{(k)})$
A-priori Algorithm*

Identification of frequent itemsets:
- Start with one variable, i.e. $\theta^{(1)}$, then $\theta^{(2)}, \theta^{(3)}, \ldots$
- Compute the support; keep itemsets with $s > s_{\min}$
- Result: list of frequent itemsets

Rule generation:
- Split each itemset into antecedent A and consequent C
- Compute an evaluation measure

Evaluation measures:
- Prior confidence: $c_{prior} = \dfrac{s_C}{N}$
- Posterior confidence (rule confidence): $c_{post} = \dfrac{s_{A \wedge C}}{s_A}$

* Agrawal & Srikant, 1994
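The level-wise search for frequent itemsets can be sketched in a few lines of Python. This is a simplified in-memory illustration of the a-priori property (a k-itemset can only be frequent if all of its (k-1)-subsets are frequent), not the WEKA implementation; the basket data is hypothetical:

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Level-wise frequent-itemset search exploiting the a-priori property."""
    n = len(baskets)
    frequent = {}  # itemset -> support
    # Level 1: candidate itemsets of size one
    level = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    k = 1
    while level:
        # Count support of each candidate; keep those above the threshold
        survivors = {}
        for cand in level:
            s = sum(cand <= b for b in baskets) / n
            if s >= min_support:
                survivors[cand] = s
        frequent.update(survivors)
        # Join step: combine surviving k-itemsets into (k+1)-candidates
        candidates = {a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-subset
        level = [c for c in candidates
                 if all(frozenset(sub) in survivors for sub in combinations(c, k))]
        k += 1
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "water"},
           {"bread", "coffee"}, {"milk", "water"},
           {"milk", "bread", "coffee"}]
print(apriori(baskets, 0.6))
```

With $s_{\min} = 0.6$, only {milk}, {bread} and {milk, bread} survive; rules are then generated by splitting each frequent itemset into antecedent and consequent.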
Further Algorithms in WEKA

Predictive Apriori:
- Rules sorted by expected predictive accuracy

Tertius:
- Confirmation values
- TP-/FP-rate
- Rules with and/or concatenations
Generalized Rule Induction (GRI)*

Quantitative measure of interestingness:
- Ranks competing rules according to this measure
- Information-theoretic entropy calculation

Rule generation:
- Basically works like the a-priori algorithm
- Compute the J-statistic for each rule, and the specialized $J_s$ obtained by adding more antecedents
The J-measure

- Entropy (information measure): $H = -p \log_2 p - (1-p) \log_2 (1-p) \in [0;1]$
- J-measure: $J(x \mid y) = p(y) \left[ p(x \mid y) \log_2\!\left(\dfrac{p(x \mid y)}{p(x)}\right) + (1 - p(x \mid y)) \log_2\!\left(\dfrac{1 - p(x \mid y)}{1 - p(x)}\right) \right]$
- $J_s$-measure: $J_s = p(y) \cdot \max\left[ p(x \mid y) \log_2\!\left(\dfrac{1}{p(x)}\right),\; (1 - p(x \mid y)) \log_2\!\left(\dfrac{1}{1 - p(x)}\right) \right]$

* Smyth & Goodman, 1992
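Given the definitions above, both measures are straightforward to evaluate. A small Python sketch following the slide's formulas (the probability values in the test are illustrative, not taken from the slides):

```python
import math

def j_measure(p_y, p_x, p_x_given_y):
    """J(x|y) for a rule y => x: relative-entropy gain about x from
    observing y, weighted by how often y occurs."""
    def term(q, p):
        # q * log2(q / p), with the convention 0 * log 0 = 0
        return 0.0 if q == 0.0 else q * math.log2(q / p)
    return p_y * (term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x))

def j_s(p_y, p_x, p_x_given_y):
    """Specialized J_s used when ranking rules extended by more antecedents."""
    return p_y * max(p_x_given_y * math.log2(1.0 / p_x),
                     (1.0 - p_x_given_y) * math.log2(1.0 / (1.0 - p_x)))
```

When $p(x \mid y) = p(x)$, the rule carries no information and the J-measure is zero, as expected.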
Sequential Patterns*

Observations over time:
- Itemsets within each time point
- Customer performs transactions
- Sequence notation: X > Y (i.e. Y occurs after X)

Rule generation:
- Compute s by successively adding time points
- CARMA algorithm, otherwise as before
Customer   Time 1     Time 2   Time 3   Time 4
1          Cheese     Wine     Beer     -
2          Wine       Beer     Cheese   -
3          Bread      Wine     Cheese   -
4          Crackers   Wine     Beer     -
5          Beer       Cheese   Bread    Cheese
6          Crackers   Bread    -        -

* Agrawal & Srikant, 1995
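To make the X > Y notation concrete, the support of a sequential pattern is the fraction of customers whose transaction sequence contains the pattern's items in order. A Python sketch using one plausible reading of the customer table above (the extracted layout is ambiguous):

```python
def occurs_in_order(pattern, sequence):
    """True if the pattern's items appear in the given order across the
    sequence of per-time-point itemsets (X > Y: Y occurs after X)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and wanted not in sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next item must occur at a strictly later time point
    return True

# One list of time-point itemsets per customer (reading of the table above)
sequences = [
    [{"Cheese"}, {"Wine"}, {"Beer"}],
    [{"Wine"}, {"Beer"}, {"Cheese"}],
    [{"Bread"}, {"Wine"}, {"Cheese"}],
    [{"Crackers"}, {"Wine"}, {"Beer"}],
    [{"Beer"}, {"Cheese"}, {"Bread"}, {"Cheese"}],
    [{"Crackers"}, {"Bread"}],
]

# Support of the pattern "Wine > Beer"
s = sum(occurs_in_order(["Wine", "Beer"], seq) for seq in sequences) / len(sequences)
print(s)
```

Under this reading, Wine > Beer holds for customers 1, 2 and 4, giving a support of 3/6.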
Outlook

Decision Trees:
- CART (Breiman et al., 1984)
- C5.0 (Quinlan, 1996)