Reductions for Frequency-Based Data Mining Problems
Stefan Neumann & Pauli Miettinen
Maximal Frequent Patterns
A pattern is a subset of the data entities (itemset, subgraph, subsequence, …)
A pattern is frequent if it appears su
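To make the notion of a maximal frequent pattern concrete for itemsets, here is a minimal brute-force sketch in Python. It assumes the usual definitions: a pattern is frequent if its support (the number of transactions containing it) is at least a threshold, and maximal if no proper superset is also frequent. The function names, the toy transactions and the threshold are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; exponential in the number of distinct items."""
    items = sorted(set().union(*transactions))
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_support
    ]
    # Keep only the frequent itemsets that have no frequent proper superset.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical toy data: three transactions over the items A, B, C, D.
transactions = [frozenset("ABC"), frozenset("ACD"), frozenset("ABD")]
result = maximal_frequent_itemsets(transactions, min_support=2)
```

With this toy input and a threshold of 2, the maximal frequent itemsets are {A, B}, {A, C} and {A, D}. The sketch checks every subset of the items, so it is only usable on tiny examples.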
⇒ takes exponential time
(enumeration complexity, Johnson et al. 1988)
(counting complexity, Valiant 1979)
Problems in the reduction diagram: MaxFS(BDG3), MaxFS(BTW3), MaxFS(G), MaxFS(PLN), MaxFS(T), MaxFS(DAG), MaxFS(DirG), MaxFIS, MaxSQS
Legend: uniquely labelled undirected graphs; with degree ≤ 3; with treewidth ≤ 3; planar undirected graphs; directed acyclic graphs; directed graphs; sequences with no repetition; itemsets
A → B = A can be reduced to B
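Example (figure): graph transactions over the vertices A, B, C, D, shown as a table with one row per transaction id (tid) and one column per edge
Columns: tid, A–B, A–D, B–C, B–D, C–D
Rows: transactions 1, 2 and 3, each containing three of the five edges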
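The tid-by-edge table suggests treating each edge of a uniquely labelled graph as an item. A minimal Python sketch of that encoding follows. The three graphs are hypothetical (they are not the graphs from the figure), the helper names are illustrative, and the sketch shows only the encoding, not any constraint on which itemsets correspond to feasible subgraph patterns.

```python
# Edge columns as they appear in the table header.
EDGE_COLUMNS = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]


def graph_to_itemset(edges):
    """Encode an undirected graph with unique vertex labels as a set of edge items.

    Each edge is normalised to a sorted tuple so that (D, A) and (A, D)
    map to the same item.
    """
    return frozenset(tuple(sorted(e)) for e in edges)


def to_binary_row(itemset, columns=EDGE_COLUMNS):
    """One 0/1 entry per edge column, in column order."""
    return [1 if c in itemset else 0 for c in columns]


# Hypothetical graph transactions, keyed by transaction id.
graphs = {
    1: [("A", "B"), ("B", "C"), ("C", "D")],
    2: [("D", "A"), ("B", "D"), ("C", "D")],
    3: [("A", "B"), ("B", "D"), ("B", "C")],
}

table = {tid: to_binary_row(graph_to_itemset(g)) for tid, g in graphs.items()}
```

Because vertex labels are unique, an edge is identified by its two endpoint labels, so two graphs share an edge exactly when their itemsets share the corresponding item.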
constrain the feasible patterns
itemsets, and
the feasible maximum itemsets
whether there are any more feasible patterns is NP-hard
enumeration can be done in polynomial time
well
standard level-wise algorithms
repetitions
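For readers unfamiliar with the term, a level-wise algorithm in the Apriori style grows frequent patterns one size at a time and prunes any candidate that has an infrequent sub-pattern. The following Python sketch illustrates the idea for itemsets only. It is a generic illustration under that assumption, not the specific algorithm referred to in the slides, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Return all frequent itemsets, found level by level (size 1, 2, ...)."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        current = set(level)
        k = len(level[0]) + 1
        # Candidate generation: unions of two frequent sets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Support counting is done only for the surviving candidates.
        level = sorted((c for c in candidates if support(c) >= min_support),
                       key=sorted)
    return frequent


# Hypothetical toy data.
transactions = [frozenset("ABC"), frozenset("ACD"), frozenset("ABD")]
found = levelwise_frequent_itemsets(transactions, min_support=2)
```

On the toy data with a threshold of 2, the search stops after level 2: every size-3 candidate contains an infrequent pair and is pruned before its support is counted.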