A General Model for OLAP of Complex Data (Jian Pei, State University of New York at Buffalo)



SLIDE 1

A General Model for OLAP of Complex Data

Jian Pei

State University of New York at Buffalo, USA

http://www.cse.buffalo.edu/faculty/jianpei/

SLIDE 2

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 2

Outline

  • Motivation
  • GOLAP – a general OLAP model
  • Applying GOLAP on complex data
  • Conclusions
SLIDE 3

OLAP on Relational Data

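Base table (dimensions: Store, Product, Season; measure: Sales)

Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9

Aggregate cells (s = Spring, f = Fall, * = all values of that dimension)

Base cells:      (S1,P1,s):6  (S1,P2,s):12  (S2,P1,f):9
One dimension *: (S1,*,s):9  (S1,P1,*):6  (*,P1,s):6  (S1,P2,*):12  (*,P2,s):12  (S2,*,f):9  (S2,P1,*):9  (*,P1,f):9
Two dimensions *: (S1,*,*):9  (*,*,s):9  (*,P1,*):7.5  (*,P2,*):12  (*,*,f):9  (S2,*,*):9
Apex:            (*,*,*):9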

Operations:

  • Roll-up
  • Drill-down
  • Slice, dice, pivot (rotate)
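
The aggregate cells on this slide can be reproduced with a short sketch. It assumes the measure is aggregated with AVG, which is an inference: every value listed on the slide matches the average of the covered sales figures, for example (*,P1,*) = (6 + 9) / 2 = 7.5. The name `cube_avg` and the tuple layout are illustrative choices, not part of the talk.

```python
from collections import defaultdict
from itertools import product

# Base table from the slide: (store, product, season) -> sales
BASE_ROWS = [
    (("S1", "P1", "s"), 6),
    (("S1", "P2", "s"), 12),
    (("S2", "P1", "f"), 9),
]


def cube_avg(rows):
    """Return {cell: AVG(sales)} for every non-empty aggregate cell.

    A cell is a tuple in which '*' generalizes a dimension.
    AVG is an assumption inferred from the values on the slide.
    """
    groups = defaultdict(list)
    for dims, sales in rows:
        # Each base tuple contributes to 2^3 cells: keep or star each dimension.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple("*" if starred else d for d, starred in zip(dims, mask))
            groups[cell].append(sales)
    return {cell: sum(values) / len(values) for cell, values in groups.items()}


cube = cube_avg(BASE_ROWS)
print(cube[("*", "P1", "*")])  # roll-up of product P1 over store and season
print(len(cube))               # number of non-empty cells in the cube
```

Rolling up corresponds to replacing a dimension value with `*`, and drilling down to the reverse; the sketch materializes all such cells at once.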
SLIDE 4

Why Is OLAP Desirable?

  • Multi-level, multi-dimensional summarization

– Identify multi-level, multi-dimensional trends, changes and exceptions

  • Can we conduct OLAP on complex data?

– Data types: strings, time series, sequences, XML documents, …
– “What are the major patterns among the gene expressions that are similar to the given new sample?”

SLIDE 5

Gene Expression Matrix

w11 w12 w13
w21 w22 w23
w31 w32 w33

Rows: genes; columns: samples/time points
SLIDE 6

Can We OLAP Gene Expression Data?

  • Gene expression data – matrices

– Oh, it can be treated as a relational table! ☺

  • Syntax problem: what should be the measure?

– SUM, MAX, MIN, AVG? They do not make sense!
– What we want are the patterns

  • Semantic problem: what should be the OLAP operations?

– What does it mean to generalize (roll up) a sample/gene?

SLIDE 7

Good News, We Are Not Far Away

  • Two major issues in defining an OLAP model

– How to partition the data into summarization units at various levels?
– How to summarize the data?

  • The summarization units for OLAP should form a nice hierarchical structure

– What about a lattice?
– It’s nice

SLIDE 8

GOLAP – A General OLAP Model

  • Base database – a set of objects
  • Grouping function

– Maps a set of query objects in the base database to the smallest summarization unit covering the query set
– Containment: a summarization unit is still in the base database
– Monotonicity: Q1 ⊆ Q2 ⇒ g(Q1) ⊆ g(Q2)
– Closure: a summarization unit is self-closed, i.e., g(g(Q)) = g(Q)
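
As a minimal sketch of these three properties, the following code builds a grouping function from a small, hypothetical family of classes and checks containment, monotonicity and closure by brute force. The objects, the family `CLASSES`, and the helper names are assumptions made for illustration; the talk does not prescribe how g is implemented.

```python
from itertools import combinations

BASE = frozenset({"a", "b", "c", "d"})

# Hypothetical family of classes: it contains BASE, and every non-empty
# intersection of two members is again a member.
CLASSES = [
    frozenset({"a"}),
    frozenset({"a", "b"}),
    frozenset({"c", "d"}),
    BASE,
]


def g(query):
    """Grouping function: the smallest class covering a non-empty query set."""
    if not query:
        raise ValueError("query set must be non-empty")
    covering = [c for c in CLASSES if query <= c]
    return frozenset.intersection(*covering)


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def check_properties():
    """Brute-force check of containment, closure and monotonicity."""
    subsets = list(nonempty_subsets(BASE))
    for q in subsets:
        assert g(q) <= BASE            # containment
        assert g(g(q)) == g(q)         # closure
    for q1 in subsets:
        for q2 in subsets:
            if q1 <= q2:
                assert g(q1) <= g(q2)  # monotonicity
    return True


print(sorted(g(frozenset({"b"}))))
print(check_properties())
```

Taking the intersection of all covering classes yields the smallest one only because the family is closed under non-empty intersection; without that assumption the result might not be a class.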

SLIDE 9

Grouping Function and Class

  • Class: a subset of objects S s.t. g(S) = S

[Figure: a class; a larger class; the whole base database, which is itself a class]

SLIDE 10

Grouping Function – Lattice

  • The classes generated by a grouping function form a lattice
  • Good news: containment, monotonicity and closure are sufficient to get a nice hierarchical structure!
  • Member function: from a class to the set of its members

SLIDE 11

Summarization Function

  • A mapping from a set of objects to a summary

– A set of sequences → the sequential patterns
– A set of time series → the dominant pattern
– A set of XML trees → the frequent subtrees

SLIDE 12

OLAP Operations

  • Given

– A grouping function
– A summarization function

  • OLAP operations

– Summarize: return the summary of the smallest class covering the query set
– Roll up: return the summary of the smallest class covering the query set and the current class
– Drill down: return the summary of the smallest class covering the current class except for the query set
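
The three operations can be sketched directly from their definitions. Everything concrete below is an assumption for illustration: the four toy time series, the family `CLASSES`, and the use of an element-wise mean as the summarization function f stand in for whatever grouping and summarization functions an application supplies.

```python
# Toy gene expression time series (hypothetical values).
SERIES = {
    "g1": (1.0, 2.0, 3.0),
    "g2": (1.0, 2.0, 5.0),
    "g3": (4.0, 0.0, 1.0),
    "g4": (6.0, 0.0, 1.0),
}
BASE = frozenset(SERIES)

# Hypothetical classes, closed under non-empty intersection.
CLASSES = [
    frozenset({"g1"}),
    frozenset({"g1", "g2"}),
    frozenset({"g3", "g4"}),
    BASE,
]


def g(query):
    """Grouping function: smallest class covering a non-empty query set."""
    if not query:
        raise ValueError("query set must be non-empty")
    covering = [c for c in CLASSES if query <= c]
    return frozenset.intersection(*covering)


def f(members):
    """Summarization function (assumed): element-wise mean of the series."""
    rows = [SERIES[m] for m in sorted(members)]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def summarize(query):
    """Summary of the smallest class covering the query set."""
    return f(g(query))


def roll_up(current, query):
    """Summary of the smallest class covering the current class and the query set."""
    return f(g(current | query))


def drill_down(current, query):
    """Summary of the smallest class covering the current class minus the query set."""
    return f(g(current - query))


current = g(frozenset({"g2"}))
print(summarize(frozenset({"g2"})))
print(roll_up(current, frozenset({"g3"})))
print(drill_down(current, frozenset({"g2"})))
```

Because all three operations go through g, each one always lands on a class, so the result can be looked up in a precomputed warehouse instead of being recomputed.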

SLIDE 13

GOLAP Model and Data Warehouse

  • GOLAP model (g, f)

– g – grouping function
– f – summarization function

  • G-warehouse {(c, f(c))}

– c is a class

  • If (g1, f1) and (g2, f2) are two GOLAP models, then ((g1, g2), (f1, f2)) is also a GOLAP model
  • GOLAP on relational data is consistent with the traditional OLAP model
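
A G-warehouse can be pictured as a precomputed map from each class c to its summary f(c). The sketch below assumes a toy database of strings and uses the longest common prefix as a stand-in summarization function; the data, the classes, and the helper names `build_g_warehouse` and `answer` are hypothetical.

```python
import os.path

BASE = frozenset({"ACGT", "ACGA", "ACTT", "GGTA"})

# Hypothetical classes, closed under non-empty intersection.
CLASSES = [
    frozenset({"ACGT", "ACGA"}),
    frozenset({"ACGT", "ACGA", "ACTT"}),
    BASE,
]


def g(query):
    """Grouping function: smallest class covering a non-empty query set."""
    if not query:
        raise ValueError("query set must be non-empty")
    covering = [c for c in CLASSES if query <= c]
    return frozenset.intersection(*covering)


def f(members):
    """Summarization function (assumed): longest common prefix of the strings."""
    return os.path.commonprefix(sorted(members))


def build_g_warehouse(classes, summarize):
    """Materialize the G-warehouse {(c, f(c))} as a dict keyed by class."""
    return {c: summarize(c) for c in classes}


WAREHOUSE = build_g_warehouse(CLASSES, f)


def answer(query):
    """Answer a summarize query by lookup instead of recomputing f."""
    return WAREHOUSE[g(query)]


print(answer(frozenset({"ACGA"})))
print(answer(frozenset({"ACTT"})))
```

Materializing every class is the simplest design; it trades storage for query time, and a real system would likely need a more selective strategy.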

SLIDE 14

Applying GOLAP on Complex Data

  • How to find a meaningful grouping function?

– Use clusters from hierarchical clustering

  • What kind of hierarchical clustering can lead to a grouping function in GOLAP?

– Each cluster contains a subset of objects
– The hierarchy covers every object
– The whole set of objects is the root cluster
– Ancestor/descendant relation based on containment
– For any two clusters c1 and c2, c1 ∩ c2 is a cluster if it is not empty

SLIDE 15

Fixing the Clustering Methods

  • Many hierarchical clustering methods, but not all, satisfy the requirements

– The requirement “c1 ∩ c2 is a cluster” may be violated by some methods

  • Fix: add the non-empty intersections of clusters as “intermediate clusters”
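
One way to read this fix is as a closure computation: keep adding non-empty pairwise intersections until nothing new appears. The sketch below is an illustration under that reading, with made-up overlapping clusters; the function name `close_under_intersection` is hypothetical.

```python
def close_under_intersection(clusters):
    """Add every non-empty intersection of clusters as an intermediate cluster.

    Repeats until a fixed point is reached, so intersections of
    intermediate clusters are included as well.
    """
    closed = {frozenset(c) for c in clusters}
    changed = True
    while changed:
        changed = False
        for c1 in list(closed):
            for c2 in list(closed):
                inter = c1 & c2
                if inter and inter not in closed:
                    closed.add(inter)
                    changed = True
    return closed


# Overlapping clusters whose intersection {b, c} is not itself a cluster.
raw_clusters = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]
fixed = close_under_intersection(raw_clusters)
print(sorted(sorted(c) for c in fixed))
```

The pairwise loop is quadratic per pass, which is fine for illustration but would need a smarter strategy for large cluster hierarchies.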

SLIDE 16

GeneXplorer: A GOLAP System

  • OLAP gene expression time series data
  • Use a hierarchical clustering

– Based on attraction tree – the index structure

  • f G-data warehouse
  • Coherent patterns as summarization
  • Basic operations

– Roll up
– Drill down
– Slice

SLIDE 17

Towards Interactive Exploration of Gene Expression Patterns

  • Mine hierarchical clusters of co-expressed genes and coherent patterns

SLIDE 18

Indexing Clusters

SLIDE 19

Interactive Exploration on Iyer’s Data Set

SLIDE 20

Comparison with Other Methods

Pattern  GeneXplorer(9)  Adapt(7)  CLICK(7)  CAST(9)
1        0.993           0.956     0.884     0.955
2        0.957           0.911     0.991     0.887
3        0.984           0.993     0.994     0.997
4        0.980           0.984     0.883     0.968
5        0.958           0.855     0.868     0.855
6        0.952           0.989     0.970     0.984
7        0.967           0.976     0.990     0.719
8        0.991           0.997     0.914     0.999
9        0.702           0.824     0.844     0.800
10       0.974           0.981     0.976     0.996

Each cell gives the similarity between the pattern reported by an approach and the corresponding pattern in the ground truth

SLIDE 21

Other Features of GeneXplorer

  • Model adjustment – GOLAP models as plug-ins

– User can change the grouping function and summarization function

  • Gene annotation panel

– Link patterns to ground truth from public annotations
– Pattern and object visualization

SLIDE 22

Conclusions

  • Problem: how to construct a general model for OLAP on complex data?
  • Solution: GOLAP – a general model

– Consistent with traditional OLAP on relational data
– Can handle complex data

  • A case study: GeneXplorer
SLIDE 23

Future Work

  • Is it necessary to introduce new OLAP operations for complex data?

– Data/application oriented or general?

  • Efficient implementation of G-warehouse
  • Data integration based on general OLAP on complex data
SLIDE 24

Thank You!

http://www.cse.buffalo.edu/faculty/jianpei/