Similarity Matching of Temporal Event-Interval Sequences: PowerPoint PPT Presentation

SLIDE 1

Similarity Matching of Temporal Event-Interval Sequences

S. Mohammad Mirbagheri and Howard J. Hamilton
University of Regina, Regina, Canada

slide-2
SLIDE 2

Outline

1. Introduction
2. Problem Statement
3. Similarity Matching
4. Experiments
5. Conclusion

SLIDE 3

Introduction

  • Interval-based event sequences (e-sequences)
    • Sequences of events that persist over time intervals of varying lengths
    • Exist in many application domains, such as medicine, sensor networks, and sign languages
  • E-sequence dataset
    • Contains longitudinal data: instances are described by a series of event intervals
    • No features with a single value
    • Not organized appropriately for standard machine learning algorithms

SLIDE 4

Problem Statement

  • Event interval:
    • A triple e = (l, b, f) consisting of an event label l, a beginning time b, and a finishing time f, e.g., (A,4,8)
  • E-sequence:
    • A list of m event intervals placed in ascending order of their beginning times, e.g., <(A,4,8),(B,6,12),(C,14,18),(D,20,22)>
  • E-sequence dataset:
    • A set of n e-sequences, where each e-sequence is associated with a unique identifier
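The three definitions above map naturally onto simple data structures. The following is a minimal sketch, not the authors' implementation: the names `EventInterval`, `make_e_sequence`, and the identifier `"seq-1"` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class EventInterval(NamedTuple):
    """An event interval e = (l, b, f)."""
    label: str     # event label l
    begin: float   # beginning time b
    finish: float  # finishing time f


def make_e_sequence(intervals):
    """Return the event intervals in ascending order of beginning time."""
    return sorted(intervals, key=lambda e: e.begin)


# The e-sequence from the slide, supplied out of order on purpose.
s = make_e_sequence([
    EventInterval("C", 14, 18),
    EventInterval("A", 4, 8),
    EventInterval("D", 20, 22),
    EventInterval("B", 6, 12),
])

# An e-sequence dataset: each e-sequence is stored under a unique identifier
# (the identifier "seq-1" is a made-up example).
dataset = {"seq-1": s}
```

Keying the dataset by identifier is one possible design choice; a list of (identifier, e-sequence) pairs would serve equally well.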

SLIDE 5

Problem Statement

An example

[Figure: an e-sequence dataset with 4 e-sequences (e.g., 4 patients) and 6 event labels (e.g., types of diseases)]

SLIDE 6

Problem Statement

  • E-sequence sliced time: {4,5,10,12,14,16,18,20,22}
  • Coincidence:
    • e.g., the coincidence over the time slice (5,10) is {C,D,E}
  • Coincidence label sequence (L-sequence):
    • Ordered list of coincidences, excluding gaps, e.g., <{E},{C,D,E},{C,E},{E},{B},{B,F},{B}>
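The sliced time and L-sequence can be computed as sketched below. This is an illustration under stated assumptions: the function names are hypothetical, and the example e-sequence is not given on the slide. It is reconstructed here so that it reproduces the slide's sliced time and L-sequence, and a coincidence is assumed to be the set of labels whose intervals cover a whole slice.

```python
def sliced_time(e_sequence):
    """Sorted distinct beginning and finishing times of all event intervals."""
    return sorted({t for (_, b, f) in e_sequence for t in (b, f)})


def l_sequence(e_sequence):
    """Coincidences of consecutive time slices, with empty slices (gaps) skipped."""
    times = sliced_time(e_sequence)
    coincidences = []
    for lo, hi in zip(times, times[1:]):
        # Assumption: a label belongs to the slice if its interval covers [lo, hi].
        active = {label for (label, b, f) in e_sequence if b <= lo and hi <= f}
        if active:  # a slice with no active label is a gap
            coincidences.append(active)
    return coincidences


# Reconstructed example (an assumption, consistent with the slide's values).
s = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]

print(sliced_time(s))
print(l_sequence(s))
```

In this reconstruction the slice (14,16) contains no active label, which is why the L-sequence has seven coincidences for eight slices.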

SLIDE 7

Problem Statement

  • Problem:
    • Similarity searching and matching of full-length e-sequences
  • Contributions:
    • We propose and evaluate three novel approaches
    • We intuitively view the similarity between two e-sequences in terms of:
      • Presence of event intervals with the same event labels
      • Order of occurrence of these event intervals
      • Duration of the event intervals
      • Temporal relations among these event intervals

SLIDE 8

Similarity Matching

  • 1. Matching Using Relative Frequency
    • Relative frequency, with example durations:
      • Duration of event interval E: d(E) = 14 - 4 = 10
      • Duration of the e-sequence: 22 - 4 = 18
    • A function maps an e-sequence to a vector of the relative frequencies of its event labels (one entry per event label)
    • Distance between two e-sequences: the distance between their relative frequency vectors
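The mapping from an e-sequence to a relative frequency vector can be sketched as follows. The transcript does not preserve the exact formula, so this sketch assumes that the relative frequency of a label is the total duration of its intervals divided by the duration of the whole e-sequence, as the example durations (10 and 18) suggest. The Euclidean distance, the function names, and the example e-sequence are likewise assumptions, not the authors' definitions.

```python
import math


def relative_frequency_vector(e_sequence, labels):
    """One entry per label: share of the e-sequence duration covered by that label.

    Assumption: relative frequency = (sum of the label's interval durations)
    / (latest finishing time - earliest beginning time).
    """
    start = min(b for (_, b, _) in e_sequence)
    end = max(f for (_, _, f) in e_sequence)
    total = end - start
    vector = []
    for wanted in labels:
        covered = sum(f - b for (label, b, f) in e_sequence if label == wanted)
        vector.append(covered / total)
    return vector


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


labels = ["B", "C", "D", "E", "F"]
# Hypothetical e-sequences; s1 is consistent with d(E) = 10 and a total duration of 18.
s1 = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]
s2 = [("E", 0, 9), ("B", 9, 18)]

v1 = relative_frequency_vector(s1, labels)
v2 = relative_frequency_vector(s2, labels)
print(euclidean(v1, v2))
```

A fixed label order is used for both vectors so that each position refers to the same event label.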

SLIDE 9

Similarity Matching

  • 2. Matching Using Position Code
    • Position code: computed from the coincidences of the L-sequence
    • A function maps an e-sequence to a vector of the position codes of its event labels
    • Distance between two e-sequences: the distance between their position code vectors

SLIDE 10

Similarity Matching

  • 3. Matching Using Multiple Kernel Learning
    • Distance between two e-sequences is based on Multiple Kernel Learning
    • It combines a number of functions (kernels), e.g., {ERF, EPC}, each with its own weight
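One common way to combine several distance functions with weights is a weighted sum, sketched below. The transcript does not preserve the combination formula or how the weights are learned, so the weighted-sum form, the hand-picked weights, and the two stand-in distance functions are all assumptions for illustration. They are not the ERF or EPC functions themselves.

```python
def combined_distance(s1, s2, functions, weights):
    """Weighted sum of several distance functions (assumed combination rule)."""
    if len(functions) != len(weights):
        raise ValueError("need exactly one weight per function")
    return sum(w * f(s1, s2) for f, w in zip(functions, weights))


# Two stand-in distance functions (hypothetical, for demonstration only).
def label_mismatch(s1, s2):
    """Number of event labels that occur in only one of the two e-sequences."""
    labels1 = {label for (label, _, _) in s1}
    labels2 = {label for (label, _, _) in s2}
    return len(labels1 ^ labels2)


def duration_gap(s1, s2):
    """Absolute difference between the total durations of the two e-sequences."""
    def span(s):
        return max(f for (_, _, f) in s) - min(b for (_, b, _) in s)
    return abs(span(s1) - span(s2))


s1 = [("A", 4, 8), ("B", 6, 12)]
s2 = [("A", 0, 5), ("C", 2, 6)]

# Weights are fixed by hand here; in the slides they come from kernel learning.
d = combined_distance(s1, s2, [label_mismatch, duration_gap], [0.5, 0.5])
print(d)
```

Passing the functions as a list keeps the combination independent of how many kernels are used.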

SLIDE 11

Experiments

  • Eight real-world datasets
  • Method of evaluation:
    • Perform 1-NN classification prediction for every e-sequence in each dataset
    • Record the fraction of correct predictions (accuracy)
  • Three existing competitors:
    • DTW-based method
    • Artemis
    • IBSM
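The evaluation procedure can be sketched as a leave-one-out 1-NN loop: each item is classified by the label of its nearest other item, and accuracy is the fraction of correct predictions. Leave-one-out is an assumption about how "every e-sequence" is predicted, and the function name and toy data are hypothetical. Any distance function over e-sequences could be plugged in; plain numbers are used here to keep the example short.

```python
def one_nn_accuracy(items, labels, distance):
    """Leave-one-out 1-NN accuracy: fraction of items whose nearest
    other item carries the same class label. Requires at least two items."""
    correct = 0
    for i, query in enumerate(items):
        # Search for the nearest neighbour among all items except the query.
        others = (j for j in range(len(items)) if j != i)
        nearest = min(others, key=lambda j: distance(query, items[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(items)


# Toy data: numbers stand in for e-sequences, absolute difference for the distance.
items = [0.0, 0.1, 5.0, 5.2, 2.4]
classes = ["a", "a", "b", "b", "b"]
accuracy = one_nn_accuracy(items, classes, lambda x, y: abs(x - y))
print(accuracy)
```

In the toy data the last item lies closer to class "a" than to its own class, so it is misclassified and the accuracy is 4 out of 5.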

SLIDE 12

Experiments

  • Results:
    • EMKL outperforms the Artemis and DTW-based methods on all datasets
    • EMKL vs. IBSM:
      • Outperforms on four datasets
      • Ties on two datasets
      • Loses on two datasets

SLIDE 13

Conclusion

  • We propose three distance functions for matching full-length event-interval sequences:
    • 1. The ERF function measures the distance between e-sequences based on the relative frequency of the event intervals.
    • 2. The EPC function matches e-sequences based on the position codes of the event intervals.
    • 3. The EMKL function combines the ERF and EPC functions.
  • Experiments show that the EMKL method is an effective approach to matching full-length e-sequences: it outperforms the Artemis and DTW-based methods on all eight datasets and outperforms or ties IBSM on six of them.
