Similarity Matching of Temporal Event-Interval Sequences


Mar 29, 2023

SLIDE 1

Similarity Matching of Temporal Event-Interval Sequences

S. Mohammad Mirbagheri and Howard J. Hamilton
University of Regina, Regina, Canada

SLIDE 2

Outline

1. Introduction
2. Problem Statement
3. Similarity Matching
4. Experiments
5. Conclusion

SLIDE 3

Introduction

Interval-based event sequences (e-sequences)
Sequences of events that persist over time intervals of varying lengths
Occur in many application domains, such as medicine, sensor networks, and sign languages

E-sequence dataset
Contains longitudinal data: each instance is described by a series of event intervals
Has no features with a single value
Is not organized appropriately for standard machine learning algorithms

SLIDE 4

Problem Statement

Event interval:
A triple e = (l, b, f) consisting of an event label l, a beginning time b, and a finishing time f, e.g., (A, 4, 8)

E-sequence:
A list of m event intervals placed in ascending order of their beginning times, e.g., <(A,4,8), (B,6,12), (C,14,18), (D,20,22)>

E-sequence dataset:
A set of n e-sequences, where each e-sequence is associated with a unique identifier.
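The three definitions above can be sketched as a small data structure. This is only an illustration: the names `EventInterval` and `make_e_sequence`, the tie-breaking rule for equal beginning times, and the identifier `"id-1"` are assumptions that do not come from the slides.

```python
from typing import NamedTuple


class EventInterval(NamedTuple):
    """A triple (l, b, f): event label, beginning time, finishing time."""
    label: str
    begin: int
    finish: int


def make_e_sequence(intervals):
    """Return the intervals in ascending order of beginning time.

    Ties are broken by finishing time and then label (an assumption;
    the slides only specify ordering by beginning time).
    """
    return sorted(intervals, key=lambda e: (e.begin, e.finish, e.label))


# The e-sequence from the slide, given here in scrambled order.
s = make_e_sequence([
    EventInterval("C", 14, 18),
    EventInterval("A", 4, 8),
    EventInterval("D", 20, 22),
    EventInterval("B", 6, 12),
])

# An e-sequence dataset: each e-sequence is stored under a unique identifier.
dataset = {"id-1": s}
```

A named tuple keeps each interval comparable to the plain triple notation used on the slide, so `s[0] == ("A", 4, 8)` holds.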

SLIDE 5

Problem Statement

An example e-sequence dataset with 4 e-sequences (e.g., 4 patients) and 6 event labels (e.g., types of disease)

SLIDE 6

Problem Statement

E-sequence sliced time: {4,5,10,12,14,16,18,20,22}
Coincidence:
The set of event labels that are active during one time slice, e.g., (5,10) = {C,D,E}
Coincidence label sequence (L-sequence):
Ordered list of coincidences, excluding gaps, e.g., <{E}, {C,D,E}, {C,E}, {E}, {B}, {B,F}, {B}>
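The slicing step can be sketched as follows. The example figure did not survive in this transcript, so the e-sequence below is an assumption: it was chosen because it reproduces the sliced times and the L-sequence quoted on the slide. The function names are also illustrative.

```python
def sliced_times(e_sequence):
    """Sorted distinct beginning and finishing times of all intervals."""
    return sorted({t for _, begin, finish in e_sequence for t in (begin, finish)})


def l_sequence(e_sequence):
    """Ordered list of coincidences (sets of active labels), skipping gaps."""
    times = sliced_times(e_sequence)
    coincidences = []
    for start, end in zip(times, times[1:]):
        # A label is active in a slice if its interval covers the whole slice.
        active = {label for label, begin, finish in e_sequence
                  if begin <= start and end <= finish}
        if active:  # an empty set is a gap and is excluded
            coincidences.append(active)
    return coincidences


# Assumed e-sequence, consistent with the values shown on the slide.
example = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]

print(sliced_times(example))
print(l_sequence(example))
```

In this example the slice (14,16) contains no active event, which is why nine sliced times yield only seven coincidences.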

SLIDE 7

Problem Statement

Problem:
Similarity searching and matching of full-length e-sequences
Contributions:
We propose and evaluate three novel approaches
We view the similarity between two e-sequences intuitively in terms of:

Presence of event intervals with the same event labels
Order of occurrences of these event intervals
Duration of the event intervals
Temporal relations among these event intervals

SLIDE 8

Similarity Matching

1. Matching Using Relative Frequency
Relative frequency:
Duration of event interval E: d(E) = 14 - 4 = 10
Duration of the e-sequence: 22 - 4 = 18
A function maps an e-sequence to a vector of the relative frequencies of its event labels
The ERF distance is computed between the relative frequency vectors of two e-sequences
Formula annotations: number of event labels; event labels
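The formulas on this slide were lost in extraction, so the sketch below rests on two loudly stated assumptions: that the relative frequency of a label is its total duration divided by the duration of the e-sequence (suggested by the two durations 10 and 18 given above), and that the vectors are compared with a plain Euclidean distance. Neither is confirmed by the transcript, and the authors' ERF formula may differ, for instance by normalizing over the number of event labels. The example e-sequence is likewise an assumed one that matches the durations on the slide.

```python
import math


def label_duration(e_sequence, label):
    """Total duration of all intervals carrying the given label."""
    return sum(finish - begin for l, begin, finish in e_sequence if l == label)


def sequence_duration(e_sequence):
    """Latest finishing time minus earliest beginning time."""
    return (max(finish for _, _, finish in e_sequence)
            - min(begin for _, begin, _ in e_sequence))


def relative_frequency_vector(e_sequence, labels):
    """One entry per label: label duration / e-sequence duration (assumed definition)."""
    total = sequence_duration(e_sequence)
    return [label_duration(e_sequence, label) / total for label in labels]


def euclidean_distance(u, v):
    """Stand-in vector distance; the slide's actual formula is not recoverable."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


labels = ["A", "B", "C", "D", "E", "F"]
# Assumed e-sequence with d(E) = 10 and overall duration 18, as on the slide.
example = [("E", 4, 14), ("C", 5, 12), ("D", 5, 10), ("B", 16, 22), ("F", 18, 20)]
other = [("A", 0, 9), ("E", 3, 18)]

v1 = relative_frequency_vector(example, labels)
v2 = relative_frequency_vector(other, labels)
print(v1)
print(euclidean_distance(v1, v2))
```

Using a fixed label list for every e-sequence keeps the vectors the same length, with a zero entry for any label that does not occur.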

SLIDE 9

Similarity Matching

2. Matching Using Position Code
Position code:
A function maps an e-sequence to a vector of the position codes of its event labels
The EPC distance is computed between the position code vectors of two e-sequences
Formula annotations: coincidence; L-sequence

SLIDE 10

Similarity Matching

3. Matching Using Multiple Kernel Learning
The EMKL distance between two e-sequences is based on Multiple Kernel Learning
Formula annotations: number of kernels; weights of the functions (kernels); functions, e.g., {ERF, EPC}
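The annotations (number of kernels, weights, functions) suggest a weighted combination of distance functions, although the formula itself is missing from the transcript. The sketch below assumes a simple weighted sum. The two component functions are toy stand-ins rather than the ERF and EPC functions, and how the weights are obtained is not covered here.

```python
def combined_distance(s1, s2, functions, weights):
    """Weighted sum of several distance functions (assumed form of the combination)."""
    if len(functions) != len(weights):
        raise ValueError("need exactly one weight per function")
    return sum(w * f(s1, s2) for f, w in zip(functions, weights))


# Toy stand-in distances, for illustration only.
def label_set_distance(s1, s2):
    """Jaccard distance between the sets of event labels."""
    a = {label for label, _, _ in s1}
    b = {label for label, _, _ in s2}
    return 1 - len(a & b) / len(a | b)


def length_distance(s1, s2):
    """Absolute difference in the number of event intervals."""
    return abs(len(s1) - len(s2))


s1 = [("A", 4, 8), ("B", 6, 12)]
s2 = [("A", 1, 3)]
d = combined_distance(s1, s2, [label_set_distance, length_distance], [0.25, 0.75])
print(d)  # 0.25 * 0.5 + 0.75 * 1 = 0.875
```

Passing the functions as a list keeps the combination open to more than two components, matching the slide's "number of kernels" annotation.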

SLIDE 11

Experiments

Eight real-world datasets
Method of evaluation:
Perform a 1-NN classification prediction for every e-sequence in each dataset and record the fraction of correct predictions (accuracy)

Three existing competitors:
DTW-based method
Artemis
IBSM
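The evaluation procedure can be sketched as a 1-NN loop over a dataset. The slides do not say how the reference set is chosen for each query, so the leave-one-out scheme below is an assumption, and the distance function and toy data are placeholders rather than any of the compared methods.

```python
def one_nn_accuracy(sequences, class_labels, distance):
    """Leave-one-out 1-NN: predict each e-sequence's class from its nearest other e-sequence."""
    if len(sequences) < 2:
        raise ValueError("need at least two e-sequences")
    correct = 0
    for i, query in enumerate(sequences):
        # Nearest neighbour among all other e-sequences.
        nearest = min(
            (j for j in range(len(sequences)) if j != i),
            key=lambda j: distance(query, sequences[j]),
        )
        if class_labels[nearest] == class_labels[i]:
            correct += 1
    return correct / len(sequences)


# Placeholder distance: difference between overall e-sequence durations.
def span(e_sequence):
    return max(f for _, _, f in e_sequence) - min(b for _, b, _ in e_sequence)


def span_distance(s1, s2):
    return abs(span(s1) - span(s2))


toy_sequences = [[("A", 0, 2)], [("A", 0, 3)], [("B", 0, 10)], [("B", 0, 11)]]
toy_classes = ["x", "x", "y", "y"]
print(one_nn_accuracy(toy_sequences, toy_classes, span_distance))
```

Because the distance function is a parameter, the same loop can score any of the compared methods on equal terms.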

SLIDE 12

Experiments

Results:
EMKL outperforms the Artemis and DTW-based methods on all datasets

EMKL vs. IBSM:
Outperforms on four datasets
Ties on two datasets
Loses on two datasets

SLIDE 13

Conclusion

We propose three distance functions to match full-length event-interval sequences.

1. The ERF function measures the distance between e-sequences based on the relative frequencies of the event intervals.

2. The EPC function matches e-sequences based on the position codes of the event intervals.
3. The EMKL function combines the ERF and EPC functions.
Experiments show that the EMKL method is an effective approach to matching full-length e-sequences: it outperforms the Artemis and DTW-based methods on all eight datasets and outperforms or ties IBSM on six of them.