Big Data Algorithms with Medical Applications
Yixin Chen
Outline
- Challenges to big data algorithms
- Clinical big data
- Our new algorithms
- Small data vs. big data
Key concerns: association, accuracy, high efficiency, interpretability
Large-scale Manifold Learning: Maximum Variance Correction (Chen et al., ICML'13)
Costs for ICU survivors are between six and seven times those for non-ICU care.
General hospital ward (GHW) patients are not under extensive electronic monitoring and nurse care.
Patients can suffer cardiopulmonary or respiratory arrest while in the GHW of the hospital.
Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life threatening.
Goal: provide early detection and intervention based on data mining to prevent these serious events.
Clinical data and wireless body sensor data: an NSF/NIH-funded clinical trial at Washington University/Barnes-Jewish Hospital.
Clinical data: high-dimensional, real-time time-series data. 34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, blood pressure, …
[Figure: example vital-sign time series; x-axis: time in seconds]
Main problem: most previous general work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.
Medical data mining combines medical knowledge with machine learning methods.
Existing clinical scores:
- SCAP and PSI
- Acute Physiology Score, Chronic Health Score, and APACHE score are used to predict renal failure
- Modified Early Warning Score (MEWS)
Machine learning methods: decision trees, neural networks, SVM
[Figure: Non-ICU vs. ICU instance counts]
Challenges: high-dimensional time-series data
Pipeline (Mao et al., KDD'12):
- Temporal feature extraction
- Bootstrap aggregating (bagging)
- Exploratory under-sampling
- Feature selection
- Exponential moving average smoothing (a minimal sketch follows this list)
- Basic classifier
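As a concrete illustration of the smoothing step, here is a minimal sketch of exponential moving average smoothing applied to one vital-sign series; the function name, the smoothing factor alpha = 0.3, and the example pulse values are illustrative assumptions, not the settings used in the actual pipeline.

```python
import numpy as np

def ema_smooth(series, alpha=0.3):
    """Exponential moving average of a 1-D vital-sign series.

    alpha is the smoothing factor (illustrative choice): larger alpha
    tracks the raw signal more closely, smaller alpha smooths more.
    """
    smoothed = np.empty(len(series), dtype=float)
    smoothed[0] = series[0]
    for t in range(1, len(series)):
        smoothed[t] = alpha * series[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Example: damp a transient spike in a pulse-rate reading
pulse = [82, 85, 120, 84, 83, 90, 88]
print(ema_smooth(pulse))
```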
| | kNN | NB | NN | LR | Linear SVM | Kernel SVM |
|---|---|---|---|---|---|---|
| Nonlinear classification ability | Y | N | Y | N | N | Y |
| Interpretability | N | Y | N | Y | Y | N |
| Direct support for mixed data types | Y | Y | N | N | N | N |
| Efficiency | Y | Y | Y | Y | Y | N |
| Multi-class classification | Y | Y | Y | Y | N | N |
Random Kitchen Sinks (RKS): a random nonlinear feature transformation followed by a parametric, linear classifier.
1. Transform each input x into exp(-i w_k·x), k = 1, …, K, with w_k drawn from a Gaussian distribution p(w).
2. Train a linear classifier on the transformed features.
Theory: based on the Fourier transform, RKS converges to the RBF-SVM as K grows. High efficiency, but no interpretability.
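A minimal sketch of this idea using the real-valued form of the random Fourier map (cosines with random Gaussian directions and random phases, equivalent to the complex exponential above); the Gaussian scale, K, and the choice of scikit-learn's LogisticRegression as the linear classifier are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # any linear classifier works here

def make_rks(D, K=500, gamma=1.0, seed=0):
    """Draw random directions w_k ~ N(0, 2*gamma*I) and random phases b_k."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, K))
    b = rng.uniform(0, 2 * np.pi, size=K)
    return W, b

def rks_transform(X, W, b):
    """Map inputs to K random Fourier features; inner products of these
    features approximate an RBF kernel as K grows."""
    K = W.shape[1]
    return np.sqrt(2.0 / K) * np.cos(X @ W + b)

# Example: a nonlinear decision boundary learned by a fast linear model
X = np.random.randn(300, 5)
y = (np.sum(X ** 2, axis=1) > 5).astype(int)
W, b = make_rks(D=5)
clf = LogisticRegression(max_iter=1000).fit(rks_transform(X, W, b), y)
print(clf.score(rks_transform(X, W, b), y))   # reuse the same W, b at test time
```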
Our approach: a non-parametric, nonlinear feature transformation + a parametric, linear classifier → efficiency, interpretability, and nonlinearity.
| | kNN | NB | NN | LR | Linear SVM | Kernel SVM | DLR |
|---|---|---|---|---|---|---|---|
| Nonlinear classification ability | Y | N | Y | N | N | Y | Y |
| Interpretability | N | Y | N | Y | Y | N | Y |
| Direct support for mixed data types | Y | Y | N | N | N | N | Y |
| Efficiency | Y | Y | Y | Y | Y | N | Y |
| Multi-class classification | Y | Y | Y | Y | N | N | Y |
DLR: Density-based Logistic Regression (Chen et al., KDD’13)
Each instance has D features: x = (x_1, …, x_D).
Training dataset: {(x^(i), y^(i)), i = 1, …, N}. Optimization: maximize the overall log-likelihood.
Assume: P(y = 1 | x) is an increasing function of a weighted sum of per-feature, density-based transformed features.
Numerical x_d: kernel density estimation (with kernel bandwidth h). Categorical x_d: smoothed histogram.
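A minimal sketch of the two per-feature estimators named above, computed from the training values of a single feature; the Gaussian kernel, the default bandwidth, and the add-alpha smoothing constant are illustrative assumptions (in DLR these estimates are combined per class, which is not shown here).

```python
import numpy as np

def kde_numeric(x, train_vals, h=1.0):
    """Gaussian kernel density estimate of a numerical feature value x,
    using the training values of that feature and bandwidth h."""
    z = (x - np.asarray(train_vals, dtype=float)) / h
    return np.exp(-0.5 * z ** 2).mean() / (h * np.sqrt(2 * np.pi))

def smoothed_histogram(x, train_vals, alpha=1.0):
    """Smoothed (add-alpha) histogram estimate for a categorical feature."""
    cats, counts = np.unique(train_vals, return_counts=True)
    hit = counts[cats == x]
    count = hit[0] if hit.size else 0
    return (count + alpha) / (counts.sum() + alpha * len(cats))

# Example: estimate densities of a heart-rate value and a gender category
print(kde_numeric(95.0, [80, 82, 90, 110, 130], h=5.0))
print(smoothed_histogram("F", ["M", "F", "F", "M", "F"]))
```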
Objective: maximize the overall log-likelihood over the training dataset; the objective is a function of both the weights w and the bandwidths h.
1. Initialize h and w.
2. Calculate the new feature vectors using the current h.
3. Update w.
4. Update h.
5. If not converged, go to step 2.
Fix h and optimize w (using an LR solver); fix w and optimize h (steepest gradient descent); repeat until convergence.
[Figure: estimated densities at the initial h and after iterations 1–3]
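A skeleton of the alternating loop above, stated with assumptions: `transform(X, h)` stands in for the density-based feature map with bandwidths h, the w-step uses scikit-learn's LogisticRegression as the LR solver, and the h-step below is a crude finite-difference gradient-ascent step rather than the exact gradient used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def total_log_likelihood(clf, Phi, y):
    """Sum of log P(y_i | phi(x_i)) under the fitted linear classifier."""
    p = np.clip(clf.predict_proba(Phi)[:, 1], 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def bump(h, d, eps):
    h2 = h.copy()
    h2[d] += eps
    return h2

def fit_alternating(X, y, transform, h, n_outer=10, step=0.01, eps=1e-3, tol=1e-4):
    """Alternating optimization: fix h and fit w with an LR solver,
    then fix w and take one (finite-difference) gradient step on h."""
    clf, prev = LogisticRegression(max_iter=1000), -np.inf
    for _ in range(n_outer):
        Phi = transform(X, h)            # recompute features for the current h
        clf.fit(Phi, y)                  # fix h, optimize w
        ll = total_log_likelihood(clf, Phi, y)
        if abs(ll - prev) < tol:         # converged?
            break
        prev = ll
        grad = np.array([(total_log_likelihood(clf, transform(X, bump(h, d, eps)), y) - ll) / eps
                         for d in range(len(h))])
        h = np.maximum(h + step * grad, 1e-6)   # fix w, update h; keep bandwidths positive
    return clf, h
```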
For example, suppose y represents a particular disease and x_d represents the blood pressure (BP) of a patient. On the disease level, ranking the learned weights can identify the risk factors of this disease. On the patient level, the transformed feature value indicates the abnormality of the patient's BP, and its weighted contribution indicates the extent to which BP contributes to his disease.
The standard density-estimation kernel doesn't consider the label information.
The DLR kernel does: one term applies when a training instance has the same label, and another when it has a different label.
[Figure: test data classified by the original LR vs. density-based LR]
[Figures: results on datasets with numerical and categorical features]
SVM: 0.9194 DLR: 0.9204
Early alert when the patient appears normal to the best doctors in the world
Estimation via kernel density smoothing is still too slow for big data: testing time grows as the training set gets larger.
Estimation via histograms is ultra-fast for both training and testing, and there is no curse of dimensionality since each feature's density is estimated separately.
Issues with histograms: the estimate is not smooth, and some bins may not contain enough data.
The per-bin estimate divides the number of instances with the given label in bin i by the total number of instances in bin i.
[Figure: histogram estimates with 5, 20, and 100 bins]
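A minimal sketch of the per-feature binned estimate described above: for one feature, count label occurrences per bin once at training time, then score a test value with a constant-time bin lookup. The bin count, the add-one smoothing, and all names are illustrative assumptions.

```python
import numpy as np

def binned_label_fraction(x_train, y_train, n_bins=20, smooth=1.0):
    """For one feature: per-bin (smoothed) fraction of positive-label instances.

    Returns bin edges and per-bin estimates so a test value can be scored
    with a single digitize() lookup."""
    edges = np.histogram_bin_edges(x_train, bins=n_bins)
    bins = np.clip(np.digitize(x_train, edges[1:-1]), 0, n_bins - 1)
    pos = np.bincount(bins, weights=y_train, minlength=n_bins)   # positives per bin
    tot = np.bincount(bins, minlength=n_bins)                    # instances per bin
    return edges, (pos + smooth) / (tot + 2 * smooth)            # smoothed fraction

def score(x, edges, est):
    """Constant-time lookup for a test value."""
    i = np.clip(np.digitize(x, edges[1:-1]), 0, len(est) - 1)
    return est[i]

# Example: one vital-sign feature with binary deterioration labels
x = np.array([60, 62, 65, 90, 120, 125, 130, 70], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 0], dtype=float)
edges, est = binned_label_fraction(x, y, n_bins=5)
print(score(118.0, edges, est))
```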
Accuracy (%):

| | Splice (1K) | Mush (8K) | w5a (10K) | w8a (50K) | Adult (30K) | kddcup (1.26M) |
|---|---|---|---|---|---|---|
| linear SVM | 75 | 100 | 98.15 | 98.57 | 60.03 | 99.99 |
| LR | 77 | 99.87 | 97.67 | 98.24 | 84.80 | 99.99 |
| RBF SVM | 80 | 99.23 | 97.14 | 97.20 | 75.29 | N/A |
| DLR-b | 88 | 99.95 | 98.26 | 98.55 | 85.54 | 99.99 |
Running time:

| | Splice (1K) | Mush (8K) | w5a (10K) | w8a (50K) | Adult (30K) | kddcup (1.26M) |
|---|---|---|---|---|---|---|
| linear SVM | 0.12 | 0.56 | 1.16 | 15 | 2847 | 81.70 |
| LR | 0.15 | 0.21 | 0.18 | 0.7 | 2.89 | 55.66 |
| RBF SVM | 0.09 | 1.63 | 1.60 | 29 | 217 | N/A |
| DLR-b | 0.22 | 0.32 | 2.65 | 7.6 | 0.6 | 17.93 |
Feature selection: train along with the constraints w_d ≥ 0.
Top features selected by DLR:
- standard deviation of heart rate
- ApEn of heart rate
- energy of oxygen saturation
- LF of oxygen saturation
- LF of heart rate
- DFA of oxygen saturation
- mean of heart rate
- HF of heart rate
- inertia of heart rate
- homogeneity of heart rate
- energy of heart rate
- linear correlation of heart rate and oxygen saturation
Try it out!
http://www.cse.wustl.edu/~wenlinchen/project/DLR/
nonlinearity/randomness
McKinsey Global Institute report: big data talent is scarce.
| | kNN | NB | NN | LR | Linear SVM | Kernel SVM | Random Kitchen Sinks |
|---|---|---|---|---|---|---|---|
| Nonlinear classification ability | Y | N | Y | N | N | Y | Y |
| Interpretability | N | Y | N | Y | Y | N | N |
| Direct support for mixed data types | Y | Y | N | N | N | N | N |
| Efficiency | Y | Y | Y | Y | Y | N | Y |
| Multi-class classification | Y | Y | Y | Y | N | N | N |
Both GNB and LR express P(y | x) as a linear model; GNB learns the weights under the GNB assumption, while LR learns them by maximizing the likelihood of the data.
GNB factorizes the joint distribution P(x, y) = P(y) ∏_d P(x_d | y), while LR directly models the conditional P(y | x).
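To see why both models yield a linear form in x, here is the standard log-odds derivation under the GNB factorization, written with class-shared, per-feature variances for concreteness (that shared-variance assumption and the symbols are added here; the bias w_0 absorbs the prior and the constant terms):

```latex
\log\frac{P(y{=}1\mid x)}{P(y{=}0\mid x)}
  = \log\frac{P(y{=}1)}{P(y{=}0)}
    + \sum_{d=1}^{D}\log\frac{P(x_d\mid y{=}1)}{P(x_d\mid y{=}0)}
  = w_0 + \sum_{d=1}^{D} w_d\, x_d,
\qquad w_d = \frac{\mu_{d,1}-\mu_{d,0}}{\sigma_d^{2}} .
```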