slide-1
SLIDE 1

Methods and Results for Challenge 3A

Robert Bruggner, Rachel Finck, Robin Jia, Noah Zimmerman
Stanford University | rbruggner@stanford.edu
FlowCAPII Summit • Sept 23 2011

slide-5
SLIDE 5

Challenge 3A and Method Overview

  • Given two tubes of data from a single patient, predict the antigen used in each tube
  • Our Approach:
      • Automatically identify populations of cells by surface marker
      • Extract population meta-features and build model to predict antigen group
  • Identified a highly predictive population for determining antigen group

slide-9
SLIDE 9

Surface Markers Normalized for Simple Cluster Matching

  • Surface marker expression varies between patients
  • Need to establish population correspondence
  • Assume bimodal expression and landmark-normalize
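
The normalization idea on this slide can be illustrated with a minimal sketch. It assumes each marker has a low and a high mode, locates them with a simple 1D 2-means split (a stand-in chosen here, not necessarily how the authors found their landmarks), and rescales each patient so the two modes land on 0 and 1. The function names are hypothetical.

```python
from statistics import mean

def two_mode_landmarks(values, iterations=50):
    """Estimate the low and high mode of a bimodal marker with 1D 2-means.

    Assumption: this stand-in replaces whatever peak finder was actually used.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        cut = (lo + hi) / 2.0
        low_group = [v for v in values if v <= cut]
        high_group = [v for v in values if v > cut]
        if not low_group or not high_group:
            break
        new_lo, new_hi = mean(low_group), mean(high_group)
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi

def landmark_normalize(values):
    """Linearly map the low landmark to 0 and the high landmark to 1."""
    lo, hi = two_mode_landmarks(values)
    if hi == lo:
        raise ValueError("marker has no spread; cannot locate two landmarks")
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical patients whose raw intensities differ by offset and scale.
patient_a = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
patient_b = [2 * v + 3 for v in patient_a]
normalized_a = landmark_normalize(patient_a)
normalized_b = landmark_normalize(patient_b)
```

After normalization the two patients share a common scale, so a cluster defined on the combined data refers to the same population in each of them.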
slide-14
SLIDE 14

Cells Clustered With 2D Density-Based Merging & Greedy Dimensional Exploration

  • Data from all patients and conditions combined
  • Combined data clustered in all pairwise sets of dimensions
  • Dimensions with highest confidence clusters selected
  • Identified clusters recursively projected and clustered until no new clusters found
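
A rough sketch of one selection step from the list above: project the combined cells onto every pair of dimensions, cluster each projection, and keep the pair with the cleanest clusters. The slides do not define the confidence measure or give the details of density-based merging, so this sketch swaps in plain 2-means and a centroid-separation score as stand-ins; all function names are hypothetical. The recursive step (re-projecting each identified cluster until no new clusters appear) is not shown.

```python
from itertools import combinations
from math import dist

def two_means(points, iterations=25):
    """Plain 2-means in 2D; a stand-in for the density-based clusterer."""
    # Deterministic start: lexicographically smallest and largest points.
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def separation_score(points):
    """Centroid distance over mean within-cluster spread (assumed confidence proxy)."""
    labels, centroids = two_means(points)
    spread = sum(dist(p, centroids[label])
                 for p, label in zip(points, labels)) / len(points)
    return dist(centroids[0], centroids[1]) / (spread + 1e-12)

def best_dimension_pair(cells, n_dims):
    """Score every pair of dimensions and return the best pair plus all scores."""
    scores = {}
    for i, j in combinations(range(n_dims), 2):
        projected = [(cell[i], cell[j]) for cell in cells]
        scores[(i, j)] = separation_score(projected)
    return max(scores, key=scores.get), scores

# Toy data: dimensions 0 and 1 separate two populations, dimension 2 does not.
cells = [(0.0, 0.1, 0.5), (0.1, 0.0, 0.1), (0.05, 0.05, 0.9),
         (5.0, 5.1, 0.4), (5.1, 5.0, 0.2), (4.9, 5.05, 0.8)]
best_pair, pair_scores = best_dimension_pair(cells, n_dims=3)
```

Scoring pairs exhaustively costs one clustering run per pair, which grows quadratically with the number of markers; this is the reason for picking the best pair greedily rather than exploring every combination of dimensions.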

slide-18
SLIDE 18

Per-patient Cluster Meta-features Extracted For Model Construction

  • Data separated back into source components
  • Cluster meta-features extracted
      • Cluster density
      • Antigen condition density difference vs negative controls
      • Response of clusters in cytokine response dimensions as quantified by Earth Mover's Distance (EMD)
  • Logistic regression classification model built from features (GLMNET)
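
The three meta-features listed above can be sketched for a single cluster as follows. This is an illustration under assumptions: cluster density is taken to be the fraction of a sample's cells assigned to the cluster, and the EMD compares the stimulated sample against the negative control in one cytokine dimension (the slide does not state the reference distribution). Function names are hypothetical, and the model-fitting step with GLMNET is not shown.

```python
def emd_1d(a, b):
    """1D Earth Mover's Distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    i = j = 0
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Advance the counters so i/len(a) and j/len(b) are the CDFs at `left`.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        total += abs(i / len(a) - j / len(b)) * (right - left)
    return total

def cluster_density(labels, cluster_id):
    """Fraction of a sample's cells assigned to the given cluster."""
    return sum(1 for label in labels if label == cluster_id) / len(labels)

def meta_features(stim_labels, ctrl_labels, stim_cytokine, ctrl_cytokine, cluster_id):
    """Density, density difference vs control, and cytokine EMD for one cluster."""
    stim_density = cluster_density(stim_labels, cluster_id)
    ctrl_density = cluster_density(ctrl_labels, cluster_id)
    stim_vals = [v for v, l in zip(stim_cytokine, stim_labels) if l == cluster_id]
    ctrl_vals = [v for v, l in zip(ctrl_cytokine, ctrl_labels) if l == cluster_id]
    # Assumption: an empty cluster in either sample contributes an EMD of 0.
    emd = emd_1d(stim_vals, ctrl_vals) if stim_vals and ctrl_vals else 0.0
    return {
        "density": stim_density,
        "density_diff": stim_density - ctrl_density,
        "emd": emd,
    }

# Toy example: cluster labels and one cytokine channel per cell.
features = meta_features(
    stim_labels=[0, 0, 1, 1], ctrl_labels=[0, 1, 1, 1],
    stim_cytokine=[1.0, 3.0, 0.0, 0.0], ctrl_cytokine=[0.0, 9.0, 9.0, 9.0],
    cluster_id=0,
)
```

Computing one such dictionary per cluster and per patient yields the feature table on which a classifier can be trained.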

slide-22
SLIDE 22

Cross-validation Used to Identify Optimal Classifier and Features

  • 100 runs of random 3-fold internal cross-validation using different combinations of features
  • Logistic regression model using cluster difference and EMD features had best performance
  • This model was used to predict test labels
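
The selection procedure above can be sketched as a loop over feature subsets, each scored by repeated random 3-fold cross-validation. This is a minimal illustration: `make_evaluator` is a hypothetical callback that, for a given feature subset, returns a function fitting a classifier on the training indices and returning its accuracy on the test indices. The classifier itself (logistic regression in the slides) is left out.

```python
import random
from itertools import combinations

def random_folds(n_samples, n_folds, rng):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[k::n_folds] for k in range(n_folds)]

def repeated_cv_score(n_samples, evaluate, n_runs=100, n_folds=3, seed=0):
    """Mean score over n_runs independent random n_folds-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        folds = random_folds(n_samples, n_folds, rng)
        for k, test_idx in enumerate(folds):
            train_idx = [i for f, fold in enumerate(folds) if f != k for i in fold]
            scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)

def select_feature_set(feature_names, n_samples, make_evaluator, **cv_kwargs):
    """Score every non-empty feature subset and return the best one plus all scores."""
    results = {}
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            results[subset] = repeated_cv_score(
                n_samples, make_evaluator(subset), **cv_kwargs)
    return max(results, key=results.get), results
```

Repeating the random split many times averages out the variance of any single 3-fold partition, which matters when the number of labeled patients is small.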
slide-24
SLIDE 24

Density of CD4/CD8 Double Positive T-cell Population Most Important Factor in Logistic Regression Model

[Figure: CD4/CD8 double-positive T-cell population for the GAG and ENV conditions; values shown: 0.21%, 0.18%, 0.42%, 0.27%]

slide-25
SLIDE 25

Density of CD4/CD8 Double Positive T-cell Population Most Important Factor in Logistic Regression Model

  • Backgating suggests possibly two subpopulations within CD4/CD8 cells

slide-29
SLIDE 29

Thoughts & Future Work

  • Identification of CD4+/CD8+ population highlights unbiased nature of method
  • Need to identify all potentially predictive features and their predictive power for users
  • Automated methods critical for comprehensive exploration of higher-dimensional data

slide-34
SLIDE 34

Thanks & Questions

  • J. Irish, D. Parks, R. Tibshirani, D. Dill, & G. Nolan
  • FlowCAPII Committee
  • NIAID
  • Questions?

rbruggner@stanford.edu