Theory in Practice: Modeling in Neuroimaging How to model big MRI - - PowerPoint PPT Presentation

Dec 31, 2023

slide-1

SLIDE 1

Theory in Practice: Modeling in Neuroimaging

How to model “big” MRI datasets

slide-2

SLIDE 2

Outline of talk

Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
“Big data” complicates our ability to apply both approaches
Marginal Modelling is a good approach for descriptive modelling
Functional Random Forests is a good approach for predictive modelling
Other approaches can also handle big data, but are beyond the scope of this workshop

slide-3

SLIDE 3

Before even considering models, we need to know what question to ask

How and where may cortical thickness be associated with working memory performance?

slide-4

SLIDE 4

Before even considering models, we need to know what question to ask

How and where may cortical thickness be associated with working memory performance?

Can measures of functional brain organization predict an individual’s working memory ability?

slide-5

SLIDE 5

Each question requires a different modelling approach

How and where may cortical thickness be associated with working memory performance? → Descriptive modelling

Can measures of functional brain organization predict an individual’s working memory ability? → Predictive modelling

slide-6

SLIDE 6

Descriptive models measure what one has collected; predictive models measure what one will collect

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

slide-7

SLIDE 7

Descriptive models explore data, predictive models confirm properties of data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

slide-8

SLIDE 8

Descriptive models provide insight, predictive models apply insight

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

slide-9

SLIDE 9

Descriptive models are limited to in-sample data, predictive models require out-of-sample data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

slide-10

SLIDE 10

Descriptive models are assessed via theory and inference, predictive models are assessed by independent testing

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

slide-11

SLIDE 11

Outline of talk

Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
“Big data” complicates our ability to apply both approaches
Marginal Modelling is a good approach for descriptive modelling
Functional Random Forests is a good approach for predictive modelling
Other approaches can also handle big data, but are beyond the scope of this workshop

slide-12

SLIDE 12

First, all health-focused imaging studies should probably be big data

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-13

SLIDE 13

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-14

SLIDE 14

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests (some special cases are in the hundreds)

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-15

SLIDE 15

We’ve collected about 10,000 cases

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-16

SLIDE 16

ABCD needed a lot of coordination and data aggregation to collect over 10,000 participants

Auchter et al, 2018, https://doi.org/10.1016/j.dcn.2018.04.003

slide-17

SLIDE 17

Descriptive models must take into account this nested structure

Complex models may be slow to calculate when analyzing ~4500 participants

Permutation tests may take days or even weeks
Permutation tests lack exchangeability for complex questions

slide-18

SLIDE 18

Permutation testing can reveal whether differences in community structure are significant

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

[Figure label: depression]

slide-19

SLIDE 19

Permute group assignment and calculate statistic

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

[Figure labels: depression, ‘depression’, no depression, ‘no depression’]

slide-20

SLIDE 20

Do so for multiple permutations and construct a distribution of the statistic for permuted groups

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

[Figure labels: depression, ‘depression’, no depression, ‘no depression’]

slide-21

SLIDE 21

The p-value is determined by the proportional rank of the observed statistic compared to the permuted distribution

[Figure axis label: Frequency]
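
As a rough illustration of the steps on the last few slides, the sketch below permutes group assignment, recomputes a statistic, and reads the p-value off the permuted distribution. It is a minimal example and not the procedure used in the talk: the difference in group means as the statistic, the function name `permutation_p_value`, and the toy scores are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # permute group assignment
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            exceed += 1
    # Proportional rank of the observed statistic within the permuted
    # distribution; the +1 counts the observed labelling itself.
    return (exceed + 1) / (n_perm + 1)


# Toy scores (made up for illustration only).
depression = [2.9, 3.4, 3.1, 3.8, 3.6, 3.3]
no_depression = [2.1, 2.4, 2.0, 2.6, 2.2, 2.5]
print(permutation_p_value(depression, no_depression))
```

This only works as written when every individual can be exchanged freely, which is the assumption the following slides examine.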

slide-22

SLIDE 22

At Z=2.3, false positive rates are high when not using permutation testing

slide-23

SLIDE 23

At Z=3.1, false positive rates are generally better and in line with the true FP rate

slide-24

SLIDE 24

This all works because individuals are acquired independently of one another, so the data are exchangeable

slide-25

SLIDE 25

Independence gets more complicated when you have more complicated designs – but even here we can exchange every individual

Anderson and ter Braak, 2003, JSCS; DOI: 10.1080/0094965021000015558

[Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant]

slide-26

SLIDE 26

However, if a second factor is nested, our permutations are limited to the nested pairs, restricting our permutations

Anderson and ter Braak, 2003, JSCS; DOI: 10.1080/0094965021000015558

[Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant; family nested by drug use]
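
To make the restriction concrete, the sketch below shuffles values only among observations that share a nesting label, and compares the number of restricted permutations with the number of free ones. It shows one simple form of restricted permutation under assumed names (`permute_within_blocks`, `count_permutations`) and made-up family labels; it is not the scheme from Anderson and ter Braak, which depends on which factor is being tested.

```python
import random
from collections import defaultdict
from math import factorial


def permute_within_blocks(values, blocks, rng):
    """Shuffle values only among observations that share a block label."""
    index_by_block = defaultdict(list)
    for i, block in enumerate(blocks):
        index_by_block[block].append(i)
    permuted = list(values)
    for idx in index_by_block.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        for i, v in zip(idx, shuffled):
            permuted[i] = v
    return permuted


def count_permutations(blocks):
    """Return (free, restricted) permutation counts for a block structure."""
    sizes = defaultdict(int)
    for block in blocks:
        sizes[block] += 1
    free = factorial(len(blocks))
    restricted = 1
    for size in sizes.values():
        restricted *= factorial(size)
    return free, restricted


# Hypothetical example: eight participants in four families of two.
families = ["f1", "f1", "f2", "f2", "f3", "f3", "f4", "f4"]
scores = [1, 2, 3, 4, 5, 6, 7, 8]
print(permute_within_blocks(scores, families, random.Random(1)))
print(count_permutations(families))
```

With eight freely exchangeable participants there are 8! orderings, while shuffling within four pairs leaves only 2^4, which is why restricted designs give a much coarser null distribution.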

slide-27

SLIDE 27

More complex designs have even more restrictions, relative to the total number of permutations

Anderson and ter Braak, 2003, JSCS; DOI: 10.1080/0094965021000015558

[Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant; additional factor: Hometown]

slide-28

SLIDE 28

In turn, restricted permutations have reduced power when controlling for the false positive rate

Anderson and ter Braak, 2003, JSCS; DOI: 10.1080/0094965021000015558

slide-29

SLIDE 29

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5736019/

slide-30

SLIDE 30

Scanner effects can be common, independent of site

Gareth Harman, 4/11/19 – combat Cortical Thickness

slide-31

SLIDE 31

ComBat has also been used to correct ABCD data, in which the acquisition site can be predicted from the data

Nielson, 2018, biorxiv; http://dx.doi.org/10.1101/309260

[Figure label: Site classification accuracy]

slide-32

SLIDE 32

Cross-validation strategies can mitigate known but not unknown effects

Stratified validation is possible via independent stratified groups
Leave-one-site-out validation can help catch site effects
But what about effects of scanner upgrades, software maintenance, or even changes in personnel?
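
Leave-one-site-out validation can be sketched in a few lines: each site is held out in turn as the test set while the model is trained on all remaining sites. The function name `leave_one_site_out` and the site labels are illustrative assumptions, not part of any ABCD tooling.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


# Hypothetical site label for each of six participants.
sites = ["A", "A", "B", "B", "B", "C"]
for site, train, test in leave_one_site_out(sites):
    print(site, train, test)
```

A split like this only guards against effects that are captured by the site label; unknown sources of structure, such as the scanner upgrades mentioned above, are not controlled.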

slide-33

SLIDE 33

Outline of talk

Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
“Big data” complicates our ability to apply both approaches
Marginal Modelling is a good approach for descriptive modelling
Functional Random Forests is a good approach for predictive modelling
Other approaches can also handle big data, but are beyond the scope of this workshop

slide-34

SLIDE 34

The marginal model may be a more feasible solution for modeling ABCD populations

Strengths:
The marginal model makes few assumptions with respect to the data
Nested designs can be modeled, or left unmodeled and absorbed into the error term (hopefully)
Individual cases can be incomplete or missing for a marginal model
Longitudinal designs are feasible within the marginal model framework
The marginal model has a closed-form solution via a Sandwich Estimator (SwE)
It’s fast, and can feasibly be run with limited resources on lots of data
Use of a wild bootstrap (WB) provides an NHST framework for complex questions

slide-35

SLIDE 35

Critical limitations

The marginal model cannot be used to draw inferences about individuals within a population
It is an exploratory approach, which can be verified using subsequent confirmatory approaches
DEAP can help conform such analyses to best standards and practices through pre-registered reports, reproducibility, and independent validation

slide-36

SLIDE 36

Bryan Guillaume and Tom Nichols implemented an approach that uses a sandwich estimator to solve a marginal model

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference]

slide-37

SLIDE 37

Marginal models are effectively linear, so we first estimate the parameters for our design matrix by “dividing” the imaging measure (Y) by the design (X), that is, by regressing Y on X

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta)]
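
Read as a least-squares fit, the “Y/X = Beta” shorthand corresponds to the ordinary least-squares estimate below. This is an assumption about what the slide intends (the usual working-independence fit for a marginal model), with X the n × p design matrix, Y the n × 1 imaging measure at a single vertex or voxel, and the residuals kept for the later covariance step:

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y,
\qquad
\hat{e} = Y - X\hat{\beta}
```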

slide-38

SLIDE 38

For our software, the design matrix is just your non-imaging data

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta)]

slide-39

SLIDE 39

So for example, with the ABCD data we can input measures and test a model

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Marginal model: y ~ RT]

slide-40

SLIDE 40

A sandwich estimator is used to estimate covariance and determine the fixed effects parameters

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE)]

slide-41

SLIDE 41

To handle nested structure, group covariance can be calculated separately (CRITICAL FOR ABCD)

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals)]

slide-42

SLIDE 42

For ABCD, it is good to control for site and gender

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals)]

site  gender
14    2
5     2

slide-43

SLIDE 43

If needed, we can perform a small-sample adjustment; this may be important if we use family as a nesting variable

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.]

slide-44

SLIDE 44

Finally, a Wald test extracts a t-map for statistical inference

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference]
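
The pipeline up to this point (fit the model, collect residuals per group, form the sandwich covariance, compute a Wald-type statistic) can be sketched for a single imaging measure with an intercept and one predictor. This is a simplified illustration under stated assumptions, not the SwE toolbox or MarginalModelCifti code: it omits the small-sample adjustment, and the function name `ols_with_cluster_sandwich`, the data, and the site labels are made up.

```python
def ols_with_cluster_sandwich(x, y, groups):
    """Fit y = b0 + b1*x by least squares; return (beta, cov, wald_t).

    cov is the cluster-robust sandwich covariance of beta, and wald_t is
    the Wald-type statistic for the slope b1.
    """
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # "Bread": inverse of X'X for the design columns [1, x].
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    beta = [bread[0][0] * sy + bread[0][1] * sxy,
            bread[1][0] * sy + bread[1][1] * sxy]
    resid = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]

    # "Meat": sum over groups of (X_g' e_g)(X_g' e_g)', so residuals are
    # allowed to covary within a group but not across groups.
    scores = {}
    for xi, ei, g in zip(x, resid, groups):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += ei
        s[1] += xi * ei
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(bread, meat), bread)
    wald_t = beta[1] / cov[1][1] ** 0.5
    return beta, cov, wald_t


# Toy data: six participants nested in three hypothetical sites.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
site = ["s1", "s1", "s2", "s2", "s3", "s3"]
beta, cov, wald_t = ols_with_cluster_sandwich(x, y, site)
print(beta, wald_t)
```

In an imaging analysis the same computation would be repeated at every vertex or voxel, which is what produces a statistical map.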

slide-45

SLIDE 45

The statistical map looks like this

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference]

slide-46

SLIDE 46

Use of a wild bootstrap enables inference similar to a permutation test – so we can control for the FWER

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference; Wild bootstrap; WB maps; Cluster detection/TFCE; Inference map]
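
One common way resampled null maps are used to control the FWER is to compare each observed statistic against the null distribution of the map-wise maximum. The sketch below shows that idea on plain lists of statistics; it is an assumption-laden simplification (element-wise maxima rather than the cluster or TFCE statistics named in the flowchart), and `fwer_corrected_p` is a hypothetical helper name.

```python
def fwer_corrected_p(observed_map, null_maps):
    """Corrected p-value per element, using the null distribution of the maximum."""
    null_max = [max(m) for m in null_maps]
    n = len(null_max)
    return [
        (sum(1 for m in null_max if m >= obs) + 1) / (n + 1)
        for obs in observed_map
    ]


# Toy example: two "voxels" and three resampled null maps.
observed = [5.0, 1.0]
null_maps = [[1.0, 2.0], [2.0, 3.0], [0.5, 1.5]]
print(fwer_corrected_p(observed, null_maps))
```

Because every element is compared with the largest null value anywhere in the map, a single threshold covers all tests at once.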

slide-47

SLIDE 47

Such a test allows us to detect significant clusters

[Flowchart: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/group covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference; Wild bootstrap; WB maps; Cluster detection/TFCE; Inference map]

slide-48

SLIDE 48

Wild bootstrap

WB_value = fitted_value + residual_value*sample_value
Sample values are drawn with replacement from simple or complex distributions:
A Rademacher distribution (-1, 1) would mean we compute either:
WB_value = fitted_value - residual_value
WB_value = fitted_value + residual_value
However, there are LOTS of possible distributions, so the choice of distribution is important.
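
The formula above translates almost directly into code. The sketch below draws one wild-bootstrap resample with Rademacher multipliers; the function name `wild_bootstrap_sample` and the numbers are illustrative assumptions, and a real analysis would refit the model on many such resamples to build a null distribution.

```python
import random


def wild_bootstrap_sample(fitted, residuals, rng):
    """One wild-bootstrap resample: fitted + residual * (-1 or +1)."""
    return [f + r * rng.choice((-1, 1)) for f, r in zip(fitted, residuals)]


# Toy fitted values and residuals (made up for illustration).
fitted = [1.0, 2.0, 3.0, 4.0]
residuals = [0.2, -0.1, 0.3, -0.4]
print(wild_bootstrap_sample(fitted, residuals, random.Random(42)))
```

Each resampled value keeps the size of its residual and only flips the sign at random, which is what lets the procedure tolerate unequal variances across observations.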

slide-49

SLIDE 49

We have begun to implement a standalone MarginalModelCifti package in R

Alpha version will be released at -- http://github.com/dcan-labs/MarginalModelCifti

slide-50

SLIDE 50

The main wrapper for MarginalModelCifti takes in imaging volumes and prepares them for analysis

[Flowchart: Imaging Volume(s); PrepCIFTI/Surf/Vol]

slide-51

SLIDE 51

ComputeMM is applied to the prepared data; the user specifies the model using Wilkinson notation, and ComputeMM wraps the SwE and Wald test using geepack

[Flowchart: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM (Y ~ group + treatment); Statistical T map for inference]

slide-52

SLIDE 52

ComputeMM_WB generates the WB maps used to draw inferences about the T map

[Flowchart: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; Statistical T map for inference; ComputeMM_WB; Null Distribution]

slide-53

SLIDE 53

In turn, a family of functions is used to parallelize ComputeMM_WB

[Flowchart: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; Statistical T map for inference; ComputeMM_WB; Null Distribution; Functions: ApplyWB_to_data, ComputeFits, ComputeResiudals, ComputeZscores, GetSurfAreas, GetVolAreas]

slide-54

SLIDE 54

Cluster detection is performed within the main wrapper, using information from both processes

[Flowchart: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; Statistical T map for inference; ComputeMM_WB; Null Distribution; Cluster detection/TFCE; Inference map]

slide-55

SLIDE 55

The MarginalModelCifti package comprises multiple functions that can be accessed by anyone

slide-56

SLIDE 56

Functions are documented in accordance with CRAN guidelines

slide-57

SLIDE 57

Here are all the parameters for ConstructMarginalModel()

slide-58

SLIDE 58

To make things easier, we’ve made a Jupyter notebook that can be used as a reference

slide-59

SLIDE 59

Outline of talk

Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
“Big data” complicates our ability to apply both approaches
Marginal Modelling is a good approach for descriptive modelling
Functional Random Forests is a good approach for predictive modelling
Other approaches can also handle big data, but are beyond the scope of this workshop

slide-60

SLIDE 60

Nested structures -- people belong to multiple subtypes

[Map: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)]

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-61

SLIDE 61

Nested structures -- people belong to multiple subtypes

[Map: U.S. 2016 presidential election voting preferences (legend: DEM, GOP)]
[Map: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)]

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-62

SLIDE 62

Nested structures -- people belong to multiple subtypes

[Map: U.S. 2016 presidential election voting preferences (legend: DEM, GOP)]
[Map: Stroke mortality for Adults 35+ per 100,000 (legend: RATE)]
[Map: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)]

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-63

SLIDE 63

But what about effects of scanner upgrades, software maintenance, or even changes in personnel?

slide-64

SLIDE 64

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome

slide-65

SLIDE 65

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome

Unsupervised approaches can discover unknown subtypes, but these are not tied to any outcome

slide-66

SLIDE 66

How does the Functional Random Forest work?

Supervised component

slide-67

SLIDE 67

Ask a question: can we predict depression diagnosis?

Supervised component Unsupervised component

slide-68

SLIDE 68

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

slide-69

SLIDE 69

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

slide-70

SLIDE 70

Supervised component

This dataset can be a functional connectivity matrix

Input dataset

Unsupervised component

slide-71

SLIDE 71

Supervised component

This dataset can be a functional connectivity matrix – which gets reduced to either graph metrics or principal components

Input dataset

Unsupervised component

slide-72

SLIDE 72

Supervised component

Input data are modeled with a random forest using validation/testing

[Flowchart: Input dataset; Random Forest (creates decision trees)]

Unsupervised component

slide-73

SLIDE 73

Supervised component

Model is supervised because it attempts to predict the outcome of interest

[Flowchart: Input dataset; Random Forest (creates decision trees)]

Unsupervised component

slide-74

SLIDE 74

Unsupervised component Supervised component

If the random forest performs well on independent test data, a similarity matrix is produced from the RFs

[Flowchart: Input dataset; Random Forest (creates decision trees); Similarity matrix]
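
One common way to derive a similarity matrix from a random forest is the proximity measure: the fraction of trees in which two cases end up in the same leaf. The sketch below assumes that this is the kind of similarity the slide refers to, and it starts from leaf assignments that are already available rather than training a forest; `proximity_matrix` and the toy leaf labels are hypothetical.

```python
def proximity_matrix(leaf_ids):
    """Proximity between cases from per-tree leaf assignments.

    leaf_ids[t][i] is the leaf that case i falls into in tree t.
    """
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for tree in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Toy example: three trees, four cases.
leaves = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 3, 3],
]
for row in proximity_matrix(leaves):
    print(row)
```

Because the trees were grown to predict the outcome, cases that repeatedly share leaves are similar in ways that matter for that outcome, which is what ties the later community detection to the question being asked.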

slide-75

SLIDE 75

Supervised component Unsupervised component

Subgroups are identified from this matrix via Infomap

[Flowchart: Input dataset; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities)]

slide-76

SLIDE 76

Supervised component Unsupervised component

Subtypes arise from the model that are tied to the outcome

[Flowchart: Input dataset; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities); Subpopulations]

slide-77

SLIDE 77

The FRF can be used to identify trajectories in longitudinal data

[Flowchart: Longitudinal dataset; Functional Data Analysis (generates individual trajectories), $f(t) = a_1\phi_1(t) + \dots + a_k\phi_k(t)$]

slide-78

SLIDE 78

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

[Flowchart: Longitudinal dataset; Functional Data Analysis (generates individual trajectories), $f(t) = a_1\phi_1(t) + \dots + a_k\phi_k(t)$]

slide-79

SLIDE 79

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

[Flowchart: Longitudinal dataset; Functional Data Analysis (generates individual trajectories), $f(t) = a_1\phi_1(t) + \dots + a_k\phi_k(t)$]
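
The trajectory formula is a weighted sum of basis functions, which can be written out directly. The sketch below uses a small polynomial basis purely as a stand-in; the slides do not say which basis the FRF uses, and the coefficients and the helper name `trajectory` are made up.

```python
def trajectory(coefs, basis):
    """Return f with f(t) = sum over k of coefs[k] * basis[k](t)."""
    def f(t):
        return sum(a * phi(t) for a, phi in zip(coefs, basis))
    return f


# Hypothetical polynomial basis: 1, t, t^2.
basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
f = trajectory([2.0, -0.5, 0.1], basis)
print([f(t) for t in (0.0, 1.0, 2.0, 3.0)])
```

Once each individual has a fitted function, it can be evaluated on a common time grid even when visits were recorded at different times.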

slide-80

SLIDE 80

We can use an unsupervised approach to identify trajectories

Unsupervised

[Flowchart: Longitudinal dataset; Functional Data Analysis (generates individual trajectories), $f(t) = a_1\phi_1(t) + \dots + a_k\phi_k(t)$; Correlation Matrix (compares trajectories); Infomap (identifies communities); Correlation-based subpopulations]
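
Comparing trajectories with a correlation matrix can be sketched as below, assuming each trajectory has been evaluated on the same time grid and that Pearson correlation is the measure (the slide does not specify which). The function names and values are illustrative only.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_matrix(trajectories):
    """Pairwise correlations between individual trajectories."""
    return [[pearson(a, b) for b in trajectories] for a in trajectories]


# Toy trajectories sampled at four time points (made up).
trajectories = [
    [1.0, 2.0, 3.0, 4.0],   # rising
    [2.0, 4.0, 6.0, 8.0],   # rising faster, same shape
    [4.0, 3.0, 2.0, 1.0],   # falling
]
for row in correlation_matrix(trajectories):
    print(row)
```

Correlation groups trajectories by shape rather than level, so the two rising cases are treated as alike even though one has larger values.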

slide-81

SLIDE 81

Or use a “hybrid” approach that identifies trajectory subtypes tied to an outcome of interest

Unsupervised Hybrid

[Flowchart: Longitudinal dataset; Functional Data Analysis (generates individual trajectories), $f(t) = a_1\phi_1(t) + \dots + a_k\phi_k(t)$; Correlation Matrix (compares trajectories); Infomap (identifies communities); Correlation-based subpopulations; Parameters; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities); Model-based subpopulations]

slide-82

SLIDE 82

A manual for using the FRF exists online (https://dcan-labs.github.io/functional-random-forest/)

slide-83

SLIDE 83

A new release is available at:

slide-84

SLIDE 84

A manual for using the FRF exists online (https://dcan-labs.github.io/functional-random-forest/)

slide-85

SLIDE 85

Outline of talk

Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
“Big data” complicates our ability to apply both approaches
Marginal Modelling is a good approach for descriptive modelling
Functional Random Forests is a good approach for predictive modelling
Other approaches can also handle big data, but are beyond the scope of this workshop

slide-86

SLIDE 86

New approaches within statistics and machine learning can also accommodate problems with big data

Many of these approaches have been developed in genomics
ComBat is a Bayesian approach to handle known site effects in data
Surrogate Variable Analysis
Such approaches need to be examined in the context of neuroimaging data to evaluate where each is most useful
Knowing how to use these tools requires considerable skill in data science, which has been relatively untaught in mental health fields
Hopefully, the workshop tomorrow will get you excited about applying these new tools and put you on the path towards doing “big data” science right.

slide-87

SLIDE 87

Acknowledgments

Fair Lab

Damien Fair
Oscar Miranda-Dominguez
Alice Graham

Computing Team

Darrick Sturgeon
Eric Earl
Anders Perrone
Emma Schifsky
Anthony Galassi
Kathy Snider
David Ball
Lucille Moore

Alpha Testers

Bene Ramirez
Jennifer Zhu
Robert Hermosillo
Mollie Marr
Oliva Doyle
Michaela Cordova
AJ Mitchell

slide-88

SLIDE 88

Acknowledgments

slide-89

SLIDE 89

Questions?

slide-90

SLIDE 90

High dimensionality is bad for predictive modelling

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-91

SLIDE 91

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880143/