Important concepts and considerations in predictive modeling
Oscar Miranda-Domínguez, PhD, MSc. Research Assistant Professor Developmental Cognition and Neuroimaging Lab, OHSU
Models try to identify associations between variables: 𝑌, the predictor variables, and 𝑧, the outcome variables.
Models in clinical research have specific problems:

(Figure: samples drawn from the entire population.)
More importantly, what can be done to improve predictions across datasets?
How relevant is the balance between the number of variables and observations?
# Measurements = # Variables
The system 4 = 2𝐵 has a unique solution: 𝐵 = 2.

# Measurements > # Variables
What about repeated measurements (real data with noise)?

4.0 = 2.0𝐵 → 𝐵 = 2.00
3.9 = 2.1𝐵 → 𝐵 ≈ 1.86

Select the solution with the lowest mean square error! In matrix form, [4.0; 3.9] = [2.0; 2.1]𝐵, i.e., 𝑧 = 𝑦𝐵. Using linear algebra (the pseudo-inverse of 𝑦): 𝐵 = (𝑦′𝑦)⁻¹𝑦′𝑧 ≈ 1.925. This 𝐵 minimizes Σ residuals².

# Measurements < # Variables
What about (real) limited data: 8 = 4𝛽 + 𝛾. There are 2 variables (𝛽 and 𝛾) and 1 measurement. Solving the system: 𝛾 = 8 − 4𝛽. Every point on the line 𝛾 = 8 − 4𝛽 solves the system; in other words, there are infinitely many solutions!
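The pseudo-inverse step can be checked numerically; a minimal numpy sketch using the two noisy measurements from the slide:

```python
import numpy as np

# The two noisy repeated measurements from the slide: z = y*B
y = np.array([[2.0], [2.1]])   # predictor (column vector)
z = np.array([4.0, 3.9])       # measurements

# Pseudo-inverse solution B = (y'y)^-1 y'z, which minimizes the
# sum of squared residuals
B = np.linalg.inv(y.T @ y) @ (y.T @ z)

# np.linalg.lstsq solves the same least-squares problem
B_lstsq, *_ = np.linalg.lstsq(y, z, rcond=None)
print(B, B_lstsq)  # both ≈ 1.925
```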
How well do models predict outcomes?
Let's say you have a set of predictor variables with some correlation
Additional axes are selected to be perpendicular to each other (orthogonal)
More components:
The question is, how many components do we need for a generalizable model?
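One way to see these orthogonal axes, and to start asking how many components carry real signal, is the singular value decomposition; a small numpy sketch with invented, correlated predictors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 50 observations of 3 predictors, where the third is
# nearly a copy of the first (strong correlation)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=50)

# SVD of the centered data: the rows of Vt are the new orthogonal axes
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.round(Vt @ Vt.T, 6))   # identity: the axes are perpendicular
print(s**2 / np.sum(s**2))      # fraction of variance per component
```

Because the third predictor is almost redundant, the last component explains almost no variance; that is the kind of component a generalizable model can drop.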
Definition: using different samples to model and to predict. Split the data into partitions, one to model and the other to predict. Other forms of out-of-sample sampling exist as well.
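A sketch of two such partition schemes (hold-out and k-fold), using invented indices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
idx = rng.permutation(n)   # shuffle the sample indices

# Hold-out: one partition to model, the other to predict
model_idx, predict_idx = idx[:n // 2], idx[n // 2:]

# k-fold, another form of out-of-sample sampling: each sample is
# held out (predicted) exactly once across the k rounds
k = 5
folds = np.array_split(idx, k)
for i, held_out in enumerate(folds):
    modeling = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(i, sorted(held_out))   # ...fit on `modeling`, predict on `held_out`...
```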
The modeler does not know the model structure, but the data are generated by a third-order polynomial:

𝑦 = mean fconn between the fronto-parietal and default networks
score = 𝑞₀ + 𝑞₁𝑦 + 𝑞₂𝑦² + 𝑞₃𝑦³
(Figure: per-participant scores — the noiseless data plus fconn noise gives the measured data.)
Models of the score based on mean fconn were fit using polynomials of different order, from simple to increasingly complex models. Performance was quantified as the mean square error in predictions; the best model is the one with the lowest error.
Mean square error by polynomial order:

Polynomial   Mean Square Error (OHSU)
1            22.35
2            21.22
3            16.21
4            15.61
5            14.14
6            14.13
Mean square error by polynomial order, per dataset:

Polynomial   OHSU    Minn
1            22.35   23.16
2            21.22   23.27
3            16.21   39.03
4            15.61   36.77
5            14.14   44.55
6            14.13   49.96
Testing performance on the same data used to obtain a model leads to overfitting.
Then predict in-sample and out-of-sample data. A reasonable cost function is the mean of the squared residuals.
Keep track of the errors.
Increase the model order (complexity) and keep track of the errors.
Pick the best model (the lowest out-of-sample prediction error). Notice how the in-sample (modeling) error keeps decreasing as order increases: OVERFITTING
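This order-selection loop can be sketched as follows; the data here are invented (a third-order polynomial plus noise standing in for the fconn scores), not the OHSU values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for the slides' setup: scores generated from a
# third-order polynomial of mean fconn, plus measurement noise
y = rng.uniform(-1, 1, 60)
score = 1 + 2 * y - 3 * y**2 + 4 * y**3 + rng.normal(0, 0.5, 60)

# Hold-out cross-validation: half to model, half to validate
idx = rng.permutation(60)
train, test = idx[:30], idx[30:]

mse_in, mse_out = [], []
for order in range(1, 7):
    coeffs = np.polyfit(y[train], score[train], order)
    mse_in.append(np.mean((np.polyval(coeffs, y[train]) - score[train])**2))
    mse_out.append(np.mean((np.polyval(coeffs, y[test]) - score[test])**2))
    print(order, round(mse_in[-1], 3), round(mse_out[-1], 3))
# In-sample error can only decrease as order grows; pick the order
# with the lowest out-of-sample error instead
```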
Cross-validation is a useful tool for predictive modeling. Partial least squares regression requires cross-validation to avoid overfitting.
Why is it important to generate a null distribution?
What is the difference between k-fold and hold-out cross-validation?
Original data:

9𝑦₁ − 7𝑦₂ + ⋯ − 4𝑦ₙ = 21
−𝑦₁ + 9𝑦₂ + ⋯ + 2𝑦ₙ = 19
2𝑦₁ + 7𝑦₂ + ⋯ + 2𝑦ₙ = 77
1𝑦₁ − 6𝑦₂ + ⋯ + 1𝑦ₙ = 20
7𝑦₁ − 2𝑦₂ + ⋯ − 9𝑦ₙ = 62

To build the null distribution, shuffle the outcome values (the right-hand sides), split the shuffled data into a "Modeling" and a "Validation" partition, fit on the Modeling partition, and compute the mean square error on the Validation partition. Repeating the shuffle many times yields a distribution of mean square errors under the null hypothesis of no real association.
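A sketch of this shuffling procedure on invented data (the slide's actual system is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: n observations, p predictors, and an outcome z
# linearly related to the predictors plus noise
n, p = 100, 5
Y = rng.normal(size=(n, p))
z = Y @ rng.normal(size=p) + rng.normal(0, 1, n)

def holdout_mse(Y, z, rng):
    """Fit on a random half ("Modeling"), return MSE on the rest ("Validation")."""
    idx = rng.permutation(len(z))
    tr, te = idx[: len(z) // 2], idx[len(z) // 2 :]
    beta, *_ = np.linalg.lstsq(Y[tr], z[tr], rcond=None)
    return np.mean((Y[te] @ beta - z[te]) ** 2)

observed = holdout_mse(Y, z, rng)

# Null distribution: shuffle the outcomes to destroy any real
# association, then refit and re-score many times
null = np.array([holdout_mse(Y, rng.permutation(z), rng) for _ in range(200)])
p_value = np.mean(null <= observed)   # how often chance does this well
print(observed, p_value)
```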
fconn_regression tool
http://parkinsonteam.blogspot.com/2011/10 /prevencion-de-caidas-en-personas-con.html https://en.wikipedia.org/wiki/Parkinson's_disease
Freezing of gait, a pretty descriptive name, is an additional symptom present in some patients. Freezing can lead to falls, which adds an extra burden to Parkinson's disease.
Open loop
Ashoori A, Eagleman DM, Jankovic J. Effects of Auditory Rhythm and Music on Gait Disturbances in Parkinson's Disease. Front Neurol 2015.
Resting state functional MRI
Freezing of gait can be predicted using connectivity from specific brain networks.
Parameters
This can be done using the tool fconn_regression
Sorted by Cohen effect size (mean square error per network pair):

Visual and subcortical: effect size = 0.87
Auditory and default: effect size = 0.81
Somatosensory lateral and ventral attention: effect size = 0.78
Truncated singular value decomposition
Recall: with fewer measurements than variables, the system has infinitely many solutions.
Regularization is a powerful approach to handle this kind of problem (ill-posed systems).
We know that the pseudo-inverse offers the optimal solution (lowest sum of squared residuals) for systems with more measurements than variables.
Each observation contributes one equation with 379 predictors:

𝑧 = 𝛾₁𝑦₁ + 𝛾₂𝑦₂ + ⋯ + 𝛾₃₇₉𝑦₃₇₉,  equations 1), 2), 3), …, 163)

163 measurements and 379 unknowns: an ill-posed system.
This solution, however, is problematic:
*unstable beta weights
*overfitting
*not applicable to an outside dataset
Let's suppose age and weight are two variables used in your model, and for one participant you used the recorded values. There was, however, an error in data collection, and the real values are slightly different.

Stable beta weights: score ≈ 3.9
Unstable beta weights: score ≈ −344,587.42
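The slide's exact numbers are not shown in this transcript, but the effect can be illustrated with invented beta weights:

```python
import numpy as np

# Invented beta weights for two hypothetical models using [age, weight]
stable = np.array([0.05, 0.01])
unstable = np.array([250000.0, -100000.0])   # huge, opposite-sign weights

recorded = np.array([28.0, 70.0])     # values entered during data collection
corrected = np.array([28.0, 70.001])  # after fixing a tiny data-entry error

for name, beta in [("stable", stable), ("unstable", unstable)]:
    change = beta @ corrected - beta @ recorded
    print(name, change)
# The stable model's score barely moves (~1e-5); the unstable model's
# score jumps by about -100 for the same tiny change in weight
```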
We said that we can rotate the data to find optimal projections, and that we can use different numbers of axes. With the singular value decomposition we can explore the effect of including or excluding components in the pseudo-inverse:

𝑌 = 𝑈Σ𝑉ᵀ, with Σ = diag(𝜎₁, …, 𝜎ₙ) and 𝜎₁ ≥ 𝜎₂ ≥ ⋯ ≥ 𝜎ₙ ≥ 0

The smaller singular values of 𝑌 are more unstable (susceptible to noise). Truncating them keeps only the leading components: 𝑌_truncated = 𝑈Σ_truncated𝑉ᵀ.

Plugging the truncated 𝑌 into the pseudo-inverse:

𝛾 = (𝑌′𝑌)⁻¹𝑌′𝑧  →  𝛾_truncated = (𝑌_truncated′𝑌_truncated)⁻¹𝑌_truncated′𝑧

For each number of retained components we can then evaluate accuracy (the norm of the residuals).
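A toy numpy sketch of truncating the SVD before inverting, on invented ill-conditioned data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented ill-conditioned problem: the last predictor is nearly a copy
# of the fourth, producing one tiny (noise-dominated) singular value
Y = rng.normal(size=(20, 5))
Y[:, 4] = Y[:, 3] + 1e-6 * rng.normal(size=20)
z = Y[:, :4] @ np.ones(4) + 0.1 * rng.normal(size=20)

U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# Full pseudo-inverse: 1/s blows up on the tiny singular value
gamma_full = Vt.T @ np.diag(1 / s) @ U.T @ z

# Truncated SVD: keep only the k largest singular values
k = 4
gamma_tsvd = Vt[:k].T @ np.diag(1 / s[:k]) @ U[:, :k].T @ z

print(np.linalg.norm(gamma_full), np.linalg.norm(gamma_tsvd))
# The truncated weights are far smaller (more stable) than the full ones
```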
Unstable Pseudo-inverse solution
(Figure: accuracy — the norm of the residuals — as a function of the number of retained components.)

Use TSVD and cross-validation:
*more stable beta weights
*less overfitting
*applicable to an outside dataset
Testing performance on the same data used for modeling leads to overfitting. Do not do it; use cross-validation instead. When the number of variables exceeds the number of measurements we have "ill-posed" systems. Regularization approaches such as TSVD stabilize the solution and lead to better out-of-sample performance.
Correlations might not be enough with limited data (~5 mins)
The activity of each brain region can be predicted by the weighted contribution of the other regions:

ŝ₁ = 0·𝑠₁ + β1,2·𝑠₂ + β1,3·𝑠₃
ŝ₂ = β2,1·𝑠₁ + 0·𝑠₂ + β2,3·𝑠₃  (red does not depend on red)
ŝ₃ = β3,1·𝑠₁ + β3,2·𝑠₂ + 0·𝑠₃  (green does not depend on green)

Matricial form:

[ŝ₁]   [ 0    β1,2  β1,3][𝑠₁]
[ŝ₂] = [β2,1   0    β2,3][𝑠₂]
[ŝ₃]   [β3,1  β3,2   0  ][𝑠₃]

General case, a bigger matrix:

[ŝ₁]   [ 0    β1,2  …  β1,N][𝑠₁]
[ŝ₂] = [β2,1   0    …  β2,N][𝑠₂]
[ ⋮ ]   [ ⋮     ⋮    ⋱   ⋮  ][ ⋮ ]
[ŝN]   [βN,1  βN,2  …   0  ][𝑠N]

This is an ill-posed system (more unknowns than data). Solution: it can be solved by regularization and cross-validation, yielding one beta-weight matrix per subject (Subject 1, Subject 2, Subject 3, …).
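A toy sketch of this per-region regression (invented timeseries; the actual method uses regularization and cross-validation to handle the ill-posed case):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy timeseries: N regions by T timepoints for one subject
N, T = 3, 200
S = rng.normal(size=(N, T))

# Predict each region from all the others; the diagonal stays zero
# because a region does not predict itself
B = np.zeros((N, N))
for i in range(N):
    others = [j for j in range(N) if j != i]
    beta, *_ = np.linalg.lstsq(S[others].T, S[i], rcond=None)
    B[i, others] = beta

S_hat = B @ S   # predicted timecourses for every region at once
print(np.round(np.diag(B), 6))   # all zeros by construction
```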
Split each subject's timeseries into two partitions: one for modeling, the other for prediction (fresh data). From the modeling partition, calculate the beta weights (the connectivity matrix): the connectotype.
Use the connectotype from the modeling partition to predict the fresh data, then compare fresh vs. predicted data region by region. You may use correlation coefficients (R₁, R₂, R₃, …) and average them.
Data sets — humans: ages 19 to 35 years, scanned again two weeks later. (Validated in data from 11 macaques too.)
Approach:
1. A model was calculated for each participant using partial data
2. Each model was used to predict fresh data for each scan
3. The average correlation between predicted and observed timecourses was calculated
(Figure: correlations between predicted and fresh data for each model–subject pairing, roughly 0.6–0.9. Models characterize their own subject accurately, and above-chance prediction of other subjects reflects shared variance.)

We are all equipped with functional networks that process certain stimuli in the same way… and on top of this, we each have unique, salient functional networks that make us unique.

Miranda-Dominguez O, et al. PLoS One. 2014
(Figure: variance across subjects.)
(Figure: connections range from more individual to more conserved.)
The analysis was repeated using different amounts of data: as little as 2.5 minutes is enough to recover the connectotype and identify individuals.
Controls passing QC:
“Gordon” parcellation schema
Gordon et al, Cerebral Cortex, 2014
Approach:
1. A model was calculated for each scan (N = 188)
2. Each model was used to predict fresh data for each scan (N = 188 × 188 × ROIs)
3. Average correlations between predicted and observed timecourses were calculated (N = 188 × 188)
4. Average correlations were grouped based on the prediction pairing:
I. Same scan
II. Same participant
(Figure: distributions of average correlations per group, plotted against the difference in years when the data was acquired. Groups: same scan (N=188); same participant, 1 or 2 years later (N=60); siblings (N=46); unrelated (N=35,050). Average correlations range from about 0.25 to 1.00.)

Miranda-Domínguez O, et al. Heritability of the human connectome: A connectotyping study. Netw Neurosci 2018.
The connectotype is similarly predictive in children as in adults, across a wider timespan, and some features appear to be familial.
Is there a "baseline" connectome shared across siblings that varies among families?
http://edition.cnn.com/2015/09/06/tennis/tennis-venus-serena-bouchard/ http://www.tampabay.com/news/politics/national/bush-dynasty-continues-to-impact-republican-politics/1248057
Data sets:

Human Connectome Project: 198 unique participants, 1 hour of data each, 22–36 y/o, 45% males, 79 pairs of siblings.

OHSU: 32 unique participants, 5 mins of low-head-movement resting state each, 7–15 y/o, 60% males, siblings (16 pairs): 16 families with 2 siblings each.
Within dataset: classify each pair of participants as siblings or unrelated. Between datasets: test whether classification generalizes across datasets.
Siblings could be identified better than chance.
Members of the DCAN Lab
Funding: Parkinson's Center of Oregon Pilot Grant, OHSU Fellowship for Diversity, Tartar Family grant, NIMH
AJ Mitchell, Alice Graham, Alina Goncharova, Anders Perrone, Anita Randolph, Anjanibhargavi Ragothaman, Anthony Galassi, Bene Ramirez, Binyam Nardos, Damien Fair, Elina Thomas, Eric Earl, Eric Feczko, Greg Conan, Johnny Uriarte-Lopez, Kathy Snider, Lisa Karstens, Lucille Moore, Michaela Cordova, Mollie Marr, Olivia Doyle, Robert Hermosillo, Samantha Papadakis, Thomas Madison — DCAN Lab