Orthogonal grey simultaneous component analysis to distinguish common and distinctive information in coupled data
Martijn Schouteden, Katrijn Van Deun, Iven Van Mechelen
Outline
• Introduction
  – Coupled data
  – Research questions
• Method
  – Simultaneous component method
  – Problem
  – Solution: DISCO-GSCA
• Illustration
  – Results
• Conclusion
Introduction
• Coupled data: data that consist of different data blocks, which all contain information about the same entities
  – E.g.
    • Data blocks = GC/MS and LC/MS
    • Variables = E. coli metabolites
    • Objects = conditions
[Figure: the LC/MS block (I conditions × J1 metabolites) and the GC/MS block (I conditions × J2 metabolites) share the same I rows. Smilde et al. (2005)]
Research questions
• Goal: finding the mechanisms that underlie the coupled data
• Which mechanisms are
  – common to both data blocks, and
  – distinctive for a single data block?
• In the example: which metabolome processes are measured by both separation techniques, and which by just one of the two?
Simultaneous Component Analysis
• Finding underlying mechanisms in
  – ONE data block: Principal Component Analysis (PCA; Jolliffe, 2002)
  – More data blocks: Simultaneous Component Analysis (SCA; Van Deun et al., 2009)
• Concatenate the LC/MS block (I × J1) and the GC/MS block (I × J2) side by side:
$$X_{conc} = [\,X_{LC} \mid X_{GC}\,], \qquad I \times (J_1 + J_2)$$
• Model (Data = Scores × Loadings' + Error):
$$X_{conc} = T\,[\,P_{LC}' \mid P_{GC}'\,] + [\,E_{LC} \mid E_{GC}\,] = T P_{conc}' + E_{conc}$$
  with $T$ of size $I \times R$, $P_{conc}'$ of size $R \times (J_1 + J_2)$, and $E_{conc}$ of size $I \times (J_1 + J_2)$
• Objective:
$$\min_{T,\,P_{conc}} \lVert X_{conc} - T P_{conc}' \rVert^2$$
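The least-squares objective above can be illustrated with a truncated singular value decomposition of the concatenated data, which is the standard solution when the scores are required to be orthonormal. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `sca`, the block sizes, and the random data are hypothetical, and no preprocessing (centering, block scaling) is applied.

```python
import numpy as np

def sca(x_blocks, n_components):
    """Least-squares SCA of row-coupled blocks via a truncated SVD."""
    x_conc = np.hstack(x_blocks)                      # I x (J1 + J2)
    u, s, vt = np.linalg.svd(x_conc, full_matrices=False)
    t = u[:, :n_components]                           # scores, T'T = I
    p_conc = vt[:n_components].T * s[:n_components]   # loadings, (J1 + J2) x R
    return t, p_conc

rng = np.random.default_rng(0)
x_lc = rng.normal(size=(10, 6))   # hypothetical LC/MS block: I = 10, J1 = 6
x_gc = rng.normal(size=(10, 4))   # hypothetical GC/MS block: I = 10, J2 = 4
t, p_conc = sca([x_lc, x_gc], n_components=3)

x_conc = np.hstack([x_lc, x_gc])
loss = np.sum((x_conc - t @ p_conc.T) ** 2)   # value of the objective
print(t.shape, p_conc.shape, round(float(loss), 3))
```

The loss equals the sum of the squared singular values that were left out, so adding components can only lower it.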
Problem
• Distinctive mechanisms = simultaneous components that underlie only one data block
• Common mechanisms = simultaneous components that underlie both data blocks
• E.g., a single component whose LC/MS loadings are all zero is a distinctive component for GC/MS:
$$\hat{X}_{conc} = T P_{conc}' = T\,[\,P_{LC}' \mid P_{GC}'\,] = \begin{bmatrix} x \\ \vdots \\ x \end{bmatrix} \left[\begin{array}{ccc|ccc} 0 & \cdots & 0 & x & \cdots & x \end{array}\right]$$
• E.g., a target loading pattern with two distinctive components (D1, D2) and one common component (C), where x denotes a free loading:
$$P_{conc}^{target\,\prime} = [\,P_{LC}^{target\,\prime} \mid P_{GC}^{target\,\prime}\,] = \left[\begin{array}{ccc|ccc} x & \cdots & x & 0 & \cdots & 0 \\ 0 & \cdots & 0 & x & \cdots & x \\ x & \cdots & x & x & \cdots & x \end{array}\right] \begin{array}{l} \text{D1} \\ \text{D2} \\ \text{C} \end{array}$$
• However: with the SC method, obtaining such a pattern is outside our control.
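One reason the pattern is outside control, not spelled out on the slide but a general property of component models, is rotational freedom: for any orthogonal matrix B, the scores TB and loadings PB reproduce the data exactly as well as T and P. The sketch below illustrates this with hypothetical sizes and random numbers; all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
j1, j2 = 3, 2                      # hypothetical numbers of LC/MS and GC/MS variables

# Loadings that follow the D1 / D2 / C pattern exactly.
p = rng.normal(size=(j1 + j2, 3))
p[j1:, 0] = 0.0                    # D1: zero GC/MS loadings
p[:j1, 1] = 0.0                    # D2: zero LC/MS loadings

t = np.linalg.qr(rng.normal(size=(8, 3)))[0]   # orthonormal scores
b = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # arbitrary orthogonal rotation

t_rot, p_rot = t @ b, p @ b
same_fit = np.allclose(t @ p.T, t_rot @ p_rot.T)
zeros_kept = np.allclose(p_rot[j1:, 0], 0.0) and np.allclose(p_rot[:j1, 1], 0.0)
print(same_fit, zeros_kept)
```

The rotated solution fits identically while the zero blocks are in general lost, so an unconstrained fit gives no guarantee that the common/distinctive structure shows up in the loadings.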
Solution: DISCO-GSCA
• Predecessors:
  – DISCO-SCA (Schouteden et al., 2010)
  – Grey Component Analysis (GCA; Westerhuis et al., 2007)
• Impose the target structure with a certain strength, λ:
$$\min_{T,\,P_{conc}} \lVert X_{conc} - T P_{conc}' \rVert^2 + \lambda\, \lVert W \bullet (P_{conc}^{target} - P_{conc}) \rVert^2 \quad \text{such that } T'T = I$$
• $\bullet$ denotes the elementwise product
• $W$ is a binary weight matrix of the same size as $P_{conc}$: it has a 1 where the target specifies a zero and a 0 where the target leaves the loading free (x)
• E.g., with three components, a target row $(0, x, x)$ gets the weight row $(1, 0, 0)$, so for a loading row $(p_{j1}, p_{j2}, p_{j3})$ only $p_{j1}$ is penalized
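To make the penalized objective concrete, the sketch below minimizes it with one plausible alternating scheme; this is an assumption for illustration and not necessarily the algorithm used by the authors. For fixed T with T'T = I the loadings have an elementwise closed form, and for fixed loadings the scores follow from an orthogonal Procrustes step. The names `loss` and `fit_penalized_sca`, the sizes, λ = 10, and the random data are all hypothetical.

```python
import numpy as np

def loss(x, t, p, p_target, w, lam):
    """Fit term plus lam times the weighted squared deviation from the target."""
    fit = np.sum((x - t @ p.T) ** 2)
    penalty = np.sum((w * (p_target - p)) ** 2)
    return fit + lam * penalty

def fit_penalized_sca(x, p_target, w, lam, n_iter=50):
    r = p_target.shape[1]
    # Start from the unpenalized SCA solution (truncated SVD).
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    t, p = u[:, :r], vt[:r].T * s[:r]
    losses = [loss(x, t, p, p_target, w, lam)]
    for _ in range(n_iter):
        # Loadings: elementwise closed form, valid because T'T = I.
        p = (x.T @ t + lam * w**2 * p_target) / (1.0 + lam * w**2)
        # Scores: orthogonal Procrustes step, which keeps T'T = I.
        u, _, vt = np.linalg.svd(x @ p, full_matrices=False)
        t = u @ vt
        losses.append(loss(x, t, p, p_target, w, lam))
    return t, p, losses

rng = np.random.default_rng(0)
j1, j2 = 5, 4                          # hypothetical block widths
x = rng.normal(size=(12, j1 + j2))     # hypothetical concatenated data
w = np.zeros((j1 + j2, 3))
w[j1:, 0] = 1.0                        # component 1: distinctive for LC/MS
w[:j1, 1] = 1.0                        # component 2: distinctive for GC/MS
p_target = np.zeros_like(w)            # free entries have weight 0, so their value is irrelevant
t, p, losses = fit_penalized_sca(x, p_target, w, lam=10.0)
print(losses[0], losses[-1])
```

Each update minimizes the objective exactly over one block of parameters, so the recorded loss cannot increase from one iteration to the next. With λ = 0 the scheme stays at the ordinary SCA solution.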
• Model selection: 3 steps
  – FIRST: select the number of simultaneous components (SCA; Van Deun et al., 2009)
  – SECOND: characterize these components, i.e., how many of them are common/distinctive? (DISCO-SCA; Schouteden et al., 2010)
  – THIRD: choose λ using the L-curve (Hansen, 1992)
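For the third step, an L-curve plots the penalty term against the fit term on log-log axes over a grid of λ values and picks the corner. The sketch below assumes those two axes and swaps in a simple stand-in for Hansen's maximum-curvature criterion: it takes the point farthest from the chord joining the two ends of the curve. The function name `l_curve_corner` is hypothetical, and the curve comes from a scalar toy problem, not from metabolomics data.

```python
import math

def l_curve_corner(fits, penalties):
    """Index of the point farthest from the chord between the end points (log-log scale)."""
    u = [math.log10(f) for f in fits]
    v = [math.log10(p) for p in penalties]
    du, dv = u[-1] - u[0], v[-1] - v[0]
    norm = math.hypot(du, dv)
    dist = [abs(du * (vi - v[0]) - dv * (ui - u[0])) / norm for ui, vi in zip(u, v)]
    return max(range(len(dist)), key=dist.__getitem__)

# Toy problem: min_p (x - p)^2 + lam * p^2 with x = 1 has the solution p = 1 / (1 + lam).
lambdas = [10.0 ** k for k in range(-3, 4)]
fits = [(lam / (1 + lam)) ** 2 for lam in lambdas]        # (x - p)^2
penalties = [(1 / (1 + lam)) ** 2 for lam in lambdas]     # p^2
best = lambdas[l_curve_corner(fits, penalties)]
print(best)
```

In practice the fit and penalty values would come from refitting the model at each λ on the grid.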
Illustration
• Data: E. coli
• Model:
  – 5 simultaneous components
  – Target:
    • 1 common component
    • 2 distinctive components for GC/MS
    • 2 distinctive components for LC/MS
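The five-component target can be encoded as a binary weight matrix with a 1 wherever a loading is targeted to be zero. This is a sketch only: the helper `zero_weights` and the block widths are hypothetical (the slides do not give the number of metabolites per block), and the columns are assumed to be ordered as [LC/MS | GC/MS].

```python
import numpy as np

def zero_weights(j_lc, j_gc, kinds):
    """Binary W: 1 where the target loading is zero, 0 where it is free."""
    w = np.zeros((j_lc + j_gc, len(kinds)))
    for r, kind in enumerate(kinds):
        if kind == "gc":        # distinctive for GC/MS: LC/MS loadings targeted at 0
            w[:j_lc, r] = 1.0
        elif kind == "lc":      # distinctive for LC/MS: GC/MS loadings targeted at 0
            w[j_lc:, r] = 1.0
        elif kind != "common":
            raise ValueError(f"unknown component type: {kind}")
    return w

# 1 common, 2 GC/MS-distinctive, 2 LC/MS-distinctive components.
kinds = ["common", "gc", "gc", "lc", "lc"]
w = zero_weights(j_lc=4, j_gc=3, kinds=kinds)   # hypothetical J1 = 4, J2 = 3
print(w.T)   # one row per component, columns ordered [LC/MS | GC/MS]
```

The common component contributes no penalized entries, so its loadings are left entirely to the fit term.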