Discovery in heterogeneous biological networks: the AdaLab experience
Céline Rouveirol, LIPN
The Reproducibility Crisis
■ One of the most important current issues in biology is ‘the reproducibility crisis’: billions of euros wasted.
■ ‘There is growing alarm about results that cannot be reproduced. Explanations include increased levels of scrutiny, complexity of experiments and statistics, and pressures on researchers.’ Nature 2016.
■ We require Automation to ensure reproducibility.
Credit : Ross King
The Concept of a Robot Scientist
Computer systems capable of originating their own experiments, physically executing them, interpreting the results, and then repeating the cycle.
[Figure: the robot scientist cycle. Labels: Background Knowledge, Analysis, Hypothesis Formation, Experiment selection, Robot, Results Interpretation, Final Theory]
Journées BIOSS-IA, 22 June 2017
Credit : Ross King
Eve: Robot Scientist
Credit : Ross King
Scientific Goals
■ To make scientific research more efficient: cheaper,
faster, better.
■ Our vision is that within 10 years many scientific
discoveries will be made by teams of human and robot scientists.
■ This collaboration will produce scientific knowledge
more efficiently than either could alone.
Credit : Ross King
Scientific Goals
■ A framework for semi-automated and automated
knowledge discovery by teams of human and robot scientists.
■ Integrating advances in knowledge representation, ontology engineering, semantic technologies, machine learning, bioinformatics, and automated experimentation.
Credit : Ross King
The Diauxic Shift
■ Yeast (S. cerevisiae).
■ First turn sugar into ethanol.
■ Then turn ethanol into CO2.
■ Cancer
■ Ageing
Credit : Ross King
The Diauxic Shift
Diauxic shift: when yeast (Saccharomyces cerevisiae) is grown on glucose with oxygen, it first produces ethanol. When the glucose is exhausted, it reorganises (shifts) its metabolism to grow on the ethanol it previously produced (Dickinson, 1999).
[Figure: typical culture-density profile of a fermentative batch culture of S. cerevisiae]
Credit : Daniel Trejo
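The two growth phases described on this slide can be illustrated with a toy batch-culture simulation. This is only a sketch: the function name, the rates, the yields and the initial concentrations are illustrative assumptions, not values from the AdaLab models, and the lag phase between the two regimes is ignored.

```python
def simulate_diauxie(hours=30.0, dt=0.01):
    """Toy two-phase batch growth: glucose first, then the ethanol produced."""
    # All numbers below are illustrative assumptions, not measured values.
    glucose, ethanol, biomass = 10.0, 0.0, 0.05
    mu_glc, mu_eth = 0.4, 0.1      # growth rates (per hour) on each substrate
    y_glc, y_eth = 0.1, 0.5        # biomass gained per unit of substrate used
    eth_per_glc = 0.4              # ethanol produced per unit of glucose used
    trace = []
    for step in range(int(hours / dt)):
        if glucose > 0:
            # Fermentative phase: consume glucose, excrete ethanol.
            used = min(glucose, mu_glc * biomass * dt / y_glc)
            glucose -= used
            ethanol += eth_per_glc * used
            biomass += y_glc * used
        elif ethanol > 0:
            # After the shift: slower growth on the ethanol produced earlier.
            used = min(ethanol, mu_eth * biomass * dt / y_eth)
            ethanol -= used
            biomass += y_eth * used
        trace.append((step * dt, glucose, ethanol, biomass))
    return trace


trace = simulate_diauxie()
peak_ethanol = max(row[2] for row in trace)
final_time, final_glucose, final_ethanol, final_biomass = trace[-1]
```

With these assumed parameters, ethanol accumulates while glucose is consumed, then falls back to zero as it is used for a second, slower growth phase.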
Key Challenges
■ The AdaLab system needs to be:
– autonomous and responsive to the requirements of its human scientific collaborators.
– able to continuously learn, adapt and improve in the complex “real world” environment of scientific research.
– capable of continuous cycles of scientific hypothesis formation and experimentation that will improve its scientific knowledge (models).
Credit : Ross King
AdaLab Structure
Credit : Ross King
Machine Learning objectives
■ Context
– Knowledge intensive
– Scarce data
– Limited experiment panel (gene knock out, growth curve)
■ Learning probabilistic graphical models from scarce data
■ Model revision from partially observed data
■ Experiment design
Objectives
Data collection
■ Collection of bioinformatic data about the yeast diauxic shift.
■ Development of an integrated metabolic and gene signalling network
Simulation
■ Development of simulation tools, including both regulatory
and metabolic model simulation
■ Phenotype predictions from genotype using the integrated
model.
Diauxic Shift model
- 100 genes, of which 68 are transcription factors
- 322 interactions in total, of which 212 are proven
- 1133 annotations from 410 articles
Geistlinger et al. A comprehensive gene regulatory network for the diauxic shift in Saccharomyces cerevisiae. Nucleic Acids Res. 2013
Credit : Daniel Trejo
Metabolic network models
Credit : Daniel Trejo
Gene expression data
■ M3D (Faith et al. 2008): 247 experiments and 5520 probes.
■ DeRisi et al. (1997), diauxic shift: 7 time points. Samples were taken at 0 hr; 9.5 hr; 11.5 hr; 13.5 hr; 15.5 hr; 18.5 hr and 20.5 hr.
■ Brauer et al. (2005), diauxic shift: 14 time points, 1 chemostat (steady state). Samples were taken at 7.25 hr; 7.5 hr; 7.75 hr; 8 hr; 8.25 hr; 8.5 hr; 8.75 hr; 9 hr; 9.25 hr; 9.5 hr; 9.75 hr; 10 hr.
Credit : Daniel Trejo
Network completion from GE data
■ A "core" state-of-the-art network model (regulation + metabolic) is already available
– Zimmer is not perfect (false negatives)
– Need to get a confidence over Zimmer edges and quantify them
– Need to identify new likely edges/nodes (some known influential genes do not occur in this model)
■ (Few) public dynamic GE datasets are available
– Noisy, with a high #genes/#observations ratio ("fat" data)
$$\frac{\bar{X}_a - \bar{X}_r}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_r^2}{n_r}}}$$
a: activated, r: repressed, n: size, $\bar{X}$: mean, $s^2$: variance
[Figure: density of target gene expression for activated targets and repressed targets; labels: Active, Inactive]
Nicolle et al., CoRegNet: reconstruction and integrated analysis of co-regulatory networks, Bioinformatics, 2015
Elati et al., LICORN: learning co-operative regulation networks from expression data, Bioinformatics, 23:2407-2414, 2007
Tool availability: CoRegNet
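The statistic on this slide compares the mean expression of a regulator's activated targets with that of its repressed targets, scaled by their pooled standard error (the form of a Welch t statistic). A minimal sketch follows, assuming sample variances are used; the function name `influence` and the toy values are hypothetical and this is not the CoRegNet API.

```python
from math import sqrt
from statistics import mean, variance


def influence(activated, repressed):
    """(mean_a - mean_r) / sqrt(s2_a / n_a + s2_r / n_r) for one sample."""
    n_a, n_r = len(activated), len(repressed)
    if n_a < 2 or n_r < 2:
        raise ValueError("need at least two targets in each group")
    # Assumption: s2 is the sample variance (n - 1 in the denominator).
    s2_a, s2_r = variance(activated), variance(repressed)
    return (mean(activated) - mean(repressed)) / sqrt(s2_a / n_a + s2_r / n_r)


# Toy expression values of one regulator's targets in a single sample.
activated = [1.0, 2.0, 3.0]
repressed = [-1.0, 0.0, 1.0]
score = influence(activated, repressed)
```

A large positive score means the activated targets are expressed well above the repressed ones, which suggests the regulator is active in that sample. The function fails with a division by zero if both groups have zero variance.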
Variable selection : Influential Transcription Factors
Learning algorithm for "fat" data (LIPN)
■ Proposal: model averaging method over multiple spanning arborescences
– Biased components, to offset "data fragmentation" of fat data
– Introduction of model diversity to get better results
  - Data perturbation + Edge sampling
– On regulatory structure only
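The diversity step (data perturbation plus edge sampling) might look like the sketch below. The function `perturb`, its two ratio parameters and the toy data are assumptions for illustration; the actual sampling scheme of the method is not given on the slide.

```python
import random


def perturb(samples, candidate_edges, row_ratio, edge_ratio, rng):
    """Return a bootstrap resample of the observations and a random edge subset."""
    n_rows = max(1, round(row_ratio * len(samples)))
    n_edges = max(1, round(edge_ratio * len(candidate_edges)))
    rows = [rng.choice(samples) for _ in range(n_rows)]  # with replacement
    edges = rng.sample(candidate_edges, n_edges)         # without replacement
    return rows, edges


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [[0.1, 0.4, 0.3], [0.2, 0.1, 0.9], [0.7, 0.5, 0.2], [0.3, 0.8, 0.6]]
candidate_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")]
rows, edges = perturb(samples, candidate_edges, row_ratio=1.0, edge_ratio=0.5, rng=rng)
```

Each call yields one perturbed dataset and one partial set of candidate edges, on which a single arborescence would then be learned.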
[Figure: learning pipeline. Data set → resampled datasets → partial weighted digraphs → arborescences → consensual model → edge ranking (A->B 1, B->C 1, A->C 0.5, …) → threshold]
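The final steps of the pipeline (consensual model, edge ranking, threshold) can be sketched as a vote over the learned arborescences. This sketch assumes the consensus score of an edge is simply the fraction of models that contain it; the actual weighting used by the method may differ, and the function name and toy models are hypothetical.

```python
from collections import Counter


def consensus(models, threshold):
    """Rank edges by the fraction of models containing them, then threshold."""
    counts = Counter(edge for model in models for edge in set(model))
    ranked = sorted(
        ((edge, count / len(models)) for edge, count in counts.items()),
        key=lambda item: (-item[1], item[0]),  # best score first, ties by name
    )
    kept = [edge for edge, score in ranked if score >= threshold]
    return ranked, kept


# Four toy arborescences over genes A, B, C (each node has at most one parent).
models = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("A", "C")],
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "C")],
]
ranked, kept = consensus(models, threshold=0.6)
```

Here A->B scores 1.0, B->C scores 0.75 and A->C scores 0.25, so only the first two edges survive a threshold of 0.6.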
Learning algorithm for "fat" gene expression data
■ Results: Tested on DREAM 8 Challenge, with promising
results (top 3)
■ Digest: high impact of diversity parameters (sampling
ratios)
Coutant et al., Jobim 2017 and MLSB 2017
■ Network inference over subsets of the yeast genes
– Subset of genes provided by Evry University, from CoRegNet results
– Learning on different node sets and different data settings
– With or without prior information (e.g. Zimmer edges)
Adalab results
Simulation in AdaLab
GRN model: Gaussian linear Bayesian network
Given continuous variables X and Y1, Y2, …, Yk, X has a linear Gaussian model if …
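In a Gaussian linear Bayesian network, each gene is drawn from a normal distribution whose mean is a linear function of its parents. The sketch below forward-samples such a network in topological order. The gene names, weights and intercepts are hypothetical, and modelling a knock-out by clamping the gene to 0 is an assumption, not the AdaLab simulator's documented behaviour.

```python
import random


def simulate(order, parents, weights, intercept, noise_sd, knockouts=(), rng=None):
    """Forward-sample a linear Gaussian network; `order` must be topological."""
    rng = rng or random.Random(0)
    state = {}
    for gene in order:
        if gene in knockouts:
            state[gene] = 0.0  # assumption: a knocked-out gene is clamped to 0
            continue
        mean = intercept[gene] + sum(
            weights[(parent, gene)] * state[parent] for parent in parents[gene]
        )
        state[gene] = rng.gauss(mean, noise_sd[gene])
    return state


# Hypothetical three-gene network: TF1 activates G, TF2 represses G.
order = ["TF1", "TF2", "G"]
parents = {"TF1": [], "TF2": [], "G": ["TF1", "TF2"]}
weights = {("TF1", "G"): 2.0, ("TF2", "G"): -1.0}
intercept = {"TF1": 1.0, "TF2": 0.5, "G": 0.0}
noise_sd = {"TF1": 0.0, "TF2": 0.0, "G": 0.0}  # zero noise: deterministic run

wild_type = simulate(order, parents, weights, intercept, noise_sd)
ko_tf2 = simulate(order, parents, weights, intercept, noise_sd, knockouts={"TF2"})
```

With zero noise, knocking out the repressor TF2 raises G from 1.5 to 2.0, the kind of genotype-to-state prediction a KO simulation is meant to produce.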
Simulation in AdaLab
■ Possibility to make predictions from the model
– Exporting to the AdaLab simulator format
– Directly interfacing with CoRegFlux from Evry University
[Figure: Glucose, Ethanol, Biomass]
Ongoing work
Model revision
§ GRN only
§ Experiments: gene KO -> growth curves
§ Gene states over time are not observed: rely on simulation / inference
- Infer partial gene states consistent with observed growth curves (backward simulation)
- Forward GRN simulation given KO
- A gene is inconsistent if its forward and backward simulated states "disagree"
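The consistency check between forward and backward simulation can be sketched as a comparison over the genes for which a backward state exists (the backward inference is only partial). The tolerance parameter, the numeric encoding of states and the gene names are assumptions for illustration.

```python
def inconsistent_genes(forward, backward, tolerance=0.0):
    """Genes whose forward and backward simulated states disagree."""
    return sorted(
        gene
        for gene, state in backward.items()
        if gene in forward and abs(forward[gene] - state) > tolerance
    )


# Hypothetical states (1 = active, 0 = inactive) under one KO experiment.
forward = {"g1": 1, "g2": 0, "g3": 1}   # from forward GRN simulation
backward = {"g2": 1, "g3": 1}           # partial: g1 could not be inferred
flagged = inconsistent_genes(forward, backward)
```

Only g2 is flagged: g3 agrees in both directions, and g1 has no backward state to compare against.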
Model revision
§ Ranking nodes with respect to their observed inconsistency (taking into account their neighborhood in the model)
§ Candidate revisions: modifying the Markov blanket of highly ranked nodes (classic rewires: adding a link, deleting a link, inverting a link)
§ Simulating these updated models for KO experiments
§ Selecting the KO experiment for which those models most disagree
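The last step, choosing the KO experiment on which the candidate models most disagree, can be sketched as follows. Two assumptions are made here: each model's prediction is summarised by a single growth value, and disagreement is measured by the variance of those values across models. The slide does not specify either choice, and the KO names are hypothetical.

```python
from statistics import pvariance


def most_discriminating_ko(predictions):
    """Pick the KO whose predicted growth varies most across candidate models.

    predictions[ko] holds one predicted growth value per candidate model.
    """
    # Iterating over sorted names makes ties resolve deterministically.
    return max(sorted(predictions), key=lambda ko: pvariance(predictions[ko]))


predictions = {
    "ko_g1": [0.50, 0.52, 0.49],  # models nearly agree
    "ko_g2": [0.10, 0.60, 0.35],  # models strongly disagree
    "ko_g3": [0.30, 0.30, 0.30],  # models agree exactly
}
best = most_discriminating_ko(predictions)
```

Running the KO with the widest spread of predictions is the experiment most likely to rule out some of the candidate revisions.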
Growth -> Metabolic Genes : workflow
[Figure: workflow from original growth data to partial metabolic gene inference. Labels: metabolite and reaction bounds over time; FBA with known growth; GPR_i values of the FBA program; GPR_i max / min expressions]
GPR_1 = max(gene1, min(gene2, gene3), …)
GPR_2 = min(gene4, gene2, gene1)
….
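The max / min form of the GPR (gene-protein-reaction) rules can be evaluated directly from gene expression levels: OR becomes max (any isoenzyme suffices) and AND becomes min (a complex is limited by its weakest subunit). The sketch below uses only the terms that appear in the two rules shown; the elided terms are left out, and the expression values and helper names are hypothetical.

```python
def gpr_or(*values):
    """Isoenzymes: the reaction is as active as its best alternative."""
    return max(values)


def gpr_and(*values):
    """Enzyme complex: the reaction is limited by its least expressed subunit."""
    return min(values)


# Hypothetical normalised expression levels.
expr = {"gene1": 0.2, "gene2": 0.8, "gene3": 0.5, "gene4": 0.9}

gpr_1 = gpr_or(expr["gene1"], gpr_and(expr["gene2"], expr["gene3"]))
gpr_2 = gpr_and(expr["gene4"], expr["gene2"], expr["gene1"])
```

Partial metabolic gene inference runs in the opposite direction: given GPR_i values obtained from FBA with known growth, these expressions constrain which gene levels are compatible with them.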