

slide-1
SLIDE 1

Discovery in heterogeneous biological networks: the Adalab experience

Céline Rouveirol, LIPN

slide-2
SLIDE 2

The Reproducibility Crisis

■ One of the most important current issues in biology is ‘the reproducibility crisis’: billions of euros wasted.
■ ‘There is growing alarm about results that cannot be reproduced. Explanations include increased levels of scrutiny, complexity of experiments and statistics, and pressures on researchers.’ Nature 2016.
■ We require automation to ensure reproducibility.

Credit : Ross King

slide-3
SLIDE 3

The Concept of a Robot Scientist

Computer systems capable of originating their own experiments, physically executing them, interpreting the results, and then repeating the cycle.

[Diagram: the robot scientist cycle, with stages Background Knowledge, Hypothesis Formation, Experiment selection, Robot, Results Interpretation, Analysis, Final Theory]

Journées BIOSS-IA - 22 June 2017

Credit : Ross King

slide-4
SLIDE 4

Eve: Robot Scientist

Credit : Ross King

slide-5
SLIDE 5

Scientific Goals

■ To make scientific research more efficient: cheaper, faster, better.
■ Our vision is that within 10 years many scientific discoveries will be made by teams of human and robot scientists.
■ This collaboration will produce scientific knowledge more efficiently than either could alone.

Credit : Ross King

slide-6
SLIDE 6

Scientific Goals

■ A framework for semi-automated and automated knowledge discovery by teams of human and robot scientists.
■ Integrating advances in knowledge representation, ontology engineering, semantic technologies, machine learning, bioinformatics, and automated experimentation.

Credit : Ross King

slide-7
SLIDE 7

The Diauxic Shift

■ Yeast (S. cerevisiae).
■ First turn sugar into ethanol.
■ Then turn ethanol into CO2.
■ Cancer
■ Ageing

Credit : Ross King

slide-8
SLIDE 8

The Diauxic Shift

Diauxic shift: when yeast (Saccharomyces cerevisiae) is grown on glucose with oxygen, it first produces ethanol; when the glucose is exhausted, it reorganises (shifts) its metabolism to grow using the ethanol it previously produced (Dickinson, 1999).

[Figure: typical culture-density profile of a fermentative batch culture of S. cerevisiae]
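The two growth phases described above can be illustrated with a toy batch-culture simulation. This is only a sketch: the Monod kinetics, the glucose-repression term and every numerical constant below are assumptions chosen for illustration, not values from the Adalab models.

```python
def simulate_diauxic_shift(hours=30.0, dt=0.01):
    """Euler-integrate a toy two-phase batch culture (all constants are hypothetical)."""
    glucose, ethanol, biomass = 10.0, 0.0, 0.1  # g/L, assumed initial state
    mu_glc, mu_eth = 0.4, 0.1      # 1/h, maximal growth rate on each substrate
    k_glc, k_eth = 0.5, 0.5        # g/L, Monod half-saturation constants
    k_rep = 0.1                    # g/L, glucose level that half-represses ethanol use
    y_glc, y_eth = 0.15, 0.5       # g biomass formed per g substrate consumed
    eth_per_glc = 0.4              # g ethanol excreted per g glucose consumed

    trajectory = []
    for step in range(int(hours / dt) + 1):
        trajectory.append((step * dt, glucose, ethanol, biomass))
        # Phase 1: fermentative growth on glucose, excreting ethanol.
        growth_glc = mu_glc * glucose / (k_glc + glucose) * biomass
        # Phase 2: growth on ethanol, repressed while glucose is still present.
        repression = k_rep / (k_rep + glucose)
        growth_eth = mu_eth * ethanol / (k_eth + ethanol) * biomass * repression
        glucose_used = growth_glc / y_glc
        glucose = max(0.0, glucose - dt * glucose_used)
        ethanol = max(0.0, ethanol + dt * (eth_per_glc * glucose_used - growth_eth / y_eth))
        biomass += dt * (growth_glc + growth_eth)
    return trajectory


trajectory = simulate_diauxic_shift()
peak_time, _, peak_ethanol, _ = max(trajectory, key=lambda row: row[2])
print(f"ethanol peaks at {peak_ethanol:.2f} g/L around t = {peak_time:.1f} h")
```

In this sketch the ethanol curve rises while glucose is consumed and falls afterwards, while biomass keeps increasing at a slower rate in the second phase, which is the qualitative shape of a diauxic growth curve.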

Credit : Daniel Trejo

slide-9
SLIDE 9

Key Challenges

■ The AdaLab system needs to be:
  – autonomous and perceptive to the requirements of its human scientific collaborators.
  – able to continuously learn, adapt and improve in the complex “real world” environment of scientific research.
  – capable of continuous cycles of scientific hypothesis formation and experimentation that will improve its scientific knowledge (models).

Credit : Ross King

slide-10
SLIDE 10

AdaLab Structure

Credit : Ross King

slide-11
SLIDE 11

Machine Learning objectives

■ Context
  – Knowledge intensive
  – Scarce data
  – Limited experiment panel (gene knock out, growth curve)
■ Learning probabilistic graphical models from scarce data
■ Model revision from partially observed data
■ Experiment design

slide-12
SLIDE 12

Objectives

Data collection

■ Collection of bioinformatic data about the yeast diauxic shift.
■ Development of an integrated metabolic and gene signalling network

Simulation

■ Development of simulation tools, including both regulatory and metabolic model simulation
■ Phenotype predictions from genotype using the integrated model.

slide-13
SLIDE 13

Diauxic Shift model

  • 100 genes, of which 68 are transcription factors
  • 322 interactions in total, of which 212 are proven
  • 1133 annotations from 410 articles

Geistlinger et al. A comprehensive gene regulatory network for the diauxic shift in Saccharomyces cerevisiae. Nucleic Acids Res. 2013

Credit : Daniel Trejo

slide-14
SLIDE 14

Metabolic network models

Credit : Daniel Trejo

slide-15
SLIDE 15

Gene expression data

■ M3D (Faith et al. 2008): 247 experiments and 5520 probes.
■ DeRisi et al. (1997), diauxic shift: 7 time points. Samples were taken at 0 hr; 9.5 hr; 11.5 hr; 13.5 hr; 15.5 hr; 18.5 hr and 20.5 hr.
■ Brauer et al. (2005), diauxic shift: 14 time points, 1 chemostat (steady state). Samples taken at 7.25 hr; 7.5 hr; 7.75 hr; 8 hr; 8.25 hr; 8.5 hr; 8.75 hr; 9 hr; 9.25 hr; 9.5 hr; 9.75 hr; 10 hr.

Credit : Daniel Trejo

slide-16
SLIDE 16

Network completion from GE data

■ A « core » state-of-the-art network model (regulation + metabolic) is already available
  – The Zimmer model is not perfect (false negatives)
  – Need to get a confidence over Zimmer edges and quantify them
  – Need to identify new likely edges/nodes (some known influential genes do not occur in this model)
■ (Few) public dynamic GE datasets are available
  – Noisy, high #genes/#observations ratio (« fat » data)

slide-17
SLIDE 17

Variable selection : Influential Transcription Factors

$$\frac{\bar{X}_a - \bar{X}_r}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_r^2}{n_r}}}$$

a: activated, r: repressed, n: size, $\bar{X}$: mean, $s^2$: variance

[Figure: density of target gene expression for activated and repressed targets, with Active and Inactive states]

Nicolle et al., CoRegNet: reconstruction and integrated analysis of co-regulatory networks, Bioinformatics, 2015

Elati et al., LICORN: learning co-operative regulation networks from expression data, Bioinformatics, 23:2407-2414, 2007

Tool availability: CoRegNet
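The statistic on this slide has the form of a Welch-type t statistic comparing the expression of a transcription factor's activated targets with that of its repressed targets. A minimal sketch of that computation follows; the function name `influence` and the sample values are hypothetical, and this is not the CoRegNet implementation.

```python
import math
from statistics import mean, variance


def influence(activated, repressed):
    """Welch-type t statistic: (mean_a - mean_r) / sqrt(s2_a / n_a + s2_r / n_r)."""
    n_a, n_r = len(activated), len(repressed)
    if n_a < 2 or n_r < 2:
        raise ValueError("need at least two targets in each group")
    # statistics.variance is the sample variance (n - 1 denominator).
    standard_error = math.sqrt(variance(activated) / n_a + variance(repressed) / n_r)
    return (mean(activated) - mean(repressed)) / standard_error


# Hypothetical expression values of one TF's targets in a single sample.
activated_targets = [1.2, 0.8, 1.0, 1.4]
repressed_targets = [-0.9, -1.1, -0.7, -1.3]
print(influence(activated_targets, repressed_targets))
```

A large positive value suggests the transcription factor is active in that sample (activated targets up, repressed targets down), and a large negative value suggests the opposite. The function raises `ZeroDivisionError` if both groups have zero variance.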

slide-18
SLIDE 18

Learning algorithm for "fat" data (LIPN)

■ Proposal: model averaging method over multiple spanning arborescences
  – Biased components, to offset "data fragmentation" of fat data
  – Introduction of model diversity to get better results
    • Data perturbation + edge sampling
  – On regulatory structure only

[Diagram: Data set → resampled datasets → partial weighted digraphs → arborescences → consensual model → edge ranking (A->B 1, B->C 1, A->C 0.5, …) → threshold]
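The last stage of this pipeline (consensual model, edge ranking, threshold) can be sketched as a simple vote over the learned arborescences. The sketch below assumes an unweighted vote in which an edge's score is the fraction of models containing it; the actual method may weight edges differently, and the three example models are hypothetical.

```python
from collections import Counter


def consensus_edges(models, threshold=0.5):
    """Rank directed edges by the fraction of models that contain them."""
    if not models:
        raise ValueError("need at least one model")
    counts = Counter(edge for model in models for edge in set(model))
    ranked = sorted(
        ((edge, count / len(models)) for edge, count in counts.items()),
        key=lambda item: (-item[1], item[0]),  # highest score first, ties by name
    )
    return [(edge, score) for edge, score in ranked if score >= threshold]


# Hypothetical arborescences learned from three resampled datasets.
models = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("A", "C")],
]
for (source, target), score in consensus_edges(models, threshold=0.0):
    print(f"{source}->{target} {score:.2f}")
```

Raising `threshold` keeps only the edges that are stable across resamples, which is one way to trade recall for precision on fat data.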

slide-19
SLIDE 19

Learning algorithm for "fat" gene expression data

■ Results: tested on the DREAM 8 Challenge, with promising results (top 3)
■ Digest: high impact of diversity parameters (sampling ratios)

Coutant et al., JOBIM 2017 and MLSB 2017

slide-20
SLIDE 20

Adalab results

■ Network inference over subsets of the yeast genes
  – Subset of genes given by Evry university, from CoRegNet results
  – Learning on different node sets and different data settings
  – With or without prior information (e.g. Zimmer edges)

slide-21
SLIDE 21

Simulation in Adalab

GRN model : Gaussian linear Bayesian network

Given continuous variables … [remainder of the definition is unreadable in the source]
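In a Gaussian linear Bayesian network, each variable is normally distributed around a linear combination of its parents' values. The sketch below samples such a network in topological order; the three-gene structure, the weights, and the choice to model a knock-out by clamping the gene to zero are all assumptions made for illustration.

```python
import random


def sample_network(parents, weights, bias, noise_sd, order, knocked_out=(), rng=None):
    """Draw one joint sample; `order` must list every parent before its children."""
    rng = rng or random.Random(0)
    values = {}
    for gene in order:
        if gene in knocked_out:
            values[gene] = 0.0  # assumption: a knock-out clamps expression to zero
            continue
        # Mean is the bias plus the weighted sum of the parents' sampled values.
        mean = bias[gene] + sum(weights[(p, gene)] * values[p] for p in parents[gene])
        values[gene] = rng.gauss(mean, noise_sd[gene])
    return values


# Hypothetical three-gene regulatory network: TF1 -> G1, TF1 -> G2, G1 -> G2.
parents = {"TF1": [], "G1": ["TF1"], "G2": ["TF1", "G1"]}
weights = {("TF1", "G1"): 2.0, ("TF1", "G2"): 1.0, ("G1", "G2"): 1.0}
bias = {"TF1": 1.0, "G1": 0.5, "G2": 0.0}
noise_sd = {"TF1": 0.1, "G1": 0.1, "G2": 0.1}
order = ["TF1", "G1", "G2"]

print(sample_network(parents, weights, bias, noise_sd, order))
print(sample_network(parents, weights, bias, noise_sd, order, knocked_out={"TF1"}))
```

Comparing the two samples shows how a knock-out of an upstream regulator propagates to its descendants through the linear dependencies.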

slide-22
SLIDE 22

Simulation in Adalab

■ Possibility to make predictions from the model
  – Exporting to Adalab simulator format
  – Directly interfacing with CoRegFlux from Evry university

[Figure: Glucose, Ethanol and Biomass curves]

slide-23
SLIDE 23

Ongoing work


slide-24
SLIDE 24

Model revision

§ GRN only
§ Experiments : gene KO -> growth curves
§ Gene states over time are not observed : rely on simulation / inference
  • Infer partial gene states consistent with observed growth curves (backward simulation)
  • Forward GRN simulation given KO
  • A gene is inconsistent if its forward and backward simulated states « disagree »
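The consistency check can be sketched as a comparison between forward-simulated states and the partial states inferred backward from growth curves. The slide does not say how « disagree » is measured; the absolute-difference tolerance used below is an assumption, as are the gene names and values.

```python
def inconsistent_genes(forward, backward, tolerance=0.5):
    """Return {gene: gap} for genes whose two simulated states differ by more than tolerance."""
    gaps = {}
    for gene, inferred in backward.items():
        # Backward inference is partial: skip genes it could not determine.
        if inferred is None or gene not in forward:
            continue
        gap = abs(forward[gene] - inferred)
        if gap > tolerance:
            gaps[gene] = gap
    return gaps


# Hypothetical states for one knock-out experiment at one time point.
forward_states = {"G1": 0.9, "G2": 0.1, "G3": 0.7}
backward_states = {"G1": 1.0, "G2": 0.8, "G3": None}
print(inconsistent_genes(forward_states, backward_states))
```

Genes left undetermined by the backward step are neither consistent nor inconsistent in this sketch; they simply contribute no evidence.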

slide-25
SLIDE 25

Model revision

§ Ranking nodes with respect to their observed inconsistency (taking into account their neighborhood in the model)
§ Candidate revisions : modifying the Markov blanket of highly ranked nodes (classic rewires : adding a link, deleting a link, inverting a link)
§ Simulating these updated models for KO experiments
§ Select the KO experiment for which those models most disagree
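The final selection step can be sketched as follows, assuming each candidate model yields a predicted growth curve for each knock-out and that disagreement is measured as the variance across models, averaged over time points. That disagreement measure, and the data below, are assumptions for illustration.

```python
from statistics import pvariance


def select_knockout(predictions):
    """predictions[ko][model] is a predicted growth curve (list of floats of equal length)."""
    def disagreement(curves):
        # Variance across models at each time point, averaged over the curve.
        per_time_point = [pvariance(point) for point in zip(*curves)]
        return sum(per_time_point) / len(per_time_point)

    scores = {ko: disagreement(list(by_model.values())) for ko, by_model in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical predictions from two candidate revisions for two knock-outs.
predictions = {
    "geneA": {"model1": [0.1, 0.5, 1.0], "model2": [0.1, 0.5, 1.0]},
    "geneB": {"model1": [0.1, 0.5, 1.0], "model2": [0.1, 0.2, 0.3]},
}
best, scores = select_knockout(predictions)
print(best, scores)
```

An experiment on which all candidates agree cannot discriminate between them, so the knock-out with the highest disagreement is the most informative one to run next.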

slide-26
SLIDE 26

Growth -> Metabolic Genes : workflow

[Workflow diagram; box labels: Original growth data; FBA with known growth; Metabolites reactions bounds over time; GPR_i values of FBA program; Constraint & value propagation (incomplete); Partial metabolic gene inference]

GPR_i max / min expressions:

GPR_1 = max(gene1, min(gene2, gene3), …)
GPR_2 = min(gene4, gene2, gene1)
….
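The GPR expressions on this slide combine gene values with max and min. A minimal evaluator for such nested expressions is sketched below; the tuple encoding of rules and the gene values are hypothetical, and the elided terms of GPR_1 are left out of the example. It computes GPR values from gene values, whereas the workflow on the slide goes the other way, propagating constraints from known GPR values back to the genes.

```python
def gpr_value(rule, expression):
    """Evaluate a nested rule: a gene name, or ("max" | "min", operand, operand, ...)."""
    if isinstance(rule, str):
        return expression[rule]
    operator, *operands = rule
    values = [gpr_value(operand, expression) for operand in operands]
    if operator == "max":
        return max(values)
    if operator == "min":
        return min(values)
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical gene expression levels.
expression = {"gene1": 0.2, "gene2": 0.9, "gene3": 0.6, "gene4": 0.7}

gpr_1 = ("max", "gene1", ("min", "gene2", "gene3"))
gpr_2 = ("min", "gene4", "gene2", "gene1")
print(gpr_value(gpr_1, expression), gpr_value(gpr_2, expression))
```

Because min and max are monotone, a known GPR value bounds its operands: a min result is a lower bound on every operand, and a max result is an upper bound on every operand. That is presumably why the inferred gene states can only be partial.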

slide-27
SLIDE 27

Growth curves -> Metabolic Genes : current results

[Figure: Partial metabolic gene inference. Green & Blue : original curves; Red : inferred curve]

[Figure: Growth curve reconstruction quality. Red : WITHOUT metabolite data; Blue : WITH metabolite data]

slide-28
SLIDE 28

Adalab participants

■ Université Paris Nord : Dominique Bouthinon, Coutant, Guillaume Santini, Henry Soldano
■ Université d’Evry : Mohamed Elati, Daniel Trejo
■ Katholieke Universiteit Leuven / INRIA Lille : Jan Ramon
■ Manchester University : Martin Carpenter, Ross King, Katherine Ropper
■ Brunel University : Jacek Grzebyta, Larisa Soldatova