1. Integration of proteomics and transcriptomics data to model the dynamics of gene expression



SLIDE 1
  • 1. Integration of proteomics and transcriptomics data to model the dynamics of gene expression

The processes that drive gene transcription and translation remain poorly understood, as is evident from long non-coding RNAs (lncRNAs), small open reading frames (sORFs), and complex correlations between protein and mRNA abundances. We will therefore develop integrative strategies to explore these issues in more detail.

SLIDE 2
SLIDE 3

PROTEOFORMS

  • unravel proteome complexity
  • aided by genomics/transcriptomics/translatomics
  • novel sequencing technologies: PacBio (SMRT-seq) and Oxford Nanopore (MinION)

  • lncRNAs, sORFs, uORFs, SAAV…

Issues - Opportunities: DBsize+++, matching cross-omics datasets, feed public resources

SLIDE 4

(RE-)ANNOTATION

  • non-model (micro)organisms
  • rescan public repos (key topic 2), stringent workflow to reannotate novel events

  • PRIDE - Ensembl integration (ProteoAnnotator)

Issues - Opportunities: metadata, stringent filter for false positives (FP), robust workflows

  • Identify novelty in the proteome: reannotate GENCODE, Ensembl, UCSC

  • completely novel coding gene
  • novel isoform
  • new CDS in non-coding
SLIDE 5

QUANTIFICATION

  • compare NGS (semi-)quantitative measures to MS measures (spectral, label or intensity based, SRM, DIA)

  • longitudinal studies

Issues - Opportunities: robust implementations for integration

  • systems biology / personalized medicine
  • assess post-transcriptional or -translational regulation
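One simple way to compare an NGS-based measure with an MS-based measure for the same genes is a rank correlation, which does not require the two scales to be comparable. The sketch below is only an illustration under stated assumptions: the gene names and abundance values are invented, and Spearman correlation is our choice here, not a method named on the slide.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long inputs with at least 2 values")
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: RNAseq abundance (TPM) and MS intensity per gene.
mrna_tpm = {"geneA": 120.0, "geneB": 15.5, "geneC": 3.2, "geneD": 48.0}
protein_intensity = {"geneA": 9.1e7, "geneB": 2.0e7, "geneC": 4.4e6, "geneD": 1.3e7}

# Only genes quantified in both datasets can be compared.
shared = sorted(set(mrna_tpm) & set(protein_intensity))
rho = spearman([mrna_tpm[g] for g in shared],
               [protein_intensity[g] for g in shared])
print(f"Spearman rho over {len(shared)} genes: {rho:.2f}")
```

Genes that deviate strongly from the overall trend are candidates for post-transcriptional or post-translational regulation, although a real analysis would need replicates and proper normalization.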

SLIDE 6

IMMUNOPEPTIDOMICS

  • MHC HLA class I and II immunopeptidome
  • HUPO-HIPP

Issues - Opportunities: create and update atlases (e.g. SysteMHC), create standards for reporting, improve detection of the immunopeptidome

SLIDE 7

TOOL DEVELOPMENT

Specialized DB / Customized DB

Sequence redundancy removal/filtering based on:

  • EST overlap
  • RNAseq/RIBOseq overlap
  • GENCODE annotation
  • chromosome-centric (C-HPP)
  • Retention Time (HiRIEF)

Known/Derived Protein DB

  • Known: protein, genome, EST
  • Derived from NGS: experimental or publicly available (NCBI-SRA, EBI-ENA) exome, RNAseq and RIBOseq data
  • NGS mapping software: Bowtie, BWA, STAR, TopHat
  • Other NGS tools: GATK, samtools/mpileup
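The redundancy removal mentioned above can be illustrated at the sequence level. The following is a minimal sketch, not the filtering used by any of the listed tools: it assumes a FASTA-formatted custom DB, keeps one entry per distinct sequence, and drops sequences that are fully contained in a longer kept sequence. The evidence-based filters on the slide (EST or RNAseq/RIBOseq overlap, GENCODE annotation) are not modelled, and all entry names are invented.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def remove_redundancy(entries):
    """Keep one entry per distinct sequence and drop contained subsequences."""
    unique = {}
    for header, seq in entries:
        # The first header seen for a sequence wins (e.g. the reference entry).
        unique.setdefault(seq, header)
    kept = []
    # Longest first, so containment only has to be checked against kept entries.
    for seq in sorted(unique, key=len, reverse=True):
        if not any(seq in longer for _, longer in kept):
            kept.append((unique[seq], seq))
    return kept


# Hypothetical custom DB: one reference entry plus three RNAseq-derived entries.
fasta_text = """>known_1
MKTAYIAKQR
>rnaseq_1
MKTAYIAKQR
>rnaseq_2
TAYIAK
>rnaseq_3
MKTAYLAKQR
"""

filtered = remove_redundancy(read_fasta(fasta_text.splitlines()))
for header, seq in filtered:
    print(f">{header}\n{seq}")
```

The containment check is quadratic in the number of sequences, which is fine for a sketch but would need indexing for a genome-scale DB.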

Custom DB creation

Automated Pipelines

PROTEOFORMER

  • uses RIBOseq
  • includes isoform, expression, TIS, variant info

  • Galaxy + command line interface (CLI)

CustomProDB

  • uses RNAseq
  • includes expression, variant, junction info
  • Bioconductor - R language

Quilts

  • uses RNAseq
  • includes junction, variant, fusion info
  • web interface and CLI

SpliceDB

  • uses RNAseq
  • includes variant, junction info
  • CLI

SAP-db, SPLICE-db, REDUCED-db

  • uses RNAseq
  • includes variant, junction and expression info

  • Galaxy

MSMSpdbb (prokaryotic genomes)

  • uses genome sequence
  • CLI

PIT

  • uses RNAseq
  • Galaxy and CLI

Complete proteogenomics pipeline

  • 1. DB creation
  • 2. MS/MS data
  • 3. Peptide identification
  • 4. Validation & interpretation
  • 5. Mapping & visualization

MS/MS data: experimental data or public repositories (PRIDE, PeptideAtlas, MassIVE, ProteomicsDB, Chorus, CPTAC)

Specialized DBs:

  • TISdb: TIS based on RIBOseq
  • ChimerDB, dbCRID, ChiTaRS: fusion, chimeric
  • dbSNP, CanProVar, COSMIC: variation
  • HaltORF: alternative ORF
  • sORFdb: small ORF
  • UTRdb: untranslated regions
  • uORFdb: translated upstream ORF
  • dbRES, DARNED: RNA-editing sites
  • LNCipedia: long non-coding RNA
  • Animal Toxin Annotation: venoms, toxins
  • OMIM: human genes/disorders

Known protein DBs: UniProtKB, RefSeq, Ensembl

[Figure: DB creation workflow. Recoverable labels: 6RF, 3RF and 1RF translation; non-splice-aware mapper; splice-aware alignment; variation calling; gene prediction; homology-based; splice graph; tag-based; redundancy removal/filtering]

Peptide identification:

  • DB search: SearchGUI interface to X!Tandem, MyriMatch, MS Amanda, MS-GF+, OMSSA, Comet, Tide
  • Tag-based, hybrid search: InsPecT, GenoMS (tag-based search); Spider, TagRecon (tag-based search & homology match)
  • De novo or homology search: DeNovoGUI (interface to PepNovo+, DirecTag), UniNovo, MS-Blast, MS-Homology

Tools aiding the mapping and covisualization of proteomics and NGS data (genome-centric)

Galaxy implementations

PROTEOFORMER + SearchGUI/PeptideShaker

  • RIBOseq + proteomics integration
  • genome-centric visualization

SAP-db/SPLICE-db/REDUCED-db + SearchGUI/PeptideShaker

  • RNAseq + proteomics integration

Multiple multi-omics tools are available

  • see examples in (Boekel et al., 2015)

Other stand-alone platforms

PEPPY (eukaryotes); GenoSuite, bacterial-proteogenomic-pipeline (prokaryotes)

  • 6RF genome translation
  • MS-based proteomics integration
  • visualization
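Six-reading-frame (6RF) translation turns a genomic sequence into six candidate protein sequences: three frames on the forward strand and three on the reverse complement. The sketch below uses the standard genetic code and only the Python standard library; the frame labels (`+1` to `-3`) and the example sequence are our own illustrative choices, and the listed tools may handle stop codons, ambiguous bases and frame naming differently.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) give 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame label ('+1'..'+3', '-1'..'-3') to protein."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical 9-nt sequence; '*' marks a stop codon.
frames = six_frame_translation("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

In a real search DB, each frame would typically be split at stop codons and short fragments discarded, which is one reason 6RF databases become so large.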

ENOSI

  • 6RF + RNAseq (using SpliceDB tool)
  • peptide identification
  • visualization + annotation

PGTools

  • RNAseq or Ensembl-derived
  • peptide identification
  • visualization + (onco-)annotation

ProteoAnnotator

  • gene (re-)annotation using MS proteomics
  • using HUPO PSI standard: mzIdentML

Other workflow solutions

Taverna, KNIME, Yabi, bioKepler

Mapping: PMT, PGx, PepLine, IggyPep, proBAMr

Visualization: peptide_to_gff, iPiG, MIMOMICs, PG Nexus using IGV, Protter, VESPA

PeptideShaker

  • Combined PEP/FDR estimation for RIBOseq-derived DB (1RF translation) and reference protein DB (Crappé et al., 2015 & Menschaert et al., 2013)
  • Multistage PEP/FDR estimation using the follow-up analysis for RNAseq-derived DB (3RF translation) and reference protein DB (Nesvizhskii, 2014)

  • Event calling (e.g. novel TIS, novel coding region ...)
  • Annotation analysis (STRING, QuickGO, DAVID, ...)
  • GO enrichment analysis (Ensembl-GO)
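As background for the FDR estimation mentioned in the list above, the sketch below shows a generic target-decoy calculation on a combined search. It is not PeptideShaker's implementation and does not compute PEPs: it assumes a concatenated target-decoy search, a score where higher is better, and the simple estimate FDR = decoys / targets. The PSM scores are invented.

```python
def target_decoy_qvalues(psms):
    """Compute q-values from (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(decoys / max(targets, 1), 1.0))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum taken from the worst score upwards.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


# Hypothetical PSMs: (search engine score, matched a decoy sequence?)
psms = [(9.0, False), (8.5, False), (8.0, True),
        (7.0, False), (6.0, False), (5.0, True)]

scored = target_decoy_qvalues(psms)
accepted = [s for s, is_decoy, q in scored if not is_decoy and q <= 0.25]
print("q-values:", [q for _, _, q in scored])
print("targets accepted at q <= 0.25:", accepted)
```

A multistage strategy would first apply such a calculation against the reference protein DB and then search only the unassigned spectra against the derived DB; that second stage is not shown here.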

Peptide/ProteinProphet

  • within the Trans-Proteomic Pipeline (TPP)

Tools aiding the multi-omics visualization of protein interaction networks: Circos, Cytoscape

SLIDE 8

TOOLS Integration

  • with genomics/transcriptomics => PSI: proBed / proBAM
  • quantitative => 2D annotation enrichment/Perseus
  • with NXTSEQ tools => BioConda + Galaxy, KNIME (key topic 4)

[Figure: genome browser tracks labelled genomic coordinates, proBAM, coverage, junction, proBed, gene structure and alignment, shown in collapsed and expanded views]
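Formats such as proBed and proBAM place peptide identifications on genomic coordinates. The core step is converting a peptide's position in a protein into one or more genomic blocks, which can span an exon junction. The sketch below shows that conversion for a forward-strand transcript only; the function name, the transcript model and the plain four-column BED-style output are assumptions for illustration, and the additional columns defined by the actual proBed format are not reproduced.

```python
def peptide_to_blocks(chrom, exons, cds_start, pep_start, pep_len, name):
    """Map a peptide to genomic blocks on the forward strand.

    exons:     sorted (start, end) pairs, 0-based half-open genomic intervals
    cds_start: genomic position of the first base of the start codon
    pep_start: 0-based index of the peptide's first residue in the protein
    Returns one (chrom, start, end, name) tuple per exon the peptide touches.
    """
    # Peptide span in nucleotides, counted from the CDS start along the mRNA.
    nt_from, nt_to = pep_start * 3, (pep_start + pep_len) * 3
    blocks, consumed = [], 0
    for ex_start, ex_end in exons:
        seg_start = max(ex_start, cds_start)
        if seg_start >= ex_end:
            continue  # exon lies entirely upstream of the CDS start
        seg_len = ex_end - seg_start
        lo = max(nt_from, consumed)
        hi = min(nt_to, consumed + seg_len)
        if lo < hi:
            blocks.append((chrom,
                           seg_start + lo - consumed,
                           seg_start + hi - consumed,
                           name))
        consumed += seg_len
    if sum(end - start for _, start, end, _ in blocks) != pep_len * 3:
        raise ValueError("peptide extends beyond the coding region")
    return blocks


# Hypothetical two-exon transcript; the peptide spans the exon junction.
blocks = peptide_to_blocks(
    chrom="chr1",
    exons=[(100, 130), (200, 260)],
    cds_start=106,
    pep_start=6,
    pep_len=5,
    name="pep1",
)
for block in blocks:
    print("\t".join(str(field) for field in block))
```

Reverse-strand transcripts and an explicit CDS end would need extra handling, which is omitted to keep the sketch short.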

SLIDE 9

TOOLS Statistics & Machine Learning

  • FDR calculation for expanding custom DBs (key topic 3)
  • predictive models for ORF detection, proteogenomics applications
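One reason FDR calculation needs care when custom DBs grow is that novel sequences usually yield few true hits, so a single global FDR can hide a much higher error rate among the novel identifications. The sketch below estimates the FDR separately per peptide class. It is an illustration under stated assumptions: the class labels ("known", "novel"), the fixed score threshold and the counts are invented, and each decoy is assumed to be assignable to the class of the sequence it was derived from.

```python
from collections import defaultdict


def class_specific_fdr(psms, threshold):
    """Estimate FDR = decoys / targets per class for PSMs scoring >= threshold.

    psms: iterable of (score, is_decoy, peptide_class) tuples.
    Returns {peptide_class: fdr}, with None when a class has no target hits.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [targets, decoys]
    for score, is_decoy, peptide_class in psms:
        if score >= threshold:
            counts[peptide_class][1 if is_decoy else 0] += 1
    return {
        cls: (n_decoys / n_targets if n_targets else None)
        for cls, (n_targets, n_decoys) in counts.items()
    }


# Hypothetical PSMs passing or failing a score threshold of 5.0.
example_psms = (
    [(10.0, False, "known")] * 9 + [(10.0, True, "known")]
    + [(10.0, False, "novel")] * 2 + [(10.0, True, "novel")]
    + [(1.0, False, "novel")]  # below threshold, ignored
)

per_class = class_specific_fdr(example_psms, threshold=5.0)
global_fdr = class_specific_fdr(
    [(s, d, "all") for s, d, _ in example_psms], threshold=5.0
)["all"]
print("global FDR:", round(global_fdr, 3))
print("per-class FDR:", {c: round(f, 3) for c, f in per_class.items()})
```

In this invented example the global estimate is about 0.18, while the novel class alone is at 0.5, which is the kind of gap that motivates class-specific or multistage strategies.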
SLIDE 10

TOOLS Visualization - reporting

  • Cytoscape (plugins), Gephi, GraphViz for proteogenomics

visual integration of ShotgunMS, NtermMS, RNAseq, RIBOseq and RRBS viral response (GO annotation) based on PI network or pathway

SLIDE 11

  • Proteoforms - Proteome Complexity
  • (Re-)Annotation
  • Quantification
  • Immunopeptidome
  • Tool Development