Annotation Analytics for Gene and Protein functions Nigam Shah, - - PowerPoint PPT Presentation



SLIDE 1

Annotation Analytics for Gene and Protein functions

Nigam Shah, MBBS, PhD nigam@stanford.edu

SLIDE 2

Annotation service

Process textual metadata to automatically tag text with as many ontology terms as possible.

107 million calls, ~1000 GB data
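
The tagging idea above can be sketched as a dictionary lookup. This is only an illustration of matching text against ontology labels, not the annotation service itself: the `LEXICON` entries, the `EX:` term IDs, and the `annotate` function are all hypothetical names made up for this example. Overlapping hits are kept on purpose, in the spirit of "as many ontology terms as possible".

```python
import re

# Hypothetical mini-lexicon: label -> term ID.
# The "EX:" IDs are placeholders, not identifiers from any real ontology.
LEXICON = {
    "down's syndrome": "EX:0001",
    "melanoma": "EX:0002",
    "breast cancer": "EX:0003",
    "cancer": "EX:0004",
}


def annotate(text, lexicon=LEXICON):
    """Return sorted (start, end, matched_text, term_id) tuples for every label hit."""
    hits = []
    for label, term_id in lexicon.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), term_id))
    return sorted(hits)


if __name__ == "__main__":
    for hit in annotate("Breast cancer and melanoma"):
        print(hit)
```

A production annotator would also need synonym expansion, tokenization, and negation handling, none of which this sketch attempts.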

SLIDE 3

Resource index

  • PubMed abstracts
  • Adverse Events (AERS)
  • GEO
  • ...
  • Clinical Trials
  • DrugBank

Won 1st prize at the 2010 Semantic Web Challenge @ ISWC

SLIDE 4

Understanding the genome

  • Units of study range in length from ‘whole chromosome’ to ‘single nucleotide’

  • E.g. three copies of Chr. 21 → Down’s syndrome

  • The focus is on finding the functional associations of strings in the genome

SLIDE 5

Generic GO-based analysis routine

[Figure labels: Genome; Reference set; Study set]

  • Get annotations for each gene in a set

  • Count the occurrence of each annotation term in the study set

  • Count the occurrence of that term in some reference set (whole genome?)

  • P-value for how surprising their overlap is.
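
The four steps above can be sketched in a few lines. The slide does not say which statistical test is used; this example assumes a one-sided hypergeometric test, which is a common choice for this kind of over-representation analysis. The function names (`hypergeom_sf`, `enrichment`) and the input layout (a dict from gene to a set of terms) are assumptions made for illustration, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the term (one-sided hypergeometric tail)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(annotations, study, reference):
    """annotations: dict gene -> set of terms (assumed layout).
    Returns {term: (count_in_study, count_in_reference, p_value)}."""
    reference = set(reference)
    study = set(study) & reference  # study genes must belong to the reference set
    # Steps 1-3: fetch annotations and count each term in both sets.
    ref_counts = Counter(t for g in reference for t in annotations.get(g, ()))
    study_counts = Counter(t for g in study for t in annotations.get(g, ()))
    N, n = len(reference), len(study)
    # Step 4: p-value for how surprising the overlap is.
    return {
        term: (k, ref_counts[term], hypergeom_sf(k, N, ref_counts[term], n))
        for term, k in study_counts.items()
    }


if __name__ == "__main__":
    # Toy data with made-up gene and term names.
    toy = {
        "g1": {"A"}, "g2": {"A"}, "g3": {"A", "B"},
        "g4": {"B"}, "g5": set(), "g6": {"B"},
    }
    for term, stats in sorted(enrichment(toy, ["g1", "g2", "g3"], toy.keys()).items()):
        print(term, stats)
```

With many terms tested at once, the raw p-values would normally be adjusted (for example with a Bonferroni or false-discovery-rate correction) before being interpreted.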

SLIDE 6

Annotation Analytics Landscape

[Figure labels. Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals. Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets. Tools: Genes2MSH, GOPubMed. One region is marked "?".]

SLIDE 7

Mutation enrichment

SLIDE 8

Profiling a set of Aging genes

[Figure labels: Disease Ontology; Genome; ~30% of genome; 261 age-related genes]

SLIDE 9

Annotation Analytics Landscape

[Figure labels. Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals. Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets. Tools: Genes2MSH, GOPubMed. Regions now filled in: Aging, Mutations.]

What else can we do?

1. Units of study range in length from ‘whole chromosome’ to ‘single nucleotide’.
2. The focus is on finding the functional associations of strings in the genome.
3. For each type of “string”, there will be some textual descriptions that you can process computationally.

SLIDE 10


The team @ www.bioontology.org/project-team

NIH Roadmap grant U54 HG004028