In silico blood genotyping from exome sequencing data Silvio Tosatto BioComputing UP, Department of Biology, University of Padova, Italy URL: http://protein.bio.unipd.it/
Today • Personalized genetics has been upon us for some time • How good are we at actually identifying phenotypes from a whole genome?
The CAGI Personal Genome Project (PGP) Challenge • Few goals are more pure to genome interpretation than predicting traits from raw sequence (or genotype) data • In this CAGI challenge, phenotypes/traits are predicted for real people with genetic data • Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10) Dataset provided by George Church
Personal genome project (PGP) ‐ Predict individuals’ phenotype Numerical traits 33. Birth weight (in g) 34. HDL level (in mg/dL) * 35. LDL level (in mg/dL) * 36. Triglyceride level (in mg/dL) * 37. Fasting blood glucose level (in mg/dL) 38. Warfarin dose (in mg) 39. Age at Menarche 40. Annual income (in $)
Blood Groups • Clear genetic cause of phenotypes • Model system for phenotype prediction • Good description in literature • High relevance, especially for blood transfusions (Blood. 2009;114: 248-256)
Example: ABO glycosyltransferase Amino acid residues differing between blood group A- and B-active transferases, respectively (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala) are shown with the single-letter code and their positions indicated. Blood group: ABO; gene: ABO; antigens: A, B, O
Relevant Blood Types
10 out of ca. 30 blood groups are relevant for transfusions
Blood Grp   Genes              Antigens
ABO         ABO                A, B, O
RH          RHCE, RHD          D, E, C plus 50 minor
DUFFY       DARC               FY(a), FY(b)
Kell        KEL                K1, K2 plus 23 minor
Diego       SLC4A1             Di(a), Di(b), Wr(a), Wr(b)
Kidd        SLC14A1            Jk(a), Jk(b)
Lewis       FUT3               a, b
Lutheran    BCAM               Lu(a), Lu(b) plus 15 minor
MNS         GYPA, GYPB, GYPE   M, N, S plus 40 minor
Bombay      FUT1, FUT2         H, secretor
BOOGIE: BlOOd Group IdEntifier • A knowledge-based system to predict blood groups from sequencing data • All 10 groups relevant for blood transfusions are predicted • A specialized genotype-phenotype knowledge base is required
BOOGIE: Knowledge representation • Stored in a tree-like structure • Rules expressed in "if <mutation(s)> then <phenotype(s)>" form
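The rule form above can be sketched roughly as follows. This is an illustration only, not BOOGIE's actual data model: the `Rule` class, the `KNOWLEDGE_BASE` layout, and the `predict` helper are hypothetical names, and the single example rule merely reuses the four A/B-distinguishing residues from the ABO glycosyltransferase slide (written in single-letter code). O alleles and all other systems are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One 'if <mutation(s)> then <phenotype(s)>' entry (hypothetical structure)."""
    mutations: frozenset   # all of these must be observed for the rule to fire
    phenotypes: tuple      # phenotypes asserted when the rule fires


# Tree-like layout (assumed): blood group system -> gene -> list of rules.
KNOWLEDGE_BASE = {
    "ABO": {
        "ABO": [
            # Illustrative rule built from the four residues that differ
            # between A- and B-active transferases.
            Rule(frozenset({"R176G", "G235S", "L266M", "G268A"}), ("B",)),
        ],
    },
}


def predict(system, observed):
    """Return the phenotypes of every rule whose mutations are all observed.

    `observed` maps a gene name to the set of mutations found in that gene.
    """
    hits = []
    for gene, rules in KNOWLEDGE_BASE.get(system, {}).items():
        found = set(observed.get(gene, ()))
        for rule in rules:
            if rule.mutations <= found:   # subset test: every mutation present
                hits.extend(rule.phenotypes)
    return hits


if __name__ == "__main__":
    sample = {"ABO": {"R176G", "G235S", "L266M", "G268A"}}
    print(predict("ABO", sample))
```

Keeping the mutations of a rule in a set makes the "if" part a simple subset test, and nesting by system and gene mirrors the tree-like storage mentioned on the slide.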
BOOGIE: Knowledge collection • Covers the blood groups, genes, and antigens listed under "Relevant Blood Types" – Manually curated – 580 rules derived
ANNOVAR (Wang et al., Nucleic Acids Research 2010) • ANNOVAR is used to reduce the SNVs to a manageable number • Workflow: millions of SNVs → gene-based annotation of variants → select conserved positions → remove unrelated genes → relevant variants (few relevant SNVs)
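The reduction step can be pictured with a short sketch. It does not call ANNOVAR and does not parse ANNOVAR output; it assumes variants have already been annotated into plain dictionaries with hypothetical `gene` and `conserved` keys. The gene set follows the blood group table, with the third MNS gene taken to be GYPE.

```python
# Genes from the blood group table (assumption: third MNS gene is GYPE).
BLOOD_GROUP_GENES = {
    "ABO", "RHCE", "RHD", "DARC", "KEL", "SLC4A1", "SLC14A1",
    "FUT3", "BCAM", "GYPA", "GYPB", "GYPE", "FUT1", "FUT2",
}


def in_blood_group_gene(variant):
    """True if the annotated variant lies in one of the blood group genes."""
    return variant["gene"] in BLOOD_GROUP_GENES


def relevant_variants(variants):
    """Drop unrelated genes, then keep only conserved positions.

    Each variant is a dict with hypothetical keys:
      'gene'      - gene symbol assigned by the annotation step
      'conserved' - bool flag for a conserved position (missing means False)
    """
    in_genes = [v for v in variants if in_blood_group_gene(v)]
    return [v for v in in_genes if v.get("conserved", False)]


if __name__ == "__main__":
    annotated = [
        {"gene": "ABO", "change": "G268A", "conserved": True},
        {"gene": "TP53", "change": "R175H", "conserved": True},   # unrelated gene
        {"gene": "KEL", "change": "T193M", "conserved": False},   # not conserved
    ]
    for v in relevant_variants(annotated):
        print(v["gene"], v["change"])
```

The two filters are kept separate so that either criterion can be relaxed independently when checking how many variants each one removes.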
BOOGIE Pipeline (the slide repeats the blood group, gene, and antigen table from "Relevant Blood Types")
Benchmarking • BOOGIE covers all known blood group variants • Difficulty in finding genome sequences with known blood phenotypes • Personal Genome Project (PGP) as annotated benchmark set
Personal Genome Project (PGP) The mission of the PGP is to encourage the development of personal genomics • Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10) • A larger dataset (PGP-1K) aims to cover at least 1,000 genomes Unfortunately, only ABO and Rh blood group information is available
PGP-10 Data Back row (left to right): James Sherley, Misha Angrist, John Halamka, Keith Batchelder, Rosalynn Gill. Front row (left to right): Esther Dyson, George Church, Kirk Maxey. Not shown: Stan Lapidus and Steven Pinker.
PGP-10 Results
BOOGIE correctly predicts all ABO types and all Rh groups except one (PGP-4)
System   | PGP1 | PGP4 | PGP8
Known    | O + | A - | B +
ABO      | O | A | B
Rh       | c; e; weak D | c; e; weak D | c; e; weak D
DUFFY    | FY(a+); FY(b-) | FY(a-); FY(b+) | FY(a-); FY(b+)
KELL     | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-
Diego    | Dib; Memph neg | Dib; Memph neg | Dib; Memph neg
KIDD     | Jk(a-); Jk(b+) | Jk(a-); Jk(b+) | Jk(a+); Jk(b-)
Lewis    | negative | negative | negative
Lutheran | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4; Lu8+; Aua+; Aub- | Lu(a-); Lu(b+); Lu6-; Lu9+; Lu4-; Lu8+; Aua-; Aub+ | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4-; Lu8+; Aua+; Aub-
MNS      | M; S | M; s | M; s
Bombay   | H+; secretor | H+; secretor | H+; secretor
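A concordance check over the three individuals shown on this slide might look like the sketch below. It is not part of BOOGIE; the helper names are hypothetical, and it assumes that any D call, including "weak D", is scored as Rh positive, which is what makes PGP-4 (known A -) the one Rh mismatch.

```python
# Known types and predicted ABO/Rh calls, copied from the slide (3 of the 10).
KNOWN = {"PGP1": ("O", "+"), "PGP4": ("A", "-"), "PGP8": ("B", "+")}
PREDICTED = {
    "PGP1": ("O", "c; e; weak D"),
    "PGP4": ("A", "c; e; weak D"),
    "PGP8": ("B", "c; e; weak D"),
}


def rh_sign(call):
    """Map a predicted Rh antigen string to '+' or '-'.

    Assumption: any antigen token ending in 'D' (including 'weak D')
    counts as Rh positive.
    """
    tokens = [t.strip() for t in call.split(";")]
    return "+" if any(t.endswith("D") for t in tokens) else "-"


def concordance(known, predicted):
    """Return (ABO matches, Rh matches, number of individuals compared)."""
    abo_ok = rh_ok = 0
    for person, (abo, rh) in known.items():
        pred_abo, pred_rh = predicted[person]
        abo_ok += pred_abo == abo           # bool adds as 0 or 1
        rh_ok += rh_sign(pred_rh) == rh
    return abo_ok, rh_ok, len(known)


if __name__ == "__main__":
    abo_ok, rh_ok, total = concordance(KNOWN, PREDICTED)
    print(f"ABO: {abo_ok}/{total}  Rh: {rh_ok}/{total}")
```

Under the stated assumption this counts 3 of 3 ABO matches and 2 of 3 Rh matches for the individuals shown, in line with the slide's statement that only PGP-4's Rh group is mispredicted.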
PGP-1K Results • A second dataset was built from all PGP-1K participants with available blood group information, for a total of 22 individuals • This dataset contains microarray data (23andMe SNPs) Legend: P = predicted; R = real; * = blood-group-relevant SNPs missing from the dataset
Conclusions • We developed a method, called BOOGIE, to predict the ten blood groups relevant for transfusions from sequencing data – Specialized knowledgebase with 580 genotype to phenotype rules – Novel variants can be easily considered • Benchmarking was (so far) only possible on PGP data for the ABO and Rh blood groups – The ABO and Rh systems are correctly predicted in 85-100% of cases – The Rh- type presents some additional difficulties
Acknowledgements Manuel Giollo, Giovanni Minervini, Marta Scalzotto (not shown), Emanuela Leonardi, Carlo Ferrari Funding: FIRB Futuro in Ricerca, Università di Padova, CARIPLO, AIRC URL: http://protein.bio.unipd.it/