Using Big Data technologies to uncover genetic causes of Amyotrophic lateral sclerosis
HEALTH & BIOSECURITY
11 October 2017 Dr Natalie Twine | Transformational Bioinformatics | @nat_twine
Genomics will outpace other BigData
Stephens et al. PLOS Biology 2015
Astronomy, Twitter, YouTube, Genomics
Dr Natalie Twine | Big Data technologies to understand ALS| @nat_twine
                          | Desktop compute | High-performance compute cluster | Hadoop/Spark compute cluster
Focus                     | small data      | Compute-intensive                | Data-intensive
Fault tolerant            | No              | No                               | Yes
Node-bound                | Yes             | Yes                              | No
Parallelization           | 10 CPU          | 100 CPU                          | 1000 CPU
Parallelization procedure | bespoke         | bespoke                          | standardized
CSIRO solution
– Exomes (Familial, n=137)
– WGS (Sporadic, n=800)
– Project MinE WGS (Sporadic, n=15,000)
[Figure labels: GOI; SNP A; SNP B; SNP C]
– Designed to identify and remove relatives as part of GWAS workflow
– Identifying more distant relatives is challenging
– Tools effective at distant relationship detection are SLOW
– Identify more distant relationships with confidence
Each blue dot represents a relationship between a pair of ALS patients.
n=172 (137 Familial and 35 Sporadic)
Degree of relationship | Number | True positives | False positives | Unknown
Duplicates             | 6      | 6 (100%)       |                 |
1st degree             | 33     | 33 (100%)      |                 |
2nd degree             | 23     | 23 (100%)      |                 |
3rd degree             | 27     | 12 (44%)       | 9 (33%)         | 6 (22%)
4th degree             | 1310   | n/a            | n/a             | n/a
5th degree             | 7852   | n/a            | n/a             | n/a
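The percentages in the 3rd-degree row can be re-derived from the counts. The following is a minimal sketch of that arithmetic only. The dictionary keys and the `percentages` helper are illustrative names, not part of the original analysis.

```python
# Counts for the 3rd-degree row, copied from the table above.
# The key names are illustrative labels, not taken from the source.
third_degree = {"true_positives": 12, "false_positives": 9, "unknown": 6}


def percentages(counts):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(counts.values())
    return {label: round(100 * value / total) for label, value in counts.items()}


total_pairs = sum(third_degree.values())  # should equal the "Number" column
pct = percentages(third_degree)

print(total_pairs)
print(pct)
```

Because each share is rounded independently, the three percentages need not sum to exactly 100.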
– 800 WGS Sporadic and Familial ALS
– 15,000 WGS samples (Project MinE)
BMC Genomics 2015, 16:1052 PMID: 26651996 (IF=4)
Bringing BigLearning to genomics applications. VariantSpark learns from 3000 individuals and 80 million mutations in under 30 minutes.
Association testing, Clustering, Classification
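To put the stated scale in perspective, here is a back-of-the-envelope sketch of the data volume and the implied minimum throughput. It assumes one genotype value per individual per variant (a dense matrix), which is an illustrative simplification and not a statement about how VariantSpark stores data. The variable names are hypothetical.

```python
# Figures quoted on the slide.
individuals = 3_000
variants = 80_000_000
max_seconds = 30 * 60  # "under 30 minutes" is treated as an upper bound

# Assumption: a dense matrix with one genotype value per individual per variant.
genotype_values = individuals * variants

# Lower bound on throughput, since the run finished in under max_seconds.
min_values_per_second = genotype_values / max_seconds

print(f"{genotype_values:,} genotype values")
print(f"at least {min_values_per_second:,.0f} values per second")
```

Under that assumption, the workload is 240 billion genotype values, which is the kind of volume the data-intensive cluster setting is meant for.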
Speed Accuracy
Exomes (n=137 Familial ALS)
[Figures: Euclidean distance by degree of relationship (1 to 5, UR); distance (IBD); accuracy by degree of relationship]
Ramstetter et al., Genetics 2017
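The slides name Euclidean distance as one measure plotted against degree of relationship. As a minimal sketch of that idea, the snippet below computes pairwise Euclidean distances between genotype vectors. It assumes genotypes are encoded as alternate-allele counts (0, 1, 2) per variant, which the slides do not specify. The sample names and toy data are hypothetical, and this is not the authors' pipeline.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length genotype vectors."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(genotypes):
    """Map each unordered sample pair to the distance between their genotypes."""
    return {
        (s1, s2): euclidean(genotypes[s1], genotypes[s2])
        for s1, s2 in combinations(sorted(genotypes), 2)
    }


# Hypothetical toy data: alternate-allele counts at six variant sites.
genotypes = {
    "sample_A": [0, 1, 2, 0, 1, 0],
    "sample_A_dup": [0, 1, 2, 0, 1, 0],  # duplicate of sample_A
    "sample_B": [0, 1, 1, 0, 2, 0],      # differs from sample_A at two sites
    "sample_C": [2, 0, 0, 2, 0, 1],      # differs from sample_A at every site
}

distances = pairwise_distances(genotypes)
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(dist, 3))
```

In this toy data the duplicate pair has distance zero and more dissimilar samples sit further apart. Any threshold for calling a specific degree of relationship would have to be calibrated on real data and is not given here.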
Identify novel relationships in Sporadic ALS WGS (n=800)
Proof-of-principle cohort: Familial ALS WGS (n=89)
speed and scalability
sensitivity and specificity (AUC)
random forest)
traditional GWAS (single loci regression).
smaller cohorts give robust insights
more loci
Bone Mineral Density (BMD) as the phenotype; 1,936 individuals with 7.2 million variants (imputed from array).
– Novel disease-causing variants
– Preventative measures
– Identify related individuals
– Personalised treatment
Natalie Twine
Denis Bauer Oscar Luo Rob Dunne Piotr Szul
Team
Aidan O’Brien Laurence Wilson
Adrian White Mia Champion
Collaborators News Software Kaitao Lai
Ian Blair Kelly Williams Emily McCann Jenn Fifita