Life and Medical Biology Data Accelerator (Lambda)

SLIDE 1

Institute of Computing Technology, Chinese Academy of Sciences

Guangming Tan

Life and Medical Biology Data Accelerator (Lambda)

SLIDE 2

Biological Imaging Data Challenge

GAP: O(years)!

Moritz Helmstaedter, Cellular-resolution connectomics: challenges of dense neural circuit reconstruction, Nature Methods, 10(6), 2013

High-Throughput Image Data Analysis is Required!

Higher Resolution

SLIDE 3

High Spatiotemporal Resolution Two-Photon Microscope Imaging System

In vivo
High Dimension

Peking University

SLIDE 4

Event Detection at Cellular Level

Calcium Spark
Cheng, H., Science 1993
Elementary Events of Calcium Signals
Sparks and Transients

Superoxide Flash
Cheng, H., Cell 2008
Visualization of Reactive Oxygen Species (ROS)

Dendrite Calcium Imaging (Zhuang Zhou, Xiaowei Can)
Animal's dynamic neural signals

Scale bars: 5 mm, 5 µm, 2 µm

SLIDE 5

Life and Medical Biology Data Accelerator (Lambda, λ)

Data: PostgreSQL, Bio-Format

Engine: Domain-Specific Accelerator, Auto-tuning library

Pipeline: Built-in modules, Customizable framework

SLIDE 6

λ-Image Software/Hardware Stack

• High-dimension & multi-mode biological image data system
• Data analysis pipeline for massive biological images
• Accelerating data-intensive algorithms for biological image analysis

Cardiovasology: mouse embryo heart image cell lineage
Brain: mouse brain cell Ca2+ spark detection
Endocrinology: islet formation in the pancreas and in vivo imaging

Biological Data Analysis Pipeline (cell event detection, segmentation)
Biological Data Analysis Algorithm Toolkit: machine learning, stencil, denoising, deconvolution
Database, Accelerator
MPI, Spark, CUDA, OpenCL
RDMA

SLIDE 7

High-throughput Image Processing Algorithm

fMRI, ssTEM, sBEM, LSFM
O(N*P^3)

Unbiased Analysis of Events
High accuracy
Machine Learning

Current Computing Systems (Software/Hardware): O(Years)
Interactive High Performance Computing Platform: O(Minutes)

SLIDE 8

Parallelization with In-memory Computing Model

[Figure: image analysis pipeline and its mapping onto Spark]

Pipeline stages:
Raw Data → Preprocess (3D Deconvolution, Intensity Normalization, Subtract Background) → Preprocessed Data
Left Side / Right Side → Registration (Match, Powell, Mutual Information)
Left Side / Right Side → Fusion (Wavelet Decomposition, Activity Measure, Fusion Decomposition) → Fused Data
Fused Data → Segmentation (Planarity Enhancement, Tensor Voting, 3D Watershed) → Labelmap Image Data

Spark mapping:
Map: Image L1, R1, L2, R2, ... → Preprocess → Preprocessed Image → Registration → Registered Image; Registered Image L1 and R1 → Fusion → Fused Image1; Registered Image L2 and R2 → Fusion → Fused Image2; ...
Reduce: Merge Image Stack → Fused Image Stack → Segmentation → Final Result for Visualization Process
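The map/reduce structure on this slide can be sketched in plain Python rather than Spark itself. This is only an illustration under stated assumptions: `preprocess`, `register`, `fuse`, `merge_stack`, and `segment` are hypothetical placeholders acting on toy lists of numbers, not the Lambda implementation, and the real stages (deconvolution, mutual-information registration, wavelet fusion, 3D watershed) are replaced by trivial stand-ins.

```python
from functools import reduce

def preprocess(img):
    # Stand-in for deconvolution / normalization / background subtraction:
    # subtract the minimum value as a toy "background".
    background = min(img)
    return [v - background for v in img]

def register(left, right):
    # Stand-in for mutual-information registration: views are returned unchanged.
    return left, right

def fuse(left, right):
    # Stand-in for wavelet-based fusion: element-wise mean of the two views.
    return [(a + b) / 2 for a, b in zip(left, right)]

def process_time_point(pair):
    # "Map" step: each (left, right) pair is independent of the others.
    left, right = pair
    reg_left, reg_right = register(preprocess(left), preprocess(right))
    return fuse(reg_left, reg_right)

def merge_stack(stack, fused):
    # "Reduce" step: collect fused images into one stack.
    return stack + [fused]

def segment(stack, threshold):
    # Stand-in for 3D watershed: a binary threshold labelmap.
    return [[1 if v > threshold else 0 for v in img] for img in stack]

pairs = [([1, 3, 5], [2, 4, 6]), ([10, 10, 14], [20, 22, 20])]
fused_stack = reduce(merge_stack, map(process_time_point, pairs), [])
labelmap = segment(fused_stack, threshold=1.0)
```

Because each time point is processed independently, the map step is the part that would plausibly be distributed across workers, while merging and segmentation act on the collected stack.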

SLIDE 9

GPU Acceleration of Algorithm Modules

[Bar chart: CPU vs. GPU for Deconvolution, Median Filter, Objectness Filter, and Iterative Closing; axis from 0 to 40]

SLIDE 10

Image Processing & Analysis Pipeline

Deconvolution: RL/Sparse; 2D and 3D iterative deconvolution.
Registration: Mutual Information. Mutual information is derived from information theory, and its application to image registration has been proposed in different forms.
Fusion: Wavelet based ✔; uses a global five-level wavelet decomposition.
Segmentation: Watershed segmentation, labelmap selection, particle analysis.
Analysis: Machine Learning (event detection or pattern analysis), e.g. Mice Brain Cell Ca2+ Spike Detection.

GPU: ✔ ✔ ✔ ✔ ✔
Spark: ✔ ✔ ✔
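As a rough illustration of the mutual-information criterion named for registration on this slide, the sketch below computes mutual information from a joint histogram of two discrete intensity sequences and uses it to pick the best circular shift in 1D. The function names and the brute-force shift search are assumptions made for this example; the slides mention Powell optimization and 3D images, which are not reproduced here.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """Mutual information (bits) between two equal-length sequences of discrete intensities."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram
    hist_a = Counter(a)          # marginal histogram of a
    hist_b = Counter(b)          # marginal histogram of b
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (count / n) * log2(count * n / (hist_a[x] * hist_b[y]))
    return mi

def rotate_right(seq, shift):
    # Circular shift; negative values rotate left.
    shift %= len(seq)
    return list(seq[-shift:]) + list(seq[:-shift]) if shift else list(seq)

def best_shift(fixed, moving, max_shift):
    # Exhaustive search over candidate shifts (an assumption; not Powell's method).
    candidates = range(-max_shift, max_shift + 1)
    return max(candidates, key=lambda s: mutual_information(fixed, rotate_right(moving, s)))

fixed = [0, 0, 1, 2, 1, 0, 0, 0]
moving = [1, 2, 1, 0, 0, 0, 0, 0]   # fixed rotated left by two samples
shift = best_shift(fixed, moving, max_shift=3)
```

Mutual information depends only on the statistical relationship between intensities, which is why it is commonly used when the two views differ in brightness or contrast.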

SLIDE 11

Deconvolution of Pancreas Islet Images

Terabyte EM Images
2 DAYS, 4 GPUs (K20)
4.7 YEARS

GPU batch deconvolution:

Preprocessing
  e: image
  p: psf
for N iterations
  /* apply imaging model to estimate */
  E = gpu_fft(e, batch)
  B = gpu_multiply(E, PSF, batch)
  b = gpu_ifft(B, batch)
  /* captured image divided by blurred estimate */
  r = gpu_divide(o, b, batch)
  /* calculate correction vector */
  R = gpu_fft(r, batch)
  C = gpu_multiply(R, PSF, batch)
  c = gpu_ifft(C, batch)
  /* apply correction vector */
  e = gpu_multiply(e, c, batch)
end

name                     pixel (XYZ)    #iters  size   Fiji (Java)  GPU  speedup
beta.tif                 1024x2048x51   100     408MB  60m          22s  163
glucose_sequential2.tif  512x512x400    50      200MB  30m          10s  180
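The iteration in the pseudocode above has the shape of a Richardson-Lucy update. A minimal CPU sketch in plain Python follows, under several assumptions: it works on a 1D signal, it uses direct circular convolution instead of the batched GPU FFTs on the slide, and the correction step correlates with the PSF (equivalent to convolution for the symmetric PSF used here). All function names are hypothetical. The last lines recompute the speedup column from the table's timings.

```python
def circular_convolve(signal, psf):
    # Direct circular convolution; psf[0] is the PSF center.
    n = len(signal)
    return [sum(signal[(i - j) % n] * psf[j] for j in range(n)) for i in range(n)]

def circular_correlate(signal, psf):
    # Convolution with the flipped PSF (the adjoint of circular_convolve).
    n = len(signal)
    return [sum(signal[(i + j) % n] * psf[j] for j in range(n)) for i in range(n)]

def richardson_lucy(observed, psf, iterations):
    estimate = list(observed)  # start from the captured image, as on the slide
    for _ in range(iterations):
        # apply imaging model to estimate
        blurred = circular_convolve(estimate, psf)
        # captured image divided by blurred estimate (guard against division by zero)
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        # calculate correction vector
        correction = circular_correlate(ratio, psf)
        # apply correction vector
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

psf = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]   # symmetric, sums to 1
truth = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
observed = circular_convolve(truth, psf)
restored = richardson_lucy(observed, psf, iterations=50)

# Speedups implied by the table: Fiji time divided by GPU time, both in seconds.
speedups = {
    "beta.tif": (60 * 60) / 22,
    "glucose_sequential2.tif": (30 * 60) / 10,
}
```

The update is multiplicative, so the estimate stays non-negative, and with a normalized PSF the total intensity is preserved from one iteration to the next.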

SLIDE 12

Extracting Cells from Mouse Embryo Images

Light-sheet microscope images

1.5 DAYS

Culture
Fast, two-side, 3D, dual-color imaging

[Figure labels: excitation, detection, detection; Reconstruction; Time1, Time2, Time3, ...]

200 time points
2x500 images
2048x2048 pixels
4GB*2*200 = 1.6TB
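The data-volume arithmetic on this slide can be checked with a short calculation. The pixel depth is not stated in the slides; 2 bytes (16-bit) per pixel is an assumption chosen because it reproduces the roughly 4 GB per view that the slide quotes. `stack_bytes` is a hypothetical helper name.

```python
def stack_bytes(width, height, images, bytes_per_pixel):
    # Raw size of one image stack, ignoring file-format overhead.
    return width * height * images * bytes_per_pixel

# Assumption: 16-bit pixels (2 bytes each); the slide does not state the bit depth.
per_view = stack_bytes(2048, 2048, 500, 2)
per_view_gib = per_view / 2**30

# Two views per time point, 200 time points.
total = per_view * 2 * 200
total_tb = total / 10**12

# The slide's own rounding: 4 GB per view * 2 views * 200 time points.
slide_total_gb = 4 * 2 * 200
```

Under that assumption one view is about 3.9 GiB, and the full series is about 1.68 TB, consistent with the slide's rounded figure of 1.6 TB.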

SLIDE 13

Blitz: High Performance Machine Learning Toolkit

Interface: NVIDIA DIGITS (Customized), Sugon Xmachine
Algorithm Interface:
  Classification: SVM, KNN, DNN, CNN
  Clustering: K-means
  Dimensionality: PCA
Layer Operation
Virtual Backend
Programming: Operator Language (Linear Algebra / Tensor Primitives)
Distributed Parallelism: Pipelining Parallelism, Model Parallelism, Data Parallelism
Performance: Automatic Performance Tuning, Communication Avoiding, vectorization, multithread
Hardware: Accelerator, RDMA

SLIDE 14

Convolutional Nets 2012 (AlexNet)

Hardware Environment
CPU: Dual Intel(R) Xeon(R) CPU E5-2680 v3, 28
CPU-Memory: 128GB
GPU: Tesla K20
GPU-Memory: 6GB

Running time               blitz   caffe
1 batch (batch size=128)   125ms   196ms
1 epoch                    1310s   1960s

13-layer architecture

Layer  Type             Maps and neurons            Kernel size
Input                   1 map of 224*224 neurons
1      Convolution      64 maps of 55*55 neurons    11*11
2      Pooling          64 maps of 27*27 neurons    3*3
3      Convolution      192 maps of 27*27 neurons   5*5
4      Pooling          192 maps of 13*13 neurons   3*3
5      Convolution      384 maps of 13*13 neurons   3*3
6      Convolution      256 maps of 13*13 neurons   3*3
7      Convolution      256 maps of 13*13 neurons   3*3
8      Pooling          256 maps of 6*6 neurons     3*3
9      Fully-connected  4096 neurons                1*1
10     Dropout          4096 neurons                1*1
11     Fully-connected  4096 neurons                1*1
12     Dropout          4096 neurons                1*1
13     Fully-connected  1000 neurons                1*1

Input 3@224*224 → Convolutions → Conv1 64@55*55 → Pooling → Pool1 64@27*27 → Convolutions → Conv2 192@27*27 → Pooling → Pool2 192@13*13
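The feature-map sizes in the layer table follow from the standard output-size formula for convolution and pooling, out = floor((in + 2*padding - kernel) / stride) + 1. The slide gives only kernel sizes; the strides and paddings below are assumptions chosen because they reproduce the listed map sizes, and `out_size` is a hypothetical helper. The last lines compute the blitz-over-caffe ratios from the timings on the slide.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution or pooling layer.
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding); stride and padding are assumptions,
# only the kernel sizes come from the slide.
layers = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 224
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append((name, size))

# Timing ratios from the slide (caffe time divided by blitz time).
batch_speedup = 196 / 125
epoch_speedup = 1960 / 1310
```

With these settings the spatial sizes run 55, 27, 27, 13, 13, 13, 13, 6, matching the table, and both timing ratios come out at roughly 1.5x.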

SLIDE 15

Flash Detection

E. coli, time series, 512x512x(100 frames).

[Figure panels A, B, C]
Intensity increases rapidly
Intensity declines obviously
Averaged intensities change continuously

A nonstandard flash is not found by either the expert or the threshold-based method

SLIDE 16

Automated Flash Detection based on Blitz

Workflow stages: Input, Pre-processing, Feature selection, Model train, Model prediction, Result collection, Skew elimination

Workflow steps: Stack of fluorescence images; Image registration; Cell segmentation; Intensity average; Data over-fitting check; If model accuracy is close to 100%; Feature addition; Data skew elimination; Test set upper bound estimation; Data ratio adjustment; Cross validation; If model accuracy is close to upper bound; Model generation; Event detection; Events output

F value: F = 2 × precision × recall / (precision + recall), where
precision = (no. returned flashes) / (no. returned peaks)
recall = (no. returned flashes) / (no. all the flashes)

• Use cross validation to find parameters to train a model that achieves better accuracy and F value.
• Use MPI + CUDA parallelization to reduce training time.

9 features: 6 local, 3 global
Local: (amplitude, width, slope) × (left, right)
Global: average intensity of trace, distance to the (last, next) peak
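The F value defined on this slide can be computed directly from three counts. The sketch below is a minimal version; the function name, the zero-division guards, and the example counts are assumptions added for illustration and do not come from the slides.

```python
def f_value(returned_flashes, returned_peaks, all_flashes):
    """F value from the slide's definitions of precision and recall."""
    if returned_peaks == 0 or all_flashes == 0:
        return 0.0  # guard: nothing returned, or no flashes to find
    precision = returned_flashes / returned_peaks
    recall = returned_flashes / all_flashes
    if precision + recall == 0:
        return 0.0  # guard: no true flashes were returned
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 10 peaks returned, 8 of them real flashes, 16 flashes in total.
example = f_value(returned_flashes=8, returned_peaks=10, all_flashes=16)
```

In this hypothetical case precision is 0.8 and recall is 0.5, so F is 8/13; the F value penalizes a detector that is strong on only one of the two measures.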

SLIDE 17

Membrane Segmentation based on Blitz

Group                Rand error [·10^-3]  Warping error [·10^-6]  Pixel error [·10^-3]  Training Time
Simple Thresholding  445                  15522                   222
NIPS 2012            48                   434                     60                    7 Days (Four GPUs)
Our approach         116                  2865                    95                    2 Days (One GPU)

Deep Neural Network
Brain, Heart

SLIDE 18

Conclusion

• Develop a Spark-based parallelization framework for high-throughput image analysis pipelines
• Optimizations on GPU
  - Core algorithms in image processing (3x-10x)
  - SGEMM in deep learning (30%)
• Achieve significant speedups for image processing
  - Years → days

SLIDE 19

Thanks!