Burrows-Wheeler Short Read Aligner on AWS EC2 F1 Instances

University of Virginia High-Performance Low-Power Lab, Prof. Dr. Mircea Stan. Burrows-Wheeler Short Read Aligner on AWS EC2 F1 Instances: Smith-Waterman Extension on FPGA(s). Sergiu Mosanu, Kevin Skadron and Mircea Stan. AACBB, February 23, 2018.


SLIDE 1

University of Virginia High-Performance Low-Power Lab
Prof. Dr. Mircea Stan

Burrows-Wheeler Short Read Aligner on AWS EC2 F1 Instances

Smith-Waterman Extension on FPGA(s)

Sergiu Mosanu, Kevin Skadron and Mircea Stan

AACBB, February 23, 2018

SLIDE 2

Motivation

Why target the cloud for bioinformatics?

  • On-demand scalability
      – Increase / decrease resources with demand
      – Lower up-front infrastructure investments
      – Reduced cost of ownership
  • Increased performance
      – High-end server machines
      – Equipped with GPU / FPGA accelerators
  • Security compliant [1]

Sergiu Mosanu – BWA Smith-Waterman Extension on AWS EC2 F1 2 / 13

SLIDE 3

Motivation

Why target FPGA acceleration?

  • FPGAs are massively parallel
  • More power efficient than CPUs and GPUs
      – Higher performance at lower cost?

Instance      Accelerator   vCPU   Memory [GiB]   Cost [USD/h]
c5.2xlarge    -             8      16             0.34 (1x)
c5.18xlarge   -             72     144            3.06
f1.2xlarge    1 FPGA        8      122            1.65 (5x)
f1.16xlarge   8 FPGA        64     976            13.20
p3.2xlarge    1 GPU         8      61             3.06 (9x)
p3.16xlarge   8 GPU         64     488            24.48

Table: AWS EC2 Instances and On-Demand Pricing
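
The multipliers in parentheses look like hourly price ratios relative to c5.2xlarge. That reading is an assumption, not something the slide states; the short Python sketch below checks it against the prices in the table. The names `PRICES` and `price_ratio` are illustrative only.

```python
# On-demand prices in USD per hour, copied from the table above.
PRICES = {
    "c5.2xlarge": 0.34,
    "c5.18xlarge": 3.06,
    "f1.2xlarge": 1.65,
    "f1.16xlarge": 13.20,
    "p3.2xlarge": 3.06,
    "p3.16xlarge": 24.48,
}


def price_ratio(instance: str, baseline: str = "c5.2xlarge") -> float:
    """Hourly price of `instance` as a multiple of the baseline instance."""
    return PRICES[instance] / PRICES[baseline]


if __name__ == "__main__":
    # Print the ratio for the three 8-vCPU instances annotated on the slide.
    for name in ("c5.2xlarge", "f1.2xlarge", "p3.2xlarge"):
        print(f"{name}: {price_ratio(name):.2f}x")
```

Under this reading, f1.2xlarge comes out a little under 5x and p3.2xlarge at about 9x, which agrees with the rounded annotations.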

SLIDE 4

Burrows-Wheeler Short-Read Aligner

Smith-Waterman (SW) Extension

  • Available under GPLv3 on github.com/lh3/bwa
  • Highly optimized, accurate aligner
  • Implements SW extension in ksw_extend2 function
  • Includes:
      – BWA-backtrack [2]
      – BWA-SW [3]
      – BWA-MEM [4]

SLIDE 5

Burrows-Wheeler Short-Read Aligner

Smith-Waterman (SW) Extension

  • Iterative algorithm
      – Calculates scoring matrix H

    ksw_extend2(query, target, s_mat, params) {
      // init H, E, F
      // ...
      for i in [0 to length(target)] {
        // ...
        for j in [begin to end] {
          H(i,j)   = max{H(i-1,j-1)+S(i,j), E(i,j), F(i,j)}
          E(i+1,j) = max{H(i,j)-gapo, E(i,j)} - gape
          F(i,j+1) = max{H(i,j)-gapo, F(i,j)} - gape
          // ...
        }
        // update begin and end for the next round
        // ...
      }
      return max
    }

Figure: Code structure of ksw_extend2 function
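
The recurrences in the figure can be tried out with a small Python sketch. This is a simplified illustration, not BWA's ksw_extend2: it keeps full H, E and F matrices, omits the band (`begin`/`end`), the early-termination logic and the clamping at zero, and uses a single gap-open/gap-extend pair. The function name `extend_score`, its parameters and the default scores are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def extend_score(query, target, h0=0, match=1, mismatch=-4, gapo=6, gape=1):
    """Best score of an alignment that starts at the origin with score h0
    and ends anywhere in the matrix (simplified affine-gap extension)."""
    rows, cols = len(target) + 1, len(query) + 1
    H = [[NEG_INF] * cols for _ in range(rows)]  # best score ending at (i, j)
    E = [[NEG_INF] * cols for _ in range(rows)]  # ... ending with a gap along i
    F = [[NEG_INF] * cols for _ in range(rows)]  # ... ending with a gap along j
    H[0][0] = h0
    best = h0
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            if i > 0:
                # E(i,j) = max{H(i-1,j) - gapo, E(i-1,j)} - gape
                E[i][j] = max(H[i - 1][j] - gapo, E[i - 1][j]) - gape
            if j > 0:
                # F(i,j) = max{H(i,j-1) - gapo, F(i,j-1)} - gape
                F[i][j] = max(H[i][j - 1] - gapo, F[i][j - 1]) - gape
            diag = NEG_INF
            if i > 0 and j > 0:
                s = match if target[i - 1] == query[j - 1] else mismatch
                diag = H[i - 1][j - 1] + s
            # H(i,j) = max{H(i-1,j-1) + S(i,j), E(i,j), F(i,j)}
            H[i][j] = max(diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(extend_score("ACGT", "ACGT"))
```

The E and F updates are written for the current cell rather than the next one, which is the same recurrence as in the figure with the index shifted by one.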

SLIDE 6

Port of SW Extend to FPGA

Optimizations on ksw_extend2 in SDAccel

  • ksw_extend2 kernel implemented in Xilinx SDAccel
      – Original code largely preserved
  • Fixed query and target lengths to 256 symbols
  • Similarity function implemented in logic
  • Reduced variables from (u)int to (u)short
  • Made a few variable declarations local to the loop
      – Loop-carried dependency set to false with HLS pragmas
  • Reduced BRAM accesses by storing previous iteration values
  • Pipelined all loops except the i loop with HLS pragmas
  • Achieved functional correctness

SLIDE 7

Port of SW Extend to FPGA

Utilization and Performance Results

  • Max frequency of 330 MHz, well above 250 MHz
  • Average kernel execution time: 0.17 ms
  • Host chrono results: 333 ms on FPGA vs 54 ms on CPU
      – CPU matched with 6 parallel ksw_extend2 instances on FPGA
  • A minimum of 80 ksw_extend2 instances to fit on a single FPGA

              LUT      LUTMem   REG     BRAM      DSP
User Budget   890.6k   552.1k   1985k   1615      6828
ksw_ext2      6407     1550     11k     21        1
              (< 1%)                    (≈1.3%)

Table: FPGA utilization with 1 BWA ksw_extend2 instance
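
The percentages in the table can be reproduced from the raw counts. The Python sketch below does that arithmetic; the figures are copied from the table (with "k" expanded to thousands), and the names `BUDGET`, `KERNEL` and `utilization_percent` are illustrative.

```python
# Resource budget available to user logic and the usage of one kernel
# instance, both taken from the utilization table above.
BUDGET = {"LUT": 890_600, "LUTMem": 552_100, "REG": 1_985_000, "BRAM": 1615, "DSP": 6828}
KERNEL = {"LUT": 6407, "LUTMem": 1550, "REG": 11_000, "BRAM": 21, "DSP": 1}


def utilization_percent(used, budget):
    """Share of each resource consumed, in percent of the budget."""
    return {name: 100.0 * used[name] / budget[name] for name in budget}


if __name__ == "__main__":
    for name, pct in utilization_percent(KERNEL, BUDGET).items():
        print(f"{name}: {pct:.2f}%")
```

BRAM is the only resource above 1% of its budget, at roughly 1.3%; every other resource stays below 1%.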

SLIDE 8

Proposed Single-FPGA Multi-Threaded architecture

[Diagram: threaded BWA on the CPU connects over PCIe Gen3 x16 with DMA to a Queue Manager on the FPGA, which feeds ksw_extend2 instances (0) through (n).]

Figure: Multi-threaded single-FPGA architecture
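
The dispatch pattern in this figure can be modelled in host-side software. The Python sketch below is only an analogy for the proposed design: worker threads stand in for the ksw_extend2 instances, a `queue.Queue` stands in for the Queue Manager, and `fake_kernel` is a hypothetical placeholder that does not compute a real alignment score. None of these names come from the deck or from BWA.

```python
import queue
import threading


def fake_kernel(query: str, target: str) -> int:
    """Placeholder for one kernel call: length of the common prefix."""
    score = 0
    for q, t in zip(query, target):
        if q != t:
            break
        score += 1
    return score


def run_jobs(jobs, num_instances=4):
    """Send (query, target) jobs through a shared queue to `num_instances`
    workers and return the scores in the original job order."""
    work = queue.Queue()
    results = [None] * len(jobs)

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                return
            job_id, query, target = item
            results[job_id] = fake_kernel(query, target)  # one slot per job

    threads = [threading.Thread(target=worker) for _ in range(num_instances)]
    for thread in threads:
        thread.start()
    for job_id, (query, target) in enumerate(jobs):
        work.put((job_id, query, target))
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_jobs([("ACGT", "ACGA"), ("AAAA", "AAAA"), ("C", "G")], num_instances=2))
```

Tagging each job with its index lets any free worker take the next job while the results still come back in submission order.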

SLIDE 9

Proposed Cross-FPGA Multi-Threaded architecture

[Diagram: threaded BWA on the CPU connects over PCIe Gen3 x16 with DMA to FPGA 1 through FPGA 8; each FPGA has its own Queue Manager feeding ksw_extend2 instances (0) through (n).]

Figure: Multi-threaded cross-FPGA architecture

SLIDE 10

Estimated Benefits

Lighter SW Extend step

  • ≈13x speedup for 80 BWA ksw_extend2 instances on an F1 2xlarge machine (single FPGA)
  • ≈100x speedup for the cross-FPGA multi-threaded architecture on an F1 16xlarge machine (8 FPGAs)
      – Both result in ≈4x cost saving compared with equivalent EC2 machines with no accelerators
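
One plausible way to arrive at speedups of this size is to scale the host timings reported earlier in the deck (333 ms on the FPGA with one instance vs 54 ms on the CPU) linearly with the number of instances and FPGAs. The deck does not state its derivation, so the linear-scaling model in this Python sketch is an assumption, and the name `estimated_speedup` is illustrative. The cost-saving figure is not modelled here.

```python
# Host-side timings from the performance results slide.
CPU_MS = 54.0
FPGA_MS_ONE_INSTANCE = 333.0


def estimated_speedup(instances_per_fpga: int, num_fpgas: int = 1) -> float:
    """Speedup over the CPU, assuming throughput scales linearly with the
    number of kernel instances and with the number of FPGAs."""
    single_instance_speedup = CPU_MS / FPGA_MS_ONE_INSTANCE  # below 1: slower
    return single_instance_speedup * instances_per_fpga * num_fpgas


if __name__ == "__main__":
    print(f"1 FPGA, 80 instances:  {estimated_speedup(80):.1f}x")
    print(f"8 FPGAs, 80 instances: {estimated_speedup(80, 8):.1f}x")
```

Under that assumption, 80 instances on one FPGA give about 13x and eight such FPGAs give a little over 100x, in line with the rounded estimates above.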

SLIDE 12

Conclusion and future work

  • AWS EC2 F1 is a promising platform for bioinformatics
  • SW Extension on FPGA with SDAccel
  • Further optimize BWA ksw_extend2
  • Complete multi-threaded architectures
  • Integrate with rest of BWA and benchmark

Thank you!

Code available at: github.com/hplp/BWA_HLS

SLIDE 13

References

  [1] A. Pizarro and C. Whalley, “Architecting for Genomic Data Security and Compliance in AWS”, Amazon Web Services, December 2014.
  [2] H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform”, Bioinformatics, 2009.
  [3] H. Li and R. Durbin, “Fast and accurate long-read alignment with Burrows-Wheeler transform”, Bioinformatics, 2010.
  [4] H. Li, “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM”, arXiv:1303.3997v2, 2013.
