Introduction to Metal FS and FPGA Programming Hands-On
Robert Schmid, Max Plauth, Sven Köhler, Lukas Wenzel and Andreas Polze
Operating Systems and Middleware Group
19.06.2019
■
Field-Programmable Gate Array: programmable hardware circuit
■
Algorithms are represented as a hardware configuration
■
Reasons for using FPGAs
□
Energy efficiency
□
Parallel and pipelined data processing
□
‘Computing wires’
■
Technology Advancements
□
New Generation of Interconnects (OpenCAPI, CCIX, ...)
□
High-Level Synthesis (HLS) languages
■
‘Accelerators become first-class citizens in the system’
19.06.2019 Robert Schmid ParProg 2019 Metal FS Chart 2
Interest in FPGAs is growing (again)
[Figure: FPGA fabric with programmable interconnect, logic blocks, IO blocks, and RAM/ALU/... blocks]
■
How should end-users interact with FPGAs?
■
Just like with any other executable program!
■
Analogy: built-in UNIX tools (cat, grep, sed, awk, …)
□
Do one thing, and do it well!
First-class citizens?
$ echo "Hello World" | fpga-encrypt -k key.bin > encrypted_file.bin
Annotations: "|" is a UNIX pipe, fpga-encrypt is the ‘Operator’, and ">" redirects standard output to a file.
■
Goal: Improve the accessibility of FPGA accelerators using a file system abstraction
Foundations
Metal FS
IBM POWER CAPI + SNAP Xilinx Vivado
Operators are specified in Vivado HLS
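void my_metal_operator(mtl_stream &in, mtl_stream &out, snapu64_t offset) {
    mtl_stream_element element;
    do {
        element = in.read();     // fetch the next stream word
        element.data += offset;  // apply the configured offset
        out.write(element);      // forward the modified word
    } while (!element.last);     // stop after the final word
}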
Operator
Configuration Input Stream Output Stream
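To experiment with the control flow of such an operator without Vivado HLS, it can be modeled in plain C++. The following is a minimal sketch, not the Metal FS implementation: `StreamElement` and `Stream` are hypothetical stand-ins for `mtl_stream_element` and `mtl_stream`, backed by a `std::queue`, and `uint64_t` replaces `snapu64_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for mtl_stream_element (software model only).
struct StreamElement {
    uint64_t data;
    bool last;
};

// Hypothetical stand-in for mtl_stream, backed by a FIFO queue.
struct Stream {
    std::queue<StreamElement> fifo;

    StreamElement read() {
        assert(!fifo.empty());  // a real stream would block instead
        StreamElement e = fifo.front();
        fifo.pop();
        return e;
    }

    void write(const StreamElement &e) { fifo.push(e); }
};

// Same loop structure as the operator on the slide: add `offset` to each
// element and stop after the element flagged as last.
void add_offset_operator(Stream &in, Stream &out, uint64_t offset) {
    StreamElement element;
    do {
        element = in.read();
        element.data += offset;
        out.write(element);
    } while (!element.last);
}
```

Feeding two elements with values 1 and 2 and an offset of 10 yields 11 and 12 on the output, with the `last` flag preserved on the second element.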
■
What happens here?
■
In between FPGA processing steps, data should not be copied to the CPU’s main memory (slow)
■
In conclusion:
□
Multiple operators must be deployed on the FPGA at once
□
Active subset and order of Operators should be configurable at runtime
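The two conclusions above can be illustrated with a software analogy. This is only a sketch under stated assumptions: operators are modeled as functions on byte strings, `toy_xor` is a toy stand-in for the encrypt and decrypt operators (not Blowfish), and the names `deployed_operators` and `run_pipeline` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: an operator maps an input byte string to an output one.
using Operator = std::function<std::string(const std::string &)>;

// Toy XOR cipher; applying it twice restores the input.
std::string toy_xor(const std::string &s) {
    std::string r = s;
    for (char &c : r) c = static_cast<char>(c ^ 0x5A);
    return r;
}

std::string uppercase(const std::string &s) {
    std::string r = s;
    for (char &c : r) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return r;
}

// All operators are "deployed" once, like the operators in one FPGA image.
const std::map<std::string, Operator> &deployed_operators() {
    static const std::map<std::string, Operator> ops = {
        {"encrypt", &toy_xor}, {"decrypt", &toy_xor}, {"uppercase", &uppercase}};
    return ops;
}

// The active subset and its order are chosen at runtime by name.
// Throws std::out_of_range if a name is not deployed.
std::string run_pipeline(const std::vector<std::string> &order, std::string data) {
    for (const std::string &name : order) {
        data = deployed_operators().at(name)(data);
    }
    return data;
}
```

Calling `run_pipeline` with the order encrypt, decrypt, uppercase on "Hello World" returns "HELLO WORLD"; passing a different list reuses the same deployed operators in a different arrangement.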
Chaining Operators
$ cat encrypted_file.bin | fpga-decrypt | fpga-uppercase
HELLO WORLD
■
Streaming Data from different types of memory
Metal FS Operator Pipelines
■
Composition of Pipelines by using AXI Stream Switch
■
C++ API
[Figure: a Stream Switch connecting Blowfish Encrypt, Blowfish Decrypt, Change Case, Host Memory, Non-volatile Memory, and No-op]
OperatorRegistry registry;
auto encrypt = registry.operators().at("encrypt");
encrypt->setOption("key", keyBuffer);

auto dataSource = create_data_source(inputBuffer);
auto dataSink = create_data_sink(outputBuffer);

PipelineDefinition pipeline({ dataSource, encrypt, dataSink });
pipeline.run();
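One way to picture how a pipeline definition could map onto a stream switch is as a list of port-to-port routes, one per hop. The sketch below is purely illustrative: the port numbers, `switch_ports`, and `build_routes` are hypothetical and do not come from Metal FS or from the AXI Stream Switch register interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical port assignment; the real switch layout is not given on the slides.
const std::map<std::string, int> &switch_ports() {
    static const std::map<std::string, int> ports = {
        {"host_memory", 0}, {"encrypt", 1}, {"decrypt", 2}, {"change_case", 3}};
    return ports;
}

// For a pipeline listed from source to sink, return one
// (source port, destination port) pair per hop.
// Throws std::out_of_range for names without a port.
std::vector<std::pair<int, int>> build_routes(const std::vector<std::string> &pipeline) {
    std::vector<std::pair<int, int>> routes;
    for (std::size_t i = 0; i + 1 < pipeline.size(); ++i) {
        routes.emplace_back(switch_ports().at(pipeline[i]),
                            switch_ports().at(pipeline[i + 1]));
    }
    return routes;
}
```

For the pipeline host_memory, encrypt, host_memory this produces two routes: port 0 to port 1, then port 1 back to port 0.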
[Figure: Vivado block design. Components shown: AXI Protocol Converter (axi_metal_cpc), AXI Crossbar (axi_metal_ctrl_crossbar), the colorfilter HLS operator (op_colorfilter), AXI DataMover (axi_datamover_mm2s), HLS stream generator (hls_streamgen), AXI SmartConnect (dm_smartconnect), the SNAP action (snap_action), and AXI Crossbar (axi_host_mem_crossbar)]
Composition of Hardware Components: Block Design
Foundations
Metal FS: Architecture Overview
CPU (‘Host’) FPGA Operator Pipelines Operator Pipelines IBM POWER CAPI + SNAP Xilinx Vivado
Operator Pipelines
AXI Stream Switch C++ API
■
Leverage the NVMe storage on the Nallatech N250S FPGA card
■
One use case for Operator Pipelines
■
File System Metadata is maintained in an LMDB Key-Value Store on the host
□
inodes, directory entries, free extents
■
Block Mapper on the FPGA translates file offsets to physical addresses using extent lists
■
All file accesses are implemented as Operator Pipelines (Read -> Write)
□
Data transformations can be transparently added (e.g. encryption)
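The translation performed by the Block Mapper can be sketched in software. This is a simplified model under assumptions that the slides do not state: the `Extent` layout, the 4096-byte block size, and the function name `map_offset` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical extent: a contiguous run of blocks on the storage device.
struct Extent {
    uint64_t start_block;  // first physical block of the run
    uint64_t block_count;  // number of blocks in the run
};

constexpr uint64_t kBlockSize = 4096;  // assumed block size in bytes

// Translate a byte offset within a file to a physical byte address by
// walking the file's extent list in order.
uint64_t map_offset(const std::vector<Extent> &extents, uint64_t file_offset) {
    uint64_t logical_block = file_offset / kBlockSize;
    for (const Extent &e : extents) {
        if (logical_block < e.block_count) {
            return (e.start_block + logical_block) * kBlockSize
                   + file_offset % kBlockSize;
        }
        logical_block -= e.block_count;  // skip this extent
    }
    throw std::out_of_range("offset beyond end of file");
}
```

With a file made of two extents (2 blocks starting at block 100, then 3 blocks starting at block 500), file offset 8202 falls into logical block 2, which is the first block of the second extent, so it maps to physical address 500 * 4096 + 10.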
Metal FS Hybrid Filesystem
Foundations
Metal FS: Architecture Overview
CPU (‘Host’) FPGA Operator Pipelines Operator Pipelines IBM POWER CAPI + SNAP Xilinx Vivado
Operator Pipelines
AXI Stream Switch C++ API
Hybrid File System
Data Sources and Sinks Block Mapper Filesystem Metadata Store
■
Users can mount Metal FS as a Linux file system
■
Implemented in user space
■
Example:
Metal FS FUSE Filesystem
$ cp ~/orders.tbl /metal_fs/files/
Metal FS Symbolic Executables
File System Driver & Pipeline Orchestrator Process change_case encrypt decrypt
Message flow via UNIX Socket
Data flow via Memory-Mapped Files
$ echo "Hello World" \
    | /metal_fs/operators/change_case \
    | /metal_fs/operators/encrypt \
    | /metal_fs/operators/decrypt
Foundations
Metal FS: Architecture Overview
CPU (‘Host’) FPGA Operator Pipelines Operator Pipelines IBM POWER CAPI + SNAP Xilinx Vivado
Operator Pipelines
AXI Stream Switch C++ API
Hybrid File System
Data Sources and Sinks Block Mapper Filesystem Metadata Store
User Interface & Instrumentation
Linux Filesystem Driver Symbolic Executables AXI Performance Monitor
Demo
Demo Screencast
■
Steps:
1.
git clone https://github.com/rs22/metalfs-workshop
2.
Start the development container by using the start script:
–
start_linux, start_osx, start_win.bat
3.
Build a simulation image
–
make model
Hands-On: Prepare the Metal FS Simulation
■
HLS translates the mtl_stream references into AXI Stream interfaces
■
We require the keep and last signals, which are optional in the AXI Stream protocol
Anatomy of an Operator in Vivado HLS
struct mtl_stream_element {
    ap_uint<64> data;
    ap_uint<8> keep;
    ap_uint<1> last;
};

void my_operator(mtl_stream &in, mtl_stream &out) {
    mtl_stream_element element;
    do {
        element = in.read();
        out.write(element);
    } while (!element.last);
}
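To see what the keep and last fields carry, it helps to pack a byte buffer into 8-byte stream elements by hand. The sketch below uses plain integer types in place of `ap_uint` and assumes that byte i of a word sits in bits 8i+7 down to 8i, with keep bit i marking that byte as valid; the byte order actually used by Metal FS may differ. `pack_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software stand-in for mtl_stream_element with plain integer fields.
struct StreamElement {
    uint64_t data;  // eight payload bytes
    uint8_t keep;   // bit i set => byte i of data is valid
    bool last;      // marks the final element of the stream
};

// Pack a byte buffer into 8-byte stream elements.
// Assumption: byte i of a word occupies bits [8*i+7 : 8*i].
// An empty buffer produces no elements.
std::vector<StreamElement> pack_bytes(const std::vector<uint8_t> &bytes) {
    std::vector<StreamElement> elements;
    for (std::size_t pos = 0; pos < bytes.size(); pos += 8) {
        StreamElement e{0, 0, false};
        for (std::size_t i = 0; i < 8 && pos + i < bytes.size(); ++i) {
            e.data |= static_cast<uint64_t>(bytes[pos + i]) << (8 * i);
            e.keep = static_cast<uint8_t>(e.keep | (1u << i));
        }
        e.last = (pos + 8 >= bytes.size());
        elements.push_back(e);
    }
    return elements;
}
```

A 10-byte buffer becomes two elements: the first with keep 0xFF and last unset, the second with keep 0x03 (only two valid bytes) and last set.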
■
HLS offers the ap_uint types for integers with arbitrary bit precision
■
snapu{8, 16, 32, 64}_t are typedefs for ap_uint<>
■
Access bit ranges of an ap_uint like this (similar to VHDL):
■
Concatenate integers:
Programming with HLS: Arbitrary-Precision integers
// Bit range access
snapu16_t my_integer;
snapu8_t high_byte = my_integer(15, 8);

// Concatenation
snapu8_t high_byte = 0xFF;
snapu8_t low_byte = 0x0A;
snapu16_t both_bytes = (high_byte, low_byte);
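For readers more used to standard C++, the same two operations can be written with shifts and masks. This is an equivalent sketch on fixed-width integers, not HLS code; the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Counterpart of my_integer(15, 8) for a 16-bit value: extract bits 15..8.
uint8_t high_byte_of(uint16_t value) {
    return static_cast<uint8_t>(value >> 8);
}

// Counterpart of (high_byte, low_byte): high_byte ends up in bits 15..8.
uint16_t concat_bytes(uint8_t high_byte, uint8_t low_byte) {
    return static_cast<uint16_t>((high_byte << 8) | low_byte);
}

// General bit-range read, inclusive on both ends.
uint64_t bit_range(uint64_t value, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi < 64);
    const unsigned width = hi - lo + 1;
    const uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (value >> lo) & mask;
}
```

With the values from the slide, `concat_bytes(0xFF, 0x0A)` gives 0xFF0A, and `high_byte_of(0xFF0A)` gives 0xFF again.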
■
Steps:
1.
Build a new model
–
make model
2.
Start the simulation
–
make sim
3.
In the simulation window:
–
snap_maint
–
metal_fs /mnt
4.
Start a second shell in the container using the start script
–
cat src/hls_operator_colorfilter/apples_simulation.bmp \
    | /mnt/operators/colorfilter \
    > out.bmp
Hands-On: Run the Metal FS Simulation
■
Goal: Implement an operator that processes a bitmap image and converts it to grayscale, except for pixels where red is the dominant color
■
Operator is prepared in src/hls_operator_colorfilter/hls_operator_colorfilter.cpp
■
Bitmap header is not aligned to a stream word boundary
□
The template temporarily inserts a padding to make processing easier
■
Task 1: Exclude the bitmap header data from being transformed
■
Task 2: Leave those pixels unmodified where red is the dominant color
■
The operator code can be compiled into software, useful for testing
□
src/hls_operator_colorfilter $ make test
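The per-pixel decision behind Task 2 can be prototyped in ordinary C++ before touching the operator template. The sketch below is one possible interpretation, not the reference solution: it assumes that red is "dominant" when it is strictly greater than both other channels, and that grayscale means the unweighted average of the three channels. `Pixel`, `red_is_dominant`, and `filter_pixel` are hypothetical names, and unpacking pixels from stream words is left out.

```cpp
#include <cassert>
#include <cstdint>

struct Pixel {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
};

// Assumed definition: red is dominant when it exceeds both other channels.
bool red_is_dominant(const Pixel &p) {
    return p.red > p.green && p.red > p.blue;
}

// Assumed grayscale formula: unweighted average of the three channels.
// Pixels with dominant red are returned unmodified.
Pixel filter_pixel(const Pixel &p) {
    if (red_is_dominant(p)) {
        return p;
    }
    const uint8_t gray = static_cast<uint8_t>((p.red + p.green + p.blue) / 3);
    return Pixel{gray, gray, gray};
}
```

A pixel such as (200, 10, 10) passes through unchanged, while (10, 200, 30) becomes (80, 80, 80). A weighted luma formula could be substituted in `filter_pixel` without changing the structure.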
Hands-On: Implement a Grayscale Filter operator
■
Try out your implementation in the simulation environment
■
Profiling results look like this:
Hands-On: Simulating the Operator Implementation
Shell 1:
$ make model
$ make sim

Simulation Shell:
$ snap_maint
$ metal_fs /mnt

Shell 2:
# -p enables profiling
$ cat src/hls_operator_colorfilter/apples_simulation.bmp \
    | /mnt/operators/colorfilter -p \
    > out.bmp
STREAM  BYTES TRANSFERRED  ACTIVE CYCLES  DATA WAIT   CONSUMER WAIT  TOTAL CYCLES  MB/s
input   6538               818 (21%)      634 (16%)   2439 (63%)     3897          419.43
output  6538               818 (21%)      3076 (79%)  0 (0%)         3897          419.43
Our operator limits the pipeline throughput
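The MB/s and percentage columns can be reproduced from the byte and cycle counts. The sketch below assumes a 250 MHz clock and 1 MB = 10^6 bytes; neither is stated on the slides, but these are the values for which 6538 bytes in 3897 cycles comes out at the reported 419.43 MB/s. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Throughput in MB/s (10^6 bytes per second) for `bytes` bytes transferred
// in `total_cycles` clock cycles at a clock of `clock_hz`.
double throughput_mb_per_s(uint64_t bytes, uint64_t total_cycles, double clock_hz) {
    return static_cast<double>(bytes) / static_cast<double>(total_cycles)
           * clock_hz / 1e6;
}

// Share of the total cycles spent in one state, in percent.
double cycle_share_percent(uint64_t cycles, uint64_t total_cycles) {
    return 100.0 * static_cast<double>(cycles) / static_cast<double>(total_cycles);
}
```

For the input stream, `cycle_share_percent(2439, 3897)` rounds to 63: the input spends most of its time waiting for its consumer, the operator. That is consistent with the output stream waiting for data 79% of the time.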
■
Open the xsim GUI
□
xsim -gui $SNAP_ROOT/hardware/sim/xsim/latest/top.wdb &
■
A new stream element is processed only every four cycles
Hands-On: Inspecting the Simulation Waveform
[Waveform: one read and one write every 4 cycles]
■
Open the colorfilter operator project in Vivado HLS
□
vivado_hls &
□
Project: src/hls_operator_colorfilter/hls_operator_colorfilter_sln_[…]
□
Switch to Analysis View
Hands-On: Vivado HLS Analysis View
Inner processing loop has four steps and is not pipelined
■
Add an HLS PIPELINE pragma inside the do-while loop
■
New Performance Profile:
■
Profiling Results:
Hands-On: Vivado HLS Pipeline pragma
STREAM  BYTES TRANSFERRED  ACTIVE CYCLES  DATA WAIT   CONSUMER WAIT  TOTAL CYCLES  MB/s
input   6538               818 (55%)      668 (45%)   0 (0%)         1489          1097.72
output  6538               818 (55%)      668 (45%)   0 (0%)         1489          1097.72
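The placement of the pragma can be shown on a pass-through operator. This is a software model under assumptions: `StreamElement` and `Stream` are hypothetical stand-ins for the HLS stream types, and a standard C++ compiler simply ignores the unknown `#pragma HLS PIPELINE` (possibly with a warning), so the sketch shows only where the directive goes, not its effect on timing.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical software stand-ins for mtl_stream_element and mtl_stream.
struct StreamElement {
    uint64_t data;
    uint8_t keep;
    bool last;
};

struct Stream {
    std::queue<StreamElement> fifo;

    StreamElement read() {
        assert(!fifo.empty());
        StreamElement e = fifo.front();
        fifo.pop();
        return e;
    }

    void write(const StreamElement &e) { fifo.push(e); }
};

void my_operator(Stream &in, Stream &out) {
    StreamElement element;
    do {
// Directive from the slide, placed as the first line of the loop body.
// Ignored by standard compilers; only Vivado HLS acts on it.
#pragma HLS PIPELINE
        element = in.read();
        out.write(element);
    } while (!element.last);
}
```

In the software model the function behaves identically with or without the pragma; the difference only appears in the synthesized hardware, as the profiling results above indicate.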