[PPT] - Open Source Virtual Platforms for SW Prototyping on FPGA Mark Burton PowerPoint Presentation

SLIDE 1

Enabling System Level Design

Open Source Virtual Platforms for SW Prototyping on FPGA

Mark Burton

SLIDE 2

Deep Learning Accelerator

Nvidia has a Deep Learning Accelerator (called NVDLA)
Nvidia also has a ‘C’ model of the DLA architecture (could be used as a SystemC/TLM model)

The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome.

SLIDE 3

Turing Lecture 2017: Hennessy and Patterson

https://www.youtube.com/watch?v=3LVeEjsn8Ts

SLIDE 4

Goals

Bring HW and SW together
Minimize time to re-spin
(change in HW/change in SW)
Enable simulation to be used by anybody
Make it easy and quick to use
Make the simulation FAST
Enable S/W development

SLIDE 5

Virtualization

Spectrum: Emulation | Virtual Platform | Virtualization | (Para-)Virtualization | Hardware

Algorithm execution, or full system virtualization

Application / O/S / Virtual platform (model): ‘real binary’, full binary execution on virtual platform (model)
Application / O/S / FPGA: full binary execution on REAL platform (FPGA)
Application / O/S / Hardware: full binary execution on final hardware

SLIDE 7

Open Source SystemC Standard

Virtual Platform Standard is SystemC TLM-2.0 IEEE 1666

Open Source Simulator available for download from Accellera.org

Corporate members 2016

GreenSocs technology at the heart of TLM-2.0 standard.
All GreenSocs interfaces use TLM-2.0
GreenSocs helping Accellera forge a new Model to tool standard.
Preview available in GreenConfig.
Our solutions are tool independent, and work with all vendors.

SLIDE 8

Qemu: Our Preferred source of CPU models

QEMU is the de facto standard virtualizer.
Free and Open Source.
It is over 10 years old
GreenSocs is a key contributor:

Reverse execution and Multi-Core TCG Kernel.

Regular committers from many organizations

Architectures: 18
CPUs: 1100
Commits: 43000
Contributors: 1000
Lines of code: 989,863

…

SLIDE 9

Existing Model database overview:

x86, ARM, MIPS, Alpha, PowerPC, SPARC, MicroBlaze, ColdFire, CRIS, SH4, Xtensa

Fast SW dev model (LT)

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

Cycle Accurate HW dev model (AT)

✔ ✔ ✔ ✔

CPU Family coverage:

Full list (of several hundred) available on GreenSocs.com

SLIDE 10

QBox

Wraps up QEMU in a TLM-2.0 API such that it can be used in standard SystemC
QEMU is a generic and open source virtualizer – it covers almost all CPU architectures and achieves extremely high performance.

[Diagram: SystemC, QBox (qemu), TLM]
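The wrapping idea on this slide can be illustrated with a minimal Python sketch. It is not the QBox or SystemC API: `Transaction`, `Memory`, `CpuWrapper` and this simplified `b_transport` are hypothetical stand-ins that only mimic the shape of a TLM-2.0 blocking transport call, in which an initiator (the wrapped CPU model) sends read/write transactions to a target and accumulates an annotated delay.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical stand-in for a TLM-2.0 generic payload."""
    command: str        # "read" or "write"
    address: int
    data: bytearray
    ok: bool = False    # response status, set by the target


class Memory:
    """Target: a simple memory reached through a blocking transport call."""

    def __init__(self, size, latency_ns=10):
        self.store = bytearray(size)
        self.latency_ns = latency_ns

    def b_transport(self, txn, delay_ns):
        end = txn.address + len(txn.data)
        if txn.address < 0 or end > len(self.store):
            txn.ok = False          # address error: no latency added
            return delay_ns
        if txn.command == "write":
            self.store[txn.address:end] = txn.data
        else:
            txn.data[:] = self.store[txn.address:end]
        txn.ok = True
        return delay_ns + self.latency_ns


class CpuWrapper:
    """Initiator: turns the emulator's loads and stores into transactions."""

    def __init__(self, target):
        self.target = target
        self.local_time_ns = 0      # time accumulated from target latencies

    def store(self, address, value):
        txn = Transaction("write", address, bytearray(value))
        self.local_time_ns = self.target.b_transport(txn, self.local_time_ns)
        return txn.ok

    def load(self, address, length):
        txn = Transaction("read", address, bytearray(length))
        self.local_time_ns = self.target.b_transport(txn, self.local_time_ns)
        return bytes(txn.data) if txn.ok else None
```

Because the CPU side only sees the transport call, the same wrapper could in principle talk to any target exposing that call, which is the property the slide attributes to wrapping QEMU behind TLM-2.0.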

SLIDE 11

QBox Synchronisation options

Real Time
Each simulator runs as close to real time as possible.
Can be simple “run as fast as you can”, no sync.
Windowed
Each simulator is allowed to run within a window, but if it reaches the end, it must stop and wait
The window will automatically extend as simulators run.
(Windowed ‘behind’ to keep SystemC behind and the TLM delta time positive)

Deterministic/single threaded
Each simulator runs in turn.
Pseudo random ordering to ‘catch’ S/W bugs.
(The advantage of a model…)
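The "Windowed" option above can be sketched as a small single-threaded Python model of the policy. This is an illustration under assumptions, not QBox code: `Sim`, `run_windowed`, `step_ns` and `window_ns` are hypothetical names, and real simulators would run in separate threads rather than in a round-robin loop. Each simulator may advance only while it stays within `window_ns` of the slowest one; otherwise it stalls for that round, and the window moves forward as the slowest simulator progresses.

```python
class Sim:
    """One simulator (e.g. a CPU model or a SystemC kernel) with local time."""

    def __init__(self, name, step_ns):
        self.name = name
        self.step_ns = step_ns   # simulated time advanced per unit of work
        self.time_ns = 0
        self.stalls = 0          # rounds spent waiting at the window edge


def run_windowed(sims, window_ns, end_ns):
    """Run all sims to end_ns; return the largest lead seen between any two."""
    if any(s.step_ns > window_ns for s in sims):
        raise ValueError("window must be at least one step wide")
    max_lead = 0
    while min(s.time_ns for s in sims) < end_ns:
        slowest = min(s.time_ns for s in sims)
        for s in sims:
            if s.time_ns >= end_ns:
                continue                     # this simulator is finished
            if s.time_ns + s.step_ns > slowest + window_ns:
                s.stalls += 1                # reached the window end: wait
                continue
            s.time_ns += s.step_ns
        times = [s.time_ns for s in sims]
        max_lead = max(max_lead, max(times) - min(times))
    return max_lead
```

For example, `run_windowed([Sim("qemu", 10), Sim("systemc", 3)], window_ns=20, end_ns=100)` lets the faster simulator run ahead, but never by more than the window, and counts how often it had to wait.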

SLIDE 12

Extending Qemu for Zynq

Clock framework
Enable the correct timing for events across the full Zynq device.
Large packet DMA framework
Significantly increase the speed of DMA activity in the simulated device.
Fault Injection
Model fault injection in a convenient and scriptable way, to enable safety and test features to be validated.
Safety and Test Library extensions to devices
Model the suite of devices in the Zynq that can be self-tested.

GreenSocs is the partner upstreaming their device models
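As an illustration of what "scriptable" fault injection could look like, here is a minimal Python sketch. It does not show the actual QEMU or GreenSocs interface: `Register`, `stuck_at`, `bit_flip` and `run_script` are hypothetical names, and the fault model (faults applied to values as they are read) is an assumption chosen for simplicity.

```python
class Register:
    """Hypothetical 32-bit device register with injectable read faults."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value & 0xFFFFFFFF
        self.faults = []             # functions applied, in order, to each read

    def write(self, value):
        self.value = value & 0xFFFFFFFF

    def read(self):
        observed = self.value
        for fault in self.faults:
            observed = fault(observed)
        return observed & 0xFFFFFFFF


def stuck_at(bit, level):
    """Fault that forces one bit of the observed value to 0 or 1."""
    mask = 1 << bit
    if level:
        return lambda v: v | mask
    return lambda v: v & ~mask


def bit_flip(bit):
    """Fault that inverts one bit of the observed value."""
    return lambda v: v ^ (1 << bit)


def run_script(registers, script):
    """Apply a fault script: a list of (register name, fault) pairs."""
    for name, fault in script:
        registers[name].faults.append(fault)
```

A test script can then inject faults without touching the device model, for example `run_script(regs, [("status", stuck_at(0, 0))])`, and safety software can be checked against the corrupted reads while the stored value stays intact.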

SLIDE 13

Extending QEMU Speed: Multi-Thread QEMU

A massive speed improvement for QEMU to take advantage of multi-core hosts

[Chart: Upstream vs. MTTCG, each with 1, 2 and 4 VCPUs; axis ticks from 1 to 50]

SLIDE 14

Advanced features

NON-Deterministic Reverse Execution
Ability to debug from an error backwards, irrespective of input stimulus

Supporting
No H/W required, No ‘JTAG collector’ limit.
Cache modeling
Cache Coherency performance estimation
Cache flushing S/W checking
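To make the cache modeling item concrete, the following is a minimal Python sketch of a direct-mapped cache that only counts hits and misses. It is a simplified illustration, not the model used in the platform: `DirectMappedCache` and its parameters are hypothetical, and it covers neither coherency between several caches nor write-back behaviour.

```python
class DirectMappedCache:
    """Hit/miss counter for a direct-mapped cache (tags only, no data)."""

    def __init__(self, num_lines=64, line_bytes=64):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # None marks an invalid line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        line_addr = address // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag           # fill the line (evicting any old tag)
        self.misses += 1
        return False

    def flush(self):
        """Invalidate every line, as a software cache flush would."""
        self.tags = [None] * self.num_lines
```

Feeding the addresses of simulated memory accesses into such a model gives hit and miss counts for a rough performance estimate, and calling `flush()` where the software flushes the cache shows whether later accesses miss as expected.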

SLIDE 15

What’s OpenVP

User Application and user level device code

Kernel and kernel modules
Virtual Platform model,
Based on QEMU and SystemC
‘C’ model for NVDLA device itself

[Diagram layers: Guest User Space, Guest Kernel Space, Host User Space]
[Diagram block labels: CPU Cluster Model NVDLA FPGA Wrapper Model NVDLA Cmodel Mem Model KMD Applications UMD HW Tests FPGA Parser AWS Driver NVDLA device QEMU with TLM2C]

SLIDE 16

Problem

Simulation speed… the NVDLA – Accelerator – is modelled on the host, so it will not ‘accelerate’.
Changes to the core NVDLA architecture require changes to the model.

SLIDE 17

Adding FPGA

User Application and user level device code
Kernel and kernel modules
Virtual Platform model, with FPGA wrapper

AWS framework
NVDLA FPGA hardware module
Runs at full speed!

[Diagram layers: Guest User Space, Guest Kernel Space, Host User Space, Host Kernel, AWS HW and FPGA]
[Diagram block labels: CPU Cluster Model NVDLA FPGA Wrapper Model NVDLA Cmodel Mem Model KMD Applications UMD AWS Kernel Driver AWS Shell NVDLA FPGA Transactor FPGA DRAM HW Tests FPGA Parser AWS Driver NVDLA device QEMU with TLM2C]

SLIDE 18

SPEED

SW on NVDLA C-Model
Anybody can download packaged Docker release
Configurable – build time ½ hour.
FAST TO SET UP.
SW on FPGA with NVDLA RTL
Anybody can run the AWS env with pre-packaged AMI and AFI
With the AWS setup, it is easy to alter both FPGA images and associated drivers (e.g. in less than a day).
FAST TO RUN.

Both available from nvdla.org

SLIDE 19

All the components…

QEMU TLM2C Mem Model DLA Cmodel

SLIDE 20

HW Test on FPGA

HW Description FPGA Parser FPGA Driver AWS SDK AWS FPGA

Why we need HW tests on FPGA

To guarantee the quality of the FPGA release
To identify corner cases and issues in the RTL

SLIDE 21

Full S/W stack

Based on SW on Cmodel
Replace all Cmodels (NVDLA, Mem model) with FPGA wrapper

Full user code executable on combined QEMU + FPGA model

UMD KMD QEMU FPGA Wrapper AWS SDK AWS HDK AWS FPGA
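The substitution described on this slide, where the software stack stays the same while the model behind it changes, can be sketched in Python as two backends behind one interface. This is an assumption-laden illustration rather than the OpenVP code: `DeviceBackend`, `CModelBackend`, `FpgaWrapperBackend`, `driver_probe` and the register offset are all hypothetical, and the FPGA side is represented only by injected `peek`/`poke` callables instead of a real AWS SDK call.

```python
from typing import Callable, Protocol


class DeviceBackend(Protocol):
    """Register interface the driver code sees, whatever sits behind it."""

    def read32(self, offset: int) -> int: ...
    def write32(self, offset: int, value: int) -> None: ...


class CModelBackend:
    """Registers kept in host memory, standing in for a software C model."""

    def __init__(self) -> None:
        self.regs: dict[int, int] = {}

    def read32(self, offset: int) -> int:
        return self.regs.get(offset, 0)

    def write32(self, offset: int, value: int) -> None:
        self.regs[offset] = value & 0xFFFFFFFF


class FpgaWrapperBackend:
    """Forwards each access to hardware through injected peek/poke callables."""

    def __init__(self, peek: Callable[[int], int],
                 poke: Callable[[int, int], None]) -> None:
        self._peek = peek
        self._poke = poke

    def read32(self, offset: int) -> int:
        return self._peek(offset) & 0xFFFFFFFF

    def write32(self, offset: int, value: int) -> None:
        self._poke(offset, value & 0xFFFFFFFF)


def driver_probe(dev: DeviceBackend) -> int:
    """Stand-in for driver code: identical for both backends."""
    dev.write32(0x10, 0xDEADBEEF)    # 0x10 is an arbitrary example offset
    return dev.read32(0x10)
```

The design point is that `driver_probe` never learns which backend it is talking to, so user code that ran against the C model can run unchanged against the wrapper that forwards to the FPGA.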

SLIDE 22

Generalisation

Making this ‘generally’ applicable requires more work ☹
Enable any architecture to be modeled in a ‘cloud’ (public/private), off-loading onto FPGA when required/appropriate.
Enable ‘Virtualization’ when host/guest match.

SLIDE 23

Future Possibilities

NVDLA Performance Model integration for performance evaluation
Release more AWS FPGA images for different NVDLA configurations
Enable RISC-V in Virtual Platform
ARM Project Trillium
SiFive

SLIDE 24

More information:

www.greensocs.com
mark@greensocs.com
NVDLA page: http://nvdla.org/
OpenVP Doc: http://nvdla.org/contents.html
OpenVP Github page: https://github.com/nvdla/vp
https://github.com/nvdla/vp_awsfpga