Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs

SLIDE 1

Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs

Michael Hübner1, Diana Göhringer2, Juanjo Noguera3, Jürgen Becker1

Institut für Technik der Informationsverarbeitung (ITIV)

1 Karlsruhe Institute of Technology (KIT), Germany 2 Fraunhofer IOSB, Germany 3 Xilinx Inc., Dublin

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

www.kit.edu

SLIDE 2

Outline

  • Introduction and motivation
  • Related work
  • Concept of Fast Simplex Link (FSL) internal configuration access port (ICAP)
  • Realization and results
  • Conclusion and future work

Institut für Technik der Informationsverarbeitung (ITIV) Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs 2 4/18/2010

SLIDE 3

Introduction and motivation

  • Dynamic and partial reconfiguration: “parts of a configuration can be substituted while other parts stay operative without any disturbance”
  • Spatial and temporal partitioning is exploited to increase performance and to reduce power consumption
  • In a processor-based design (MicroBlaze), the configuration access port is one of the “devices” on the OPB or PLB bus
  • Why is it not a part of the processor’s microarchitecture?

[Figure: spatial parallelism and temporal parallelism on the FPGA]

SLIDE 4

Traditional usage of the ICAP: 1. Dynamic Reconfiguration

ICAP was traditionally used for run-time adaptive systems: loading partial bitstreams from external memory and transferring them to the ICAP

[Figure: FPGA with MicroBlaze/PowerPC, user-IP interface, UART (to PC), OPB bus, flash controller with external flash memory, HWIcap, ICAP, modules A to E and a partial module. ICAP data width: 32 bit for Virtex 5 and Virtex 6, 16 bit for Spartan 6]

SLIDE 5

Traditional usage of the ICAP (cont): Data transfer through read- and writeback

  • ICAP was used to transfer data from one BRAM to another
  • Reduction of signal line utilization, novel degree of freedom
  • Test application: slide show on Virtex 2
  • Example with 7 encapsulated modules
  • VGA core has no connection via signal line to the PPC

This example: Sander et al.: “Data Reallocation by Exploiting FPGA Configuration Mechanisms”, RAW 2008, April

Very nice extension: Shelburne et al.: “MetaWire: Using FPGA Configuration Circuitry to Emulate a Network-on-Chip”, FPL 2008, September

SLIDE 6

ICAP is more than a configuration port…

ICAP can be used in different modes:

  • Access port to the reconfigurable logic, consuming configuration data
  • Access port to the reconfigurable logic, producing configuration data (e.g. readback of configuration data for safety reasons, such as detecting bit flips)
  • Access port to processing elements already configured on the FPGA, consuming (write mode) data to be processed
  • Access port to processing elements already configured on the FPGA, producing (read mode) data which were processed

[Figure: FPGA with MicroBlaze/PowerPC, user-IP interface, UART (to PC), OPB bus, flash controller with external flash memory, HWIcap, ICAP, and modules A to E]

In general two modes of operation:

  • 1. for hardware reconfiguration purposes
  • 2. for data transfer purposes

SLIDE 7

Realization alternatives for processor – ICAP connection

Several approaches exist where a processor triggers the reconfiguration of an accelerator

  • Single bus version: enables DMA transfer of bitstreams to the ICAP
  • Dual bus version (newest version): efficient usage for MicroBlaze

[Figure, single bus version: FPGA with processor (PPC or MicroBlaze), PLB or OPB, memory controller, external memory (e.g. flash), Xilinx HW-ICAP, ICAP, other peripheral devices]

[Figure, dual bus version: FPGA (Xilinx Virtex 4) with processor (PPC405 or MicroBlaze), PLB, memory controller (MPMC), external memory (e.g. flash), PLB or XCL, Xilinx HW-ICAP, ICAP, other peripheral devices]

Numerous previous works, e.g.:

  • Blodget et al.: “A Lightweight Approach for Embedded Reconfiguration of FPGAs”, DATE 2003
  • Claus et al.: “A multi-platform controller allowing for maximum dynamic partial reconfiguration throughput”, FPL 2008

SLIDE 8

Novel exploitation possibilities of the ICAP in adaptive microprocessor architectures: The i-Core

Let's assume the ICAP is integrated into the processor pipeline. That would mean:

  • Processor commands are reserved for the ICAP:
      Reconfiguration mode:
        • ICAP write config.
        • ICAP read config.
      Data transfer mode:
        • ICAP write process data
        • ICAP read process data
  • The ICAP is included directly into the data path of the processor: lowest delay for data transfer
  • See the ICAP from “the software point of view” and write simple programs for accessing it

[Figure: processor pipeline with ICAP. Extended version based on the picture used by Prof. Lizy Kurian John, Univ. Austin, Texas]
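To make the "software point of view" concrete, here is a minimal C sketch of how the four reserved commands could look to a programmer. Everything in it is an assumption for illustration: the names `icap_op`, `icap_mode_of`, `icap_exec` and the `icap_sim` structure are hypothetical, and two small arrays stand in for the configuration memory and for data held by configured IP cores. It is not the i-Core instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four reserved commands from the slide; all names are hypothetical. */
enum icap_op {
    ICAP_WRITE_CONFIG,  /* reconfiguration mode, write */
    ICAP_READ_CONFIG,   /* reconfiguration mode, read  */
    ICAP_WRITE_DATA,    /* data transfer mode, write   */
    ICAP_READ_DATA      /* data transfer mode, read    */
};

enum icap_mode { ICAP_MODE_RECONFIGURATION, ICAP_MODE_DATA_TRANSFER };

/* Map a command to one of the two general modes of operation. */
enum icap_mode icap_mode_of(enum icap_op op)
{
    return (op == ICAP_WRITE_CONFIG || op == ICAP_READ_CONFIG)
               ? ICAP_MODE_RECONFIGURATION
               : ICAP_MODE_DATA_TRANSFER;
}

/* True for the two commands that send data towards the ICAP. */
int icap_is_write(enum icap_op op)
{
    return op == ICAP_WRITE_CONFIG || op == ICAP_WRITE_DATA;
}

#define ICAP_SIM_WORDS 16

/* Software stand-in for the hardware behind the port (assumption). */
struct icap_sim {
    uint32_t config[ICAP_SIM_WORDS]; /* stands in for configuration memory */
    uint32_t data[ICAP_SIM_WORDS];   /* stands in for data inside IP cores */
};

/* Execute one command on the simulated port.
   Writes take the word from *value, reads store the word into *value.
   Returns 0 on success and -1 if idx is out of range. */
int icap_exec(struct icap_sim *s, enum icap_op op, size_t idx, uint32_t *value)
{
    assert(s != NULL && value != NULL);
    if (idx >= ICAP_SIM_WORDS)
        return -1;

    uint32_t *mem = (icap_mode_of(op) == ICAP_MODE_RECONFIGURATION)
                        ? s->config
                        : s->data;
    if (icap_is_write(op))
        mem[idx] = *value;
    else
        *value = mem[idx];
    return 0;
}
```

A caller would write a word with `ICAP_WRITE_DATA` and fetch it again with `ICAP_READ_DATA`; the same two calls with the `CONFIG` variants address the configuration side instead.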

SLIDE 9

Exploitation of the novel concept

  • The novel concept increases the flexibility of an FPGA-based processor tremendously
  • The ICAP as data sink and source can be seen as a multipurpose ALU
  • From the user (programmer) point of view the hardware complexity is hidden through the provided libraries
  • Accessible with standard C constructs
  • Further hardware abstraction, which definitely will increase the acceptance of run-time adaptive hardware

[Figure: Pipeline with Configuration Port (ICAP). Legend: IF: Instruction fetch; ID: Instruction decode; EX: Execute; MEM: Memory access; WB: Writeback]

SLIDE 10

Exploitation of the novel concept (cont)

The novel concept enables the run-time adaptation of the processor's microarchitecture:

  • Realized instructions (within the ISA) can be reconfigured at run time, which yields a dynamically reconfigurable instruction set processor
  • In general, an adaptive microarchitecture is possible:
      • Power and energy reduction via pipeline balancing
      • Using IPC (instructions per cycle) variation to reduce power consumption
      • Dynamic instruction-level parallelism pipeline adaptation
      • Adaptive issue queue for reduced power at high performance
    (Please see the references in our paper; they did not use this novel approach!)
  • Decentralized processor approach: ICAP connects cores at any position on the chip

Novel quality of processors: The i-Core provides the run-time adaptation of the microarchitecture

An example from a real experiment: adaptation of the pipeline from 5 to 3 stages reduces power consumption by 90 mW! (Publication under review: ReCoSoC 2010)

[Figure: Pipeline with Configuration Port (ICAP). Legend: IF: Instruction fetch; ID: Instruction decode; EX: Execute; MEM: Memory access; WB: Writeback]

SLIDE 11

(One of the) First steps to the i-Core: FSL-ICAP Hardware view

  • Connecting the ICAP as near as possible to the processor core of MicroBlaze: the FSL connection provides a connection with a latency of only one clock cycle
  • Concept: transfer the configuration as well as data to be processed by IP cores through the processor. Simple programming model, no hardware knowledge required.
  • The approach does not target the highest performance in data throughput. It focuses on embedding the ICAP into the C world.
  • But side effect: 2-3x speedup in comparison to XPS ICAP (through reduction of required clock cycles)

[Figure: FPGA (Xilinx Virtex 4) with processor (PPC405 or MicroBlaze), PLB, memory controller (MPMC), external memory (e.g. flash), PLB or XCL, other peripheral devices, FSL, controller (FSM), ICAP]

SLIDE 12

(One of the) First steps to the i-Core: FSL-ICAP Software view

The ICAP is accessible from C code through reserved commands (e.g. CPUTFSL and PUTFSL). Hiding the hardware complexity increases development efficiency.

    #include "xbasic_types.h"   /* Xuint32 */
    #include "mb_interface.h"   /* putfsl, cputfsl */

    /* The slide does not show where `command` is defined. The value below is an
       assumption: ICAP write (0100 in bits 28..31, with bit 0 as the MSB). */
    static const Xuint32 command = 0x4;

    /* Function for bitstream transfer from external memory to FSL-ICAP */
    int loadBitfromSRAM(Xuint32 baseaddr, Xuint32 size)
    {
        Xuint32 conf_word, size4, output_0, i;

        /* point to address in external memory */
        Xuint32 *pointword = (Xuint32 *) baseaddr;

        size4 = size >> 2;                    /* size in 32-bit words */
        output_0 = (size4 << 16) | command;   /* bit 0 to 15 = size, bit 28 to 31 = command */

        /* announce the transfer to FSL_HW_ICAP with a control word */
        cputfsl(output_0, 1);

        /* write memory content to FSL, one configuration word at a time */
        for (i = 0; i < size4; i++) {
            conf_word = pointword[i];
            putfsl(conf_word, 1);
        }
        return 0;
    }

FSL ICAP commands:

    Bit 28..31   Mode          Description
    0001         Reset         Back to Reset state
    0010         ICAP Status   Send status of ICAP to processor
    0100         ICAP write    Write configuration data
    1000         ICAP read     Readback data from configuration memory

[Figure: state machine of the FSL ICAP controller. States: Idle, Read Control Word, ICAP Write, ICAP Read. Transition labels: Control = 1; Bit 29 = 1, Bit 0-15 = size (ICAP write); Bit 28 = 1, Bit 0-15 = size (ICAP read); Size = 0]
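The control state machine can be approximated in software to see how a control word followed by data words drives it. The sketch below is a word-level model under several assumptions: the arcs are inferred from the transition labels, the "Read Control Word" state is folded into the decode step, Reset and ICAP Status commands are ignored, and all identifiers (`fsl_icap_fsm`, `fsm_put`, `fsm_get`) are hypothetical. Bit 0 is taken to be the most significant bit, so the size sits in the upper half-word and the command in the lowest four bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_WRITE 0x4u  /* 0100 in bits 28..31 (bit 0 = MSB, assumed) */
#define CMD_READ  0x8u  /* 1000 in bits 28..31 */

enum fsm_state { ST_IDLE, ST_ICAP_WRITE, ST_ICAP_READ };

struct fsl_icap_fsm {
    enum fsm_state state;
    uint32_t remaining;      /* words left in the current transfer */
    uint32_t words_written;  /* words forwarded to the ICAP so far */
};

void fsm_init(struct fsl_icap_fsm *f)
{
    assert(f != NULL);
    f->state = ST_IDLE;
    f->remaining = 0;
    f->words_written = 0;
}

/* Count one transferred word; return to Idle once "Size = 0". */
static void fsm_count_down(struct fsl_icap_fsm *f)
{
    if (f->remaining > 0)
        f->remaining--;
    if (f->remaining == 0)
        f->state = ST_IDLE;
}

/* Feed one word arriving over the FSL. `control` is the FSL control bit:
   1 for a word sent with cputfsl, 0 for a word sent with putfsl. */
void fsm_put(struct fsl_icap_fsm *f, uint32_t word, int control)
{
    assert(f != NULL);
    if (f->state == ST_IDLE && control) {
        /* Decode the control word: size and command. */
        f->remaining = word >> 16;
        if (f->remaining == 0)
            return;                      /* nothing to transfer */
        if (word & CMD_WRITE)
            f->state = ST_ICAP_WRITE;
        else if (word & CMD_READ)
            f->state = ST_ICAP_READ;
        else
            f->remaining = 0;            /* Reset/Status are not modelled */
    } else if (f->state == ST_ICAP_WRITE && !control) {
        f->words_written++;              /* hardware would pass the word to the ICAP */
        fsm_count_down(f);
    }
}

/* Model one word being read back towards the processor. */
void fsm_get(struct fsl_icap_fsm *f)
{
    assert(f != NULL);
    if (f->state == ST_ICAP_READ)
        fsm_count_down(f);
}
```

Feeding the model a write control word with size 2 and then two data words brings it back to Idle, which mirrors what `loadBitfromSRAM` does with one `cputfsl` followed by `size4` calls to `putfsl`.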

[Figure caption: FSM for ICAP control purposes, realized within the FSL ICAP controller; final transition label: Done]

Sample code for reconfiguration access; the data transfer mode works similarly.
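Since the data transfer mode is said to work similarly, the part that changes is mainly the control word. The following sketch packs and unpacks such a word from the command table. It assumes the bit numbering in which bit 0 is the most significant bit, which is what makes `(size4 << 16) | command` agree with "bit 0 to 15 = size, bit 28 to 31 = command". The enum and function names are illustrative and not taken from the original library.

```c
#include <assert.h>
#include <stdint.h>

/* Command codes from the FSL ICAP command table (bits 28..31).
   The identifiers are hypothetical. */
enum fsl_icap_cmd {
    FSL_ICAP_RESET  = 0x1, /* 0001: back to Reset state        */
    FSL_ICAP_STATUS = 0x2, /* 0010: send ICAP status           */
    FSL_ICAP_WRITE  = 0x4, /* 0100: write configuration data   */
    FSL_ICAP_READ   = 0x8  /* 1000: read back from config mem. */
};

/* Build a control word: size (in words) in bits 0..15, command in
   bits 28..31, with bit 0 taken as the most significant bit. */
uint32_t fsl_icap_control_word(enum fsl_icap_cmd cmd, uint32_t size_words)
{
    assert(size_words <= 0xFFFFu); /* the size field is only 16 bits wide */
    return (size_words << 16) | (uint32_t)cmd;
}

/* Extract the size field from a control word. */
uint32_t fsl_icap_size(uint32_t control_word)
{
    return control_word >> 16;
}

/* Extract the command field from a control word. */
enum fsl_icap_cmd fsl_icap_command(uint32_t control_word)
{
    return (enum fsl_icap_cmd)(control_word & 0xFu);
}
```

One consequence of the 16-bit size field is that a single control word can announce at most 65535 words, so a larger bitstream would presumably have to be sent in several chunks.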

SLIDE 13

Implementation results: Utilization and performance

We compared the results with already existing ICAP hardware drivers (see references in the paper). The goal was not to gain performance; the benefit of a “new ICAP thinking” is in the foreground.

    Technical Data                      FSL-ICAP               XPS ICAP [6]           ICAP [8]
    Utilized Slices                     78 + 44 (for 2 FSLs)   2868                   637
    FPGA Board                          Xilinx ML405           Xilinx ML405           Xilinx ML410
    External Memory                     DDR SDRAM              DDR SDRAM              DDR2 SDRAM
                                        (32-bit interface)     (32-bit interface)     (64-bit interface)
    Processor                           µBlaze, PPC            µBlaze, PPC            PPC
    Throughput with µBlaze (MByte/s)    25.89                  12.79                  n/a
    Throughput with PPC (MByte/s)       28.28                  8.60                   295.4

    Utilized BRAM: 2 (only one value survives in the extraction; its column is unclear)

FSL ICAP is easy to use and comparably fast
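The speedup over XPS ICAP follows directly from the throughput rows of the table. The short C sketch below computes the ratios and, as an extra illustration, the time to move a bitstream at a given throughput. The function names are hypothetical, and the conversion assumes 1 MByte = 10^6 bytes, which the slides do not specify.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput figures in MByte/s, copied from the comparison table. */
static const double FSL_ICAP_MBLAZE = 25.89;
static const double XPS_ICAP_MBLAZE = 12.79;
static const double FSL_ICAP_PPC    = 28.28;
static const double XPS_ICAP_PPC    = 8.60;

/* How many times faster `fast` is than `slow`. */
double speedup(double fast, double slow)
{
    assert(slow > 0.0);
    return fast / slow;
}

/* Time in milliseconds to transfer `bytes` at `mbyte_per_s`,
   assuming 1 MByte = 10^6 bytes. */
double transfer_ms(double bytes, double mbyte_per_s)
{
    assert(mbyte_per_s > 0.0);
    return bytes / (mbyte_per_s * 1e6) * 1e3;
}

/* Print the FSL-ICAP vs. XPS ICAP ratios for both processors. */
void print_speedups(void)
{
    printf("MicroBlaze: %.2fx\n", speedup(FSL_ICAP_MBLAZE, XPS_ICAP_MBLAZE));
    printf("PPC:        %.2fx\n", speedup(FSL_ICAP_PPC, XPS_ICAP_PPC));
}
```

With the table values the ratio is about 2.0 for MicroBlaze and about 3.3 for the PPC, in line with the 2-3x speedup quoted for the hardware view. For a hypothetical 100 000-byte partial bitstream, `transfer_ms` gives roughly 3.9 ms at 25.89 MByte/s.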

SLIDE 14

Conclusion and future work

  • The ICAP can be used in different modes of operation
  • It depends on the perspective what impact these modes have on a system’s realization:
      • The ICAP in the processor's data path enables new degrees of freedom for the adaptivity of the processor at run time
      • Previous work shows what optimizations of a processor’s microarchitecture enable in terms of reduced power consumption and increased performance
      • The reconfigurable application-specific instruction-set processor (ASIP) makes it possible to provide a “multipurpose” but “application-tailored” processor core
  • We will call this approach i-Core (related to the latest results in the programming paradigm Invasive Computing; see reference in the paper)

SLIDE 15

Conclusion and future work (cont)

  • One of the first steps to the i-Core was the FSL ICAP
  • FSL ICAP targets the idea of hiding hardware complexity from the developer:
      • Simple C libraries enable access to the ICAP as sink and source for configuration data as well as for data to be processed by IP cores
      • The IP has a very low footprint in comparison to other solutions
      • FSL ICAP can easily be adapted to other Xilinx FPGAs: e.g. we adapted the FSL ICAP quickly and successfully to the requirements of the Virtex 5 and Spartan 6 FPGAs for our demonstrators
  • Next steps are the exploration of the possible adaptation mechanisms related to the processor microarchitecture (e.g. manipulation of the pipeline width and depth etc.)
  • A further step is to use the processor at the FSL ICAP to run an “intelligent” ICAP-related OS (don’t miss the talk of Mrs. Göhringer later ☺)

SLIDE 16

Thanks a lot for your attention!

I hope that you are interested in this work. Please share your ideas with me.

Contact:
Dr.-Ing. Michael Hübner
Karlsruhe Institute of Technology (KIT)
Institut für Technik der Informationsverarbeitung (ITIV)
Email: michael.huebner@kit.edu
Skype: huebner_michael