Area and Time Tradeoffs in FPGAs



SLIDE 1

Area and Time Tradeoffs in FPGAs

Richie Ung Trang (z5061606) Harry Gougousidis (z5159917) Henry Veng (z5113239)

Examining the concept of area/time tradeoffs in FPGA design, pattern matching, and Advanced Encryption Standard (AES)

SLIDE 2

Presentation Overview

  • 1. Background
  • 2. FPGA Design Tradeoffs
  • 3. AES
  • 4. Pattern Matching
  • 5. Timeline
  • 6. Prototype

SLIDE 3

Motivation – Diverse Set of Vendors and Boards

  • 1. Background

The market is full of FPGAs with differing cost, performance, and power consumption requirements. What are the circuit and architectural design attributes of an FPGA that trade off area and speed?

What is the magnitude of these tradeoffs?

SLIDE 4

Motivation – Narrowing Gap with ASICs

  • 1. Background

Learning which attributes affect area/time performance will help FPGAs narrow the gap to ASICs in one area.

Figure values: ~35x, ~14x, ~1/3 (labels not recoverable from the extracted text)

SLIDE 5

Measuring Time in FPGA Design

  • 1. Background

Simple Approach: Take a set of circuits that make up the critical paths in a collection of benchmark designs to create a performance metric.

SLIDE 6

Measuring Time in FPGA Design

  • 1. Background

Model Approach: Use the shortest register-to-register path within the FPGA that contains all unique components. Take a weighted average of the component delays, weighting each component by how frequently it is exercised in critical-path tests.
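
The weighted average in the model approach can be sketched as follows. This is only a minimal illustration: the function name, component names, delay values, and usage counts are all hypothetical and do not come from the presentation.

```python
def weighted_path_delay(delays_ps, usage_counts):
    """Weighted average of per-component delays.

    delays_ps: delay of each component on the representative path (picoseconds).
    usage_counts: how often each component is exercised in critical-path tests.
    """
    total = sum(usage_counts.values())
    if total <= 0:
        raise ValueError("usage counts must sum to a positive number")
    weighted = sum(delays_ps[name] * count for name, count in usage_counts.items())
    return weighted / total


# Hypothetical numbers, for illustration only.
delays = {"lut": 200.0, "routing_mux": 100.0}
counts = {"lut": 3, "routing_mux": 1}
print(weighted_path_delay(delays, counts))  # (3*200 + 1*100) / 4 = 175.0
```

Components that are exercised more often pull the metric toward their own delay, which is the point of weighting rather than taking a plain mean.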

SLIDE 7

Measuring Area in FPGA Design – SRAM

  • 1. Background

The SRAM cell is the single most frequently repeated structure in the FPGA. Significant effort is therefore spent optimizing the layout of the 6 transistors that make up a single bit.

Transistor Model

SLIDE 8

Measuring Area in FPGA Design – SRAM

  • 1. Background

Minimum Transistor Width Model

SLIDE 9

Space/Time Tradeoff Results in FPGA Design

  • 1. Background

Some Comparative Results

SLIDE 10

PATTERN MATCHING ON FPGA

By Henry Veng

SLIDE 11

Background

Pattern Matching:

  • Process of finding a particular substring (pattern) within a string

SLIDE 12

Background

Uses:

  • Gene detection in DNA sequences
  • Network Intrusion Detection Systems (IDS)

SLIDE 13

Background

Network IDS Requirements:

  • Many patterns against one string
  • Support as many rules as possible
  • Real time
  • Internet speeds

SLIDE 14

Design

Overview

  • Custom implementation of KMP
  • Pipelining within pattern matching units
  • Linear array of pattern matching units

SLIDE 15

Design

KMP Algorithm Characteristics:

  • Allows input stream to keep moving forward
  • Good worst-case performance
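
Both characteristics can be seen in a software sketch of textbook Knuth-Morris-Pratt. This is not the presenters' custom hardware unit; the function names are illustrative. The matcher never rewinds the input stream: on a mismatch it only moves its position within the pattern, using a precomputed failure table.

```python
def build_failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_stream(pattern, stream):
    """Yield the start index of every match, consuming the stream once, left to right."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = build_failure_table(pattern)
    k = 0  # number of pattern characters currently matched
    for pos, ch in enumerate(stream):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back inside the pattern, not in the input
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            yield pos - len(pattern) + 1
            k = fail[k - 1]  # keep going so overlapping matches are found


print(list(kmp_stream("aba", "ababa")))  # [0, 2]
```

Because each input character is read exactly once, total work is linear in the stream length plus the pattern length, which is where the good worst-case behaviour comes from. Note that this software loop may take several fallback steps for a single character.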

SLIDE 16

Design

Custom KMP Pattern Matching Units:

  • Two comparators and a buffer
  • Buffer of specific size
  • Allows one character per clock cycle throughput

SLIDE 17

Design

Pipelining within Units

  • Two patterns share the same combinational circuit
  • Matching occurs out of phase
  • Lowers the hardware-per-pattern ratio
  • Allows an increase in clock speed

Diagram: two pattern memories feeding one shared combinational circuit

SLIDE 18

Design

Linear Array of Pattern Matching Units:

  • Input characters must pass through all units
  • Different patterns loaded in different units
  • Allows parallelisation
  • Units are quickly reconfigurable
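
The linear array can be modelled in software roughly as below. This is a behavioural sketch under simplifying assumptions: `MatchUnit` and `run_linear_array` are hypothetical names, each unit uses a plain sliding-window comparison rather than the custom KMP logic, and the pipeline delay between neighbouring units in real hardware is ignored.

```python
from collections import deque


class MatchUnit:
    """One pattern matching unit holding a single, reloadable pattern."""

    def __init__(self, pattern):
        self.load(pattern)

    def load(self, pattern):
        # Reconfiguring a unit only means loading a new pattern.
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.window = deque(maxlen=len(pattern))

    def step(self, ch):
        """Consume one character; return True if the pattern ends here."""
        self.window.append(ch)
        return "".join(self.window) == self.pattern


def run_linear_array(patterns, stream):
    """Pass every input character through every unit; return (unit, start) hits."""
    units = [MatchUnit(p) for p in patterns]
    hits = []
    for pos, ch in enumerate(stream):
        for idx, unit in enumerate(units):
            if unit.step(ch):
                hits.append((idx, pos - len(unit.pattern) + 1))
    return hits


print(run_linear_array(["ab", "bc"], "abcab"))  # [(0, 0), (1, 1), (0, 3)]
```

In hardware the inner loop over units runs in parallel, so adding patterns costs area rather than time.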

SLIDE 19

Area/Time Tradeoff

Metrics:

  • Time: Throughput (Mb/s, Gb/s, etc.)
  • Area: Logic cells used

SLIDE 20

Area/Time Tradeoff

SLIDE 21

Area/Time Tradeoff

U/Crete:

  • Very high throughput (10.8 Gb/s) but high area cost (532 logic cells per 32-character unit)
  • Achieved through hardwired comparators, replicated 4 times
  • Supports ~100 rules, and reconfiguration is very slow
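
The two metrics can be turned into simple arithmetic, sketched below. The function names and the 8-bit character width are assumptions for illustration. The 532 logic cells and 32 characters are the U/Crete figures quoted above; the clock value in the second call is hypothetical.

```python
def logic_cells_per_char(logic_cells, chars_per_unit):
    """Area metric: logic cells spent per pattern character."""
    if chars_per_unit <= 0:
        raise ValueError("chars_per_unit must be positive")
    return logic_cells / chars_per_unit


def throughput_gbps(clock_mhz, chars_per_cycle, bits_per_char=8):
    """Time metric: input bits consumed per second, in Gb/s.

    bits_per_char=8 is an assumption (one byte per character).
    """
    return clock_mhz * chars_per_cycle * bits_per_char / 1000.0


# Area figure quoted on the slide for U/Crete.
print(logic_cells_per_char(532, 32))  # 16.625

# Hypothetical design: 100 MHz clock, one character per cycle.
print(throughput_gbps(100, 1))  # 0.8
```

Replicating comparators to accept more characters per cycle raises `chars_per_cycle`, and therefore throughput, at a roughly proportional cost in logic cells.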

SLIDE 22

Conclusion

Different design decisions can affect the area/time tradeoff

SLIDE 23

Advanced Encryption Standard

By Harry Gougousidis

SLIDE 24

FPGAs and Encryption

  • FPGAs allow more flexibility and potential speedup than most hardware options.
  • Maximising data throughput requires balancing resource utilisation and time delays.
  • Encryption takes significant time to process data and is often required on devices with small resource pools.

SLIDE 25

Introduction to AES

  • Cryptography is popular in hardware because it uses relatively simple, highly parallelisable operations.
  • Fixed data block length, fixed number of transformations, single key for encryption and decryption.
  • Four transformation operations: a lookup table, matrix multiplication, byte shifting, and key XORing.
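
The four transformations can be sketched in software as below. This is an illustrative model, not the presenters' hardware design. The state is assumed to be a list of four rows of four bytes, the helper names are hypothetical, and key expansion and the round loop are omitted.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11b."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def sbox_entry(a):
    """Inverse in GF(2^8) followed by the AES affine transform."""
    x = gf_inv(a)
    y = x
    for shift in range(1, 5):
        y ^= ((x << shift) | (x >> (8 - shift))) & 0xFF  # rotate left by shift
    return y ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]  # the lookup table


def sub_bytes(state):
    return [[SBOX[b] for b in row] for row in state]


def shift_rows(state):
    # Row r is rotated left by r bytes.
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_column(col):
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def mix_columns(state):
    cols = [mix_column(list(col)) for col in zip(*state)]
    return [list(row) for row in zip(*cols)]


def add_round_key(state, round_key):
    return [[b ^ k for b, k in zip(srow, krow)] for srow, krow in zip(state, round_key)]


print(hex(SBOX[0x53]))  # 0xed
print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Every byte of the state passes through the same table lookup, and every column through the same matrix multiply, which is why the operations replicate so naturally in hardware.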

SLIDE 26

AES Enhancements

  • Inter-round: unrolling, pipelining
  • Intra-round: pipelining, partitioning

SLIDE 27

AES Implementations

  • Most transformations are implemented in Configurable Logic Blocks (CLBs).
  • Byte substitution can use BRAM, distributed RAM, or CLBs, each to different effect.
  • Conventional processors perform significantly worse than FPGAs.
  • ASICs run slightly faster than FPGAs but lack the flexibility.

SLIDE 28

Time/Area Tradeoffs

  • Full unrolling has a high area cost but gives a large performance increase.
  • Pipelining and partitioning give a large performance increase for a minimal area increase, but can increase latency considerably.
  • Performance can be measured by maximum throughput (data per second), latency (time until the first packet), and efficiency (throughput per slice).
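
The three measures can be computed as in the sketch below. The function name is hypothetical and the clock frequencies and slice counts are made-up numbers chosen only to show the shape of the tradeoff, not results from the paper. The 128-bit block and the 10 rounds of AES-128 are standard.

```python
def aes_metrics(clock_mhz, cycles_per_block, cycles_to_first_block, slices, block_bits=128):
    """Return (throughput in Mb/s, latency in ns, efficiency in Mb/s per slice)."""
    throughput_mbps = block_bits * clock_mhz / cycles_per_block
    latency_ns = cycles_to_first_block * 1000.0 / clock_mhz
    efficiency = throughput_mbps / slices
    return throughput_mbps, latency_ns, efficiency


# Hypothetical iterative core: one round per cycle, 10 cycles per AES-128 block.
print(aes_metrics(100, 10, 10, 1000))  # throughput 1280.0 Mb/s, latency 100.0 ns

# Hypothetical fully unrolled, pipelined core: one block per cycle once full,
# but eight times the slices.
print(aes_metrics(100, 1, 10, 8000))  # throughput 12800.0 Mb/s, latency 100.0 ns
```

With these made-up numbers the unrolled core is ten times faster for eight times the area, so its efficiency rises only slightly. Deeper intra-round pipelining would raise `clock_mhz` but also `cycles_to_first_block`, which is how latency can grow while throughput improves.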


SLIDE 29

Paper Results

  • Optimal choice depends on user metrics.
  • Unrolling performed well but had poor efficiency.
  • Distributed RAM performed better than block RAM, but block RAM was much better in efficiency.
  • Transformation partitioning alone gave a significant improvement in efficiency.

SLIDE 30

Thanks for listening! Question Time