Hybrid Computer Architecture, Brian Van Essen, Benjamin Ylvisaker



slide-1
SLIDE 1

Hybrid Computer Architecture

Brian Van Essen Benjamin Ylvisaker Carl Ebeling

slide-2
SLIDE 2

Moore’s Law: Is it Over?

- von Neumann processors no longer scale
  - Overhead of speculative execution is too high
  - Complexity of superscalar OOO core is n^2
  - Optimum power / performance pipeline depth is ~7 stages
- Spatial processors benefit from added transistors
  - Reconfigurability allows virtualization
    - Enables programming abstraction

slide-3
SLIDE 3

Keeping up with streams is hard

- Multimedia workloads
  - Audio & Video
- Communication workloads
  - Networking

[Figure: ƒ(x)=…, an example of a streaming transformation]

Spatial processors are good at this

slide-4
SLIDE 4

Hybrid Architecture Research

- Blend sequential and spatial computing
  - One program executes both types of computation

slide-5
SLIDE 5

Overview

- What is spatial computing?
  - Why is it interesting?
- Hybrid Architectures
  - What is hard about hybrid architectures?
- Future Research

slide-6
SLIDE 6

What is spatial computing?

- Spatial processors:
  - Parallel array of compute elements (fabric)
  - Assign operations to different physical resources
  - Stream operands through the fabric
  - Execute many operations in parallel
- Sequential processors:
  - Step through a sequence of instructions
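
As a rough illustration of the contrast on this slide, the sketch below simulates a linear fabric in which each stage holds one fixed operation and operands stream through it, next to a sequential loop that steps through the same operations one at a time. The names `run_spatial`, `run_sequential`, and `EMPTY` are hypothetical, and the one-operation-per-cycle timing is an assumption made for the example, not a property of any particular fabric.

```python
EMPTY = object()  # marks a pipeline register that holds no operand yet


def run_spatial(stages, stream):
    """Stream operands through a fixed chain of stages (needs >= 1 stage).

    Every cycle, all stages fire in parallel on the value produced by the
    previous stage in the previous cycle. Returns (results, cycles).
    """
    stream = list(stream)
    feed = iter(stream)
    regs = [EMPTY] * len(stages)  # output register of each stage
    results, cycles = [], 0
    while len(results) < len(stream):
        nxt = [EMPTY] * len(stages)
        operand = next(feed, EMPTY)
        if operand is not EMPTY:
            nxt[0] = stages[0](operand)  # first stage takes a new operand
        for i in range(1, len(stages)):
            if regs[i - 1] is not EMPTY:
                nxt[i] = stages[i](regs[i - 1])  # later stages consume upstream output
        regs = nxt
        cycles += 1
        if regs[-1] is not EMPTY:
            results.append(regs[-1])  # a finished value leaves the fabric
    return results, cycles


def run_sequential(stages, stream):
    """Step through the operations one at a time. Returns (results, steps)."""
    results, steps = [], 0
    for x in stream:
        for op in stages:
            x = op(x)
            steps += 1
        results.append(x)
    return results, steps


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_sequential(ops, [1, 2, 3, 4, 5]))
    print(run_spatial(ops, [1, 2, 3, 4, 5]))
```

With three stages and five operands, the sequential loop takes 15 steps while the simulated fabric finishes in 7 cycles (stages + operands - 1), because all stages work in parallel once the pipeline has filled.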

slide-7
SLIDE 7

Encoding a program

Instruction Stream:

    Load r1, A
    Load r2, B
    Load r3, C
    Load r4, D
    Add r5, r1, r2
    Mul r6, r2, r3
    Add r7, r1, r5
    Sub r8, r5, r4
    Sub r9, r7, r6
    Add r10, r7, r8
    Mul r11, r8, r4

Dataflow Graph:

[Figure: dataflow graph of the same computation, with four LD nodes feeding the add (+), multiply (x), and subtract operator nodes]
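
One way to see how the instruction stream on this slide maps onto a dataflow graph is to group the instructions by dataflow depth, so that every instruction in a group depends only on earlier groups and could run in parallel on a fabric. The sketch below does this for the slide's eleven instructions; `PROGRAM` and `dataflow_levels` are hypothetical names, and the code assumes each register is written exactly once, which holds for this example.

```python
# Instruction stream from the slide: (opcode, destination, sources).
PROGRAM = [
    ("Load", "r1", ["A"]),
    ("Load", "r2", ["B"]),
    ("Load", "r3", ["C"]),
    ("Load", "r4", ["D"]),
    ("Add", "r5", ["r1", "r2"]),
    ("Mul", "r6", ["r2", "r3"]),
    ("Add", "r7", ["r1", "r5"]),
    ("Sub", "r8", ["r5", "r4"]),
    ("Sub", "r9", ["r7", "r6"]),
    ("Add", "r10", ["r7", "r8"]),
    ("Mul", "r11", ["r8", "r4"]),
]


def dataflow_levels(program):
    """Group instructions by depth in the dataflow graph.

    An instruction's depth is one more than the deepest register it reads;
    sources that no instruction produces (A, B, C, D) count as inputs.
    Assumes every register is written once.
    """
    depth = {}   # register -> depth of the instruction that produces it
    levels = {}  # depth -> instructions at that depth
    for op, dst, srcs in program:
        d = 1 + max((depth[s] for s in srcs if s in depth), default=-1)
        depth[dst] = d
        levels.setdefault(d, []).append((op, dst))
    return [levels[d] for d in sorted(levels)]


if __name__ == "__main__":
    for d, group in enumerate(dataflow_levels(PROGRAM)):
        print(d, group)
```

For this program the eleven instructions fall into four levels (four loads, then two, two, and three arithmetic operations), which is the parallelism a spatial encoding can expose and a sequential encoding hides.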

slide-8
SLIDE 8

Processors: Under the hood

[Figure: block diagrams of two processors]

Traditional Computer (Load / Store Arch): Instructions; Fetch, Decode, Execute, WriteBack; ALU, LS; Register File; Memory

Spatial Computer (e.g. FPGA, PipeRench): Data; array of 16 PEs

slide-9
SLIDE 9

Why spatial processors?

- Extremely efficient for certain applications
  - Regular computation
  - Regular communication
    - e.g. Streaming Data
- Excellent performance / power ratio
- Limitations:
  - Difficult to execute control flow
  - Hard to program

slide-10
SLIDE 10

Basic Hybrid Architectures

- Two processors on a single chip
  - Integrate control plane and data plane processors
  - Provide high speed interconnect
  - Share memory
- Execute independent programs
  - Manage synchronization
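
To make "execute independent programs" and "manage synchronization" concrete, here is a small software analogy, not a model of any real chip: two Python threads stand in for the control plane and data plane processors, and two queues stand in for the shared memory and interconnect. The names `control_plane` and `data_plane`, and the doubling kernel, are hypothetical choices for the example.

```python
import queue
import threading


def data_plane(work, done):
    """Independent program for the data plane: process blocks until told to stop."""
    while True:
        block = work.get()
        if block is None:  # sentinel: the control plane has no more work
            break
        done.put([2 * x for x in block])  # stand-in streaming kernel


def control_plane(blocks):
    """Independent program for the control plane, with explicit synchronization."""
    work, done = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=data_plane, args=(work, done))
    worker.start()
    results = []
    for block in blocks:
        work.put(block)             # explicit hand-off to the other processor
        results.append(done.get())  # explicit wait for its result
    work.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(control_plane([[1, 2], [3]]))
```

Every hand-off and wait is written out by the programmer here, which is the burden the basic hybrid arrangement leaves in place.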

slide-11
SLIDE 11

Unified Hybrid Architecture

- Single programming model
  - Collapses control plane and data plane processors into a single abstraction
  - Implicit synchronization
  - Simplified programming abstraction
  - Program "automagically" executes on the appropriate processor
- Runtime system manages fabric configuration

slide-12
SLIDE 12

Research Challenges

- Creating a new Instruction Set Architecture (ISA)
  - Provides canonical sequential interpretation
  - Exposes good spatial configuration
  - Efficient synchronization of runtime control
- Virtualization of spatial processors is hard
  - Necessary to provide an abstract programmer's model
  - Use dynamic reconfiguration
- Programming Language
  - Explicit stream operations
  - Disambiguate memory references

slide-13
SLIDE 13

Research Synopsis

- Define new processor architecture and ISA
  - New level of ease of use
- Unified programming model
  - Blend sequential and spatial computing
    - Excels at streaming data applications
    - One program executes both types of computation
  - Implicit communication
- Efficient virtualization of spatial processors
- System-level programming language

slide-14
SLIDE 14

Appendix

Type Architectures

Programming Languages

slide-15
SLIDE 15

Abstract processor models

- von Neumann Type Architecture - RAM Model
  - A processor interpreting 3-address instructions
  - PC describing the next instruction of the program in memory
  - Flat, randomly accessed memory; each access requires 1 time unit
  - Memory is composed of fixed-size addressable units
  - One instruction executes at a time, and is completed before the next instruction executes
- Modern RISC & CISC processors emulate this model

C directly implements this model
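
The RAM model described on this slide is small enough to sketch as an interpreter. The version below is a minimal illustration under stated assumptions: the opcode names, the tuple encoding, and the function `run_ram` are hypothetical, branches are left out for brevity, and memory is a Python list of fixed-size cells addressed by index.

```python
import operator

# Hypothetical opcode table for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_ram(program, memory):
    """Interpret 3-address instructions (op, dst, src1, src2) on flat memory.

    The PC names the next instruction, and each instruction completes
    before the next one starts. Returns the number of instructions executed.
    """
    pc = 0
    executed = 0
    while pc < len(program):
        op, dst, src1, src2 = program[pc]
        memory[dst] = OPS[op](memory[src1], memory[src2])
        pc += 1  # no branches in this sketch, so the PC only advances
        executed += 1
    return executed


if __name__ == "__main__":
    mem = [3, 4, 0, 0]
    run_ram([("add", 2, 0, 1), ("mul", 3, 2, 2)], mem)
    print(mem)
```

The loop makes the model's key restriction visible: only one instruction is in flight at a time, no matter how many of them are independent.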

slide-16
SLIDE 16

Hybrid Type Architecture

- von Neumann sequential processor
- Spatial Fabric
  - P operations per cycle
  - Statically scheduled
- Main Memory
  - ~1 access per cycle
- Local Memory (Workspace)
  - ~P accesses per cycle
    - enough to maintain P ops
- Alternating Execution
  - Sequential program executes
  - Control transferred to spatial fabric
  - Shared state transferred
  - Atomic execution of spatial section
  - Shared state transferred back

[Figure: block diagram with labels Main Memory (M1), Sequential Processor, Spatial Computing Fabric, Local Memory (M2), Spatial Processor, Working set]
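
The bandwidth figures in this type architecture suggest a back-of-the-envelope cost model for one round of alternating execution. The sketch below is only that: `sequential_cycles` and `spatial_cycles` are hypothetical names, the sequential processor is assumed to retire one operation per cycle, shared state is assumed to cross main memory at one word per cycle in each direction, and reconfiguration time is ignored.

```python
import math


def sequential_cycles(num_ops):
    """Assumption: the sequential processor completes one operation per cycle."""
    return num_ops


def spatial_cycles(num_ops, words_in, words_out, p):
    """Estimated cycles for one spatial section on a fabric with P = p.

    Shared state is transferred in and back out through main memory
    (~1 access per cycle); the fabric then sustains p operations per
    cycle out of local memory.
    """
    transfer = words_in + words_out
    compute = math.ceil(num_ops / p)
    return transfer + compute


if __name__ == "__main__":
    # Large section: transfer cost is amortized over many operations.
    print(sequential_cycles(1024), spatial_cycles(1024, 64, 64, 16))
    # Small section: transfer through main memory dominates.
    print(sequential_cycles(32), spatial_cycles(32, 64, 64, 16))
```

Under these assumptions, a 1024-operation section with 64 words in and 64 words out costs 192 cycles on a P = 16 fabric versus 1024 sequentially, while a 32-operation section with the same state costs 130 cycles versus 32, so handing control to the fabric only pays off when there is enough work per transferred word.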

slide-17
SLIDE 17

A new Programming Language

- “System level”
  - Full control of underlying ISA
  - Explicit resource management
- Key Issues
  - Expressing parallel portions of computation
    - Easily mapped to spatial processor
  - “Relaxed” memory access ordering
    - e.g. streams
  - Disambiguate memory references
    - mitigate aliasing
  - Reflect constraints of type architecture
    - e.g. low main memory bandwidth
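
A small example may help show why disambiguating memory references matters when a loop is mapped to a spatial processor. In the sketch below, `shift_sequential` follows the sequential interpretation, while `shift_parallel` models a fabric that reads all of its operands before writing any results; both function names and the loop itself are hypothetical. The two agree when `dst` and `src` are distinct buffers and disagree when they alias.

```python
def shift_sequential(dst, src):
    """Sequential interpretation: each iteration sees earlier writes."""
    for i in range(1, len(src)):
        dst[i] = src[i - 1] + 1


def shift_parallel(dst, src):
    """Parallel interpretation: read every operand first, then write."""
    values = [src[i - 1] + 1 for i in range(1, len(src))]
    dst[1:] = values


if __name__ == "__main__":
    # Distinct buffers: both interpretations agree.
    a, b = [0, 0, 0, 0], [0, 0, 0, 0]
    shift_sequential(a, [0, 0, 0, 0])
    shift_parallel(b, [0, 0, 0, 0])
    print(a, b)

    # Aliased buffers: the results differ.
    x, y = [0, 0, 0, 0], [0, 0, 0, 0]
    shift_sequential(x, x)
    shift_parallel(y, y)
    print(x, y)
```

If the language can guarantee that `dst` and `src` never alias, the parallel mapping is safe; without that guarantee the compiler has to assume the sequential dependence chain.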