ATLAS: A Scalable Emulator for Transactional Parallel Systems

SLIDE 1

ATLAS

A Scalable Emulator for Transactional Parallel Systems

Christos Kozyrakis and Kunle Olukotun

Computer Systems Laboratory, Stanford University
http://tcc.stanford.edu

SLIDE 2
  • C. Kozyrakis, WARFP, Feb. 2005

Motivation

  • CMPs are here, but how do we program them?
  • Our proposal: transactional programming & execution
      • Programs written as sequences of transactions
      • CMP executes transactions in parallel with optimistic concurrency
      • More details at http://tcc.stanford.edu
  • Challenges
      • Explore programming model with large applications & datasets
      • Interactions with operating systems and IO
      • Large-scale transactional architectures (>16 nodes)
  • Need a fast, scalable emulator for system-level studies
      • Full-system simulation too slow for our purposes…
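The idea of executing transactions with optimistic concurrency can be illustrated with a small software sketch. This is only a toy model under stated assumptions: the class and function names (`VersionedStore`, `Transaction`, `run_transaction`) are hypothetical, the version-number validation scheme is a common textbook technique rather than the TCC hardware protocol, and the example runs single-threaded, interleaving two transactions by hand.

```python
# Toy model of optimistic concurrency. NOT the TCC protocol or API:
# all names here are hypothetical and chosen for illustration only.

class VersionedStore:
    """Shared memory in which every location carries a version number."""

    def __init__(self, initial):
        self.values = dict(initial)
        self.versions = {key: 0 for key in initial}


class Transaction:
    """Buffers writes and records the version of every location it reads."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}
        self.write_buffer = {}

    def read(self, key):
        if key in self.write_buffer:           # see our own speculative write
            return self.write_buffer[key]
        self.read_versions.setdefault(key, self.store.versions[key])
        return self.store.values[key]

    def write(self, key, value):
        self.write_buffer[key] = value         # invisible to others until commit

    def commit(self):
        # Validate: fail if any location we read was committed to since.
        for key, seen in self.read_versions.items():
            if self.store.versions[key] != seen:
                return False
        # Publish all buffered writes and bump their versions.
        for key, value in self.write_buffer.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        return True


def run_transaction(store, body):
    """Re-execute body until it commits; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        tx = Transaction(store)
        body(tx)
        if tx.commit():
            return attempts


def increment(tx):
    tx.write("counter", tx.read("counter") + 1)


store = VersionedStore({"counter": 0})

# Two transactions start from the same state and conflict on "counter".
t1, t2 = Transaction(store), Transaction(store)
increment(t1)
increment(t2)
first_ok = t1.commit()      # commits first
second_ok = t2.commit()     # fails validation, so its work must be redone
retry_attempts = run_transaction(store, increment)
```

In this sketch the first transaction commits, the second detects the conflict at commit time and is re-executed, and the counter ends at 2. No locks are taken while a transaction body runs; conflicts are only checked when it tries to commit.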

SLIDE 3

ATLAS Overview

  • A multi-board emulator for transactional parallel systems
  • Goals
      • 16 to 64 CPUs (8 to 32 boards)
      • 50 to 100MHz
      • Stand-alone full-feature system
      • OS, IDE disks, 100Mb Ethernet, …

[Figure: block diagram with labels CPU and Transaction Cache (twice each), DRAM, IO/DRAM Control, DISK, Net, PCI, CMP NETWORK]

  • ATLAS architecture space
      • Small, medium, and large-scale CMPs and SMPs
      • UMA and NUMA
      • Flexible transactional memory hierarchy & protocol
      • Flexible network model
      • Flexible clocking, latency, and bandwidth settings

SLIDE 4

Building Block: Xilinx ML310 Board

  • XC2VP30 FPGA features
      • 2 PowerPC 405 cores
      • 2.4Mb dual-ported SRAM
      • 30K logic cells
      • 8 RocketIO 3.125Gbps transceivers
  • System features
      • 256MB DDR, 512MB CompactFlash
      • Ethernet, PCI, USB, IDE, …
  • Design and development tools
      • Foundation ISE for design entry, synthesis, …
          • For the transactional memory hierarchy and network
      • Chipscope Pro logic analyzer for debugging
      • EDK for system simulation, system SW development, configuration, …
      • Montavista Linux 3.1 Pro
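As a rough worked example, the transceiver figures above give the aggregate off-chip link bandwidth per FPGA. The sketch below takes the transceiver count and line rate from the slide; the 8b/10b line-coding efficiency is an assumption that does not appear on the slide, and the function names are hypothetical.

```python
# Figures taken from the slide.
TRANSCEIVERS_PER_FPGA = 8
LINE_RATE_GBPS = 3.125

# Assumption (not stated on the slide): 8b/10b line coding, so only
# 8 of every 10 transmitted bits carry payload.
PAYLOAD_BITS = 8
LINE_BITS = 10


def raw_bandwidth_gbps(links=TRANSCEIVERS_PER_FPGA, rate=LINE_RATE_GBPS):
    """Aggregate raw line rate over all transceivers, one direction."""
    return links * rate


def payload_bandwidth_gbps(links=TRANSCEIVERS_PER_FPGA, rate=LINE_RATE_GBPS):
    """Aggregate payload rate after the assumed 8b/10b coding overhead."""
    return raw_bandwidth_gbps(links, rate) * PAYLOAD_BITS / LINE_BITS


raw = raw_bandwidth_gbps()          # 8 links x 3.125 Gbps
payload = payload_bandwidth_gbps()  # raw rate scaled by 8/10

print(f"raw: {raw} Gbps, payload (assuming 8b/10b): {payload} Gbps")
```

Under these assumptions each FPGA has 25 Gbps of raw link bandwidth per direction, of which about 20 Gbps would carry payload. Protocol framing overhead is ignored here, so the usable figure would be somewhat lower.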

SLIDE 5

Example: 2-way bus-based transactional CMP

[Figure: block diagram with labels Transaction State, Store Queue, PowerPC 405 (twice each), BRAM, OCM, Logic, Logic Macro, PLB]

SLIDE 6

ATLAS Software Framework

  • PowerPC and ML310 features provide rich SW framework
  • Linux OS
      • Port for Xilinx boards available from Montavista
      • Allows exploration of transactions with IO and scheduling
  • Gcc C/C++ software framework
      • TCC API for transactional programming
      • Allows experimentation with wide range of applications
  • Jikes-RVM Java framework
      • TCC API for transactional programming
      • Allows exploration of dynamic optimization techniques
  • Allows us to focus on parallel programming quickly
      • No need to develop significant infrastructure from scratch
  • Gradual path to parallel application development
      • Sequential version of C/C++/Java apps runs immediately

SLIDE 7

Trade-offs & Scalability

  • ATLAS trade-offs
      – Sacrifice some hardware modeling flexibility
          • Simple CPU, SW or coprocessor FPU, bounded on-chip memory
      + Fast hardware prototyping
          • Develop RTL for transactional memory + networking protocol
      + Rich software framework
      + Based on commercial hardware and software
          • Low cost, timely upgrades and improvements
  • Scaling
      • Scalability by adding boards (size & performance)
      • Use RocketIO transceivers and Xilinx Aurora protocol
  • Limitations
      • 32-bit cores can address up to 4GB of shared memory
      • 8 transceivers per chip ⇒ must synthesize router for >16 CPUs

SLIDE 8

Summary

  • A scalable emulator for transactional parallel systems
      • Based on commercial FPGA chips, boards, and software
      • 32 to 64 CPUs at 50 to 100MHz
      • A 6.4 GIPS emulator at full scale
      • Low cost, fast, flexible
  • ATLAS architecture space
      • Large-scale parallel systems with transactional memory support
  • ATLAS software space
      • Transactional parallel programming and optimizations
      • Operating systems and IO research
      • Large-scale application development
          • Embedded, server, desktop
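The headline numbers in the deck can be checked with a short calculation. The sketch below assumes two CPUs per board (the two PowerPC 405 cores of the XC2VP30 listed on the ML310 slide) and a peak of one instruction per cycle per CPU; the latter is an assumption, not a figure from the slides, and the function names are hypothetical.

```python
CPUS_PER_BOARD = 2   # two PowerPC 405 cores per XC2VP30, per the ML310 slide


def boards_needed(cpus, cpus_per_board=CPUS_PER_BOARD):
    """Number of boards required to host the given number of CPUs."""
    return -(-cpus // cpus_per_board)   # ceiling division


def peak_gips(cpus, clock_mhz, ipc=1.0):
    """Aggregate peak rate in GIPS. ipc=1.0 is an assumed peak of one
    instruction per cycle per CPU, not a measured value."""
    return cpus * clock_mhz * ipc / 1000.0


smallest = boards_needed(16)       # low end of the stated CPU range
largest = boards_needed(64)        # high end of the stated CPU range
full_scale = peak_gips(64, 100)    # 64 CPUs at 100MHz

print(f"{smallest} to {largest} boards, {full_scale} GIPS peak at full scale")
```

With these assumptions, 16 to 64 CPUs correspond to 8 to 32 boards, and 64 CPUs at 100MHz give the 6.4 GIPS quoted for the full-scale system. This is a peak figure; sustained throughput would depend on the workload and on transaction conflicts.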