Implications of Cache Asymmetry on Server Consolidation Performance

SLIDE 1

1 IISWC 2008

Implications of Cache Asymmetry on Server Consolidation Performance

Presenter: Omesh Tickoo

Padma Apparao, Ravi Iyer, Don Newell *Hardware Architecture Lab Intel Corporation

SLIDE 2

Outline

  • Server Consolidation
  • Asymmetric Caches
  • Performance Implications
  • Measurement-Based Analysis
  • Conclusions / Future Work
SLIDE 3

Server Consolidation

  • Motivation

– Virtualization and consolidation are a growing trend in datacenters
– Majority of servers expected to run consolidated workloads within a few years

[Figure: a single workload running on a single O/S on one server]

  • Problem

– Performance analysis of consolidation scenarios is challenging

  • Different virtualization overheads depending on VMM & platform virtualization support
  • Resource contention (core, cache, memory, etc.) between VMs affects performance
  • Focus

– Server consolidation performance as a function of cache contention & asymmetry

[Figure: Workloads 1, 2 and 3, each in its own guest OS, consolidated on one server on top of a VMM or hypervisor]

SLIDE 4

Why study asymmetry?

  • CMP platforms today have symmetric caches

– But space in cache is asymmetrically allocated depending on demand from virtual machines

=> Virtual Asymmetry

  • Future CMP platforms may have asymmetric caches

– Asymmetry to reduce cache space domination of die area
– Asymmetry due to process variability / faults

=> Physical Asymmetry

SLIDE 5

Cache Asymmetry

[Figure: four cache organizations, each showing cores (C), their caches, and Task1 ... Taskn mapped onto them]

(a) Symmetric Private Caches of Equal Size
(b) Virtually Asymmetric Shared Caches of Equal Size
(c) Physically Asymmetric Private Caches of Different Size
(d) Virtually & Physically Asymmetric Shared Caches of Different Size

What are the implications on server consolidation performance?
SLIDE 6

Consolidation Benchmark

  • vConsolidate

VM/Workload                   Vcpus Configuration   Memory Configuration in MB
Java/SPECjbb (bops/sec)       2                     2056
Database/Sysbench (Tx/sec)    2                     1544
Web/Webbench (Tx/sec)         2                     1544
Mail/Exchange (hits/sec)      1                     1544
Idle                          1                     418

5 VMs

  • SPECjbb VM
  • Sysbench VM
  • Webbench VM
  • MailServer VM
  • Idle VM
SLIDE 7

Platform Configuration

Hardware

  • Intel Xeon 5400 series

– Quad-core per socket
– 6MB + 6MB cache per socket
– Also used 4MB, 3MB and 2MB cache configs to create physical asymmetry

VMM

– Xen 3.1

[Figure: platform stack with the vConsolidate VMs running on Xen 3.1, above four LLCs and memory]

SLIDE 8

Analyzing Implications

  • Four Key Configurations

– 1 Virtual Machine

  • On physically symmetric cache
  • On physically asymmetric cache

– Multi-Virtual Machine

  • On physically symmetric cache

– But virtually asymmetric

  • On physically asymmetric cache

– But virtually asymmetric also

SLIDE 9

1VM / Symmetric Caches

[Figure: three bar charts, "SPECjbb Performance (Symmetric Caches)", "Sysbench Performance (Symmetric Caches)" and "Webbench Performance (Symmetric Caches)"; each plots Thruput, CPI and MPI, normalized to 2MB, for 2MB, 3MB, 4MB and 6MB caches]

[Figure: four LLCs, all of the same size, above memory; a single virtual machine runs either with no sharing or with sharing]

  • SPECjbb2005 most sensitive to cache

– 50% performance improvement from 2MB to 6MB

  • Sysbench and Webbench show less than 10% improvement
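
The charts on this slide report Thruput, CPI and MPI normalized to the 2MB configuration. A minimal sketch of that normalization follows, assuming CPI means cycles per instruction and MPI means last-level-cache misses per instruction (the slides do not expand the acronyms). The `Sample` fields and all counter values are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Raw counters for one run (hypothetical field names)."""
    transactions: float  # units of work completed during the run
    seconds: float       # wall-clock duration of the run
    cycles: float        # core cycles
    instructions: float  # instructions retired
    llc_misses: float    # last-level-cache misses


def metrics(s: Sample) -> dict:
    """Derive the three plotted metrics from raw counters."""
    return {
        "Thruput": s.transactions / s.seconds,
        "CPI": s.cycles / s.instructions,
        "MPI": s.llc_misses / s.instructions,
    }


def normalize(run: Sample, baseline: Sample) -> dict:
    """Express each metric of `run` relative to the same metric of `baseline`."""
    base = metrics(baseline)
    return {name: value / base[name] for name, value in metrics(run).items()}


# Placeholder numbers chosen for illustration only, not measured values.
run_2mb = Sample(transactions=1000, seconds=10, cycles=2.0e9,
                 instructions=1.0e9, llc_misses=8.0e6)
run_6mb = Sample(transactions=1500, seconds=10, cycles=1.5e9,
                 instructions=1.0e9, llc_misses=4.0e6)

normalized = normalize(run_6mb, run_2mb)
print(normalized)
```

With these placeholders the normalized throughput is 1.5, which is how a 50% improvement from 2MB to 6MB would appear on a chart normalized to 2MB; CPI and MPI fall below 1.0 as the cache grows.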

SLIDE 10

Multi-VM / Virtual Asymmetry

[Figure: consolidated virtual machines (vCon) running over four LLCs, all of the same size, and memory]

[Figure: two bar charts, "SPECjbb Performance with Virtual Cache Asymmetry (6MB)" and "SPECjbb Performance with Virtual Cache Asymmetry (4MB)"; each plots Thruput, CPI and MPI, normalized to when running alone, for JBB alone, JBB+JBB, JBB+Sysbench, JBB+Webbench and JBB in vCon]

  • Consolidation causes ~30% loss in performance

– Cache Interference => 20%
– Core Interference => 9%
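
The slide does not say how the cache and core components combine into the ~30% figure. As a rough check, assuming the two losses are independent, the sketch below compares a simple additive reading with a multiplicative one; both function names are illustrative.

```python
def combined_loss_additive(cache_loss: float, core_loss: float) -> float:
    """Total loss if the two components simply add."""
    return cache_loss + core_loss


def combined_loss_multiplicative(cache_loss: float, core_loss: float) -> float:
    """Total loss if each component scales the performance that remains."""
    return 1.0 - (1.0 - cache_loss) * (1.0 - core_loss)


CACHE_LOSS = 0.20  # cache interference, as quoted on the slide
CORE_LOSS = 0.09   # core interference, as quoted on the slide

additive = combined_loss_additive(CACHE_LOSS, CORE_LOSS)
multiplicative = combined_loss_multiplicative(CACHE_LOSS, CORE_LOSS)

print(f"additive: {additive:.1%}, multiplicative: {multiplicative:.1%}")
```

Both readings land near the quoted total (roughly 29% and 27%), so the breakdown is consistent with "~30%" under either assumption.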

SLIDE 11

1VM / Physical Asymmetry

[Figure: an individual virtual machine running over four LLCs and memory; some LLCs are 6MB, the others are smaller (4MB, 3MB or 2MB)]

[Figure: three bar charts, "SPECjbb (Physically Asymmetric Caches)", "Sysbench (Physically Asymmetric Caches)" and "Webbench (Physically Asymmetric Caches)"; each plots Thruput, CPI and MPI, normalized to 6MB-6MB, for the 6-6, 6-4, 6-3 and 6-2 configurations]

  • SPECjbb2005 is affected the most
  • Sysbench and Webbench are not affected much

SLIDE 12

Multi-VM / Virtual+Physical Asymmetry

[Figure: consolidated virtual machines (vCon) running over four LLCs and memory; some LLCs are 6MB, the others are smaller (4MB, 3MB or 2MB)]

[Figure: three bar charts, "SPECjbb Performance on Virtual+Physical Asymmetry", "Sysbench Performance on Virtual+Physical Asymmetry" and "WebBench Performance on Virtual+Physical Asymmetry"; each plots Thruput, CPI and MPI, normalized to 6MB-6MB, for the 6-6, 6-4, 6-3 and 6-2 configurations]

  • SPECjbb is affected the most (as expected)
  • Sysbench and Webbench are not affected much
  • Opportunity to move Sysbench and Webbench to smaller cache cores => can improve performance of SPECjbb?

SLIDE 13

Inferences

  • Asymmetry-Aware Scheduling

– Virtual Asymmetry

  • Monitor usage and interference
  • Modify VMM scheduler to take this into account

– Physical Asymmetry

  • Monitor usage in large and small cores
  • Modify VMM scheduler to affinitize

– Cache-sensitive VMs to large-cache-cores
– Cache-insensitive VMs to small-cache-cores
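
The affinitization rule above can be sketched as a greedy placement. This is an illustration only, not the scheduler change the authors have in mind: the `Core` and `VM` types, the `sensitivity` score and the one-VM-per-core restriction are all assumptions made for the example. The scores used here are the MPI "% benefit" figures from the affinitization experiment on this slide, which is one possible choice of sensitivity measure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Core:
    core_id: int
    cache_mb: int  # size of the LLC this core sits behind


@dataclass
class VM:
    name: str
    sensitivity: float  # hypothetical score; higher means more cache-sensitive


def place_vms(vms: List[VM], cores: List[Core]) -> Dict[str, int]:
    """Greedy placement: the most cache-sensitive VM gets the largest-cache core.

    Assumes one VM per core and at least as many cores as VMs.
    """
    if len(vms) > len(cores):
        raise ValueError("sketch assumes at least one core per VM")
    by_sensitivity = sorted(vms, key=lambda v: v.sensitivity, reverse=True)
    by_cache = sorted(cores, key=lambda c: c.cache_mb, reverse=True)
    return {vm.name: core.core_id for vm, core in zip(by_sensitivity, by_cache)}


# Two 6MB-cache cores and two 2MB-cache cores (an assumed asymmetric layout).
cores = [Core(0, 2), Core(1, 6), Core(2, 2), Core(3, 6)]
vms = [VM("Webbench", 0.11), VM("SPECjbb", 0.39), VM("Sysbench", 0.25)]

placement = place_vms(vms, cores)
print(placement)
```

In this layout SPECjbb and Sysbench land on the 6MB-cache cores and Webbench on a 2MB-cache core. A real VMM scheduler would also have to handle multiple vcpus per VM, core sharing and changing sensitivity over time.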

JBB        vcpu0 (affinitized to 6MB)   vcpu1 (floating)   % benefit
CPI        1.51                         1.80               19%
MPI        0.0051                       0.0070             39%

Sysbench   vcpu0 (affinitized to 6MB)   vcpu1 (floating)   % benefit
CPI        2.51                         2.96               18%
MPI        0.0016                       0.0020             25%

Webbench   vcpu0 (6MB cache)            vcpu1 (floating)   % benefit
CPI        2.59                         2.88               11%
MPI        0.0023                       0.0026             11%

Affinitization Experiment:

– Affinitize one vcpu to large core
– Leave the other vcpu floating
– Allows for detection of sensitivity for improved scheduling
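
The "% benefit" column is not defined on the slide. A minimal sketch, assuming it is the floating vcpu's value minus the affinitized vcpu's value, taken relative to the affinitized value, recomputes the column from the CPI and MPI figures as printed; `percent_benefit` is an illustrative name.

```python
def percent_benefit(affinitized: float, floating: float) -> float:
    """How much higher the floating vcpu's metric is, in percent of the affinitized value."""
    return (floating - affinitized) / affinitized * 100.0


# (affinitized vcpu0, floating vcpu1) pairs as printed on the slide.
measurements = {
    ("JBB", "CPI"): (1.51, 1.80),
    ("JBB", "MPI"): (0.0051, 0.0070),
    ("Sysbench", "CPI"): (2.51, 2.96),
    ("Sysbench", "MPI"): (0.0016, 0.0020),
    ("Webbench", "CPI"): (2.59, 2.88),
    ("Webbench", "MPI"): (0.0023, 0.0026),
}

benefits = {key: percent_benefit(*pair) for key, pair in measurements.items()}
for (workload, metric), value in benefits.items():
    print(f"{workload:9s} {metric}: {value:.0f}%")
```

Under this assumption the three CPI rows and the Sysbench MPI row match the slide (19%, 18%, 11% and 25%). The JBB and Webbench MPI rows come out about two points away from the printed 39% and 11%, possibly because the MPI values are shown rounded to four decimal places.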

SLIDE 14

Summary

  • Presented cache asymmetry

– Symmetric
– Virtual Asymmetry
– Physical Asymmetry
– Virtual + Physical Asymmetry

  • Studied the implications of cache asymmetry on a consolidation workload

– Using vConsolidate & asymmetric CMP platform

  • Showed cache contention overheads and overall cache sensitivity
  • Discussed the potential for asymmetry-aware scheduling