SLIDE 1

(c) Ph. Tsigas, Y. Zhang

A Parallel Implementation of Quicksort and its Performance Evaluation

Philippas Tsigas, Yi Zhang
Department of Computing Science, Chalmers University of Technology

SLIDE 2

The aim of our work

• Sorting is an important kernel.
• Existing parallel implementations of sorting are based on message-passing machines (e.g., Sample Sort).
• New developments in computer architecture bring us new research opportunities:
  • Cache-coherent shared memory
  • Tightly-coupled multiprocessors
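Sample Sort is the baseline named here, and the experimental results compare against PSRS, which we read as Parallel Sorting by Regular Sampling, a sample-sort variant. The sketch below is a minimal sequential Python simulation of PSRS under our own assumptions: the p "processors" are just chunks of one list, no message passing is modelled, and the name `psrs_sort` is hypothetical. It shows how regular sampling chooses p - 1 splitters so that each processor ends up merging one bucket.

```python
from bisect import bisect_right
from heapq import merge


def psrs_sort(data, p):
    """Sequential simulation of PSRS with p simulated processors (hypothetical helper)."""
    n = len(data)
    if n == 0:
        return []
    p = max(1, min(p, n))  # every simulated processor gets at least one element
    # Step 1: split the input into p nearly equal chunks; each "processor" sorts its own chunk.
    bounds = [i * n // p for i in range(p + 1)]
    chunks = [sorted(data[bounds[i]:bounds[i + 1]]) for i in range(p)]
    # Step 2: each sorted chunk contributes p regularly spaced samples.
    samples = sorted(c[j * len(c) // p] for c in chunks for j in range(p))
    # Step 3: pick p - 1 splitters at regular positions in the sorted p * p samples.
    splitters = [samples[k * p + p // 2] for k in range(1, p)]
    # Step 4: cut every sorted chunk at the splitters; piece b goes to "processor" b.
    buckets = [[] for _ in range(p)]
    for c in chunks:
        cuts = [0] + [bisect_right(c, s) for s in splitters] + [len(c)]
        for b in range(p):
            buckets[b].append(c[cuts[b]:cuts[b + 1]])
    # Step 5: each "processor" merges its sorted pieces; concatenating buckets gives the output.
    result = []
    for pieces in buckets:
        result.extend(merge(*pieces))
    return result
```

For example, `psrs_sort([5, 3, 9, 1], 2)` returns `[1, 3, 5, 9]`. In a real message-passing run, steps 1, 4 and 5 are local to each processor and the cost sits in gathering samples and exchanging pieces. A shared-memory machine avoids that exchange, which is the opportunity the slides point to.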

SLIDE 3

Quicksort

Advantages
• General purpose
• In-place
• Good cache behavior
• Simple

Disadvantages
• Existing parallel implementations do not scale up.
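The advantages listed above are easiest to see in a sequential version. What follows is a minimal in-place quicksort in Python, given as a sketch only: Hoare partitioning around the middle element and recursing into the smaller side are our choices, not necessarily those of the authors. Each partition pass scans the subarray from both ends. That sequential access pattern is what gives quicksort its good cache behavior.

```python
def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[(lo + hi) // 2]  # middle element as pivot (our choice)
        i, j = lo - 1, hi + 1
        # Hoare partition: afterwards a[lo..j] <= pivot <= a[j+1..hi], with lo <= j < hi.
        while True:
            i += 1
            while a[i] < pivot:
                i += 1
            j -= 1
            while a[j] > pivot:
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        # Recurse into the smaller side and loop on the larger one to keep stack depth O(log n).
        if j - lo < hi - j:
            quicksort(a, lo, j)
            lo = j + 1
        else:
            quicksort(a, j + 1, hi)
            hi = j
```

No extra array is allocated, so the sort is in-place. The two recursive subproblems are independent, which is the obvious source of parallelism. The difficulty noted above is that the first partition pass over the whole array is still sequential.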

SLIDE 4

Our Approach

3+1 phases:
• Parallel partition of the data
  • Block-based partition
  • Cache-efficient
• Sequential partition of the data
  • At most P+1 blocks (P: number of processors)
• Process partition
• Sequential sorting with helping
  • Load balancing
  • Non-blocking synchronization
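The slide only names the phases. Below is a minimal, single-threaded Python sketch of the first two, built on our own assumptions. Blocks of B elements are dealt out from both ends of the array. A "left" block must end up holding only elements <= pivot and a "right" block only elements >= pivot. The names `neutralize` and `block_partition` are hypothetical. The P "processors" are simulated round-robin. A real implementation would run them concurrently and hand out blocks with an atomic fetch-and-add. Process partition and sequential sorting with helping are not shown.

```python
def neutralize(a, lb, rb, B, pivot):
    """Swap between left block a[lb:lb+B] and right block a[rb:rb+B] until one is neutral.
    Returns (left_neutral, right_neutral): left neutral = all <= pivot, right = all >= pivot."""
    i, j = lb, rb
    while True:
        while i < lb + B and a[i] <= pivot:
            i += 1
        while j < rb + B and a[j] >= pivot:
            j += 1
        if i == lb + B or j == rb + B:
            return i == lb + B, j == rb + B
        a[i], a[j] = a[j], a[i]  # a[i] > pivot moves right, a[j] < pivot moves left
        i += 1
        j += 1


def block_partition(a, pivot, B, P):
    """Partition a in place so that a[:s] <= pivot <= a[s:]; returns s (hypothetical helper)."""
    n = len(a)

    def left_start(k):  # k-th left block, counted from the front
        return k * B

    def right_start(m):  # m-th right block, counted from the back
        return n - (m + 1) * B

    nl = nr = 0  # blocks handed out per side (shared fetch-and-add counters in a real run)
    held = [[None, None] for _ in range(P)]  # [left block, right block] held by each processor
    # Phase 1: "parallel" partition, simulated round-robin over the P processors.
    progress = True
    while progress:
        progress = False
        for h in held:
            if h[0] is None and (nl + nr + 1) * B <= n:
                h[0], nl = nl, nl + 1
            if h[1] is None and (nl + nr + 1) * B <= n:
                h[1], nr = nr, nr + 1
            if h[0] is None or h[1] is None:
                continue  # cannot neutralize without one block from each side
            progress = True
            left_ok, right_ok = neutralize(a, left_start(h[0]), right_start(h[1]), B, pivot)
            if left_ok:
                h[0] = None
            if right_ok:
                h[1] = None
    # Phase 2: sequential partition. Each processor holds at most one unfinished block, so at
    # most P blocks remain, plus a middle remainder of fewer than B elements: at most P + 1 pieces.
    ul = sorted(h[0] for h in held if h[0] is not None)
    ur = sorted(h[1] for h in held if h[1] is not None)
    for blocks, count, start in ((ul, nl, left_start), (ur, nr, right_start)):
        # Swap unfinished blocks with neutral ones so they sit next to the middle remainder.
        targets = [t for t in range(count - len(blocks), count) if t not in blocks]
        strays = [k for k in blocks if k < count - len(blocks)]
        for k, t in zip(strays, targets):
            x, y = start(k), start(t)
            a[x:x + B], a[y:y + B] = a[y:y + B], a[x:x + B]
    lo, hi = (nl - len(ul)) * B, n - (nr - len(ur)) * B
    i, j = lo, hi - 1
    while True:  # plain two-index partition of the remaining middle region a[lo:hi]
        while i <= j and a[i] <= pivot:
            i += 1
        while i <= j and a[j] >= pivot:
            j -= 1
        if i >= j:
            return i
        a[i], a[j] = a[j], a[i]
```

Working on whole blocks keeps each processor's accesses within a few contiguous cache lines. The sequential cleanup is bounded by about (P + 1) * B elements regardless of the input size.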

SLIDE 5

The advantages of our approach

• General purpose
• In-place
• Good cache behavior
• Fine-grain parallelism
• Good speedup in theory

SLIDE 6

Experimental Results (8M Integers)

(Figure: speedup of PQuick and PSRS for the inputs [U]-8M, [G]-8M, [Z]-8M, [B]-8M and [S]-8M on 1, 2, 4, 8, 16 and 32 processors.)

SLIDE 7

Experimental Results (32M Integers)

(Figure: speedup of PQuick and PSRS for the inputs [U]-32M, [G]-32M, [Z]-32M, [B]-32M and [S]-32M on 1, 2, 4, 8, 16 and 32 processors.)

SLIDE 8

Experimental Results (64M Integers)

(Figure: speedup of PQuick and PSRS for the inputs [U]-64M, [G]-64M, [Z]-64M, [B]-64M and [S]-64M on 1, 2, 4, 8, 16 and 32 processors.)

SLIDE 9

Experimental Results (128M Integers)

(Figure: speedup of PQuick and PSRS for the inputs [U]-128M, [G]-128M, [Z]-128M, [B]-128M and [S]-128M on 1, 2, 4, 8, 16 and 32 processors.)
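The plots report speedup without stating the baseline. Assuming the conventional definition, with T_1 the running time on one processor and T_P the running time on P processors, speedup S and parallel efficiency E are:

```latex
S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}
```

For instance, a hypothetical speedup of 16 on 32 processors would correspond to a parallel efficiency of 16 / 32 = 0.5. Whether T_1 is the parallel code run on one processor or the best sequential sort is not stated on the slides.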

SLIDE 10

Conclusions

• Quicksort can beat Sample Sort on cache-coherent shared memory multiprocessors.
• Fine-grain parallelism that incorporates non-blocking synchronization can be efficient.
• Cache-coherent shared memory multiprocessors offer many new research opportunities.