(c) Ph. Tsigas, Y. Zhang
A Parallel Implementation of Quicksort and its Performance Evaluation
Philippas Tsigas, Yi Zhang
Department of Computing Science, Chalmers University of Technology
The aim of our work
Sorting is an important kernel
Parallel implementations of sorting
  Based on message-passing machines, Sample sort
New developments in computer architecture
  Cache-coherent shared memory
  Tightly-coupled multiprocessor
Advantages
  General purpose
  In-place
  Good cache behavior
  Simple
Disadvantages
  Parallel implementations do not scale up.
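As a point of reference for the properties listed above, here is a minimal sequential sketch of in-place quicksort in Python. It is not the authors' code: the Hoare-style partition, the middle-element pivot, and the names `partition` and `quicksort` are assumptions chosen for illustration. The two index scans walk through contiguous memory from both ends, which is the usual explanation for quicksort's good cache behavior.

```python
def partition(a, lo, hi):
    """Hoare-style partition of a[lo..hi] (inclusive), done in place.

    Returns an index j with lo <= j < hi such that every element of
    a[lo..j] is <= every element of a[j+1..hi].
    """
    pivot = a[(lo + hi) // 2]  # middle element as pivot (an assumption)
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:    # scan right for an element >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:    # scan left for an element <= pivot
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def quicksort(a, lo=0, hi=None):
    """Sort the list `a` in place, using no auxiliary array."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if p - lo < hi - p:
            quicksort(a, lo, p)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p
```

Calling `quicksort(values)` on a Python list reorders that list itself; nothing is returned.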
Parallel Partition of the Data
  Block based partition
  Cache efficient
Sequential Partition of the Data
  At most P+1 blocks (P: number of processors)
Process Partition
Sequential Sorting with Helping
  Load-balancing
  Non-blocking synchronization
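The block based partition step can be sketched roughly as follows. This is a single-worker simulation in Python, not the authors' implementation: the names `neutralize` and `partition_blocks`, the requirement that the array length be a multiple of the block size, and the way blocks are claimed from the two ends are all assumptions made for illustration. In a real parallel version each processor would claim blocks concurrently under some synchronization scheme, which is omitted here.

```python
def neutralize(a, left, right, block, pivot):
    """Swap misplaced elements between two blocks of length `block`.

    `left` and `right` are the start indices of the two blocks.
    Returns (i, j), the scan offsets reached in each block. A block
    whose offset equals `block` is finished: a finished left block
    holds only elements <= pivot, a finished right block only
    elements >= pivot.
    """
    i = j = 0
    while i < block and j < block:
        while i < block and a[left + i] <= pivot:
            i += 1
        while j < block and a[right + j] >= pivot:
            j += 1
        if i == block or j == block:
            break
        # a[left + i] > pivot and a[right + j] < pivot: exchange them.
        a[left + i], a[right + j] = a[right + j], a[left + i]
        i += 1
        j += 1
    return i, j


def partition_blocks(a, pivot, block):
    """One worker's loop: claim blocks from both ends and neutralize them.

    Assumes len(a) is a multiple of `block` (a simplification).
    Returns (left, right), the start index of the unfinished block the
    worker still holds on each side, or None. At most one is not None.
    """
    lo, hi = 0, len(a) // block - 1  # next unclaimed block on each side
    left = right = None
    while True:
        if left is None:
            if lo > hi:
                break
            left, lo = lo * block, lo + 1
        if right is None:
            if lo > hi:
                break
            right, hi = hi * block, hi - 1
        i, j = neutralize(a, left, right, block, pivot)
        if i == block:
            left = None   # left block finished
        if j == block:
            right = None  # right block finished
    return left, right
```

Each element is touched within one small block at a time, which is where the cache efficiency would come from. In this sketch a worker finishes holding at most one unfinished block; leftovers of this kind are presumably what the sequential partition step then processes.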
General purpose
In-place
Good cache behavior
Fine grain parallelism
Good speedup in theory
Figure: Speedup of PQuick vs. PSRS on inputs [U]-8M, [G]-8M, [Z]-8M, [B]-8M, and [S]-8M with 1, 2, 4, 8, 16, and 32 processors (1P to 32P).
Figure: Speedup of PQuick vs. PSRS on inputs [U]-32M, [G]-32M, [Z]-32M, [B]-32M, and [S]-32M with 1, 2, 4, 8, 16, and 32 processors (1P to 32P).
Figure: Speedup of PQuick vs. PSRS on inputs [U]-64M, [G]-64M, [Z]-64M, [B]-64M, and [S]-64M with 1, 2, 4, 8, 16, and 32 processors (1P to 32P).
Figure: Speedup of PQuick vs. PSRS on inputs [U]-128M, [G]-128M, [Z]-128M, [B]-128M, and [S]-128M with 1, 2, 4, 8, 16, and 32 processors (1P to 32P).
Quicksort can beat Sample Sort on cache-coherent shared memory
Fine grain parallelism that incorporates non-blocking synchronization
Cache-coherent shared memory