The Sliding Window Algorithm


SLIDE 1

The Sliding Window Algorithm

  • The “Sliding Window” algorithm sums several small sub-matrices of a matrix of values.
  • The maximum of these sums is then located and its coordinates are passed out of the program.
  • This is used as a calorimetry trigger, for locating events with high-energy jets.

– (The following 6 slides have been copied from Matthew's presentation)

SLIDE 2

Sliding window: serial

SLIDE 3

Sliding window: serial

SLIDE 4

Sliding window: serial

SLIDE 5

Sliding window: serial

SLIDE 6

Sliding window: serial etc...

SLIDE 7

Sliding window: parallel (5x5 border around each submatrix – use your imagination)

SLIDE 8

Two Approaches to the Sliding Window

  • CPU Algorithm (Standard)
  • GPU Algorithm
  • Hybrid Algorithm
    – Use the GPU to perform the sliding window sum
    – Transfer the resulting matrix of sums to the CPU
    – Use the CPU to locate the maximum
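The CPU half of the hybrid scheme above is a plain linear scan over the matrix of window sums after it has been copied back from the GPU. A minimal sketch, with illustrative names (in the real algorithm the `sums` buffer would arrive via a device-to-host copy rather than being a host vector):

```cpp
#include <vector>

struct MaxLoc {
    int row, col;   // coordinates of the maximum window sum
    double sum;
};

// Linear O(N) scan over the rows x cols matrix of window sums (row-major)
// that the GPU produced, locating the maximum on the CPU.
MaxLoc locateMax(const std::vector<double>& sums, int rows, int cols) {
    MaxLoc best{0, 0, sums[0]};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (sums[r * cols + c] > best.sum)
                best = {r, c, sums[r * cols + c]};
    return best;
}
```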

SLIDE 9

Motivation : Time Complexity

  • The time complexity of the algorithm to locate a maximum on the CPU is linear, O(N).
  • The time complexity of the best algorithm to do the same on the GPU is O(N log N).
    – Even if the GPU cores and CPU cores had the same processing speed, more calculations are required by the GPU to perform the same task.

SLIDE 10

Small Problem : Find Max

The speed-up is not fully realized for small window sizes because the GPU finishes the calculation nearly as fast as new calculation commands can be issued. Note that at ATLAS-scale problems (~5,000) this algorithm performs MUCH worse than the CPU version. Matthew wrote a new algorithm that works better at ATLAS-scale problems, but not as well at extreme sizes.

SLIDE 11

Small Problem : Sliding Window Speed-Up

For those concerned: the sudden drops on this plot are due to my testing procedure, which uses rectangular grids of varying dimensions. Threads are issued one warp (32 threads) at a time, and I declared each block to be a constant 256 threads. Because of this there are problem sizes for which a large number of threads are inactive.

SLIDE 12

Large Problem : Find Max

Even at extremely large sizes, the speed-up offered by the GPU algorithm pales in comparison to the speed-up of the sliding window.

SLIDE 13

Large Problem : Sliding Window Speed-Up

Here the speed-up has plateaued: the speed-up of any algorithm is limited by the number of GPU cores that can run simultaneously.

SLIDE 14

Motivation : Processing Speed vs Copy Speed

  • The GPU cores are individually much slower than a CPU core.
  • The copy speed from the GPU to the CPU is very fast, and the result that needs to be copied is relatively small.
    – It may be worth the time to copy the memory to the CPU, since the CPU can locate the maximum much faster.

SLIDE 15

Small Problem Fraction Plots

SLIDE 16

Large Problem Fraction Plots

SLIDE 17

Conclusion

  • At ATLAS scale, the speed-up grows fastest and is greatest for the Hybrid algorithm (see ratio plot).
  • Beyond the ATLAS scale (a factor of ~10 greater), the purely GPU algorithm becomes better.

SLIDE 18

Small Problem : Ratio

At small problem sizes (the current ATLAS size is around 5,000), the Hybrid algorithm provides the greater speed-up.

SLIDE 19

Large Problem : Ratio

This shows that at extremely large problem sizes the purely GPU-based algorithm provides a greater speed-up than the Hybrid.

SLIDE 20

Small Problem Speed Up

SLIDE 21

Large Problem Speed-Up

SLIDE 22

Backup Slide : CUDA Card Specs

  • 8 SMs (streaming multiprocessors) with 192 cores each (1,536 cores total) @ ~1000 MHz
  • ~15.75 GB/s bandwidth to host