SLIDE 17 Empirical Modelling
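General scheme of the empirical modelling of the hybrid DGEMM:
DGEMM(., M, N, K, A, LDA, B, LDB, C, LDC, B, LDB, N_CPU)
Installation:
1. Problem sizes: {384, 1152, · · · , 8064}.
2. For each size, update the split $N_{CPU} = N_{CPU} + \Delta N_{CPU}$, $N_{GPU} = N - N_{CPU}$, and execute the routine.
3. Least squares fit of $T_{dgemm}(m, n) = k_1 m^2 n + k_2 m^2 + k_3 m$ to $T_{dgemm}^{gpu}(m, n)$ and $T_{dgemm}^{cpu}(m, n)$, giving $k_i^{gpu}$ and $k_i^{cpu}$.
4. Least squares fit of $T_{comu}(n) = t_s + n t_w$ to $T_{comu}^{h2d}$ and $T_{comu}^{d2h}$, giving $t_s^{h2d}, t_w^{h2d}$ and $t_s^{d2h}, t_w^{d2h}$.
Modelled execution time:
$T_{EXEC} = \max\left(T_{dgemm}^{cpu} + \gamma T_{comu},\; T_{dgemm}^{gpu} + T_{comu}\right)$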
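The LEAST SQUARE step can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation: timings are given as (m, n, seconds) and (n, seconds) tuples, the function names are hypothetical, and the fit is solved through the normal equations with column scaling (a swapped-in choice; the slides do not say which solver is used).

```python
# Hedged sketch (assumption): fit the empirical models
#   T_dgemm(m, n) = k1*m^2*n + k2*m^2 + k3*m
#   T_comu(n)     = ts + n*tw
# by ordinary least squares via the normal equations.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def least_squares(rows, times):
    """Coefficients c minimising sum_i (rows[i] . c - times[i])**2.

    Each column is scaled to unit max-abs first, because the regressors
    (m^2*n versus m) differ by many orders of magnitude and would
    otherwise make A^T A badly conditioned.
    """
    p = len(rows[0])
    scale = [max(abs(r[j]) for r in rows) or 1.0 for j in range(p)]
    a = [[r[j] / scale[j] for j in range(p)] for r in rows]
    ata = [[sum(r[i] * r[j] for r in a) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * t for r, t in zip(a, times)) for i in range(p)]
    c = solve(ata, atb)
    return [c[j] / scale[j] for j in range(p)]  # undo the column scaling


def fit_dgemm(samples):
    """samples: list of (m, n, seconds) for one device. Returns [k1, k2, k3]."""
    rows = [[m * m * n, m * m, m] for m, n, _ in samples]
    return least_squares(rows, [t for _, _, t in samples])


def fit_comm(samples):
    """samples: list of (n, seconds) for one direction. Returns [ts, tw]."""
    rows = [[1.0, n] for n, _ in samples]
    return least_squares(rows, [t for _, t in samples])
```

With synthetic timings generated from known coefficients over evenly spaced installation sizes (an assumption about the elided set), `fit_dgemm` recovers those coefficients up to rounding. `fit_dgemm` would be called once with GPU timings and once with CPU timings, and `fit_comm` once per transfer direction (h2d, d2h).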
INSTALLATION: Empirical modelling of the hybrid LU routine
The coefficients $k_i$ for the multiplication on the GPU and on the CPU are obtained as described previously, but taking into account that in the LU factorization $m = n \gg b$, where $b$ is the block size. This yields a greater performance improvement than modelling with $m = n = k$, as discussed further in the experimental results section.
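Once the coefficients are fitted (for the LU case, with m = n ≫ b), the T_EXEC model can be evaluated at each candidate split to pick N_CPU. The sketch below is hedged: beyond what the slide states, it assumes that the CPU takes N_CPU columns of an n × n update and the GPU the remaining N_GPU, that the GPU block of n·N_GPU elements is transferred once in each direction, and that γ is a given constant. `choose_n_cpu` and its parameters are hypothetical names.

```python
# Hedged sketch: evaluate T_EXEC = max(T_cpu + gamma*T_comu, T_gpu + T_comu)
# over candidate splits N_CPU = 0, dn, 2*dn, ... of an n x n update.
# ASSUMPTIONS (not from the slides): column-wise split, and a transfer
# volume of n*N_GPU elements host->device and again device->host.


def t_dgemm(k, m, n):
    """Empirical DGEMM time T(m, n) = k1*m^2*n + k2*m^2 + k3*m."""
    k1, k2, k3 = k
    return k1 * m * m * n + k2 * m * m + k3 * m


def t_comu(ts, tw, size):
    """Linear communication model T_comu(size) = ts + size*tw."""
    return ts + size * tw


def choose_n_cpu(n, dn, k_cpu, k_gpu, h2d, d2h, gamma):
    """Return (N_CPU, T_EXEC) minimising the modelled execution time.

    k_cpu, k_gpu: fitted (k1, k2, k3); h2d, d2h: fitted (ts, tw).
    """
    best = None
    for n_cpu in range(0, n + 1, dn):
        n_gpu = n - n_cpu
        # An empty share costs nothing (the fitted model would not give 0).
        t_cpu = t_dgemm(k_cpu, n, n_cpu) if n_cpu else 0.0
        t_gpu = t_dgemm(k_gpu, n, n_gpu) if n_gpu else 0.0
        size = n * n_gpu  # hypothetical transfer volume in elements
        comm = (t_comu(*h2d, size) + t_comu(*d2h, size)) if n_gpu else 0.0
        t_exec = max(t_cpu + gamma * comm, t_gpu + comm)
        if best is None or t_exec < best[1]:
            best = (n_cpu, t_exec)
    return best
```

For example, with `k_cpu = (9e-12, 0, 0)`, `k_gpu = (1e-12, 0, 0)`, free communication, n = 1000 and ΔN_CPU = 100, the minimum falls at N_CPU = 100, where both sides take about 9·10⁻⁴ s. An exhaustive scan over the ΔN_CPU grid is used because it mirrors the installation loop; a finer search would only pay off if ΔN_CPU were small relative to n.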
Bernabé et al. (SCPPG) gbernabe@um.es ICCS / June 10-12, 2014 17 / 24