SLIDE 26 Cornell University Ji Kim 26/20
FG-SIMT Detailed Microarchitecture
[Figure: FG-SIMT detailed microarchitecture block diagram. Labels recovered from the figure:]
Control Processor: CP RF (31 × 32b, 2r1w), CP PC, SIQ, BRMR
Execution Engine: SIU, Lane Control, Lanes 0 through 7 (each with SAU0, SAU1, SGU, SLU, SSU), SRF (124 × 32b, 6r3w), SLWQ, SMRQ, SLDQ, SMRRQ, PWFB, AWFR, PC, Mask, Shared Load Cache, Microarch Kernel State, CEVS
L1 Memory System: Memory Coalescing Unit, SMU, D$ Request and Response Crossbars, L1 D$ Banks 0 through 7 (16 KB each), L1 I$ (16 KB), L2 Request and Response Crossbars
Datapath widths shown: 32b and 256b
- Eight SIMT lanes
- Dynamic reconvergence
- Five vector functional units with support for chaining
- Multi-ported banked regfile with support for executing 32 threads at a time
- Shared load cache for kernel input parameters
- Memory coalescing to dynamically create wide accesses
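
To make the memory coalescing point concrete, here is a minimal sketch of how per-lane requests might be merged into wide accesses. It is an illustration only, not the FG-SIMT hardware algorithm: the function name `coalesce`, the 32-byte line size (taken from the 256b datapaths in the diagram), and the grouping policy are all assumptions.

```python
from collections import OrderedDict

# Assumption: one wide access covers a 256b (32-byte) aligned line,
# matching the 256b datapath width shown in the diagram.
LINE_BYTES = 32
NUM_LANES = 8  # eight SIMT lanes, as stated on the slide


def coalesce(addresses, active_mask):
    """Group active lanes by the 32-byte-aligned line their address falls in.

    Returns a list of (line_base_address, [lane indices]) pairs, one pair
    per wide access that would be issued. Hypothetical helper, not the
    actual hardware policy.
    """
    groups = OrderedDict()
    for lane, (addr, active) in enumerate(zip(addresses, active_mask)):
        if not active:
            continue  # masked-off lanes issue no request
        base = addr - (addr % LINE_BYTES)
        groups.setdefault(base, []).append(lane)
    return list(groups.items())


all_active = [True] * NUM_LANES

# Unit-stride 32b loads: all eight lanes fall in one 256b line.
unit_stride = [0x1000 + 4 * lane for lane in range(NUM_LANES)]

# 64-byte stride: every lane touches a different line.
scattered = [0x1000 + 64 * lane for lane in range(NUM_LANES)]

print(len(coalesce(unit_stride, all_active)))  # number of wide accesses
print(len(coalesce(scattered, all_active)))
```

Under these assumptions, a unit-stride pattern collapses to a single wide access, while a pattern that strides past the line size falls back to one access per active lane.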