Jawed
Legend
Slide 19
http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
It's a key differentiator as compared with the VLIW SIMD design: "Vector back-to-back wavefront instruction issue", versus "Interleaved wavefront instruction required" for VLIW.
I have got respectable performance out of GCN with just a single wavefront per SIMD (i.e. an allocation of more than 128 VGPRs per work item). In the end it depends on the ALU:MEM ratio and on how incoherent the control flow is.
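To put a number on the "more than 128 VGPRs" remark, here is a rough sketch of how VGPR allocation caps the wavefronts resident on one SIMD. It assumes the usual GCN figures of 256 VGPRs per lane per SIMD and at most 10 wavefronts per SIMD; the allocation granularity of 4 is my assumption and the function name is made up for illustration. It ignores the SGPR and LDS limits, which can cut occupancy further.

```python
# Rough VGPR-limited occupancy estimate for one GCN SIMD.
# Assumed figures (not taken from the slides): 256 VGPRs per lane,
# a cap of 10 wavefronts per SIMD, VGPRs allocated in blocks of 4.
VGPRS_PER_LANE = 256
MAX_WAVES_PER_SIMD = 10
ALLOC_GRANULARITY = 4


def waves_per_simd(vgprs_per_work_item: int) -> int:
    """Return how many wavefronts fit on one SIMD, limited by VGPRs only."""
    if not 1 <= vgprs_per_work_item <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round the request up to the next multiple of the allocation granularity.
    blocks = -(-vgprs_per_work_item // ALLOC_GRANULARITY)
    allocated = blocks * ALLOC_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


if __name__ == "__main__":
    for vgprs in (24, 84, 128, 129, 256):
        print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "wavefront(s) per SIMD")
```

Under these assumptions, anything from 129 VGPRs upwards leaves room for exactly one wavefront per SIMD, which is the case I was describing.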
32KiB of I$ (shared by several CUs) is plenty for fairly complex compute (a single very heavy kernel). Multiple large kernels competing for the shared I$ are obviously going to be a factor with the variety of kernels seen in graphics. That still doesn't change the fact that GCN was designed explicitly for back-to-back execution of instructions from a single wavefront.