Ah yes just saw that. So no more clauses. They seem to be embracing a lot of things nVidia has been preaching for years. Guess it'll come down to who has the best implementation.
Interesting. Wonder what this means for the APU that comes after Trinity. Who needs an FPU when you can make the GPU circuitry do double duty?
Yes, it does look a lot more like Fermi than Cayman did. There's still one major difference, though: no SIMT, just a classic Scalar + SIMD.
pcper said: no roadmaps, specific products, or feature rollout timelines
Who needs an FPU when you can make the GPU circuitry do double duty?

Anyone who does not want to invest a dime in their existing code.
So, if I understand correctly, AMD has reduced the branching granularity from 64 to 16. Right?
No. Instead of 16 VLIW-5 lanes over 4 cycles, a CU will run 4x 16 scalar lanes over 4 cycles: four 16-wide SIMDs, each stepping through its own wavefront. A wavefront is still 64 wide, so the branching granularity stays at 64.
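A rough sketch of what that stepping looks like (my own illustration, not anything official from AMD; the exact per-cycle ordering is an assumption). The point is just that a wavefront advances 16 work-items per cycle over 4 cycles either way.

Code:
#include <stdio.h>

#define WAVEFRONT        64   /* work-items per wavefront */
#define SIMD_WIDTH       16   /* lanes per SIMD, per cycle */
#define GCN_SIMDS_PER_CU  4

int main(void)
{
    /* Cayman-style: one 16-wide SIMD of VLIW-5 ALUs, one wavefront in flight. */
    for (int cycle = 0; cycle < WAVEFRONT / SIMD_WIDTH; ++cycle)
        printf("Cayman cycle %d: wavefront 0, work-items %2d..%2d, one VLIW-5 bundle each\n",
               cycle, cycle * SIMD_WIDTH, (cycle + 1) * SIMD_WIDTH - 1);

    /* GCN-style: four 16-wide scalar SIMDs, each walking its own wavefront. */
    for (int cycle = 0; cycle < WAVEFRONT / SIMD_WIDTH; ++cycle)
        for (int simd = 0; simd < GCN_SIMDS_PER_CU; ++simd)
            printf("GCN    cycle %d: SIMD %d, wavefront %d, work-items %2d..%2d, one scalar op each\n",
                   cycle, simd, simd, cycle * SIMD_WIDTH, (cycle + 1) * SIMD_WIDTH - 1);

    return 0;
}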
It's easy. As each "thread" in a GPU has no control flow of its own, it is really only a D, not a T, in SIMD (lane masking doesn't help with that fundamental issue). There is no difference (and never was) between AMD and Nvidia in this respect. Everything else is just stupid marketing terms.

From a software perspective, how do we know each D isn't a T? Each SIMD lane could very well be dedicated to a pixel/vertex/control-point/work-item. Otherwise, how are they going to get SIMD instructions out of the average compute program?
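To make the lane-masking point concrete, here's a toy model (entirely my own sketch, not how either vendor's hardware literally works) of a divergent branch over one 64-wide wavefront: from the software side every work-item looks like it has its own control flow, but the hardware just runs both paths across the whole wavefront with an execution mask.

Code:
#include <stdint.h>
#include <stdio.h>

#define WAVEFRONT 64

int main(void)
{
    int x[WAVEFRONT], result[WAVEFRONT];
    for (int i = 0; i < WAVEFRONT; ++i)
        x[i] = i;

    /* "if (x < 32) result = x * 2; else result = x + 100;" as the hardware
       sees it: one 64-bit execution mask, two serialized passes. */
    uint64_t exec = 0;
    for (int i = 0; i < WAVEFRONT; ++i)
        if (x[i] < 32)
            exec |= 1ull << i;                 /* lanes taking the 'then' path */

    for (int i = 0; i < WAVEFRONT; ++i)        /* pass 1: 'then' path */
        if (exec & (1ull << i))
            result[i] = x[i] * 2;

    for (int i = 0; i < WAVEFRONT; ++i)        /* pass 2: 'else' path, mask inverted */
        if (~exec & (1ull << i))
            result[i] = x[i] + 100;

    printf("lane 0 -> %d, lane 63 -> %d\n", result[0], result[63]);
    return 0;
}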
Edit: To account for edit.

So,
a) 4 different wavefronts from one workgroup issue to a CU.
b) 4 different wavefronts from 4 different workgroups issue to a CU.
Which is it?