Fixed vs. programmable

Looking at the tests hardware.fr did, I have a hunch that the lowered performance of FP16 writes (without blending) ....

For the sake of clarity you have to include a G16R16F test (yes, that is a valid render target format). Then you can compare 32-bit integer (RGBA8) vs. 32-bit FP (G16R16F) writes: both formats are 32 bits per pixel, so there is no bandwidth difference or contention to skew the result. Whatever gap remains should boil down to the pure FP vs. INT blending performance difference.
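To make the bandwidth argument concrete, here is a minimal sketch of the per-pixel arithmetic. The channel widths follow from the format names; the helper names and the example resolution are my own assumptions for illustration and are not taken from the hardware.fr tests.

```python
# Bits per channel for each render target format.
# RGBA8 and G16R16F are the two formats compared above;
# A16B16G16R16F is the usual 4xFP16 format, listed for contrast.
FORMATS = {
    "RGBA8": (8, 8, 8, 8),                # 4 x 8-bit integer channels
    "G16R16F": (16, 16),                  # 2 x FP16 channels
    "A16B16G16R16F": (16, 16, 16, 16),    # 4 x FP16 channels
}


def bytes_per_pixel(fmt):
    """Size of one written pixel in bytes (hypothetical helper)."""
    return sum(FORMATS[fmt]) // 8


def write_traffic_bytes(fmt, width, height):
    """Bytes written when every pixel of a width x height target is touched once."""
    return bytes_per_pixel(fmt) * width * height


if __name__ == "__main__":
    # 1920x1080 is an assumed resolution, chosen only as an example.
    for name in FORMATS:
        print(name, bytes_per_pixel(name), write_traffic_bytes(name, 1920, 1080))
```

RGBA8 and G16R16F both come out at 4 bytes per pixel, while 4xFP16 doubles that to 8, which is why only the first pair isolates the ROP math from memory traffic.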

I think we agree that it makes no difference to the ROP whether it's 2xFP16 or 4xFP16. I'm pretty sure the ROPs contain fully redundant vector ALUs (they always process 4 scalars).
 
Probably, though I'll point out that NVIDIA GPUs apparently have ROP blending units that are shared by all lanes for FP32. I have no idea whether this affects other formats too. In any case, AMD ROPs shouldn't work like that.
 
"In the UE4 Elemental demo, the majority of the GPU’s FLOPS are going into general compute algorithms, rather than the traditional graphics pipeline. This shouldn’t be surprising, as the core of the traditional pipeline is fed by fixed-function hardware and will ultimately be saturated given more performance at a fixed resolution. But the compute pipeline has unlimited forward scalability, so the compute trend should only grow in the future." - Tim Sweeney

This is quite interesting in light of the huge similarities between AVX2 and Knights Corner. It hints at a homogeneous computing future.
 