But mightn't they attain significantly higher memory clocks?
I think the ROP/bandwidth deficit would be too large.
Notice the 7 "Polymorph engines". That explains the Heaven scores…
I don't know, does NVidia fixing GDDR5 performance per pin have anything to do with going to 32nm? Seems unlikely to me. NVidia still isn't achieving GDDR5 speeds that you see on the 55nm HD4870.
But mightn't they attain significantly higher memory clocks?
That's the big question, isn't it? Ever since I got used to the idea that GF104 had 384 SPs, I've wondered why it's a "4" - clearly that's not standard NVIDIA nomenclature, as it should be roughly 1/4 of GF100, not more than 1/2! One possible explanation is that it refers to the number of GPCs, of which there would only be one...
So... how many GPCs?
So... how many GPCs?
Hmm, "matching" HD4870, I think. 900MHz?NVidia still isn't achieving GDDR5 speeds that you see on the 55nm HD4870.
That matches most people's expectations, I think, that NVidia would reduce GPC count for the lower GPUs.
Fermi appears to send all even warps to one SIMD and all odd warps to the other (page 10 of the Architecture Whitepaper).
48 SPs -- does that mean a third warp scheduler per cluster?
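Just to make the even/odd point concrete, here's a toy sketch of the warp-to-scheduler mapping. The two-scheduler rule is the whitepaper's even/odd split mentioned above; the three-scheduler variant (and its modulo-3 rule) is pure speculation prompted by the question, not anything documented.

```python
# Toy model of warp-to-scheduler assignment. The GF100 rule (even warps to one
# scheduler, odd warps to the other) follows the whitepaper note above; the
# three-scheduler variant is speculative.

def gf100_scheduler(warp_id: int) -> int:
    return warp_id % 2          # two schedulers: even/odd split

def speculative_three_scheduler(warp_id: int) -> int:
    return warp_id % 3          # hypothetical third scheduler

print([gf100_scheduler(w) for w in range(8)])             # [0, 1, 0, 1, 0, 1, 0, 1]
print([speculative_three_scheduler(w) for w in range(8)]) # [0, 1, 2, 0, 1, 2, 0, 1]
```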
Now, now, who was saying that 48 is a weird number?
Interesting ALU:TEX ratio.
So it appears to be 48 ALUs per Polymorph Engine with 8 TMUs.
http://forum.beyond3d.com/showpost.php?p=1441721&postcount=6024
Now, now, who was saying that 48 is a weird number?
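For what it's worth, the ratio in question is easy to work out. The GF100 per-SM figures (32 SPs, 4 TMUs) are from the published configuration; the 48 SP / 8 TMU cluster is the speculated GF104 layout discussed above.

```python
# ALU:TEX ratio per cluster. GF100's published SM has 32 SPs and 4 TMUs;
# the 48 SP / 8 TMU cluster is the speculated GF104 layout from this thread.

def alu_tex_ratio(alus: int, tmus: int) -> float:
    return alus / tmus

print(alu_tex_ratio(32, 4))  # GF100 SM: 8.0 -> 8:1
print(alu_tex_ratio(48, 8))  # speculated GF104 cluster: 6.0 -> 6:1
```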
I just want to see nVidia put out a DX11 card with 8800 GTX-level performance and low enough power requirements that it only needs a single slot. I'm getting the impression that that's not remotely reasonable until at least the refresh of this architecture (presumably next year).
They could just reduce the warps per scheduler to 16.
For CUDA/OpenCL/DirectCompute, I think GF104 may be somewhat of a step backwards. GF100 has 64 kB of on-chip memory / (2 schedulers * 24 warps/scheduler * 32 work-items/warp) = ~42 bytes per work-item. If GF104 has 3 schedulers and keeps the L1/Local Store the same, then GF104 may have 64 kB / (3 schedulers * 24 warps/scheduler * 32 work-items/warp) = ~28 bytes per work-item of on-chip memory. This will make it harder to program than GF100.
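The arithmetic in the quoted post checks out; here's a small sketch of it that also shows the effect of dropping to 16 warps per scheduler as suggested above. Only the GF100 figures are published; the GF104 configurations are the speculative ones from this thread.

```python
# On-chip memory per work-item, following the arithmetic in the quoted post.
# Only the GF100 figures are published; the GF104 numbers are speculation.

def bytes_per_work_item(shared_kib: int, schedulers: int,
                        warps_per_scheduler: int, warp_size: int = 32) -> float:
    return shared_kib * 1024 / (schedulers * warps_per_scheduler * warp_size)

print(bytes_per_work_item(64, 2, 24))  # GF100: ~42.7 B
print(bytes_per_work_item(64, 3, 24))  # speculated GF104: ~28.4 B
print(bytes_per_work_item(64, 3, 16))  # 16 warps/scheduler: back to ~42.7 B
```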