But mightn't they attain significantly higher memory clocks?
I think the ROP/bandwidth deficit would be too large.
Notice the 7 "Polymorph engines". That explains the Heaven scores…
I don't know, does NVidia fixing GDDR5 performance per pin have anything to do with going to 32nm? Seems unlikely to me. NVidia still isn't achieving GDDR5 speeds that you see on the 55nm HD4870.
But mightn't they attain significantly higher memory clocks?
That's the big question, isn't it? Ever since I got used to the idea that GF104 had 384 SPs, I've wondered why it's a "4" - clearly that's not standard NVIDIA nomenclature, as it should be roughly 1/4 of GF100, not more than 1/2! One possible explanation is that it refers to the number of GPCs, of which there would only be one...
So... how many GPCs?
So... how many GPCs?
Hmm, "matching" HD4870, I think. 900MHz?NVidia still isn't achieving GDDR5 speeds that you see on the 55nm HD4870.
That matches most people's expectations, I think, that NVidia would reduce GPC count for the lower GPUs.
Fermi appears to send all even warps to one SIMD and all odd warps to the other (page 10 of the Architecture Whitepaper).
48 SPs -- does that mean a third warp scheduler per cluster?
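Just to make the even/odd point concrete, here's a toy sketch of the warp-to-scheduler mapping. The two-scheduler rule is the whitepaper's even/odd split mentioned above; the three-scheduler variant (and its modulo-3 rule) is pure speculation prompted by the question, not anything documented.

```python
# Toy model of warp-to-scheduler assignment. The GF100 rule (even warps to one
# scheduler, odd warps to the other) follows the whitepaper note above; the
# three-scheduler variant is speculative.

def gf100_scheduler(warp_id: int) -> int:
    return warp_id % 2          # two schedulers: even/odd split

def speculative_three_scheduler(warp_id: int) -> int:
    return warp_id % 3          # hypothetical third scheduler

print([gf100_scheduler(w) for w in range(8)])             # [0, 1, 0, 1, 0, 1, 0, 1]
print([speculative_three_scheduler(w) for w in range(8)]) # [0, 1, 2, 0, 1, 2, 0, 1]
```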
Now, now, who was saying that 48 is a weird number?
Interesting ALU:TEX ratio.
So it appears to be 48 ALUs per Polymorph Engine with 8 TMUs.
http://forum.beyond3d.com/showpost.php?p=1441721&postcount=6024
Now, now, who was saying that 48 is a weird number?
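For what it's worth, the ratio in question is easy to work out. The GF100 per-SM figures (32 SPs, 4 TMUs) are from the published configuration; the 48 SP / 8 TMU cluster is the speculated GF104 layout discussed above.

```python
# ALU:TEX ratio per cluster. GF100's published SM has 32 SPs and 4 TMUs;
# the 48 SP / 8 TMU cluster is the speculated GF104 layout from this thread.

def alu_tex_ratio(alus: int, tmus: int) -> float:
    return alus / tmus

print(alu_tex_ratio(32, 4))  # GF100 SM: 8.0 -> 8:1
print(alu_tex_ratio(48, 8))  # speculated GF104 cluster: 6.0 -> 6:1
```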
I just want to see nVidia put out a DX11 card with 8800 GTX-level performance and low enough power requirements that it only needs a single slot. I'm getting the impression that that's not remotely reasonable until at least the refresh of this architecture (presumably next year).
They could just reduce the warps per scheduler to 16.
For CUDA/OpenCL/DirectCompute, I think GF104 may be somewhat of a step backwards. GF100 has 64 kB of on-chip memory / (2 schedulers * 24 warps/scheduler * 32 work-items/warp) = ~42 bytes per work-item. If GF104 has 3 schedulers and keeps the L1/Local Store the same, then GF104 may have 64 kB / (3 schedulers * 24 warps/scheduler * 32 work-items/warp) = ~28 bytes per work-item of on-chip memory. This will make it harder to program than GF100.
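The arithmetic in the quoted post checks out; here's a small sketch of it that also shows the effect of dropping to 16 warps per scheduler as suggested above. Only the GF100 figures are published; the GF104 configurations are the speculative ones from this thread.

```python
# On-chip memory per work-item, following the arithmetic in the quoted post.
# Only the GF100 figures are published; the GF104 numbers are speculation.

def bytes_per_work_item(shared_kib: int, schedulers: int,
                        warps_per_scheduler: int, warp_size: int = 32) -> float:
    return shared_kib * 1024 / (schedulers * warps_per_scheduler * warp_size)

print(bytes_per_work_item(64, 2, 24))  # GF100: ~42.7 B
print(bytes_per_work_item(64, 3, 24))  # speculated GF104: ~28.4 B
print(bytes_per_work_item(64, 3, 16))  # 16 warps/scheduler: back to ~42.7 B
```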