NVIDIA Kepler speculation thread

Hopefully soon we'll get some board shots!

I'm intrigued to see if they've dropped the hot-clock
- it makes sense that they would - in these days where everything is power limited, the hot-clock seems like a mistake architecturally

I wish I could edit my comments & correct all the typos!
:rolleyes:
 
- it makes sense that they would - in these days where everything is power limited, the hot-clock seems like a mistake architecturally
Differential clocking was the most viable way for NV to boost computational throughput directly, given the specifics and goals of their architectures -- single-scalar issue with a (relatively) small warp size. The alternative was complex and costly instruction scheduling.
It wasn't until GF104 that ILP came into play.
 
Differential clocking was the most viable way for NV to boost computational throughput directly, given the specifics and goals of their architectures -- single-scalar issue with a (relatively) small warp size. The alternative was complex and costly instruction scheduling.
It wasn't until GF104 that ILP came into play.

Obviously, double clocking means they can do twice as much per ALU, but just as obviously, it comes at a cost in the extra pipeline stages needed to run at higher clocks, which would most likely also affect the warp size indirectly. And all those extra, fast pipeline stages cost power.
- so it will be interesting to see if Kepler still has it, or if they go back to a more normal clocking scheme.
 
"Just like GF110, the GK104 comes in two different versions: the GeForce board will run double-precision at one quarter rate - while Quadro and Tesla will run at half-rate."

Apart from GK110, I am especially sceptical about this particular part. Not only does it make no sense to invest transistors here, but I imagine it would be quite costly in terms of die area for a performance part.
 
"Just like GF110, the GK104 comes in two different versions: the GeForce board will run double-precision at one quarter rate - while Quadro and Tesla will run at half-rate."

Apart from GK110, I am especially sceptical about this particular part. Not only does it make no sense to invest transistors here, but I imagine it would be quite costly in terms of die area for a performance part.

I only said a couple of pages ago that the infamous chiphell chart was as close as it can get for GK104. I also recall saying that the supposed "GK100" in that one is as wrong as it can get.

Yeah, Quadros need DP for what exactly? Even dumber: if you can get a Quadro for way less, why bother with a Tesla SKU? :rolleyes:
 
I only said a couple of pages ago that the infamous chiphell chart was as close as it can get for GK104. I also recall saying that the supposed "GK100" in that one is as wrong as it can get.

Yeah, Quadros need DP for what exactly? Even dumber: if you can get a Quadro for way less, why bother with a Tesla SKU? :rolleyes:

Aren't Quadros usually more expensive than Teslas?
 
The Quadro line of Fermi boards has full-rate primitive setup, AFAIK. 1/2 DP throughput is kept for Tesla SKUs. GeForce boards have neither (full-rate setup is only active with tessellation enabled) -- their only advantage is a higher stock clock rate.

Looks like there's no SKU with all the features and performance enabled by default. :???:
 
I only said a couple of pages ago that the infamous chiphell chart was as close as it can get for GK104. I also recall saying that the supposed "GK100" in that one is as wrong as it can get.
I don't see the connection to quoting me.
edit: Ah, now... I was interpreting the Tesla and Quadro to be GK104 ASIC variants, not GK110.

Yeah, Quadros need DP for what exactly? Even dumber: if you can get a Quadro for way less, why bother with a Tesla SKU? :rolleyes:
If you don't need the maximum setup rate for visible triangles, you could be tempted by 29% more DP-GFLOPS and 23% more bandwidth per card/processor.
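
For anyone wanting to check the 29% / 23% figures: the post doesn't name the cards being compared, so the numbers below are my assumption (Tesla M2090 vs Quadro 6000, using the commonly quoted peak specs). A rough sketch only, and `pct_more` is just a helper name made up here:

```python
# Assumed peak specs, NOT taken from the post above:
# Tesla M2090: ~665 DP GFLOPS, ~177 GB/s
# Quadro 6000: ~515 DP GFLOPS, ~144 GB/s
TESLA = {"dp_gflops": 665.0, "bandwidth_gbs": 177.0}
QUADRO = {"dp_gflops": 515.0, "bandwidth_gbs": 144.0}


def pct_more(a, b):
    """How many percent larger a is than b."""
    return (a / b - 1.0) * 100.0


dp_gain = pct_more(TESLA["dp_gflops"], QUADRO["dp_gflops"])
bw_gain = pct_more(TESLA["bandwidth_gbs"], QUADRO["bandwidth_gbs"])
print(f"DP throughput: +{dp_gain:.0f}%, bandwidth: +{bw_gain:.0f}%")
```

If those are the cards that were meant, the quoted percentages line up with the spec sheets.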

@fellix:
No, Quadros also have max. DP throughput, but they come with at most 448 ALUs, so you're right after all.
 

So that's 1536 SPs and no hot-clock double-confirmed then!
:D

Looks good

Memory bandwidth looks lacking though, but as others have said, the devil's in the details.

If die size is around 360mm2, then this is about the same size as the GF114
- pricing can be much lower than $349-$399 if they need to

And BSN is also confirming 2304 SPs for the GK110
- which would mean at least a 540mm2 die, 50% more performance
- and they would need a more mature 28nm process to really make that one fly, so it makes sense to go with the smaller part for now..
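
The 540mm2 estimate presumably comes from scaling die area in proportion to SP count. A minimal sketch of that arithmetic, assuming purely linear scaling (a simplification, since memory controllers, ROPs and I/O don't grow with the SP count); the function name is made up for illustration:

```python
def scaled_die_area(base_area_mm2, base_sps, target_sps):
    """Naive estimate: die area grows linearly with the number of SPs."""
    return base_area_mm2 * target_sps / base_sps


# GK104 rumoured at ~360mm2 with 1536 SPs; GK110 rumoured at 2304 SPs
gk110_area = scaled_die_area(360, 1536, 2304)
print(f"Estimated GK110 die: {gk110_area:.0f}mm2")
```

2304 / 1536 is exactly 1.5, which is also where the "50% more" comes from, at least on paper.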
 
So that's 1536 SPs and no hot-clock double-confirmed then!
:D

Looks good

Memory bandwidth looks lacking though, but as others have said, the devil's in the details.

If die size is around 360mm2, then this is about the same size as the GF114
- pricing can be much lower than $349-$399 if they need to

And BSN is also confirming 2304 SPs for the GK110
- which would mean at least a 540mm2 die, 50% more performance
- and they would need a more mature 28nm process to really make that one fly, so it makes sense to go with the smaller part for now..

BSN "confirming"? :LOL:

I wonder how you get 50% more performance with GK110 vs GK104, assuming those specs are even real. GF114 vs GF110, for example, with a similar difference, gives around a 30-35% performance difference.

360mm^2 would make it about as big as Tahiti, too
 
So that's 1536 SPs and no hot-clock double-confirmed then!
:D

Looks good

Memory bandwidth looks lacking though, but as others have said, the devil's in the details.

If die size is around 360mm2, then this is about the same size as the GF114
- pricing can be much lower than $349-$399 if they need to

And BSN is also confirming 2304 SPs for the GK110
- which would mean at least a 540mm2 die, 50% more performance
- and they would need a more mature 28nm process to really make that one fly, so it makes sense to go with the smaller part for now..
340mm2 is quoted here for the GK104

http://www.3dcenter.org/news/die-aktuellen-spezifikationen-zum-gk104-kepler-performance-chip

That would put the GK110 at 510mm2.

As for the clock frequency, if 950 MHz gets the GK104 to 2.9 TFLOPS single-precision, I could see them working to get the actual clock frequency to 983 MHz or above (probably 1 GHz) to bring that number up to 3 TFLOPS single-precision.
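
A quick sketch of the clock/TFLOPS arithmetic, assuming 1536 SPs each doing one FMA (counted as 2 FLOPs) per clock with no hot-clock. Those are assumptions based on the rumoured specs, and the function names are only illustrative:

```python
FLOPS_PER_SP_PER_CLOCK = 2  # assumption: one fused multiply-add per SP per clock


def peak_tflops(sps, clock_mhz):
    """Theoretical single-precision peak in TFLOPS."""
    return sps * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6 / 1e12


def clock_mhz_for(sps, target_tflops):
    """Clock in MHz needed to reach a given theoretical peak."""
    return target_tflops * 1e12 / (sps * FLOPS_PER_SP_PER_CLOCK) / 1e6


print(f"{peak_tflops(1536, 950):.2f} TFLOPS at 950 MHz")
print(f"{clock_mhz_for(1536, 3.0):.0f} MHz needed for 3 TFLOPS")
print(f"{950 * 3.0 / 2.9:.0f} MHz when scaling from the rounded 2.9 figure")
```

Under these assumptions 950 MHz works out to about 2.92 TFLOPS, so roughly 977 MHz would already reach 3 TFLOPS; the 983 MHz figure is what you get by scaling up from the rounded 2.9.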
 
BSN "confirming"? :LOL:
I remember BSN breaking the Intel Larrabee story 2 months before Intel canned Larrabee. Theo was roasted in the forums over this story (including by Charlie), but it turned out he was absolutely correct.

So do not blindly ignore Theo.

Charlie also can't confirm or deny anything right now, as semiaccurate.com is offline this morning:

>> Firefox can't establish a connection to the server at semiaccurate.com.

I wonder how you get 50% more performance with GK110 vs GK104, assuming those specs are even real. GF114 vs GF110, for example, with a similar difference, gives around a 30-35% performance difference.
The GF114 is not just a cut-down GF110, so you really can't draw the same conclusions for the GK110 vs GK104.

360mm^2 would make it about as big as Tahiti, too
340mm2 is quoted here:
http://www.3dcenter.org/news/die-aktuellen-spezifikationen-zum-gk104-kepler-performance-chip
 