NVIDIA Kepler speculation thread

Yes, it is. Just take a look at past generations, it's extremely rare that one is so much ahead of the other

Rare doesn't equal impossible though. At worst we'll see history repeat itself as we've seen multiple times since GT200; at best that rather boring string could somewhat break this time.

If you look at the speculated GK104 specifications, some parts of them imply quite large differences compared to even GF110, while according to those same specs GK104 will obviously fall short of the latter in terms of ROPs, bandwidth and possibly per-FLOP efficiency. Otherwise the net result could easily be at least equal to GF110 if not slightly higher, yet of course nowhere near the increase some parts of the chip could imply.

If all of the above is true, GK104 would hypothetically end up on average about 40-50% faster than GF114 (always depending on the set of benchmarks used, since you can always get lower or higher percentages), and that's really not any sort of "uber-achievement" as some of you make it sound.

As for Tahiti vs. Cayman, the first obviously has quite a bit more bandwidth yet the amount of ROPs is the same. One thing I've noticed so far is that the difference between the two is smaller with 8xMSAA than with 4xMSAA:

http://www.computerbase.de/artikel/...adeon-hd-7970/10/#abschnitt_leistung_mit_aaaf

1920*1200
4xAA
7970 = 6970+36%
8xAA
7970 = 6970+26%

2560*1600
4xAA
7970 = 6970+42%
8xAA
7970 = 6970+31%
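
To put a number on how much of the lead is lost going from 4xAA to 8xAA, here is a rough Python sketch using only the four percentages quoted above. The helper name `relative_8x_retention` is made up for illustration, and treating the averaged leads as plain speedup factors is an assumption, not something the review itself states.

```python
# Leads of the HD 7970 over the HD 6970, copied from the figures quoted above.
leads = {
    ("1920x1200", "4xAA"): 0.36,
    ("1920x1200", "8xAA"): 0.26,
    ("2560x1600", "4xAA"): 0.42,
    ("2560x1600", "8xAA"): 0.31,
}

def relative_8x_retention(resolution):
    """Ratio of the 7970/6970 speedup at 8xAA to the speedup at 4xAA.

    A value below 1.0 means the 7970 loses more performance than the 6970
    when moving from 4xAA to 8xAA (hypothetical helper, illustration only).
    """
    speedup_4x = 1.0 + leads[(resolution, "4xAA")]
    speedup_8x = 1.0 + leads[(resolution, "8xAA")]
    return speedup_8x / speedup_4x

for res in ("1920x1200", "2560x1600"):
    print(res, round(relative_8x_retention(res), 3))
```

Both resolutions come out at roughly 0.92 to 0.93, so under these assumptions the 7970 gives up about 7-8% more than the 6970 when stepping up to 8xAA.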
 
Rare doesn't equal impossible though. At worst we'll see history repeat itself as we've seen multiple times since GT200; at best that rather boring string could somewhat break this time.

If you look at the speculated GK104 specifications, some parts of them imply quite large differences compared to even GF110, while according to those same specs GK104 will obviously fall short of the latter in terms of ROPs, bandwidth and possibly per-FLOP efficiency. Otherwise the net result could easily be at least equal to GF110 if not slightly higher, yet of course nowhere near the increase some parts of the chip could imply.

If all of the above is true, GK104 would hypothetically end up on average about 40-50% faster than GF114 (always depending on the set of benchmarks used, since you can always get lower or higher percentages), and that's really not any sort of "uber-achievement" as some of you make it sound.

Indeed it doesn't, which is why I only said yes to "hard to believe", not to "impossible" ;)

I don't doubt GK104 speculated specs [amount of SM's/GPC's etc] much, I doubt the claimed performance and believe it's at best GTX580 level, more likely slightly under.

The other thing I doubt is claimed GK100/110 being pretty much twice everything GK104 is, but in one chip.
 
Indeed it doesn't, which is why I only said yes to "hard to believe", not to "impossible" ;)

I don't doubt GK104 speculated specs [amount of SM's/GPC's etc] much, I doubt the claimed performance and believe it's at best GTX580 level, more likely slightly under.

Then why go through all the hassle and waste a ton of resources for a new architecture and not go for a simple die shrink instead?
 
Then why go through all the hassle and waste a ton of resources for a new architecture and not go for a simple die shrink instead?

DX11.1 support, hd audio support, other new features (3+ display support?) for starters?
(and if you think of it, the expected diesize of GK104 is somewhere below Tahiti, but not by much - GF110 shrunk to 28nm should be around that size too)
 
DX11.1 support, hd audio support, other new features (3+ display support?) for starters?
Those are no reasons for massive changes though. DX11.1 should be possible to do with minimal changes, and the rest are outside the 3d core. So you could do "little more than a shrink".
(Not saying though there aren't possibly good reasons for changes.)
 
Yes, it is. Just take a look at past generations, it's extremely rare that one is so much ahead of the other

Who is ahead? GK104 is similar in size to Tahiti, but might enjoy the advantages of getting rid of some of the GPGPU stuff. So if it ends up around the 7950 level, then I would call it a draw. I don't think it is normal that a performance GPU could challenge a high-end GPU, like the 6970 could last generation. But then it lacks some GPGPU features compared to GF100/110. This round we might see the results that could be expected when neither side has huge faults and problems.
 
DX11.1 support, hd audio support, other new features (3+ display support?) for starters?
(and if you think of it, the expected diesize of GK104 is somewhere below Tahiti, but not by much - GF110 shrunk to 28nm should be around that size too)

None of it requires a new architecture. It's all doable with relatively small effort on Fermi, unless you'd think that AMD went to GCN for something as ridiculous as DX11.1.
 
A shrink of the same overall design with modifications to allow it to work on the new process can be done, but the lessons that applied at one node may be invalid on the next.
An example of this would be AMD's experience running the base K7/K8 design ragged across so many node transitions.
An example showing that a redesign reflecting the realities of future scaling challenges is no guarantee of success is AMD's experience replacing it.

In Nvidia's case, there is the need for more intensive power savings, which is not so easily shoehorned into a design.
If the hot clocks are dropped, it may also be that the power/area tradeoffs that were a net positive at 40nm and above are not that great for 28nm onward.
 
Variability seems to be a problem with 28nm too, and even Intel seems to have to combat leakage rather aggressively/extensively. Depending on design choices, this could give hotclocks another node's worth of lifespan.

--
btw:
http://semiaccurate.com/2012/02/09/there-are-two-gk104kepler-variants/comment-page-1/#comment-17595
Sources are now telling SemiAccurate that Nvidia has two variants of the GK104 in the pipe. These two variants hint at a finer grained fusing ability for the end product.

The two siblings are said to be GK104-400 and GK104-335, basically a full working and partially fused off version of the same chip. The -400 is said to be an “8 group” device, the -335 described as “7 group”. If you recall the sad tale of Fermi/GF100, the chip had large swathes of shaders turned off, the ability to do less radical surgery was not there. This is a fairly painful way to deal with defects, the more granular you can make the disabling, the better off you are.
I am in dire need for an explanation how GTX 560 Ti is not an 8-group part and GTX 560 non-TI is not a 7-group part.
 
http://semiaccurate.com/2012/02/09/there-are-two-gk104kepler-variants/comment-page-1/#comment-17595

I am in dire need for an explanation how GTX 560 Ti is not an 8-group part and GTX 560 non-TI is not a 7-group part.
Lol that's rather funny indeed. GF100 was even a "16 group" part at that which is of course even more fine grained (and I think nvidia sold just about every possible version of it ranging from 8 to 15 SMs except of course 16 SMs were reserved for GF110).
 
Variability seems to be a problem with 28nm too, and even Intel seems to have to combat leakage rather aggressively/extensively. Depending on design choices, this could give hotclocks another node's worth of lifespan.

--
btw:
http://semiaccurate.com/2012/02/09/there-are-two-gk104kepler-variants/comment-page-1/#comment-17595

I am in dire need for an explanation how GTX 560 Ti is not an 8-group part and GTX 560 non-TI is not a 7-group part.

Well, the regular GTX 560 Ti is GF114-based, not GF100.

That said:

(7/8 = 0.875 = Tahiti_Pro/Tahiti_XT = 1792/2048) < (GF100/GF110 = 480/512 = 0.9375)

So err… that still makes no sense.
Edit: oops, mczak beat me to it.
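
For anyone who wants to check the ratios in the comparison above, a minimal Python sketch using exact fractions follows. The shader counts are the ones quoted in the post; the dictionary labels are just descriptive names chosen for this example.

```python
from fractions import Fraction

# Enabled/total shader ratios, using only the numbers quoted in the post above.
configs = {
    "7 of 8 groups": Fraction(7, 8),
    "Tahiti Pro / Tahiti XT": Fraction(1792, 2048),
    "GF100 / GF110": Fraction(480, 512),
}

for name, ratio in configs.items():
    # Fraction reduces automatically, so 1792/2048 prints as 7/8.
    print(f"{name}: {ratio} = {float(ratio)}")
```

The Tahiti ratio reduces to exactly 7/8, while 480/512 reduces to 15/16, which is the finer-grained cut of the two.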
 
Not really "reserved for GF110"; GF100 just wouldn't give decent yields with all 16 SMs enabled, or at least that was my impression.
 
Variability seems to be a problem with 28nm too, and even Intel seems to have to combat leakage rather aggressively/extensively. Depending on design choices, this could give hotclocks another node's worth of lifespan.

--
btw:
http://semiaccurate.com/2012/02/09/there-are-two-gk104kepler-variants/comment-page-1/#comment-17595

I am in dire need for an explanation how GTX 560 Ti is not an 8-group part and GTX 560 non-TI is not a 7-group part.

Yes, exactly, he's at it again....
The GF100 & GF110 had 16 SMs, and could fuse off an SM (i.e. a granularity of 1/16)
- Charlie thought this was 'large swathes of shaders turned off'
- IIRC, Cypress had a granularity of 1/20, so no big difference, except of course, if you're a certain Mr C....
- and the GF104/GF114 were 8 SM chips, with a granularity of 1/8

Fast forward to today, and we have Tahiti with its ability to disable a CU array. What? Wait, one of its 8 CU arrays can be disabled to get the HD7950?
- ok, I get it, a granularity of 1/8 is now great!
:rolleyes::rolleyes::rolleyes:
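
The granularity comparison in the list above can be tabulated with a short Python sketch. The unit counts are the ones stated in this post (the Cypress figure is given from memory there), and `salvage_step` is a hypothetical helper name used only for this illustration.

```python
from fractions import Fraction

# Independently fusable units per chip, as stated in the post above.
units = {
    "GF100/GF110 (SMs)": 16,
    "Cypress (SIMDs, as recalled in the post)": 20,
    "GF104/GF114 (SMs)": 8,
    "Tahiti (CU arrays)": 8,
}

def salvage_step(n_units):
    """Fraction of the shader array lost when a single unit is disabled."""
    return Fraction(1, n_units)

# Sort from finest to coarsest granularity.
for chip, n in sorted(units.items(), key=lambda kv: kv[1], reverse=True):
    step = salvage_step(n)
    print(f"{chip}: 1/{n} = {float(step):.2%} lost per disabled unit")
```

By these numbers a Tahiti CU array and a GF104/GF114 SM are the same 1/8 step, twice as coarse as the 1/16 step on GF100/GF110.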
 
Not really "reserved for GF110"; GF100 just wouldn't give decent yields with all 16 SMs enabled, or at least that was my impression.
Yes I really meant "reserved" as no card was released which actually had all 16 SMs enabled.
Of course if you can't sell the full parts at all even if you have that granularity you're still screwed
(though I don't think that not selling GF104 in the full configuration really was because of issues with this chip itself).
Though "8 group part" of GK104 would probably indicate 8 SMs. Must be fat SMs then (not sure if 4 GPCs would still make sense with only 8 SMs).

- IIRC, Cypress had a granularity of 1/20, so no big difference, except of course, if you're a certain Mr C....
Not quite. I'm quite sure you couldn't disable just one SIMD; it had to be one from each of the two groups, hence the granularity was 1/10.
 
http://semiaccurate.com/2012/02/09/there-are-two-gk104kepler-variants/comment-page-1/#comment-17595
Sources are now telling SemiAccurate that Nvidia has two variants of the GK104 in the pipe. These two variants hint at a finer grained fusing ability for the end product.

The two siblings are said to be GK104-400 and GK104-335, basically a full working and partially fused off version of the same chip. The -400 is said to be an “8 group” device, the -335 described as “7 group”. If you recall the sad tale of Fermi/GF100, the chip had large swathes of shaders turned off, the ability to do less radical surgery was not there. This is a fairly painful way to deal with defects, the more granular you can make the disabling, the better off you are.
Clearly NVIDIA has the ability to disable CCs in groups of only 2, that's how fine grained it is. So the GK104-400 has 800 CCs and the GK104-335 has 670 CCs. :D
 
Yes I really meant "reserved" as no card was released which actually had all 16 SMs enabled.

Reserved stands in my book for something that has been held back on purpose for a succeeding part. In the GF100 case it sounded more like a one way street than anything else.

Of course if you can't sell the full parts at all even if you have that granularity you're still screwed
(though I don't think that not selling GF104 in the full configuration really was because of issues with this chip itself).

GF104 might have been an exception because a full version would have come too close to GF100 salvage parts. I can't rule out, however, that there was a problem with GF104 either, since there were only 7 SM parts. Why not use 8 SM parts for Quadros only, for example?

Though "8 group part" of GK104 would probably indicate 8 SMs. Must be fat SMs then (not sure if 4 GPCs would still make sense with only 8 SMs).

6*32? That would mean a crapload of SPs/SM and just 2 GPCs.
 
IF 8 SMs is right and an SM still does 2pix, it makes sense that it has 16 ROPs (+16 ROPs w/ MSAA) w/ a 256-bit bus.. my GTX 460 (900/4200MHz) uses 80% of its memory controller in the Vantage color fillrate test.. and I've never seen it go above 60% when gaming, hence why it scales linearly with GPU clock..
 
GF104 might have been an exception because a full version would have come too close to GF100 salvage parts. I can't rule out, however, that there was a problem with GF104 either, since there were only 7 SM parts. Why not use 8 SM parts for Quadros only, for example?
Well I don't know but no GF104 based Quadro parts exist, either because of the higher geometry throughput of GF100 or because those 8 SM GF100 had to go somewhere :).
If you think it really had such big problems that they didn't have enough chips with 8 working SMs (and clocks certainly didn't seem to be much of an issue either, as shown by overclocking), you'd think there were also parts which had fewer than 7 good SMs. How many products did nvidia sell again based on GF104 which had fewer than 7 SMs enabled?

6*32? That would mean a crapload of SPs/SM and just 2 GPCs.
Personally I don't think there'd be much wrong with that. Yes, this cuts back geometry significantly compared to shader capability, but the ratio would in fact still be slightly higher than that of Tahiti (granted, it might need the GPC rasterization rate beefed up a bit). I can see though how that might be seen as a step back.
But OTOH (all just based on the "8 group part" comment) with 8 SMs you've got the geometry cut back anyway, why do you need 4 GPCs for that (so just 2 SMs/GPC).


IF 8 SMs is right and an SM still does 2pix, it makes sense that it has 16 ROPs (+16 ROPs w/ MSAA) w/ a 256-bit bus.. my GTX 460 (900/4200MHz) uses 80% of its memory controller in the Vantage color fillrate test.. and I've never seen it go above 60% when gaming, hence why it scales linearly with GPU clock..
I think if the shader alus in a SM really are beefed up by a factor of two it would make sense if the SMs also had twice the shader export capabilities.
 
Clearly NVIDIA has the ability to disable CCs in groups of only 2, that's how fine grained it is. So the GK104-400 has 800 CCs and the GK104-335 has 670 CCs. :D

It could be one of Charlie's infamous ruses to winkle out copy cats
- 335 being an unlikely number for anything....
- even 400 is a strange number, of say, CCs.....
- but if 400 was correct, then 336 would be more logical
- but that doesn't fall into the 1/8 scheme, ....

So probably both numbers are made up, somewhere along the line....
:LOL:
 
How many products did nvidia sell again based on GF104 which had less than 7 SMs enabled?
For Desktop one part, actually. GTX 460 SE.


edit
Just for the record: GK104-400 and -335 are product numbers only, not unit counts. GF100 had -275 and -375 suffixes for example.
 