NVIDIA Kepler speculation thread

Each GK104 SMX takes ~16.5 mm^2. One additional SMX per GPC would add ~66 mm^2, bringing the die size up to 360 mm^2. In that case they might jump up to a 320-bit memory interface, but I wouldn't bet on it. Either way, such a GK114 shouldn't have any trouble keeping up with a 2560sp/160tmu HD89XX.
Just adding SMXs won't do much for performance - just look at the GTX 670 vs the GTX 680: clock them the same and that additional SMX amounts to something like a 2% advantage. Adding four of them without other changes might gain you like 5%, just enough to catch the 7970 GHz Edition - a colossal waste of die space (for a non-compute part, that is). Heck, Nvidia didn't even enable the 8th SMX on their top mobile part, even though it is generally more power efficient to go with more units at lower clocks. "Doesn't scale with shader units" is a property it shares with Tahiti, but Tahiti's successor still has to serve as a compute part, hence the rules for what makes sense to add are slightly different (though of course a Tahiti successor with just more CUs would also be hardly an improvement over Tahiti for gaming).
It seems, though, that GK104 is not totally limited by memory bandwidth (increasing core clocks still helps some), though it certainly is limited by it to some degree. I don't know if adding another GPC would help more than just adding SMXs, but something like one more GPC plus one more 64-bit memory channel sounds like it would be way faster than just tacking on SMXs - unless there'd be some other changes increasing efficiency somehow.
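To put a rough number on the scaling argument, here's a minimal sketch: treat the ~2% gain quoted above for the GTX 680's extra SMX as the observed return on ~14% more unit throughput, and apply the same efficiency to a hypothetical four-SMX addition. The 2% figure is from the post above; everything else is plain arithmetic.

Code:
# Crude SMX-scaling sketch based on the GTX 670 vs GTX 680 comparison above.
smx_670, smx_680 = 7, 8
theoretical_gain = smx_680 / smx_670 - 1       # ~14.3% more SMX throughput at equal clocks
observed_gain = 0.02                           # ~2% advantage claimed in the post above
efficiency = observed_gain / theoretical_gain  # ~0.14 of the extra throughput shows up

# Hypothetical GK114 with 4 extra SMXs on the 8-SMX GK104 base:
estimated_gain = (12 / 8 - 1) * efficiency
print(f"Estimated gain from +4 SMX alone: {estimated_gain:.1%}")  # ~7%, same ballpark as the ~5% above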
 
Last edited by a moderator:
It's not ludicrous at all. GK110 wasn't designed with games as a primary concern,

Oh, it seems so.
The same is true of GK114: it is not designed with heavy compute loads in mind, and it will suck in that regard, I presume.
I hope we will see soon, though.

and from the beginning, even NVIDIA wasn't sure it would end up in gaming cards

To me this can't have anything in common with reality. If true, it doesn't speak well of their engineers. I mean, how on Earth is it possible that they were not sure?!
 
It's not ludicrous at all. GK110 wasn't designed with games as a primary concern, and from the beginning, even NVIDIA wasn't sure it would end up in gaming cards.

Which is not to say that OBR is not making random crap up, but it's possible.


But a flagship gaming card can carry a ridiculous dollar markup and a ridiculous watts markup.
So I'm sure a GK110 is fine, unless a Radeon 8970 is strong enough to make it look bad.
 
GK110 is certainly no stranger to games. Its extensive virtualization additions are perfectly suited for remote/cloud game rendering.

But regular Kepler might be enough. For remote use (I'd say thin client more than cloud) they have a card with four GK104s. With heavier demands you might stack GeForce GTX 690s or equivalents rather than GK110 cards.
 
But a flagship gaming card can carry a ridiculous dollar markup and a ridiculous watts markup.
So I'm sure a GK110 is fine, unless a Radeon 8970 is strong enough to make it look bad.

It all depends on how much faster it can be compared to a high-clocked GK114, and how power-efficient. If it's much faster than the 8970, then maybe it can command a significant premium; if not, the resulting margins might not be worth it.

If you throw a dual-GK114 into the mix, then things get even less clear. The same goes for the 8990, to an extent.
 
I'd love to stand corrected, but how would NV cover the extreme R&D expenses for GK110 without a desktop variant of it? Professional markets are typically high margin, low volume, while desktop is the exact opposite (low margin, high volume).

As for the funky suggestions I can read above from Videocardz: assuming it's a GK114 with N more SMXs (and probably a wider bus), that doesn't automatically mean the end result would equal GK110 performance, and most definitely not in compute, unless they've hypothetically pumped up register file and cache sizes and whatnot. Even then, the result is still probably below GK110 3D performance, but with quite a bit more die area than GK104 and obviously quite a bit more power consumption - for what kind of gain exactly, other than quite a bit higher overall development costs?

Besides that, when would anyone suggest NV came up with such an idea and managed in no time to develop a decent enough replacement for GK110? In a couple of months, maybe?

Even worse, if NV actually intends to call an upcoming higher-end GPU "GTX 780", GK110 would fit that marketing image, since it has unique and exclusive capabilities compared to the current 6xx cores.

Granted, I could be completely wrong, but these are the times where my gut feeling plainly and simply screams bullshit.
 
A GK104 refresh is no reason to expect GK110 will not be sold to consumers.

Each GK104 SMX takes ~16.5 mm^2. One additional SMX per GPC would add ~66 mm^2, bringing the die size up to 360 mm^2. In that case they might jump up to a 320-bit memory interface, but I wouldn't bet on it. Either way, such a GK114 shouldn't have any trouble keeping up with a 2560sp/160tmu HD89XX.

They could also just add one GPC at ~340 mm^2. Such a part would probably end up faring about like the GTX 680 does against the 7970 GHz Edition.

In either case, neither would come very close to the performance of GK110. If it is an HPC-only chip, then why does it have 5 GPCs and 240 TMUs?
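For what it's worth, here's the die-size arithmetic behind both options as a sketch: the 294 mm^2 GK104 base and ~16.5 mm^2 per SMX are the figures used in this thread, while the ~13 mm^2 of per-GPC overhead (rasterizer etc.) is my own guess to land at the ~340 mm^2 quoted above.

Code:
# Die-size arithmetic for the two hypothetical GK114 layouts discussed above.
GK104_DIE = 294.0     # mm^2, GK104 as shipped
SMX_AREA = 16.5       # mm^2 per SMX (estimate from this thread)
GPC_OVERHEAD = 13.0   # mm^2 for rasterizer etc. (my guess, to match ~340 mm^2)

plus_one_smx_per_gpc = GK104_DIE + 4 * SMX_AREA          # +1 SMX in each of 4 GPCs
plus_one_gpc = GK104_DIE + 2 * SMX_AREA + GPC_OVERHEAD   # +1 GPC of 2 SMXs

print(f"+4 SMX: {plus_one_smx_per_gpc:.0f} mm^2")  # ~360 mm^2
print(f"+1 GPC: {plus_one_gpc:.0f} mm^2")          # ~340 mm^2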

EDIT:
Source?

GK104 is already memory bandwidth limited. Adding more computational power will have very little benefit in most situations. A 320-bit memory bus at the same 6.0 GHz speed would increase bandwidth, TMUs, and ROPs by 25%, but if they also add another 2 SMXs (384 cores), they will have increased core computational power by 25% as well - thus STILL remaining bandwidth limited on the new chip. They either need faster VRAM - which would be hard, since GDDR5 memory controllers are approaching the theoretical max for the memory speeds they can handle (and also, no one is mass producing 7 GHz GDDR5 to my knowledge) - or they need to move to a 384-bit bus, which would be time consuming and cost more die space creating and reworking the memory controller interconnects.

I think the 384-bit bus is probably the more likely scenario, but if Nvidia is confident they can get their memory controllers to yield well at around 6.6 GHz VRAM speeds, then that solution would probably be the easiest and most cost effective for them (assuming 7 GHz VRAM doesn't cost an arm and a leg more than 6 GHz VRAM). For reference: 6.6 GHz GDDR5 on a 320-bit bus would yield 264 GB/s of bandwidth (a 37.5% increase over what the GTX 680 has now), 6 GHz on a 384-bit bus yields 288 GB/s, and 7 GHz on a 320-bit bus would yield 280 GB/s.
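All of those figures fall out of the same formula - bus width in bytes times the effective data rate per pin - so here's a quick sketch that reproduces them:

Code:
# GDDR5 bandwidth = (bus width in bits / 8) * effective data rate per pin (Gbps).
def gddr5_bandwidth(bus_bits, data_rate_gbps):
    """Return peak memory bandwidth in GB/s."""
    return bus_bits / 8 * data_rate_gbps

print(gddr5_bandwidth(256, 6.0))  # 192 GB/s - what the GTX 680 has now
print(gddr5_bandwidth(320, 6.6))  # 264 GB/s - +37.5% over the GTX 680
print(gddr5_bandwidth(384, 6.0))  # 288 GB/s
print(gddr5_bandwidth(320, 7.0))  # 280 GB/s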
 
Last edited by a moderator:
I'd love to stand corrected, but how would NV cover the extreme R&D expenses for GK110 without a desktop variant of it? Professional markets are typically high margin, low volume, while desktop is the exact opposite (low margin, high volume).

As for the funky suggestions I can read above from Videocardz: assuming it's a GK114 with N more SMXs (and probably a wider bus), that doesn't automatically mean the end result would equal GK110 performance, and most definitely not in compute, unless they've hypothetically pumped up register file and cache sizes and whatnot. Even then, the result is still probably below GK110 3D performance, but with quite a bit more die area than GK104 and obviously quite a bit more power consumption - for what kind of gain exactly, other than quite a bit higher overall development costs?

Besides that, when would anyone suggest NV came up with such an idea and managed in no time to develop a decent enough replacement for GK110? In a couple of months, maybe?

Even worse, if NV actually intends to call an upcoming higher-end GPU "GTX 780", GK110 would fit that marketing image, since it has unique and exclusive capabilities compared to the current 6xx cores.

Granted, I could be completely wrong, but these are the times where my gut feeling plainly and simply screams bullshit.


I think we'll still get a GK110-based GeForce card at some point down the line. TSMC thinks a first run of low-volume 20nm products will come sometime next year (LINK HERE), but that will likely get delayed, and the costs of 20nm are going to be very high at first. Nvidia could be planning a GTX 780 based on this mystery Kepler card, with GK110 possibly filling GTX 785 and GTX 780 Ti slots in a limited run in the second half of 2013.

EDIT: Or it could be that demand for HPC is finally high enough that there is no reason to produce another 100,000 chips for GeForce-branded cards. Nvidia has said themselves that Kepler preorders have exceeded all their past HPC sales combined, so it is very possible they are finally splitting their lineup. If that is the case, then this mystery Kepler card (GK112???), if it exists, could in fact be a really, really beefed-up GK104 capable of GK110-like graphics performance at a lower power target and without the extra compute fat on the die.
 
Last edited by a moderator:
tviceman said:
GK104 is already memory bandwidth limited.
Which is why the GTX680 fares badly in games compared to the HD7950...

I've heard GK110 has full rate DP and won't be much faster than GK104 in games too...
 
I think we'll still get a GK110-based GeForce card at some point down the line. TSMC thinks a first run of low-volume 20nm products will come sometime next year (LINK HERE), but that will likely get delayed, and the costs of 20nm are going to be very high at first. Nvidia could be planning a GTX 780 based on this mystery Kepler card, with GK110 possibly filling GTX 785 and GTX 780 Ti slots in a limited run in the second half of 2013.

EDIT: Or it could be that demand for HPC is finally high enough that there is no reason to produce another 100,000 chips for GeForce-branded cards. Nvidia has said themselves that Kepler preorders have exceeded all their past HPC sales combined, so it is very possible they are finally splitting their lineup. If that is the case, then this mystery Kepler card (GK112???), if it exists, could in fact be a really, really beefed-up GK104 capable of GK110-like graphics performance at a lower power target and without the extra compute fat on the die.

What does 20nm have to do with GK110? As for the last sentence: it sounds awfully simple to get GK110-like 3D performance out of a GK1x4 refresh, with additional compute efficiency (more die area) at the same time, and on top of that a lower power target. I leave it up to you to spot the oxymoron in that last sentence.

At the very best, the GK1x4 refresh you're describing will be highly competitive with Tahiti's successor, and most likely on the 3D level only.

Which is why the GTX680 fares badly in games compared to the HD7950...

I must be reading the wrong reviews then.

I've heard GK110 has full rate DP and won't be much faster than GK104 in games too...

Then maybe you shouldn't rely as much on hearsay and instead read the official GK110 whitepaper or a few of the serious editorials out there about its architecture. GK110 officially has a 3:1 SP/DP ratio, and that's clearly not a full DP rate.
 
What does 20nm have to do with GK110?

It has everything to do with product release cycles. If it isn't coming until mid-2014, a refreshed Kepler line starting in January or February (or even December) would have a long run. Being able to release a "new" product next fall, between entirely new cycles, would be a great way to keep sales going.

Anyways, Nvidia has already said GK110 wafers will be allocated to HPC until that demand is met. And again, if demand is high enough (and from what Nvidia is saying, it is), then we won't see GK110 as a GeForce product for some time, if ever. I was of the mindset that there was without a doubt going to be a GeForce GK110 card, but now I question whether there will be one, especially if Nvidia has decided to beef up GK104.
 
Which is why the GTX680 fares badly in games compared to the HD7950...

I've heard GK110 has full rate DP and won't be much faster than GK104 in games too...

DP has nothing to do with gaming. Use common sense and extrapolate GK110's performance - it has 87.5% more cores and 50% more bandwidth (and likely ROPs and TMUs) than GK104.
 
You misunderstood my point (ironically while illustrating it perfectly)... but I digress.
 
Last edited by a moderator:
DP has nothing to do with gaming. Use common sense and extrapolate GK110's performance - it has 87.5% more cores and 50% more bandwidth (and likely ROPs and TMUs) than GK104.
The 16 TMUs/SMX, just as with the GK10x chips, are confirmed (so 87.5% more there as well - that's a lot of texture units...), and IIRC the 48 ROPs and 5 GPCs are as well.
Clocks (and the number of enabled SMXs for the top gaming card) are not, however. But with the expected somewhat lower (core) clocks, it seems safe to say there will be a very healthy increase over the GTX 680 in shader power (ALUs and TMUs) and memory bandwidth (both probably around 50%), a more modest increase in ROP throughput, and very little (if any) increase in frontend/rasterizer throughput (unless those 5 rasterizers were beefed up, but I don't think so).
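Putting those unit counts next to the GTX 680 in a quick sketch: the counts are the confirmed/rumored ones above (full 15-SMX chip), but since clocks aren't confirmed, the ~850 MHz GK110 core clock below is purely my assumption.

Code:
# Throughput extrapolation: hypothetical full GK110 (15 SMX) vs GTX 680.
# Unit counts per the posts above; the GK110 clock is an assumption.
gtx680 = dict(alus=1536, tmus=128, rops=32, clock=1.006, bus_bits=256)
gk110  = dict(alus=2880, tmus=240, rops=48, clock=0.850, bus_bits=384)  # clock guessed

def ratio(key):
    """Unit-count ratio scaled by the assumed core clocks."""
    return (gk110[key] * gk110["clock"]) / (gtx680[key] * gtx680["clock"])

print(f"ALU throughput: {ratio('alus'):.2f}x")  # ~1.58x - the "around 50%" more shader power
print(f"TMU throughput: {ratio('tmus'):.2f}x")  # ~1.58x
print(f"ROP throughput: {ratio('rops'):.2f}x")  # ~1.27x - the "more modest" increase
print(f"Bandwidth:      {gk110['bus_bits'] / gtx680['bus_bits']:.2f}x")  # 1.50x at equal memory clocks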
 
It has everything to do with product release cycles. If it isn't coming until mid-2014, a refreshed Kepler line starting in January or February (or even December) would have a long run. Being able to release a "new" product next fall, between entirely new cycles, would be a great way to keep sales going.

Anyways, Nvidia has already said GK110 wafers will be allocated to HPC until that demand is met. And again, if demand is high enough (and from what Nvidia is saying, it is), then we won't see GK110 as a GeForce product for some time, if ever. I was of the mindset that there was without a doubt going to be a GeForce GK110 card, but now I question whether there will be one, especially if Nvidia has decided to beef up GK104.

I agree, GK110 can live on its own, and even then, after Tesla there are the Quadros.
Think of IBM: they make POWER7 without selling it to consumers because they think there's a big enough market for it at a nice high price, even though it's a huge chip.
They had a consumer version of POWER at one point - the G5 (PowerPC 970), derived from POWER4 - similar to the GK104/GK110 situation, but they abandoned even that.

The "extreme R&D expenses" for GK110 are still there but offset by the GK107 to GK104 sharing the same base architecture.
This time GK110 is a lot more desirable GPGPU too and the user base grows significantly, most customers who will get it only had vanilla x86 racks or even just beige towers.

----------

Maybe the GK104 refresh is GK124, and GK110 is in its own class. To make an analogy with past products, GK104 would be the G92, GK110 the GT200, and GK124 the cancelled GT212.
 
It has everything to do with product release cycles. If it isn't coming until mid-2014, a refreshed Kepler line starting in January or February (or even December) would have a long run. Being able to release a "new" product next fall, between entirely new cycles, would be a great way to keep sales going.

Who says that AMD/NV product refreshes will appear in either late '12 or early '13 anyway? What's the point of releasing product family after product family on a frequent schedule at 28nm when cost, and by extension final product prices, are still too high?

What the standalone 28nm GPU market needs right now is much bigger manufacturing capacity and quite a bit lower prices in order for sales to pick up.

Anyways, Nvidia has already said GK110 wafers will be allocated to HPC until that demand is met. And again, if demand is high enough (and from what Nvidia is saying, it is), then we won't see GK110 as a GeForce product for some time, if ever. I was of the mindset that there was without a doubt going to be a GeForce GK110 card, but now I question whether there will be one, especially if Nvidia has decided to beef up GK104.

I don't doubt the last sentence per se; I doubt that they had the time to develop a "beefier" GK1x4 refresh on such short notice. It's not like they saw early this year that GK110 might not be needed for desktop after all and just decided, within a couple of months, to further pump up an already existing design that had been years in development. Either they took that decision much earlier (which would indicate they never really intended to release GK110 for desktop, for which there's no clear indication), or the GK104 refresh will simply end up highly competitive with Tahiti's successor, just as GK104 is vs. Tahiti.
 
The 16 TMUs/SMX, just as with the GK10x chips, are confirmed (so 87.5% more there as well - that's a lot of texture units...), and IIRC the 48 ROPs and 5 GPCs are as well.
Clocks (and the number of enabled SMXs for the top gaming card) are not, however. But with the expected somewhat lower (core) clocks, it seems safe to say there will be a very healthy increase over the GTX 680 in shader power (ALUs and TMUs) and memory bandwidth (both probably around 50%), a more modest increase in ROP throughput, and very little (if any) increase in frontend/rasterizer throughput (unless those 5 rasterizers were beefed up, but I don't think so).

That's not even the entire story, since SMX != SMX between GK110 and GK104. It doesn't take long, once you consider the register file and/or cache size differences (amongst others) between the two, to see that it's not as simple an equation. GF114 has 3/4 the ALU count of GF110 (at higher frequencies) and the same number of TMUs, and it's not just the bandwidth and/or ROP count difference that gives the latter a ~42% lead in average performance.
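To attach numbers to that Fermi example (retail GTX 560 Ti and GTX 580 unit counts and hot clocks, which I'm supplying here; the ~42% average lead is the figure quoted above):

Code:
# GF114 (GTX 560 Ti) vs GF110 (GTX 580): raw ALU throughput vs the actual perf gap.
gf114_alus, gf114_hot_ghz = 384, 1.645
gf110_alus, gf110_hot_ghz = 512, 1.544

alu_advantage = (gf110_alus * gf110_hot_ghz) / (gf114_alus * gf114_hot_ghz) - 1
print(f"GF110 ALU throughput advantage: {alu_advantage:.0%}")  # ~25%
# ...yet GF110's average performance lead is ~42% (per the post above), so register
# files, caches, etc. clearly contribute beyond raw unit counts and bandwidth.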

I'd love to stand corrected, but I have severe doubts that giving a GK104 refresh 50% more units, a wider bus, and a healthy amount of additional bandwidth, without any deeper changes, would increase performance as much as GK110 does.
 
If, when GK110 is in production, it is Nvidia's fastest GPU, would it really not find its way onto benchmark charts?

That depends on whether it is faster than GK114 in most gaming workloads.

What if it was actually slower in gaming workloads but toasted it in computational workloads?

What if it was the same speed or only 5% faster in gaming workloads but toasted it in computational workloads?

How much faster than GK114 and 8970 must it be before it even makes sense to release it in the consumer channel?

What if it consumes so much more power than GK114 that GK114 can clock significantly higher and thus ends up significantly faster in gaming workloads?

Add to that GK114 likely being significantly smaller and cheaper to manufacture.

BTW - just to be clear, absolutely none of this is even speculation. It's just hypothetical situations where a GK110 is hugely fast in computational workloads but as a result potentially suffers in gaming workloads.

The following is purely speculation on my part.

I personally feel that Nvidia is moving towards their xx0 products (GK110, for example) being targeted almost purely at the professional and HPC markets, while their xx4 becomes the top-of-the-line consumer card.

Considering that they had problems with GF100 and presumably now with GK100 (potentially canned before they even had silicon), it may make sense for them to no longer target the consumer market with their largest-die chip. Doing so may allow them to reduce complexity and make it slightly smaller than if it also had to be the fastest at consumer gaming tasks.

I can certainly see where a high computation/low gaming performance card might make sense.

Or they could be like ATI with the x20 chips. Just stop making them and make the x70 chips the top of the line models.

Regards,
SB
 