NVIDIA Kepler speculation thread

rpg.314 said:
Corporate schizophrenia?
How about 'irrelevant' ? :wink
I mean: only a very narrow audience is going to be pedantic enough to care and they'll account for 0.1% of the actual customers. The broad trend lines are more important.
 
The first GPU with ECC (a requirement for serious HPC) came to market less than 18 months ago and already has a strong presence in the fastest computers in the world. That's a very fast uptake by any measure.

Well, some people would do HPC with clusters of gaming consoles, but I'll admit that the lack of ECC was probably a big issue for most people.

Still, 'strong presence' sounds like an overstatement, to say the least. The Top500 website doesn't give statistics for GPU-accelerated machines, but I counted 12 such computers among the 500. Does 2.4% qualify as a strong presence, especially considering that the fastest of the bunch are often PR moves with very low margins? NVIDIA is clearly making inroads, but it's hardly anything spectacular yet.

Besides, they're pretty much alone at this point: AMD doesn't really have a credible compute GPU offering, at least until Southern Islands comes out, at which point the competitive situation should make market penetration measurements more meaningful. Ditto for Intel's Knights products.

Well, whatever it is, they're currently making less from their half of the market than nVidia. At first blush it would seem they're trading margin for market share.

Again, Quadros! I doubt NVIDIA is making significantly (if at all) more than AMD on consumer products, but AMD's FirePro sales are so small that they're practically irrelevant. Quadros, not so much.

Anyhow, my point is that Tegra and Tesla successfully invaded two new markets previously unavailable to nVidia. They seem to know what they're doing and if they claim Kepler is 3x as power efficient as Fermi I'll take their word on it unless proven otherwise.

That claim about Kepler was originally made in September 2010, most likely long before they had any silicon in hand, so that was an estimate based on simulations. From Fermi, we all know how reliable those can be, right? :D

Still, I'm not saying they won't meet this 3× improvement target for performance/W, but I'm not convinced it will be met for the same 300W budget as Fermi. In other words I don't think Kepler will be three times as fast as the GTX 580 or even 480.
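
To make the arithmetic behind that last step explicit (a trivial relation, nothing Kepler-specific):

\[
\text{absolute performance} = \frac{\text{performance}}{\text{W}} \times \text{power budget}
\]

So a 3× gain in performance/W only turns into 3× the absolute performance if the power budget stays the same; if the top Kepler SKU were capped at, say, two thirds of Fermi's budget, the net gain would be closer to 2×.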
 
Still, 'strong presence' sounds like an overstatement, to say the least.

Look at the number of recent entrants to the top 10 and count how many are GPU powered. Looking at the established (and old) members of the list isn't very useful unless you're going to great lengths to downplay the emergence of GPUs.

Besides, they're pretty much alone at this point: AMD doesn't really have a credible compute GPU offering, at least until Southern Islands comes out, at which point the competitive situation should make market penetration measurements more meaningful. Ditto for Intel's Knights products.

I'm not sure what you mean. If AMD GPUs and Knights Ferry are successful at HPC it would only further validate nVidia's claims. I'm actually very interested in AMD's strategy to see how well they can leverage OpenCL in any attempt to gain a foothold here. OpenCL is still lagging behind in key areas.

Again, Quadros! I doubt NVIDIA is making significantly (if at all) more than AMD on consumer products, but AMD's FirePro sales are so small that they're practically irrelevant. Quadros, not so much.

We can guess till the cows come home. Or we can accept the fact that nVidia made $140M more than AMD's graphics division last quarter. Hard to believe the consumer segment lost money with that large of a gap. Would need to see numbers to change my mind.

That claim about Kepler was originally made in September 2010, most likely long before they had any silicon in hand, so that was an estimate based on simulations. From Fermi, we all know how reliable those can be, right? :D

Right. There was also a more recent quote about Kepler hitting 5 GFLOPS/W. Not impressive at all considering the M2090 is already at 3 GFLOPS/W.
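
For a rough sanity check, using NVIDIA's published M2090 numbers (roughly 665 peak double-precision GFLOPS in a 225 W envelope):

\[
\frac{665\ \text{GFLOPS}}{225\ \text{W}} \approx 2.96\ \text{GFLOPS/W}, \qquad \frac{5}{2.96} \approx 1.7\times
\]

So 5 GFLOPS/W would be less than a 2× improvement in peak DP per watt over what already ships.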

Still, I'm not saying they won't meet this 3× improvement target for performance/W, but I'm not convinced it will be met for the same 300W budget as Fermi. In other words I don't think Kepler will be three times as fast as the GTX 580 or even 480.

What I find amusing is that Fermi is supposedly the most broken and inefficient architecture ever, yet there's so much skepticism that nVidia can improve it significantly. The only conclusion is that their engineers are simply incompetent, no? :D

Agreed that nobody should expect 3x absolute performance from Kepler. They need to throw in lower power consumption as part of the deal.
 
What I find amusing is that Fermi is supposedly the most broken and inefficient architecture ever, yet there's so much skepticism that nVidia can improve it significantly. The only conclusion is that their engineers are simply incompetent, no? :D
GF100 was broken, but that had nothing to do with the high-level architecture. Is Fermi on the whole broken or inefficient? Well, I don't think it's quite the most efficient design; some things are most likely a bit overkill (like the 4 raster engines on GF100/GF110), and I'm not quite sure the fully distributed vertex processing was worth it either. All told, 4 of these PolyMorph Engines don't really appear to be faster than one from AMD, but since we have no idea (well, I don't) how these solutions compare in transistor count (or die area), it's hard to tell whether it is really inefficient.
So I certainly think the design can be improved, but not to the point of being "3 times better" (though as I said, I think it really translates to more like 2-2.5× compared to anything non-GF100). And I don't know how much 28nm is going to help, but a factor of 2 might be doable.
 
Edit: Looks like I looked at the numbers a bit wrong. 5.6% is the percentage of total systems that are DX11-capable. There are many more systems that use DX11 GPUs but an older OS (roughly 3-4 times as many as those who have both DX11 and Windows Vista/7!). However, DX11 isn't everything, and nVidia still has massive overall marketshare.

Sorry for the brief OT, but you must have read the data incorrectly. 5.6% of systems are DX11-capable (DX11 GPU + OS), and 16.76% of systems have DX10/11 GPUs but run WinXP (this must be where the confusion is: most of those will be DX10 GPUs). Logic dictates that almost every system with a DX11 GPU will have an OS that supports it.

Carry on :)
 
How about 'irrelevant' ? :wink
I mean: only a very narrow audience is going to be pedantic enough to care and they'll account for 0.1% of the actual customers. The broad trend lines are more important.

Fair enough. But it reflects poorly on the company anyway.
 
GF100 was broken, but that had nothing to do with the high-level architecture. Is Fermi on the whole broken or inefficient? Well, I don't think it's quite the most efficient design; some things are most likely a bit overkill (like the 4 raster engines on GF100/GF110), and I'm not quite sure the fully distributed vertex processing was worth it either. All told, 4 of these PolyMorph Engines don't really appear to be faster than one from AMD, but since we have no idea (well, I don't) how these solutions compare in transistor count (or die area), it's hard to tell whether it is really inefficient.

Nvidia said distributed geometry and raster grew the chip by 10% compared to a traditional solution. WRT advantages (or the lack thereof) compared to Cayman, keep in mind that every Fermi derivative except the Quadros has an artificial limit of 1 triangle per clock if the triangle is actually drawn. Only triangle culling runs at full speed.
 
Nvidia said distributed geometry and raster grew the chip by 10% compared to a traditional solution.
Yes, but it's not exactly clear what the comparison point here really is; it could be something very slow, at least for tessellation.
WRT advantages (or the lack thereof) compared to Cayman, keep in mind that every Fermi derivative except the Quadros has an artificial limit of 1 triangle per clock if the triangle is actually drawn. Only triangle culling runs at full speed.
IIRC that's only really true for non-tessellated triangles.
Regardless, that's one of the reasons it's overdesigned (for gaming use, anyway): 1 tri per clock or 4 probably makes squat of a difference (well, one could always try to see if a Quadro card running at the same clocks is actually a tiny bit faster...).
 
The comparison point was, IIRC, something using a single geometry engine - of course, making something fast throughout a chip instead of just bolting it on probably requires changes in other parts, like data paths etc. I asked about that, and they said back then that the total cost for distributed geometry, including all those side effects, made the chip grow by about 10 percent.

WRT tessellated triangles you're right, but then I think nobody debates that if Nvidia has any advantage at all, it is with tessellation, so we were talking about normal gaming use anyway.

WRT the Quadro comparison: I think both vendors saw that normal triangle, vertex or even geometry shader throughput is not a limiting factor. At least not compared to letting geometry run at full speed and subsequently killing off gaming performance or creating other unforeseeable caveats. Otherwise, I think, AMD would not have tuned down their geometry processing in R600 quite soon after launch, after having demonstrated a "50x" advantage over G80 in geometry shaders.
 
Look at the number of recent entrants to the top 10 and count how many are GPU powered. Looking at the established (and old) members of the list isn't very useful unless you're going to great lengths to downplay the emergence of GPUs.

Looking at the Top10 introduces a strong bias, because these deals carry so much PR weight that IHVs are often willing to settle for unusually low margins to get them.

That said, I'm not trying to downplay the emergence of GPUs for HPC; it's happening, and the market is growing, but these are still the early phases.


I'm not sure what you mean. If AMD GPUs and Knights Ferry are successful at HPC it would only further validate nVidia's claims. I'm actually very interested in AMD's strategy to see how well they can leverage OpenCL in any attempt to gain a foothold here. OpenCL is still lagging behind in key areas.

Yes, it would validate NVIDIA's claims, and I bet it will.

But I mean that for now, NVIDIA has the entire GPGPU—or perhaps I should say parallel coprocessor—market almost all to themselves. But if AMD and Intel are successful with GCN and Knights, they'll have to share it.

So in a few quarters the market may be 2 or 3 times as large as it is now, but if they have to share it with 2 more competitors, their absolute share may not grow much. It could even decrease, considering that competition tends to pull prices down. I don't expect a raging price war, because to a large extent, GPUs are already competing with CPUs for HPC dollars, but a 20~30% price drop across the board wouldn't surprise me at all.


We can guess till the cows come home. Or we can accept the fact that nVidia made $140M more than AMD's graphics division last quarter. Hard to believe the consumer segment lost money with that large of a gap. Would need to see numbers to change my mind.

You can't deny the fact that Quadros make a lot of money either. I don't know whether the consumer segment made a profit, but if so, it can't have been much.


Right. There was also a more recent quote about Kepler hitting 5 GFLOPS/W. Not impressive at all considering the M2090 is already at 3 GFLOPS/W.

Yeah. Without knowing more than perf/W it's impossible to conclude anything, really.


What I find amusing is that Fermi is supposedly the most broken and inefficient architecture ever, yet there's so much skepticism that nVidia can improve it significantly. The only conclusion is that their engineers are simply incompetent, no? :D

Agreed that nobody should expect 3x absolute performance from Kepler. They need to throw in lower power consumption as part of the deal.

Fermi is rather inefficient for gaming, but for compute it's pretty good. It's just that NVIDIA is claiming an improvement that's well above what we usually see in the industry, and there's not a lot of low-hanging fruit in Fermi. I suppose a complete redesign of the cache hierarchy could go a long way towards improving power-efficiency, probably at the expense of area-efficiency. Removing fixed-function hardware (or at least decreasing the FF/GP ratio) would also help.

Anyway, 3× Fermi's power-efficiency is doable, but probably not the way we'd like to see it. In all likelihood, there will still be 250~300W SKUs, and the really interesting question is how those will perform.
 
Removing fixed-function hardware (or at least decreasing the FF/GP ratio) would also help.
Since area efficiency for Tesla matters a lot less than GeForce margins unless they make a dedicated chip for GPGPU, a very good solution would be to power gate (rather than simply clock gate) as much graphics-centric hardware as possible. There is a small area cost to power gating, but if done properly it could even help graphics power efficiency (e.g. workloads with a very low amount of texturing, or very long shaders that let the ROPs idle a lot). You'd probably want to be able to shut down these blocks for hundreds of cycles for it to be really worth it, which given the size of GPU buffers is very easy for compute but hard for graphics; still, there should definitely be cases where this can be done at no risk.

I'm pretty darn sure Fermi doesn't do this, and I have no idea at all about Kepler, but it would be interesting.
 
Since area efficiency for Tesla matters a lot less than GeForce margins unless they make a dedicated chip for GPGPU, a very good solution would be to power gate (rather than simply clock gate) as much graphics-centric hardware as possible. There is a small area cost to power gating, but if done properly it could even help graphics power efficiency (e.g. workloads with a very low amount of texturing, or very long shaders that let the ROPs idle a lot). You'd probably want to be able to shut down these blocks for hundreds of cycles for it to be really worth it, which given the size of GPU buffers is very easy for compute but hard for graphics; still, there should definitely be cases where this can be done at no risk.

I'm pretty darn sure Fermi doesn't do this, and I have no idea at all about Kepler, but it would be interesting.

Yes, I was thinking the same thing. I think power-gating TMUs in Teslas should be straightforward enough. Perhaps the geometry engines could be powered off as well. I'm not too sure about the rest, given how tightly coupled the ROPs and memory controllers seem to be, for instance.
 
TMUs are exposed in CUDA.
Yes, but very few programs use them (and even fewer will if there's a clear power penalty to doing so), and this is known at compile time, so it's easy to power gate the TMUs only if all running shaders are known not to be using them. Of course, as I said, it'd be even better to be able to dynamically power gate them off even for graphics if the workload isn't using them much, if at all.
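
To illustrate what "known at compile time" means here, a minimal CUDA sketch (hypothetical kernels, and of course nothing here says NVIDIA actually gates on this): texture reads compile to dedicated tex instructions, while plain global loads don't touch the texture units at all, so the driver can tell from the binary whether a kernel ever needs them.

Code:
// Minimal sketch using the texture reference API of the CUDA 4.x era.
// Whether a kernel touches the texture units is visible in the compiled
// code, because texture reads become dedicated tex instructions.
#include <cstdio>
#include <cuda_runtime.h>

texture<float, 1, cudaReadModeElementType> texRef;

// Reads through the texture path (TMUs involved).
__global__ void scaleViaTexture(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * tex1Dfetch(texRef, i);
}

// Plain global loads only; no tex instructions are emitted, so in principle
// the texture units could stay power gated while this kernel runs.
__global__ void scaleViaLoad(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}

int main()
{
    const int n = 1 << 20;
    float *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaBindTexture(0, texRef, d_in, n * sizeof(float));

    scaleViaTexture<<<(n + 255) / 256, 256>>>(d_out, n);
    scaleViaLoad<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaUnbindTexture(texRef);
    cudaFree(d_in);
    cudaFree(d_out);
    printf("%s\n", cudaGetLastError() == cudaSuccess ? "ok" : "error");
    return 0;
}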

I don't think power gating the ROPs would be a problem even if they're tightly coupled - they're essentially still a separate block likely designed by a separate engineering team and I very much doubt CUDA workloads use them at all.
 
I don't think power gating the ROPs would be a problem even if they're tightly coupled - they're essentially still a separate block likely designed by a separate engineering team and I very much doubt CUDA workloads use them at all.

AFAIK, the ROPs do the atomics on NV hardware.
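
For what it's worth, global atomics are bread-and-butter CUDA, so whatever unit services them can't simply be switched off for compute. A hypothetical histogram kernel, just as an illustration of how ordinary compute code leans on them:

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical byte histogram: plain compute code that relies entirely on
// global-memory atomics, with no graphics state involved at all.
__global__ void histogram256(const unsigned char *data, int n, unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);
}

int main()
{
    const int n = 1 << 20;
    unsigned char *d_data = 0;
    unsigned int *d_bins = 0;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, 256 * sizeof(unsigned int));
    cudaMemset(d_data, 42, n);                          // every byte is 42
    cudaMemset(d_bins, 0, 256 * sizeof(unsigned int));

    histogram256<<<(n + 255) / 256, 256>>>(d_data, n, d_bins);

    unsigned int bins[256];
    cudaMemcpy(bins, d_bins, sizeof(bins), cudaMemcpyDeviceToHost);
    printf("bin[42] = %u (expected %d)\n", bins[42], n);

    cudaFree(d_data);
    cudaFree(d_bins);
    return 0;
}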

EDIT
Yes, but very few programs use them (and even fewer will if there's a clear power penalty to doing so)
If your application matches texture filtering, there's no way in hell you wouldn't use it. The power cost of doing it on the ALUs is way higher. And nobody except handheld devs looks at the power consumption of their code, so that doesn't apply to Kepler anyway.

J Random Dev said:
Gee, using a debugger and a profiler is hard enough. You want us to use a power meter too :oops:

:)
 
NVidia's still lying about the introduction date of Fermi (beginning of 2009 on that cute graph, well over a year earlier than reality), so why the hell would anyone take anything else NVidia says seriously?
They are just crazy:

late 2010: Fermi - 2009 / Kepler - 2011 / Maxwell - 2013
May 2011: Fermi - late 2009 / Kepler - late 2011 / Maxwell - 2013
June 2011: Fermi - 2010 / Kepler - 2012 / Maxwell - 2014
July 2011: Fermi - 2009 / Kepler - 2011 / Maxwell - 2013

(the second roadmap was missed by many sites)
 
Alexko said:
You can't deny the fact that Quadros make a lot of money either. I don't know whether the consumer segment made a profit, but if so, it can't have been much.
What leads you to that conclusion?

How do you define profit in the first place?

The gross profit is easy: at least equal to, and probably higher than, Quadro profit. (Use your 75% GM figure and run the numbers.)

After that, it's all bookkeeping on how you divide NRE costs between GF and Quadro, but they are likely to be at least as high for Quadro as for GF.

If you think there is a way to make GF 'hardly profitable', I'd like to see it, because I don't.
 
What leads you to that conclusion?

Very similar market share and prices compared to AMD, assuming that AMD doesn't give bigger discounts to OEMs than NVIDIA.

How do you define profit in the first place?

The gross profit is easy: at least equal to, and probably higher than, Quadro profit. (Use your 75% GM figure and run the numbers.)

After that, it's all bookkeeping on how you divide NRE costs between GF and Quadro, but they are likely to be at least as high for Quadro as for GF.

If you think there is a way to make GF 'hardly profitable', I'd like to see it, because I don't.

Of course I'm just guessing based on the very limited information we have, but it seems to me that removing Quadros altogether wouldn't remove a lot of spending (maybe something like $50~70M?) but would remove somewhere around $200M of revenue.
 
Looks like it. I guess NVIDIA isn't comfortable taking the wraps off Kepler just yet, and didn't want to host a conference with no new hardware to show.
 