NVIDIA Kepler speculation thread

A1xLLcqAgt0qc2RyMz0y · Nov 10, 2010

Sxotty said:
If you produce something with sufficiently horrendous performance/watt, improving upon it is not nearly as difficult.

Yet many posters here have stated that Nvidia could never improve performance/watt on the same process node.

Just wanted to show that they can and did.

PSU-failure · Nov 11, 2010

Nobody with half a brain could ever say that.

nVidia couldn't improve perf/watt and perf/mm² to the point where GF110 surpassed Cypress (well... they could, with some magic), but starting from such poor efficiency they were bound to improve.

Kepler will probably be Fermi's last iteration, just like GT200 was G80's last, so I don't think they will do considerably better than GF110 (which is in fact a fixed GF100, with redesigned TMUs and power grid). Maxwell should be the real next arch and improve these, but it should be noted that perf/mm² will stay low if they continue to follow the half-rate DP + ECC route (assuming AMD doesn't improve DP rate and/or DP doesn't get over-used).

About the ASCII art, I see nVidia doing incremental improvements and periodic overhauls: G8x gave birth to GT200, then GF100 to Kepler, then Maxwell to its successor. G9x are in fact G8x, just like GF104 and GF110 are GF100 (well, GF104 is already further from GF100 than G92 is from G80; it's almost as far as GT200 was).

It's safe to assume AMD went the same route; they just have one or two more iterations of incremental improvements: R580 => (R500 ->) R600 -> R700 -> R800 => new arch.

mczak · Nov 11, 2010

PSU-failure said:
Kepler will probably be Fermi's last iteration, just like GT200 was G80's last, so I don't think they will do considerably better than GF110 (which is in fact a fixed GF100, with redesigned TMUs and power grid).

Well, it's on 28nm, which should help somewhat. Of course, it will help AMD too.

Maxwell should be the real next arch and improve these, but it should be noted that perf/mm² will stay low if they continue to follow the half-rate DP + ECC route (assuming AMD doesn't improve DP rate and/or DP doesn't get over-used).

If you compare GF104 (which has neither half-rate DP nor ECC) with GF110, it doesn't look to me like these features cost them that much, either in perf/area or perf/power. I'm sure they make SOME difference, just not that much.

trinibwoy · Nov 11, 2010

If nVidia is to be believed, it was the parallelization of geometry processing that cost the most transistors in Fermi. I figure the success of Kepler and Maxwell hinges on:

1. Developers buying into nVidia's "geometry is king" message.
2. AMD being unable to match their geometry performance in a more efficient way.

Cayman hopefully can answer #2 in a few weeks.

A1xLLcqAgt0qc2RyMz0y · Nov 11, 2010

PSU-failure said:
Nobody with half a brain could ever say that.

Well, try rereading this thread from the beginning; there are more than a few half-brained posts.

nVidia's Kepler is going after the HPC and professional markets, where other features such as ECC and cache are as important as DP throughput. Those take transistors and thus power. Comparing AMD's gaming GPU ppw against an HPC GPU's ppw is not valid.

trinibwoy · Nov 11, 2010

A1xLLcqAgt0qc2RyMz0y said:
Comparing AMD's gaming GPU ppw against an HPC GPU's ppw is not valid.

It's a valid comparison because that's the product nVidia chose to offer to the gaming community. They don't get a free pass because useless transistors are burning power; that was their decision, not ours. However, you're right that we need to take that into consideration when talking about ppw.

PSU-failure · Nov 11, 2010

mczak said:
If you compare GF104 (which has neither half-rate DP nor ECC) with GF110, it doesn't look to me like these features cost them that much, either in perf/area or perf/power. I'm sure they make SOME difference, just not that much.

Well, as I said, GF104 is already (almost?) as far from GF100 as GT200 was from G80, so I'm not sure we can compare them directly.

Scheduling/dispatch logic is different, there are 33% more TMUs, and some areas (mainly L2) simply don't scale with anything other than the removal of ECC.

A1xLLcqAgt0qc2RyMz0y said:
nVidia's Kepler is going after the HPC and professional markets, where other features such as ECC and cache are as important as DP throughput. Those take transistors and thus power. Comparing AMD's gaming GPU ppw against an HPC GPU's ppw is not valid.

Then compare GF100/110 and GF104, and laugh.

GF104 isn't better than GF110 despite not having these "useless" elements.

A1xLLcqAgt0qc2RyMz0y · Nov 11, 2010

trinibwoy said:
It's a valid comparison because that's the product nVidia chose to offer to the gaming community. They don't get a free pass because useless transistors are burning power; that was their decision, not ours. However, you're right that we need to take that into consideration when talking about ppw.

No, it is not a valid comparison. When AMD has an HPC GPU, then you can compare. As of now, AMD has officially stated they will enter the HPC market when they see it as viable. Who knows when that will be.

A1xLLcqAgt0qc2RyMz0y · Nov 11, 2010

PSU-failure said:
Then compare GF100/110 and GF104, and laugh.

Why should I laugh? I will smile, though, as GF110 is much improved in ppw over GF100. It will be interesting to see the new Tesla based on GF110 that will come out next year. When that product arrives, ppw will be much improved over the current Teslas.

trinibwoy · Nov 11, 2010

A1xLLcqAgt0qc2RyMz0y said:
No, it is not a valid comparison. When AMD has an HPC GPU, then you can compare. As of now, AMD has officially stated they will enter the HPC market when they see it as viable. Who knows when that will be.

What do you mean? AMD not having an HPC GPU has nothing to do with GeForce vs. Radeon or the power consumption of GeForce cards. We can all understand why GeForces pull more power, but that doesn't change the fact that they do.

Sxotty · Nov 11, 2010

PPW still isn't that important for most folks unless you run into HVAC problems because of it. PP$ (performance per dollar) is much more important, as evidenced by buyers' habits.

mczak · Nov 11, 2010

PSU-failure said:
Well, as I said, GF104 is already (almost?) as far from GF100 as GT200 was from G80, so I'm not sure we can compare them directly.

Scheduling/dispatch logic is different, there are 33% more TMUs, and some areas (mainly L2) simply don't scale with anything other than the removal of ECC.

This is true, though you'd think that without the need to cater to compute, those changes would benefit graphics workloads (that is, perf/area would be higher for these workloads). I can't really see much of that, though. Well, GT2xx certainly failed on that front too.
So the questions I still have are:
- How big would a GF110 be without the "unnecessary" stuff, i.e. mostly without ECC and with slower DP?
- How would a "3/4" GF110 (that is, 3 GPC, 12SM, 384 "cores", 256bit/32 ROPs, and without the "useless" stuff) compare to a (fully enabled) GF104, in terms of die size (or transistor count) and performance?
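For reference, the unit counts behind the "3/4 GF110" question can be tallied from the commonly published Fermi configurations. A minimal sketch, assuming GF110 = 4 GPCs × 4 SMs with 32 cores and 4 TMUs per SM on a 384-bit bus, GF104 = 2 GPCs × 4 SMs with 48 cores and 8 TMUs per SM on a 256-bit bus, and 8 ROPs per 64-bit memory partition. The `Chip` class and the cut-down config are hypothetical illustrations; only unit counts are modeled, not die size or real performance.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Chip:
    """Hypothetical unit-count model of a Fermi-class GPU."""
    name: str
    gpcs: int
    sms_per_gpc: int
    cores_per_sm: int
    tmus_per_sm: int
    bus_bits: int
    rops_per_64bit: int = 8  # assumed: 8 ROPs per 64-bit memory partition

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cores(self) -> int:
        return self.sms * self.cores_per_sm

    @property
    def tmus(self) -> int:
        return self.sms * self.tmus_per_sm

    @property
    def rops(self) -> int:
        return (self.bus_bits // 64) * self.rops_per_64bit


GF110 = Chip("GF110", gpcs=4, sms_per_gpc=4, cores_per_sm=32, tmus_per_sm=4, bus_bits=384)
GF104 = Chip("GF104", gpcs=2, sms_per_gpc=4, cores_per_sm=48, tmus_per_sm=8, bus_bits=256)
# Hypothetical cut-down GF110 from the question: 3 GPCs, 256-bit bus.
GF110_3Q = Chip("3/4 GF110", gpcs=3, sms_per_gpc=4, cores_per_sm=32, tmus_per_sm=4, bus_bits=256)

for chip in (GF110, GF110_3Q, GF104):
    print(f"{chip.name}: {chip.sms} SM, {chip.cores} cores, "
          f"{chip.tmus} TMUs, {chip.rops} ROPs, {chip.bus_bits}-bit")

# TMUs per core, GF104 relative to the cut-down GF110 (both have 384 cores).
ratio = Fraction(GF104.tmus, GF104.cores) / Fraction(GF110_3Q.tmus, GF110_3Q.cores)
print(f"GF104 has {float(ratio - 1):.0%} more TMUs per core")  # prints "33%"
```

At equal core counts (384), this is where the "33% more TMUs" figure comes from: 64 TMUs on GF104 versus 48 on the cut-down GF110.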

PSU-failure · Nov 11, 2010

A1xLLcqAgt0qc2RyMz0y said:
Why should I laugh? I will smile, though, as GF110 is much improved in ppw over GF100. It will be interesting to see the new Tesla based on GF110 that will come out next year. When that product arrives, ppw will be much improved over the current Teslas.

Who told you nVidia will release "GTX580" Tesla 2.5 versions?

HPC is a SLOW market, nothing like the gaming market, so the Tesla line doesn't have to be updated before the next step (probably Kepler, just like with Tesla 0.8 and 1). There will be GF110-based Tesla 2 for sure, but I don't see any reason to do such an incremental upgrade.

Look at the entire lineup: you'll see only one chip is barely competitive (perf, power and size), and you'll probably find nVidia's issue is that the chips are too big for their performance level. Static power seems ultra-low (awesome work there), but then come clock distribution and other fun issues, still related to die size.

MDolenc · Nov 11, 2010

mczak said:
- How big would a GF110 be without the "unnecessary" stuff, i.e. mostly without ECC and with slower DP?

GT200 had a separate DP unit; Fermi doesn't. Yet the DP implementation on Fermi is cheap compared to GT200's. Not all SPs are created equal on Fermi, and that's also the reason behind GF104's DP performance.

mczak said:
- How would a "3/4" GF110 (that is, 3 GPC, 12SM, 384 "cores", 256bit/32 ROPs, and without the "useless" stuff) compare to a (fully enabled) GF104, in terms of die size (or transistor count) and performance?

A "3/4" GF110 would win on geometry and lose on pixel fillrate. Die size would probably be so-so. And generally, for such a chip you shouldn't be that worried about geometry throughput.

Megadrive1988 · Nov 12, 2010

jlippo said:
I would almost go for the following, and development on the next gen pretty much starts when the last one is out of the door.

Code:

G80 ------------------------ GF100 (and friends) --------- Maxwell? (hopefully..)
 |                             |
G90 and friends               Kepler (and friends)
 |
GT200 (and Friends)

I loved the good old NV code names: NVx0 for a new architecture, NVx5 for a refresh.

Me too.

G80 = NV50
GT200 = NV55
Fermi = NV60
Kepler = NV65
Maxwell = NV70

trinibwoy · Nov 12, 2010

G80=NV50? So what's G70/1?

Kaotik · Nov 12, 2010

NV45?

neliz · Nov 12, 2010

Kaotik said:
NV45?

No!

no-X · Nov 12, 2010

NV47. NV45 was NV40+PCIe bridge.

Megadrive1988 · Nov 14, 2010

trinibwoy said:
G80=NV50? So what's G70/1?

G70 / G71 / and PS3's RSX = NV47
