NVIDIA Kepler speculation thread

Oh I see, so this will be the fun thread, right? I bet we won't see any real reliable news for at least 6 months to come... so please guys, save some of your mental strength and humor!

Or come to think of it... NOT! I need the laughs, they are so precious nowadays!

History has a tendency to repeat itself.
 
"Our next-generation GPU, Kepler, also named after a scientist, is expected to deliver 3-4x the performance-per-watt of Fermi. Kepler is based on 28nm and we expect to go into production next year."

http://www.fudzilla.com/graphics/graphics/graphics/nvidia-reveals-gpu-codenames-up-until-2013

Going into production is quite different from being released, so interesting choice of words, here.

And of course the word "expect" gives them all the wiggle room they need. It encourages the investors despite their current poor performance in the market, whilst giving Nvidia the usual escape clause.

For a company that doesn't comment on unreleased products, why else would they be promoting their next architecture before they've even got the current family out the door?
 
Dear Leader said:
We’re shipping Fermi today, which brought to the world very high-performance double precision. Fermi is rated at 768GFLOPS peak.

Ehh? that must be the 2009 Fermi :runaway:
Btw, when will we see the Watt or Bell architectures?

Anyone have the updated version of Dkanter's efficiency chart (the one with 5870, GF100 etc. added) at hand? It's an awful lot of pages to skim through otherwise.
 
I got it. using a Fermi die shot in such a graph is confusing as to which number it precisely points at.

if Fermi is two gigs/watt then the upper boundary of the die gives the Y-axis height and thus the number. Thus the Tesla die, if G80, should have its top hover at zero, or if GT200 should have its top at less than half the height for "2". Either way more than half the chip's body would be under the floor of the X-axis.

but you can't desecrate a die like that!
 
Hehe love the pre-emptive scorn. Wonder how they can be so confident that they'll hit targets for Kepler after Fermi went so wrong.
 
Hehe love the pre-emptive scorn. Wonder how they can be so confident that they'll hit targets for Kepler after Fermi went so wrong.

easy. just don't repeat fermi's mistakes. they have no baggage from fermi, only experience. i know it's hard to believe at times, but the engineers who work at big semiconductor companies are very smart and know what they are doing. it's not an easy task to design a chip, and there is so much change from generation to generation that failure is just a thing you have to accept. to quote einstein, "anyone who has never made a mistake hasn't tried anything new".
 
Anyone else hearing "Big dies make good GPUs?" ;)
....
Even more geometry? A Polymorph engine for each ALU perhaps? =)

So is Fermi hot and power consuming because Enrico Fermi worked in Nuclear Physics? ;)

If so, probably Maxwell will fry us all by a blast of EMP. :cool:
 
what about that GF119 thingy.
surely Kepler is a name for that updated, "Fermi II" new line, and may be renamed as GK11x rather than GF11x?

if the rumoured small, process-inaugurating GF119 is the only Kepler chip "going to production" in 2010, then the above statement is technically true.
 
Don't get too excited... This chart is all about DP (64 bit float) performance per watt. Not about gaming (32 bit float) performance per watt.

Tesla had 1/8 DP flop rate compared to its SP flop rate. Fermi has half rate DP. If Kepler is otherwise as (in)efficient as Fermi and has full speed DP, it already would be around 2x more efficient per watt in DP flops. When you factor in the die shrink, you'll easily get 3x more DP efficiency without any other changes.

Kepler being 5x more efficient per watt in DP compared to Tesla that had 1/8 DP rate doesn't sound that great really. A SP (32 bit float) flops per watt chart would be much more interesting to see. But I doubt it would be anywhere near as dramatic as this one.
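
For anyone who wants to play with the numbers, here is a rough back-of-the-envelope sketch of that argument in Python. The DP:SP ratios for Tesla (1/8) and Fermi (1/2) are the ones stated in the post. The full-rate Kepler and the 1.5x per-watt gain from the shrink are purely hypothetical inputs, picked only to show how the factors multiply, not anything NVIDIA has said.

```python
from fractions import Fraction

# DP:SP throughput ratios as stated in the post.
# The Kepler entry is a hypothetical "full speed DP" part, not a known spec.
DP_RATE = {
    "Tesla": Fraction(1, 8),
    "Fermi": Fraction(1, 2),
    "Kepler (hypothetical)": Fraction(1, 1),
}

def dp_gain_from_rate(old, new):
    """DP flops-per-watt gain from the DP rate alone, everything else held equal."""
    return DP_RATE[new] / DP_RATE[old]

def dp_gain_total(old, new, shrink_gain):
    """Combine the DP-rate gain with an assumed per-watt gain from the die shrink."""
    return dp_gain_from_rate(old, new) * shrink_gain

rate_only = dp_gain_from_rate("Fermi", "Kepler (hypothetical)")
# 1.5x per watt from the shrink is an assumed placeholder, not a measured figure.
with_shrink = dp_gain_total("Fermi", "Kepler (hypothetical)", Fraction(3, 2))

print(f"DP rate alone: {float(rate_only):.1f}x")
print(f"With assumed shrink gain: {float(with_shrink):.1f}x")
```

Swap in your own guess for the shrink gain; the point is only that the DP-rate change does most of the work before any architectural improvement is counted.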
 
"Between now and Maxwell, we're going to introduce features like virtual memory. We're going to enhance the GPU's ability to autonomously process, so it's less dependent on the CPU, along with a very large improvement in performance."

Paging over PCIe? Could be painful. Or maybe they'll load up HPC boards with LOTS of flash. :???: Producer-consumer kernels running concurrently, and communicating via on-chip cache, would be super cool though.
 
So is Fermi hot and power consuming because Enrico Fermi worked in Nuclear Physics? ;)

If so, probably Maxwell will fry us all by a blast of EMP. :cool:

Two things:
a) Here's a common prejudice about anglo-saxons: They like telling names. :)
b) I don't think they pick these names completely at random, so please, grant me the pleasure of coming up with possible interpretations based on that :)
 
Don't get too excited... This chart is all about DP (64 bit float) performance per watt. Not about gaming (32 bit float) performance per watt.

Tesla had 1/8 DP flop rate compared to its SP flop rate. Fermi has half rate DP. If Kepler is otherwise as (in)efficient as Fermi and has full speed DP, it already would be around 2x more efficient per watt in DP flops. When you factor in the die shrink, you'll easily get 3x more DP efficiency without any other changes.

Kepler being 5x more efficient per watt in DP compared to Tesla that had 1/8 DP rate doesn't sound that great really. A SP (32 bit float) flops per watt chart would be much more interesting to see. But I doubt it would be anywhere near as dramatic as this one.

I re-checked their own claimed specifications: the Tesla C1060 gets 78 GFLOPS DP with a 187.8W TDP, while the Tesla C2050 is at 515 GFLOPS with a 238W TDP.

If I scale the C1060 to a hypothetical 238W TDP, its DP would be 99 GFLOPS (not necessarily correct, but neither is that chart, heh...). That's already a bit over a 5x increase between Tesla 10x0 and 20x0 in terms of DP throughput per watt.
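
The same arithmetic as a small Python sketch, using only the vendor-claimed figures quoted in this post. The linear scaling of throughput with TDP is a naive assumption (as already admitted above), and the dictionary and function names are made up for illustration.

```python
# Vendor-claimed DP throughput and TDP, as quoted in the post.
C1060 = {"dp_gflops": 78.0, "tdp_w": 187.8}
C2050 = {"dp_gflops": 515.0, "tdp_w": 238.0}

def gflops_per_watt(card):
    """DP GFLOPS per watt of TDP."""
    return card["dp_gflops"] / card["tdp_w"]

def scaled_to_tdp(card, target_tdp_w):
    """Naive linear scaling of DP throughput to another TDP (an assumption)."""
    return gflops_per_watt(card) * target_tdp_w

c1060_at_238w = scaled_to_tdp(C1060, C2050["tdp_w"])
ratio = gflops_per_watt(C2050) / gflops_per_watt(C1060)

print(f"C1060 scaled to 238W: {c1060_at_238w:.0f} GFLOPS")
print(f"C2050 vs C1060, DP per watt: {ratio:.1f}x")
```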

It looks like Fermi sits in the wrong spot on that scale from the get-go. If you placed it in the spot it actually deserves, any future design with a roughly 5x increase over the original Tesla would be more of a joke than anything else.
 
Don't get too excited... This chart is all about DP (64 bit float) performance per watt. Not about gaming (32 bit float) performance per watt.

Tesla had 1/8 DP flop rate compared to its SP flop rate. Fermi has half rate DP. If Kepler is otherwise as (in)efficient as Fermi and has full speed DP, it already would be around 2x more efficient per watt in DP flops. When you factor in the die shrink, you'll easily get 3x more DP efficiency without any other changes.

Kepler being 5x more efficient per watt in DP compared to Tesla that had 1/8 DP rate doesn't sound that great really. A SP (32 bit float) flops per watt chart would be much more interesting to see. But I doubt it would be anywhere near as dramatic as this one.

I'm no expert in these things, but I don't think full speed DP makes much sense. If you can do N DP FLOPs per cycle, it probably doesn't require much additional hardware to be able to do 2N SP FLOPs per cycle.

And since even HPC applications sometimes rely on SP…
 
I'm no expert in these things, but I don't think full speed DP makes much sense. If you can do N DP FLOPs per cycle, it probably doesn't require much additional hardware to be able to do 2N SP FLOPs per cycle.
Actually, you could do 4N sp flops easily.

And since even HPC applications sometimes rely on SP…
Umm.., no.
 
except (unless my memory/math is truly THAT bad) Fermi increased Tesla DPFP effectiveness by over 4x already..

tesla (GT200b Cuda 1.3 78GFlops at 190W) = .41 GF per Watt
fermi (GF100 Cuda 2.0 515GFlops @ 247W) = 2.09 GF per Watt

heck Tesla -> Fermi increased GF/Watt effectiveness about 5x (5.08). I do take the slides (and almost anything from nV given their past history) with a ton of salt however.. the performance increase of 3-4x over Tesla is already present. Do I think we'll see another 4x increase in the next year.. nope.. 4 years.. possibly.

Where does your 190W for GT200b come from? TDP?

If you are running only DP FPU code on GT200, most of the execution units (8/9) are idling, and you are not consuming that much power with just the DP FPUs.

On Fermi, where there are more DP FPUs, there are fewer SP FPUs idling, and you will be running "closer to the TDP" when running DP FPU code.

So you have an invalid comparison.
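
To put a rough shape on that objection, here is a small Python sketch using the 78 GFLOPS / 190W and 515 GFLOPS / 247W figures quoted above. The fractions of TDP that GT200b actually draws on DP-only code are hypothetical; nobody in this thread has measured them, so treat the sweep as illustration only.

```python
# (DP GFLOPS, TDP in watts) as quoted in the post being replied to.
GT200B = (78.0, 190.0)
GF100 = (515.0, 247.0)

def per_watt_ratio(gt200_power_fraction, gf100_power_fraction=1.0):
    """GF100 : GT200b ratio of DP GFLOPS per watt.

    Each fraction is the share of TDP the chip is ASSUMED to draw while
    running DP-only code. These are hypothetical inputs, not measurements.
    """
    gt200 = GT200B[0] / (GT200B[1] * gt200_power_fraction)
    gf100 = GF100[0] / (GF100[1] * gf100_power_fraction)
    return gf100 / gt200

for fraction in (1.0, 0.8, 0.6):
    print(f"GT200b at {fraction:.0%} of TDP -> Fermi advantage {per_watt_ratio(fraction):.2f}x")
```

The lower the real draw on GT200b relative to its TDP, the smaller Fermi's per-watt advantage becomes, which is the whole point of the complaint.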
 