NVIDIA Kepler speculation thread

Latest as in the newest-plan-to-launch, or latest as in definitely-by-then?
A definitely-by-then, as long as nothing major pops up.
I would put my money on a launch very similar to the 6x00 series, when Barts came out two months before Cayman, only with the whole timeline moved up a month.
 
Ah, bummer. I was hoping for the 600 series to be out by Christmas time. I definitely like seeing both competitors' products first before making a choice, so I hope NVIDIA can do a paper launch or something and send out some engineering samples when AMD launches their cards so that some comparos can be done. I have been holding out on playing a lot of games just because I want to play them in DX11 goodness!
 

But if there's a 6-month gap, as there was with Fermi, will the 600 series compete with Southern Islands or the generation after that?
 
Actually, it was the other way around the last time I heard about it: AMD/ATI paying for working chips only, while nV buys whole wafers.

No idea if Harison is right about a supposed recent deal between NV and TSMC, but as far as I know, neither AMD nor NVIDIA has ever paid TSMC strictly per working chip or per wafer. Both of their contracts should be far more complicated, and I doubt anyone but a few insiders has a complete and accurate picture of them.
 
[Slide image: Kepler and the road to exascale, performance-per-watt roadmap (keplerexascale8se2.png)]


Rough translation:

Of course, NVIDIA is taking a step toward exascale with its next GPU generation, codenamed "Kepler", which Malachowsky introduced as reaching three times the performance per watt of Fermi. Trial data have been obtained suggesting that the threefold gain in power efficiency is within reach.

However, Kepler's power efficiency is mostly down to the 28nm process technology, he said. Malachowsky continued, "process improvements will keep coming, but ultimately power will depend on the electronic circuits, that is, on the architecture", suggesting that process technology alone cannot achieve an exascale computer. He said that "the compute-unit hierarchy, the I/O and interconnect, and the software will all require further improvement" before then. He did not go into specifics about how to improve them; perhaps those are exactly the major points to be addressed by Kepler's successor, the GPU codenamed "Maxwell".

GTC Workshop Japan

http://www.4gamer.net/games/120/G012093/20110722064/
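
For context on the "three times Fermi" figure, here is the back-of-the-envelope math (my own arithmetic using published Tesla M2090 numbers; the slide itself gives no absolute figures):

```python
# Rough perf/W context for the "3x Fermi" claim (my arithmetic, using
# published Tesla M2090 figures; the slide gives no absolute numbers).
FERMI_DP_GFLOPS = 665.0   # Tesla M2090 peak FP64 throughput
FERMI_TDP_W = 225.0       # Tesla M2090 board TDP

fermi_gflops_per_watt = FERMI_DP_GFLOPS / FERMI_TDP_W   # ~3.0 GFLOPS/W
kepler_target = 3 * fermi_gflops_per_watt               # ~8.9 GFLOPS/W

print(f"Fermi (M2090):      {fermi_gflops_per_watt:.1f} DP GFLOPS/W")
print(f"Kepler target (3x): {kepler_target:.1f} DP GFLOPS/W")
```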
 
Typical marketing slide - look how close to the target they already are, only a few pixels! And so early too. ;)

If they want to hit that Kepler perf/watt target, then it looks like they will have to implement full-rate FP64 already, as that would be relatively the cheapest option transistor-wise.
 
I am wondering: if you have all the basics in place (distributed geometry processing, unified memory space, high-performance gather/scatter and so on), are fixing some of the shortcomings of your architecture (the narrow path between shader core and ROPs, which I am still inclined to view as an artificial limitation to keep power in check; you'd need double the data width for single-cycle FP64 anyway) and you go single-cycle FP64 all the way, taking for granted that this is the cheapest way, wouldn't that also be one of the best possible inflection points to get rid of some of the fixed-function hardware?

My reasoning is as follows:
- Games are limited by this generation of console ports, so they won't require vast amounts of graphics horsepower for at least one generation of graphics hardware.
- You have a new process technology and a fundamentally unchanged architecture (presuming the FP64 units were already present as one half of the SMs in Fermi, as a kind of trailblazer; or has there been definitive proof that two units are coupled together in FP64 mode?), so you can concentrate on porting it in the most power- and space-efficient way. Much like AMD seems to have done in the past, btw.

It would seem like a smart move, unless I've forgotten something important.
 

1:1 sp:dp is a monumentally bad idea. Unless you have no pretensions of catering to the gaming market at all. You'll lose an entire shrink while the competition isn't sitting idle.

EDIT:
Which is why I am not buying this particular bit.
 
Yes, sure, that's a very valid point which I won't argue. But I am wondering (or maybe it is already done that way in GF1x0?): can't you execute two 32-bit ops for every 64-bit op? I don't know, though, how large the overhead is for full-speed FP64 compared to having twice the number of ALUs and doing FP64 at half speed.
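
To make the comparison concrete, here is a toy throughput model of the two design points (my own sketch; nothing here is confirmed about GF1x0 internals):

```python
# Toy throughput model of the two design points in question (my own sketch,
# nothing confirmed about GF1x0 internals).
N = 16  # arbitrary number of FP64-capable units / lane pairs

# A) 2N FP32 lanes, two of which pair up for each FP64 op (half-rate DP)
a_sp_ops, a_dp_ops = 2 * N, N

# B) N 64-bit ALUs, each splittable into two FP32 ops ("full-speed FP64")
b_sp_ops, b_dp_ops = 2 * N, N

# Per-clock throughput comes out identical either way; the real difference
# is area and power, since a 53-bit multiplier is much more than twice the
# size of a 24-bit one. That area/power delta is the overhead in question.
print(a_sp_ops, a_dp_ops, b_sp_ops, b_dp_ops)
```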
 

The natural sp:dp ratio is 3:1. So even 2:1 is prioritizing DP over SP. The multiplier block (which should be on the critical path and is probably the largest component of the ALU) would be ~3x bigger than it needs to be for full-rate DP.

Besides, IMHO, they should try to speed up Denver instead of abandoning the $600 market and the halo effect. HPC will like that more than any full-rate stunt.
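
As a sanity check on that 3:1 figure (my own bit counting; the quadratic-area assumption is the usual one for array multipliers):

```python
# Sanity check on the "natural 3:1" figure (my own bit counting; the
# quadratic-area assumption is the usual one for array multipliers).
SP_MANT = 24  # IEEE 754 single: 23 stored mantissa bits + implicit 1
DP_MANT = 53  # IEEE 754 double: 52 stored mantissa bits + implicit 1

quad = (DP_MANT / SP_MANT) ** 2  # ~4.9x if multiplier area ~ width^2
lin = DP_MANT / SP_MANT          # ~2.2x for adders/datapaths (area ~ width)

print(f"DP/SP area, multiplier (~n^2):  {quad:.1f}x")
print(f"DP/SP area, adders/paths (~n):  {lin:.1f}x")
# A whole-ALU ratio of ~3x is plausible once you blend the two.
```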
 
The numbers in that graph don't make sense for Fermi if using theoretical flops. It's very possible that Kepler isn't just bumping up marketing flops but making better use of them too. Lots of HPC workloads are far below theoretical performance on current GPUs.
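
For what it's worth, the standard roofline model makes this point; here is a minimal sketch with published Tesla M2090 figures (the workload intensities are my own illustrative picks):

```python
# Minimal roofline sketch with published Tesla M2090 figures; the workload
# arithmetic intensities are my own illustrative picks.
PEAK_DP_GFLOPS = 665.0  # M2090 peak FP64
PEAK_BW_GBS = 177.0     # M2090 peak memory bandwidth, GB/s

def attainable(ai_flops_per_byte):
    # Attainable throughput is capped by either compute or memory bandwidth.
    return min(PEAK_DP_GFLOPS, ai_flops_per_byte * PEAK_BW_GBS)

for name, ai in [("SpMV-like", 0.25), ("stencil-like", 0.5), ("DGEMM-like", 8.0)]:
    g = attainable(ai)
    print(f"{name:12s} AI={ai:4.2f} flop/B -> {g:6.1f} GFLOPS "
          f"({g / PEAK_DP_GFLOPS:.0%} of peak)")
```

Memory-bound kernels land at a few percent of theoretical flops, which is exactly why "making better use of them" could move that graph more than raw peak numbers.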
 
The natural sp:dp ratio is 3:1. So even 2:1 is prioritizing DP over SP. The multiplier block (which should be on the critical path and is probably the largest component of the ALU) would be ~3x bigger than it needs to be for full-rate DP.
Maybe for the ALUs themselves. But for the data paths it's closer to 2:1, and getting data into and out of the register files and the ALUs has become quite an effort lately, also from a power perspective. Therefore, the step from 3:1 to 2:1 is probably relatively small. But I agree: 1:1 would be an incredible waste of resources for SP-centric tasks (and there are a lot of those).
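
Bit-counting the operand traffic supports the 2:1 datapath point (my own sketch):

```python
# Bit-counting register-file traffic per clock (my own sketch).
# An FMA reads three operands and writes one result.
def rf_bits_per_clock(width_bits, fmas_per_clock):
    return (3 + 1) * width_bits * fmas_per_clock

sp = rf_bits_per_clock(32, 2)  # two SP FMAs per clock
dp = rf_bits_per_clock(64, 1)  # one DP FMA per clock

print(sp, dp)  # 256 vs 256: at 2:1 SP:DP the register-file/datapath
               # bandwidth is already there; only the multiplier grows.
```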
 
So it's more likely that the recent NVIDIA slide on the power needed to get to exascale was showing real-world performance?
 

It's possible. There's also no mention of the TDP for individual GPUs, so perhaps they intend future architectures to be better optimized for 100~150W operation at low voltages, which could, theoretically, considerably improve performance/W.

On the other hand, we all know NVIDIA's tendency to promise wonderful things and fall short by a mile or two.
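
On the low-voltage point: under the classic dynamic-power model, perf/W goes roughly as 1/V^2 (frequency cancels out), so a lower-voltage operating point helps even before any architecture changes. A toy illustration (made-up operating points, not NVIDIA's numbers):

```python
# Toy dynamic-power model: P ~ C * V^2 * f and perf ~ f, so
# perf/W ~ 1/(C * V^2). Leakage is ignored; operating points are
# made up for illustration, not NVIDIA's numbers.
def perf_per_watt(v, f, c=1.0):
    return f / (c * v**2 * f)  # frequency cancels in this model

high_clock = perf_per_watt(v=1.00, f=1.5)  # hypothetical high-voltage point
low_power = perf_per_watt(v=0.85, f=1.0)   # hypothetical low-voltage point

print(f"perf/W gain at the lower voltage: {low_power / high_clock:.2f}x")  # ~1.38x
```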
 
That's not quite accurate. There were missteps along the way, but they've found a place in Cray's flagship offering, and their investment in mobile is paying off with Tegra. What have they promised and not delivered? Both of their strategic gambles have successfully made it to market and are doing well, and they've managed to defend their consumer graphics market share all the while.
 
We do. The Fermi that was supposed to be (GF110) was almost a year late.

Yeah, after grand promises, just like NV30.

I haven't gone back and checked the figures and statements, but I doubt Tesla & Tegra sales are anywhere near initial projections, though they are doing OK.

And CUDA hasn't made CPUs irrelevant, PhysX hasn't changed the world of video games, etc.

PS: NVIDIA has managed to keep decent market share, but it's been eroding for several quarters already. I somehow doubt Southern Islands will make that situation any better for NV.
 
I haven't gone back and checked the figures and statements, but I doubt Tesla & Tegra sales are anywhere near initial projections, though they are doing OK.

What were the initial projections? They are the leading provider of SoCs for non-Apple tablets and have significant smartphone wins as well. Given the strength of the establishment (TI/Qualcomm/Imagination), it's a damn near miracle. The Transformer is 400k units a month all by itself.

And CUDA hasn't made CPUs irrelevant, PhysX hasn't changed the world of video games, etc.

I don't recall nVidia promising that CUDA would run your OS and serial apps. They claim GPUs are more efficient for parallel workloads, and it seems that supercomputer designers and users agree. PhysX is available as a product for anyone to use, and it does change the games that use it.

PS: NVIDIA has managed to keep decent market share, but it's been eroding for several quarters already. I somehow doubt Southern Islands will make that situation any better for NV.

nVidia makes more money in the consumer market even when AMD has all possible advantages. They'll be just fine. They talk big and stumble often, but they're ambitious and eventually get it done. That's what matters. Look at how difficult it is for others to do the same: AMD vs Intel, everybody vs Apple, etc.
 
Not sure why you say that. IIRC, AMD has had >50% share for a while now. Besides, the DX11 numbers from Steam suggest AMD won big there.
 