It's hard to know exactly what they mean by this. I'm guessing it's a combination of some real transistor-level optimizations and using that term for low-level clock gating optimizations.
When you look at real transistor optimizations, there's not that much you can do unless you're willing to move major parts from a standard cell based flow to full custom. I don't think that's a very likely option. What's left then are minor improvements that can be deployed widely. Think RAM building blocks that are used in generators, or a few custom standard cells that expand the default library for specific cases that happen to be unusually common in a particular design. In any case, these are the kind of optimizations that gain you a few percentage points, and since low-level power optimization is a long slog for small gains, they're in the same league as low-level clock gating: you just need to find enough of those cases.
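To put some (made-up, purely illustrative) numbers on that "long slog for small gains" point: each individual optimization shaves only a percent or two off dynamic power, and they compound multiplicatively, so you need a lot of them before the total becomes noticeable. A minimal sketch, with assumed saving figures:

```python
# Hypothetical illustration: many small, independent power optimizations
# (custom RAM cells, extra standard cells, local clock gating) compound
# multiplicatively. The per-optimization savings below are invented numbers.
savings = [0.02, 0.015, 0.03, 0.01, 0.025]  # each saves 1-3% of dynamic power

power = 1.0
for s in savings:
    power *= (1.0 - s)  # remaining power after applying this optimization

print(f"Remaining power: {power:.3f} (total saving: {1 - power:.1%})")
```

Five separate optimizations here net you less than 10% total, which is why this kind of work only pays off if you can keep finding more such cases.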
But all of that pales in comparison to what you can do architecturally. Low-level optimizations are a way to do things that have to be done a bit more efficiently; architectural changes are about not doing things at all, or doing them in a completely different way. Low-level optimizations have been used extensively for over a decade now, so whatever was done additionally for Maxwell wouldn't be low-hanging fruit anymore. No chance of huge gains there.
Finally: your suggestion that they might be exploiting some process improvements along the way. How would that work?
Now the real question is just how much of that can be attributed to the clock speed increase from Kepler to Maxwell?
Clock speed increases on the same process are pretty much guaranteed to be architectural. How could they not be?
Other questions might be, as FreneticPony suggested: will FinFETs top out at similar frequencies anyway? There are certainly reasons to believe that Nvidia may not increase Maxwell's clock speeds, or not by much. 1.2GHz is the magic number, I believe.
Yeah, I don't buy Mr. Pony's theories at all.
He's basing it on a dimensionless chart that says "chart for illustrative purposes only".
Nvidia didn't have a power problem at 28nm, and 16nm will be much better no matter what. There is no justification whatsoever not to explore the high-speed/relatively-lower-perf/W corner for their next designs.
Obviously Nvidia can push the clock speeds higher but after a certain point the gains won't be worth the extra power.
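The reason the gains stop being worth it: dynamic power scales roughly as C·V²·f, and hitting a higher clock generally requires a higher supply voltage too, so power grows much faster than performance. A sketch with an assumed (made-up) linear voltage/frequency relation:

```python
# Sketch of why clock increases become power-inefficient past a point.
# Dynamic CMOS power is roughly P ~ C * V^2 * f; reaching a higher f also
# requires a higher supply voltage V, so power grows superlinearly with f.
# The linear V(f) relation and all constants below are invented for illustration.

def dynamic_power(f_ghz, c=1.0, v0=0.8, k=0.25):
    v = v0 + k * f_ghz  # assumption: supply voltage rises ~linearly with target clock
    return c * v * v * f_ghz

base = dynamic_power(1.0)
boosted = dynamic_power(1.3)
print(f"+30% clock -> {boosted / base:.2f}x power")
```

Under these assumed numbers, a 30% clock bump costs roughly 50% more power, and the ratio only gets worse as voltage climbs further.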
If AMD is willing to trade away performance for reduced power, they'll lose (don't worry, they won't do that), and Nvidia will simply play the absolute performance card. And rightfully so. Maxwell was great not because perf/W was excellent, but because perf/W and absolute performance were both excellent.
The extra gains they got from Maxwell due to knowledge of the 28nm node may not carry forward to 16FF.
That's the biggest issue: your whole argument rests on this broken premise.
What exactly is that secret magic? Why reach for something mysterious when there are logical explanations: major architectural changes that are well known and visible to anybody who's willing to look for them?