NVIDIA Maxwell Speculation Thread

GM204 taped out in early April and GM200 in June; considering the latter is quite a bit more complicated, I wouldn't expect it within this year.

My info is that GM200 taped out in July, but either way, I was told not to expect a release before Q1'15.
A 20nm shrink would not bring costs down mid-term. And performance may not be significantly faster either. What'd be the point of it?

Exactly..the per transistor costs right now are higher for 20nm and density aside, you only get a bit lower power. I said this more than six months back..and I was met with a lot of skepticism ;) (That aside, as per my armchair CEO speculation, 20nm would have made sense for GM200 as it isn't as cost sensitive. Really curious as to why it is still on 28nm)

Also..from what I've heard, there will be no 20nm shrinks either..NV will move straight to Pascal on 16FF sometime in H2'15.
Everyone can speculate: http://www.forum-3dcenter.org/vbulletin/showpost.php?p=10364414&postcount=6031

....let's see how fast any happy-go-merry plagiarizers pick it up as a "fact".

Sounds about right to me ;) Quoting my post from earlier in this thread:
GM200 is not planned for release before Q1'15 as per my info. Die size is ~560 mm2.
I was quite blown away by GM204's performance...and am seriously impressed by what NV has been able to do on 28nm. Just imagine if they'd had access to a proper 14nm FinFET process. It's quite sad that GPUs are lagging behind Intel's CPUs on process nodes by some 2-3 years. Pascal on 16FF should close that gap somewhat.

Also, going by GM204's performance, GM200 should be quite a chip. And consider that this time around, GM200 need not be clocked lower than GM204, as it won't be power constrained (historically the big-die GPUs have always been clocked lower than the mid-range ones because they were constrained by power). GM200 could be clocked as high as GM204 and still be well under 250W.
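For the curious, a rough back-of-envelope sketch of that claim (my own assumptions: the rumoured ~1.5x unit count over GM204, equal clocks, and worst-case linear power scaling):

```python
# Worst-case sketch: assume ALL of GM204's board power scales with
# unit count at equal clocks (in reality fixed costs don't scale).
gm204_tdp_w = 165.0   # GTX 980 reference TDP
unit_ratio = 1.5      # assumed GM200/GM204 ratio (SMs and bus width, rumoured)
print(f"naive GM200 estimate: {gm204_tdp_w * unit_ratio:.0f} W")
# -> 248 W, so a same-clocked GM200 plausibly stays under 250 W
```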
 
Also..from what I've heard, there will be no 20nm shrinks either..NV will move straight to Pascal on 16FF sometime in H2'15.

Straight to Pascal in H2 2015? ... So you suggest they will release GM200 Maxwell in H1 2015 and move directly to Pascal at the end of the year, some months later? You mean they will advance their roadmap by a year? I'm really doubtful about this, to be honest.
 

What if GM200 is not a DP-oriented chip, but just a gaming chip like GM204? After all, it's still GM20x and not GM21x. In that case they would need Pascal ASAP for their HPC market. And further exploring this possibility, what makes you think we would see Pascal GeForce right away? Maxwell already delivers node-shrink performance without a node shrink.

It's true that NVIDIA promised a jump in DP per watt for Maxwell, but that does not necessarily mean that total DP performance will evolve by leaps and bounds ;)
 
I was just responding to his post.. I don't think we will see Pascal before the end of 2016 (and if the trend continues, that would mean no big Pascal before 2017)..
 
Kepler ~6
Maxwell ~11
Pascal ~19

Never really twice as much.
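Spelling out the "never really twice as much" bit, using the numbers as quoted (a quick sketch; the units are whatever the roadmap slide's perf/W axis uses):

```python
# Perf/W figures as read off the roadmap slide (quoted above)
perf_per_watt = {"Kepler": 6, "Maxwell": 11, "Pascal": 19}
gens = list(perf_per_watt)
for prev, cur in zip(gens, gens[1:]):
    print(f"{cur}/{prev}: {perf_per_watt[cur] / perf_per_watt[prev]:.2f}x")
# -> Maxwell/Kepler: 1.83x, Pascal/Maxwell: 1.73x
```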

That projection shows single precision SGEMM efficiency.

Kepler GK104 has a theoretical peak of 3.1 TFLOPS at 230 watts. In SGEMM, its hand-tuned maximum is an inefficient 2.4 TFLOPS, mostly due to register throughput limiting the number of FMA operations.

Maxwell GM104 has a theoretical peak of 4.6 TFLOPS at 165 watts. In SGEMM, its hand-tuned maximum is nearly perfect at 4.5 TFLOPS.

So GM104 is 2.6 times as efficient as GK104 in single-precision throughput per watt. NVIDIA greatly exceeded their own projections.
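To show where the 2.6x comes from, using the figures above as quoted:

```python
# Achieved SGEMM throughput per watt, from the numbers above
gk104 = 2400 / 230   # GFLOPS/W (2.4 TFLOPS at the quoted 230 W)
gm204 = 4500 / 165   # GFLOPS/W (4.5 TFLOPS at 165 W)
print(f"{gk104:.1f} vs {gm204:.1f} GFLOPS/W -> {gm204 / gk104:.2f}x")
# -> 10.4 vs 27.3 GFLOPS/W -> 2.61x
```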
 
Kepler GK104 has a theoretical 3.1 TFLOPS at 230 watts.

GK104's TDP isn't that high.
 
That projection shows single precision SGEMM efficiency.

Yes, they're obviously for SGEMM, I can read it on the left side of the graph, but there's always a certain analogy between SP and DP efficiency. But if you really want to split hairs:

* It's called GM204, not GM104.
* Graphs like that usually compare the biggest core of each family; it's obviously not an average value across all cores of each family.
* Performance and lower-end cores traditionally have very few FP64 units: in the case of GM204 and GM107, 4 FP64 SPs per SMM. On GM200 it could be 16 times as many per cluster.
* Big cores obviously have BIG TDPs.

On a Tesla K40 we now have 1.43 TFLOPS at a 235W TDP, which works out to ~6 GFLOPS DP/W, and no, that's not obviously connected to any of the former.
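That K40 figure checks out, for what it's worth:

```python
# Tesla K40: 1.43 TFLOPS double precision at a 235 W TDP
print(f"{1.43e3 / 235:.1f} GFLOPS(DP)/W")   # -> 6.1
```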

The GTX 680 does 3.1 TFLOPS.

True.

GTX 680: 3.1 TFLOPS at 195W
GTX 770: 3.2 TFLOPS at 230W
 
Maxwell GM104 has a theoretical peak of 4.6 TFLOPS at 165 watts. In SGEMM, its hand-tuned maximum is nearly perfect at 4.5 TFLOPS.


The test duration is 7.xx seconds; are those 4.6 TFLOPS sustained, or is it bursts?

I don't believe that a 400MHz frequency uplift would increase TDP by only 12%.

GPU Clock: 1640MHz (+400 over default!)
TDP: 112%
Temp: 72C
Volts: 1.225 (default)
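For reference, a crude first-order sketch of what a +400MHz overclock at stock voltage ought to do to board power (my assumption on the dynamic share; dynamic power scales roughly as C·V²·f):

```python
# First-order dynamic power at constant voltage: P ~ C * V^2 * f
default_mhz = 1640 - 400   # default clock implied by the quoted post
oc_mhz = 1640
dyn_share = 0.8            # assumed fraction of board power that scales with f
expected = 1 + dyn_share * (oc_mhz / default_mhz - 1)
print(f"expected TDP reading: ~{expected:.0%}")
# -> ~126%, noticeably above the reported 112%
```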

Such extraordinary claims require a little more than a post on some site. The poster says he thinks the TDP will be 165W, which has no scientific value; he'd have done better to use a $10-15 power meter if he wanted to be credible.

....Hi to everybody at B3D.
 
Also..from what I've heard, there will be no 20nm shrinks either..NV will move straight to Pascal on 16FF sometime in H2'15.

Nothing would please me more, and I'd think NV would be worried about what AMD is doing with HBM, so they may be willing to err on the side of aggressiveness. That said, I'd more expect Erista to ship in 2H15 than Pascal (maybe you were hearing about Parker, not Pascal?).

It's quite sad that GPUs are lagging behind on process nodes...

...and that CPUs are lagging behind on performance...

GM200 need not be clocked lower than GM204, as it won't be power constrained.

Excellent point, hadn't thought about that.
 
Like everyone else and his dog I finally caved in to "a bargain too good to miss", as the advertising folk say, and I am really impressed. I think NVIDIA have shot themselves in the foot with the 980 though; the pricing strategy is just wrong. Of which more later.

Anyhow, getting back to more technical stuff. These new cards are very interesting to overclock within the limits set in their parameters. Core temp is no longer an issue, but TDP is.

Here are some testing results; I approached it from a historical perspective.

[Image: results4.jpg — benchmark results]


Back in the old GeForce 4 days, 3DMark's Nature was the first DX8 test that really stressed video cards. Now it only pushes the card to 64% of TDP and the temp is so low. I got 1000 fps in it for the first time ever this week; how times change.

Looking at more modern Futuremark benches: 03's Nature is DX9 as I recall, and so is the Canyon test in 06, and they are not that far apart. TDP doesn't get close to even 100%. The MSI card I run has a max of 110%.

Now moving on to more modern DX11 stuff, it seems 3DMark 11 GT4 and 3DMark Sky Diver both push the card harder than the much-vaunted Fire Strike. Sky Diver was the first test to trigger throttling on the GPU. It seems like a good all-round bench for pushing modern cards to the limit.

Finally, Furmark. You have to hand it to whoever wrote it: how can a boring old fluffy doughnut cause so much mayhem! It is a real stress test. I knew I had to downclock for it, and even so it went below my setting by the biggest margin. Interestingly, the TDP reading was only 109%, so it did not reach the limit.

But it still downclocked, even though temps were high but not 80C or anything. So that is interesting. Is NVIDIA putting a cap on Furmark?
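Out of curiosity, converting those TDP-percentage readouts into rough watts, assuming this is a GTX 970 with the 145W reference TDP as the 100% point (an assumption on my part; MSI's factory power target may well differ):

```python
# TDP percentages reported above, converted against an assumed
# 145 W reference TDP (GTX 970); the board's real limit may differ
reference_tdp_w = 145
for name, pct in [("3DMark Nature (DX8)", 64),
                  ("Furmark", 109),
                  ("MSI power limit", 110)]:
    print(f"{name}: ~{reference_tdp_w * pct / 100:.0f} W")
# -> ~93 W, ~158 W, ~160 W
```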

I'll test some games soon.

I have to say though, I love this card; it reminds me of the GeForce Ti 4200... a real bargain per buck.

It's funny, because the real standout cards of the last 15 years have been the ATi 9700/9800 and the NVIDIA 8800 GTX, both of which blew everyone away with their performance and feature set. Here is a card that does neither, but has such a good combination of speed, overclocking, power consumption, temps and noise.

And mine does 1600MHz fully stable. More than 1.5GHz fully stable, apart from Furmark :) It wasn't too long ago that we were paying $499 for GigaHertz editions of cards. How time goes.

The GTX 970 is so good, though, that I feel NVIDIA underpriced it and overpriced the 980.

Mind you, maybe they just wanted to kick AMD in the nads :D

So in summary, a really good card to persuade all those people with slightly older cards to reach for the credit card.
 
Ghandar, I highly suspect there's a driver cap put on Furmark.. in most reviews I have seen, Furmark TDP was lower than in their gaming tests... It's not a problem because, from what I can remember, it was already the case before, and well, this is Furmark...

To be honest, the 970/980 remind me more of the 5850/5870 and the 4870: low TDP, low heat, extreme overclocking.
 
Back in the old GeForce 4 days, 3DMark's Nature was the first DX8 test that really stressed video cards. Now it only pushes the card to 64% of TDP and the temp is so low. I got 1000 fps in it for the first time ever this week; how times change.


I'm not sure why that's surprising. Graphics hardware has far outpaced CPUs over the years so we should expect older software to become more CPU limited over time and place less load on the GPU.
 

Clearly. I think every benchmark Futuremark released before, let's say, Vantage can be forgotten.. 3DMark05 was still not multithreaded, and 06 was only multithreaded in the CPU tests.

I remember being able to double the fps count in some places in the first test of 3DMark05 just by overclocking the CPU (3.8 to 5.2GHz)..
 
Seems to be pretty much out of stock everywhere. Trying to resist buying a 970, but something tells me AMD won't have anything comparable until early-to-mid next year, and I want something to drive my triple screens that doesn't choke.
 
These things must be selling like hotcakes. Stock was good at least for the first few days after launch.
 