NVIDIA Maxwell Speculation Thread

The problem with LuxMark is that, just like lots of other benchmarks, it only supports OpenCL, and NVIDIA's OpenCL support is very poor, so it's not a very good performance indicator for NVIDIA products and thus shouldn't be a benchmark for cross-platform comparison.

Folding@home, on the other hand, supports both OpenCL and CUDA routines, which is why I picked it as a performance indicator for cross-platform comparisons.

For LuxMark, I don't think the performance gap is only due to bad driver optimisation. With bad driver optimisation you can lose 10-15%, maybe 20% in an extreme case, but with LuxMark we are talking about an engine that seems to suit AMD's architecture choices better at the moment.

Nothing tells us that the next architecture, or even high-end Maxwell, won't suit LuxMark better.

On the other hand, Folding@home was designed from the start for Nvidia GPUs, and Nvidia has always had the upper hand there, if I remember correctly.

Reviewers can't choose only benchmarks that favour AMD cards when reviewing an AMD GPU, or use only CUDA-based benchmarks when reviewing an Nvidia GPU. But I admit that, today, choosing benchmarks for GPGPU reviews is not an easy task: a benchmark is often tied to a particular piece of software, and its developers have made choices about which methods to favour in the code, which can of course have a big impact on how a given GPU performs.
 
For LuxMark, I don't think the performance gap is only due to bad driver optimisation. With bad driver optimisation you can lose 10-15%, maybe 20% in an extreme case, but with LuxMark we are talking about an engine that seems to suit AMD's architecture choices better at the moment.
Nope. Nvidia left its OpenCL 1.1 run-time half-baked a long time ago. The architecture itself is more than capable of matching or exceeding AMD's offerings in this field. For now, Nvidia's approach is CUDA or the highway.
 
Nope. Nvidia left its OpenCL 1.1 run-time half-baked a long time ago. The architecture itself is more than capable of matching or exceeding AMD's offerings in this field. For now, Nvidia's approach is CUDA or the highway.

Regardless of the reasoning, one shouldn't disregard OpenCL benchmarks just because one IHV has decided to support the API as little as possible.
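As a quick sanity check on that point, here is a minimal sketch (standard OpenCL host API in C, linked with -lOpenCL; the header path differs on macOS) that prints the version string each installed runtime reports, which is how you can see a stack still sitting at OpenCL 1.1:

Code:
/* Minimal sketch: list each OpenCL platform and the version its
   runtime reports (e.g. "OpenCL 1.1 CUDA ..." on older NVIDIA stacks). */
#include <stdio.h>
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint count = 0;

    if (clGetPlatformIDs(8, platforms, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "No OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint i = 0; i < count; ++i) {
        char name[256], version[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION,
                          sizeof(version), version, NULL);
        printf("%s: %s\n", name, version);
    }
    return 0;
}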
 
Actually, there is a quite popular heterogeneous LAPACK package for OpenCL targeting AMD devices; the same team also wrote a LAPACK implementation with CUDA. I believe it's a quite good performance indicator, especially considering that some of the developers also wrote the original LAPACK.

http://icl.cs.utk.edu/magma/
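For anyone curious what using it looks like, here is a hedged sketch of MAGMA's LAPACK-style CPU interface (the CUDA build; header name and link flags vary by release, so treat the details as illustrative):

Code:
/* Sketch: GPU-accelerated LU factorisation via MAGMA's LAPACK-style
   dgetrf. Assumes the CUDA build of MAGMA; the header is magma.h in
   older releases and magma_v2.h in newer ones. */
#include <stdio.h>
#include <stdlib.h>
#include <magma.h>

int main(void)
{
    magma_init();

    magma_int_t n = 1024, info = 0;
    double *A = malloc(n * n * sizeof(double));
    magma_int_t *ipiv = malloc(n * sizeof(magma_int_t));

    /* Fill A with a diagonally dominant matrix so the
       factorisation is well-conditioned. */
    for (magma_int_t i = 0; i < n * n; ++i)
        A[i] = (double)rand() / RAND_MAX;
    for (magma_int_t i = 0; i < n; ++i)
        A[i * n + i] += (double)n;

    /* Same calling convention as LAPACK's dgetrf, but the heavy
       lifting runs on the GPU. */
    magma_dgetrf(n, n, A, n, ipiv, &info);
    printf("magma_dgetrf info = %d\n", (int)info);

    free(A);
    free(ipiv);
    magma_finalize();
    return 0;
}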
 
For LuxMark, I don't think the performance gap is only due to bad driver optimisation. With bad driver optimisation you can lose 10-15%, maybe 20% in an extreme case, but with LuxMark we are talking about an engine that seems to suit AMD's architecture choices better at the moment.

Nothing tells us that the next architecture, or even high-end Maxwell, won't suit LuxMark better.

On the other hand, Folding@home was designed from the start for Nvidia GPUs, and Nvidia has always had the upper hand there, if I remember correctly.

Reviewers can't choose only benchmarks that favour AMD cards when reviewing an AMD GPU, or use only CUDA-based benchmarks when reviewing an Nvidia GPU. But I admit that, today, choosing benchmarks for GPGPU reviews is not an easy task: a benchmark is often tied to a particular piece of software, and its developers have made choices about which methods to favour in the code, which can of course have a big impact on how a given GPU performs.

LuxMark has a kernel several thousand lines long; how big is the Folding@home kernel? You are probably comparing two very different kinds of applications (no matter whether they are written in OpenCL or CUDA).

Most people also seem to forget that NVIDIA was dominating LuxMark scores with the GTX 480/580; however, that performance was nearly cut in half with the release of CUDA 4.0. Kepler has also been a step backward compared to Fermi in terms of performance for big/complex kernels.

You can check CarstenS's recent article for proof (i.e. a comparison of LuxMark performance across different NVIDIA driver versions and GPU architectures).
 
Most people also seem to forget that NVIDIA was dominating LuxMark scores with the GTX 480/580; however, that performance was nearly cut in half with the release of CUDA 4.0. Kepler has also been a step backward compared to Fermi in terms of performance for big/complex kernels.

Need to find an explanation for why it happened like that :LOL:
 
LuxMark has a kernel several thousand lines long; how big is the Folding@home kernel? You are probably comparing two very different kinds of applications (no matter whether they are written in OpenCL or CUDA).

Most people also seem to forget that NVIDIA was dominating LuxMark scores with the GTX 480/580; however, that performance was nearly cut in half with the release of CUDA 4.0. Kepler has also been a step backward compared to Fermi in terms of performance for big/complex kernels.

You can check CarstenS's recent article for proof (i.e. a comparison of LuxMark performance across different NVIDIA driver versions and GPU architectures).

Based on my programming experience, I don't see any proof suggesting Kepler cannot handle large kernels as well as Fermi, if not better.

Actually, Nvidia had relatively good OpenCL support early on; they just dropped it in favour of CUDA later, so it's not surprising if, relatively speaking, Nvidia's older generations of cards handle LuxMark better.
 
The GTX 860M (GM107) should be quite a bit faster than the GTX 765M (GK106) that went into thin gaming notebooks like the Razer Blade 14, while using a lot less power.

This year's thin gaming notebooks would have been very power efficient if Intel hadn't delayed Broadwell.
 
Question: What part of the GPU made Fermi unworkable, and was kinda-fixed so that the fixed Fermi could have over 10% more cores at over 10% higher clocks at less power?
Answer: The interconnect.

Question: Does GM107 have an interconnect?
Answer: No.

Question: Does GK107 have an interconnect?
Answer: Yes.

In short, GM107 is the best case for Maxwell (outside of double precision; NVidia could improve perf/watt there pretty easily, I think). It is still impressive, but not nearly as amazing when you consider the interconnect.

________________________________________________________________________
Onto nodes and process.

You are missing something in the 28nm --> 20nm conversion...

20nm "SoC" (ala LPM) will consume 25% BEST CASE, FOR A MOBILE SOC. For higher voltage and clockspeed and size parts, I expect it to be 15% at best.
Now, that 15% is OVER 28NM LPM. Guess what GPUs are currently built on? Oh, yes, HP (or, in some cases, HPM).
TSMC promises up to 30% faster, once more, that is BEST CASE for a MOBILE SOC. More realistically for a higher voltage, clockspeed and size part, you are looking at 15-20%.

Of course, once more, GPUs are typically made on an HP process... On to a few questions and answers:

Question: What is the performance decrease from moving from HP to LPM?
Answer: Nothing exact, but it is noticeable. 10%+.

Question: How long will 20nm be the leading process from TSMC?
Answer: Under a year, probably under three quarters.

Question: What will the development costs for 20nm and its successor be?
Answer: About the same

Question: What will the production costs for 20nm and its successor be?
Answer: About the same

Question: What will the improvements from 20nm to its successor be?
Answer: Just as large as the improvements from 28nm --> 20nm, but also with FinFETs coming into play.

Question: Who has complained the most about the cost of 20nm?
Answer: Nvidia.

ALL OF THESE THINGS LEAD ME TO BELIEVE THAT MAXWELL WILL NOT BE ON 20NM FOR ANY CONSUMER-LEVEL PARTS. If it is, those parts will be priced in the Titan range or above. Any 20nm parts NVidia does make (if any) will be Tegra or professional.

MY OPINION.

The above is from the comments under http://videocardz.com/49824/nvidia-geforce-gtx-860m-performance-leaked

So, can anyone say whether it is 100% true, partly true, or completely false?
 
Nvidia said 2H '14 for "2nd Gen" Maxwell.
Where did they say this?
I think GM108 should be released soon-ish. If anything, I'm surprised it wasn't the first Maxwell chip.
How do you even know there is going to be a GM108?
The GTX 860M (GM107) should be quite a bit faster than the GTX 765M (GK106) that went into thin gaming notebooks like the Razer Blade 14, while using a lot less power.
This year's thin gaming notebooks would have been very power efficient if Intel hadn't delayed Broadwell.
Just because Broadwell is delayed does not mean there will not be a laptop/notebook refresh cycle this year. There is usually a refresh around the "back to school" season.
The above is from the comments under http://videocardz.com/49824/nvidia-geforce-gtx-860m-performance-leaked

So, can anyone say whether it is 100% true, partly true, or completely false?
My responses below:
20nm "SoC" (ala LPM) will consume 25% BEST CASE, FOR A MOBILE SOC. For higher voltage and clockspeed and size parts, I expect it to be 15% at best.
Now, that 15% is OVER 28NM LPM. Guess what GPUs are currently built on? Oh, yes, HP (or, in some cases, HPM).
TSMC promises up to 30% faster, once more, that is BEST CASE for a MOBILE SOC. More realistically for a higher voltage, clockspeed and size part, you are looking at 15-20%.
I'm assuming he meant 20SoC will consume 25% less power, and I think that is more or less correct. The rest is phrased rather loosely, so I can't make much out of it. Anyway, TSMC's stated figures for 20SoC vs 28HPM are either 15% higher speed at the same power, or 30% lower power at the same speed.

Source - http://www.eetimes.com/document.asp?doc_id=1319679&page_number=4&piddl_msgpage=4#msgs
Question: What is the performance decrease from moving from HP to LPM?
Answer: Nothing exact, but it is noticeable. 10%+.
Not sure what exactly he meant here, but if he is talking about 28HP to 20SoC, I'd be surprised if there were a decrease in performance. From the info posted above, TSMC claims 20SoC is 15% faster than 28HPM. Now, I do not know whether 28HP is faster than 28HPM or whether they are about the same, but even if 28HP is 15% faster, that would make 28HP and 20SoC roughly the same speed.
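To put rough numbers on that argument (the 15% 20SoC figure is TSMC's claim from the link above; the 28HP-over-28HPM gaps are purely assumed):

Code:
/* Rough check: TSMC claims 20SoC is 15% faster than 28HPM; the
   28HP-over-28HPM gap is unknown, so try a few assumed values. */
#include <stdio.h>

int main(void)
{
    const double soc_vs_hpm = 1.15;                 /* TSMC claim         */
    const double hp_gap[]   = { 1.00, 1.10, 1.15 }; /* assumed 28HP gains */

    for (int i = 0; i < 3; ++i)
        printf("28HP +%2.0f%% over 28HPM -> 20SoC is %.2fx vs 28HP\n",
               (hp_gap[i] - 1.0) * 100.0, soc_vs_hpm / hp_gap[i]);
    return 0;
}

With an assumed 15% gap the ratio comes out at 1.00x, which is the "roughly the same speed" case above.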
Question: How long will 20nm be the leading process from TSMC?
Answer: Under a year, probably under three quarters.
Quite possible; 16nm is scheduled for volume production in Q4 '14.
Question: What will the development costs for 20nm and its successor be?
Answer: About the same
Development costs for 16nm should be much lower than for 20nm as 16nm is based on the backend of 20nm.
Question: What will the production costs for 20nm and its successor be?
Answer: About the same
Not true. The wafer cost for 16nm will be higher than for 20nm, and as density is only ~5% higher, the cost per transistor will be higher.
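As a quick illustration (only the ~5% density gain comes from the figures above; the 16nm wafer-cost premium is an invented placeholder):

Code:
/* Cost per transistor scales with wafer cost divided by density.
   density_gain is from the post above; wafer_premium is a made-up
   example value. */
#include <stdio.h>

int main(void)
{
    const double density_gain  = 1.05; /* ~5% more transistors per wafer */
    const double wafer_premium = 1.10; /* assumed: 16nm wafer costs +10% */

    printf("16nm cost/transistor vs 20nm: %.3fx\n",
           wafer_premium / density_gain); /* ~1.048x, i.e. more expensive */
    return 0;
}

Any wafer-cost increase beyond ~5% makes each transistor more expensive than on 20nm.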
Question: What will the improvements from 20nm to its successor be?
Answer: Just as large as the improvements from 28nm --> 20nm, but also with FinFETs coming into play.
Again, I don't think this is entirely true. As posted above, TSMC's stated figures for 20SoC vs 28HPM are either 15% higher speed at the same power, or 30% lower power at the same speed. For 16FF vs 20SoC, they state either a 20% speed gain at the same power, or 35% lower power at the same speed. So it seems the jump from 20nm to 16nm is a bit better.
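Compounding those claims (keeping in mind each node offers extra speed at the same power *or* power savings at the same speed, not both at once) gives a rough bound for the full 28HPM -> 16FF jump:

Code:
/* Compound TSMC's per-node claims, 28HPM -> 20SoC -> 16FF.
   The speed and power tracks are alternatives, so they are
   compounded separately. */
#include <stdio.h>

int main(void)
{
    double speed = 1.15 * 1.20; /* +15% then +20%, at constant power */
    double power = 0.70 * 0.65; /* -30% then -35%, at constant speed */

    printf("16FF vs 28HPM: %.2fx speed or %.2fx power\n", speed, power);
    return 0;
}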
Question: Who has complained the most about the cost of 20nm?
Answer: Nvidia.
It's not only Nvidia. Maybe Nvidia is the one that has complained most publicly, but nobody is happy about increasing wafer costs.
ALL OF THESE THINGS LEAD ME TO BELIEVE THAT MAXWELL WILL NOT BE ON 20NM FOR ANY CONSUMER-LEVEL PARTS. If it is, those parts will be priced in the Titan range or above. Any 20nm parts NVidia does make (if any) will be Tegra or professional.
MY OPINION.
As I have stated in my previous posts, this is what I have heard from a source as well.
 
Where did they say this?
WCCFtech claims that "NVIDIA also confirmed during the conference that they are planning to introduce the GeForce 800 series which is fully based on the Maxwell architecture in second half of 2014." However, I haven't found a statement or quote directly from NVIDIA that says so.

GM108 has been mentioned in GPU-Z and in ES drivers, but that might not be a confirmation.
 
Not sure what exactly he meant here, but if he is talking about 28HP to 20SoC, I'd be surprised if there were a decrease in performance. From the info posted above, TSMC claims 20SoC is 15% faster than 28HPM. Now, I do not know whether 28HP is faster than 28HPM or whether they are about the same, but even if 28HP is 15% faster, that would make 28HP and 20SoC roughly the same speed.


Well, I would guess he meant "from HP to LPM", i.e. the difference between a "proper" 20 nm HP process and the 'unified' 20 nm SoC process...
 