NVIDIA Maxwell Speculation Thread

I've always assumed that the new items in GK110 were already in GK208 and Tegra K1, though lately I read somewhere that you can't use Hyper-Q on them.

/* edit: depends on what exactly we count as GK110 items; stuff like the extra DP units and ECC on the caches everywhere would be "GK110 items" not found in any other GPU. */


Alright, Tegra K1 has only compute capability 3.2, versus 3.5 on GK110 and GK208 and 3.0 on GK10x.
Maybe Hyper-Q is the one thing missing on Tegra; otherwise it's like GK110.
It's all listed here, with of course GM10x at capability 5.0, but now GM204 at 5.2, which I didn't know:

https://developer.nvidia.com/cuda-gpus

There's a good overview of the capability levels here; it seems to cover most things, except there's nothing about levels 3.2 and 5.2 (or the elusive 3.7). Anyway, I've always understood a bigger number to mean a superset of the features of the lower numbers.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
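If you want to see what the runtime itself reports for a given card, the standard CUDA runtime API exposes the capability (and the ECC state) through cudaGetDeviceProperties; a minimal query looks like this:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major / prop.minor is the compute capability (e.g. 5.2 on GM204);
        // prop.ECCEnabled says whether ECC is actually turned on.
        printf("Device %d: %s, compute capability %d.%d, ECC %s\n",
               dev, prop.name, prop.major, prop.minor,
               prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}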
 
/* edit: depends on what exactly we count as GK110 items; stuff like the extra DP units and ECC on the caches everywhere would be "GK110 items" not found in any other GPU. */

Drat, yes, I was thinking of ECC and DP rather than the additional features of GK110, as I assumed that the former would account for the vast majority of the extra area. Now that you force me to consider it, though, I'm not sure to what degree that reflects reality.
 
Could GM204 have ECC paths everywhere they're needed? That might make sense for niche markets where you want a lot of HPC with correct results but where FP32 is enough.

Anyway, if GM200 has any new features, presumably compute-related, then some documentation somewhere should give it away as "compute capability 5.5" or similar.
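If such a board ever shows up, whether the ECC plumbing is actually exposed is easy to check from the driver side, e.g. with "nvidia-smi -q -d ECC"; on current GeForce GM204 cards that section should just come back as N/A.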
 
I've read (can't find the link now) that some CUDA doc says GM200 will have a 1/2 DP rate instead of the 1/3 on GK110.
Could it be that GM200 gets rid of the dedicated FP64 units and runs doubles in two passes over the FP32 units? If that's the case, it would be a big change from Kepler.
 
I've read (can't find the link now) that some CUDA doc says GM200 will have a 1/2 DP rate instead of the 1/3 on GK110.
Could it be that GM200 gets rid of the dedicated FP64 units and runs doubles in two passes over the FP32 units? If that's the case, it would be a big change from Kepler.

If GM200 keeps the same number of DP units per cluster as GK110, then it is actually a 1:2 ratio, since each Maxwell SMM has 128 SPs instead of 192.
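Spelling that out (the 64 DP units per SMX on GK110 are documented; carrying 64 per SMM over to GM200 is the speculative part):

GK110 SMX: 64 FP64 units / 192 FP32 cores = 1:3
GM200 SMM (hypothetical): 64 FP64 units / 128 FP32 cores = 1:2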
 
Apparently the Gigabyte GTX 980 Windforce and Gigabyte GTX 970 Windforce cards that Tom's tested have a much higher power consumption limit than the NVIDIA reference GTX 980 and reference GTX 970, which gets reflected in the GPGPU "Torture Test" (although, interestingly enough, it doesn't appear to be noticeable in the Gaming Tests). Tom's has updated their article to reflect that:

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-980-970-maxwell,3941-13.html

[Chart: Overview - Power Consumption, Torture Test]
 
Apparently the Gigabyte GTX 980 Windforce and Gigabyte GTX 970 Windforce cards that Tom's tested have a much higher power consumption limit than the NVIDIA reference GTX 980 and reference GTX 970, which gets reflected in the GPGPU "Torture Test" (although, interestingly enough, it doesn't appear to be noticeable in the Gaming Tests). Tom's has updated their article to reflect that:

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-980-970-maxwell,3941-13.html
What "reference" GTX970s?
Forgot to mention that they removed it per Nvidia.
 
Tom's measured a reference GTX 980 but never actually measured a reference GTX 970. They tried to simulate a reference GTX 970 by downclocking a Gigabyte Windforce card, but the simulated results were nonsensical (consumption came out more than 60 W higher in the GPGPU "Torture Test" than on the actual reference GTX 980 they measured), so the simulated data was pulled while they wait to measure a GTX 970 that stays true to the actual reference design and reference power target spec.

Again, the moral of the story here is that the Gigabyte Windforce cards have a much higher power target than the reference cards, and this only becomes evident in the GPGPU "Torture Test" which pushes all CUDA compute cores to the power target limit.
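For reference, the kind of load such a torture test generates is basically just a long dependent FMA chain on every thread, so that all the CUDA cores stay saturated until the card runs into its power target. A minimal sketch of that idea (not the actual tool Tom's used) would be:

#include <cstdio>
#include <cuda_runtime.h>

// Each thread spins on dependent FMAs; the values themselves are meaningless,
// the point is simply to keep every ALU busy for a long time.
__global__ void burn(float *out, int iters)
{
    float a = threadIdx.x * 0.001f + 1.0f;
    float b = blockIdx.x  * 0.002f + 1.0f;
    for (int i = 0; i < iters; ++i) {
        a = fmaf(a, b, 0.999f);
        b = fmaf(b, a, 1.001f);
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;  // keep the compiler from removing the loop
}

int main()
{
    const int blocks = 4096, threads = 256;
    float *out;
    cudaMalloc(&out, blocks * threads * sizeof(float));
    for (int pass = 0; pass < 100; ++pass)      // run long enough to reach a thermal/power steady state
        burn<<<blocks, threads>>>(out, 1 << 20);
    cudaDeviceSynchronize();
    cudaFree(out);
    printf("done\n");
    return 0;
}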
 
What he means is cards using the reference power target spec.
Exactly.
Reference clocks - core/boost/memory
Reference PCB
Reference TDP
Blower shroud - a reduced-cost alternative to the reference Titan cooler

Seems some people make a career out of maintaining an agenda.
 
Tom's measured a reference GTX 980 but never actually measured a reference GTX 970. They tried to simulate a reference GTX 970 by downclocking a Gigabyte Windforce card, but the simulated results were nonsensical (consumption came out more than 60 W higher in the GPGPU "Torture Test" than on the actual reference GTX 980 they measured), so the simulated data was pulled while they wait to measure a GTX 970 that stays true to the actual reference design and reference power target spec.

Again, the moral of the story here is that the Gigabyte Windforce cards have a much higher power target than the reference cards, and this only becomes evident in the GPGPU "Torture Test" which pushes all CUDA compute cores to the power target limit.

Not necessarily; in hardware.fr's tests Anno 2070 had the same effect.

http://www.hardware.fr/articles/928-7/consommation-efficacite-energetique.html
 
Now, while the GM204-A1 die is based on the 28nm process, as I have previously mentioned, sources inside the industry tell me Nvidia is thinking about porting the circuit to 20nm in 2015.

wccftech.com/nvidia-geforce-gtx-980-ti-gtx-titan-x-coming/

What are the chances of this happening, in your opinion, guys?
 
Now, while the GM204-A1 die is based on the 28nm process, as I have previously mentioned, sources inside the industry tell me Nvidia is thinking about porting the circuit to 20nm in 2015.

wccftech.com/nvidia-geforce-gtx-980-ti-gtx-titan-x-coming/

What are the chances of this happening, in your opinion, guys?

Well, it's a die they know well, so it could happen. The performance on 28nm is great, and a shrink targeting the same performance should make it cheaper and use less power, lowering the whole cost. So it would be a nice lower-end replacement: slot a 20nm 970 into the $100 market and the 980 into the $200 market.
 
Well, it's a die they know well, so it could happen. The performance on 28nm is great, and a shrink targeting the same performance should make it cheaper and use less power, lowering the whole cost. So it would be a nice lower-end replacement: slot a 20nm 970 into the $100 market and the 980 into the $200 market.

I doubt those price points will be reached for quite some time.

They'll likely use 20nm for a higher-end part with faster clocks at first, while fabbing costs are still high, and move every part down a tier later in the year with the next series (1000 series?) and the release of a "mainstream" GM200, à la the 700 series.
 
Now, while the GM204-A1 die is based on the 28nm process, as I have previously mentioned, sources inside the industry tell me Nvidia is thinking about porting the circuit to 20nm in 2015.

wccftech.com/nvidia-geforce-gtx-980-ti-gtx-titan-x-coming/

What are the chances of this happening, in your opinion, guys?

Only 2560-2816 CUDA cores? That's pretty weak for a card called Titan 2.

It could barely beat a Titan Black at FP32, and for that to happen the compute load would need lots of FMA involved.

Although it may be a little better at FP64, if NVIDIA raises the DP:SP ratio to 1:2 on Maxwell Tesla like their CUDA tools suggest.

Don't talk about efficiency; most HPC guys can write efficient code for Kepler already.

Intel's KNL will raise throughput to close to 4 TFLOPS, so Nvidia will have a pretty steep hill to climb next year.

I know the story that Apple ate up all the 20nm capacity for their new ugly phone, but couldn't NVIDIA build their next-generation compute cards on a 20nm node? After all, these chips are unlikely to be priced cheaply, so they can afford a somewhat more expensive process.
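For scale, peak FP64 is just cores × 2 (FMA) × clock × DP ratio, so with some assumed numbers (the GM200 core count and clock here are pure guesses):

Titan Black (GK110B, 2880 cores @ ~0.89 GHz, 1:3 DP): 2880 × 2 × 0.89 / 3 ≈ 1.7 TFLOPS
Hypothetical GM200 (2816 cores @ ~1.1 GHz, 1:2 DP): 2816 × 2 × 1.1 / 2 ≈ 3.1 TFLOPS

That would still fall short of the ~4 TFLOPS quoted for KNL above, but it's a lot closer than anything Kepler can do.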
 
Only 2560-2816 CUDA cores? That's pretty weak for a card called Titan 2.

The GTX 980 has 2048 CUDA cores, but they are faster than the 2880 CUDA cores found in the GTX 780 Ti.

2560-2816 sounds about right for a card that will be a few hundred bucks more expensive than the GTX 980.
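Rough peak numbers illustrate the point (reference base clocks assumed here: ~1126 MHz for the GTX 980, ~875 MHz for the GTX 780 Ti):

GTX 780 Ti: 2880 cores × 2 (FMA) × 0.875 GHz ≈ 5.0 TFLOPS FP32
GTX 980: 2048 cores × 2 (FMA) × 1.126 GHz ≈ 4.6 TFLOPS FP32

So the GTX 980 wins its gaming benchmarks with less raw FP32 throughput; the per-core efficiency gain is what makes 2048 Maxwell cores "faster" than 2880 Kepler cores.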
 
Well, it's a die they know well, so it could happen. The performance on 28nm is great, and a shrink targeting the same performance should make it cheaper and use less power, lowering the whole cost. So it would be a nice lower-end replacement: slot a 20nm 970 into the $100 market and the 980 into the $200 market.

Uh, not sure what you're thinking. If there's a $100 card based on a 20nm Maxwell GPU, it would rather be a GM307 or GM217, roughly a shrink and slight update of GM107, with less than half the performance of a GTX 970 at that price point. At best a "GeForce GTX 1050 non-Ti" with 2GB, and even that may be stretching it.

A shrink with lower power is also a competitive advantage in itself: no reason to sell it cheaper. By your logic Ivy Bridge should have been cheaper than Sandy Bridge, but that wasn't the case. Whether a 20nm chip is even cheaper than a 28nm one is questionable.
 