NVIDIA Maxwell Speculation Thread

Blazkowicz · Feb 14, 2014

Not really what? Sure, over 50% of what I write in speculation threads always ends up wrong.
I was thinking that in the past, GF100 ended up in laptops.

tviceman · Feb 14, 2014

xDxD said:
http://www.3dcenter.org/news/chipna...liche-spezifikationen-der-maxwell-grafikchips

Guys, what do you think of these speculations? Would a GM200 and GM204 like those be realistic?

GK104 had 4x the cores of GK107. I think it's as good a guess as any to say GM104 will follow that pattern. Not sure about the memory bus and ROPs though... GK104 really needed 7GHz VRAM to shine and even then was still somewhat ROP limited. Maxwell's bus config may also depend on whether GDDR6 is ready for use.
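The 4x guess can be written out as plain arithmetic. A minimal sketch in C, assuming the published 384 and 1536 CUDA core counts for GK107 and GK104 and the 640 cores reported for GM107 (that last figure is not given anywhere in this thread). `core_ratio` and `project_cores` are hypothetical helper names, and the result is a projection, not a spec:

```c
#include <assert.h>

/* How many times more cores the big chip has than the small one,
   e.g. GK104 (1536) over GK107 (384). */
int core_ratio(int big_cores, int small_cores) {
    assert(small_cores > 0);
    return big_cores / small_cores;
}

/* Project a big-chip core count by applying the same ratio to a
   small chip of the next generation. */
int project_cores(int small_cores, int ratio) {
    assert(small_cores > 0 && ratio > 0);
    return small_cores * ratio;
}
```

With those inputs, `core_ratio(1536, 384)` gives 4 and `project_cores(640, 4)` gives 2560, which is only as good as the assumption that Maxwell keeps Kepler's scaling between tiers.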

Picao84 · Feb 14, 2014

xDxD said:
http://www.3dcenter.org/news/chipna...liche-spezifikationen-der-maxwell-grafikchips

Guys, what do you think of these speculations? Would a GM200 and GM204 like those be realistic?

For what it's worth, I found that speculation rather unexciting given the kind of efficiency being hyped for Maxwell. If GM206, GM204 and GM200 were 28nm parts, sure, it looks nice, but with a die shrink to 20nm, not so much. Unless they are a complete revolution in power consumption (say GM200 with a 200W TDP), having a GM200 performing between GTX Titan SLI and GTX 770 SLI would be par for the course, comparable to the jump between GF110 and GK110. The same goes for a GM204 performing like GK110. After all, GM107 is only being deemed impressive because of its efficiency. If that efficiency does not scale up to the higher tiers, as the table seems to assume, Maxwell will be nothing special from a performance point of view.

DSC · Feb 14, 2014

iMacmatician · Feb 14, 2014

"Maxwell 1st Generation"? So maybe the core configuration will change in the 2nd generation?

DSC · Feb 14, 2014

Yeah, so Maxwell on 20nm/16nm FinFET might have more performance per watt.

Power Connectors: None

fellix · Feb 14, 2014

So, the (sub)multiprocessor configuration is confirmed to be 128 ALU lanes?

Alexko · Feb 14, 2014

iMacmatician said:
"Maxwell 1st Generation"? So maybe the core configuration will change in the 2nd generation?

Or maybe just the process, which would still increase performance per watt.

fellix said:
So, the (sub)multiprocessor configuration is confirmed to be 128 ALU lanes?

Yes, at most (and it's unlikely to be less).

DavidGraham · Feb 14, 2014

The Maxwell block shows 128 cores all right, but the Kepler block shows 256! Why is that? I thought there should be 192. Are they counting special units?

Blazkowicz · Feb 14, 2014

"Displayport 1.2 (Optional)"
I hope it gets common.

Ailuros · Feb 14, 2014

DavidGraham said:
The Maxwell block shows 128 cores all right, but the Kepler block shows 256! Why is that? I thought there should be 192. Are they counting special units?

192 SPs FP32 + 64 SPs FP64 (GK110) = 256

DSC · Feb 14, 2014

Kepler already supported DisplayPort 1.2, but for these cards Nvidia is leaving it up to the AIBs instead of making it mandatory, sigh.

I would rather have 3 DP 1.2 + 1 HDMI 1.4b (not sure if Maxwell has HDMI 2.0 support) as standard than outdated DL DVI-D and VGA.

Blazkowicz · Feb 14, 2014

One DL-DVI-I is nice to have, as you can use 2.5K DVI displays and VGA displays (CRT, LCD, projector) with no adapter or a cheap passive adapter. Then DP can be used for a second such monitor (or even a third one with an MST hub).

Blazkowicz · Feb 14, 2014

Alexko said:
Yes, at most (and it's unlikely to be less).

In a scaled down mobile device, make one SMM with two sub-blocks instead of four?
I'm picturing such a GPU with a dual-core Cortex-A12 and 16-bit LPDDR4 or LPDDR3.

Maybe it doesn't make sense because at that point "control logic" and the front-end before it use a great deal of area already. But anyway some < 1W stuff for embedded, low end phones would be interesting and useful.
ROFL I guess that eventually something can sit on a bicycle handlebar, displaying Google Earth on an OLED or other display while being powered by the bike itself.

DavidGraham · Feb 15, 2014

Ailuros said:
192 SPs FP32 + 64 SPs FP64 (GK110) = 256

Thanks. One more thing: the Maxwell block shows what seems to be an increase in control logic area. We know Kepler had 66% hardware scheduling and 33% software (just like GF104/114). Does that mean Maxwell will restore that to 100%?

Novum · Feb 15, 2014

Your percentages are weird. Kepler had four warp schedulers per SMX, and each one could issue two instructions per clock if they were independent. So how it gets the 6 instructions per clock needed to feed 192 ALUs is flexible.

Anyway, yes, it seems that Maxwell does away with co-issue and each CU has one scheduler for 32 ALUs.
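The numbers above can be checked with a little arithmetic. A rough sketch in C, taking the Kepler figures from this thread (four schedulers per SMX, two instructions each, 192 FP32 lanes, 32-wide warps) and the guessed Maxwell layout of one single-issue scheduler per 32 ALUs. The struct, the function names and the Maxwell entry are made up for illustration and say nothing definitive about the real hardware:

```c
#include <assert.h>

#define WARP_SIZE 32

struct sm_config {
    int schedulers;       /* warp schedulers per SM */
    int issue_per_sched;  /* instructions each scheduler can issue per clock */
    int fp32_lanes;       /* FP32 ALU lanes per SM */
};

/* Peak instructions issued per clock across all schedulers. */
int issue_slots(struct sm_config c) {
    return c.schedulers * c.issue_per_sched;
}

/* Warp-wide FP32 instructions needed per clock to keep every lane busy. */
int warps_to_fill_alus(struct sm_config c) {
    assert(c.fp32_lanes % WARP_SIZE == 0);
    return c.fp32_lanes / WARP_SIZE;
}

const struct sm_config KEPLER_SMX = {4, 2, 192};
const struct sm_config MAXWELL_SMM_GUESS = {4, 1, 128};  /* speculation only */
```

On these figures Kepler has 8 issue slots and needs 6 of them filled with FP32 work, so at least two of the four schedulers must find a second independent instruction every clock. The guessed Maxwell layout needs exactly one instruction per scheduler to fill its 128 lanes.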

spworley · Feb 15, 2014

NVidia released CUDA 6.0 RC this week. Allanmac from the NVIDIA CUDA forums and I were poking through the new include files. An interesting addition is in the new cuda_occupancy.h include.

It shows that sm_50 increases the maximum number of resident blocks per SM to 32 (from sm_35's 16). We don't know if the upcoming GM107 is sm_50 or not, but it doesn't seem likely.
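To see why the block limit matters, here is a back-of-the-envelope occupancy estimate in C. It uses the 16 and 32 block limits mentioned above and assumes 2048 resident threads per SM for both architectures (documented for sm_35, only assumed here for sm_50). Register and shared memory limits are ignored, and `resident_threads` and `occupancy_percent` are hypothetical helpers, not functions from cuda_occupancy.h:

```c
#include <assert.h>

#define MAX_THREADS_PER_SM 2048  /* assumed for both sm_35 and sm_50 */

/* Threads resident on one SM when only the block-count and
   thread-count limits apply (registers and shared memory ignored). */
int resident_threads(int threads_per_block, int max_blocks_per_sm) {
    assert(threads_per_block > 0 && threads_per_block <= 1024);
    int by_threads = MAX_THREADS_PER_SM / threads_per_block;
    int blocks = by_threads < max_blocks_per_sm ? by_threads : max_blocks_per_sm;
    return blocks * threads_per_block;
}

/* Occupancy as an integer percentage of the thread limit. */
int occupancy_percent(int threads_per_block, int max_blocks_per_sm) {
    return 100 * resident_threads(threads_per_block, max_blocks_per_sm)
               / MAX_THREADS_PER_SM;
}
```

Under those assumptions a kernel launched with 64-thread blocks tops out at 50% occupancy with a 16-block limit and reaches 100% with a 32-block limit, while 256-thread blocks hit the thread limit either way. Small blocks are where the higher limit would pay off.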

The more interesting detail is the reveal of a new architecture type, sm_37, with a different minimum shared memory size per SM of 80K (81920 bytes). Current sm_30 and sm_35 maximum shared memory is only 48K (and its minimum is 16K when you configure it to prefer L1). This sm_37 device is labeled as "GK210".

Finally, sm_50 may not have an L1/shared memory split. This is a tenuous conclusion based on the fact that in this include file, sm_50 does not use the L1/shared cache hints at all, unlike older architectures.

sm_50's shared memory size is not listed in the occupancy include file.

It's hard to interpret this, but such tenuous clues are great fodder for speculation threads such as this. And any sm_50 predictions are likely especially shaky.

CUDA 6.0 cuda_occupancy.h said:
#define MIN_SHARED_MEM_PER_SM (16384)
#define MIN_SHARED_MEM_PER_SM_GK210 (81920)

int sharedMemPerMultiprocessorLow = (properties->major==3 && properties->minor==7)
    ? MIN_SHARED_MEM_PER_SM_GK210
    : MIN_SHARED_MEM_PER_SM;
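The quoted selection logic can be tried in isolation. Below is a stand-alone C sketch: the two macros and the major/minor test are copied from the quote, while `struct device_props` and `shared_mem_per_sm_low` are stand-ins invented here so the snippet compiles without the CUDA toolkit. The real header's types and surrounding code may well differ:

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SHARED_MEM_PER_SM       (16384)
#define MIN_SHARED_MEM_PER_SM_GK210 (81920)

/* Stand-in for the header's device properties; only the two
   fields used by the quoted expression are modelled. */
struct device_props {
    int major;  /* compute capability major, e.g. 3 for sm_3x */
    int minor;  /* compute capability minor, e.g. 7 for sm_37 */
};

/* Smallest shared memory size per SM in bytes:
   81920 on sm_37 (labeled GK210), 16384 on everything else. */
int shared_mem_per_sm_low(const struct device_props *properties) {
    assert(properties != NULL);
    return (properties->major == 3 && properties->minor == 7)
        ? MIN_SHARED_MEM_PER_SM_GK210
        : MIN_SHARED_MEM_PER_SM;
}
```

Note that only the exact pair 3.7 takes the 80K branch, so in this expression sm_50 falls through to the same 16K default as sm_30 and sm_35.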

itaru · Feb 15, 2014

GK210 = GK20A = Tegra K1??

DSC · Feb 15, 2014

Does this mean Maxwell will have 96KB or 128KB configurable L1 cache?

GM107 is Maxwell, the slide clearly states it.

Ailuros · Feb 15, 2014

DavidGraham said:
Thanks. One more thing: the Maxwell block shows what seems to be an increase in control logic area. We know Kepler had 66% hardware scheduling and 33% software (just like GF104/114). Does that mean Maxwell will restore that to 100%?

What is really awkward in those charts is that they're comparing a GK110 cluster with a GM107 cluster; I do get the point the slide is trying to make even with 192 vs. 4*32. Are there no dedicated FP units in Maxwell, or were some of the marketing guys just too overeager and thought 256 looked "prettier" on the left?
