NVIDIA Maxwell Speculation Thread

According to sebbbi, FP16 makes sense for reducing memory bandwidth in operations where the difference between FP32 and FP16 is almost invisible to the naked eye.
The fact that this "Maxwell 1.1" in the TX1 can actually execute two FP16 operations per core (if both are the same operation) is an added bonus.

So I'd say yeah, I bet future Maxwell iterations should have this feature too.
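A quick way to see both halves of that argument (the bandwidth saving and the near-invisible precision loss) is to compare the two formats directly. This is just an illustrative sketch using NumPy's float16 on the CPU, not anything specific to Maxwell hardware:

```python
import numpy as np

# A buffer of "pixel-like" values in [0, 1], stored both ways.
rng = np.random.default_rng(0)
pixels32 = rng.random(1 << 16, dtype=np.float32)
pixels16 = pixels32.astype(np.float16)

# Half the memory traffic for the same number of values...
print(pixels32.nbytes // pixels16.nbytes)  # 2

# ...at a worst-case error far below 1/255, i.e. under one 8-bit
# display step, which is why the difference is invisible on screen.
err = np.abs(pixels32 - pixels16.astype(np.float32)).max()
print(bool(err < 1 / 255))  # True
```

The worst-case FP16 rounding error for values below 1.0 is about 2^-12, more than an order of magnitude below what an 8-bit output can even show.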
 
I guess FP16 on the desktop would make sense today, now that even desktop cards are power constrained; after all, AMD's Tonga can already do it (though I don't know how, or how fast exactly, since AMD still hasn't published the ISA docs, and I don't think it's used at all currently).
The problem is probably more one of APIs. Tegra is used almost exclusively in devices running GLES, which has precision qualifiers (and in fact mediump, which requires only FP16, is the default for fragment shaders). But the desktop parts obviously need to work in D3D environments, and D3D10/11 HLSL does not have those precision qualifiers (not that NVIDIA honored the required precision of APIs in the past, but I doubt they want to go back there...). Theoretically the driver could convert some FP32 operations to FP16 where it can guarantee the results will stay the same, but that's probably too limited in general.
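That last idea, demoting FP32 operations only where the result provably cannot change, can be sketched as a representability check. This is a hypothetical helper for illustration, nothing like what a real driver compiler does internally:

```python
import numpy as np

def exactly_representable_in_fp16(x: float) -> bool:
    """Conservative check: does x survive an fp32 -> fp16 -> fp32 round trip?"""
    x32 = np.float32(x)
    return bool(np.float32(np.float16(x32)) == x32)

# Small integers and dyadic fractions are safe to demote...
print(exactly_representable_in_fp16(0.5))    # True
print(exactly_representable_in_fp16(255.0))  # True
# ...but most constants are not, which is why a conservative
# FP32 -> FP16 rewrite applies only rarely.
print(exactly_representable_in_fp16(0.1))    # False
```

Even this check only covers constants and inputs; proving that a whole expression chain stays exact in FP16 is harder still, which is the "too limited in general" point above.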
 
NVIDIA has introduced the GTX 965M, which has 1024 CCs and a 128-bit bus. According to Videocardz, the part uses the GM204.
Videocardz took that info from a German Notebookcheck review of a Clevo model with a GTX 965M.
With GM206 just around the corner, I don't think NVIDIA would launch yet another mobile GPU with so many laser-cut units that it falls below the lower-end chip's performance.

I'd rather believe that Notebookcheck wasn't aware that the GM206 was about to become available, so they just assumed it was yet another GM204.

That or the GM206 isn't ready yet and we'll see two versions of the GTX 965M: one with the severely cut GM204 and another with the GM206.
 
That or the GM206 isn't ready yet and we'll see two versions of the GTX 965M: one with the severely cut GM204 and another with the GM206.
Wouldn't be a first; they've even had mobile chips with the exact same name, one using Kepler and one using Maxwell.
 
According to sebbbi, FP16 makes sense for reducing memory bandwidth in operations where the difference between FP32 and FP16 is almost invisible to the naked eye.
The fact that this "Maxwell 1.1" in the TX1 can actually execute two FP16 operations per core (if both are the same operation) is an added bonus.
Practically nobody uses FP32 storage formats for pixel data (such as render targets or HDR textures). FP16 and FP11/10 (R11G11B10F) are the most common render target (and post-processing buffer) formats. DirectX 11 added a new compressed FP texture format (BC6H). It (obviously) doesn't even match FP16 in quality. Vertex positions are nowadays often stored as signed integers (16-bit) and UVs as 16-bit (FP16 or 16-bit int). Vertex tangents can be most efficiently stored as normalized quaternions (R10G10B10A2 signed normalized integer). Position transform math needs FP32 ALU. Normal/tangent transform math is fine with FP16. Most post-processing is fine with FP16, and so are big parts of the lighting math (not all of it). FP16 obviously needs more development work (to ensure that quality is not lost).
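To make the tangent-frame trick concrete: a 10-bit signed-normalized component maps [-1, 1] onto integers in [-511, 511], with the 2-bit alpha channel left over for something like a handedness flag. A rough sketch of that quantization, assuming simple round-to-nearest snorm10 (the function names are made up for illustration):

```python
import numpy as np

def snorm10_encode(v):
    """Quantize values in [-1, 1] to 10-bit signed-normalized integers."""
    v = np.asarray(v, dtype=np.float64)
    return np.clip(np.rint(v * 511.0), -511, 511).astype(np.int32)

def snorm10_decode(q):
    return q.astype(np.float64) / 511.0

# A unit quaternion representing some tangent frame.
q = np.array([0.1830127, 0.1830127, 0.6830127, 0.6830127])
packed = snorm10_encode(q)        # fits in R10G10B10 (+ 2 spare bits)
restored = snorm10_decode(packed)

# Worst-case per-component error is half a quantization step, ~1/1022.
print(bool(np.abs(q - restored).max() <= 0.5 / 511.0 + 1e-12))  # True
```

Four tangent-frame components in 32 bits, versus 3 x FP32 each for a tangent and a bitangent, is where the storage win comes from.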
 
I think the assumption of a GM204-based GTX 965M comes from the product images: it does use GM204 in those, just as all the other NVIDIA mobile GPU product images use the right GPUs.
 
This gets interesting: are there so many partly-functioning GM204 chips out there that they chose to use them? It's the only reason I can really think of this close to the GM206 release.
It doesn't have to be a lot of them: they could sell a laptop GTX 965M with either GM204 or GM206 and nobody would really care (except some who wouldn't buy them anyway). It'd just be a slightly different MXM board, if MXM still exists.
I don't think they could do that kind of stuff with discrete desktop GPUs, at least not with relatively high-end ones.
 
I would think that a GM204-based GTX 965 for the desktop could make a nice bridge between the rumored GTX 960 and the 970, with a 192-bit memory config and 10 or 11 SMMs.
 
Practically nobody uses FP32 storage formats for pixel data (such as render targets or HDR textures). FP16 and FP11/10 (R11G11B10F) are the most common render target (and post-processing buffer) formats. DirectX 11 added a new compressed FP texture format (BC6H). It (obviously) doesn't even match FP16 in quality. Vertex positions are nowadays often stored as signed integers (16-bit) and UVs as 16-bit (FP16 or 16-bit int). Vertex tangents can be most efficiently stored as normalized quaternions (R10G10B10A2 signed normalized integer). Position transform math needs FP32 ALU. Normal/tangent transform math is fine with FP16. Most post-processing is fine with FP16, and so are big parts of the lighting math (not all of it). FP16 obviously needs more development work (to ensure that quality is not lost).

But what is the current usage of FP16 on the desktop today? If NVIDIA had introduced FP16 with desktop Maxwell as well... would it have resulted in higher performance in some cases?
This gets interesting: are there so many partly-functioning GM204 chips out there that they chose to use them? It's the only reason I can really think of this close to the GM206 release.

That's what I'm wondering as well. Given that the 28nm process is so mature... are there really that many defects that they have to resort to cutting down half the chip? Possibly it's just that the majority of chips will be GM206, and if they do have GM204 chips which are defective and have to be cut down this much... NV will use those chips whenever they can.

The only other reason I can think of is that Intel just released Broadwell chips meant for regular laptops, and the refresh cycle hits this quarter. Maybe GM206 wasn't ready in time, so NVIDIA made this chip for the laptop manufacturers to design their refreshes around, and they'll sell the cut-down chips until GM206 is available.
 
But what is the current usage of FP16 on the desktop today? If NVIDIA had introduced FP16 with desktop Maxwell as well... would it have resulted in higher performance in some cases?
I wouldn't expect big gains for high-end desktop GPU models that have lots of ALU to spare (compared to consoles), but laptop GPUs would definitely see noticeable performance gains (while at the same time reducing power draw). Obviously, if we had 2x more ALU available (relative to BW and TMU), in the long run it would become more beneficial to spend ALU on some operations where lookup tables are used right now. That would give bigger gains, but would obviously need new software (or patches to old software).
 
Could desktop compositing (i.e., the likes of Windows Aero) be done with FP16 instructions? There you'd get some small power savings.
 
One of the reasons FP16 hardware acceleration didn't pop up in desktop parts before now was API support.
Minimum precision ("minprec") didn't arrive until DX11.1 / Windows 8 / Windows 10, which haven't got a huge install base yet...

http://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx
Oh, when I mentioned missing API support I totally missed that D3D11.1 indeed has that, so it's not quite as bad as I thought...

Could desktop compositing (i.e., the likes of Windows Aero) be done with FP16 instructions? There you'd get some small power savings.
The shaders used by compositing in Windows 8 have no complexity at all; they are mostly of the sort sample (maybe a mul), and that's it. So yes, it can certainly be done with FP16 accuracy, but OTOH it doesn't need explicit FP16 support: the power consumed by the ALUs is probably next to nothing compared to sample/output, and for such a shader the driver could trivially figure out that FP16 precision is enough with RGBA8 input and output.
Win7 had some blur shaders which might perhaps have benefited (they are also sample-heavy though), but they are gone with Win8.
That's at least for basic compositing; maybe there are other shaders used somewhere where this might help.
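The "driver could trivially figure it out" point rests on a simple fact: every 8-bit normalized value survives an FP16 round trip exactly, because FP16's worst-case error in [0, 1] is far smaller than half an 8-bit step. A quick sanity check of that claim, with NumPy standing in for the shader math:

```python
import numpy as np

# All 256 possible 8-bit channel values, normalized to [0, 1].
u8 = np.arange(256, dtype=np.uint8)
normalized = u8.astype(np.float32) / 255.0

# Push them through FP16 (as a mediump-style shader would)...
via_fp16 = normalized.astype(np.float16).astype(np.float32)

# ...and quantize back to 8 bits: every value comes back unchanged.
back = np.rint(via_fp16 * 255.0).astype(np.uint8)
print(bool(np.array_equal(back, u8)))  # True
```

So for an RGBA8-in, RGBA8-out copy/mul shader, FP16 intermediate math is provably lossless, which is exactly the case a driver could detect automatically.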
 
One of the reasons FP16 hardware acceleration didn't pop up in desktop parts before now was API support.
Minimum precision ("minprec") didn't arrive until DX11.1 / Windows 8 / Windows 10, which haven't got a huge install base yet...

http://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx
DX9 had a "half" type in HLSL for FP16 ALU (it was useful for 7000-series NVIDIA cards). For some reason DX10 dropped the support. As you said, FP16 is again supported in DX11.1, but it requires Windows 8.
 
Well, for integer it could definitely work (int24 is faster than int32), but I don't think any desktop GPU at the moment has hardware support for anything below FP32. Correct me if I'm wrong.
 