NVIDIA Maxwell Speculation Thread

Deleted member 13524 · Jan 7, 2015

According to sebbbi, FP16 makes sense to reduce memory bandwidth in some operations where the difference between using FP32 and FP16 is almost non-existent to the naked eye.
The fact that this "Maxwell 1.1" from TX1 can actually perform two FP16 operations per core (if the operation is the same) is an added bonus.

So I'd say yeah, I bet future Maxwell iterations should have this feature too.
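To make the "two FP16 operations per core, if the operation is the same" idea concrete, here is a rough software emulation: two half-precision values share one 32-bit word and a single call applies the same multiply to both lanes. The names `pack_half2`, `unpack_half2` and `half2_mul` are made up for this sketch and say nothing about how the TX1 hardware or any NVIDIA API is actually implemented.

```python
import struct

def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    return int.from_bytes(struct.pack("<2e", lo, hi), "little")

def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", word.to_bytes(4, "little"))

def half2_mul(a, b):
    """Emulate one packed instruction: the same multiply on both lanes."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    # Each lane's product is rounded back to FP16 when it is repacked.
    return pack_half2(a_lo * b_lo, a_hi * b_hi)

result = unpack_half2(half2_mul(pack_half2(1.5, 2.0), pack_half2(4.0, 0.25)))
print(result)  # (6.0, 0.5)
```

Both lanes have to run the same operation, which is why the doubled rate only applies when the shader has pairs of independent FP16 operations of the same kind.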

iMacmatician · Jan 7, 2015

NVIDIA has introduced the GTX 965M, which has 1024 CCs and a 128-bit bus. According to Videocardz, the part uses the GM204.

mczak · Jan 7, 2015

I guess FP16 on the desktop would make sense today where even desktop cards are power constrained - after all AMD's Tonga can already do that (though I don't know how and how fast exactly, AMD still hasn't published the ISA docs, I don't think it's used at all currently).
The problem is probably more one of APIs, namely Tegra is used in devices using GLES almost exclusively which has precision qualifiers (and in fact mediump which requires just fp16 is the default for fragment shaders). But obviously the desktop parts need to work in d3d environments and d3d10/11 (hlsl) does not have those precision qualifiers (not that nvidia honored required precision of apis in the past, but I doubt they want to go back there...). Theoretically you could try to convert some fp32 operations to fp16 where you can guarantee the results will stay the same but that's probably too limited in general.

Deleted member 13524 · Jan 7, 2015

iMacmatician said:
NVIDIA has introduced the GTX 965M, which has 1024 CCs and a 128-bit bus. According to Videocardz, the part uses the GM204.

Videocardz took that info from a German review on Notebookcheck of a Clevo model with a GTX 965M.
With GM206 being just around the corner, I don't think nVidia would launch yet another mobile GPU with so many laser-cut units that falls below the lower-end chip's performance.

I'd rather believe that Notebookcheck wasn't aware that the GM206 was going to be made available, so they just assumed it was yet another GM204.

That or the GM206 isn't ready yet and we'll see two versions of the GTX 965M: one with the severely cut GM204 and another with the GM206.

Kaotik · Jan 7, 2015

ToTTenTranz said:
That or the GM206 isn't ready yet and we'll see two versions of the GTX 965M: one with the severely cut GM204 and another with the GM206.

Wouldn't be a first, they've even had mobile chips with exact same names with one using Kepler and one Maxwell

sebbbi · Jan 7, 2015

ToTTenTranz said:
According to sebbbi, FP16 makes sense to reduce memory bandwidth in some operations where the difference between using FP32 and FP16 is almost non-existent to the naked eye.
The fact that this "Maxwell 1.1" from TX1 can actually perform two FP16 operations per core (if the operation is the same) is an added bonus.

Practically nobody uses FP32 storage formats for pixel data (such as render targets or HDR textures). FP16 and FP11/10 (R11G11B10F) are the most common render target (and post processing buffer) formats. DirectX 11 added a new compressed FP texture format (BC6H). It (obviously) doesn't even match FP16 in quality. Vertex positions are often nowadays stored as signed integers (16 bit) and UVs as 16 bit (FP16 or 16 bit int). Vertex tangents can be most efficiently stored as normalized quaternions (R10G10B10A2 signed normalized integer). Position transform math needs FP32 ALU. Normal/tangent transform math is fine with FP16. Most post processing is fine with FP16, and so are big parts of the lighting math (not all of it). FP16 obviously needs more development work (to ensure that quality is not lost).
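A quick way to see why positions want FP32 while normals get by with FP16: half precision has 11 significant bits, so values between 1024 and 2048 are a full 1.0 apart, while values below 1.0 are at most 2^-11 apart. The sketch below uses Python's `struct` half-float format to round to FP16; the sample position and normal values are arbitrary picks for illustration, not numbers from any real scene.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# A world-space coordinate about a kilometre from the origin (assumed units: metres).
position = 1024.3
position_fp16 = to_fp16(position)            # snaps to 1024.0
position_error = abs(position_fp16 - position)

# One component of a unit-length normal, always within [-1, 1].
normal = 0.70710678
normal_error = abs(to_fp16(normal) - normal)

print(f"position {position} -> {position_fp16}, error {position_error:.4f}")
print(f"normal error {normal_error:.6f} (bound {2 ** -12:.6f})")
```

A 0.3 unit error on a vertex position is visible as geometry snapping, while an error below 0.00025 on a normal component is far under what 8 bit output can show.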

Kaotik · Jan 7, 2015

I think the assumption of GM204 GTX 965M comes from the product images - it does use GM204 in those, just like all the other NVIDIA mobile GPU product images use the right GPUs

Kaarlisk · Jan 7, 2015

NV confirmed GM204.
http://techreport.com/news/27626/geforce-gtx-965m-quietly-joins-nvidia-mobile-lineup

We asked Nvidia to tell us more about the GTX 965M, and it turns out this part is based on the same GM204 GPU that powers the rest of the GTX 900M series, along with the desktop GTX 970 and 980. Nvidia simply disabled some of the GM204's functional units to achieve the reduced SP count and narrower memory interface on the GTX 965M.

Kaotik · Jan 8, 2015

Kaarlisk said:
NV confirmed GM204.
http://techreport.com/news/27626/geforce-gtx-965m-quietly-joins-nvidia-mobile-lineup

This gets interesting - are there so many partly-functioning GM204 chips out there that they chose to use it? It's the only reason I can really figure out this close to GM206 release

silent_guy · Jan 8, 2015

Kaotik said:
This gets interesting - are there so many partly-functioning GM204 chips out there that they chose to use it? It's the only reason I can really figure out this close to GM206 release

It doesn't have to be a lot of them: they could sell laptop GTX965M with either gm204 or gm206 and nobody would really care (except some who wouldn't buy them anyway.) It'd just be a slightly different MXM plugin board if MXM still exists.
I don't think they could do that kind of stuff with discrete desktop GPUs, at least not with relatively high-end ones.

CarstenS · Jan 8, 2015

I would think that a GM204-based GTX 965 for desktop could make for a nice bridge between the rumored GTX 960 and the 970 with a 192-bit mem-config and 10-11 SMM.

Erinyes · Jan 8, 2015

sebbbi said:
Practically nobody uses FP32 storage formats for pixel data (such as render targets or HDR textures). FP16 and FP11/10 (R11G11B10F) are the most common render target (and post processing buffer) formats. DirectX 11 added a new compressed FP texture format (BC6H). It (obviously) doesn't even match FP16 in quality. Vertex positions are often nowadays stored as signed integers (16 bit) and UVs as 16 bit (FP16 or 16 bit int). Vertex tangents can be most efficiently stored as normalized quaternions (R10G10B10A2 signed normalized integer). Position transform math needs FP32 ALU. Normal/tangent transform math is fine with FP16. Most post processing is fine with FP16, and so are big parts of the lighting math (not all of it). FP16 obviously needs more development work (to ensure that quality is not lost).

But what is the current usage of FP16 on the desktop today? If Nvidia had introduced FP16 with desktop Maxwell as well, would it have resulted in higher performance in some cases?

Kaotik said:
This gets interesting - are there so many partly-functioning GM204 chips out there that they chose to use it? It's the only reason I can really figure out this close to GM206 release

That's what I'm wondering as well. Given that the 28nm process is so mature, are there really that many defects that they have to resort to cutting down half the chip? Possibly it's just that the majority of chips will be GM206, and if they do have any defective GM204 chips that have to be cut down this much, NV will use them whenever they can.

The only other reason I can think of is that Intel just released Broadwell chips meant to be used in regular laptops and the refresh cycle hits this quarter. Maybe GM206 wasn't ready in time and Nvidia made this chip for the laptop manufacturers to design their refreshes and they would sell the cut down chips until GM206 is available.

sebbbi · Jan 8, 2015

Erinyes said:
But what is the current usage of FP16 on the desktop today? If Nvidia had introduced FP16 with desktop Maxwell as well, would it have resulted in higher performance in some cases?

I wouldn't expect big gains for the high end desktop GPU models that have lots of ALU to spare (compared to consoles), but laptop GPUs would definitely see noticeable performance gains (while at the same time reducing the power draw). Obviously if we had 2x more ALU available (compared to BW and TMU), in the long run it would mean that spending ALU would be more beneficial for some operations where lookup tables are used right now. This would give bigger gains, but obviously would need new software (or patches to old software).

Deleted member 2197 · Jan 8, 2015

Slight OT ...

MSI GT80 notebook with two GTX 980M GPUs. MSI promises upgradability for the next 2 GPU generations.
http://www.kitguru.net/laptops/zardon/msi-gt80-titan-laptop-internal-shots-from-pre-retail-sample/

PixResearch · Jan 8, 2015

One of the reasons F16 hardware acceleration didn't pop up in desktop parts before now was API support.
Minprec didn't arrive until DX11.1 / Windows 8 / Windows 10 which hasn't got a huge install base yet...

http://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx

Blazkowicz · Jan 8, 2015

Could desktop compositing (i.e., the likes of Windows Aero) be done with FP16 instructions? There you'd get some small power savings.

mczak · Jan 9, 2015

PixResearch said:
One of the reasons F16 hardware acceleration didn't pop up in desktop parts before now was API support.
Minprec didn't arrive until DX11.1 / Windows 8 / Windows 10 which hasn't got a huge install base yet...

http://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx

Oh when I mentioned missing API support I totally missed that indeed d3d11.1 has that so it's not quite as bad as I thought...

Blazkowicz said:
Could desktop compositing (i.e., the likes of Windows Aero) be done with FP16 instructions? There you'd get some small power savings.

The shaders used by compositing in windows 8 have no complexity at all, they are mostly of the sort sample (maybe a mul) - that's it. So yes it can certainly be done with FP16 accuracy, but OTOH they don't need to be done with explicit fp16 support (because both the power consumption used by alus is probably just about nothing compared to sample/output, and also the driver could trivially figure out fp16 precision is enough with rgba8 input and output and such a shader).
Win7 had some blur shaders which might benefit maybe (they are also sample heavy though) but they are gone with Win8.
That's at least for basic compositing, maybe there's other shaders used somewhere where this might help.
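The "fp16 is enough with rgba8 input and output" part can be checked by brute force. This is only a sketch of the simplest case (sample a channel, write it back unchanged), with the FP16 rounding emulated through Python's `struct` half-float format; `roundtrip_unorm8` is a made-up helper name, and a shader with an extra mul would need its own check.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def roundtrip_unorm8(v):
    """Sample an 8-bit channel as FP16, then convert back to 8 bits on output."""
    sampled = to_fp16(v / 255.0)        # unorm8 -> normalized value held in FP16
    return int(sampled * 255.0 + 0.5)   # round to nearest 8-bit code

# Try every possible 8-bit channel value.
mismatches = [v for v in range(256) if roundtrip_unorm8(v) != v]
print("mismatching values:", len(mismatches))  # 0
```

The FP16 rounding error on a value in [0, 1] is at most 2^-12, which scales to well under half an 8-bit step, so every code survives the round trip.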

sebbbi · Jan 10, 2015

PixResearch said:
One of the reasons F16 hardware acceleration didn't pop up in desktop parts before now was API support.
Minprec didn't arrive until DX11.1 / Windows 8 / Windows 10 which hasn't got a huge install base yet...

http://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx

DX9 had "half" type in HLSL for FP16 ALU (it was useful for 7000 series Nvidia cards). For some reason DX10 dropped the support. As you said FP16 is again supported in DX11.1 but it requires Windows 8.

Jawed · Jan 10, 2015

Has anyone seen any evidence that desktop cards are faster in games with FP16 shader/compute code?

Novum · Jan 10, 2015

Well for integer it definitely could work (int24 is faster than int32), but I don't think that any desktop GPU at the moment has hardware support for <FP32. Correct me if I'm wrong.
