NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. According to sebbbi, FP16 makes sense for reducing memory bandwidth in operations where the difference between FP32 and FP16 is almost invisible to the naked eye.
    The fact that this "Maxwell 1.1" in the TX1 can actually perform two FP16 operations per core (if both are the same operation) is an added bonus.

    So I'd say yeah, I bet future Maxwell iterations should have this feature too.
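
    To make the "two FP16 operations per core" point concrete, here is a minimal sketch in Python. It is not how the TX1 hardware works internally; it only shows that two FP16 values fit in the 32 bits one FP32 value occupies, and that a paired operation applies the same op to both halves. The helper names (pack_half2, unpack_half2, half2_op) are made up for illustration.

```python
import struct

def pack_half2(a, b):
    """Pack two values as FP16 into one 32-bit word (4 bytes)."""
    return struct.pack("<2e", a, b)

def unpack_half2(word):
    """Unpack a 32-bit word into its two FP16 lanes."""
    return struct.unpack("<2e", word)

def half2_op(op, x, y):
    """Apply the SAME operation to both FP16 lanes of two packed words.

    Hypothetical model of a paired FP16 instruction: one issue, two results.
    """
    x0, x1 = unpack_half2(x)
    y0, y1 = unpack_half2(y)
    return pack_half2(op(x0, y0), op(x1, y1))

a = pack_half2(1.5, 2.0)
b = pack_half2(0.5, 4.0)
product = half2_op(lambda p, q: p * q, a, b)

# Two FP16 values take the same space as one FP32 value.
print(len(a), struct.calcsize("<f"))
print(unpack_half2(product))
```

    The same packing is what halves the memory traffic: a buffer of N values stored as FP16 is half the size of the same buffer stored as FP32.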
     
  2. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    NVIDIA has introduced the GTX 965M, which has 1024 CUDA cores and a 128-bit bus. According to Videocardz, the part uses the GM204.
     
  3. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I guess FP16 on the desktop would make sense today, when even desktop cards are power constrained. After all, AMD's Tonga can already do that (though I don't know how, or how fast exactly; AMD still hasn't published the ISA docs, and I don't think it's used at all currently).
    The problem is probably more one of APIs, namely Tegra is used in devices using GLES almost exclusively which has precision qualifiers (and in fact mediump which requires just fp16 is the default for fragment shaders). But obviously the desktop parts need to work in d3d environments and d3d10/11 (hlsl) does not have those precision qualifiers (not that nvidia honored required precision of apis in the past, but I doubt they want to go back there...). Theoretically you could try to convert some fp32 operations to fp16 where you can guarantee the results will stay the same but that's probably too limited in general.
     
  4. Videocardz took that info from a German review on Notebookcheck of a Clevo model with a GTX 965M.
    With GM206 being just around the corner, I don't think nVidia would launch yet another mobile GPU with so many laser-cut units that falls below the lower-end chip's performance.

    I'd rather believe that Notebookcheck wasn't aware that the GM206 was going to be made available, so they just assumed it was yet another GM204.

    That, or the GM206 isn't ready yet and we'll see two versions of the GTX 965M: one with the severely cut GM204 and another with the GM206.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    It wouldn't be a first; they've even had mobile chips with the exact same name, one using Kepler and one Maxwell.
     
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Practically nobody uses FP32 storage formats for pixel data (such as render targets or HDR textures). FP16 and FP11/10 (R11G11B10F) are the most common render target (and post processing buffer) formats. DirectX 11 added a new compressed FP texture format (BC6H). It (obviously) doesn't even match FP16 in quality. Vertex positions are nowadays often stored as signed integers (16 bit) and UVs as 16 bit (FP16 or 16 bit int). Vertex tangents can be most efficiently stored as normalized quaternions (R10G10B10A2 signed normalized integer). Position transform math needs FP32 ALU. Normal/tangent transform math is fine with FP16. Most post processing is fine with FP16, and so are big parts of the lighting math (not all of it). FP16 obviously needs more development work (to ensure that quality is not lost).
     
    Grall likes this.
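
    A rough Python sketch of the trade-off described in the post above, under stated assumptions: the sample values (a coordinate of 1024.3 and a normal component of 0.7071) and the 1920x1080 target size are made up for illustration, and to_fp16 is a hypothetical helper that simply rounds a value to the nearest FP16 using the standard library. It is meant only to show why FP16 is too coarse for large position values but fine for unit-length vectors, and how the storage formats mentioned compare in size.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest FP16 value (illustrative helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# FP16 has a 10-bit mantissa, so values in [1024, 2048) are spaced 1.0 apart.
position = 1024.3          # assumed world-space coordinate
normal_component = 0.7071  # assumed component of a unit-length normal

position_error = abs(to_fp16(position) - position)
normal_error = abs(to_fp16(normal_component) - normal_component)

print("position as FP16:", to_fp16(position), "error:", position_error)
print("normal as FP16:  ", to_fp16(normal_component), "error:", normal_error)

# Bytes per pixel for the render target formats mentioned in the post.
WIDTH, HEIGHT = 1920, 1080  # assumed resolution
BYTES_PER_PIXEL = {"RGBA32F": 16, "RGBA16F": 8, "R11G11B10F": 4}

target_bytes = {name: WIDTH * HEIGHT * bpp for name, bpp in BYTES_PER_PIXEL.items()}
for name, size in target_bytes.items():
    print(f"{name}: {size / (1024 * 1024):.1f} MiB")
```

    The position loses its entire fractional part, while the normal component is off by less than a thousandth, which is the gap between "needs FP32 ALU" and "fine with FP16".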
  7. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    I think the assumption of GM204 GTX 965M comes from the product images - it does use GM204 in those, just like all the other NVIDIA mobile GPU product images use the right GPUs
     
  8. Kaarlisk

    Regular Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    NV confirmed GM204.
    http://techreport.com/news/27626/geforce-gtx-965m-quietly-joins-nvidia-mobile-lineup
     
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
  10. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    It doesn't have to be a lot of them: they could sell laptop GTX 965M parts with either GM204 or GM206 and nobody would really care (except some who wouldn't buy them anyway). It'd just be a slightly different MXM plugin board, if MXM still exists.
    I don't think they could do that kind of stuff with discrete desktop GPUs, at least not with relatively high-end ones.
     
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I would think that a GM204-based GTX 965 for desktop could make for a nice bridge between the rumored GTX 960 and the 970 with a 192-bit mem-config and 10-11 SMM.
     
  12. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    But what is the current usage of FP16 on the desktop today? If Nvidia had introduced FP16 with desktop Maxwell as well, would it have resulted in higher performance in some cases?
    That's what I'm wondering as well. Given that the 28nm process is so mature, are there really that many defects that they have to resort to cutting down half the chip? Possibly it's just that the majority of chips will be GM206, and if they do have any GM204 chips that are defective and have to be cut down so much, NV will use those chips whenever they can.

    The only other reason I can think of is that Intel just released Broadwell chips meant to be used in regular laptops and the refresh cycle hits this quarter. Maybe GM206 wasn't ready in time and Nvidia made this chip for the laptop manufacturers to design their refreshes and they would sell the cut down chips until GM206 is available.
     
  13. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I wouldn't expect big gains for the high end desktop GPU models that have lots of ALU to spare (compared to consoles), but laptop GPUs would definitely see noticeable performance gains (while at the same time reducing power draw). Obviously, if we had 2x more ALU available (relative to BW and TMU), in the long run spending ALU would become more beneficial for some operations where lookup tables are used right now. This would give bigger gains, but would need new software (or patches to old software).
     
  14. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,894
    Likes Received:
    4,548
  15. PixResearch

    Regular

    Joined:
    May 20, 2010
    Messages:
    187
    Likes Received:
    47
    Location:
    London, UK
  16. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Could desktop compositing (i.e., the likes of Windows Aero) be done with FP16 instructions? There you'd get some small power savings.
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Oh when I mentioned missing API support I totally missed that indeed d3d11.1 has that so it's not quite as bad as I thought...

    The shaders used by compositing in Windows 8 have no complexity at all; they are mostly of the sort: sample, (maybe a mul), that's it. So yes, it can certainly be done with FP16 accuracy, but OTOH it doesn't need explicit fp16 support: the power consumed by the ALUs is probably just about nothing compared to sample/output, and the driver could trivially figure out that fp16 precision is enough with rgba8 input and output and such a shader.
    Win7 had some blur shaders which might benefit maybe (they are also sample heavy though) but they are gone with Win8.
    That's at least for basic compositing, maybe there's other shaders used somewhere where this might help.
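
    As a sanity check on the "fp16 is enough with rgba8 input and output" argument, here is a small Python sketch. It models one 8-bit channel being sampled, held as FP16, optionally multiplied by a tint, and written back as 8 bits. The helper names and the tint value of 0.6 are assumptions for illustration, and the reference path uses Python's double precision rather than real FP32 hardware.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest FP16 value (illustrative helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize8(x):
    """Convert a normalized [0, 1] value back to an 8-bit channel code."""
    return int(round(x * 255.0))

def sample_fp16(v8):
    """Pure 'sample and output' shader with the value held as FP16."""
    return quantize8(to_fp16(v8 / 255.0))

def modulate_fp16(v8, tint):
    """'Sample then mul' shader with every intermediate rounded to FP16."""
    return quantize8(to_fp16(to_fp16(v8 / 255.0) * to_fp16(tint)))

def modulate_reference(v8, tint):
    """Same shader evaluated in double precision as a reference."""
    return quantize8((v8 / 255.0) * tint)

TINT = 0.6  # assumed constant multiplier

# A plain sample survives the FP16 round trip for every possible 8-bit input.
passthrough_exact = all(sample_fp16(v) == v for v in range(256))

# With a multiply, count outputs that differ from the reference and by how much.
diffs = [abs(modulate_fp16(v, TINT) - modulate_reference(v, TINT)) for v in range(256)]
mismatches = sum(1 for d in diffs if d != 0)
max_diff = max(diffs)

print("passthrough exact:", passthrough_exact)
print("mul mismatches:", mismatches, "max difference in 8-bit codes:", max_diff)
```

    The passthrough case is exact because the FP16 rounding error, scaled back to 8-bit codes, stays well below half a code. With the multiply, any mismatch is at most one 8-bit code, and only for values that land close to a rounding boundary.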
     
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    DX9 had the "half" type in HLSL for FP16 ALU (it was useful for 7000 series Nvidia cards). For some reason DX10 dropped the support. As you said, FP16 is again supported in DX11.1, but it requires Windows 8.
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Has anyone seen any evidence that desktop cards are faster in games with FP16 shader/compute code?
     
  20. Novum

    Regular

    Joined:
    Jun 28, 2006
    Messages:
    335
    Likes Received:
    8
    Location:
    Germany
    Well for integer it definitely could work (int24 is faster than int32), but I don't think that any desktop GPU at the moment has hardware support for <FP32. Correct me if I'm wrong.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.