GP102/GP100 FP16 support vs. Performance in old DirectX 8 and DirectX 9

Discussion in 'Architecture and Products' started by agent_x007, Jul 23, 2016.

  1. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    19
    Likes Received:
    0
    Hello

    Basically: while reading the great (if late) AnandTech article about Pascal (LINK), I saw that the new GP102 core is, or should be, built from FP32/2xFP16-capable CUDA cores (GP104 is mostly FP32 only).
    Then I remembered that older DirectX versions (up to DX9.0) work, or can work, on half-precision numbers (FP16).

    Does this mean that the new Titan X (or Titan XP*) is better suited to old DirectX 8.x and DirectX 9 games than the GPUs released from DX10 (2006) up to now (i.e. GP104)?
    I know this is pure theory (since GP102 hasn't launched yet), but I find it interesting whether it could be better utilised in those games because of native FP16 support.

    What do you guys think?
    I know this is really irrelevant at this point (since you can probably get 999+ FPS in those programs/games by now), but still... I'm curious whether my thinking is correct.
    I don't program shaders, so I wanted someone to clarify this for me.

    Thank you for your time.

    PS. That old NV30 path (mixed FP16/FP32) should work great on Titan XP* :)
    *"Titan XP" was first used on the latest LinusTechTips "WAN Show" video stream.
     
  2. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,124
    Likes Received:
    902
    Location:
    still camping with a mauler
    Are you so sure GP102 will have the double rate FP16? I thought it was supposed to be a larger GP104 rather than a smaller GP100.
     
  3. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    19
    Likes Received:
    0
    GP100/GP102 = Big Pascal (Tesla P100/"Titan XP"),
    GP104 = High End Pascal (GTX 1080/1070),
    GP106 = Mid End Pascal (GTX 1060).

    I think FP16 may be enabled as an option in GeForce GP10x-based cards as well, but it's too early to tell.
     
    #3 agent_x007, Jul 23, 2016
    Last edited by a moderator: Jul 23, 2016
  4. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    19
    Likes Received:
    0
    There is also this quote:
    Source: AnandTech article about Pascal linked before.

    PS. Is there a way to edit my earlier post, or do I need to write 10 posts in total to see that option?
    Would love to add "a" in "Pascl" :)
     
  5. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Well, GP102 is not GP100... All we know about GP102 is that it obviously has a different memory controller from GP100 (GDDR5X vs. HBM2 on GP100), and that it has full INT8 support and no FP64. Since Nvidia hasn't put anything about FP16 rate in its marketing material, only INT8, I can imagine there's no FP16 support either, similar to GP104 in this respect.

    There's a 25% decrease in transistor count for GP102 vs. GP100.

    http://www.anandtech.com/show/10510...-titan-x-video-card-1200-available-august-2nd

    Nvidia is not completely crazy; I don't think they want to undercut their own Tesla P100 solution.
    Deep learning research is not meant to be run by common folk at home anyway, so maybe we could see full FP16 support (even if for now nothing indicates it, and I really don't think it is the case). There are university students who could take it up to learn about it,
    and then it's a good opportunity for Nvidia to push its ecosystem of software and tools through a GPU like the Titan.

    Honestly, at this point I think GP102 is much closer to a GP104 with the same SM count as GP100.

    There are two possibilities I see.

    1) Nvidia wanted from the start to completely separate the "gaming" SKUs from P100, as stated by AnandTech, to finally split the professional SKUs from the consumer ones, and prepared two variants from the start, with GDDR5X and HBM2.

    2) HBM2 was available later than expected, and Nvidia started early to move its SKUs other than P100 to standard GDDR5(X) memory in order to release its GPUs early (in contrast to AMD, which seems to have opted to wait for HBM2).
     
    #5 lanek, Jul 23, 2016
    Last edited: Jul 23, 2016
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,955
    Likes Received:
    3,037
    Location:
    Pennsylvania
    I'll finally be able to run Half Life at 8k and 1 million fps?
     
  7. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    19
    Likes Received:
    0
    True... not enough information. INT8 is interesting; I didn't catch that the first time I read that update :)
    But since it's a Titan (and FP16 sits inside the CUDA cores), let's assume for now that it does have full FP16 support.

    This is still a purely theoretical question (assuming it has full FP16 support and double the FLOPs of FP32):
    Could the new Titan X be better suited to running old DirectX's INT8/FP16 modes (i.e. have better hardware utilisation) than hardware from late 2006 (DX10) up to this point?
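
    As a rough illustration of the "double FLOPs" assumption above: peak arithmetic throughput is commonly estimated as cores times 2 (one fused multiply-add counts as two operations) times clock. The sketch below applies that estimate. The core count and boost clock are assumptions taken from Nvidia's announced Titan X figures, the function name is hypothetical, and the 2x FP16 rate is exactly the unconfirmed hypothesis under discussion, not a known property of GP102.

```python
def peak_tflops(cuda_cores: int, clock_mhz: float, rate_vs_fp32: float = 1.0) -> float:
    """Estimate peak throughput in TFLOPS.

    Each core is assumed to retire one fused multiply-add (2 operations)
    per clock; rate_vs_fp32 scales that figure for other precisions.
    """
    ops_per_second = cuda_cores * 2 * clock_mhz * 1e6 * rate_vs_fp32
    return ops_per_second / 1e12


# Assumed figures (announced Titan X specs, not stated in this thread).
CORES = 3584
BOOST_MHZ = 1531.0

fp32 = peak_tflops(CORES, BOOST_MHZ)
# Hypothetical: only meaningful if GP102 really has GP100-style 2x FP16.
fp16_if_double_rate = peak_tflops(CORES, BOOST_MHZ, rate_vs_fp32=2.0)

print(f"FP32 peak: {fp32:.2f} TFLOPS")
print(f"FP16 peak if double rate: {fp16_if_double_rate:.2f} TFLOPS")
```

    This is only a ceiling on shader math; it says nothing about whether an old DX8/DX9 game is limited by shader arithmetic in the first place.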
     
  8. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I do not think it is guaranteed to have the GP100 mixed-precision CUDA core; however, the GP100 misses out on some useful operations, and that would be a good reason to create another mixed-precision GPU below the flagship. So logically it would make sense for the GP102 to be mixed-precision FP32/FP16 and also to have the INT8/dp4a operation.
    IMO it would still make sense for such a card to have somewhere between 1.2 TFLOPS and 1.8 TFLOPS of FP64 (disabled or reduced to maybe 1:8, 1:24 or 1:32 for the Pascal Titan), but then only so much can be crammed onto a smaller die.
    This would make the GP102 viable across all segments and more cost-effective than the GP100, and this is important because some reports citing Cray (which sells both Intel and Nvidia solutions) suggest Knights Landing is winning more large-scale projects than P100.

    But then, in the Pascal Titan info so far there is no emphasis on mixed precision, just INT8, which aligns more with the GP104.
    This makes it a confusing product, because from a deep learning perspective (and this is how they present the Pascal Titan) it is limited in the operations it can perform without the mixed-precision FP32/FP16 CUDA core, and from a cost/complexity perspective not many would want multiple different dedicated GPUs when some workloads could be shared on one device.

    So IMO it is too early to say exactly what Cuda cores are in the GP102 until they release more information.
    Cheers
     
  9. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    With a bit of luck, we should have more information soon during SIGGRAPH, or in a few weeks if some sites get it. But to be honest, I'm not expecting much on the features side (it's just good marketing).

    High margin, low availability (only from the Nvidia shop). This seems to me a good way to just get the GPUs "there"...
     
    #9 lanek, Jul 23, 2016
    Last edited: Jul 23, 2016
  10. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,171
    Location:
    La-la land
    Not a games programmer, but I could imagine that the shaders used in the NV30/DX9 era are so simple that today's GPUs would bottleneck elsewhere long before FP16 had a chance to make an impact performance-wise. All you'd see is your graphics shading getting coarser due to not using 32-bit precision for its calculations... :)
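
    To put a number on the "coarser" shading, here is a minimal sketch that rounds values to half precision using Python's standard `struct` module (the `e` format is IEEE 754 binary16). The helper names are hypothetical and the sample values are arbitrary; it only illustrates the precision of the format, not how any particular GPU or driver handles DX9 partial-precision hints.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (binary32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # Compare the rounding error of each format for a few sample values.
    for value in (0.1, 1.0 / 3.0, 1000.3, 2049.0):
        h, s = to_fp16(value), to_fp32(value)
        print(f"{value!r}: fp16={h!r} (err {abs(h - value):.3g}), "
              f"fp32={s!r} (err {abs(s - value):.3g})")
```

    FP16 keeps 11 significant bits against 24 for FP32, so whole numbers above 2048 already stop being exactly representable, while small values such as 8-bit colour outputs survive well.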
     
    pharma, Lightman, Razor1 and 3 others like this.
  11. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,720
    Likes Received:
    2,460
    Why would that matter? A modern GPU can run the original F.E.A.R. (a very taxing DX9 game for its time) at more than 400fps @1080p! I bet a Pascal Titan X can pump out the same performance or more @4K!
     
  12. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    19
    Likes Received:
    0
    @Grall is probably right.
    ROP/TMU/VRAM bandwidth (or CPU/RAM) would limit max performance sooner (but to the point of FP16 not being relevant anymore... I don't know).

    I think FP16 could help GPU utilisation (and power usage), since it would make it possible to extract more performance from a smaller amount of resources.

    Still, I don't know if FP16 in Pascal has anything to do with FP16 in the old Shader Models, i.e. is this even possible?

    @up It does not matter, true, but wouldn't it "be cool" if true :)
     
  13. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    I think the CPU will be the bottleneck :)

    Unless ya game on like 3 4k monitors !
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.