Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    It's the new version of OpenGL, which is closer to the metal, like DX12. Basically the DX12 version of OpenGL.
     
  2. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    (Whoops, scooped by pjb :) )
    Runs on Windows (including Windows 7 & 8 as well as 10 if you somehow avoid Microsoft's aggressive upgrade strategy), Linux, and Android
     
  3. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    Not really. While it's maintained and developed by Khronos, just like OpenGL, it was built on AMD's Mantle rather than from scratch or on top of the older OpenGL.
     
  4. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    Yeah, all I mean by that is that it replaces OpenGL, not that it's necessarily the same API with some add-ons.
     
    pharma and Razor1 like this.
  5. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    Speculation from me, nothing to back it yet:

    What's the chance that the first Pascal GPUs are going to ship as Quadro or Tesla parts, not GeForce?

    So far, all marketing material published by NV has revolved around neural networks and similar scientific applications, but nothing about where to place Pascal performance-wise in games.

    There is also a good chance that 16nm FF+ turned out to have worse yields, and therefore higher costs, than initially expected, making it quite possible that Pascal will only be profitable in the middle four-digit price range in the beginning.

    Unfortunately, this would also mean that Maxwell isn't going to phase out as fast as expected. Not as long as the Pascal cards in the same performance range aren't reaching the same profits.

    And I'm not just talking about HBM2 being reserved for high-end models for now; I have concerns that Pascal won't be cost-efficient in general for quite some time.
     
  6. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    NV has launched professional cards first in the past, so I wouldn't be too surprised if they do it again, but I think it's going to depend on the dynamics of the marketplace when they're ready to release. They have pressure from Intel's next-gen Phi and also from the gaming market, so if they need to get the gaming cards out, they might launch both at the same time.
     
  7. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    It is the same strategy they used for the Fermi reveal and presentation back in 2009, and for the big Kepler in 2012 -- both architectures with an emphasis on HPC, while consumer features trailed behind. And since Maxwell was clearly a consumer-first architecture (with some cloud-induced applications), it's only logical that Pascal would be primed for the next wave of HPC, which the previous generation took a rest from, mostly thanks to the extended life of the 28nm process.
     
  8. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    552
    Likes Received:
    786
    Location:
    EU-China
    I don't agree. I believe that at Pascal's launch, 16FF+ will have much better yields than 28nm had at its beginning. Apple has already been making millions of A9s on 16FF+ for months, and the process is derived from 20nm, which has been in use for years.
    On the other side, HBM2...
     
    Razor1 and Grall like this.
  9. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    The A9 isn't directly comparable. From what I understood, 16FF+ allows tuning for two different characteristics: either transistor density and switching time, or power consumption.

    Apple obviously went for the latter with the A9, since the A9 is still reasonably small in terms of transistor count, and battery life matters.

    Pascal, on the other hand, has once again doubled the transistor count over Maxwell, meaning they went for full density, as I'm not aware that they could push die size much further.
     
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Apple is dual-sourcing manufacturing for the A9. If yields were that great, they could have avoided the trouble and cost of dual sourcing.

    Beyond that, where the heck does the idea come from that 16FF+ is a "derivative" of 20SOC? In other news, Maxwell isn't just a coffee brand and Vulkan isn't just a hole in the ground that spits fire...
     
    Ext3h likes this.
  11. Nakai

    Newcomer

    Joined:
    Nov 30, 2006
    Messages:
    46
    Likes Received:
    10
    The chance is pretty high, as 16FF and HBM2 are new technologies for NV. If GP100 features over 17 billion transistors, the density (transistors per mm²) will be higher than expected. Because HBM2 brings smaller PHYs (and NV's memory-interface PHYs were always big), the overall density could be higher than expected. I also don't think that GP100 will be as big as GM200 (~600 mm²), as a bigger die needs a bigger interposer. Although NV has a far stronger financial foothold than AMD, going with FinFET and HBM2 will be a huge task (AMD worked on it for ~10 years). I speculate the die will be around 500-550 mm², and with the 17 billion transistors (leaked) we get a density of around 30-34 million transistors/mm².
    I also think that NV will increase the cache sizes and register files throughout the architecture. I wouldn't be surprised if each SM features a huge register file (at least twice as big as GM200's) and increased cache sizes (192 KB+/48 KB+), as well as a big L2 cache (4 MB+) located at the memory interfaces. Larger caches could have a positive impact on density.

    Another problem is DP throughput. If NV goes for mixed precision (half : single : double => 4:2:1), I would speculate ~6000 SPs at max. IMO I would go for ~5000 SPs.
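    The density figure above can be checked with a quick sketch. Note that the 17-billion transistor count is only a leak and the die sizes are pure speculation from the post:

```python
def density_mtr_mm2(transistors: float, die_mm2: float) -> float:
    """Density in millions of transistors per square millimetre."""
    return transistors / die_mm2 / 1e6

LEAKED_TRANSISTORS = 17e9  # rumoured GP100 transistor count

# Speculated die-size range from the post, in mm^2
for die in (500.0, 550.0):
    print(f"{die:.0f} mm^2 -> {density_mtr_mm2(LEAKED_TRANSISTORS, die):.1f} Mtr/mm^2")
```

    At 500 mm² the leak works out to 34 Mtr/mm², at 550 mm² to about 31 Mtr/mm², matching the 30-34 range quoted above.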
     
  12. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    2:1? No way.

    They will most likely go with mixed-precision ALUs instead of dedicated DP and SP units this time, and for those the rate is about 4:1, due to the doubled data width for multiplication. So 16:4:1, best case. I think half precision is additionally going to be implemented as VLIW4 or at least VLIW2 (to keep the architecture 32 bits wide) at SP latency, and double precision as multi-cycle.
     
  13. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    Apple's volumes are ginormous. If NV were to sell something like 80 million GPUs every quarter, they would be shitting their pants out of sheer surprise and excitement. Alas, that's not the case.
     
  14. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    So I haven't seen this info in this thread so far:

    Maybe GP100 and GP102 are both large GPUs (≥ 450 mm² or so), but one of them has fast DP and the other has slow DP?
     
    pharma and Razor1 like this.
  15. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Maybe they are prepping the code names for a Titan and then a gaming version, like they did with the 7x0 series.

    Weird that GV100 is in there; it seems a bit early for that, I think.
     
  16. Nakai

    Newcomer

    Joined:
    Nov 30, 2006
    Messages:
    46
    Likes Received:
    10
    Well, Nvidia already had FP32:FP64 at 2:1 with Fermi. AMD had it with Hawaii. NV had 3:1 with GK100/110. NV has FP16:FP32 at 2:1 with GM20B (the Tegra X1 GPU). The most interesting fact is that Fermi didn't have dedicated FP64 units.

    For instance, if GP110 has 6144 SPs and only an FP32:FP64 ratio of 4:1, it achieves only 1536 FP64 FMAs per cycle. AMD's Hawaii is capable of 1408 FP64 FMAs per cycle. They will go for an FP32:FP64 ratio of 2:1, for sure...
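    The per-cycle arithmetic above can be sketched as follows. The 6144-SP GP110 is hypothetical; Hawaii's 2816 SPs at 1/2 rate give the 1408 figure:

```python
def fp64_fmas_per_cycle(fp32_lanes: int, sp_to_dp_ratio: int) -> int:
    """Peak FP64 FMA issue rate per clock for a given FP32:FP64 ratio."""
    return fp32_lanes // sp_to_dp_ratio

print(fp64_fmas_per_cycle(6144, 4))  # hypothetical GP110 at 4:1 -> 1536
print(fp64_fmas_per_cycle(2816, 2))  # Hawaii at 2:1 -> 1408
print(fp64_fmas_per_cycle(6144, 2))  # the 2:1 case argued for -> 3072
```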
     
  17. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    Out of these, Kepler was actually the one closest to optimum. It's just that a 64-bit FMA/FMUL costs 4x as much hardware as the corresponding 32-bit operation; you can't cheat around that. There is some additional, mostly data-width-independent overhead for IEEE 754 edge-case handling.

    If anything, the SP-to-DP ratio is going to get worse than 4:1 in Pascal, not better. They might even be tempted to still handle DP with dedicated units; that would be the only way they could achieve a better ratio.

    Going 2:1 on a mixed-mode FPU effectively wastes resources while in SP mode -- about 50%, actually. Going 8:1 or worse indicates "software emulation" (not literally, just running FP in multiple rounds through the integer ALU).

    And did you really just cite the 295X2 as a reference for DP performance?

    That's a dual-GPU card, and if you really want to go that way: the champion in terms of DP performance is still AMD's old Tahiti X2/New Zealand/Malta series, setting the bar at 2 SP or 1/2 DP operations per ALU and cycle, with the 2.5-year-old 7990 only recently beaten by Intel's Knights Landing -- and not even by much, just ~50%.
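    As a rough sanity check on that comparison, here is the peak-rate arithmetic as a sketch (public HD 7990 specs assumed: two Tahiti GPUs, 2048 ALUs each, ~0.95 GHz base, 1/4-rate DP; peak numbers only, not a benchmark):

```python
def peak_dp_tflops(alus: int, clock_ghz: float, sp_to_dp_ratio: int) -> float:
    """Peak FP64 TFLOPS: each ALU does one FMA (2 flops) per cycle at SP rate,
    scaled down by the SP:DP ratio."""
    return 2.0 * alus * clock_ghz / sp_to_dp_ratio / 1e3

# HD 7990 (Malta): 2 x 2048 ALUs, ~0.95 GHz, 1/4-rate DP
print(round(peak_dp_tflops(2 * 2048, 0.95, 4), 2))  # ~1.95 TFLOPS FP64
```

    Roughly 2 TFLOPS FP64 for the dual-GPU card, which is consistent with Knights Landing beating it by only about 50%.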
     
  18. EDIT: sorry, wrong thread.

    We really need a delete button.. :(
     
    #358 Deleted member 13524, Nov 10, 2015
    Last edited by a moderator: Nov 10, 2015
  19. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
    Although it may not see the light of day in PCs, perhaps being reserved for their supercomputer contracts.
    I am reminded somewhat of ATI's R400/R500 (Xenos) development lineage: although it made its way into the Xbox 360, the more conservative designs (R420, R520) developed in parallel served the PC space until its R600 descendant arrived.
     
  20. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    Jawed likes this.