Next Generation Hardware Speculation with a Technical Spin [2018]

Discussion in 'Console Technology' started by Tkumpathenurpahl, Jan 19, 2018.

  1. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,889
    Likes Received:
    11,008
    Location:
    The North
    For me, I think it's important to be able to separate the two features: ML isn't necessary for ray tracing, it's just effective at approximating highly complex tasks. BVH traversal, on the other hand, is an actual requirement for ray tracing, and the 2080 Ti can run ray-traced games at 1080p60 without any ML. Getting that to ~4K, however, would require some sort of ML upscaling algorithm (see the sketch at the end of this post).
    ML can be used for a variety of tasks, and using the MI60 as a template, it's entirely possible next gen could have something similar to it: ML helping to improve games and move them forward in a variety of ways, while being flexible enough not to force developers to use it.

    The 'multi-function' cores are an interesting topic, because earlier we were discussing the waste of silicon on fixed functions. Here we get significantly more flexibility, with less performance in those specific areas, but all of the silicon can be put to full use whether you decide to use the features or not. Tensor cores are very specific: they accelerate the matrix math behind frameworks like TensorFlow, I don't know how useful they are outside of that, and they're limited to 16-bit precision. Some ML algorithms may require up to 64-bit. So the answer is, I don't know; I don't know what games would require or how it would be used.
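    To put numbers on that resolution point, here's a back-of-the-envelope sketch. It assumes ray-tracing cost scales linearly with primary ray count, which ignores secondary rays, denoising and the raster passes, so treat it as illustrative only:

```python
# Rough scaling of ray-tracing cost with resolution, assuming cost is
# proportional to primary rays per second (a big simplification).

RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
}

def rays_per_second(width, height, fps, rays_per_pixel=1):
    """Primary rays needed to sustain a given resolution and frame rate."""
    return width * height * fps * rays_per_pixel

# Budget implied by "ray-traced games at 1080p60":
budget = rays_per_second(1920, 1080, 60)

for name, (w, h) in RESOLUTIONS.items():
    fps = budget / rays_per_second(w, h, 1)
    print(f"{name}: ~{fps:.0f} fps on the same ray budget")
# 4K works out to ~15 fps, which is why tracing at a lower resolution
# and reconstructing with ML upscaling looks so attractive.
```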
     
    Tkumpathenurpahl, BRiT and pharma like this.
  2. Tkumpathenurpahl

    Tkumpathenurpahl Oil my grapes.
    Veteran Newcomer

    Joined:
    Apr 3, 2016
    Messages:
    1,558
    Likes Received:
    1,401
    True, and we still don't have any solid metrics to determine whether denoising, AA, and upscaling are better done via ML or via conventional algorithms. Nvidia's DLSS is a perfect example: we know it provides solid results, but how does it compare against the same mm² spent on general compute?

    I agree that it's important to distinguish the two, but given its inclusion in Nvidia's idealised RTX, I'm willing to believe it's likely that ML will play some role in RTRT. It might not, but you and others have posted enough compelling cases for me to believe it.

    And that's why I'm excited. Assuming BVH acceleration can be added to CUs, that's all that's needed to evolve the MI60 into a capable RTRT GPU. I know that's a massive oversimplification, and that it would probably perform worse than Nvidia's RTX cards, but it still puts some form of RTRT hardware tantalisingly within the grasp of the next generation consoles.

    Exactly. Even just an MI60 GPU would be interesting for a console. Hybrid algorithm-and-ML approaches to all kinds of things, most obviously the things I mentioned above: AA, upscaling, and denoising. I think you've posted a couple of videos about other uses too.

    And the fact you've highlighted, of not forcing developers to use anything, is key for console success, which is another reason I'm quite excited by this development.

    I don't know either, but the important thing is: no-one knows. Who knows how much ML will be useful for every game, or even every genre? No-one can answer that, but put control over that ratio in the hands of developers, and they can answer it on a per-title basis.

    The same can be said for RTRT. Assuming the CUs can be modified to accelerate BVH traversal, developers can do the same there too: some ratio of rasterisation, some ratio of ML, and some ratio of RT.

    In the PC space, Nvidia's fixed-function approach is fine, and probably yields better results. People can just keep upgrading their rig, or changing and disabling per-game settings.

    In the console space, they have to provide a platform that straddles the broadest markets for a minimum of six years, and developer flexibility is key to that. "Time to triangle" popped up time and again in PS4 launch interviews with Mark Cerny, and it's a philosophy that's served them well: it's undone the negative perception of PlayStation caused by the PS3, and secured them shedloads of content. At the same time, the flexibility of compute has granted one of their studios, Media Molecule, the chance to render in an entirely new way with SDFs.

    We have an indication with the MI60 that the same approach is likely to be taken next generation. If BVH acceleration can be bolted onto the CUs, ML plus BVHs may be that generation's compute.
     
    Michellstar likes this.
  3. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,889
    Likes Received:
    11,008
    Location:
    The North
    I believe the RT cores are bolted directly into the compute units. I could be wrong, but there's no suggestion that they are separate pieces of silicon.

    The only thing you need to consider is that bandwidth is where ML demands a lot more resources. Look at the MI60: it's already on 7nm... 300W... 1 TB/s of bandwidth and approximately 14 TF of FP32 on a dedicated GPU.

    That's not a great sign imo. It's not going to get any smaller for the consoles, and the power and bandwidth levels are too high. It's interesting as insight into what could be coming from AMD, but this particular design doesn't suit consoles well at all.
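    Quick sanity check on those numbers, from the GCN layout (my own arithmetic, not an official breakdown):

```python
# GCN back-of-the-envelope: 64 CUs x 64 lanes, 2 FLOPs/lane/clock (FMA).
cus, lanes, clock_ghz = 64, 64, 1.8   # ~1.8 GHz peak clock (approximate)
fp32_tflops = cus * lanes * 2 * clock_ghz / 1000
print(f"FP32: ~{fp32_tflops:.1f} TF")           # ~14.7 TF

bandwidth_bytes_per_s = 1.0e12                  # 1 TB/s of HBM2
bytes_per_flop = bandwidth_bytes_per_s / (fp32_tflops * 1e12)
print(f"~{bytes_per_flop:.3f} B/FLOP")          # ~0.068 B/FLOP
# Keeping ML workloads fed is what drives the 1 TB/s (and the 300W),
# and that's exactly the part that doesn't fit a console budget.
```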
     
    vipa899 likes this.
  4. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,155
    Likes Received:
    1,046
    Location:
    Earth
    If someone took GPUs and made a graph with time on one axis and raw FP32 flops on the other, I bet the curve would look sad. Even sadder if flops were normalized per watt... Of course, things could look different if specific accelerators like ray tracing or tensor cores were accounted for, or if we could somehow factor in the efficiency/programmability of those flops.
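    For illustration, using AMD's single-GPU flagships with their publicly quoted peak FP32 and board power (figures from memory, so treat them as approximate):

```python
# (name, year, peak FP32 TFLOPS, board power in watts) -- approximate.
gpus = [
    ("HD 7970", 2012,  3.8, 250),
    ("R9 290X", 2013,  5.6, 290),
    ("Fury X",  2015,  8.6, 275),
    ("Vega 64", 2017, 12.7, 295),
    ("MI60",    2018, 14.7, 300),
]
for name, year, tflops, watts in gpus:
    print(f"{year} {name:8s} {tflops:5.1f} TF  {tflops / watts * 1000:5.1f} GFLOPS/W")
# Raw FP32 roughly doubles every ~3-3.5 years; GFLOPS/W improves too,
# but far more slowly than the headline TFLOPS numbers suggest.
```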
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,152
    Likes Received:
    3,055
    Location:
    Finland
    If we look at "modern GPUs", in the sense of GCN from AMD and Fermi and newer from NVIDIA, FP32 looks like this (using reported boost clocks where applicable, and only single-GPU top models of each generation). Not normalized for perf/W:

    AMD:
    [attached graph: AMD FP32 TFLOPS over time]

    And NVIDIA:
    [attached graph: NVIDIA FP32 TFLOPS over time]

    edit: updated AMD graph with MI60
     


    #3405 Kaotik, Nov 7, 2018
    Last edited: Nov 7, 2018
    Entropy, egoless, turkey and 2 others like this.
  6. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,368
    Likes Received:
    2,886
    Location:
    Wrong thread
    The MI60 may not yet have the full suite of power-saving features planned for Navi, and it may be pushing higher clocks on an architecture not fully tuned to take advantage of TSMC's 7nm process (Vega and its predecessor Polaris were designed for GF's 14nm).

    Given architectures designed for, and further evolved on, TSMC's tools and processes, I think we'll see better perf/watt and possibly even higher densities.

    I'm excited about the possibility of tiered consoles with the same off-the-shelf CPU chiplet connected to the main GPU/NB/SB/memory-controller die via Infinity Fabric. That might just lead to reduced costs for the customised main chip. Hell, with an organic substrate supporting low-cost HBM, you might even make sharing the same main board and main memory arrangement possible ...
     
  7. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,155
    Likes Received:
    1,046
    Location:
    Earth
    For AMD it looks like flops double roughly every 3.5 years. Assuming that holds true and also applies in the console space, it would give a good estimate of what to expect for a next-gen console, assuming a similar box size and cooling solution (rough sketch below).
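    A sketch of that extrapolation, assuming the trend transfers to a fixed console power and volume budget (the baseline choice is mine, not part of the original claim):

```python
# "Flops double every ~3.5 years", projected forward from a console
# baseline. Using the Xbox One X (2017, 6.0 TF) as that baseline is
# my assumption.
def projected_tflops(base_tflops, base_year, target_year, doubling_years=3.5):
    return base_tflops * 2 ** ((target_year - base_year) / doubling_years)

for year in (2019, 2020, 2021):
    print(year, f"~{projected_tflops(6.0, 2017, year):.1f} TF")
# 2019 ~8.9, 2020 ~10.9, 2021 ~13.2 -- broadly in line with the
# 9-12 TF guesses that follow.
```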
     
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,889
    Likes Received:
    11,008
    Location:
    The North
    We'll likely be on the same 7nm node for next gen. Clocks will be slower, and the chip will have redundancy built in. What can we realistically expect here?
     
  9. msia2k75

    Regular Newcomer

    Joined:
    Jul 26, 2005
    Messages:
    326
    Likes Received:
    29
    It would mean a 9-10TF...
     
  10. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,368
    Likes Received:
    2,886
    Location:
    Wrong thread
    Well, I certainly can't say with any certainty, but I did make the point about power-saving / clock-boosting architectural features, and about building for a given process line ....
     
  11. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,889
    Likes Received:
    11,008
    Location:
    The North
    So greater than 6 but less than 14 ;)
     
  12. Xbat

    Veteran Newcomer

    Joined:
    Jan 31, 2013
    Messages:
    1,458
    Likes Received:
    1,047
    Location:
    A farm in the middle of nowhere
    It depends on how big a step up Navi is going to be. Is it going to be more or less GCN architecture, or is it going to be something next gen? I also feel 10 TFlops is most likely, but if Navi has some "magic" we might get 12 TFlops.
     
    Heinrich4 and vipa899 like this.
  13. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,368
    Likes Received:
    2,886
    Location:
    Wrong thread
    Something like that! :p
     
  14. vipa899

    Regular Newcomer

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    354
    Location:
    Sweden
    Nvidia's approach is fixed function today, yes, but by 2021 Nvidia will probably come out with something more flexible, as happened when the GeForce 3's programmable vertex/pixel shaders superseded fixed function.

    On console, developers will do that tuning for you to gain performance.
     
    Tkumpathenurpahl likes this.
  15. OCASM

    Regular Newcomer

    Joined:
    Nov 12, 2016
    Messages:
    921
    Likes Received:
    874
    Crippling the speed of the hardware to accommodate a handful of experimental games with dubious commercial potential doesn't seem like a sound decision to me, especially in the console space.
     
    BRiT likes this.
  16. Tkumpathenurpahl

    Tkumpathenurpahl Oil my grapes.
    Veteran Newcomer

    Joined:
    Apr 3, 2016
    Messages:
    1,558
    Likes Received:
    1,401
    Isn't "crippling" a bit excessive? Something like 10-12 TF of performance seems a reasonable expectation, based on assumed die sizes for a console and the known die size of the MI60. We'll have to see what performance it reaches at lower clock speeds before we have a clear indication of performance at console wattage, but the current 14.7 TF for a GPU that was meant as a 7nm pipe cleaner bodes well IMO. We'll have to wait and see, but hopefully there's truth to the reports of some Zen engineers moving over to work on Navi, and hopefully that pays off in the form of improved perf/watt.

    But let's be conservative and assume a 10TF GPU, which can have any ratio of performance dedicated to rasterisation, ML, and RT. Would that flexibility really hamper it so much?

    I can imagine that all three might be somewhat compromised relative to fixed-function hardware, but does the performance penalty still leave a functional RTRT GPU? To my knowledge, that's an unknown.

    Is the penalty to the size of the CUs worth it compared to just having fixed-function hardware? The MI60 suggests that AMD might think so.

    And if the answer to both of those is yes, does it make it any cheaper to manufacture? Potentially, if AMD are able to just deploy different numbers of differently clocked multi-function CUs to satisfy all of their markets.

    RTRT is still nascent in gaming, but it can't hurt to have the entire core gaming industry beavering away on solutions to it. It'll just be the way it always is: lower quality on the consoles.

    And if any developers don't want to, that's fine, they can act as if they have a 10TF PS4/XB1 with plenty more memory and a way better CPU.

    Edit: I also think it's worth mentioning that Sony's new CEO mentioned ease of manufacturing as key for the PS5 - I'll try to find the source. Considering "ease of manufacturing", AMD's recent chiplet Zen 2 design, their continued push for Infinity Fabric, and their multi-purpose CUs, I'm convinced that, if the CUs can be modified to be capable of RT, that's the approach AMD would want to take regardless of developer convenience.
     
    #3416 Tkumpathenurpahl, Nov 8, 2018
    Last edited: Nov 8, 2018
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,889
    Likes Received:
    11,008
    Location:
    The North
    For the sake of this discussion going forward, we really should be calling it HRT, for hybrid ray tracing. "Real-time ray tracing" imo implies an entirely ray-traced image, and is probably not an accurate description of the technology itself.

    That being said, AMD has yet to reveal how it intends to accelerate ray tracing. The ML solution is interesting, its strength lying in 32- and 64-bit precision. On the FP16 front, though, it's going to be beaten by tensor cores. INT4 and INT8 are interesting, but perhaps unnecessary. These are all capabilities useful in the enterprise space, but I'm unsure how useful they are in the game space.

    That being said, if there is no need for INT4/INT8 and FP64 in the gaming space, perhaps the CUs can shrink (rough rate scaling below).
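    For reference, here's how the rates scale with precision on MI60-style vector ALUs; the multipliers follow AMD's published ratios, anchored to the ~14.7 TF FP32 figure discussed above:

```python
# Peak rate scaling on MI60-class vector ALUs: each halving of
# precision doubles throughput (packed math), with FP64 at a 1:2 rate.
fp32_tflops = 14.7
rates = {
    "FP64": fp32_tflops / 2,    # ~7.4 TF
    "FP32": fp32_tflops,        # ~14.7 TF
    "FP16": fp32_tflops * 2,    # ~29.5 TF
    "INT8": fp32_tflops * 4,    # ~59 TOPS
    "INT4": fp32_tflops * 8,    # ~118 TOPS
}
for precision, rate in rates.items():
    print(f"{precision}: ~{rate:.1f} T(FL)OPS")
# Tensor cores beat this at FP16 by adding dedicated matrix units on
# top of the vector rate; dropping FP64/INT4/INT8 support is where a
# gaming CU could claw back area.
```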
     
    OCASM and Shifty Geezer like this.
  18. beyondtest

    Newcomer

    Joined:
    Jun 3, 2018
    Messages:
    58
    Likes Received:
    13
    PS5 Leak rumor

    isn't dlss nvidia though?

    6C/12T CPU
    11.2 TF
    16 GB GDDR6
     
    Heinrich4 likes this.
  19. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    Yes, they wouldn't call it DLSS, but there's a bigger pointer in the presentation clearly showing that it's a fake: 3640 shader cores can't be divided by 64 to get a whole CU count. Someone put so much effort into this fake, yet didn't have the basic architectural knowledge to avoid such a mistake.
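    The check being applied, for anyone following along (a one-liner, given GCN's 64-wide CUs):

```python
# GCN compute units are 64 shaders wide, so any real GCN part has a
# shader count that's a multiple of 64.
shaders = 3640
cus, remainder = divmod(shaders, 64)
print(cus, remainder)   # 56 CUs with 56 shaders left over -> impossible
```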
     
    milk and chris1515 like this.
  20. goonergaz

    Veteran

    Joined:
    Jun 3, 2005
    Messages:
    3,634
    Likes Received:
    1,061
    You know, the only thing that makes me feel this is fake is that number 5...it's the wrong font. Otherwise this is a really good fake.
     
