AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,688
    Likes Received:
    5,985
    That’s not how adoption works. No hardware base; no games will be coded for it.
    It has to be released; early adopters have always led the charge for new features that eventually trickle down.

    There are certainly more games using RT than I have seen using their other new features.
     
  2. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,688
    Likes Received:
    5,985
    Though, this is what DXR is for. I can't see different implementations necessarily causing optimization issues. In layman's view it really is just a call to project rays and return the ones that intersected; following that, you run the shader against the intersected triangles, where your usual shader optimizations would apply.
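    Since the above describes the flow in layman's terms, here's a minimal, self-contained sketch of it in plain C++: cast a ray, find the closest intersected triangle, then hand the hit to shading. Everything in it (the types, the brute-force TraceClosestHit standing in for the driver/hardware acceleration-structure traversal) is illustrative and is not the DXR API, which is actually driven through D3D12 and HLSL shaders.

```cpp
// Illustrative sketch of "project rays, return intersections, then shade"
// in plain C++. A brute-force loop stands in for the vendor-specific BVH
// traversal that DXR hides behind its trace call; this is NOT the DXR API.
#include <cmath>
#include <cstdio>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Ray      { Vec3 origin, dir; };
struct Triangle { Vec3 v0, v1, v2; };
struct Hit      { float t; size_t tri; };

// Moller-Trumbore ray/triangle test; returns the distance t along the ray, if any.
static std::optional<float> Intersect(const Ray& r, const Triangle& tri) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(r.dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return std::nullopt;   // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(r.origin, tri.v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return std::nullopt;
    Vec3 q = cross(s, e1);
    float v = dot(r.dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;
    float t = dot(e2, q) * inv;
    if (t <= eps) return std::nullopt;               // intersection behind the origin
    return t;
}

// The "black box" part: return the closest hit for a ray. Real implementations
// walk an acceleration structure; the API only promises the nearest intersection.
static std::optional<Hit> TraceClosestHit(const Ray& r, const std::vector<Triangle>& scene) {
    std::optional<Hit> best;
    for (size_t i = 0; i < scene.size(); ++i)
        if (auto t = Intersect(r, scene[i]))
            if (!best || *t < best->t) best = Hit{*t, i};
    return best;
}

int main() {
    std::vector<Triangle> scene = {{{-1,-1,5}, {1,-1,5}, {0,1,5}}};
    Ray ray{{0,0,0}, {0,0,1}};
    if (auto hit = TraceClosestHit(ray, scene))      // "project rays, return intersections"
        std::printf("hit triangle %zu at t=%.2f -> run the hit shader here\n", hit->tri, hit->t);
    else
        std::printf("miss -> run the miss shader here\n");
}
```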
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    That was the case for Fury and Vega, but with Navi it seems AMD is dialing back the raw flops in favor of efficiency. With the launch of Turing, is AMD still considered to have the better compute architecture?
     
    xpea likes this.
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,114
    Likes Received:
    1,813
    Location:
    Finland
    I can't see MS wanting a custom solution over what AMD will put out in the PC space, since they're going to run the same APIs etc. anyway, and with Sony leaning so heavily on "what devs want" last gen, I'm hoping they continue along the same line and use the same thing, too.
     
    milk, iroboto and BRiT like this.
  5. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    394
    Likes Received:
    475
    I don't know.
    The latest HW I have compared was Fury X vs. GTX 1070. Fury was 1.6 times faster, so one AMD TF was better than one NV TF. For me. But NV is catching up. (The R9 280X was a whole 5 times faster than the GTX 670, despite similar fps in games.)
    So I guess, looking at TF, there is actually little difference now, because Turing has some interesting improvements on compute?
    But the 5700 XT has 9 TF and its competitor, the RTX 2070, has 7.5 TF. So even though Navi has doubled the ROP count, I personally hope it's still as strong in compute as GCN.
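    To make the per-TF comparison concrete, here is the arithmetic as a tiny program. The 1.6x speedup and the 9 TF / 7.5 TF figures are from the post above; the peak-TFLOPS values for the Fury X (~8.6) and GTX 1070 (~6.5) are approximate numbers I'm assuming, so treat the ratios as rough illustration only.

```cpp
// Rough "performance per TFLOP" comparison: divide a measured speedup by the
// ratio of peak FP32 throughput. The speedup and the 9 / 7.5 TF figures come
// from the post; the Fury X / GTX 1070 TFLOPS values are approximate assumptions.
#include <cstdio>

int main() {
    const double measured_speedup = 1.6;   // Fury X vs GTX 1070 in the poster's compute workloads
    const double tf_fury_x        = 8.6;   // assumed peak FP32 TFLOPS
    const double tf_gtx_1070      = 6.5;   // assumed peak FP32 TFLOPS

    // If the Fury X is 1.6x faster while only having ~1.32x the peak TFLOPS,
    // each AMD TFLOP delivered roughly 1.2x the work of an NV TFLOP here.
    double per_tf_advantage = measured_speedup / (tf_fury_x / tf_gtx_1070);
    std::printf("Fury X advantage per TFLOP vs GTX 1070: %.2fx\n", per_tf_advantage);

    // The raw-throughput gap between the cards mentioned above:
    std::printf("5700 XT vs RTX 2070, peak TFLOPS ratio: %.2fx\n", 9.0 / 7.5);
}
```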
     
  6. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,688
    Likes Received:
    5,985
    I don't necessarily know if they would know how to design it either.
    Even with forward knowledge of what was coming in DX12, they could not get the XBO to FL 12_1, or the XBX for that matter.
    They can probably request things to be put in, but I'm not sure that means they have the engineers who could build something like a custom RT solution; they would likely have to bring in more engineers and pay a greater expense to try to fit it in (if the architecture would even support that level of customization). I mean, whole companies (imgtec) have put the entire company behind doing it.

    I very much doubt building a custom solution for RT that works in harmony with the rest of the GPU is very easy.
     
    Kaotik, pharma and JoeJ like this.
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    RTX 2070 is end of life. The 5700XT's competition is the 9TF 2070 SUPER DUPER edition.
     
  8. snarfbot

    Regular Newcomer

    Joined:
    Apr 23, 2007
    Messages:
    512
    Likes Received:
    179
    Turing caught up to GCN somewhat in terms of async compute, at least in graphics workloads. That said, Navi has its own improvements to graphics workloads with single-cycle SIMD32.
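    As a back-of-the-envelope illustration of the single-cycle SIMD32 point (assuming the commonly described layouts: GCN issues a 64-wide wavefront over a 16-lane SIMD across four cycles, while RDNA's 32-lane SIMDs issue wave32 in one):

```cpp
// Cycles a SIMD needs to issue one vector instruction across a full wavefront,
// for the commonly described GCN (SIMD16 / wave64) and RDNA (SIMD32) layouts.
#include <cstdio>

static int issueCycles(int waveSize, int simdWidth) {
    return (waveSize + simdWidth - 1) / simdWidth;   // ceil(wave size / lane count)
}

int main() {
    std::printf("GCN,  wave64 on a 16-lane SIMD: %d cycles\n", issueCycles(64, 16)); // 4
    std::printf("RDNA, wave32 on a 32-lane SIMD: %d cycle\n",  issueCycles(32, 32)); // 1
    std::printf("RDNA, wave64 on a 32-lane SIMD: %d cycles\n", issueCycles(64, 32)); // 2
}
```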
     
  9. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,162
    Likes Received:
    8,316
    Location:
    Cleveland
  10. Xbat

    Regular Newcomer

    Joined:
    Jan 31, 2013
    Messages:
    785
    Likes Received:
    441
    Location:
    A farm in the middle of nowhere
    Where did I say anything about the way it works? I was talking about it from a consumer's point of view. Someone brought up that AMD might be competitive from a ray tracing point of view next year.

    Then DavidGraham made a post about how he would rather have a CPU and GPU from 2025, which I then pointed out as being a bad comparison; it's not such a big stretch to wait a year for tech.

    Nowhere did I have a dig at Nvidia or say anything in regard to their tech or them releasing in 2018.
     
    CaptainGinger and DmitryKo like this.
  11. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,858
    Likes Received:
    2,384
    You guys just wait for it. Because I'm grabbing a gpu from 2050 and it will blow your 2025 gpus away!
     
  12. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    478
    Location:
    55°38′33″ N, 37°28′37″ E
    AMD just needs to set the right price/performance ratio, all while providing decent (i.e. comparable) raytracing performance for current-gen titles.

    Though they'd probably have a bit better performance owing to updates/optimizations in the raytracing API specs... like (pure speculation here) the ability to consume meshlets (introduced with the mesh shader geometry pipeline) for BVH generation, which may also be useful for implementing geometry shaders / geometry LOD...

    Here and now, they don't have DXR, take it or leave it. Please come back mid-2020 (or 2025, if you will).

    Maybe so, but it's still an APU part - so there will be compromises in die area, memory bandwidth and performance/watt in comparison to high-end desktop GPU part.

    I'd think AMD will use some form of heterogeneous integration to put multiple graphics dies, CPU dies, HBM3 dies, and flash memory dies on the same chip package, and scale the number of these dies according to price/performance point… oh right, yes it's only wishful thinking. :cool2:
     
    #1192 DmitryKo, Jul 4, 2019
    Last edited: Jul 4, 2019
    DavidGraham likes this.
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,815
    Location:
    Well within 3d
    The desktop market is in a bit of a lull these days with respect to performance for the price paid. The tiers below enthusiast and high-end tend to be the most cost-sensitive, while there is more opportunity to extract revenue beyond the performance improvement in the early-adopter and bleeding-edge market.
    Being competitive in a range below the leading edge means not distancing itself significantly from existing inventory or already-satisfied demand, by virtue of arriving after many buyers have already bought marginally lower-performing chips or have the opportunity to buy them at a discount.
    AMD's not unique in this, but I think the timing and cost structure give it a less forgiving position.
    Nvidia's numbers seem to be dropping as well, and that may be in part due to this, if it can be teased out from the pricing effect. Perhaps adding something new like RT was part of a ploy to have the newer generation differentiate itself more within the constraints of manufacturing and power at hand.


    That would seem to be the choice made by the compiler, but that doesn't point to the hardware needing this.
    A SIMD can host up to 10 wavefronts, which requires an average allocation of 24 or fewer registers per wavefront. That is the coarsest granularity the hardware must be able to handle, and AMD's documentation gives the actual granularity as 4 or 8 registers.
    I'm not following what you mean by having 4x64 when discussing the register budget for a wavefront. A single wavefront can address up to 256 registers, and to match, each SIMD has that many of its own.
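    For anyone following the arithmetic, here's a small sketch of the register-budget math under the usual GCN assumptions (256 VGPRs addressable per lane per SIMD, a cap of 10 wavefronts, and an allocation granularity of 4 registers; some parts use 8):

```cpp
// Max concurrent wavefronts per GCN SIMD as a function of VGPR usage.
// Assumes 256 VGPRs per lane per SIMD, a 10-wavefront cap, and an allocation
// granularity of 4 registers (8 on some parts) -- adjust for other hardware.
#include <algorithm>
#include <cstdio>

static int wavesPerSimd(int vgprsUsed, int granularity = 4) {
    const int totalVgprs = 256;
    const int maxWaves   = 10;
    int allocated = ((vgprsUsed + granularity - 1) / granularity) * granularity;
    return std::min(maxWaves, totalVgprs / allocated);
}

int main() {
    // 24 VGPRs is the largest allocation that still fits all 10 wavefronts:
    // 10 * 24 = 240 <= 256, while the next step up (28) would need 280.
    for (int v : {24, 25, 32, 64, 65, 128, 256})
        std::printf("%3d VGPRs -> %2d wavefront(s) per SIMD\n", v, wavesPerSimd(v));
}
```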



    I ran across a link, perhaps on another board or on reddit, to something from Nvidia, which might be a better starting point than using AMD's decisions to speculate about Nvidia.
    http://www.freepatentsonline.com/y2016/0070820.html or perhaps http://www.freepatentsonline.com/9582607.html

    There's a stack-based traversal block containing a unit which evaluates nodes and decides on traversal direction, like AMD's, but there's also additional logic that performs the looping that AMD's method passes back to the SIMD hardware.
    There may also be some memory compression of the BVH handled by this path.
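    For readers following along, below is a generic sketch of the kind of stack-based traversal loop being described: pop a node, test the ray against it, push the children to visit, and only test triangles at the leaves. The node layout and the two helper tests (declared but not defined here; a slab test and a Moller-Trumbore routine would slot in) are illustrative assumptions, not either vendor's actual design.

```cpp
// Generic stack-based BVH traversal: the looping that a dedicated traversal
// block could keep internal, versus handing each node decision back to the
// SIMDs. Purely illustrative; not AMD's or Nvidia's actual hardware path.
#include <cstdint>
#include <vector>

struct Ray { float origin[3], dir[3]; float tMax; };

struct BvhNode {
    float   boundsMin[3], boundsMax[3];
    int32_t leftChild;       // index of the first of two children; -1 for a leaf
    int32_t firstTriangle;   // valid when leaf
    int32_t triangleCount;   // valid when leaf
};

// Helper tests assumed to be supplied elsewhere (e.g. a slab test for the box,
// and a ray/triangle routine returning t, or a negative value on a miss).
bool  IntersectBox(const Ray& ray, const BvhNode& node);
float IntersectTriangle(const Ray& ray, int triangleIndex);

struct HitResult { float t = -1.0f; int triangle = -1; };

HitResult TraverseClosestHit(const std::vector<BvhNode>& nodes, Ray ray)
{
    HitResult closest;
    int32_t stack[64];            // small fixed-size stack; assumes modest tree depth
    int     top = 0;
    stack[top++] = 0;             // start at the root

    while (top > 0) {             // <- the looping under discussion
        const BvhNode& node = nodes[stack[--top]];
        if (!IntersectBox(ray, node))
            continue;             // prune this subtree

        if (node.leftChild < 0) { // leaf: test its triangles
            for (int i = 0; i < node.triangleCount; ++i) {
                float t = IntersectTriangle(ray, node.firstTriangle + i);
                if (t > 0.0f && t < ray.tMax) {
                    ray.tMax = t; // shrink the ray: closest-hit search
                    closest = {t, node.firstTriangle + i};
                }
            }
        } else {                  // internal node: push both children
            stack[top++] = node.leftChild;
            stack[top++] = node.leftChild + 1;
        }
    }
    return closest;               // hit shading is invoked after traversal finishes
}
```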

    From the above, it seems like the SM's local cache hierarchy would be separate from the L0 in the traversal block.

    Possibly also more power efficiency for Nvidia. The AMD method has to re-expand its work at every node back to the width of a wavefront, and it involves at minimum several accesses to the full-width register file.

    At least for now, no clear replacements for the order guarantees or optimizations like the Z-buffer and other hardware present themselves. Nvidia is counting on the areas where rasterization is very efficient to remain very efficient, lest they lose the power/compute budget spare room that RT is being inserted into.

    At least from a feature perspective and dynamic allocation, I think Pascal might have had similar checkboxes to early GCN. There are some features that AMD touts for later generations, though how many are broadly applicable or noticeable hasn't been clearly tested. The latency-oriented ones seem to be focused on VR or audio, although I'm not sure recent Nvidia chips have garnered many complaints for the former and I'm not sure many care for the latter.
     
    shiznit, anexanhume, JoeJ and 4 others like this.
  14. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,688
    Likes Received:
    5,985
    Right, sorry, I didn't follow how that was related. I thought you said he was being hypocritical for buying RTX now, before there was any reason to, when it's clearly better to buy later when more software will finally be available.

    The topic you guys are on can probably be solved using some formulas; Monte Carlo ones come to mind, IIRC. It tends to come up in decision-making algorithms, like discounting the value of future moves relative to present moves, because so much can change between now and then.

    In your argument you guys sort of stand in different positions. If you don't own a ray tracing card today, then the present value is very high, and it becomes more valuable as more games with RT are released. Especially if a game you want to play has RT and is released soonish.

    If you wait until there is vastly better RT later, you are left without it for a significant amount of time so the value degrades. Games tend to have the most value while they are fresh and new.

    There is some discounting of value having to wait a year, but not anywhere close to discounting 2025 hardware. I would agree that waiting for Navi is reasonable especially if the price point is low.
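    Purely to illustrate the shape of that discounting argument (all of the numbers here are made up; the only point is that waiting one year costs far less value than waiting until 2025):

```cpp
// Toy "present value" of a GPU feature you have to wait for: discount its
// value by how long you go without it. All numbers are invented; this only
// shows why a one-year wait discounts far less than waiting until 2025.
#include <cmath>
#include <cstdio>

static double presentValue(double value, double annualDiscount, double yearsToWait) {
    return value * std::pow(1.0 - annualDiscount, yearsToWait);
}

int main() {
    const double value    = 100.0;  // arbitrary utility of having RT in the games you play
    const double discount = 0.30;   // assumed: a game's freshness fades ~30% per year

    std::printf("Buy RT hardware now:           %5.1f\n", presentValue(value, discount, 0.0));
    std::printf("Wait about a year (e.g. Navi): %5.1f\n", presentValue(value, discount, 1.0));
    std::printf("Wait until 2025 (~6 years):    %5.1f\n", presentValue(value, discount, 6.0));
}
```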
     
  15. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    855
    Likes Received:
    258
    My kernel is 8x8x1. We have 65536 VGPRs per CU, or 16384 per SIMD. This yields 256 VGPRs per lane (16k/64), with no other wavefront to run in addition. Or 128 for 2 wavefronts, etc.
    The 4x notation was a thought-mixup; I thought I could fold the 4-step cadence / 4-way banking into the register count, but that doesn't make much sense this way.
    I checked LDS utilization, and I don't think that's what triggers the compiler to bloat usage to 65. So odd.
     
  16. Per Lindstrom

    Newcomer Subscriber

    Joined:
    Oct 16, 2018
    Messages:
    18
    Likes Received:
    13
    JoeJ likes this.
  17. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    788
    Likes Received:
    74
    Location:
    'Zona
  18. sniffy

    Newcomer

    Joined:
    Nov 14, 2014
    Messages:
    31
    Likes Received:
    20
    Performance/gate is not a good way to compare architectures. None of those metrics is, because it is foolish to try to explain something complicated with a single number. Nvidia have spent transistors in Turing to enable features that Navi is simply not capable of (or would do very slowly). It also ignores other properties of an architecture, such as how well it scales upward (hopefully big Navi will fare better in this way).

    The only way to properly compare architectures is to compare how they are implemented in the real world, and that is looking like a clear win for Turing. I do agree that they have made up a lot of ground with Navi, though. Vega was just horrific, so I am pleased with what they've managed to do.
     
    #1198 sniffy, Jul 5, 2019
    Last edited: Jul 5, 2019
  19. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,738
    Likes Received:
    4,406
    Looks like AMD has been producing Navi 10 graphics cards like Southeast Asians grow rice.

    I don't remember OCUK ever claiming to have this amount of cards on day one.

    https://forums.overclockers.co.uk/posts/32841426/

     
    DmitryKo, Lightman and Per Lindstrom like this.
  20. eloyc

    Veteran Regular

    Joined:
    Jan 23, 2009
    Messages:
    1,894
    Likes Received:
    1,086
    I haven't seen it posted before (sorry if this is repeated info):
    https://www.dsogaming.com/news/amd-...-ray-tracing-technology-shares-first-details/
     