AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. T2098

    Newcomer

    Joined:
    Jun 15, 2020
    Messages:
    55
    Likes Received:
    115
    AMD's done exactly that a few times before and caused some significant heartache for their board partners - most recently:

    https://www.anandtech.com/show/15422/the-amd-radeon-rx-5600-xt-review/2
    https://www.pcgamer.com/amds-last-minute-5600-xt-bios-update-feels-like-a-bait-and-switch/

    You're right in that they can't change the silicon or board layout but there's an awful lot of latitude to make changes to V/F curves and clocks in the vBIOS.

    And not all cards were stable with the 'revised' vBIOS and higher clocks, which also led to some unpleasantness for consumers who bought a 5600XT and expected it to perform exactly the way all the review samples did:

    https://www.igorslab.de/en/radeon-r...mits-and-benchmark-morepowertool-tutorial/12/
     
  2. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43
    FWIW, for the raytracing hardware I think you're still limited by the L0 cache only being able to deliver 1 cacheline per CU per cycle (unless they double that, of course, but that sounds fairly costly), so raytracing and texture sampling compete there. Furthermore, due to bad cache access patterns, the raytracing might be thrashing the cache.

    The other thing I'd not be so sure about is texture sampling and raytracing happening in the texture units at the same time. The way it has been described as using only minimal die area sounds to me like they might be reusing some of the texture sampler parts for both purposes, in which case you can't do them in parallel. Though I think the signal here is very weak.
     
  3. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    Any date for the RX 6000 reviews?
     
  4. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    While true, that was a different case: it was a derivative product built on cut-down silicon, using the same 5700 PCB if I am not mistaken. Changing a vBIOS and power target is a lot easier in that scenario than with a completely new part. It was a reaction to the 2060 price cut after the 5600XT was announced, and AMD really did leave it a bit too last-minute. Here, NV's cards have all been laid out in advance.

    Either way, the point is that AMD was very clear that it was aiming for the high end of the market and would have designed the product accordingly. Claiming that they changed it only as a reaction to Ampere is grasping at straws, for no reason.
    If Zen 3 is anything to go by, reviews would only be up by the date it is available for sale. With a rumored mid-November release for RX6000, I'd expect reviews around or just before launch day.
     
    Cuthalu and Lightman like this.
  5. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    415
    Likes Received:
    379
    Sampling does compete with intersection for the write-back data path, but if BVH traversal alone can keep the memory hierarchy busy (pointer chasing, being bandwidth bound), does this specific limitation matter?

    Also, I would say cache-thrashing access patterns like traversal are nothing new for GPUs; use of higher-definition textures, say, can thrash L0 caches hard too.
     
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    It matters if each traversal step needs to be taken on the CUs, i.e. the pointer chasing happens in shader code while the intersection HW/texture unit simply tells whether a ray intersected one or more nodes of the BVH. If this is the way it works, then it requires a constant back and forth between CUs and texture/intersection units.
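
    A minimal sketch of that back-and-forth, assuming the scheme described above: a shader-side loop does the pointer chasing with a stack, while a hypothetical `intersect_box` function stands in for the intersection hardware. The `Node` layout, the function names and the `round_trips` counter are illustrative assumptions, not AMD's actual design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    lo: Vec3                       # box minimum corner
    hi: Vec3                       # box maximum corner
    children: List[int] = field(default_factory=list)  # indices into the node array
    leaf_id: Optional[int] = None  # set only on leaf nodes

def intersect_box(origin: Vec3, direction: Vec3, node: Node) -> bool:
    """Hypothetical stand-in for the fixed-function ray/box test (slab method)."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, node.lo, node.hi):
        if d == 0.0:
            # Ray is parallel to this slab: it misses unless the origin lies inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far

def traverse(nodes: List[Node], origin: Vec3, direction: Vec3):
    """Shader-side loop: pop a node, query the 'hardware', push children on a hit."""
    stack, hits, round_trips = [0], [], 0
    while stack:
        node = nodes[stack.pop()]
        round_trips += 1           # one CU <-> intersection unit exchange per node
        if not intersect_box(origin, direction, node):
            continue
        if node.leaf_id is not None:
            hits.append(node.leaf_id)
        else:
            stack.extend(node.children)
    return hits, round_trips
```

    Every node visited costs one exchange between the loop and the box test, which is the traffic that would compete with texture sampling if both share the same path.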

    (minified) high res textures don't thrash the texture caches unless mip maps are absent or mip mapping is disabled.
     
  7. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43
    I'm not talking about the write-back path; I'm talking about the L0 cache <-> texture unit path (i.e. already part of the memory hierarchy), or even further up if there is a cache miss. The write back to registers should indeed not be much of a problem.

    Also, I don't think higher definition textures generally thrash the cache badly, or at least not worse than low resolution textures, as long as the textures have the appropriate mipmaps (which pretty much everyone uses these days).

    There is plenty of other stuff that can thrash L0 caches though, especially post-processing steps that try to access close-by but not directly neighboring pixels.
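
    A toy illustration of that last point, under loud assumptions: a tiny fully-associative LRU cache, a row-major (linear) image layout and made-up sizes. Real GPU caches and tiled texture layouts behave differently, so treat `hit_rate`, `taps` and all the numbers as hypothetical.

```python
from collections import OrderedDict

def hit_rate(addresses, num_lines=4, line_size=16):
    """Fraction of accesses served by a tiny fully-associative LRU cache."""
    cache = OrderedDict()
    hits = total = 0
    for addr in addresses:
        total += 1
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0

def taps(width, row, xs, offsets):
    """Row-major texel addresses for each pixel in a row plus its (dx, dy) taps."""
    for x in xs:
        for dx, dy in offsets:
            yield (row + dy) * width + (x + dx)

# Direct neighbours in the same row versus taps spread over five different rows.
neighbours = [(0, 0), (1, 0), (-1, 0)]
spread = [(0, -4), (0, -2), (0, 0), (0, 2), (0, 4)]

near = hit_rate(taps(1024, 8, range(16, 48), neighbours))
far = hit_rate(taps(1024, 8, range(16, 48), spread))
```

    With these assumed sizes the neighbouring taps mostly hit, while the spread taps touch more lines per pixel than the cache holds and keep evicting each other.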
     
  8. SimBy

    Regular

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    64CUs? That's a big cut. But kind of makes sense I guess. The gap between 40 and 72 was huge.
     
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    It can't be smaller, I think, if there are 4 SEs: they can only disable the same number of WGPs in each SE.
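
    The arithmetic behind that, as a small sketch. It assumes a full die of 4 SEs with 10 WGPs each (80 CUs), which is not stated in this thread, and 2 CUs per WGP as on RDNA; `cu_configs` is a hypothetical helper.

```python
def cu_configs(shader_engines=4, wgps_per_se=10, cus_per_wgp=2, max_disabled_per_se=3):
    """CU totals when the same number of WGPs is disabled in every shader engine.

    The default die configuration is an assumption for illustration only.
    """
    return [
        shader_engines * (wgps_per_se - disabled) * cus_per_wgp
        for disabled in range(max_disabled_per_se + 1)
    ]

configs = cu_configs()  # one entry per number of WGPs disabled in each SE
```

    Under those assumptions each step removes 4 x 2 = 8 CUs, so the configuration after 72 CUs is 64 CUs, with nothing in between.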
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    TMU is pipelined, too. So the texture unit will process texels or ray trace queries interleaved if necessary.

    There's a patent document that talks about intra-CU producer-consumer scheduling of wavefronts. I would hope that this is applied to ray-traversal and ray-query-result shaders...

    Though I still believe that you will not see ray tracing and pixel shading concurrently running on the GPU. These will be separate passes, so only an overlap phase will be seen as one spins down and the other spins up.

    So the fact that texturing hardware is "dual-function" is immaterial from the point of view of pixel shading. Pixel shading will consume one or more buffers in VRAM (UAVs) that were produced by ray tracing passes (shadowing, global illumination, reflections, caustics, etc.).

    There's a swarm of patent documents on the subject of cache friendly CU scheduling...

    Yes, another subject of AMD's patent documents.

    In general CU scheduling and cache-friendly operations (such as L0s being able to snoop each other passively) appear to be part of RDNA. How much of that is new for RDNA 2, I can't tell.
     
    Lightman and BRiT like this.
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Compute shaders read textures too and they can certainly overlap with RT.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Compute shaders don't have the data per work item that pixel shaders have. So there's no way to map a texel to a triangle (there's no triangle) and there's no way to mip-map filter (because there's no triangle).

    Sure, a compute shader reads from memory, but that doesn't use the texture-processing/ray-intersection pipelines. I would expect ray-hit/miss (etc.) compute shaders to run on the same CU or same WGP as ray-intersection shaders. That's the producer-consumer model I was talking about earlier.
     
  13. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43
    Actually in the Linux driver they recently added some code to deal with disabled SEs. Not sure how complete it is and won't guarantee there is a SKU that really does this but ...

    Doesn't DXR 1.1 allow raytracing in all shader stages though?

    A compute shader can definitely use the texture-processing pipelines. Either by just always selecting lod 0, explicitly passing a LOD, using explicit derivatives, or even just using implicit derivatives (all it needs is shader invocations to be arranged in a quad pattern. No geometry needed).
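
    To illustrate the implicit-derivative case: with invocations arranged as a 2x2 quad, a LOD can be derived from finite differences of the texture coordinates alone. This is a simplified textbook formula, not necessarily what the hardware does, and `quad_lod` is a hypothetical name.

```python
import math

def quad_lod(uv_quad, tex_width, tex_height):
    """Mip level from finite differences across a 2x2 quad of invocations.

    uv_quad is ((uv00, uv10), (uv01, uv11)) in normalised coordinates:
    top-left, top-right, bottom-left, bottom-right.
    """
    (uv00, uv10), (uv01, _uv11) = uv_quad
    # Horizontal and vertical differences, scaled to texel units.
    ddx = ((uv10[0] - uv00[0]) * tex_width, (uv10[1] - uv00[1]) * tex_height)
    ddy = ((uv01[0] - uv00[0]) * tex_width, (uv01[1] - uv00[1]) * tex_height)
    rho = max(math.hypot(*ddx), math.hypot(*ddy))  # longest footprint edge in texels
    return max(0.0, math.log2(rho)) if rho > 0.0 else 0.0

# Neighbouring invocations step 4 texels apart on a 1024x1024 texture.
step = 4 / 1024
lod = quad_lod((((0.0, 0.0), (step, 0.0)), ((0.0, step), (step, step))), 1024, 1024)
```

    No triangle is involved anywhere: the quad arrangement of invocations is enough to get the differences.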

    Furthermore even without filtering, images always need format conversion which will use the texture-processing part of the texture unit. (tell-tale is that loads with a format always only have a throughput of 4 texels/cycle instead of up to 32 items/cycle for plain buffer loads on RDNA1)

    btw I'm interested in the talk about the intra-CU scheduling. Was that for tessellation or raytracing? Do you have some links to read up on it?
     
    Pete, BRiT, Jawed and 1 other person like this.
  14. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    Disabling a whole SE is not the same as disabling a different number of WGPs in each enabled SE though.
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
  16. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    208
    Likes Received:
    137
  17. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
  18. SimBy

    Regular

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    Lightman likes this.
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Interesting idea!

    Excellent stuff. I'm clearly out of the loop on recent shader models!

    COOPERATIVE WORKGROUP SCHEDULING AND CONTEXT PREFETCHING

    I went through a year's worth of patent documents yesterday and decided there are too many interesting ones to bother linking/summarising. It seems, while I've been "lazy" these last few years, very little heed has been paid to patent stuff.
     
    BRiT likes this.