AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. milk

    milk Like Verified
    Veteran

    Joined:
    Jun 6, 2012
    Messages:
    3,977
    Likes Received:
    4,101
But can any brain discern accurate caustics for that particular ocean wave pattern, sun position, depth, etc., from a cheap approximation like the one in GTA V, for example?
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    I haven’t played GTA V but any reasonable approximation should be passable.
     
  3. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
I mean, you could do 8K with any card, really. It just depends on the frame rates you're okay with and the quality per pixel.
     
    digitalwanderer and Lightman like this.
  4. Pressure

    Veteran

    Joined:
    Mar 30, 2004
    Messages:
    1,655
    Likes Received:
    593
    *If it has the required DP 1.4 or HDMI 2.1 connector.
     
  5. entity279

    Veteran Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,332
    Likes Received:
    500
    Location:
    Romania
No, I think the brain discards as much information as possible so that one can make important decisions (e.g. dodging stuff thrown at you) as fast as possible.
     
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Big load of AMD patents, some newer, some slightly older


    Includes:
    • Bandwidth saving architecture for scalable video coding - AMD
    • Real time on-chip texture decompression using shader processors
    • Matrix Multiplier With Submatrix Sequencing
    • Shared loads at compute units of a processor
    • Automatic configuration of knobs to optimize performance of a graphics pipeline
    • Pixel Wait Synchronization
    • Hint-based fine-grained dynamic voltage and frequency scaling in GPUs
    • Pipelined matrix multiplication at a graphics processing unit
    • Optimizing Primitive Shaders
    • Water tight ray triangle intersection without resorting to double precision
    • Graphics texture footprint discovery
    • Use of Workgroups in Pixel Shader
    • Efficient data path for ray triangle intersection
    • Robust Ray-triangle Intersection
    • Variable rate rendering based on motion estimation
    • Apparatus and method for providing workload distribution of threads among multiple compute units
    • Mechanism for supporting discard functionality in a ray tracing context
    • Merged data path for triangle and box intersection test in Ray Tracing
    • Variable Rate Shading
    • Raster Order View
    • Integration of variable rate shading and super-sample shading
    • Centroid selection for variable rate shading
     
    Newguy, Krteq, Lightman and 3 others like this.
  7. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    416
    Likes Received:
    379
Hmm, an ROV patent finally. It looks like a software implementation though, and it does not look to me like the hardware(?) solution in Vega and Navi 10.
     
  8. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
I think we should give up on the idea of ROVs altogether, and that seems to be AMD's opinion on the matter as well ...

To get even a remotely acceptable level of performance on an immediate-mode GPU architecture, it would involve storing/tracking the entire framebuffer/render target state in hardware, which would mean implementing a lot of dedicated on-chip memory to hold it all. The other option is designing a tile-based GPU, which automatically comes with a small amount of tile memory, but I don't think the architects will find that an acceptable solution either, since it would mean executing duplicated vertex shader invocations or potentially starving the GPU's shader execution units of work. Tile-based GPUs died out a decade ago in the desktop space for very good reasons ...

Just to give you an idea, two 1080p render targets consisting of colour+alpha (32 bits/4 bytes) and depth (32 bits/4 bytes) would total 16.588 MB of memory, which is over 4x bigger than Navi 10's L2 cache. That's not even counting the stencil bits, the MSAA case, higher resolutions, or needing multiple render targets/more bits per pixel either. You'd have to spend enormous amounts of die space to make a robust solution for ROVs, die space that could be better spent elsewhere ...
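The sizing in the post above is easy to reproduce; a quick sketch (the helper function is purely illustrative, not anything from an AMD patent):

```python
def framebuffer_bytes(width, height, bytes_per_pixel, num_targets=1):
    """Raw storage needed to keep num_targets full render targets on-chip."""
    return width * height * bytes_per_pixel * num_targets

# Two 1080p targets: 32-bit colour+alpha plus 32-bit depth (4 bytes each)
total = framebuffer_bytes(1920, 1080, 4, num_targets=2)
print(total / 1e6)  # 16.5888 MB, over 4x Navi 10's 4 MB L2
```

Scaling the same formula to 4K or to MSAA sample counts multiplies the total accordingly, which is the poster's point about die cost.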
     
  9. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    That’s absolutely not the case. I can’t get into details but on some IMRs it can be done efficiently (barring pathological cases) with little to no extra HW.
     
    milk, Lightman, pharma and 7 others like this.
  10. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
Would that be Intel HW, since it's never used outside of their demos?
     
  11. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    416
    Likes Received:
    379
    IIRC latest GPU architectures now all have tiled rasterizers, if not TBDR. Since ROV guarantees only the serialization at the same screen space pixel in API submission order, shouldn't the cost be capped by the rasterizer screen space tile size and the max prim concurrency of the executor (e.g. max 10 CUs in a shader array)? :???:
     
  12. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
Are you sure you aren't describing a tile-based GPU? I'm pretty sure mobile GPUs shade screen space tiles and desktop GPUs don't do that at all. All of the best solutions for ROVs/programmable blending involve storing framebuffer state in the hardware. Mobile GPUs have extremely low latency tile memory where they access/store a small portion of the framebuffer, which makes it trivial to implement ROVs on their HW. No modern desktop GPU does tile shading or has tile memory. A comparable solution for non-tiling architectures would be to have built-in memory storing all of the framebuffer rather than just a small portion of it, but this has a huge implementation cost in HW ...

I'm not even sure Nvidia is all that happy about ROVs from a performance perspective either. Hence we should follow AMD's recommendation and give up on ROVs altogether, because there seems to be little chance of making an acceptable implementation on discrete GPUs. Ultimately, the problem behind ROV performance is how well the HW can track framebuffer state. AMD HW tracks little to no state in hardware, so there's a huge performance cost for enabling ROVs regardless. Mobile GPUs can track some of this state for a given tile with reduced memory access latency, but as a consequence that model is not compatible with immediate-mode rendering. Then there's my proposal at the other extreme end of the spectrum, where we give the GPU generous amounts of on-chip memory to store multiple entire framebuffers' worth of state. That can potentially work with IMRs, but it comes with its own set of restrictions: adhering to the finite budget of on-chip memory will prove tricky in corner cases like MSAA, and switching framebuffers will also have a significant performance impact. Even this hypothetical restrictive IMR model still shares a couple of limitations with what we see on tilers ...

I'm not sure where Intel HW or Nvidia HW falls in all of this, but I heard from an Nvidia engineer that if you need more memory than a vec4 packing (128 bits/pixel), performance is expected to fall off a cliff while using ROVs ...
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    The tiling in modern desktop GPUs is just to facilitate work distribution for a single draw call. You would still need to allocate memory to hold state for all screen space tiles concurrently across draw calls otherwise you’re fetching lots of off-chip data.
     
  14. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
Mmm, the rumored Geometry Engine from the PS5 to debut in RDNA3?
     
  15. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
Well, I wonder how bandwidth-limited it'll be if it's only +50% over Navi 10 (384-bit vs 256-bit). It seems a bit odd. If they've strapped 16Gbps memory to that bus, that would be about +71% bandwidth, and that's even more power consumption on top of a mostly doubled Navi 10.

Then again, the 5700 series ranges anywhere from 6.7TF (180W TDP) to 10.1TF (235W TDP) with 448GB/s. A modest base clock in the 1600s would still give something in the 16TF area as a starting point.

I'm more skeptical that there would be any sustained power at higher frequencies (for 20TF, double Navi 10) just to keep things under 400W (perhaps something more reasonable, like an 18TF boost?).
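The bandwidth and throughput figures in the post above can be reproduced with the usual rules of thumb (peak GDDR6 bandwidth = bus width x data rate / 8, peak FP32 = 2 FLOPs per shader per clock); the helper names are mine, and the 5120-shader figure is just "double Navi 10" as speculated:

```python
def gddr6_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and data rate."""
    return bus_width_bits * data_rate_gbps / 8

def fp32_tflops(shader_cores, clock_mhz):
    """Peak FP32 throughput: 2 FLOPs (FMA) per core per cycle."""
    return 2 * shader_cores * clock_mhz / 1e6

navi10  = gddr6_bandwidth_gbs(256, 14)   # 448 GB/s (5700 series)
rumored = gddr6_bandwidth_gbs(384, 16)   # 768 GB/s
print(rumored / navi10 - 1)              # ~0.714, i.e. about +71%

# Double Navi 10's 2560 shaders at a base clock in the 1600s:
print(fp32_tflops(5120, 1600))           # ~16.4 TF
```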
     
  16. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    They did.
    Uh, it has nothing to do with N10.
    Lul.
    What do you mean 400W?
    It's 275W so far.
     
  17. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
I'm not understanding this. A 384-bit bus is 50% larger, and thus adds to the power consumption beyond just doubling the size of Navi 10.

    ??? Discuss. This is getting needlessly silly otherwise.

    I'm talking about TDP. Navi 10 has a range of TDP, which I mentioned (180-235W TDP). How do you propose doubling the size and power of Navi 10 with just another 20%?
     
    Cuthalu likes this.
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
I'm still waiting for you to post even some speculation tweet or something to back your words; so far you've posted nothing, despite always wording your posts like they're facts o_O
     
    Lodix, Cuthalu, xpea and 3 others like this.
  19. SimBy

    Regular

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
I mean, if AMD put effort and silicon into a 'multiple GHz' clock speed architecture (they specifically mention this), why on earth would they run it at a modest 1600MHz? That's just silly. Expecting anything less than what the PS5 clocks at doesn't make much sense.
     
  20. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Yeah, kinda.
    There's not much left to discuss, the product's soon(tm), along with N22 after.
    That thing called engineering.
    This one is smart.

    I know you guys have your reasons to be wary, but soon(tm).
     