AMD: Navi Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Mar 23, 2016.

  1. 3dilettante

    Legend Alpha

    Sep 15, 2003
    This at least seems to indicate there may be instances where UAV resources are able to use delta color compression.
    Per, GCN disables compression for UAVs.

    There is a subset of scenarios where ordering can be relaxed, with GPUOpen listing scenarios that involve some kind of coverage and/or saturating target (non-blending G-buffer setup and depth-only rendering, respectively).
    Elsewhere, API ordering remains important for behavior that is correct, consistent, or tractable for human understanding.

    OIT is something AMD specifically cites as using ordering guarantees, which seems to make sense in scenarios where the GPU may discard different primitives from buffers on a per-tile basis.
    I would need clarification on why losing ordering guarantees is beneficial for TBDR, which already has a significant synchronization point built into waiting for all primitives to be submitted before transitioning to the screen-space portion, and how losing ordering guarantees allows tiles to give consistent results for geometry that straddles their boundaries.

    The patent's scenario places a premium on having strong ordering. The distributed processing method used by the work distributors relies on them calculating the same sequencing and target hardware, with the same ordering counts generated and assigned. In the scenarios where out of order rasterization makes sense in existing GPUs, it may devolve into a set of additional barriers between the fully ordered and safely unordered modes (entering and leaving), where the arbiters' counters are partially ignored or possibly frozen at a fixed value.

    The ordering starts to matter early in the pipeline. How index buffers are chunked, which FIFOs are broadcast to and read from, and which units are locally selected or presumed by the distributor to be handled by a different GPU, are based on the sequentially equivalent behavior of the distributor blocks and their arbiters. The chunking of the primitive stream and handling of primitives that span screen space boundaries can be affected by what each GPU calculates is its particular chunk or FIFO tag. If a hull shader's output is broadcast by GPU A to a FIFO and tagged with ordering number 1000, it doesn't help if GPU B was expecting it at 1001.
    Deciding which primitives can be discarded in an overlapping scenario can cause inconsistencies if different tiles do not agree on the order.
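
    As a rough illustration of the tag-agreement point above, here is a minimal sketch in which each GPU runs the same deterministic distributor over the same index buffer. The `Chunk` type, the `distribute` function, and the round-robin ownership rule are hypothetical names and choices made for this example only; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tag: int          # ordering number assigned by the distributor
    owner: int        # GPU expected to process this chunk (assumed round-robin)
    first_index: int  # first index-buffer entry covered by the chunk
    count: int        # number of index-buffer entries in the chunk


def distribute(num_indices, chunk_size, num_gpus, start_tag=0):
    """Chunk an index buffer and assign tags and owners deterministically.

    Every GPU is assumed to run this same function on the same inputs,
    so all of them should derive identical tags without communicating.
    """
    chunks = []
    tag = start_tag
    for first in range(0, num_indices, chunk_size):
        count = min(chunk_size, num_indices - first)
        chunks.append(Chunk(tag, tag % num_gpus, first, count))
        tag += 1
    return chunks


def local_work(chunks, gpu_id):
    """Return the chunks this GPU processes itself; the rest are presumed remote."""
    return [c for c in chunks if c.owner == gpu_id]


def tags_agree(view_a, view_b):
    """Check that two GPUs derived the same (tag, owner) sequence."""
    return [(c.tag, c.owner) for c in view_a] == [(c.tag, c.owner) for c in view_b]


# Two GPUs with identical counters agree on every tag and owner.
gpu_a_view = distribute(10, 4, 2, start_tag=1000)
gpu_b_view = distribute(10, 4, 2, start_tag=1000)

# A GPU whose counter is off by one expects 1001 where the other sent 1000.
gpu_b_skewed = distribute(10, 4, 2, start_tag=1001)
```

    In this toy model `tags_agree(gpu_a_view, gpu_b_view)` holds, while the skewed view disagrees on both the tags and the presumed owners, which is the failure case described above.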
  2. Anarchist4000

    Veteran Regular

    May 8, 2004
    Agreed, but in the case of mGPU the performance deltas would be far more substantial. Emphasis on quickly frustum culling triangles. More than likely entire draw calls could be culled from some sections of screen space so there would be a need to push ahead. That early culling pass would be clearing a lot of geometry.

    That is still an AMD extension, but should work for everyone easily enough.

    OIT is ordered, but as nothing is discarded the blending will be deferred until all samples are present. The exception is PixelSync or compression mechanisms discarding the least relevant samples, which are presumed lossy anyway. In application, any error from ordering should fall into the inconsequential category of samples that get compressed or discarded. With programmable blending it would be up to the developer to decide how to manage it. Frames likely wouldn't be reproducible, but the difference would already have been determined to be inconsequential. Alternatively, all samples are held and accuracy is ensured at a significant performance cost.
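
    A minimal sketch of the "hold all samples, blend later" case, assuming a single color channel, distinct depths per fragment, and a standard "over" blend. The function names are hypothetical and this is not any vendor's OIT implementation; it only shows why arrival order stops mattering once blending is deferred.

```python
def resolve_pixel(fragments, background=0.0):
    """Blend all stored fragments for one pixel after sorting by depth.

    fragments: list of (depth, color, alpha) tuples, larger depth = farther.
    Fragments are sorted far-to-near, so arrival order does not affect the
    result as long as depths are distinct.
    """
    result = background
    for _depth, color, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        result = alpha * color + (1.0 - alpha) * result
    return result


def blend_in_arrival_order(fragments, background=0.0):
    """Blend immediately in arrival order, as a non-deferred path would."""
    result = background
    for _depth, color, alpha in fragments:
        result = alpha * color + (1.0 - alpha) * result
    return result


# Three translucent fragments covering the same pixel.
frags = [(0.2, 1.0, 0.5), (0.8, 0.0, 0.5), (0.5, 0.5, 0.25)]

deferred_a = resolve_pixel(frags)
deferred_b = resolve_pixel(list(reversed(frags)))
immediate_a = blend_in_arrival_order(frags)
immediate_b = blend_in_arrival_order(list(reversed(frags)))
```

    Here the two deferred results match regardless of arrival order, while the two immediate results differ. A lossy scheme that discards samples before the resolve would sit somewhere between the two.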

    Not beneficial so much as irrelevant, as overdraw should be very limited. TBDR has a sync point, but the execution can be overlapped with other frames and/or compute tasks. Even if stalling at a sync point, utilization should remain high with async compute or rendering tasks.

    The patent was also assuming an ordering requirement as the status quo. Relaxing the restriction should eliminate the need for the patent in the first place.

    Order shouldn't matter given a developer flagging a relaxed state. In that case a FIFO wouldn't need to exist and the front end could operate with parallel pipelines. Given a relaxed state, an arbitrary number of SEs could exist for the purpose of increasing geometry throughput. Not all that applicable for current hardware, as the up-to-4-SE design is rather efficient, but with MCM, mGPU, or >4 SEs you should see close to linear scaling without the dependencies.
  3. DavidGraham


    Dec 22, 2009
    Ashraf Eassa on Twitter:
    I just clarified with @AMD about the annual cadence of GPUs: they’re committing to annual products, not necessarily new architectures every year (e.g. RX 480 and RX 580 are different products, but same Polaris architecture)

  4. Malo

    Legend Veteran Subscriber

    Feb 9, 2002
    Rebrands of existing GPUs for OEM markets are new products as well.
  5. Nebuchadnezzar


    Feb 10, 2002
    If they would actually do a refresh à la Ryzen2/Zen+ it would be fine, but just using the same chip re-branded as a continuing strategy is a bit sad.

