Apple (PowerVR) TBDR GPU-architecture speculation thread

Discussion in 'Architecture and Products' started by Kaotik, Jul 7, 2020.

  1. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    173
    Likes Received:
    161
    It's still optional on D3D12 while you're forced to use it on other modern APIs. Console APIs don't expose render passes either, since they're not necessary ...

    "Metal only exposes stuff that makes sense for the hardware."

    Maybe this is true on Apple hardware but I don't think it's true for AMD hardware ...

    Metal has some sub-optimal API design decisions for other hardware. Resource state transitions and barriers are resolved implicitly, which can complicate using async compute, and that limits the potential to schedule compute work as efficiently as possible on some GPUs. Also, if you look at the Sea Islands register documentation, the register GpuF0MMReg:0x28b54 specifically has a bit for controlling whether the geometry shader is active or not, and the now-deprecated Mantle API subsequently exposed geometry shaders too, as do D3D12 and Vulkan. On the RDNA architecture, the geometry shader implementation was designed to be more effective in several more cases compared to its predecessor as well ...

    For modern desktop GPUs, it's not out of the realm of possibility that they have an acceptable implementation of geometry shaders, so why does Apple keep actively avoiding exposing this feature on Metal when it can be somewhat usable on IMRs?

    I heard that Apple plans on exposing programmable blending everywhere, but that is going to decimate performance on most IMRs, so in what way does Metal only expose features that "make sense" for the hardware? From the perspective of AMD HW, out of all the modern APIs, Metal exposes the most abstractions that don't make sense for their HW ...

    Argument buffers look like they provide similar functionality to descriptor indexing on other APIs, in that they both let you create unbounded/bindless arrays/tables. I'm not quite sure if Metal's argument buffers are comparable to some of Nvidia's OpenGL bindless extensions. How much longer do we also have to wait for Apple silicon to support hardware-accelerated ray tracing? I don't see most of the industry tolerating a wait of another generation ...

    Programmable blending and tile memory are nice to have but they aren't all that compelling for most desktop vendors to implement or expose ...
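
    To spell out what's at stake with that: "programmable blending" means the fragment function reads the destination color straight out of tile memory via a [[color(0)]] input, as in the made-up MSL snippet below (embedded as a Swift source string; names are hypothetical). That read is nearly free on a TBDR, where the pixel already sits on-chip, but on an IMR it implies a read-modify-write against memory ...

        let blendSrc = """
        #include <metal_stdlib>
        using namespace metal;

        struct VOut { float4 pos [[position]]; float4 color; };

        fragment float4 customBlend(VOut in [[stage_in]],
                                    float4 dst [[color(0)]]) {
            // arbitrary blend math, not limited to fixed-function blend ops
            return float4(in.color.rgb + (1.0 - in.color.a) * dst.rgb, 1.0);
        }
        """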
     
  2. rikrak

    Joined:
    Sep 16, 2020
    Messages:
    4
    Likes Received:
    3
    Did geometry shaders ever make sense for any hardware? Does modern Nvidia and AMD hardware have built-in support for them, or are they run as driver-managed compute shaders with bad parallelism? I can't claim in-depth knowledge of the industry, but to me it seems that geometry shaders are all but deprecated. Nvidia is pushing mesh shaders instead, and I don't really know what AMD's current stance on these things is. Apple made a pragmatic choice. Instead of giving the programmer a tool that works suboptimally in most cases, they don't give you this tool at all. Their argument goes like this: for GPU-side geometry generation, use a tool that works well, namely compute shaders with GPU-driven render loops. You can do anything you want while using a programming model that is much closer to how GPUs actually work.


    Metal offers two modes of operation. By default it does state management (barriers, memory, residency) automatically, which is convenient but not the most efficient. You are however free to do low-level management yourself.
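
    A minimal sketch of the manual path (the sizes and the elided dispatches are made up): put resources in an untracked MTLHeap so the driver stops inserting barriers for you, then order the passes yourself with an MTLFence, much like an explicit barrier in D3D12/Vulkan.

        import Metal

        let device = MTLCreateSystemDefaultDevice()!

        let heapDesc = MTLHeapDescriptor()
        heapDesc.size = 16 * 1024 * 1024
        heapDesc.storageMode = .private
        heapDesc.hazardTrackingMode = .untracked   // opt out of automatic tracking
        let heap = device.makeHeap(descriptor: heapDesc)!

        let fence = device.makeFence()!
        let queue = device.makeCommandQueue()!
        let cb = queue.makeCommandBuffer()!

        let producer = cb.makeComputeCommandEncoder()!
        // ... bind a pipeline and dispatch work that writes heap resources ...
        producer.updateFence(fence)    // signal once this pass's writes are done
        producer.endEncoding()

        let consumer = cb.makeComputeCommandEncoder()!
        consumer.waitForFence(fence)   // explicit ordering before reading them back
        // ... dispatch work that reads the same resources ...
        consumer.endEncoding()
        cb.commit()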

    Where did you hear that? As to your second sentence, I think you might be exaggerating a little bit. Probably the only abstraction Metal exposes that's not necessary for AMD hardware is render passes.

    I am not sure that I am up to date enough to answer that confidently. At any rate, data buffers that contain resource handles are exposed in Metal shaders as pointers to user-defined structs. Components of these structs can be freely manipulated by the shaders, which allows you to build sets of resource bindings on the GPU. As far as I understand, it is a strict superset of the functionality exposed by Vulkan or DX12.
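
    A rough sketch of what that looks like, with all names made up (MSL source embedded as a Swift string): the shader sees a plain struct of resource handles, while the host encodes those handles into an ordinary MTLBuffer.

        import Metal

        let msl = """
        #include <metal_stdlib>
        using namespace metal;

        struct Material {                      // the argument buffer layout
            texture2d<float> albedo [[id(0)]];
            sampler          smp    [[id(1)]];
        };

        struct VOut { float4 pos [[position]]; float2 uv; };

        fragment float4 shade(VOut in [[stage_in]],
                              device const Material &mat [[buffer(0)]]) {
            return mat.albedo.sample(mat.smp, in.uv);
        }
        """

        let device  = MTLCreateSystemDefaultDevice()!
        let shade   = try! device.makeLibrary(source: msl, options: nil).makeFunction(name: "shade")!
        let encoder = shade.makeArgumentEncoder(bufferIndex: 0)
        let argBuf  = device.makeBuffer(length: encoder.encodedLength, options: [])!
        encoder.setArgumentBuffer(argBuf, offset: 0)
        // encoder.setTexture(albedoTex, index: 0)           // resources assumed to exist
        // encoder.setSamplerState(linearSampler, index: 1)

    With Tier 2 argument buffers these structs can nest and be indexed as unbounded arrays, which is where the comparison to descriptor indexing comes from.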

    How much longer do we have to wait for AMD to support hardware accelerated ray tracing? For Intel?

    No, but we are not talking about most desktop vendors. We are talking about Apple's TBDR architecture, which is coming to the desktop this year. I seriously doubt that Apple will ever make GPUs that can compete with the large desktop IMR brute-force renderers, but they should be able to leverage the efficiency of their approach to deliver GPUs that are very fast in the compact laptop space.
     
    Naed likes this.
  3. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    173
    Likes Received:
    161
    Geometry shaders are at least somewhat acceptable on modern desktop GPUs, and their drivers almost certainly do NOT emulate geometry shaders with compute shaders; otherwise it'd be near impossible for them to handle cases involving interactions with other features like streamout/transform feedback. Geometry shaders guarantee ordered geometry amplification when transform feedback is enabled, which can't exactly be emulated efficiently with compute shaders ...

    Nvidia have a mesh shading pipeline, but they still offer the traditional geometry pipeline in their hardware. On AMD's latest RDNA architecture, their "next generation geometry pipeline" is actually a superset of the traditional geometry pipeline, and they also have unique hardware functionality to efficiently emulate transform feedback too. If you take a look at console APIs, for instance GNM or NVN, they also have the geometry shaders that you seem to dread so much ...

    Let's just stop beating around the bush and admit that Apple doesn't want to expose geometry shaders because it would wreck their TBDR GPU designs in comparison to IMRs, which can have a passable implementation. The disappointment is that Apple still doesn't have a competitive alternative ...

    Metal doesn't have any concept of resource state transitions, so they're handled by the drivers, which will be sub-optimal for AMD HW, and Metal does not offer any control over this ...

    Here's what the manager behind their driver team had to say. They plan on exposing all of those features he just mentioned on all of the GPUs that they support. Exposing programmable blending would be a spectacular disaster on AMD hardware. Metal is only ever "low-level" in the sense that it mostly only applies to Apple hardware ...

    On AMD, it's guaranteed that they'll be launching ray tracing hardware in less than 2 months on consoles ...

    Can you even argue that there's a solid timeline for an implementation to show up in Apple silicon?
     
  4. rikrak

    Joined:
    Sep 16, 2020
    Messages:
    4
    Likes Received:
    3
    Fair enough. I am curious to know how they can achieve this functionality on a massively parallel processor. Seems to me there would be a lot of synchronization overhead. I don't think it contradicts my main point however: a lot of tasks that geometry shaders are used for can be more efficiently implemented via compute shaders, by harnessing the parallelism offered by the GPUs explicitly. We need fewer shader stages, not more :)

    I think there might have been some miscommunication and/or misunderstanding. I am fairly sure that the first tweet refers to Macs with Apple GPUs (i.e. Apple Silicon) only. The second tweet is about the other new features in Metal (ray tracing, pull-model interpolation, debugging tools, etc.). Note that there was a long stream of posts Mr. Avkarogullari made before answering the question.

    No, I certainly cannot. But Metal includes a fully featured ray tracing API, which at least to me suggests that hardware ray tracing is something they are working on. Let's also not forget about their renewed deal with Imagination, who already have ray tracing IP for TBDR...
     
  5. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,710
    Likes Received:
    1,072
    Location:
    France
    Well, Imagination/PowerVR has a lot of IP and stuff on paper, but implementing it in a real product is often another story :/

    (Damn, I wish they could have continued on PC after Series 3 / Kyro)
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,241
    Likes Received:
    615
    Given recent events, we can safely say that even if there was a short-term timeline, and even if they had outside developers cooperating, we wouldn't necessarily know. Apple is good at information security.

    The moment ray tracing takes over primary intersections, forward rendering is dead and all the silicon dedicated to optimizing it is so much dead weight for modern applications. A G-buffer tile will still be useful though.
     
  7. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    173
    Likes Received:
    161
    On Nvidia's 2nd-gen Maxwell architecture and above, there's a fast path for a specific set of geometry shaders, and they mentioned a 30% speed-up in their voxelization pass for VXAO by using this "pass-through geometry shaders" functionality. The side effect of the extension is that it restricts the capability of the geometry shaders, which means that no geometry amplification or transform feedback is allowed in conjunction with it, but the bonus is that it bypasses the synchronization overhead that you mentioned ...

    AMD's RDNA architecture takes things a step further by removing all of the restrictions imposed in Nvidia's extension. Their hardware can handle nasty edge cases too, like ordered geometry amplification with transform feedback, relatively elegantly, since they can use global ordered append (via the DS_ORDERED_COUNT instruction) to do very fast synchronization ...

    On Intel, they never seemingly struggled with geometry shaders, since they have a unique SIMD-group mode (in MSL terminology) which might contribute to GS performance ...

    On TBDR GPUs, geometry shaders are an antithetical concept. The problem is that they run AFTER the tiling stage. Geometry shaders can do arbitrary transformations or geometry amplification on the screen-space primitives right before the rasterization stage, and that can break tiling optimizations which are decided beforehand, so there's a potential mismatch since tiling may not necessarily match up to the screen-space primitives that get submitted to the rasterizer. Geometry shaders will inevitably cause load imbalance on tilers ...

    As far as shader stages are concerned, we've just added an entirely new ray tracing pipeline with 5 separate shader stages (ray gen, intersection, miss, closest-hit and any-hit shaders), so I don't think we'll be getting rid of any shader stages soon. Neither AMD nor Nvidia is thinking about removing support for geometry shaders anytime soon either ...

    Let's hope that Apple is also working on exposing conservative rasterization in the Metal API, since it's universally supported among IMRs ...
     
    Pete and milk like this.
  8. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    589
    Likes Received:
    8
    Location:
    UK
    Sorry, but that's simply not correct; in all modern TBR/TBDR architectures ALL geometry processing happens before tiling, so the GS does not break the tiling optimisations.
     
    Pete, milk, 3dcgi and 1 other person like this.
  9. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    448
    Likes Received:
    113
  10. liem107

    Newcomer

    Joined:
    Feb 12, 2010
    Messages:
    61
    Likes Received:
    15
  11. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,710
    Likes Received:
    1,072
    Location:
    France
    The good thing is they already have customers for the B-Series. And the way they do multi-GPU seems really interesting.
     
    Scott_Arm likes this.
  12. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,406
    Likes Received:
    6,030
    I'd love to see a PC attempt if they can get it to scale that high (with the future 'C' architecture).
     
  13. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,710
    Likes Received:
    1,072
    Location:
    France
    I guess the major problem for them would be drivers?
     
  14. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,406
    Likes Received:
    6,030
    The major problem would probably just be financial risk. Drivers probably aren't super easy either. People expect their gpus to be able to run the last 20 years of games.
     
  15. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    340
    Likes Received:
    278
    The multi-core part sounds alright given TBDR is already binning all triangles before rasterization. I am curious how the workload is distributed before binning, and how the graphics pipeline is managed entirely through memory (and doorbells?), though.
     
  16. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    173
    Likes Received:
    161
    It's somewhat dependent on the tiling strategy chosen by the driver. A driver can either choose larger or smaller tiles with different trade-offs ...

    If a larger tile size is chosen, it means there'll likely be fewer primitives crossing tile boundaries, which translates into processing fewer duplicated vertex shader invocations. The downside is that larger tiles are more likely to have variable geometry density between the different tiles, which can cause load imbalance since some tiles will have larger clumps of geometry densely packed together than other tiles ...

    If a smaller tile size is chosen, there'll be more primitives crossing tile boundaries, which means more duplicated vertex shader invocations being processed. A smaller tile size can give you a more even distribution of geometry density between the different screen-space tiles, so it results in better load balance ...

    Too big, and certain tiles will dominate the frame latency; too small, and there'll be lots of redundant geometry processing. Tile-based GPUs and their drivers try to pick the ideal middle ground that'll give them the lowest latency ...
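
    To put rough numbers on the trade-off, here's a toy back-of-envelope model in Swift (all values invented, nothing vendor-specific): a primitive whose screen bounding box is w x h pixels lands in roughly (w/T + 1) * (h/T + 1) tiles of size T, and every extra tile it touches means re-processing that primitive for that tile.

        func tilesTouched(w: Double, h: Double, tileSize t: Double) -> Double {
            (w / t + 1) * (h / t + 1)   // expected bins for a w x h bounding box
        }

        for t: Double in [8, 16, 32, 64] {
            // assume a "typical" triangle with a 24 x 24 px bounding box
            print("\(Int(t))x\(Int(t)) tiles: ~\(tilesTouched(w: 24, h: 24, tileSize: t)) bins per primitive")
        }
        // 8x8 -> 16.0 bins, 64x64 -> ~1.9 bins: bigger tiles cut the duplicated
        // work, but each tile's workload gets lumpier, exactly as described above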
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,455
    Likes Received:
    186
    Location:
    Chania
    By the way, regarding the other myth concerning PowerVR and T&L units: the SEGA Naomi 2 arcade machines had, besides the mGPU config, a PowerVR T&L chip named ELAN, clocked at 100 MHz and capable of 10 million polys/sec with 6 light sources, which was fairly strong for the year 2000.
     
    #77 Ailuros, Oct 17, 2020 at 6:25 AM
    Last edited: Oct 17, 2020 at 12:35 PM
  18. rikrak

    Joined:
    Sep 16, 2020
    Messages:
    4
    Likes Received:
    3
    TBDR GPUs usually have hardware-determined tile size. For Apple (I assume PowerVR is the same) it's 32x32, 32x16 or 16x16 (depending on how much data you want to store per fragment).
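
    For what it's worth, Metal exposes some of this directly on Apple GPUs; an illustrative sketch (values made up) of declaring tile dimensions and per-sample imageblock storage, where a fatter per-fragment footprint forces a smaller tile to fit in the fixed-size on-chip tile memory:

        import Metal

        let pass = MTLRenderPassDescriptor()
        pass.tileWidth  = 16                 // e.g. 32x32 with slim per-fragment data,
        pass.tileHeight = 16                 // 16x16 with a fat layout
        pass.imageblockSampleLength = 32     // bytes of tile storage per sample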

    Why do you mention "duplicated vertex shader invocations"? The vertex shader is only invoked once; binning happens after the vertex shader stage.
     
  19. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    173
    Likes Received:
    161
    The bolded part isn't the exact truth ...

    The vertex pipeline is split into two parts on tiling architectures. Position-only shading happens before the tiling stage. The varying shading, on the other hand, happens after the tiling stage ...
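
    A toy Swift model of why that split produces the duplicated work I mentioned earlier (purely illustrative: a 1-D "screen", and real binning uses primitive coverage rather than just vertex positions):

        struct Tri { let xs: [Double] }   // toy vertex positions after position-only shading

        let tileSize = 4.0
        let tris = [Tri(xs: [1, 2, 3]), Tri(xs: [3, 9, 11])]

        // Phase 1 (before tiling): position-only shading, just enough to bin.
        var bins: [Int: [Tri]] = [:]
        for tri in tris {
            for t in Set(tri.xs.map { Int($0 / tileSize) }) {
                bins[t, default: []].append(tri)
            }
        }

        // Phase 2 (after tiling): varying shading runs per tile, so a primitive
        // binned into N tiles pays for its varyings N times.
        var varyingInvocations = 0
        for (_, binned) in bins {
            varyingInvocations += binned.reduce(0) { $0 + $1.xs.count }
        }
        print(varyingInvocations)   // 9, not 6: the second tri was binned into two tiles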
     