AMD Mantle API [updating]

Discussion in 'Rendering Technology and APIs' started by MarkoIt, Sep 26, 2013.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Maybe there's some data available on the performance of Star Swarm with various NVidia drivers, so we could see if there's been a change in performance.

    I have no idea why Star Swarm uses so many draw calls. It may be separating work into draw calls for no good reason and NVidia has tuned into that. It could also be that there are certain kinds of parallelism in NVidia's GPU state (e.g. a simple ping-pong state change model, where a state change can be set up in hardware across the chip, while work for an existing state is still under way, then a simple flip cuts over "instantaneously" to the new state) that enables the GPU to move the bottleneck deeper, beyond the CP.

    Perhaps NVidia has a near-stateless architecture, such that most work is distributed with piecemeal "state" solely for its own use? Not just bindless resources, but "bindless state".
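    For illustration only, the ping-pong idea above can be sketched as a double-buffered state block: one slot stays live while the other is written, and a flip cuts over atomically. The types and mechanism here are hypothetical, a sketch of the speculation rather than any real GPU's design:

    ```cpp
    #include <array>
    #include <atomic>
    #include <cstdint>

    // Hypothetical GPU state block (illustrative only).
    struct RenderState {
        uint32_t blendMode  = 0;
        uint32_t depthFunc  = 0;
        uint32_t rasterBits = 0;
    };

    // Double-buffered ("ping-pong") state: the shadow copy can be filled
    // in while work using the live copy is still in flight; flip()
    // switches over in a single atomic update.
    class PingPongState {
        std::array<RenderState, 2> slots_;
        std::atomic<uint32_t> live_{0};
    public:
        RenderState&       shadow()        { return slots_[live_.load() ^ 1]; }
        const RenderState& current() const { return slots_[live_.load()]; }
        void flip() { live_.fetch_xor(1); }  // "instantaneous" cut-over
    };
    ```

    The point of the model is that setting up the next state costs nothing on the path of in-flight work; only the flip is serialized.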
     
    BRiT likes this.
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    If you look at the available OpenGL 4.4 MDI (multi-draw-indirect) benchmark results, you will notice that Nvidia clearly beats AMD. I believe the reason is that Nvidia has a longer history of rendering techniques that allow the GPU to feed itself (they had custom MDI extensions before MDI became an ARB feature). Nvidia has likely noticed that the draw call submission rate becomes a problem when the GPU generates a huge number of draw calls very quickly with MDI. They have been aware of this bottleneck and have had several generations to improve their front end to reduce it.

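    "Letting the GPU feed itself" with MDI comes down to glMultiDrawElementsIndirect consuming an array of command structs that either the CPU or the GPU can write. A minimal CPU-side sketch, using the struct layout OpenGL specifies (the helper function is illustrative):

    ```cpp
    #include <cstdint>
    #include <vector>

    // Layout required by OpenGL for glMultiDrawElementsIndirect.
    struct DrawElementsIndirectCommand {
        uint32_t count;          // indices in this draw
        uint32_t instanceCount;
        uint32_t firstIndex;     // offset into the shared index buffer
        int32_t  baseVertex;
        uint32_t baseInstance;
    };

    // Pack one indirect command per mesh; the meshes live in one big
    // index/vertex buffer, so each command just offsets into it.
    std::vector<DrawElementsIndirectCommand>
    buildIndirectBuffer(const std::vector<uint32_t>& meshIndexCounts) {
        std::vector<DrawElementsIndirectCommand> cmds;
        uint32_t firstIndex = 0;
        for (size_t i = 0; i < meshIndexCounts.size(); ++i) {
            cmds.push_back({meshIndexCounts[i], 1, firstIndex, 0,
                            static_cast<uint32_t>(i)});
            firstIndex += meshIndexCounts[i];
        }
        return cmds;
    }
    ```

    With the commands uploaded to a buffer bound as GL_DRAW_INDIRECT_BUFFER, a single glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, drawCount, 0) submits every mesh; a compute shader can also write the same structs, which is what overloads the command processor so easily.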
    Mantle and DX 12 are the first APIs that actually make it possible for the CPU to overload the command processor. Now that the command processor is an actual bottleneck (in some cases), I believe we will see quite rapid improvements. The same happened when tessellation became a benchmark feature.
    Yes, partial waves/warps are a real problem (waves more than warps, because waves hold 64 vertices while warps hold 32). Once you solve the front end bottlenecks and start rendering 500k+ unique meshes per frame (at 60 fps or more), you immediately hit the fixed function primitive/vertex rate bottleneck. The huge majority of the rendered meshes must be very simple (fewer than 64 vertices) to avoid that bottleneck. But then the partial vertex shader waves/warps become a real problem (*). And it doesn't stop there: 500k visible meshes at 1080p = 4 pixels/mesh on average = lots of bottlenecks (macro tiles, quads, etc). Obviously in real games most of these meshes would end up being rendered to off screen surfaces (such as shadow maps), but I am sure we will see unrealistic benchmarks that try to render them all to the same back buffer (and hit gazillions of different GPU bottlenecks).

    (*) You can solve the partial vertex shader warp/wave problem by rendering multiple meshes with a single draw call. It seems that Oxide has a CPU-based solution to this problem, but you can also do this entirely on the GPU.
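    The arithmetic behind the partial-wave waste is easy to sketch. Assuming 64-lane waves with one vertex per lane (as in the post above), a small helper shows how much merging many small meshes into one draw recovers; the function names are mine:

    ```cpp
    #include <cstdint>

    constexpr uint32_t kWaveSize = 64;  // GCN wavefront width

    // Waves launched when every mesh gets its own draw: each mesh rounds
    // its vertex count up to a full wave, so a 40-vertex mesh wastes 24
    // of its 64 lanes.
    uint64_t wavesPerDraw(uint32_t meshes, uint32_t vertsPerMesh) {
        uint64_t perMesh = (vertsPerMesh + kWaveSize - 1) / kWaveSize;
        return perMesh * meshes;
    }

    // Waves launched when the meshes are merged into a single draw:
    // only the final wave can be partial.
    uint64_t wavesMerged(uint32_t meshes, uint32_t vertsPerMesh) {
        uint64_t totalVerts = uint64_t(meshes) * vertsPerMesh;
        return (totalVerts + kWaveSize - 1) / kWaveSize;
    }
    ```

    For 500k meshes of 40 vertices each, per-draw submission launches 500000 vertex shader waves, against 312500 when merged, i.e. the separate draws run 60% more waves for the same work.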
     
    mosen and BRiT like this.
  3. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    Shouldn't this already be a well-known problem thanks to console development, which doesn't have to deal with the thick API?
     
  4. liquidboy

    Regular

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    mosen and BRiT like this.
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    The command processor was not a bottleneck at all before GCN. Thanks to Mantle (and OpenGL 4.4 MDI) it can now be a bottleneck for GCN in some cases. Improving the command processor for small batches is a "Future HW Consideration".
    The old vec4+1 hardware in Xbox 360 is quite different from modern GPUs, and the PS3 GPU didn't even have unified shaders. On modern GPUs you can eliminate a lot of state changes thanks to bindless resources, general purpose caches (allowing you to index efficiently into big buffers instead of the CPU preparing constant buffers of limited size), and many other improvements. Because binding changes are no longer needed at full frequency, draw calls become cheaper -> you can push more draws -> the command processor can become a bottleneck. Of course the rest of the GPU has also been getting wider at a rapid pace, so rendering considerably larger numbers of objects is actually a valid use case.
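    The "index into big buffers instead of binding per draw" idea can be sketched on the CPU side: all per-draw data goes into one large buffer uploaded once per frame, and each draw carries only an index that the shader would use to fetch its data. The struct and field names here are illustrative:

    ```cpp
    #include <cstdint>
    #include <vector>

    // Per-draw shading data. In the bind-per-draw model each draw uploads
    // and binds a small constant buffer of this; in the big-buffer model
    // every draw's data lives in one large buffer the shader indexes.
    struct PerDrawData {
        float    transform[16];
        uint32_t materialId;
        uint32_t pad[3];  // keep 16-byte alignment, GPU-style
    };

    struct DrawRecord {
        uint32_t dataIndex;   // index into the big per-draw buffer
        uint32_t indexCount;
    };

    // One upload and zero binding changes between draws: each draw only
    // records its index into bigBuffer.
    struct Frame {
        std::vector<PerDrawData> bigBuffer;   // uploaded once per frame
        std::vector<DrawRecord>  draws;

        void addDraw(const PerDrawData& d, uint32_t indexCount) {
            draws.push_back({uint32_t(bigBuffer.size()), indexCount});
            bigBuffer.push_back(d);
        }
    };
    ```

    With bindless resources the materialId can itself select textures, so the state that used to change between draws collapses into plain data.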
     
    lanek, Malo, pjbliverpool and 3 others like this.
  6. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    #2006 lanek, Feb 8, 2015
    Last edited: Feb 8, 2015
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Doing more than one of those approximation computations per work item could be faster if a constant buffer containing the constants was used (instead of #define). That would eliminate most of those MOVs. The compiler repeats them for each invocation...
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Oxide's philosophy is one that emphasizes developer and artistic freedom, where existing methods of batching, combining texture resources, and minimizing state change can impinge on the ability to arbitrarily alter or add properties to arbitrary objects at arbitrary times, arbitrarily.
    My suspicion on how Nvidia was able to claw back so much performance in DX11 is that most of the time this level of freedom is not utilized, leading to a large number of simple, identical calls that Nvidia could combine or pre-build.

    I'm not enthused with the possibility that AMD improves its front end like it has handled tessellation. There are still some baffling performance behaviors years into that.
     
  9. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Nvidia seems to have a fast path for draw calls that do not change state. This is helpful if you are trying to push a huge number of identical draw calls on DX11. However, this is practically identical to a single multi-draw call, but costs a lot more on the CPU (even with perfect driver optimizations in place). It is an excellent solution for prototyping (when iteration time is more important than performance), but I wouldn't be comfortable releasing a game like that. They need to implement some kind of software batching approach (CPU or GPU) if they are going to release games using that engine for DX11 customers. Currently their tech seems practically useless for DX11 (if the final game is going to have similar content).
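    A minimal sketch of the CPU-side software batching described above: group submitted draws by a state key and emit one multi-draw per group, so a run of identical-state draws costs one submission instead of many. This is an illustration of the general technique, not Oxide's actual scheme:

    ```cpp
    #include <cstdint>
    #include <map>
    #include <vector>

    struct DrawRequest {
        uint64_t stateKey;    // hash of pipeline/texture/constant bindings
        uint32_t firstIndex;
        uint32_t indexCount;
    };

    struct Batch {
        uint64_t stateKey;
        std::vector<DrawRequest> draws;  // submitted as one multi-draw
    };

    // Group draws that share identical state. What the driver's fast
    // path would accept as N cheap-but-not-free calls collapses into
    // one batch here, paid for once on the CPU.
    std::vector<Batch> batchByState(const std::vector<DrawRequest>& reqs) {
        std::map<uint64_t, size_t> slot;   // stateKey -> batch index
        std::vector<Batch> batches;
        for (const auto& r : reqs) {
            auto it = slot.find(r.stateKey);
            if (it == slot.end()) {
                slot.emplace(r.stateKey, batches.size());
                batches.push_back({r.stateKey, {r}});
            } else {
                batches[it->second].draws.push_back(r);
            }
        }
        return batches;
    }
    ```

    Note that grouping reorders draws across different states, so a real engine would batch only within a pass where order doesn't matter (or sort by state key first).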
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Are we seeing this in extant games, as apparently NVidia has some kind of CPU-scaling advantage? Despite it being sub-optimal in terms of overall engine design?

    Is there a chance this is a side effect of some engines being built upon NVidia as the primary target?
     
  11. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    946
    Likes Received:
    413
    You can see the differential between Nvidia and AMD in the graphs here:
    http://www.g-truc.net/post-0666.html#menu
    Under "X architectures behavior against small triangle count per draw call".

    Now you can contrast this with an earlier investigation from Nvidia on older hardware, of course:
    http://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf

    Reading between the lines of the GDC paper, Nvidia did pay attention to these issues; just look at how different the behaviour is now compared to back then.
    AAA games are tightly tuned to bottlenecks: the presets are tuned to a specific hardware profile's bottlenecks. When you tune, you shift the optimization process (LODs, texture sizes, etc.) from local minimum to local minimum. If you only profile on Nvidia hardware, of course you'll find an Nvidia-specific local minimum (fill rate, triangle rate, z-buffer rate, geometry-shader rate, etc.).
     
    pharma likes this.
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
  13. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,929
    Likes Received:
    5,529
    Location:
    Pennsylvania
    Thanks Dave, I guess that was inevitable and good that AMD are addressing it early on instead of dragging it out further.
     
  14. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    OPEN Mantle is D E A D.

    http://www.pcper.com/news/Graphics-...ight-Be-Dead-We-Know-It-No-Public-SDK-Planned

    So, it turns out that Mantle was never an OPEN standard (the API was never released) and will be a CLOSED API used only by select partners.

    AMD spinning closed API as open. Too funny. :-D
     
  15. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    It's not closed, it's a redefined open.

    I do think that, at some point, they wanted it to be open. But an API whose claim to fame is closing an efficiency gap is doomed from the start to be short lived. Once the competition does the same, there's nothing left. They probably realized this quite a while ago but forgot to tell their fans (and The Scientist.)

    Either way, we'll never know if DX12 was a reaction to Mantle or not, but if it was, let's remember Mantle as a catalyst of improvement.
     
    #2015 silent_guy, Mar 2, 2015
    Last edited: Mar 2, 2015
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    We may as well wait for March 5 to see if any additional data comes out. That date is significant to almost everyone else as well.

    If it goes as many think it will, then the change in direction would make sense.
    That some partners could still find use for Mantle may point to custom designs whose development would be well advanced by the time the next-gen APIs AMD is pointing developers towards can launch, or possibly to an offer of more proprietary changes and tweaks outside the mainstream after that.
     
  17. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    Anyone could have looked at AMD's marketshare and seen that Mantle could never coexist with DX12 and the next OGL. But Mantle served its purpose and I'm glad it existed.
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Perhaps some of the non-technical events surrounding AMD and its own internal changes may have had an influence on the timing.

    I do look forward to seeing what documentation will be opened up for Mantle. It might shed light on what else was going on in terms of features or goals the Mantle effort was striving towards over the last year. More information is good, even if the reveal turns out to be a bit more archaeological than was planned at the outset.
     
  19. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
  20. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    So Mantle evolves into Vulkan. This is a best case scenario for AMD. Hope they can capitalize.
     