AMD Mantle API [updating]

Discussion in 'Rendering Technology and APIs' started by MarkoIt, Sep 26, 2013.

  1. pMax

    Regular

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
    A company that produces* CHIPS as complex as x86 should have bad QA?
    That'd be very strange.

    I suppose they are like ALL other companies. Testsuites, regression tests, CI and the like.

    Fact is, no matter if you write 800 tests with dynamic parameters for just one module, your code will always end up with bugs. Sometimes (or often) you even have to deal with 3rd-party bugs/side effects that cause YOU to have bugs... bugs have a nice cascading effect sometimes.

    Especially if it is targeted for a big, competitive market and the stuff you do (aka drivers for AMD) is complex and has to deal with an unpredictable thing like an OS and DX/OGL middle layer with a game on top of it.

    *well, WAS producing until it sold its fabs :p
     
  2. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    I don't understand what exactly you want!

    There will still be a choice for you, either running slower nvidia hardware with DX or faster AMD with Mantle...
     
  3. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    I don't know tech, is there an inherent advantage in running GPGPU on the integrated GPU? Would it work if you had two different discrete GPUs as well?

    I'm mostly thinking that it would be awesome if one upgrades to AMD's new series two or three years from now, but still can reuse one's current HD 7950 for dedicated GPGPU alongside the new card.
    That's one thing I do like about Nvidia's PhysX: an old GPU doesn't suddenly become a paperweight the moment you upgrade.
     
  4. pMax

    Regular

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
    ...the advantage would be latency, I think. The iGPU shares memory with the CPU, so you can take advantage of HSA and share pointers to data for moderately complex GPGPU work with a great latency bonus (e.g. processing an octree or whatever), where the CPU may also take part somehow.

    A dGPU would require transfers over PCIe in those cases.
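A rough sense of that latency gap can be sketched with back-of-the-envelope numbers. Everything below is an illustrative assumption (the bandwidth and setup figures are not measurements):

```python
# Back-of-the-envelope latency comparison: handing a data structure to an
# iGPU via shared memory (HSA-style pointer passing) vs copying it to a
# dGPU over PCIe first. All figures are illustrative assumptions.

PCIE3_X16_BYTES_PER_S = 15.75e9  # ~PCIe 3.0 x16 peak bandwidth (assumed)
TRANSFER_SETUP_US = 10.0         # assumed per-transfer driver/DMA setup cost

def dgpu_transfer_us(num_bytes):
    """Time (in microseconds) to copy num_bytes to a dGPU: setup + wire time."""
    return TRANSFER_SETUP_US + num_bytes / PCIE3_X16_BYTES_PER_S * 1e6

def igpu_transfer_us(num_bytes):
    """Shared memory: only a pointer crosses over, effectively free."""
    return 0.0

octree_bytes = 4 * 1024 * 1024  # a 4 MiB octree, picked for illustration
print(f"dGPU copy:  {dgpu_transfer_us(octree_bytes):.1f} us")
print(f"iGPU share: {igpu_transfer_us(octree_bytes):.1f} us")
```

For small, latency-sensitive round trips the fixed setup cost alone can dominate, which is why chatty CPU/GPU algorithms favor shared memory.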
     
  5. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    We're completely aligned on this. Hence why I was surprised to see that Johan seemed to have other plans...

    Let's be clear... these presentations largely confirmed that the big gains come in CPU overhead, which is unsurprising as GPUs already run with fairly high efficiency in most cases (and would require some big changes to shading languages to improve on that much). On a high end machine with a typical GPU-bound situation, there's not going to be a huge difference unless a developer intentionally sabotages the DX path. iGPUs could of course see larger benefits from reduction in CPU cycles due to the shared power budget.

    Overlapping compute and shadow map rendering is probably the biggest potential gain in GPU performance, and that might show up in benchmarks. That said, it's sort of a one-time trick, as there aren't that many phases in rendering where the shader array is completely unused... you basically have to be rendering depth/stencil only for that to be the case. On power-constrained GPUs it's likely not to be as much of a win either, since during those portions of the frame the shader array can be shut off and the thermal budget gained from that applied to raise the GPU frequency. Whether GCN can power gate the shaders at that granularity is unclear, though, so perhaps they just sit there wasting power regardless, in which case it would still be a win :)
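The overlap win described here can be illustrated with a toy timing model. All millisecond figures are invented for illustration:

```python
# Toy timing model of async compute: a shadow-map pass (ROP/geometry bound,
# shader array mostly idle) overlapped with an ALU-heavy compute pass.
# Millisecond figures below are invented for illustration.

def frame_time_serial(shadow_ms, compute_ms, rest_ms):
    """Passes run back to back: total is the plain sum."""
    return shadow_ms + compute_ms + rest_ms

def frame_time_overlapped(shadow_ms, compute_ms, rest_ms):
    """Shadow and compute share the GPU; the longer pass hides the shorter."""
    return max(shadow_ms, compute_ms) + rest_ms

serial = frame_time_serial(3.0, 2.5, 10.0)       # 15.5 ms
overlap = frame_time_overlapped(3.0, 2.5, 10.0)  # 13.0 ms
print(f"saved {serial - overlap:.1f} ms per frame")  # prints: saved 2.5 ms per frame
```

The "one-time trick" caveat shows up directly in the model: the saving is capped by the shorter of the two overlapped passes, and there is only one shadow/depth-only phase per frame to hide work in.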

    Note that taking advantage of overlapping compute/graphics does not necessarily require Mantle. A clever driver could notice the relevant disjoint dependencies and do the same thing, but it's not clear that going forward you want yet more cleverness in drivers...

    One other thing from Johan's slides that might make a difference on the GPU side is a bullet that hinted at exposing the compressed depth/MSAA representations. Since I believe AMD has to resolve multisampled depth buffers before sampling them, this could make them more competitive when using deferred MSAA (such as Frostbite supports). That said, NVIDIA already doesn't take as much of a hit from this as AMD, so I expect it to level that out somewhat if anything.

    Moving to pure bindless enables new algorithms and techniques too, but that has been available in GL on NVIDIA for a little while already.

    Stuff like the 100k batches tech demo are cool as they potentially enable some new possibilities (although I'd like to see the same attempted with bindless and multidrawindirect first), but it's not clear that you can ship a game that requires that until DX/GL offer similar capabilities. Which, again, should be the end goal here.
     
    #645 Andrew Lauritzen, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  6. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,929
    Likes Received:
    5,529
    Location:
    Pennsylvania
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm starting to wonder from the statements of "X has been pushing for this for years" that maybe someone isn't moving to integrate, and Mantle is an attempt to force or bypass them.

    Granted, I don't think all the PR about APIs evolving too slowly is entirely fair. Up until recently, I'm not sure GPUs could be trusted with a ball of string, much less broad access to global memory.

    Maybe that assumption breaks down when #GPU lines < #platforms.
    Like I noted earlier, a hardware-linked API isn't quite as bad if the platform holder is breaking the API anyway.
    First, there is getting the changes put into the standards, which may involve hashing things out with Microsoft, the IHVs, CAD, mobile, whatever stakeholders.

    Then, even if added to the APIs, a game dev has to consider the API support of the platform or instantiation thereof.
    Including mobile, laptops, consoles, it's probably a tuple.

    Renderer.works
    {platform, ODM, OEM, OS, OS revision/fork, API, API revision?, telecom?, user needs to update the above?, user can't update the above?, OEM drivers?, deprecated device?, not deprecated but typical shoddy device support?}

    OR
    {AMD?,Intel?,Nvidia?,IMG?,Mali???}
     
  8. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    This isn't the place to talk about the quality of EA's QA department.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    So, can GCN (or recent version thereof) create work for itself?
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    If we go by AMD's description of HSA queuing and Kaveri's role as the initial product, at least for compute it should be possible.
    I'm not sure it's as applicable to the graphics domain.
     
  11. kukreknecmi

    Newcomer

    Joined:
    Nov 14, 2013
    Messages:
    7
    Likes Received:
    0
    Since the Core API is based on AMD_IL, I find it hard to see Nvidia adopting it. I'm not sure if it's about the "Mantle driver" or something else. If it's about the so-called "Mantle driver", they should first adapt it to pre-GCN architectures to reach a larger user base. Before adapting it to pre-GCN architectures, I don't find it reasonable to let Nvidia access it. I'm not sure whether pre-GCN architectures prevent a "Mantle driver" somehow, or whether it's just marketing for GCN cards.
     
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    I guess we have to uphold a certain minimum standard of performance improvement to call Mantle a success. IMO, anything less than 15% is a waste, a trivial upgrade option at best. 15% and up, and it becomes a more significant improvement. Right now none of the developers talking about it have much confidence it can reach more than 20%, which doesn't bode well for Mantle's case.

    On another note, most of the things announced so far are optimizations that facilitate programming and control over code, maybe some reduction in memory footprint; others concern CPU improvements, which only help CPU-limited games and the underutilized Bulldozer CPUs and APUs. Battlefield 4, likely the best case for Mantle for months to come, is not even remotely CPU limited. GPU features seem few and far between, and none of those announced so far seem to greatly impact performance.

    Please feel free to correct me when necessary.
     
    #652 DavidGraham, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  13. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    Don't look at single player benchies. It IS quite CPU limited, especially if you prefer enough fps over max details (this is even in single player: http://gamegpu.ru/action-/-fps-/-tps/battlefield-4-test-gpu.html ).

    I would say 20% general performance would be quite a lot. For instance, it would generally make the $299 280X trade blows with the $499 GTX 780. And I would certainly not expect that much in more GPU-limited scenarios.
    Again, it will benefit the fps junkies more than the detail whores :cool:


    Part of the point is obviously to raise the minimum feature level, and pre-GCN cards would certainly miss some required features (and the API is likely also exposing features that Kepler is missing). For instance, the amount of buffer type aliasing you can't do in DX11 (which lets lesser hardware require different layouts/implementations for different buffers).
     
    #653 Psycho, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  14. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    Access to the compressed MSAA data could help BF4 out.
     
  15. snarfbot

    Regular

    Joined:
    Apr 23, 2007
    Messages:
    652
    Likes Received:
    225
    Wouldn't Mantle benefit a forward renderer more than a deferred one? You'd neatly avoid all the issues with MSAA and transparencies and still get to use a ton of lights.

    Anyway, I think going forward (lol) there will be better than 20% improvements in overall performance.
     
  16. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    Mantle doesn't solve any of the issues with (pure) forward renderers... you still don't want to render the same geometry more than once.
     
  17. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    The common misconception seems to be that Mantle brings only CPU gains, and only helps with low end CPUs. This is not true.

    Here are some examples of potential GPU gains:
    • Bindless textures and hardware virtual memory (*) allow rendering in larger batches, increasing GPU utilization (the GPU partially idles at the start/end of draw/dispatch calls).
    • Application-controlled memory management means the GPU needs to shuffle fewer resources around (this isn't only a CPU hit; many current games have frame rate spikes because of this issue). The developer can also pack resources more tightly (multiple resources in the same page/line, increasing memory/cache utilization).
    • With Mantle you can run multiple kernels in parallel (or kernel + graphics in parallel) in a controlled way, and thus reduce GPU bottlenecks. For example, render a shadow map (mostly ROP and geometry setup) and an ALU-heavy compute pass (for example lighting for the previous light source) at the same time. This results in much higher GPU utilization.
    • Better predicates and storing GPU query results to GPU buffers (without CPU intervention) allow GPU optimization techniques that are not possible with PC DirectX.
    • AMD also claims improvements to indirect draw/dispatch mechanisms, but does not spill the details in the Mantle slides (these improvements potentially bring big GPU gains for certain advanced use cases).
    • Direct access to MSAA data could also make deferred rendering AA much faster (more about that in the reply below).

    (*) DirectX 11.2 also has partial support for hardware virtual memory (in the form of tiled resources). However, it has limitations, and the Windows 8.1 requirement basically makes the API useless right now (Mantle has a much bigger user base at the moment). Hopefully Microsoft will solve this issue and bring some of Mantle's other features to 11.3 (and/or 12.0).

    AMD announced that with Mantle we finally have full manual access to both GPUs in Crossfire. This is excellent news. I was quite worried that SLI/Crossfire would die soon, as many new graphics engines will start doing scene management and rendering decisions on the GPU side. Alternate frame rendering (with automatically synchronized memory between cards) is just not a good fit for a scenario where the data set is mutated slightly every frame (by compute shader passes that are pretty much impossible to analyze with automatic logic). AFR works best when everything is freshly generated during a single frame and there are no dependencies on existing data. However, this kind of processing is a huge waste of GPU (and CPU) time, and frankly we can do much better (and I believe that forthcoming "pure" DX11+ engines that have no legacy baggage surely will). With Mantle, supporting Crossfire is possible even in these kinds of advanced GPU-driven rendering engines. Hopefully Nvidia releases something similar as well, or they will see very bad SLI scaling in some future games/engines.
    Deferred antialiasing will be much more efficient, assuming the "Advanced MSAA features" in the Mantle slides mean that you have direct access to GPU color/depth blocks and MSAA/layer data (including coverage sample index data). With all that data available, tiled (and clustered) deferred renderers can separate pixels (different sample counts) more efficiently and recover geometry edge information (using coverage samples) in a much more precise and efficient way.
    That would definitely give a big GPU boost to a deferred renderer with MSAA (especially with coverage-based EQAA/CSAA). Of course, estimating the gains is not possible right now, since AMD hasn't yet released the full Mantle API specification, so we don't know exactly how low-level the access to the MSAA & depth/color compression data is.
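The per-pixel classification that this kind of MSAA access enables can be sketched as a toy model (Python, not actual shader code; plain integers stand in for real G-buffer samples):

```python
# Toy model of MSAA pixel classification for tiled deferred shading: with
# direct access to per-sample data, interior pixels (all samples equal) can
# be shaded once while edge pixels are shaded per sample.

def shading_invocations(pixels):
    """pixels: list of per-pixel sample tuples (one entry per MSAA sample).
    Returns how many shading invocations the classified path launches."""
    work = 0
    for samples in pixels:
        if len(set(samples)) == 1:
            work += 1             # interior pixel: shade once
        else:
            work += len(samples)  # edge pixel: shade every sample
    return work

# 4x MSAA, four pixels: three interior, one geometry edge
pixels = [(7, 7, 7, 7), (3, 3, 3, 3), (9, 9, 9, 9), (1, 2, 1, 1)]
print(shading_invocations(pixels))  # prints 7, vs 16 for naive per-sample shading
```

Since geometry edges typically cover a small fraction of the screen, most pixels take the cheap path, which is where the deferred MSAA saving comes from.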
     
  18. NThibieroz

    Newcomer

    Joined:
    Jun 8, 2013
    Messages:
    37
    Likes Received:
    78
    One of the major features of Mantle is the drastic reduction of draw call overhead. Today a lot of developers are having to make compromises on how many batches they render to achieve their target performance; this impacts their technical vision and can therefore be a factor in the decision of which type of renderer to support.
    With Mantle, rendering a depth pass to prime the depth buffer in a Forward(+) renderer becomes a completely viable option without running into CPU bottleneck situations.
    Another advantage would be to process Forward+ tile culling (or other compute shaders operating on the scene) with asynchronous compute to get better GPU utilization.
    I am looking forward to seeing how Mantle adopters will be using the power available to them to optimize their engine once they've had more time to play with the API.
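The draw-call-overhead arithmetic behind the depth pre-pass point can be sketched like this (the per-draw overhead and draw-count figures are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope CPU cost of a depth pre-pass: it roughly doubles the
# number of draw submissions, so its viability hinges on per-draw overhead.
# The microsecond figures here are illustrative assumptions.

def cpu_submit_ms(num_draws, overhead_us_per_draw, prepass=True):
    """CPU time (ms) spent submitting a frame's draw calls."""
    total_draws = num_draws * (2 if prepass else 1)
    return total_draws * overhead_us_per_draw / 1000.0

SCENE_DRAWS = 5000
print(cpu_submit_ms(SCENE_DRAWS, 5.0))  # high-overhead API path: blows a 16.6 ms budget
print(cpu_submit_ms(SCENE_DRAWS, 0.5))  # low-overhead path: the pre-pass stays affordable
```

The point is not the specific numbers but the scaling: halving submission cost per draw is what turns "double the draws" from a deal-breaker into a rounding error.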
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Reduced draw call overhead definitely helps in cases where you must submit draw calls multiple times. The Mantle slides also hint at another possibility. With low-level access to hardware command buffers, you could record the draw calls just once, add some predicates for pixel shader disabling (letting you disable pixel shaders on the depth-only pass), and instruct the GPU to run the same command buffer twice with different predicates active (reusing the same command buffer twice is not possible using standard PC DirectX). The end result is that you pay zero CPU overhead for the depth-only pass.
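A sketch of that record-once, replay-twice idea (the class and method names below are invented for illustration; Mantle's real entry points differ):

```python
# Sketch of record-once / replay-twice command buffers. CommandBuffer and
# its methods are hypothetical stand-ins, not the actual Mantle API.

class CommandBuffer:
    def __init__(self):
        self.draws = []

    def record_draw(self, mesh):
        self.draws.append(mesh)  # CPU-side recording cost is paid once

    def execute(self, pixel_shading_enabled, log):
        # Replay the prerecorded draws; a predicate-style flag switches the
        # same buffer between a depth-only pass and a full color pass.
        pass_name = "color" if pixel_shading_enabled else "depth-only"
        for mesh in self.draws:
            log.append((pass_name, mesh))

cb = CommandBuffer()
for mesh in ("terrain", "buildings", "characters"):
    cb.record_draw(mesh)

log = []
cb.execute(pixel_shading_enabled=False, log=log)  # depth pre-pass, no re-recording
cb.execute(pixel_shading_enabled=True, log=log)   # main color pass, same buffer
print(log[0], log[3])
```

The two `execute` calls replay identical GPU commands, so the second pass costs the CPU essentially nothing beyond the submission itself.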

    However the depth only pass still costs quite a few extra GPU cycles, especially if the game uses complex geometry and tessellation (with displacement maps, etc) and/or heavy skinning (some next gen games will use hundreds of bones for human faces).

    Lighting cost is also higher with Forward+, since the lighting is done in the pixel shader that runs per triangle (not during a full screen pass). Quad efficiency of modern high-polygon games (high quality models + tessellation) can be as low as 60%, meaning that you basically lose 40% of your GPU cycles. In comparison, deferred lighting in a full screen pass later has no quad efficiency problems (it's a single full screen quad, or a compute shader pass with ~8x8 pixel granularity).
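The quad efficiency loss follows from the 2x2-quad shading granularity, which a small model can demonstrate (a simplification: real rasterizers launch those extra lanes as helper invocations for derivative calculations):

```python
# Toy quad-efficiency model: pixel shaders execute on 2x2 quads, so a quad
# that a triangle only partially covers still runs all four lanes.

def quad_efficiency(covered_pixels):
    """covered_pixels: set of (x, y) pixels one triangle covers.
    Efficiency = useful lanes / total lanes launched."""
    quads = {(x // 2, y // 2) for x, y in covered_pixels}
    return len(covered_pixels) / (4 * len(quads))

# A thin diagonal sliver: 6 pixels touching 3 quads -> 6 / 12 = 50%
sliver = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}
print(quad_efficiency(sliver))  # prints 0.5

# An axis-aligned 2x2 block fills one quad exactly -> 100%
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(quad_efficiency(block))   # prints 1.0
```

Smaller triangles mean more partially covered quads, which is why high-polygon and tessellated content drives quad efficiency down toward the 60% figure quoted above.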

    Forward+ is a good solution if your triangle counts are not that high and you don't use heavy vertex animation or tessellation. In low polygon games (current gen ports for example) the depth only pass is dirt cheap (for GPU), and quad efficiency is often more than 80% (so you lose less than 20% of your lighting performance). But I just don't see it as a viable technique in the future, especially now as Mantle allows low level access to the GPU MSAA data on PC as well (this is a huge gain for the deferred renderers).
    This is a good idea, and should provide similar GPU performance gains as asynchronous compute use during shadow map rendering.
     
  20. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    I understand you're using a generalization here, but many games are bandwidth limited for a significant portion of the time so you're not really losing 40% of the performance. It could still be a significant hit though.
     