Direct3D feature levels discussion

Discussion in 'Rendering Technology and APIs' started by DmitryKo, Feb 20, 2015.

  1. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    841
    Likes Received:
    940
  2. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    591
    Likes Received:
    300
    I want multiple graphics queues on the same device node.
     
  3. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    365
    Likes Received:
    319
    Care to explain what for, exactly? I mean, short of the comfort of synchronization by fences, there isn't a difference to be expected just by exposing multiple host-side queues per logical engine.
    And if you need synchronization between more than one queue, you should still be able to just serialize the schedule in your own code based on CPU-side completion events for now, at no significant penalty. The driver couldn't do significantly better either, yet.

    At least not until the GPU exposes a second engine for the GCP, which IIRC one vendor already attempted but dropped again because it wasn't trivial to utilize.
     
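    To make the CPU/fence-driven serialization being described concrete, here is a minimal D3D12 sketch. The function name and the queue, command list and fence parameters are assumptions made for illustration, not code from the thread:

```cpp
#include <d3d12.h>
#include <windows.h>

// Sketch: one compute queue feeding one graphics queue, serialized via a fence.
// Either the GPU waits (ID3D12CommandQueue::Wait) or the CPU waits
// (SetEventOnCompletion); the latter is the "serialize the schedule in your
// own code" option mentioned above.
void SubmitSerialized(ID3D12CommandQueue* computeQueue,
                      ID3D12CommandQueue* gfxQueue,
                      ID3D12CommandList*  computeList,
                      ID3D12CommandList*  gfxList,
                      ID3D12Fence*        fence,
                      UINT64&             fenceValue)
{
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence, ++fenceValue);   // compute marks its completion

    // Option A: keep the wait on the GPU - graphics waits for the fence value.
    gfxQueue->Wait(fence, fenceValue);
    gfxQueue->ExecuteCommandLists(1, &gfxList);

    // Option B (commented out): wait on the CPU instead, then submit.
    // HANDLE evt = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    // fence->SetEventOnCompletion(fenceValue, evt);
    // WaitForSingleObject(evt, INFINITE);
    // CloseHandle(evt);
}
```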
    #1003 Ext3h, Oct 31, 2019
    Last edited: Oct 31, 2019
  4. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,459
    Likes Received:
    202
    Location:
    msk.ru/spb.ru
    TheAlSpark, jlippo, trinibwoy and 6 others like this.
  5. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,459
    Likes Received:
    202
    Location:
    msk.ru/spb.ru
  6. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    591
    Likes Received:
    300
  7. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,625
    Likes Received:
    5,187
    My layman's understanding of this is that the new features will allow for lower precision in raytracing calculations, which AFAIK is the biggest criticism devs make about Nvidia's Turing implementation.

    This might result in two things:
    - Lower-end GPUs with lower compute throughput and bandwidth will be able to perform real-time raytracing.
    - If the Turing GPUs don't support these optimizations, then their raytracing performance will be significantly lower than Ampere's and RDNA2's (assuming both of these support the latest optimizations).
     
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    9,267
    Likes Received:
    8,101
    I think Turing will support it. Just a hunch though.
     
    PSman1700 likes this.
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,459
    Likes Received:
    202
    Location:
    msk.ru/spb.ru
    Again:
     
    PSman1700, pharma and Dictator like this.
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,625
    Likes Received:
    5,187
    Of course you don't require new hardware. DXR doesn't even require BVH accelerators and will work on any DX12 GPU.
    The question is whether Turing's BVH accelerators can be used within the updated pipeline.
     
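    For reference, whether a given GPU/driver exposes DXR at all, and at which tier, can be queried from the runtime. A minimal sketch; the helper name is made up for the example:

```cpp
#include <d3d12.h>

// Ask the D3D12 runtime which raytracing tier the device exposes.
// TIER_NOT_SUPPORTED means no DXR; 1_0 is the original DXR; 1_1 covers the
// DXR 1.1 additions discussed here (on runtimes new enough to report it).
D3D12_RAYTRACING_TIER QueryRaytracingTier(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 opts5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &opts5, sizeof(opts5))))
        return D3D12_RAYTRACING_TIER_NOT_SUPPORTED;
    return opts5.RaytracingTier;
}
```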
  11. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,459
    Likes Received:
    202
    Location:
    msk.ru/spb.ru
    DXR does require BVHs to be built to trace rays through them. How exactly this is accelerated is down to the h/w which runs the API calls.

    Turing doesn't have any "BVH accelerators"; it has ray traversal accelerators which work with BVH structures, which on Turing (and Pascal) are built on the CPU and CUDA cores.

    So far there is no indication that RT shaders which can be called from other shader types in DXR 1.1 will operate any differently from the same RT shaders running inside their own shader type, which means the same h/w should be able to accelerate them in the same way as before.
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,737
    Likes Received:
    2,581
    Location:
    Finland
    One should specify what "BVH accelerator" means in each case: Turing has BVH traversal accelerators, but not BVH building accelerators like PowerVR has.

    [two attached images comparing the two approaches]
     
    DmitryKo likes this.
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,186
    Likes Received:
    3,297
  14. techuse

    Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    191
    Likes Received:
    101
    Support doesn’t necessarily mean the hardware can benefit from it. Nvidia also claimed Maxwell and Pascal supported async compute.
     
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,186
    Likes Received:
    3,297
    Pascal and Maxwell DO support Async Compute, but Maxwell often achieves very little uplift through its use, because its ALU utilization rate is especially high.

    I suppose you could be right though: Pascal supports DXR alright but achieves little acceleration from it. However, the context of the tweet is Mesh Shaders and DXR 1.1; we know Turing supports Mesh Shaders natively in hardware, so DXR 1.1 is most probably also supported natively on Turing as well.
     
    PSman1700 likes this.
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,737
    Likes Received:
    2,581
    Location:
    Finland
    They support the feature, but Maxwell outright doesn't do async: it just switches from gfx to compute and back again. Pascal improves on this but is still apparently somewhat limited compared to what GCN (and probably Turing?) can do. Top is Maxwell, middle is GCN, bottom is Pascal.
    [attached comparison image]
     
    trinibwoy and BRiT like this.
  17. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    591
    Likes Received:
    300
    Bottom is a proper high-priority compute queue running in between the graphics queue jobs; middle is normal priority.
     
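    For context, a "high priority compute queue" in D3D12 terms is just a compute queue created with elevated scheduling priority. A hedged sketch, with a made-up helper name:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a compute queue with elevated scheduling priority.
ComPtr<ID3D12CommandQueue> CreateHighPriorityComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}
```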
  18. techuse

    Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    191
    Likes Received:
    101
    I don't remember any games where AC can be toggled showing any benefit on Maxwell and Pascal. I'm not saying Turing doesn't benefit from DXR 1.1 and/or mesh shaders, just that it's not a certainty until we have some actual data in real games. Nvidia more than anyone has a history of misrepresenting their hardware's capabilities.
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,186
    Likes Received:
    3,297
    We've had this discussion repeatedly here. Maxwell can do Async Compute; the question that matters is whether it actually helps performance or not. In short, Maxwell supports async perfectly fine, it just can't dynamically load balance its compute and graphics queues on the fly to benefit from it; besides, its utilization rate is sufficiently high to begin with.

    https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9



    https://forum.beyond3d.com/posts/1931938/


    For Pascal, a prominent example off the top of my head is the Time Spy test.

    Again, NVIDIA had this to say about Async on Maxwell:

    https://www.legitreviews.com/3dmark...performance-tested_184260#0IwideW8dv31FWXI.99

    I don't think Mesh Shaders are under question at this point at all. It's a DEFINITE Yes for support.
     
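    Mesh shader support can likewise be queried from the runtime once the SDK exposes it. A minimal sketch; the helper name is an assumption for illustration:

```cpp
#include <d3d12.h>

// Check the mesh shader tier reported by the device. D3D12_FEATURE_D3D12_OPTIONS7
// only exists on runtimes/SDKs that include mesh shader support.
bool SupportsMeshShaders(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS7 opts7 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7,
                                           &opts7, sizeof(opts7))))
        return false;   // older runtime: the options struct is not recognized
    return opts7.MeshShaderTier >= D3D12_MESH_SHADER_TIER_1;
}
```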
    #1019 DavidGraham, Nov 16, 2019
    Last edited: Nov 16, 2019
    PSman1700, pharma and JoeJ like this.
  20. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    841
    Likes Received:
    940
    I think Doom showed an improvement, IIRC.

    Do you know how this has improved with Volta / Turing?
    The assumption is that with the new 'fine-grained scheduling' it should be good now?

    As far as I understand, Maxwell needs to 'divide' the SMs before the work starts, e.g. task A gets 20% of the GPU, task B 80%. If one task finishes earlier, there is no way to utilize the SMs that became idle. (Not sure how accurate this is technically.)
    GCN can dynamically balance with its ACEs, and it can even run multiple shaders on the same CU.


    "You could use 10 compute queues if you wanted to, but that won't help increase performance as internet seems to be convinced this days, it will actually hurt performance even on GCN."

    Actually, in Vulkan the number of queues is fixed, and for GCN I get one GFX+compute queue and two pure compute queues, so I can only enqueue 3 different tasks concurrently. Also, only the gfx queue offers full compute performance; the other two seem halved even if there is no gfx load, and priorities given from the API seem to be ignored. This seems to be what AMD considered practical. Very early drivers had more queues, IIRC.
    I did some testing with small workloads, which is where I see the most benefit, because one task alone cannot saturate the GPU. Results on GCN were close to ideal. (Forgot to test on Maxwell.)

    But there is one more application of AC that is rarely mentioned, because it happens automatically:
    If you enqueue multiple tasks to just one queue, but they have no dependencies on each other (no barriers in between), then GCN can and does run those tasks async. Likely also on DX11.
    This is very powerful because it avoids the disadvantages of using multiple queues, which are: additional sync necessary across queues, and the need to divide command lists to feed multiple queues. (Big overhead that can cut into the benefit, which hurts especially on small tasks where we need it the most.)

    I see some potential to improve low-level APIs here. Again: we need the ability to generate commands dynamically, directly on the GPU, including those damn barriers that are currently missing from DX ExecuteIndirect, VK conditional draw, and even NV device generated command buffers.
     
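    The queue layout described above is simply what the driver reports when enumerating Vulkan queue families; a small sketch, with a made-up function name (exact counts and flags depend on the driver, the specific numbers are JoeJ's observation):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Print the queue families a physical device exposes. On the GCN driver
// described above this typically reports one graphics+compute family plus
// a separate compute-only family.
void ListQueueFamilies(VkPhysicalDevice gpu)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

    for (uint32_t i = 0; i < count; ++i)
    {
        const VkQueueFamilyProperties& f = families[i];
        std::printf("family %u: %u queue(s)%s%s%s\n", i, f.queueCount,
                    (f.queueFlags & VK_QUEUE_GRAPHICS_BIT) ? " graphics" : "",
                    (f.queueFlags & VK_QUEUE_COMPUTE_BIT)  ? " compute"  : "",
                    (f.queueFlags & VK_QUEUE_TRANSFER_BIT) ? " transfer" : "");
    }
}
```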
    #1020 JoeJ, Nov 16, 2019
    Last edited: Nov 16, 2019
    PSman1700, pharma and DavidGraham like this.