DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    I personally don't have any, but I'm not sure what it would reveal other than that they're not doing async compute either. Maybe if we can find something that does clearly run async on Maxwell 2 and doesn't on Maxwell 1, we'll at least confirm that NVidia wasn't totally fabricating their claims.
     
  2. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    From @Speccy's link ...

     
    drSeehas likes this.
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
The class of use cases would be frame-spanning work, or work that produces results that do not directly feed into the graphics for a specific frame, like AI calculations or physics. Whether that happens to a significant extent is uncertain, but it came up in the marketing.

    In other cases, I've considered the possibility of using a few long-lived wavefronts that mostly idle, for the purpose of reserving some sliver of resources for latency-sensitive operations like audio. With increasing preemption support, that workaround may not be as necessary, but at least on console-level systems, things like keeping a thread running as long as possible with a fixed affinity for latency uniformity are still done.
     
    Jackalito and Darius like this.
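[Editor's note] The "reserved long-lived wavefronts" idea has a familiar CPU-side analogue, sketched below with the Python standard library (all names here are hypothetical, and this is an analogy, not GPU code): a dedicated worker is created up front and kept alive for the whole session, so a latency-sensitive job never pays start-up cost and never competes for a slot in a shared pool.

```python
import queue
import threading

# CPU-side analogy of reserving a sliver of resources: one worker thread
# stays alive for the whole session, mostly idling, so latency-sensitive
# work (audio, here just a callable) is serviced without spin-up latency.
audio_jobs = queue.Queue()
results = queue.Queue()

def reserved_worker():
    while True:
        job = audio_jobs.get()   # mostly idle: blocks until work arrives
        if job is None:          # shutdown sentinel
            break
        results.put(job())

worker = threading.Thread(target=reserved_worker)
worker.start()

audio_jobs.put(lambda: "mixed-audio-block")
out = results.get()              # serviced by the reserved worker
audio_jobs.put(None)             # retire the worker
worker.join()
print(out)
```

The design point is the same as on the GPU side: capacity is wasted while the worker idles, which is why better preemption can make the reservation unnecessary.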
  4. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
Not sure it's relevant here; this operation isn't running up against a vsync wall, nor is anything being pre-empted.
     
  5. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    44
    Likes Received:
    47
well, using a specific third-party extension that doesn't belong to the DX part as evidence isn't really the way to do it, and if I'm correct we will see the same results on older cards too
     
  6. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
You mean PhysX? I can't tell if it's running asynchronously anyway, because we don't have separate measurements for how long the compute and graphics loads would have taken on their own.
     
  7. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    44
    Likes Received:
    47
i mean both: games written for Nvidia cards, and PhysX
    don't get me wrong, it's something new compared to all the info I've gathered so far; that's why I need someone to test that hypothesis
     
  8. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    You mean you can't tell whether it's running *concurrently*.

    This whole thread is very confused about what asynchrony means. In fact, it's very common to have asynchronous interfaces that execute sequentially.

    A better description for what people on this thread are interested in is "concurrent graphics and compute". Asynchronous compute for GPUs is as old as CUDA. But the ability to run graphics workloads concurrently with compute workloads is what this thread is really about, and is a relatively new thing.

Just how useful it is in practice remains to be seen. There are always overheads with these sorts of scheduling systems, whether implemented in hardware, software, or both.
     
    Xuper, drSeehas, serversurfer and 3 others like this.
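[Editor's note] The asynchronous-vs-concurrent distinction can be made concrete with a small CPU-side sketch (Python standard library, purely illustrative): submission is asynchronous either way, but with a single worker the execution is still strictly sequential, which is exactly the "asynchronous interfaces that execute sequentially" case.

```python
from concurrent.futures import ThreadPoolExecutor

# Asynchronous != concurrent: submit() returns immediately (an asynchronous
# interface), but with a single worker the jobs still execute one after the
# other (no concurrency). The same interface with max_workers=2 would
# execute the same jobs concurrently.
order = []

def job(name):
    order.append(name)
    return name

with ThreadPoolExecutor(max_workers=1) as pool:
    f_gfx = pool.submit(job, "graphics")   # returns a Future at once
    f_cmp = pool.submit(job, "compute")    # queued behind "graphics"
    # the caller kept running while the work was pending: that's asynchrony
    results = [f_gfx.result(), f_cmp.result()]

print(order)   # sequential execution despite asynchronous submission
```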
  9. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    44
    Likes Received:
    47
it's not really new; the GameCube was using asynchronous engines back then (although I forget the name they had for it at the time..)
     
  10. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    Yeah, I did think the terminology was a little confusing. As I understand it, there's no debate whether Maxwell can run multiple compute workloads concurrently, but the issue is whether it can run graphics and compute concurrently. That test showed GCN doing it clear cut. Neither side claims to have more than one graphics queue, so as long as we're using a test that Maxwell interprets as purely graphics, we shouldn't expect it to run anything but serially. What still remains a mystery to me is why the same workload is compute to GCN but graphics to Maxwell.
     
    drSeehas likes this.
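[Editor's note] The measurement Darius says is missing can be made concrete: given solo timings for the graphics and compute workloads, the combined run tells you how much overlap occurred. A sketch with hypothetical millisecond numbers:

```python
# If both workloads are also measured alone, the combined time relative to
# the sum vs. the max of the solo times indicates how much overlap occurred.
def overlap_fraction(t_graphics, t_compute, t_combined):
    """1.0 = fully concurrent (combined == max), 0.0 = fully serial."""
    serial = t_graphics + t_compute
    best = max(t_graphics, t_compute)
    if serial == best:            # degenerate: one workload takes no time
        return 1.0
    return (serial - t_combined) / (serial - best)

# e.g. graphics alone 10 ms, compute alone 6 ms:
print(overlap_fraction(10, 6, 16))   # combined == sum -> ran serially
print(overlap_fraction(10, 6, 10))   # combined == max -> fully overlapped
```

Without the two solo timings, a combined number alone cannot distinguish the serial from the concurrent case, which is the gap in the PhysX observation above.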
  11. Would the ability to handle two concurrent graphics queues help in VR?
     
  12. Fantasma

    Joined:
    Sep 2, 2015
    Messages:
    1
    Likes Received:
    0
Maybe the answer is in slide 55: support for D3D11 only. If it runs within the DX11 API, it should not be able to run asynchronously. PhysX is an additional API, as far as I know, which can add its own instructions outside of DX11.
     
  13. tobi1449

    Joined:
    Sep 4, 2015
    Messages:
    1
    Likes Received:
    0
An even bigger question for me is whether the classification for this happens in the hardware (bad) or in firmware/driver (better, since it's fixable).
     
  14. Sinistar

    Sinistar I LIVE
    Regular Subscriber

    Joined:
    Aug 11, 2004
    Messages:
    660
    Likes Received:
    74
    Location:
    Indiana
It appears to me that Nvidia does not support running compute and graphics concurrently, so they are emulating it by converting compute commands into graphics commands.
     
    serversurfer likes this.
  15. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    That can't be it, because when the compute workload is done in isolation it's still considered graphics by Maxwell.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
This is about asynchronous compute; the synchronous form was already baked in--meaning a graphics context could run graphics and compute commands already.
    Perhaps rather than having a context that is architecturally incapable of housing graphics functionality, this test is being given a second graphics context that just happens to contain only compute commands.

    The ability to host multiple graphics contexts should be within the Nvidia implementation's capabilities, given its multi-user products.
     
    pharma likes this.
  17. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    How is the appropriate context determined? The same code is being interpreted by GCN as compute and by Maxwell as graphics.

    Like is there a flag that needs to be set, marking the code as compute? Or is it something the driver/hardware determines on its own?
     
    drSeehas likes this.
  18. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    That is possibly even working properly - as long as the driver may assume that the work items in the queue are independent.

IMHO, the awful performance when enforcing serial execution points to a lack of dependency management in the hardware queue. That would force a round trip to the CPU between every single step.

Unlike GCN, where "serial execution" doesn't appear to actually mean serial. It's possible that the driver only enforces the memory order in that case, and still pushes otherwise conflicting jobs to the ACEs, using the scheduling capabilities of the hardware. This could also explain the better performance when enforcing "serial" execution: the optimizer may now treat subsequent invocations as dependent and may therefore even concatenate threads, which ultimately leads to reduced register usage.

    It's a long shot, but it might be that Nvidia's GPUs have no support for inter-shader semaphores while operating in a graphics context.
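[Editor's note] Ext3h's hypothesis can be expressed as a toy cost model (all numbers hypothetical): without dependency tracking in the hardware queue, every dependent step in a chain pays a CPU round trip; with on-device signaling, the chain pays the round trip roughly once.

```python
# Toy cost model of the dependency-management hypothesis. If the hardware
# queue cannot track dependencies, the CPU must wait for each step to finish
# and resubmit the next one, paying a round trip per step.
def chain_time_ms(steps, gpu_ms_per_step, cpu_roundtrip_ms, hw_dependencies):
    if hw_dependencies:
        # GPU-side semaphores: one submission, steps chain on-device
        return cpu_roundtrip_ms + steps * gpu_ms_per_step
    # no dependency tracking: a CPU round trip between every single step
    return steps * (gpu_ms_per_step + cpu_roundtrip_ms)

# 128 dependent steps of 0.05 ms each, 0.2 ms per CPU round trip:
print(chain_time_ms(128, 0.05, 0.2, hw_dependencies=True))
print(chain_time_ms(128, 0.05, 0.2, hw_dependencies=False))
```

Even a sub-millisecond round trip dominates once it is paid per step rather than per chain, which would match the "awful performance" observation for enforced serial execution.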
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
The context is defined when the queues that commands are being sent to are created.
    However, what the graphics system categorizes them as internally shouldn't bother the API as long as they act appropriately.

    Perhaps there's a wrinkle in the behavior that caused the timestamps to not work in the Nvidia compute queue?
    https://forum.beyond3d.com/posts/1869354/


It seems that can be assumed, since DX12's compute queue is explicitly asynchronous outside of programmer-defined synchronization points, and the independent user contexts of virtualized graphics products would also be independent by definition.

    With the latest IP, AMD has involved significant hardware management for both scenarios, whereas software appears to be more involved with Nvidia's implementations.

    It's possible that there's more driver-level management and construction of the queue.
Possibly, the 32 "queues" are software-defined slots of calls the driver has determined to be independent, which the GPU can then issue in parallel, possibly through a single command front end.
    If running purely in compute, this seems to stair-step in timings as one would expect.

    AMD's separate compute paths may provide a form of primitive tracking separate from the primitive tracking in the geometry pipeline.
    It does seem like the separate command list cases can pipeline well enough. Perhaps there is a unified tracking system that does not readily handle geometric primitive and compute primitive ordering within the same context?
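[Editor's note] The stair-step pattern mentioned above follows directly from the slot model: if N equal-length kernels are dispatched across a fixed number of slots that can issue in parallel, total time jumps each time N crosses a multiple of the slot count. A sketch with hypothetical numbers (32 slots, 1 ms kernels):

```python
import math

# Stair-step model: N equal kernels across `slots` parallel issue slots
# complete in ceil(N / slots) "waves", so total time is flat within a wave
# and jumps at multiples of the slot count.
def total_time_ms(n_kernels, slots=32, kernel_ms=1.0):
    return math.ceil(n_kernels / slots) * kernel_ms

steps = [total_time_ms(n) for n in (1, 32, 33, 64, 65)]
print(steps)   # [1.0, 1.0, 2.0, 2.0, 3.0] -- jumps at 33 and 65
```

A pure-compute benchmark that shows exactly this shape is consistent with a 31- or 32-wide issue window, whether the slots are hardware queues or driver-managed groupings.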
     
  20. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    So if I understand you correctly, the graphics was sent to a predefined "graphics queue" in DX12, and the compute was sent to a predefined "compute queue" in DX12. And then Maxwell internally redirected the compute to the graphics? Presumably because it determined it couldn't run that code in a compute context?
     