Asynchronous Compute: what are the benefits?

Discussion in 'Console Technology' started by onQ, Sep 19, 2013.

  1. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,411
    Location:
    France
    Except in some cases where you can use the shared memory available for both the CPU and GPU on PS4, right? (with a 20GB/s maximum bandwidth)


     
  2. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,632
    Location:
    The North
    edit: nvm.
     
    #222 iroboto, Mar 23, 2015
    Last edited: Mar 24, 2015
  3. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    4,024
    Likes Received:
    2,851
    The quote you were responding to was still in the context of a PC with a dedicated graphics card, I believe, and was a continuation of the idea presented in this quote from an earlier post.

     
  4. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Yes, you have unified memory on consoles. However, if you design your engine around fast CPU<->GPU communication using unified memory, it becomes very hard to port to PC. You have at least one additional frame of GPU roundtrip latency on PC (and multiple frames with SLI/Crossfire). This is partly because of the separate CPU and GPU physical memories, and partly because of the API abstractions (no direct way to control GPU memory and data transfers, hard to ensure enough work for various GPUs when working in near lockstep). DirectX 12 (and Vulkan) will certainly help, but they still cannot remove the need to move data between the CPU and GPU memories.

    The CPU and the GPU run asynchronously. A PC game will never be able to have as low latency as a console game, because the GPU performance is unknown. You want to ensure that there's always enough work in the GPU's command queues (faster GPUs empty their queues faster). To prevent GPU idling (= empty queues), you push more data to the queues, meaning a longer average wait until each command gets out. Asynchronous compute has priority mechanisms to fight this issue, but only time will tell how much these mechanisms can lower the GPU roundtrip latency on PC. For gameplay code, the worst-case latency is of course the most important one (large fluctuating input lag is the worst). It remains to be seen whether ALL the relevant Intel, Nvidia and AMD GPUs provide low enough latency for high-priority asynchronous compute. If the latency is not predictable across all the manufacturers, then I expect cross-platform games to continue using CPU SIMD (SSE/AVX) to do their gameplay-related data crunching.

    Insomniac (Sunset Overdrive developer) had a GDC presentation about CPU SIMD:
    https://deplinenoise.wordpress.com/2015/03/06/slides-simd-at-insomniac-games-gdc-2015/

    On page 4 Andreas explains why they continue using CPU SIMD instead of GPU compute for gameplay-related things. Latency is the key.
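    To make the SIMD argument concrete, here is a minimal sketch (my own illustration, not code from the slides) of the structure-of-arrays layout that makes CPU SIMD effective for gameplay data: each component lives in its own contiguous array, so the update is a straight-line loop the compiler can auto-vectorize with SSE or AVX depending on the build target, and the results are available to gameplay code immediately, with no GPU roundtrip.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Structure-of-arrays layout (names are my own, not Insomniac's):
    // one contiguous array per component, rather than an array of structs.
    struct Bodies {
        std::vector<float> posX, posY;
        std::vector<float> velX, velY;
    };

    // Integrate positions for one tick. Each loop reads and writes one
    // contiguous array, which is the access pattern SSE/AVX handle best.
    inline void integrate(Bodies& b, float dt) {
        for (std::size_t i = 0; i < b.posX.size(); ++i)
            b.posX[i] += b.velX[i] * dt;
        for (std::size_t i = 0; i < b.posY.size(); ++i)
            b.posY[i] += b.velY[i] * dt;
    }
    ```

    The same update written against an array-of-structs (`struct Body { float x, y, vx, vy; }`) interleaves the components in memory and typically vectorizes much worse, which is one of the deck's central points.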
     
  5. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    I think people sometimes forget that PS4 exclusive devs don't have the same constraints as multiplatform devs working with PC.

    edit: the Bullet API PS4 version keeps the gameplay physics on the CPU.
     
    #225 chris1515, Mar 24, 2015
    Last edited: Mar 24, 2015
    Lucid_Dreamer likes this.
  6. liquidboy

    Regular

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    If you don't mind sharing (and of course if your NDA allows you to), in this situation would you choose NOT to design your engine around fast CPU<->GPU communication because you want to support PC (thus handicapping the engine on consoles) ... ?!

    Just interested in how you would tackle this situation :)
     
    Lucid_Dreamer likes this.
  7. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Is it really handicapping the engine on consoles? sebbbi already described how it's often (usually?) possible to fill the GPU with rendering based async compute jobs which don't need the fast CPU <->GPU interconnect.

    And the slide deck he posted explained (very well I think) why CPU SIMD should be used for the low latency jobs where possible.

    What I found particularly interesting about that slide deck is that, on the one hand, AVX performance on AMD CPUs is crippled to the point that Insomniac don't even bother with it on the PS4 and use SSE4.2 instead. While on the PC, despite some machines offering AVX2 capability (potentially 4x SSE performance), the fragmentation of the market would generally limit them to between SSE2 and SSE4.1, depending on how high the hardware target is.

    That is, unless some kind of abstraction software is used that can automatically switch between SSE and AVX (and AVX2?) as described in the deck.
     
    liquidboy likes this.
  8. Inuhanyou

    Veteran

    Joined:
    Dec 23, 2012
    Messages:
    1,305
    Likes Received:
    480
    Location:
    New Jersey, USA
    So basically, the GPGPU benefits Ubisoft presented in their slides aren't possible in their games, because an engine built around the PS4's specific advantage wouldn't be compatible with PC/XB1 :/

    That sucks, that means a majority of devs won't utilize such a thing even though it has some potential. Thanks for the info as usual sebbbi.

    On the other hand, maybe CC2 will be able to manage some kind of approximation, they have been looking into this very thing for implementation in their games.
     
  9. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    No, Xbox One is good for this too. The problem is more on the PC side, with the PCI Express bus linking the GPU to main RAM and a level of API control not available on PC.
     
  10. Inuhanyou

    Veteran

    Joined:
    Dec 23, 2012
    Messages:
    1,305
    Likes Received:
    480
    Location:
    New Jersey, USA
    True, but I was more referring to XB1's lack of compute resources in comparison... it's going to be a hard sell to try and put compute into multiplat games where the effect is limited on a platform you have to get working as close as possible to the same level. Is it even worth it at that point to use compute?

    From DF's observations, FF15 uses Infamous SS's type of GPGPU compute effects for particles in everything from the summons' dispersal to the damage effect of ordinary slashes. In that case, XB1 is lagging behind. Do they just cut that utilization back or something?
     
  11. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    If they're on PC, they have an even more limited baseline to set a cut-off against.

    Some already reduce resolution.
     
  12. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,632
    Location:
    The North
    My understanding from sebbbi is that he would prefer game code not run on the GPU at all: all graphics-related work runs on the GPU, and game code stays on the CPU.

    If you heavily tune your engine to purposefully split work between the CPU and GPU, it would be hard to port to PC due to the round-trip latency between them. If you leverage async compute for graphics, however, that round-trip latency doesn't exist: the work goes to the GPU and stays there.

    There are tools on the CPU side that could be explored without having to use the GPU for the calculations, as sebbbi has mentioned. Since what you're attempting is asynchronous compute, you run into the issue that if the data doesn't return in time, the CPU stalls waiting for the GPU to deliver results. And you also take up GPU clock cycles that could have been used for graphics.

    He does make a good case in this statement: gameplay programmers should be leveraging and optimizing their CPU and memory as much as you'd have to do on the GPU side.
     
  13. Inuhanyou

    Veteran

    Joined:
    Dec 23, 2012
    Messages:
    1,305
    Likes Received:
    480
    Location:
    New Jersey, USA
    I see, thank you for the information and clarification.
     
  14. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    iroboto already said it more or less, but it's worth pointing out that, as sebbbi said earlier, tons of games, on both consoles and PC, already use GPU compute (for graphics work) and have done since DX11 became standard. So there's no fear about it being used on consoles, since it already is.

    When DX12 lands, the PC will also widely support async compute (I say widely as it already does through Mantle on AMD GPUs). The question I'm still not sure about, though, is whether a game can be developed to use synchronous compute and automatically use async compute if/when it's available in the hardware (and vice versa), or whether the game needs to be specifically coded to use one or the other. Because as far as I'm aware, no Intel GPUs support async compute, so that could greatly hinder its uptake, at least in the PC space, if it requires full hardware support and has no fallback option. Judging from sebbbi's enthusiasm for this, though, I'm assuming that wouldn't pose too much of a barrier.
     
  15. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,632
    Location:
    The North
    I didn't actually answer the second part of your question, or really the first, which was whether or not the shared memory bus would be used. The answer is yes, it would be used in multiplats. In the scenario where compute stays entirely on the GPU, that bus can be leveraged. And in the scenario where a round trip needs to be made, it can be leveraged with much less latency than a PC would see, and some amount less than the XB1 would.

    But in the scenario where you are specifically designing an engine around that round-trip performance, it could only be leveraged on PS4, as your code would eventually become dependent on the lower latency of that shared memory space. I think this was sebbbi's stance with regards to the difficulty of porting out.
     
    #235 iroboto, Mar 24, 2015
    Last edited: Mar 24, 2015
  16. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,632
    Location:
    The North
    I'm going to have to respond in terms of Xbox hardware, but I believe this is the difference between the high-priority and low-priority compute queues. My understanding is that compute shaders/DirectCompute run at high priority while async compute is low priority. If this slide is to be followed, then the intent of asynchronous compute is for multiple small jobs to render faster by fitting into gaps (the CPU overhead of DX12 is small, and parallel rendering enables a solution that didn't previously exist on DX11 due to CPU overhead, as documented in Ubisoft's presentation). Instead, they wrote a very long shader with sync points to complete its job: not a very good use of resources, but good for determining the maximum capabilities of the hardware.

    I believe Intel Skylake will be DX12 ready and therefore support asynchronous compute.
     
  17. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    I don't understand where the synchronisation comes in? These are CPU tasks in the first place that you're moving to the GPU on the consoles without a latency hit because of the shared memory. But on the PC you're just leaving them on the CPU in the first place so why would you need a low latency sync to the GPU? How is it any different to how games have been splitting tasks between the CPU and GPU (with a slow interconnect) for years?
     
  18. psorcerer

    Regular

    Joined:
    Aug 9, 2004
    Messages:
    732
    Likes Received:
    134
    When you need to show the simulation result, you need to update a lot of things in GPU memory.

    It was no different; it was just as slow. But today's consoles are fast (low latency), and PCs are still slow.
     
  19. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,400
    Location:
    Wrong thread
    Well, not everything sim-side needs to make its way to the GPU for drawing, and even for the things that do, many are latency-tolerant at the point they are queued up for drawing.

    Don't think there's a Windows build for GPU yet, and even when there is, serial x86 legacy apps will run pretty badly on a GPU-based x86 emulator! :eek:

    The PC form factor needs to evolve beyond 65~95 W APUs before discrete products go away in gaming PCs.
     
  20. MJP

    MJP
    Regular

    Joined:
    Feb 21, 2007
    Messages:
    566
    Likes Received:
    187
    Location:
    Irvine, CA
    Well, that depends quite a bit on what you're using async compute for. Currently the dominant use of async compute is for optimizing graphics-related tasks that can be run in parallel with other graphics tasks. So for instance you might update your particle simulation using async compute jobs that are kicked off at the beginning of your frame, and while that's going on, your primary graphics pipe is processing draw calls for a depth prepass. For a situation like this, where async compute is just an optimization and the results are still consumed by the GPU, it's pretty trivial to just kick off your compute job on your main graphics pipe instead. All you really need to do is make sure that it gets submitted before whatever graphics tasks consume the results of the compute job. It won't run as optimally as if you had async compute, but things will still basically work without any major problems.

    Where things get tricky is if you're using async compute for low-latency, non-graphics tasks. The typical game setup goes like this: frame 1 starts on the CPU, by updating the state of all in-game entities. Once this is done, the CPU builds GPU command buffers to draw the entities in their current state. At this point the CPU is done with frame 1, and so it submits command buffers to the GPU so that it can render frame 1. While the GPU is cranking away on frame 1, the CPU moves on to frame 2 and repeats the process. The consequence of this setup is that if the entity update phase wants to do some quick compute jobs on the GPU, it might have to wait all the way until the end of the frame for the GPU to finish processing the previous frame before it can actually submit something and have the GPU start executing it. On the PC it might even require more than 1 frame of waiting, since by default the driver will buffer up 2-3 frames' worth of command buffers before submitting them. Async compute offers a nice way around this problem, since it essentially lets you say "Hey GPU, I know you're doing other stuff right now, but go ahead and execute these compute jobs whenever you have some spare time" (or right now, if you set the priority high enough). This, together with low-latency readback of results into CPU-accessible memory, essentially opens the door for low-latency GPGPU jobs. If you were using async compute to realize this, you can't really just fall back to synchronous compute unless the system that kicked off the task is capable of tolerating multiple frames of latency. For such cases I would imagine that you would need to have an optimized CPU-only path that you could use instead.
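    The frame-buffering cost MJP describes can be modeled with a toy queue. The sketch below is my own illustration (not real driver code, and `framesInFlight` is a hypothetical parameter): the CPU submits one command buffer per frame, but the "GPU" only begins executing once the driver has buffered `framesInFlight` frames, so a job enqueued on the graphics queue pays roughly the buffer depth in frames of roundtrip latency.

    ```cpp
    #include <deque>

    // Toy model of driver-side command-buffer buffering (my own sketch).
    // Each iteration the CPU submits one frame; the GPU only dequeues the
    // oldest frame once the buffer holds more than framesInFlight entries.
    // Returns how many frames pass between submitting work and the GPU
    // reaching it: the roundtrip a gameplay compute job would have to wait.
    inline int roundtripLatencyFrames(int framesInFlight, int framesSimulated) {
        std::deque<int> gpuQueue;   // frame indices waiting on the GPU
        int latency = 0;
        for (int frame = 0; frame < framesSimulated; ++frame) {
            gpuQueue.push_back(frame);                    // CPU submits frame N
            if (static_cast<int>(gpuQueue.size()) > framesInFlight) {
                int executing = gpuQueue.front();         // GPU starts oldest frame
                gpuQueue.pop_front();
                latency = frame - executing;              // frames of roundtrip
            }
        }
        return latency;
    }
    ```

    With the 2-3 buffered frames MJP mentions, a job submitted through the normal graphics queue waits 2-3 frames; a shallow, console-style queue brings that down to one. Async compute's priority mechanisms matter precisely because they let a job skip this queue instead of waiting for it to drain.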
     
    Prophecy2k, sebbbi, dobwal and 2 others like this.