Asynchronous Compute: what are the benefits?

Discussion in 'Console Technology' started by onQ, Sep 19, 2013.

  1. MJP

    MJP
    Regular

    Joined:
    Feb 21, 2007
    Messages:
    566
    Likes Received:
    187
    Location:
    Irvine, CA
    There can actually be quite a bit of "idle" time on a GPU, at least if you look at the resources used by compute shaders. Even if you ignore rendering phases where ALUs aren't used heavily to begin with (for instance, depth-only rendering for shadow maps), there's typically quite a bit of time where the GPU has to sync/stall in order to allow subsequent rendering passes to run in lock-step. Async compute offers a convenient way of executing shaders that bypass all of that syncing (hence the "async" part of the name), which lets you "fill up" that idle time with compute jobs. I don't really want to go into too many specifics due to NDA, but my friends at Q-Games are a bit more cavalier and have shared some of their profiling data in these slides (see slide 83).

    Obviously it depends quite a bit on what kinds of compute jobs you're running and what else is happening concurrently on the GPU. However, it certainly isn't as cut and dried as "running async compute shaders always takes away processing time from graphics", if that's what you're suggesting.
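    [Editor's note] To make the pattern concrete, here is a minimal D3D12 sketch of what MJP describes (not his code; the function and pass names are hypothetical): compute work is submitted on its own queue so the hardware can overlap it with rendering, and the graphics queue waits on a fence only where the result is actually consumed.

        #include <d3d12.h>
        #include <wrl/client.h>
        using Microsoft::WRL::ComPtr;

        // Sketch: submit one async compute job alongside two graphics passes.
        // All command lists are assumed to be recorded and closed already.
        void SubmitFrame(ID3D12Device* device,
                         ID3D12CommandQueue* gfxQueue,
                         ID3D12GraphicsCommandList* computeList,   // e.g. SSAO or culling
                         ID3D12GraphicsCommandList* shadowList,    // independent of compute
                         ID3D12GraphicsCommandList* lightingList)  // reads compute output
        {
            // A dedicated compute queue: work here is scheduled independently of
            // the graphics queue, which is what lets it fill the idle/stall time.
            // (Created per call only for brevity; you'd normally create it once.)
            D3D12_COMMAND_QUEUE_DESC desc = {};
            desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
            ComPtr<ID3D12CommandQueue> computeQueue;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

            ComPtr<ID3D12Fence> fence;
            device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

            // Kick off the compute job; it overlaps whatever graphics is doing.
            ID3D12CommandList* c[] = { computeList };
            computeQueue->ExecuteCommandLists(1, c);
            computeQueue->Signal(fence.Get(), 1);

            // Graphics work with no dependency on the compute result: no sync.
            ID3D12CommandList* s[] = { shadowList };
            gfxQueue->ExecuteCommandLists(1, s);

            // Sync only where the result is consumed. Wait() stalls the GPU
            // queue's timeline, not the CPU thread.
            gfxQueue->Wait(fence.Get(), 1);
            ID3D12CommandList* l[] = { lightingList };
            gfxQueue->ExecuteCommandLists(1, l);
        }

    On GCN hardware this corresponds to the compute queue being fed from an ACE while the graphics command processor keeps issuing draws.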
     
    #161 MJP, Nov 22, 2014
    Last edited: Nov 22, 2014
  2. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,461
    Likes Received:
    3,127
    Location:
    Wrong thread
    I think the contentious issue is more the idea that async compute can be used to 'offload' work from the CPU without consuming GPU resources that could otherwise be claimed, also via async compute, for graphics (or anything else).
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,576
    Likes Received:
    16,031
    Location:
    Under my bridge
    Seeing as the compute situation keeps coming up, we should have a proper home for it.
     
  4. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,095
    Likes Received:
    1,698
    Location:
    Maastricht, The Netherlands
    It doesn't really matter. If the main bottleneck is the CPU, then your GPU resources are going to waste, producing a very pretty slideshow.
     
  5. Rurouni

    Veteran

    Joined:
    Sep 30, 2008
    Messages:
    1,011
    Likes Received:
    306
    That is why I said barring the bottlenecks thing.
    But if they can actually use the GPU to do their CPU tasks, then since the CPU is being used less, the GPU may get more bandwidth thanks to less contention with the CPU.
     
  6. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,988
    Likes Received:
    4,092
    Location:
    Barcelona Spain
    I hope compute-centric game engines will be there for the 2015 holidays... The wait is long...

    It is sad to see two GPU-centric consoles not showing their potential.
     
  7. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
    I thought one of their experiments was to see if they can make GPU scheduling and jobs more autonomous? If successful, that should free up some of the CPU dependency (but not all).

    One of the questions in my mind is how Sony sees and positions itself in the PS4 developer ecosystem.

    During the PS3 era, Mark Cerny at first was thinking of keeping his Cell expertise to "themselves". He saw it as a competitive advantage over other studios. They later changed position when they found out everyone was having serious trouble; if Sony didn't help out, PS3 would suffer as a platform.

    So now that PS4 is easy to develop for, and the low-level tools have been available since day 1, will they keep their approach to themselves? Or will they share the more advanced techniques? :)

    Granted, cross-platform developers should be very familiar with AMD tech.
     
    chris1515 likes this.
  8. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,335
    Likes Received:
    5,904
    With PS4, Xbox One and PC (Mantle, DX12) all supporting asynchronous compute, there should be no shortage of knowledge about algorithms well suited to GPUs.
     
  9. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,988
    Likes Received:
    4,092
    Location:
    Barcelona Spain
    http://m.neogaf.com/showthread.php?t=1009066&page=1

    Very good thread. Dylan Cuthbert gives some details about async compute.

    They were early adopters of async compute:

    http://m.neogaf.com/showpost.php?p=156110542
     
    #169 chris1515, Mar 16, 2015
    Last edited: Mar 19, 2015
    Globalisateur likes this.
  10. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,988
    Likes Received:
    4,092
    Location:
    Barcelona Spain
  11. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    3,594
    Likes Received:
    2,331
    Location:
    France
    chris1515 likes this.
  12. DieH@rd

    Legend Veteran

    Joined:
    Sep 20, 2006
    Messages:
    6,243
    Likes Received:
    2,212
  13. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    436
    Likes Received:
    275
    ???
    You mean because of the ACEs?
    They don't really matter. They might squeeze a bit more out of the GPU, but the 2 command processors do something similar: they just increase the chances that you can use resources that would otherwise sit unused, so the GPU runs more efficiently.

    What really worries me is that if the GPU is used more efficiently, it will also consume more power, which in turn increases the heat. And the PS4 is already hot and loud enough.
    I don't even know if it can deliver enough power for the GPU, CPU and memory if everything is under real pressure. The PS4 already draws ~140 W; how much higher can it go?
     
  14. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,516
    Likes Received:
    12,373
    Location:
    The North
    I think that depends on what task you're attempting to do.
    I'm curious as to what you defined as an optimized async compute experience for PS4. If it's just the larger number of ACE queues, then I don't think that necessarily means Xbox is not optimized for async compute (wrt its own profile) - but I can't fault the idea that more queues would therefore mean more async compute (for PS4).

    edit: nvm - hmm, this is a different approach - time to see what MS did, or didn't. Quoting Cerny's three points (a rough PC-side sketch follows the list):
    • "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!
    • "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
    • Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."
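    [Editor's note] Cerny's third point has a rough analogue on PC in D3D12, where an application can create multiple compute queues and the hardware/OS arbitrates between them, with queue priority as one of the inputs. A minimal sketch (the queue roles and priority split are my own illustration, not something Cerny describes):

        #include <d3d12.h>
        #include <wrl/client.h>
        using Microsoft::WRL::ComPtr;

        // Two compute queues with different priorities: one for long-running
        // "fill the idle time" jobs, one for latency-sensitive jobs that the
        // scheduler should favor. The hardware arbitrates between them.
        void CreateComputeQueues(ID3D12Device* device,
                                 ComPtr<ID3D12CommandQueue>& backgroundQueue,
                                 ComPtr<ID3D12CommandQueue>& urgentQueue)
        {
            D3D12_COMMAND_QUEUE_DESC desc = {};
            desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

            desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&backgroundQueue));

            desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&urgentQueue));
        }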
     
    #174 iroboto, Mar 18, 2015
    Last edited: Mar 18, 2015
  15. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,988
    Likes Received:
    4,092
    Location:
    Barcelona Spain
    It is not only the 8 ACEs: there are also the volatile bits preventing cache thrashing, and the Onion+ bus bypassing the GPU caches for synchronisation.

    All GCN GPUs are pretty good at async compute, as in PS4 and XB1.

    Async compute is useful for PC too.
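    [Editor's note] For what it's worth, the same GCN compute queues are exposed on PC through the explicit APIs. As a minimal sketch (the helper name is mine, and Vulkan slightly postdates this thread), this is the usual way to look for a compute-only, i.e. async-capable, queue family:

        #include <vulkan/vulkan.h>
        #include <vector>

        // Find a queue family that supports compute but not graphics. On GCN
        // parts such a family is fed by the ACEs rather than the graphics
        // command processor. Returns the family index, or -1 if none exists.
        int FindAsyncComputeFamily(VkPhysicalDevice gpu)
        {
            uint32_t count = 0;
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
            std::vector<VkQueueFamilyProperties> families(count);
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

            for (uint32_t i = 0; i < count; ++i) {
                const VkQueueFlags flags = families[i].queueFlags;
                if ((flags & VK_QUEUE_COMPUTE_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT))
                    return static_cast<int>(i);
            }
            return -1; // none: fall back to running compute on the graphics queue
        }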
     
    Lucid_Dreamer likes this.
  16. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,988
    Likes Received:
    4,092
    Location:
    Barcelona Spain
    This time there is no exotic hardware: the "secret sauce" is the same on PS4 and XB1, and it works on PC too. It is a good thing...
     
  17. Lucid_Dreamer

    Veteran

    Joined:
    Mar 28, 2008
    Messages:
    1,210
    Likes Received:
    3
    Are you saying an engineering company like Sony is in the business of wasting APU space on optimizations that "don't really matter" (especially 4x the number of async compute pipelines)? That doesn't seem logical.

    These consoles are quite quiet and a LOT cooler than the last-gen launch consoles. The GPU isn't going to draw more power than its max (judged from theoretical max performance); async compute just makes the GPU more capable of reaching and staying closer to that max.

    Having 4x more opportunities to get something worked on gives you far more chances to claim that GPU time back. It just makes sense.
     
  18. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,516
    Likes Received:
    12,373
    Location:
    The North
    Because the console shipped with this in mind, its cooling and power profile are rated for it.
     
  19. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    436
    Likes Received:
    275
    You quoted me out of context. "That doesn't really matter" was related to the statement "the Xbone GPU is not optimized for async compute like the PS4 GPU is."
    Yes, the ACEs will help use the GPU more efficiently, but the 2 compute command processors will do similar things, so you just can't say one is more optimized for this than the other.
     
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,516
    Likes Received:
    12,373
    Location:
    The North
    Xbox One has two ACEs; from what I understand, they were just relabelled as Compute Command Processors. MS did claim to have customized them to be better somehow, likely at scheduling, but the number of available compute queues is only 16. I'm unsure at this moment, though, how many queues games will actually require as they continue to evolve.
     