CPU and GPU submissions buffered? *spawn*

Discussion in 'Console Technology' started by McHuj, Jul 7, 2014.

  1. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,579
    Likes Received:
    802
    Location:
    Texas
     What I don't understand is why the CPU and GPU aren't double buffered on frames: the CPU processes frame N while the GPU renders frame N-1, so that both have the full frame time to do their work. Obviously that introduces a frame of latency, but at a high frame rate I don't know whether that would be perceptible. Maybe?
     
  2. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,669
    Likes Received:
    12,655
    Location:
    The North
     We are slightly OT, but this is worth spinning off. I would also like to hear from anyone who can provide insight into how swap chains and buffering are accounted for in processing time.
     
  3. MJP

    MJP
    Regular

    Joined:
    Feb 21, 2007
    Messages:
    566
    Likes Received:
    187
    Location:
    Irvine, CA
    That is probably the most popular way of handling GPU submission on consoles, since it's simple and lets you fully parallelize the CPU and GPU. Some games use more complex setups where the CPU will send partial submissions to the GPU (D3D drivers often do this on Windows). Other games might even add an additional frame of latency in order to give the CPU more than a single frame's worth of time to complete its work.

     The situation iroboto describes (CPU and GPU working in lockstep) isn't commonly used in games as far as I know, since it means one processor just stalls while the other is working. Even with both processors working in parallel you can still have one bottleneck the other, since you'll typically have one wait for the other if it's running slow. In the ideal case both processors complete in the target frame time (16.6ms for 60fps, 33.3ms for 30fps) and you only wait for VSYNC (if you're using it).
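
     For illustration, here is a minimal sketch of that double-buffered setup in plain C++ (no real graphics API; the FrameSlot type and thread names are just placeholders): a "CPU" thread records frame N into one slot while a "GPU" thread drains frame N-1 from the other.

     Code:
     // Minimal sketch (plain C++, no real graphics API): a CPU thread builds
     // frame N while a "GPU" thread consumes frame N-1, using two command-list
     // slots. All names here are illustrative, not any engine's real API.
     #include <array>
     #include <condition_variable>
     #include <cstdio>
     #include <mutex>
     #include <thread>
     #include <vector>

     struct FrameSlot {
         std::vector<int> commands;   // stand-in for a recorded command buffer
         bool ready = false;          // CPU finished recording this slot
     };

     std::array<FrameSlot, 2> slots;  // double buffered: CPU writes one, GPU reads the other
     std::mutex m;
     std::condition_variable cv;
     bool done = false;

     void cpuThread(int frames) {
         for (int n = 0; n < frames; ++n) {
             FrameSlot& slot = slots[n % 2];
             {   // wait until the GPU has drained this slot (it lags one frame behind)
                 std::unique_lock<std::mutex> lock(m);
                 cv.wait(lock, [&] { return !slot.ready; });
             }
             slot.commands.assign(100, n);          // "Update() + Render()": record frame n
             {
                 std::lock_guard<std::mutex> lock(m);
                 slot.ready = true;                 // hand frame n to the GPU
             }
             cv.notify_all();
         }
         { std::lock_guard<std::mutex> lock(m); done = true; }
         cv.notify_all();
     }

     void gpuThread() {
         for (int n = 0; ; ++n) {
             FrameSlot& slot = slots[n % 2];
             {
                 std::unique_lock<std::mutex> lock(m);
                 cv.wait(lock, [&] { return slot.ready || done; });
                 if (!slot.ready && done) return;   // no more frames coming
             }
             std::printf("GPU executing frame %d (%zu commands)\n", n, slot.commands.size());
             {
                 std::lock_guard<std::mutex> lock(m);
                 slot.ready = false;                // slot free again; CPU may record frame n+2
             }
             cv.notify_all();
         }
     }

     int main() {
         std::thread gpu(gpuThread);
         cpuThread(8);
         gpu.join();
     }

     The key property is that the only time either side waits is when it gets a full frame ahead of (or behind) the other; in the ideal case neither wait ever fires.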
     
  4. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,669
    Likes Received:
    12,655
    Location:
    The North
     Thanks MJP. It didn't occur to me that this was happening. So basically, by the time the CPU finishes running through everything in Render() and all the draw calls have been made, it just goes back into Update() and continues forward. If the GPU completes its work in time, it writes into one of the two or three back buffers, sets the pointer there to draw the screen, and swaps as necessary. And while it has been doing this, the CPU has actually kept moving forward.

     If the slideshow effect is the result of the CPU being too slow to feed the GPU, what do we see as players when the GPU falls far behind the CPU, say by multiple frames? Do we get the weird... speed-up effect?
     
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
     By default PC DirectX is allowed to buffer up to 3 frames. Draw calls just add commands to the GPU command buffer and immediately return, until the GPU buffers are full or the maximum latency is exceeded. If the buffers are full or the maximum latency is exceeded, the draw calls will block until GPU execution has proceeded. If you do timing on the CPU side (in the main render thread), you will notice quite noisy results because of this. Also, many PC GPU drivers have a separate thread for processing draw calls, translating them, and doing resource management for GPU memory; this adds some fluctuation because of thread contention (assuming your game is using enough threads).
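
     For reference, the "maximum latency" mentioned above is exposed on Windows through DXGI. A minimal sketch of lowering it from the default of 3 (D3D11, error handling omitted; assumes an already-created ID3D11Device):

     Code:
     // Sketch: capping how many frames the driver may buffer ahead (D3D11/DXGI).
     // Assumes `device` is an already-created ID3D11Device*; error handling omitted.
     #include <d3d11.h>
     #include <dxgi.h>

     void SetFrameLatency(ID3D11Device* device, UINT maxFrames /* default is 3 */)
     {
         IDXGIDevice1* dxgiDevice = nullptr;
         if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                              reinterpret_cast<void**>(&dxgiDevice))))
         {
             // With maxFrames == 1 the CPU blocks as soon as it gets one full
             // frame ahead of the GPU, trading throughput for lower input latency.
             dxgiDevice->SetMaximumFrameLatency(maxFrames);
             dxgiDevice->Release();
         }
     }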
     
  6. taisui

    Regular

    Joined:
    Aug 29, 2013
    Messages:
    674
    Likes Received:
    0
     I recall Turn 10 used a double-buffering technique in Forza 2 to push it up to 60 fps. I think they did a presentation/slide deck on it, but I can't seem to find it now. I'm under the impression that this is fairly common in game engines today.
     
  7. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,669
    Likes Received:
    12,655
    Location:
    The North
     Thanks for the responses, everyone. To follow up on McHuj's question: is the lag perceptible when triple buffered? At 60 fps you are visually delayed by roughly 50 ms, and at 30 fps by nearly 100 ms (rough math below).

     Sebbbi, in a game like Trials Fusion, where inputs and frame rate are critical to completing some courses, did you do anything special in processing player inputs to remove some of the buffering delay?

     Is the delay in collision detection, as well as audio, noticeable to players when triple buffered?
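
     Just to pin down the numbers being quoted, a rough model that ignores scan-out and display processing:

     Code:
     // Back-of-envelope only: worst-case added display latency from frame buffering,
     // assuming an N-frame-deep pipeline where every stage takes one full frame.
     #include <cstdio>

     double bufferedLatencyMs(int framesDeep, double fps) {
         return framesDeep * (1000.0 / fps);
     }

     int main() {
         std::printf("3 frames @ 60 fps: %.1f ms\n", bufferedLatencyMs(3, 60.0)); // ~50 ms
         std::printf("3 frames @ 30 fps: %.1f ms\n", bufferedLatencyMs(3, 30.0)); // ~100 ms
         // Real input-to-photon latency is higher still: add scan-out time and any
         // processing the display itself does.
     }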
     
  8. steveOrino

    Regular

    Joined:
    Feb 11, 2010
    Messages:
    496
    Likes Received:
    163
     Noticeable input lag depends on the game. It's easy to see on the PC because of the variable performance you can play with. In an FPS, or any game that requires precise aiming/positioning in tandem with the input device (mouse), you will notice the tracking delay, and in a competitive environment people will elect to turn buffering off. But in general you can definitely get away with it in most genres without it being noticeable to the player.
     
  9. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
     Adding to this: circa DirectX 3, many games did accidentally render in lockstep. There were APIs to wait on the GPU, and many games used them.
     When DX5 was released, they took out all of the synchronization primitives, because the result of synchronizing was a lot of idle GPU time.

     I doubt there are many games that intentionally sync the GPU and CPU anymore. You can still do it "accidentally" by locking a surface or using one of the readback APIs, but analysis tools are much better now, so identifying these cases is much easier.
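
     As an illustration of that "accidental" sync, this is roughly what a naive same-frame readback looks like in D3D11 (a sketch only, error handling omitted); mapping the staging copy right after the CopyResource forces the CPU to wait for the GPU:

     Code:
     // Sketch of an accidental CPU/GPU sync in D3D11: copy a GPU buffer to a
     // staging resource and Map() it in the same frame. Error handling omitted;
     // `gpuBuffer` is assumed to be a DEFAULT-usage buffer written by the GPU.
     #include <d3d11.h>
     #include <cstring>

     void ReadBackNow(ID3D11Device* device, ID3D11DeviceContext* ctx,
                      ID3D11Buffer* gpuBuffer, UINT sizeBytes, void* dst)
     {
         // Create a CPU-readable staging copy.
         D3D11_BUFFER_DESC desc = {};
         desc.ByteWidth      = sizeBytes;
         desc.Usage          = D3D11_USAGE_STAGING;
         desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
         ID3D11Buffer* staging = nullptr;
         device->CreateBuffer(&desc, nullptr, &staging);

         ctx->CopyResource(staging, gpuBuffer);

         // This Map() can only return once the GPU has executed every command
         // queued before the copy -- the CPU stalls for however many frames the
         // driver had buffered. Deferring the Map by a few frames avoids the stall.
         D3D11_MAPPED_SUBRESOURCE mapped = {};
         ctx->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
         std::memcpy(dst, mapped.pData, sizeBytes);
         ctx->Unmap(staging, 0);
         staging->Release();
     }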
     
  10. pMax

    Regular Newcomer

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
     I think there are some - those that make heavy use of GPGPU need a way to synchronize the CPU and GPU.
     
  11. taisui

    Regular

    Joined:
    Aug 29, 2013
    Messages:
    674
    Likes Received:
    0
     Using GPGPU techniques to process data in place of the CPU each frame doesn't change the fact that each frame's data needs to be ready before rendering, so I don't think they'd conflict with each other. It might create a resource issue, though; I'm not sure how smart the schedulers are currently.
     
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,576
    Likes Received:
    16,031
    Location:
    Under my bridge
     Chances are virtually nil these days. With multiple CPU cores, multiple threads, even multiple GPU threads, nothing is going to be sitting around waiting. Multiple jobs will be available while waiting for something else to finish. Any stalls will be bugs, not design choices. We've even had devs on this board talk about frame N+1 type calculations, so in any given frame's rendering period you might be processing the current frame's requirements, processing the next frame's requirements, rendering the current frame, and even rendering parts of the next frame.
     
  13. pMax

    Regular Newcomer

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
     ...How do you plan to synchronize your GPGPU work with your frame's rendering pipeline, and with the CPU if it assisted in that work?
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,416
    Likes Received:
    4,172
    Location:
    Well within 3d
     If dealing with standard API stuff, there are a number of places where they'd have to make such a trip, such as handling occlusion queries (edit: and then routing the results back)--which requires CPU intervention through commands put through the runtime and driver. Because of the massive and unpredictable latencies, it simply does not get done on the current frame, or possibly for multiple frames (there's a sketch of the usual deferred-query pattern at the end of this post).

    The more integrated architectures or low-level APIs would be lower latency, or remove outside intervention.

    In relative terms, though, it wouldn't be considered heavy, at least in terms of frequency.
     For CPU to CPU communication, getting data to use would in the worst case be main memory latency, so over a hundred cycles, with in-cache access taking a handful of cycles. It's still used judiciously.
     Doing the same thing with the latest APUs by sending a command to a GPU buffer to make the results of compute available without using Onion+ would, according to VGleaks, have a worst case of tens of thousands of GPU cycles.
    The predictability of the GPU's queueing is not that great at present, though. That could still make the case for buying 33ms or so by working on previous frame data.
    With Onion+, a bandwidth-restricted amount of data could be sent from the GPU to main memory and then back to a requesting CPU after some multiple hundreds of cycles. It's a minority of the data being processed.

     The amount of synchronization between the two sides would be commensurate with how debilitating using it would be.
    The best GPGPU methods are painful at present, and are used sparingly. They just aren't horrific anymore.

    If long-running compute that handles itself mostly on the GPU with occasional runs through Onion+ can be done, it might lead to a somewhat freer interplay with the CPU because it should remove much of the multi-frame queueing latencies that can accumulate if the GPU is under load. Presentations from Sucker Punch on the PS4 indicate this is still troublesome. For PC drivers, it might be an application killer, since such a kernel isn't one that would conclude in time for a driver's timeout/freakout limit.
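
     For a concrete example of the query round trip, the usual D3D11 pattern is to issue an occlusion query in one frame and only consume the result a frame or more later, keeping the previous result instead of stalling (a sketch with hypothetical struct/function names; error handling omitted):

     Code:
     // Sketch: issuing an occlusion query and reading it back a frame (or more)
     // later, so the CPU never spin-waits on the GPU. D3D11, error handling omitted.
     #include <d3d11.h>

     struct OcclusionQuery {
         ID3D11Query* query             = nullptr;
         UINT64       lastVisiblePixels = 0;     // stale result from a previous frame
         bool         inFlight          = false;
     };

     void IssueQuery(ID3D11Device* dev, ID3D11DeviceContext* ctx, OcclusionQuery& q)
     {
         if (!q.query) {
             D3D11_QUERY_DESC desc = { D3D11_QUERY_OCCLUSION, 0 };
             dev->CreateQuery(&desc, &q.query);
         }
         ctx->Begin(q.query);
         // ... draw the object's bounding proxy here ...
         ctx->End(q.query);
         q.inFlight = true;
     }

     void PollQuery(ID3D11DeviceContext* ctx, OcclusionQuery& q)
     {
         if (!q.inFlight) return;
         UINT64 pixels = 0;
         // DONOTFLUSH + tolerating S_FALSE means we simply keep the stale result
         // if the GPU hasn't caught up yet, instead of stalling the CPU on it.
         if (ctx->GetData(q.query, &pixels, sizeof(pixels),
                          D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK) {
             q.lastVisiblePixels = pixels;
             q.inFlight = false;
         }
     }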
     
    #14 3dilettante, Jul 8, 2014
    Last edited by a moderator: Jul 8, 2014
  15. SlimJim

    Banned

    Joined:
    Aug 29, 2013
    Messages:
    590
    Likes Received:
    0
  16. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,669
    Likes Received:
    12,655
    Location:
    The North
  17. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
    Thanks for posting that link.
     