GDC paper on compute-based Cloth Physics including CPU performance

Discussion in 'Console Technology' started by Broken Hope, Oct 13, 2014.

  1. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    42,941
    Likes Received:
    15,028
    Location:
    Under my bridge
    Perhaps more interesting is how much better the XB1 CPU is! Also the poor PS360 results, although PS3 probably isn't too bad at >50% efficiency.
     
  3. MetalSpirit

    Newcomer

    Joined:
    Jan 29, 2014
    Messages:
    48
    Likes Received:
    1
    I noticed that too! I wonder if it has something to do with API compatibility, because the CPU should not be over 9.5% faster!
    Regardless, the Xbox CPU is 15% faster in this test.
    Also relevant is the final slide, where we can see that overall, the PS4 GPU can be almost 100% faster than the one in the Xbox One.
     
  4. Delta9

    Regular

    Joined:
    Feb 9, 2009
    Messages:
    453
    Likes Received:
    6
    These two.

    [IMG]
    [IMG]
     
  5. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
  6. LightHeaven

    Regular

    Joined:
    Jul 29, 2005
    Messages:
    538
    Likes Received:
    19
    They are not running the exact same code. On Xbone they use DX11 code, and on PS4 they ported it to PSSL. They do state that the PS4 might lead to better performance, though you have to manage it yourself.

    They also say the shader is bandwidth bound, but I couldn't find whether their implementation uses ESRAM on the Bone or just DDR3... But it might explain the huge disparity in performance between them.
     
  7. steveOrino

    Regular

    Joined:
    Feb 11, 2010
    Messages:
    479
    Likes Received:
    140
    Now we can have a console war over who has more dancers per second :lol:

    But in all seriousness I cannot wait to see what developers decide to run on the GPU over the next few years.
     
  8. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
  9. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
  10. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
    The compute performance gap certainly makes it look like Xbox main memory BW could be the culprit. Perhaps it's not an optimal case for the esram, or perhaps - being a practical implementation for use in games where buffers will likely always be filling the esram - it doesn't use it because it doesn't plan on using it ...
     
  11. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
  12. LightHeaven

    Regular

    Joined:
    Jul 29, 2005
    Messages:
    538
    Likes Received:
    19
    I thought about that too. I like to think that's something MS took into account in their design, with the DMEs and all that jazz to move data around. But I don't think anyone is using the ESRAM as a queue of buffers, removing data as the GPU finishes with it and moving data in before the GPU needs it, so I dunno if that design is even possible XD
     
  13. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    4,090
    Likes Received:
    2,313
    The PS3 SPUs are quite a monster... nice.

    Anyhow, the XB1/PS4 CPUs are quite matched... however, the PS4 GPU performance is almost twice the performance of XB1 GPU. Go Cerny...
     
  14. MetalSpirit

    Newcomer

    Joined:
    Jan 29, 2014
    Messages:
    48
    Likes Received:
    1
    I highly doubt that! Why use DDR3 and not Esram?
     
  15. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
    Yes it can do asynchronous compute too but PS4 is more optimized for it with 8 ACEs vs 2 ACEs.
     
  16. MetalSpirit

    Newcomer

    Joined:
    Jan 29, 2014
    Messages:
    48
    Likes Received:
    1
    Besides, 4 CU have 2 ALUs.
     
  17. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
    Yes, and the PS4 doesn't have double the compute resources. It does have more than double the main memory BW though.

    You can see on slide 55 the steps they took to reduce BW / vertex&normal, and on slide 58 that - predictably - you start with a copy from and end with a copy to external memory.

    Even with iteration you can be limited by how fast you can get the data in and out of the processor's cache - or in this case the GPU's local data stores. PS4 achieved much closer to its peak performance. Xbox One is bottlenecked. The most likely cause of such a big difference is the huge difference in main memory BW, imo.
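
    The bandwidth argument above can be sketched with a simple roofline-style estimate. This is only an illustration: the peak figures below are commonly quoted public specs (about 1840 GFLOPS and 176 GB/s for PS4, about 1310 GFLOPS and 68 GB/s of DDR3 for Xbox One), not numbers taken from the slides, and the `Gpu` class and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    peak_gflops: float    # assumed peak single-precision GFLOP/s
    main_mem_gbps: float  # assumed peak main-memory bandwidth in GB/s


def attainable_gflops(gpu: Gpu, flops_per_byte: float) -> float:
    """Roofline estimate: capped by compute or by memory traffic."""
    return min(gpu.peak_gflops, gpu.main_mem_gbps * flops_per_byte)


# Commonly quoted public figures, used here as assumptions.
PS4 = Gpu("PS4", 1840.0, 176.0)
XB1_DDR3 = Gpu("Xbox One (DDR3 only)", 1310.0, 68.0)


def speedup(flops_per_byte: float) -> float:
    """Estimated PS4 / Xbox One throughput ratio at a given intensity."""
    return (attainable_gflops(PS4, flops_per_byte)
            / attainable_gflops(XB1_DDR3, flops_per_byte))


if __name__ == "__main__":
    for intensity in (1.0, 5.0, 100.0):
        print(f"{intensity:6.1f} flops/byte -> PS4 is {speedup(intensity):.2f}x")
```

    Under these assumptions a purely compute-bound shader would show roughly a 1.4x gap and a purely bandwidth-bound one roughly 2.6x, so a gap of nearly 2x is at least consistent with the shader being limited by main memory on Xbox One.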
     
  18. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
    Does asynchronous compute work on top of synchronous compute? I was under the impression that it is for work running alongside the traditional rendering pipeline.

    In other words, how are you proposing that asynchronous compute is adding 100% performance on top of compute shaders? Additionally, where are you seeing that they're actually using asynchronous compute in this benchmark? I can't see a single reference to it.
     
  19. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,232
    Likes Received:
    2,499
    Location:
    Wrong thread
    Because if you're bandwidth limited on data that is largely a single read and a single write (or a copy in and a copy out) you're going to be limited by how fast you can DMA into and out of esram anyway (i.e. by the speed of the DDR3).
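
    A rough model of that point, as a sketch only. It assumes the best case where the DMA copies into and out of ESRAM overlap perfectly with shader access, and it uses round, assumed bandwidth figures (68 GB/s DDR3, 200 GB/s ESRAM) and a made-up 64 MB working set; none of these come from the slides.

```python
import math


def direct_time(total_bytes: float, passes: int, ddr3_gbps: float) -> float:
    """Seconds if every pass streams the data straight from/to DDR3."""
    return passes * total_bytes / (ddr3_gbps * 1e9)


def staged_time(total_bytes: float, passes: int,
                ddr3_gbps: float, esram_gbps: float) -> float:
    """Seconds if data is staged through ESRAM (best case, full overlap).

    The data still crosses the DDR3 bus once (copy in plus copy out),
    while every shader pass runs against ESRAM.
    """
    dma = total_bytes / (ddr3_gbps * 1e9)
    shader = passes * total_bytes / (esram_gbps * 1e9)
    return max(dma, shader)


if __name__ == "__main__":
    size = 64e6  # hypothetical working set in bytes
    for passes in (1, 8):
        d = direct_time(size, passes, 68.0)
        s = staged_time(size, passes, 68.0, 200.0)
        print(f"passes={passes}: direct {d * 1e3:.2f} ms, staged {s * 1e3:.2f} ms")
```

    In this model a single read and a single write gains nothing from ESRAM, because the DDR3 copy sets the floor. Staging only pays off when the same data is touched several times while it sits in ESRAM.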
     
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    42,941
    Likes Received:
    15,028
    Location:
    Under my bridge
    Cell == 230 GFLOPS. Liverpool GPU == 1840 GFLOPS. Liverpool GPU == Cell * 8. Number of dancers = 16x Cell. So Cell ends up being less efficient than the GPU in this case. I guess that shows what compute is capable of these days!
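
    Spelling that arithmetic out, using only the figures quoted in this post (the variable names are just for illustration):

```python
# Figures as quoted in the post above.
cell_gflops = 230.0
gpu_gflops = 1840.0
dancer_ratio = 16.0  # GPU dancers divided by Cell dancers

# Raw peak-throughput ratio between the GPU and Cell.
flops_ratio = gpu_gflops / cell_gflops

# Dancers per peak FLOP on the GPU, relative to Cell.
per_flop_advantage = dancer_ratio / flops_ratio

if __name__ == "__main__":
    print(f"peak ratio: {flops_ratio:.1f}x, per-FLOP advantage: {per_flop_advantage:.1f}x")
```

    So on these numbers the GPU gets about twice as many dancers per peak FLOP as Cell does.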

    That's just granularity. PS4 can extract more unused performance when things are busy. It shouldn't be generating a higher utilisation in a benchmark test where the GPU is focussed on the one task.
     