GDC paper on compute-based Cloth Physics including CPU performance

Not sure if this has been posted anywhere. Surprised how much better the PS4 is at compute.

http://gdcvault.com/play/1020939/Efficient-Usage-of-Compute-Shaders

[slide images: compute benchmark results]
 
Perhaps more interesting is how much better the XB1 CPU is! Also the poor PS360 results, although PS3 probably isn't too bad at >50% efficiency.
 

I noticed that too! Wondering if it has something to do with API compatibility, because the clock difference alone (1.75 GHz vs 1.6 GHz) means the CPU should not be over ~9.5% faster!
Whatever the reason, the Xbox CPU is 15% faster in this test.
Also relevant is the final slide, where we can see that overall, the PS4 GPU can be almost 100% faster than the one in Xbox One.
 

They are not running the exact same code. On the Xbone they use DX11 code, and on PS4 they ported it to PSSL. They do state that the PS4 path might lead to better performance, though you have to manage it yourself.

They also say the shader is bandwidth bound, but I couldn't find whether their implementation uses the esram on the Bone or just DDR3... That might explain the huge disparity in performance between them.
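To put a rough number on "bandwidth bound": a quick arithmetic-intensity estimate, where every per-vertex byte and flop count is my own guess for illustration, not a figure from the paper:

```
#include <cstdio>

// Crude arithmetic-intensity estimate for one cloth relaxation pass.
// All per-vertex figures below are guesses, not numbers from the talk.
int main() {
    const double bytesPerVertex = 2 * 16      // read + write a float4 position
                                + 4 * 16;     // read four neighbour positions
    const double flopsPerVertex = 4 * 12;     // ~12 flops per distance constraint

    // Flops per byte the kernel offers vs what the machine needs to stay busy:
    // PS4-ish balance point = 1840 GFLOPS / 176 GB/s ~= 10.5 flops/byte.
    printf("kernel intensity: %.2f flops/byte (balance point ~10.5)\n",
           flopsPerVertex / bytesPerVertex);
    return 0;
}
```

At half a flop per byte the ALUs spend most of their time waiting on memory, which fits the paper's claim either way - esram or DDR3 just changes how long the wait is.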
 
Now we can have a console war over who has more dancers per second :LOL:

But in all seriousness I cannot wait to see what developers decide to run on the GPU over the next few years.
 

The compute performance gap certainly makes it look like Xbox main memory BW could be the culprit. Perhaps it's not an optimal case for the esram, or perhaps - being a practical implementation for use in games, where render buffers will likely always be filling the esram - it doesn't use the esram because in a real game it couldn't count on having it ...
 

I thought about that too. I like to think that's something MS took into account in their design, with the DMEs and all that jazz to move data around. But I don't think anyone is using the esram with queued buffers or anything like that - moving data out as the GPU finishes with it and moving data in before the GPU needs it - so I dunno if that design is even practical XD
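For what it's worth, what you're describing is classic double-buffered streaming: stage the next chunk into fast memory while the GPU chews on the current one, then drain results behind it. A minimal sketch of the shape of it, using CUDA streams as a stand-in for the DMEs - an analogy of mine, not anything shown in the talk:

```
#include <cuda_runtime.h>

// Hypothetical kernel that relaxes one chunk of cloth vertices in place.
__global__ void relaxChunk(float4* verts, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 v = verts[i];
        // ... constraint work would go here ...
        verts[i] = v;
    }
}

// While the GPU works on chunk k in one buffer, the copy engine (the DME
// analogue) uploads chunk k+1 into the other. 'host' must be pinned
// (cudaMallocHost) or the async copies won't actually overlap.
void streamCloth(float4* host, int nChunks, int chunkLen) {
    float4* buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&buf[b], chunkLen * sizeof(float4));
        cudaStreamCreate(&stream[b]);
    }
    for (int k = 0; k < nChunks; ++k) {
        int b = k & 1;                       // ping-pong between buffers
        float4* src = host + (size_t)k * chunkLen;
        cudaMemcpyAsync(buf[b], src, chunkLen * sizeof(float4),
                        cudaMemcpyHostToDevice, stream[b]);
        relaxChunk<<<(chunkLen + 255) / 256, 256, 0, stream[b]>>>(buf[b], chunkLen);
        cudaMemcpyAsync(src, buf[b], chunkLen * sizeof(float4),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    for (int b = 0; b < 2; ++b) {
        cudaStreamSynchronize(stream[b]);
        cudaFree(buf[b]);
        cudaStreamDestroy(stream[b]);
    }
}
```

The catch is that the copies are only hidden if the per-chunk compute takes at least as long as the transfers - and for a bandwidth-bound shader the transfers are most of the work, which may be exactly why nobody bothers.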
 

I highly doubt it doesn't use the esram! Why use DDR3 and not esram?
 
Probably not the right answer though, as XB1 can do asynchronous compute too, and this is a benchmark that's not likely to be hugely stressing the traditional rendering pipeline at the same time.

Yes, it can do asynchronous compute too, but PS4 is more optimized for it with 8 ACEs vs 2 ACEs.
 
I don't see it that way!
If you check the slides you see that the calculation method is an iteration. That means that compute differences will accumulate at each cycle of the iteration. In the end, the result is almost double performance for the PS4.

Yes, and the PS4 doesn't have double the compute resources. It does have more than double the main memory BW though.

You can see on slide 55 the steps they took to reduce BW per vertex & normal, and on slide 58 that - predictably - you start with a copy from and end with a copy to external memory.

Even with iteration you can be limited by how fast you can get the data in and out of the processor's cache - or in this case the GPU's local data stores. PS4 achieved much closer to its peak performance; Xbox One is bottlenecked. The most likely cause of such a big difference is the huge difference in main memory BW, imo.
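To make the in-and-out cost concrete, here's a minimal Jacobi-style relaxation sketch - my own construction, not the paper's code. The arithmetic per vertex is trivial; what each extra iteration adds is another full read and write of the vertex buffer:

```
#include <cuda_runtime.h>

// One relaxation pass over a W x H cloth grid: each vertex nudges towards
// satisfying the rest distance to its four neighbours. Reads all of 'in',
// writes all of 'out' - a full buffer of traffic per iteration.
__global__ void relaxPass(const float4* in, float4* out,
                          int W, int H, float restLen) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float4 p = in[y * W + x];
    float3 corr = make_float3(0.f, 0.f, 0.f);
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    for (int k = 0; k < 4; ++k) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
        float4 q = in[ny * W + nx];
        float3 d = make_float3(q.x - p.x, q.y - p.y, q.z - p.z);
        float len = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z) + 1e-6f;
        float s = 0.5f * (len - restLen) / len;   // each end takes half
        corr.x += s * d.x; corr.y += s * d.y; corr.z += s * d.z;
    }
    out[y * W + x] = make_float4(p.x + corr.x, p.y + corr.y,
                                 p.z + corr.z, p.w);
}

// Host loop: ping-pong buffers for N iterations. Ten iterations means ten
// full passes over the data, regardless of how cheap the maths is.
void relax(float4* a, float4* b, int W, int H, float restLen, int iters) {
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    for (int i = 0; i < iters; ++i) {
        relaxPass<<<grid, block>>>(a, b, W, H, restLen);
        float4* t = a; a = b; b = t;   // result is in the last-written buffer
    }
    cudaDeviceSynchronize();
}
```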
 
Yes, it can do asynchronous compute too, but PS4 is more optimized for it with 8 ACEs vs 2 ACEs.

Does asynchronous compute work on top of synchronous compute? I was under the impression that it's for running work alongside the traditional rendering pipeline.

In other words, how are you proposing that asynchronous compute adds 100% performance on top of compute shaders? Additionally, where are you seeing that they're actually using asynchronous compute in this benchmark? I can't see a single reference to it.
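For reference on the "alongside" part: asynchronous compute just means the compute work sits on its own hardware queue and soaks up whatever the graphics queue leaves idle. A toy stand-in using CUDA streams in place of the ACE queues - the mechanism differs on console, but the shape is similar:

```
#include <cuda_runtime.h>

__global__ void renderWork(float* fb, int n) {    // stand-in for graphics
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) fb[i] += 1.0f;
}
__global__ void clothWork(float* v, int n) {      // stand-in for cloth compute
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 0.99f;
}

int main() {
    const int n = 1 << 20;
    float *fb, *verts;
    cudaMalloc(&fb, n * sizeof(float));
    cudaMalloc(&verts, n * sizeof(float));

    // Two independent queues: the scheduler is free to interleave them,
    // so cloth can fill execution units the "graphics" job leaves idle.
    cudaStream_t gfx, compute;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&compute);
    renderWork<<<n / 256, 256, 0, gfx>>>(fb, n);
    clothWork<<<n / 256, 256, 0, compute>>>(verts, n);
    cudaDeviceSynchronize();

    // If there is no graphics work at all - as in this benchmark - the
    // extra queue reclaims nothing: there's no idle time to fill.
    cudaStreamDestroy(gfx);
    cudaStreamDestroy(compute);
    cudaFree(fb);
    cudaFree(verts);
    return 0;
}
```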
 
I highly doubt it doesn't use the esram! Why use DDR3 and not esram?

Because if you're bandwidth limited on data that is largely a single read and a single write (or a copy in and a copy out) you're going to be limited by how fast you can DMA into and out of esram anyway (i.e. by the speed of the DDR3).
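A back-of-the-envelope version of that argument, using the public peak bandwidth figures and assuming the esram-side processing overlaps the transfers:

```
#include <cstdio>

// Toy model of one bandwidth-bound pass that reads and writes its whole
// working set exactly once. Peak figures; sustained rates are lower.
int main() {
    const double ddr3  =  68.0e9;   // XB1 main memory, bytes/s
    const double gddr5 = 176.0e9;   // PS4 main memory, bytes/s
    const double bytes =  64.0e6;   // a made-up 64 MB cloth working set

    // XB1: DMA in from DDR3, process in esram, DMA results back to DDR3.
    // The esram never shows up in the time - only the DDR3 legs do.
    double tXB1 = bytes / ddr3 + bytes / ddr3;
    // PS4: read and write straight against GDDR5.
    double tPS4 = bytes / gddr5 + bytes / gddr5;

    printf("XB1 staged: %.2f ms, PS4 direct: %.2f ms, ratio %.1fx\n",
           tXB1 * 1e3, tPS4 * 1e3, tXB1 / tPS4);
    return 0;
}
```

So on this toy model staging through the esram buys nothing for single-touch data, and the ~2.6x ratio is in the same ballpark as the gap being discussed.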
 
The PS3 SPUs are quite a monster... nice.
Cell = ~230 GFLOPS. Liverpool GPU = ~1840 GFLOPS, i.e. 8x the flops of Cell - yet it handles 16x the dancers. So per flop, Cell ends up half as efficient as the GPU in this case. I guess that shows what compute is capable of these days!

Yes, it can do asynchronous compute too, but PS4 is more optimized for it with 8 ACEs vs 2 ACEs.
That's just granularity. PS4 can extract more unused performance when things are busy. It shouldn't be generating a higher utilisation in a benchmark test where the GPU is focussed on the one task.
 