Is this asynchronous compute or regular compute? Or is the former implied?
Does asynchronous compute work on top of synchronous compute? I was under the impression that it was for work running alongside the traditional rendering pipeline.
In other words, how are you proposing that asynchronous compute is adding 100% performance on top of compute shaders? Additionally, where are you seeing that they're actually using asynchronous compute in this benchmark? I can't see a single reference to it.
Cell == 230 GFLOPS. Liverpool GPU == 1840 GFLOPS. Liverpool GPU == Cell * 8. Number of dancers = 16x Cell. So Cell ends up being less efficient than the GPU in this case. I guess that shows what compute is capable of these days!
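Spelling that out as a quick back-of-envelope check (the peak figures are as quoted above; the 16x dancer count is the thread's figure from the slides, not something I can verify independently):

// Back-of-envelope check of the Cell vs. Liverpool efficiency claim.
// All numbers come from the post above; the dancer ratio is the
// thread's figure, not an official benchmark.
#include <cstdio>

int main() {
    const double cell_gflops      = 230.0;   // PS3 Cell, peak single precision
    const double liverpool_gflops = 1840.0;  // PS4 GPU, peak single precision
    const double dancer_ratio     = 16.0;    // PS4 reportedly does 16x the dancers

    double flops_ratio = liverpool_gflops / cell_gflops;  // = 8.0
    // 16x the work on 8x the FLOPS means the GPU gets 2x the dancers
    // per FLOP, i.e. Cell is half as efficient here.
    printf("FLOPS ratio: %.1fx, per-FLOP advantage: %.1fx\n",
           flops_ratio, dancer_ratio / flops_ratio);
    return 0;
}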
That's just granularity. PS4 can extract more unused performance when things are busy. It shouldn't be generating a higher utilisation in a benchmark test where the GPU is focussed on the one task.
It isn't mentioned in the slides.
Take a look at some GCN benchmarks, paying particular attention to the 290X with its 8 ACEs and 5.6 TFLOPS vs the 7970 / 280X with their 2 ACEs and ~3.9 TFLOPS.
http://www.tomshardware.com/reviews/radeon-r9-290x-hawaii-review,3650-34.html
http://www.anandtech.com/show/7457/the-radeon-r9-290x-review/18
Not even close to a 100% performance increase.
The Xbox One is bottlenecked in this case, badly, and it has to be by main memory bandwidth.
Edit: this might actually be a good fit for asynchronous compute on the Xbox One!
That first slide looks like it might be oversimplifying things. With compute, can general-purpose (CPU-style) work be brought up to the same performance levels as the GPU? How accurate is this?
Yes, it doesn't seem like any of this is relevant to asynchronous compute. Same with those benchmarks too, though, so the efficacy of the added ACEs won't show up there either. The added ACEs will shine when compute is jammed in alongside traditional GPU workloads.
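For anyone who hasn't seen the queue model: there's no public console API to demo this with, but the idea maps loosely onto CUDA streams, i.e. two independent queues whose work the hardware is free to interleave whenever one of them leaves shader cores idle. A minimal sketch, with made-up kernels standing in for a graphics job and a cloth job:

#include <cuda_runtime.h>

// Hypothetical stand-ins: a "graphics-like" job and a compute job,
// just to give the two queues something independent to run.
__global__ void renderLikeWork(float* fb, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) fb[i] = fb[i] * 0.5f + 0.25f;
}
__global__ void clothStep(float* pos, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i] += 0.016f;  // trivial stand-in for a physics step
}

int main() {
    const int n = 1 << 20;
    float *fb, *pos;
    cudaMalloc(&fb,  n * sizeof(float));
    cudaMalloc(&pos, n * sizeof(float));

    cudaStream_t gfx, compute;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&compute);

    // Two independent queues: the scheduler may overlap these launches,
    // filling idle gaps in one workload with the other -- the same idea
    // the extra ACEs serve on GCN.
    renderLikeWork<<<n / 256, 256, 0, gfx>>>(fb, n);
    clothStep<<<n / 256, 256, 0, compute>>>(pos, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(gfx);
    cudaStreamDestroy(compute);
    cudaFree(fb);
    cudaFree(pos);
    return 0;
}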
One of the slides mentioned creating a huge compute shader in order to avoid too many CPU dispatch requests. I remember reading somewhere that that's a big no-no, in that scheduling long-running compute jobs causes headaches for scheduling the rest of the normal work.
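Roughly, the trade-off the slide describes looks like this (a CUDA sketch, since the actual console shader source isn't public; the dancer and particle counts are invented for illustration):

#include <cuda_runtime.h>

const int DANCERS            = 1024;  // hypothetical counts, for shape only
const int PARTICLES_PER_BODY = 256;

__global__ void clothStepOne(float* pos, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) pos[i] += 0.016f;  // stand-in for a real cloth solve
}

int main() {
    const int total = DANCERS * PARTICLES_PER_BODY;
    float* pos;
    cudaMalloc(&pos, total * sizeof(float));

    // Option A: one dispatch per dancer -- lots of CPU submit overhead.
    for (int d = 0; d < DANCERS; ++d)
        clothStepOne<<<PARTICLES_PER_BODY / 256, 256>>>(
            pos + d * PARTICLES_PER_BODY, PARTICLES_PER_BODY);

    // Option B: one big dispatch over every dancer's particles -- the
    // "huge compute shader" approach; a single CPU submission, but the
    // long-running job can hog the GPU, which is the scheduling
    // headache mentioned above.
    clothStepOne<<<total / 256, 256>>>(pos, total);

    cudaDeviceSynchronize();
    cudaFree(pos);
    return 0;
}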
Aren't the dancers being rendered by the traditional GPU pipeline?
You lost me on the edit part. I take it you're asserting that the GPU would be busy doing something in ESRAM, and when it has a breather moment while waiting on whatever it's waiting for, it will run async GPGPU work out of DRAM? And so it's a good fit because there's no reason to DMA between the two?
Yeah, I wonder if it'll bung up your budget. They made it this way to maximise the number of cloth-physics dancers that could actually be going concurrently, but I don't know if this would be ideal with a real game going on. I guess it [having a very long shader] would be terrible as 'async' for sure, but it could be OK if the game properly budgets for a synchronous version of it.
Any chance of this being used in AC:Unity? I wonder if using compute shaders to do things like this is eating up GPU time. If so, I like what the future holds.
I think I know why there is a big difference between the number of Xbox One & PS4 GPU dancers. Maybe it's because about 600 GFLOPS is being used for the rendering, leaving the Xbox One with about 700 GFLOPS for compute and the PS4 GPU with about 1.2 TFLOPS for compute.
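The arithmetic, spelled out (the 1310 / 1840 GFLOPS peaks are public spec figures; the 600 GFLOPS rendering share is the assumption in the post above):

// Budget math from the post above; only the peak figures are public,
// the 600 GFLOPS rendering share is an assumption.
#include <cstdio>

int main() {
    const double xb1_peak  = 1310.0;  // Xbox One GPU, GFLOPS
    const double ps4_peak  = 1840.0;  // PS4 GPU, GFLOPS
    const double rendering = 600.0;   // assumed fixed rendering cost

    double xb1_compute = xb1_peak - rendering;  // ~710 GFLOPS
    double ps4_compute = ps4_peak - rendering;  // ~1240 GFLOPS
    printf("Left for compute: XB1 %.0f, PS4 %.0f (%.2fx)\n",
           xb1_compute, ps4_compute, ps4_compute / xb1_compute);
    return 0;
}

Subtracting a fixed rendering cost like that would widen the compute-only gap to roughly 1.75x, noticeably more than the raw 1.4x peak ratio.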
It seems like a safe bet, yeah. Just have a look at the main character's cape; it flows more realistically this time. And not only on him but on the people in the crowds too. I've no doubt the reports of horrible frame rates in recent demos are due to this and not their super-advanced AI.
Because if you're bandwidth-limited on data that is largely a single read and a single write (or a copy in and a copy out), you're going to be limited by how fast you can DMA into and out of ESRAM anyway (i.e. by the speed of the DDR3).
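You can see that structure with explicit copies: if the kernel touches each element exactly once, end-to-end time is dominated by the two DMA legs no matter how fast the maths is. A rough CUDA analogue (pinned host memory standing in for DDR3, device memory for ESRAM; purely illustrative):

#include <cuda_runtime.h>

// One read and one write per element -- the "single read, single
// write" case described above.
__global__ void touchOnce(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;
}

int main() {
    const int n = 1 << 24;
    const size_t bytes = n * sizeof(float);

    float* host;
    float* dev;
    cudaMallocHost((void**)&host, bytes);  // "slow pool" (DDR3 stand-in)
    cudaMalloc(&dev, bytes);               // "fast pool" (ESRAM stand-in)

    // Copy in, compute, copy out. Because the kernel touches each
    // element once, total time is dominated by the two transfers --
    // i.e. by the slow pool's bandwidth, exactly the point above.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    touchOnce<<<n / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}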
I think people take these things far too literally.
We don't have all the information here.
This is a paper to show what you can do with the GPUs in these consoles and how they did it.
We don't know how much time they spent optimizing any of these systems.
And the end result was also clear -> GPU wins.
They also only used 5 SPUs for their PS3 code, which equals 128 GFLOPS (5 × 25.6 GFLOPS peak per SPU).
The Xbox One CPU is ~9% faster, but it also has ~15% lower-latency access to main memory. If you're hitting main memory a lot, that probably makes a difference too.