Does PS4 have excess graphics power intended for compute? *spawn*

Maybe Sucker Punch; they use GPGPU for particles.
Yes, they do particle animation/physics using compute shaders, but their presentation didn't say anything about particle blitting/gathering using compute shaders. They might use compute shader based particle blitting/gathering, or just pure ROPs (because PS4 has so many). However, pure ROPs hit a BW ceiling quite easily, so they need to do something clever (for example split the screen into tiles that fit into the ROP cache, and do the blending completely inside the ROP caches). There are many ways to solve the same problem.
 
Sucker Punch use Compute for many things apparently, the list and some details from the GDC postmortem:

- Post effects "PS often slightly faster than compute code, Raster tiling matches complex image format, ROP caching and better pipeline"
- Tiled deferred+ "16x16 tiled deferred lighting on steroids ...2-10ms per frame, cull lights per tile with spheres and 4 planes"
- Facial animation "streamed vertices...skinning...wrinkles"
- Ambient SH "Calculate Tetrahedron per vert, calculate per draw SH sample"
- Particles "generate code on compute queues"
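
For the "tiled deferred+" bullet, the "cull lights per tile with spheres and 4 planes" part is the classic 16x16 tiled light culling pattern. A minimal sketch of the idea (written here as a CUDA-style kernel with made-up buffer names and layouts, not Sucker Punch's actual code): one thread group per tile, each thread testing a strided subset of the lights against the tile's four side planes, with survivors appended to an on-chip list.

```cuda
// Hypothetical tiled light culling kernel (CUDA-style sketch, illustrative only).
// One 16x16 thread block per screen tile; light spheres are culled against the
// tile's four side planes, and the surviving indices are written per tile.

#include <cuda_runtime.h>

struct PointLight { float3 posView; float radius; };

#define TILE_SIZE 16
#define MAX_LIGHTS_PER_TILE 256

__device__ float PlaneDistance(float4 plane, float3 p)
{
    return plane.x * p.x + plane.y * p.y + plane.z * p.z + plane.w;
}

__global__ void CullLightsPerTile(const PointLight* lights, int numLights,
                                  const float4* tilePlanes,   // 4 view-space planes per tile
                                  int* tileLightCounts, int* tileLightIndices,
                                  int numTilesX)
{
    __shared__ int s_count;
    __shared__ int s_indices[MAX_LIGHTS_PER_TILE];

    int tileIdx = blockIdx.y * numTilesX + blockIdx.x;
    int threadInTile = threadIdx.y * TILE_SIZE + threadIdx.x;

    if (threadInTile == 0)
        s_count = 0;
    __syncthreads();

    // Each of the 256 threads in the tile tests a strided subset of the lights.
    for (int i = threadInTile; i < numLights; i += TILE_SIZE * TILE_SIZE)
    {
        PointLight l = lights[i];
        bool inside = true;
        for (int p = 0; p < 4; ++p)   // sphere vs. the tile's 4 side planes
        {
            if (PlaneDistance(tilePlanes[tileIdx * 4 + p], l.posView) < -l.radius)
            {
                inside = false;
                break;
            }
        }
        if (inside)
        {
            int slot = atomicAdd(&s_count, 1);
            if (slot < MAX_LIGHTS_PER_TILE)
                s_indices[slot] = i;
        }
    }
    __syncthreads();

    // Write the per-tile light list out; the shading pass then loops over it.
    int survivors = min(s_count, MAX_LIGHTS_PER_TILE);
    if (threadInTile == 0)
        tileLightCounts[tileIdx] = survivors;
    for (int i = threadInTile; i < survivors; i += TILE_SIZE * TILE_SIZE)
        tileLightIndices[tileIdx * MAX_LIGHTS_PER_TILE + i] = s_indices[i];
}
```

The shading pass (or the tail of the same kernel) then only loops over the surviving lights for each pixel in the tile.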
 
- Post effects "PS often slightly faster than compute code, Raster tiling matches complex image format, ROP caching and better pipeline"
This is absolutely true for direct PS->CS ports. However many post effects (such as blur kernels and lighting) have huge benefits when you rewrite them to use LDS (local data storage) properly. Compute based post processing also allows you to do post processing in-place (saving memory and improving cache efficiency). You can also run compute shaders asynchronously (simultaneously) on top of rendering, achieving better GPU utilization.
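
To make the LDS point concrete, here is a minimal sketch of an LDS-based blur pass, assuming a CUDA-style kernel with illustrative names (shared memory playing the role of LDS): each group loads its stretch of the image plus a halo into on-chip memory once, then reads every filter tap from there instead of going back to memory per tap.

```cuda
// Sketch of an LDS (shared memory) based horizontal blur (CUDA-style, box weights).
// The row segment plus halo is loaded once; all taps then come from on-chip storage.

#include <cuda_runtime.h>

#define GROUP_SIZE 128
#define RADIUS     8

__global__ void BlurHorizontal(const float* src, float* dst, int width, int height)
{
    __shared__ float s_line[GROUP_SIZE + 2 * RADIUS];

    int y = blockIdx.y;
    int x = blockIdx.x * GROUP_SIZE + threadIdx.x;

    // Cooperative load: main sample plus left/right halo, clamped to the row.
    int loadX = min(max(x - RADIUS, 0), width - 1);
    s_line[threadIdx.x] = src[y * width + loadX];
    if (threadIdx.x < 2 * RADIUS)
    {
        int haloX = min(max(x + GROUP_SIZE - RADIUS, 0), width - 1);
        s_line[threadIdx.x + GROUP_SIZE] = src[y * width + haloX];
    }
    __syncthreads();

    if (x >= width)
        return;

    // All taps are now reads from shared memory, not DRAM.
    float sum = 0.0f;
    for (int i = -RADIUS; i <= RADIUS; ++i)
        sum += s_line[threadIdx.x + RADIUS + i];
    dst[y * width + x] = sum / (2 * RADIUS + 1);
}
```

The vertical pass works the same way, and because it is a compute dispatch it can also be overlapped with other GPU work, as described above.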
 
Yes, they do particle animation/physics using compute shaders
I think we can put this matter to rest for good by simply asking sebbi right here: does Trials Fusion use the whole set of 18 CUs of the PS4 or not? I know this is kind of silly, but it should be decisive.
 
It doesn't really matter what it means because it's untrue. :p Compute is fully programmable meaning devs can use it for anything they want. If they want to use it on new, never-before-thought-of rendering techniques, they can, and suddenly all that ALU is being used for graphics. It's daft to think of flexible computing resources as destined for a particular job as I mention above. Devs can and will commandeer the resources for their own ends, whether that's processing graphics on the CPU, or gameplay on the GPU, or vice versa.

Agreed, and I look forward to all of the CUs being used for whatever they can be used for. I was mostly just trying to make my context point, for better or worse. Might have been nice if Cerny had said something like "fixed-function" as opposed to just "graphics", but THEN I shudder to think what would be made of THAT term :devilish:
 
Agreed, and I look forward to all of the CUs being used for whatever they can be used for. I was mostly just trying to make my context point, for better or worse. Might have been nice if Cerny had said something like "fixed-function" as opposed to just "graphics", but THEN I shudder to think what would be made of THAT term :devilish:
He didn't want to give away the embedded RayTracing hardware just yet, so avoided talking about fixed-function capabilities.
 
Agreed, and I look forward to all of the CUs being used for whatever they can be used for. I was mostly just trying to make my context point, for better or worse. Might have been nice if Cerny had said something like "fixed-function" as opposed to just "graphics", but THEN I shudder to think what would be made of THAT term :devilish:

Kind of a newbie question here; How does one exactly determine the ALU load of their GPU? Does the debug/sdk unit hook up to a monitor and output information on the side while running your code?

From how many of you are speaking, I only recall in university looking at digital graphs to see my code running at the low level at which many of you are referring to.

I can't imagine digital graphs being an effective method of optimizing code, with the exception of debugging race or timing crashes. Generally speaking, I don't see how one could accurately determine how much they can push and where without seeing the graphs, and you'd have to look at the graphs for the whole system, not just one area, right?

Is there something I'm missing, or is the determination of how hard the GPU is being pushed done with a stopwatch on the CPU side of things? If that is the case, I'm baffled as to how one could determine how many ALUs are being leveraged, unless you are launching such a large number of threads and blocks that you know the total is probably close to the total number of ALUs available.
 
Kind of a newbie question here; How does one exactly determine the ALU load of their GPU? Does the debug/sdk unit hook up to a monitor and output information on the side while running your code?

GPUs and CPUs generally have performance counters that can be read while your code is running. So you can then build special tools that "capture" those counters over a period of time (for instance, a single frame), save it to a file, and then display the counter values in a way that's useful. This can let you inspect utilization of different hardware units at various points in a frame, which can help you identify bottlenecks.
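
As a trivial PC-side illustration of the general idea (not the full counter capture described above), GPU timestamps can be recorded around a chunk of work and read back afterwards. The sketch below uses CUDA events and an invented stand-in kernel; dedicated profilers sample many more hardware counters (ALU busy, memory stalls, etc.) in the same before/after fashion and then visualize the deltas across a frame.

```cuda
// Minimal example of instrumenting GPU work with timestamps (CUDA events).
// Only measures elapsed GPU time around a kernel; real profiling tools read
// many more hardware counters around regions of a frame in the same way.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void SomeKernel(float* data, int n)   // hypothetical stand-in workload
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                      // timestamp before the work
    SomeKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);                       // timestamp after the work
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);      // GPU-side elapsed time
    printf("Kernel took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```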
 
GPUs and CPUs generally have performance counters that can be read while your code is running. So you can then build special tools that "capture" those counters over a period of time (for instance, a single frame), save it to a file, and then display the counter values in a way that's useful. This can let you inspect utilization of different hardware units at various points in a frame, which can help you identify bottlenecks.

Thanks MJP!
 
A new PC GPU needs to run the current games as fast as possible, because all the reviewers will use the currently available games as benchmarks. No PC GPU sells because it might be a good fit for the future. It will not sell if it gets trounced in current game benchmarks by the competition. When the 290X was released, high end PC gaming was mostly about playing last generation (Xbox 360 and PS3) ports at high frame rates (60 fps+) and high resolutions (1080p / 1440p / 1600p) with some extra PC specific effects. Last generation games have simple shaders, and simple shaders benefit from a massive amount of ROPs (because simple shaders are not ALU bound). The extra PC specific effects are usually post processing. These extra effects don't gain performance from extra ROPs, but that doesn't matter much since the majority of the frame gets faster.

Tessellation doesn't use ROPs; it mainly stresses the fixed function primitive units and the vertex/domain/hull shaders (= ALU and BW). Most GCN GPUs do two primitives per clock (290X does four). That's the main limitation of tessellation. NVIDIA cards are better at tessellation than AMD cards, because NVIDIA cards can push more primitives per clock (thanks to their distributed geometry engines).

Tessellating into tiny triangles is not smart, because it decreases pixel shader quad efficiency, and that means the shader needs to run more times than necessary. However, this extra pixel shader cost increases all the per-pixel costs equally (not just the ROP cost), so the shader doesn't become any more ROP bound.

ROPs also make many problems easier to solve than compute shader based solutions do. I haven't yet seen any studio using compute based particle gathering (single pass, super BW effective) instead of filling hundreds of alpha planes on top of each other with ROPs. But things will change in the future. We will see solutions like this for problems that are currently brute forced with ROPs.
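
For reference, a single-pass compute gather along those lines might look something like the sketch below (a hypothetical CUDA-style kernel with invented names and a pre-binned per-tile particle list, not code from any shipping title): each pixel reads the frame buffer once, blends every overlapping particle in registers, and writes the result once, instead of one read-modify-write per alpha plane.

```cuda
// Hypothetical single-pass particle gather (CUDA-style sketch, illustrative only).
// One thread per pixel; particles are pre-binned per 16x16 tile and the per-tile
// list is assumed to be sorted back to front for "over" blending.

#include <cuda_runtime.h>

struct Particle { float2 center; float radius; float4 color; };  // color.w = alpha

#define TILE 16

__global__ void GatherParticles(const Particle* particles,
                                const int* tileParticleIndices,  // per-tile index lists
                                const int* tileParticleCounts,
                                float4* frameBuffer, int width, int height,
                                int numTilesX, int maxPerTile)
{
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x >= width || y >= height)
        return;

    int tileIdx = blockIdx.y * numTilesX + blockIdx.x;
    int count = tileParticleCounts[tileIdx];

    float4 dst = frameBuffer[y * width + x];     // single read

    // Back-to-front "over" blending accumulated entirely in registers.
    for (int i = 0; i < count; ++i)
    {
        Particle p = particles[tileParticleIndices[tileIdx * maxPerTile + i]];
        float dx = x + 0.5f - p.center.x;
        float dy = y + 0.5f - p.center.y;
        if (dx * dx + dy * dy > p.radius * p.radius)
            continue;                            // pixel not covered by this particle

        float a = p.color.w;
        dst.x = p.color.x * a + dst.x * (1.0f - a);
        dst.y = p.color.y * a + dst.y * (1.0f - a);
        dst.z = p.color.z * a + dst.z * (1.0f - a);
    }

    frameBuffer[y * width + x] = dst;            // single write
}
```

The bandwidth saving comes from that single read and single write per pixel; the ROP path instead touches the render target once for every overlapping alpha plane.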
:smile2: Do you blame the fact that certain $550 PCs are running some games better in some instances than consoles on consoles having to use those legacy and obsolete techniques?
 
The Division isn't going to be released on PS3 and X360, and Ubisoft said they were only able to achieve many of the features they have in The Division thanks to the power of the new generation of consoles. I wonder if they are using the technologies sebbbi is talking about and leaving the previous effects behind.

[Attached images: The Division screenshots]
 
The truth about the system balance is that it actually is true, at least according to some of my tech-savvy folks (and many other sources that have mentioned it in many circumstances).

The point is that, under future scenarios, the PS4 will not gain any significant advantage by using 100% of its CUs for graphics tasks. This is due to other bottlenecks present within the hardware of the machine, mainly the CPU power.

Put otherwise, when drivers, development tools etc. become more mature and development becomes more efficient and tailored around the strengths and weaknesses of the system, the PS4's CU usage for graphics tasks will most probably settle at around 14 CUs, or around 75% of the overall CUs available in the system.
Using a higher % of CUs for graphics tasks will not add any significant magic & advantage to the graphics department, due to other bottlenecks present within the system.
So the remaining CUs will be used for other tasks, mainly helping the CPU with some of its core business: sound, AI, physics, etc.
How well and how efficiently this will be achieved (mainly for the sound department) is something that only time will tell.

The other side of this situation is also the reason why all the early multiplatform games perform better on PS4 than on the X1. Early titles run on unoptimized, last-gen derived code and tools.
If we throw that kind of code at the 2 different (but similar) machines, the PS4 will perform better as it has (among other things) more raw graphics power "to waste".
But this also happens because the graphics load of many multiplatform games during the first year and a half is well within "the range" of the machine's capabilities, i.e. within and before the bottlenecks of the system.

So it is true that the PS4 system is balanced "around" 14 CUs for graphics tasks (or 13 or 15 CUs, I do not know).
But this clearly doesn't mean that a developer cannot use all 18 of the PS4's CUs for graphics.
It is simply not an efficient way to do things, because they could achieve the same graphics performance by using "only" 14 CUs (or thereabouts) and could use the remaining CUs for other, more useful, tasks.
 
Would it not be better to say that it's balanced to use 20% of the GPU for compute, and not 4 CUs?
That's 3.6 Compute Units :) Although in practice a single CU will be running multiple compute jobs each frame, so that could work. Just looks weird.
 
Using a higher % of CUs for graphics tasks will not add any significant magic & advantage to the graphics department, due to other bottlenecks present within the system.
So the remaining CUs will be used for other tasks, mainly helping the CPU with some of its core business: sound, AI, physics, etc.

Even if we accept this assumed bottleneck of yours, how do you conclude that the CU time dedicated to compute tasks rather than graphics shaders cannot be used for graphics related compute jobs?

A typical future game may in fact utilise the equivalent of 4 CUs to, for example, process the game's lighting engine via compute shaders.

By your logic this would bypass the "graphics bottlenecks" of the system but still be applicable to the quality of the graphics.

And in this instance, if another console with only 12 CUs were to reproduce that same lighting engine, then it would also need to dedicate 4 CUs' worth of compute time to it, resulting in only 8 CUs left for "graphics" compared to the PS4's remaining 14.
 
The truth about the system balance is that it actually is true, at least according to some of my tech-savvy folks (and many other sources that have mentioned it in many circumstances).

The point is that, under future scenarios, the PS4 will not gain any significant advantage by using 100% of its CUs for graphics tasks.
Hogwash. Compute is 100% versatile. It can be used for anything you want. The closest justification for the 'balance', raised earlier in this thread, is that it more accurately describes balance with the fixed-function units, not graphics work. So there could be cases where the fixed-function units are saturated by 14 CUs using them in the traditional way, leaving 4 CUs available for other work. That's for smarter men than me to determine. But those 4 CUs, if ever that be the case, can still work on graphics, because there are a multitude of different ways to approach any computing problem. I believe we have an example in this thread of rolling complex post-processing into a compute kernel instead of multiple shaders.

Ultimate point being, programmable compute power can be used in any way the software wants. CPUs can do graphics. GPUs can do non-graphics. There's no way to cap developers to certain uses of certain resources, short of providing a fixed engine and forcing its use.

Edit: I can only conclude that you didn't read the rest of this thread before posting where graphical uses of compute without relying on other components has been outlined.
 
Even if we accept this assumed bottleneck of yours, how do you conclude that the CU time dedicated to compute tasks rather than graphics shaders cannot be used for graphics related compute jobs?

A typical future game may in fact utilise the equivalent of 4 CUs to, for example, process the game's lighting engine via compute shaders.

By your logic this would bypass the "graphics bottlenecks" of the system but still be applicable to the quality of the graphics.

And in this instance, if another console with only 12 CUs were to reproduce that same lighting engine, then it would also need to dedicate 4 CUs' worth of compute time to it, resulting in only 8 CUs left for "graphics" compared to the PS4's remaining 14.

What makes this thread somewhat odd is that future gaming systems will more than likely have 6-7 times the compute and rendering power of the current next generation systems. Yet here we are discussing the PS4 GPU having too many CUs, and 4 of them being strictly dedicated to compute so as not to bottleneck the system. I must remember to revisit this thread within 8-10 years, and see what hypocrisy bubbles to the surface...

Anyhow, I would think using the 14+4 scenario would introduce more headaches for developers ...such as timing issues, added latency and so forth. Versus allowing the GPU to do what it does best ...process all the needed data (shaders, triangles, compute, lighting, etc...) within its parallel architecture.
 
Compute is 100% versatile. It can be used for anything you want.

I don't think that's correct either - the types of non-rendering workloads capable of running efficiently on the GPU are still limited. Restricted memory access, massively parallel execution with little interdependency, and so on...

For example in Ryse, the facial animation system is primarily based on bones as that's very efficient on the GPU. The same rigs apparently have completely blendshape based counterparts because those run much faster in the Maya viewport for animators compared to the hundreds of bones. But blendshapes on a GPU are way too inefficient, and even Compute cannot help that.

I think the right way to look at this is that if you can move some code to Compute, it'll automatically benefit from the higher capacity and you can do more of whatever it is. It could probably also be scaled up even higher (as long as bandwidth and memory don't become bottlenecks, I guess).
Whether or not it makes any sense to have 2x or 5x or maybe even 10x more of that stuff is again a completely different question.
 
I wasn't suggesting it's perfectly suited to all workloads. There'll be some jobs you won't move onto compute because it's not a good fit, just as there's jobs you won't run on CPU because it's not great at that work whereas the GPU is. You use the most efficient component for the job. Importantly, pretty much all graphics related tasks are good fits for the GPU, meaning compute-based graphics work is a big future. Ergo, all the CUs in PS4 can be used for graphics workloads, whether executed with compute shaders or traditional pixel and vertex shaders.

At no point will devs get so far in their graphics and then say, "right, that's it. We've hit the wall. Doesn't look as nice as we'd like with the post effects, but that 20% of the GPU left doing nothing has to be used on audio or rigid body particle physics."
 