Does PS4 have excess graphics power intended for compute? *spawn*

I already gave you an example that isn't a good fit for either the GPU or compute, and I'm sure there are many other such techniques. Obviously though, good developers will find the alternatives that best fit the available resources, or balance hardware utilization in a way that allows them to free up the CPU for such tasks. Or, as Halo 4 did with PCA compression, find a way to cache their calculations in advance instead.

Point is, even compute isn't 100% versatile and can't be used for absolutely everything; there are still some significant limitations.
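
As an aside, the "cache it in advance" trick mentioned above is easy to illustrate. The sketch below is just the general idea (bake an expensive sim offline, compress the cached frames with PCA, reconstruct cheaply at runtime) with made-up sizes and names; it's not a description of Halo 4's actual pipeline.

```python
# Illustrative sketch only: bake an expensive simulation offline, then compress
# the cached frames with PCA so runtime playback is just a small matrix multiply.
import numpy as np

frames, verts = 600, 2000                 # hypothetical cloth sim: 600 frames, 2000 vertices
sim = np.random.rand(frames, verts * 3)   # stand-in for offline simulation output (xyz per vertex)

mean = sim.mean(axis=0)
U, S, Vt = np.linalg.svd(sim - mean, full_matrices=False)

k = 16                                    # keep only the first k principal components
coeffs = U[:, :k] * S[:k]                 # per-frame coefficients (frames x k)
basis = Vt[:k]                            # shared basis vectors   (k x verts*3)

# "Runtime": reconstruct any frame from k coefficients instead of storing it verbatim.
frame_42 = mean + coeffs[42] @ basis
print(frame_42.shape, coeffs.nbytes + basis.nbytes, "bytes vs", sim.nbytes)
```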
 
Shifty's point is that within those limitations, they can be used for whatever the developer chooses. This was in response to the ridiculous theory from one of the usual suspects that parts of the GPU will somehow sit idle because they are 'in excess', or that using 18 CUs for graphics would give no performance gain over 14, which is in itself a completely nonsensical statement.
 
In that context it's okay then. I just don't want to see compute becoming a silver bullet of sorts ;)
 
To be nitpicky, I wouldn't mind if it did become a silver bullet, as I like things that magically solve all our problems.

That it isn't a silver bullet is a different matter. ;)
 
Is the CPU a bottleneck? Would 6 Jaguar cores + 100 CUs be a graphics/compute powerhouse, or would it be severely CPU-bound in most cases? I know the answer to that, but I'm just trying to figure out what people are arguing in this thread. I know it varies per workload, which would ideally be tuned properly for each platform/game, but perhaps the underlying question here is: at what point do you become CPU-limited on PS4, and how can you balance compute on the CUs to alleviate that, if at all, in conjunction with graphical tasks?
 
It depends entirely on what the developer tries to do. Many of the techniques used in offline CG, particularly those related to complex characters (detailed clothing, hair, more realistic deformations etc) and fluids (fire, water, smoke etc), are not exactly a good fit for the current GPU architecture.

So there are basically two main options to explore in advancing the contents of the game world, now that the actual rendering is getting quite a bit more straightforward. Also, as the imagery gets closer and closer to real, the various deficiencies in a dynamic world will become more and more obvious and disturbing.

One approach is to invent new, more GPU friendly solutions for the above mentioned complex and calculation intensive features. This is something the offline CG software developers aren't as interested in, because they're not limited to realtime performance budgets and the greater flexibility of CPUs allows for more varied approaches. Game devs on the other hand would be happy with more limited possibilities, as long as they're able to run fast enough on the GPU.

The other option is to move more and more conventional tasks to the GPU to free up as many CPU resources as possible, and dedicate them to these advanced features. This would offer less performance, but more flexibility compared to a GPU-based solution.

So in the end it's a question of focus - more characters, or more complex ones, or a more dynamic game world, or a larger game world... and so on. The new systems should allow for some really different implementations.
 
It depends entirely on what the developer tries to do. Many of the techniques used in offline CG, particularly those related to complex characters (detailed clothing, hair, more realistic deformations etc) and fluids (fire, water, smoke etc), are not exactly a good fit for the current GPU architecture.
Some visual effects houses like ILM use GPUs for fluid calculations. Why don't you think these things are a good fit for GPUs?
 
Fluids on their own, maybe - but you usually want fluids to interact with each other, with objects in the scene and so on. Most VFX houses I know of are still not using GPUs to run their fluid sims.

There's a lot of R&D on utilizing GPUs, but it's still more of a curiosity. One of the reasons is that these studios already have a large render farm of CPUs, and it's much more economical to invest in only one type of hardware so that it can be utilized for any type of task necessary.
 
Fluids on their own, maybe - but you usually want fluids to interact with each other, with objects in the scene and so on. Most VFX houses I know of are still not using GPUs to run their fluid sims.

There's a lot of R&D on utilizing GPUs, but it's still more of a curiosity. One of the reasons is that these studios already have a large render farm of CPUs, and it's much more economical to invest in only one type of hardware so that it can be utilized for any type of task necessary.

I'm sure most of the CPUs are of the x86 variety, or maybe something PowerPC-ish, but will there be a market for purpose-built ARM-based systems? Betcha nVidia is thinking along those lines :)
 
Silicon Graphics and their MIPS processors have already died out, as did the DEC Alphas; VFX work moved to x86 PCs more than a decade ago. Trying to split the market again doesn't sound like a feasible idea...

Just think about it: this would only make sense to a studio if every piece of software that requires some performance were converted to run on the new architecture. That includes some of their own code, and code from up to maybe dozens of other developers.
Even here, in a not-too-big studio, we're running Arnold, Maya, Houdini, Nuke, Syflex, Yeti and who knows what else on the farm. Everything would have to be ported for the new hardware purchase to make any economic sense. Unlikely to happen.
 
Geez, not this 14+4 stuff again

To recap what I said numerous times last year (which was explained to me by someone with first hand knowledge of the hardware and the specs):

The 14+4 thing stems from a simple suggestion Sony made to devs at one of its devcons on how they might like to split their rendering and compute workloads.

As Cerny alluded to in the DF interview, the PS4 as a design is a bit ALU heavy.
So there's a point where you get diminishing returns from using additional CUs for rendering and get more bang for your buck using them for things like compute.

It does not mean you will get no benefit from using the extra 4 for graphics though, just diminishing returns; maybe something like the 24% improvement in FPS from 50% more ALUs that DF saw in their own tests.

Why this is so, I don't know, but my source was guessing it could be related to cache sizes, register file sizes as well as things like bandwidth, ROPs etc.

The basis for this 14+4 suggestion very likely comes from AMD, who would have done some profiling with modern AAA engines (e.g. CryEngine 3) and seen diminishing returns from using the additional CUs for rendering and more value in using them for compute.

But I guess if your workloads/rendering techniques/engine differ significantly from what's done in current AAA PC games (e.g. Sebbi's example of compute-based particles), then the diminishing-returns caveat for using additional CUs won't apply.
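
Purely to illustrate the diminishing-returns point, here's a toy Amdahl-style model (my own assumption about where the frame time goes, not anything from Sony, AMD or DF) showing how a partially ALU-bound frame turns +50% ALUs into roughly +24% FPS:

```python
# Toy Amdahl-style model of why 50% more ALUs gave DF only ~24% more FPS.
# Assumption (mine): a fraction p of frame time scales with ALU throughput,
# the rest is bound by ROPs/bandwidth/fixed-function work.
def fps_gain(p, alu_scale):
    new_frame_time = p / alu_scale + (1.0 - p)   # normalised frame time after adding ALUs
    return 1.0 / new_frame_time - 1.0            # relative FPS improvement

for p in (0.4, 0.58, 0.8, 1.0):
    print(f"ALU-bound fraction {p:.2f}: +{fps_gain(p, 1.5):.0%} FPS from +50% ALUs")
# p ~ 0.58 reproduces the ~24% figure; only a fully ALU-bound frame (p = 1.0)
# would scale the full +50%.
```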
 
Geez, not this 14+4 stuff again

To recap what I said numerous times last year (which was explained to me by someone with first hand knowledge of the hardware and the specs):

The 14+4 thing stems from a simple suggestion Sony made to devs at one of its devcons on how they might like to split their rendering and compute workloads.

As Cerny alluded to in the DF interview, the PS4 as a design is a bit ALU heavy.
So there's a point where you get diminishing returns from using additional CUs for rendering and get more bang for your buck using them for things like compute.

It does not mean you will get no benefit from using the extra 4 for graphics though, just diminishing returns; maybe something like the 24% improvement in FPS from 50% more ALUs that DF saw in their own tests.

Why this is so, I don't know, but my source was guessing it could be related to cache sizes, register file sizes as well as things like bandwidth, ROPs etc.

The basis for this 14+4 suggestion would very likely come from AMD, who would have done some profiling with modern AAA engines (e.g. CryEngine 3) and seen diminishing returns from using the additional CUs for rendering and more value in using them for compute.

So I guess if your workloads/rendering techniques/engine differ significantly from what's done in current AAA PC games (e.g. Sebbi's example of compute-based particles), then the diminishing-returns caveat for using additional CUs won't apply.

This is what I said in the first post, so what's wrong with this topic (which I didn't create) in your opinion? From the beginning I wasn't talking about different CUs or hardware-separated CUs (in a 14+4 set-up).
 
This is what I said in the first post, so what's wrong with this topic (which I didn't create) in your opinion? From the beginning I wasn't talking about different CUs or hardware-separated CUs (in a 14+4 set-up).

Maybe because you chimed in on the 14+4 after the initial "14+4 confirmed by Japanese third party".
The fact that you appear to argue that there's something material to the 14+4, beyond it being just a recommendation/example of how devs might balance their workload, also didn't help.

Sony said that all 18 CUs could be used for graphics rendering, but that doesn't change what they said before. If a developer wants to use all 18 CUs, they would only get a minor boost from the "extra" 4 CUs' ALUs, as this Japanese dev said. This could be some kind of marketing strategy that Sony has been using for free to date. If Sony hadn't said something like that to developers, why would we be hearing about it from a developer at this point?

Even Cerny said as much, though he downplayed the significance of those ALUs.

http://www.eurogamer.net/articles/digitalfoundry-face-to-face-with-mark-cerny

And if you take a look at the slide which I linked above you will see a "formal evangelization" from Sony.

Anyway, enough of digging up history...

We should all be clear by now that the correct balance will certainly differ from project to project, so even wacky-looking splits like 6+12, 8+10, 10+8, 12+6, 14+14, 16+2 or 17+1 are all good balances if that's exactly what the project requires.
 
Why this is so, I don't know, but my source was guessing it could be related to cache sizes, register file sizes as well as things like bandwidth, ROPs etc.

Aren't cache sizes and register file sizes (presumably we're talking about memory that isn't per CU here) the same on both the 18 CU PS4 and the 44 CU 290X?

And the 290x only has 84% more bandwidth and 150% more fill rate than the PS4 but 206% more CU throughput.

So if bandwidth or ROPs were the bottleneck, how would the 290X make any sense for modern PC workloads? You could argue that it's designed to be extremely forward-looking, with a view to assigning a massive percentage of its resources to compute (more so than the next-gen consoles), but that wouldn't make much sense from a design standpoint.

I'm not saying that what you're saying can't be true, but if it is, then it raises some questions about the design decisions behind Hawaii and, in fact, almost every other GCN GPU out there, since they're all generally more CU-heavy relative to their other resources than the PS4 is. Far from being CU-heavy, as this thread claims, the PS4 is one of the lightest-on-CUs GCN designs. Is it really reasonable to expect that Pitcairn, launched over two years ago, was more compute-focused than the PS4?
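
For reference, here's the quick arithmetic behind those percentages, using commonly cited specs (the exact clocks are my assumption, which is why the bandwidth figure comes out a touch under the 84% quoted above):

```python
# Rough sanity check of the ratios quoted above, using commonly cited specs.
# The exact numbers here are my assumption; the results shift a little
# depending on which clocks you use.
ps4  = {"bw": 176.0, "rops": 32, "cus": 18, "clk": 0.800}   # GB/s, ROPs, CUs, GHz
r290 = {"bw": 320.0, "rops": 64, "cus": 44, "clk": 1.000}

def pct_more(a, b):
    """How much bigger a is than b, as a percentage."""
    return (a / b - 1.0) * 100.0

print(f"bandwidth: +{pct_more(r290['bw'], ps4['bw']):.0f}%")
print(f"fill rate: +{pct_more(r290['rops'] * r290['clk'], ps4['rops'] * ps4['clk']):.0f}%")
print(f"ALU FLOPs: +{pct_more(r290['cus'] * 64 * 2 * r290['clk'], ps4['cus'] * 64 * 2 * ps4['clk']):.0f}%")
# -> roughly +82% bandwidth, +150% fill rate, +206% ALU throughput
```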
 
We should all be clear by now that the correct balance will certainly differ from project to project, so even wacky-looking splits like 6+12, 8+10, 10+8, 12+6, 14+14, 16+2 or 17+1 are all good balances if that's exactly what the project requires.

I want all my games to use the 14+14 balance :D
 
Those tests were rather flawed though: by comparing Pitcairn to Tahiti they ignored the 2x ROP advantage the PS4 possesses over the Xbox One.

They might not be valid for comparing XB1 to PS4, but aren't they suitable for comparing hardware with the same ROPs but more ALUs? Which is the 14 CU vs 18 CU thing for graphics we're discussing.

Indeed, I had told Richard about the 14+4 thing about a month earlier and it was one of the things he was investigating for that article (particularly after his interview with Cerny).

Aren't cache sizes and register file sizes (presumably we're talking about memory that isn't per CU here) the same on both the 18 CU PS4 and the 44 CU 290X?

And the 290x only has 84% more bandwidth and 150% more fill rate than the PS4 but 206% more CU throughput.

Look, I don't know the reasons for it being so, that was just a guess by the person who explained to me the background of the 14+4 rumour.

But as to where the 14+4 comes from and its meaning, what I have said is definite fact.

As I understand it, one of the devcon slides showed a graph of something like performance vs ALU utilisation, with a knee in the curve beyond which the value of additional ALU resources for graphics drops off significantly, and with the PS4 sitting well to the right of that knee.

You have Cerny referring to it too:

"The point is the hardware is intentionally not 100 per cent round," Cerny revealed. "It has a little bit more ALU in it than it would if you were thinking strictly about graphics. As a result of that you have an opportunity, you could say an incentivisation, to use that ALU for GPGPU."

An interpretation of Cerny's comment - and one that has been presented to us by Microsoft insiders - is that based on the way that AMD graphics tech is being utilised right now in gaming, a law of diminishing returns kicks in.

My source also guessed at the time, based on the 12 CU number, that 'MS saw similar data, but because of the embedded memory and perhaps a lesser focus on compute made a different set of tradeoffs'.

And a few months later the XB1 architects basically confirmed the same thing:

If you go to VGleaks, they had some internal docs from our competition. Sony was actually agreeing with us. They said that their system was balanced for 14 CUs. They used that term: balance. Balance is so important in terms of your actual efficient design. Their additional four CUs are very beneficial for their additional GPGPU work

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

And the 290x only has 84% more bandwidth and 150% more fill rate than the PS4 but 206% more CU throughput.

How much performance increase does a 290X give you over a GPU equivalent to the PS4's running PC benchmarks?
 
They might not be valid for comparing XB1 to PS4, but aren't they suitable for comparing hardware with the same ROPs but more ALUs? Which is the 14 CU vs 18 CU thing for graphics we're discussing.

Perhaps, but not necessarily.
Moving from 14 CUs to 18 CUs with 32 ROPs is less likely to encounter bottlenecks than moving from 16 CUs to 24 CUs with 32 ROPs.
(The clock speed changes in the article can be ignored, since they affect CUs and ROPs equally.)
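
To put rough numbers on that (my framing, assuming 64 ALU lanes per CU and 32 ROPs in both setups): the step DF tested stretches the ALU-to-ROP ratio noticeably further than the 14-vs-18 case does, so it's more likely to hit other limits first.

```python
# How much the ALU-to-ROP ratio grows in each step, assuming 64 ALU lanes
# per CU and 32 ROPs in both configurations (my assumption for illustration).
def alu_per_rop(cus, rops=32, lanes_per_cu=64):
    return cus * lanes_per_cu / rops

ps4_step = alu_per_rop(18) / alu_per_rop(14) - 1     # 14 -> 18 CUs
df_step  = alu_per_rop(24) / alu_per_rop(16) - 1     # 16 -> 24 CUs (DF's test)
print(f"14->18 CUs: +{ps4_step:.0%} ALU per ROP")    # ~ +29%
print(f"16->24 CUs: +{df_step:.0%} ALU per ROP")     # ~ +50%
```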
 