What is PS4's 14+4 CU thing all about? *spawn

Status
Not open for further replies.
Sorry, I do not understand.
Is this some kind of taboo?
I thought it would be interesting to note that the 14+4 CU split comes directly from Sony, from one of their documents.
And Cerny, even in full PR mode, has in a way confirmed it many times.
I thought this was quite important, maybe one of the most important topics.
That is it.

Because some people take 14+4 as some sort of defined division of the hardware units.

Personally, I have no problem with 14+4, but I look at it as a general example where you have 4 CUs' worth of compute distributed over 18 CUs. Ultimately it's going to be up to the developer, not some line drawn in the sand.
 
But it still doesn't make sense to me: how can you have a situation where GPGPU is viable but graphics commands are not? It can't be a CPU-limited situation unless you're moving something from the CPU to GPGPU to free it up, because GPGPU programs still require some CPU time.

There are fixed function parts of the GPU that will prevent linear scaling of triangle/raster graphics performance with CU count. By mixing in compute you can probably occupy CUs that would otherwise be idling or operating inefficiently. Cerny's comments make it pretty clear that the PS4 was meant to be used like this.

MS have fewer CUs relative to triangle setup (even fewer than the 14 Sony were at some point recommending for "graphics", almost certainly meaning triangle/raster-based graphics only and not graphics via compute). It probably won't be as hard for 1Bone to keep its CUs busy, assuming developers can get the hang of the ESRAM. But this is off topic; I only mention it because it might help explain the 14 + 4 thing.
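To make the fixed-function argument above concrete, here is a toy Python model. It assumes (hypothetically) that each CU contributes one unit of graphics work per frame and that the triangle setup/raster front end caps graphics at 14 units. The cap of 14 is an assumption chosen to echo the leaked slide, not a measured PS4 figure, and the function names are illustrative only.

```python
# Toy model, not a measurement: graphics throughput grows with CU count
# until a fixed-function front end (triangle setup / raster) caps it.

def graphics_throughput(cus, per_cu=1.0, front_end_cap=14.0):
    """Graphics work retired per frame, in hypothetical units."""
    return min(cus * per_cu, front_end_cap)

def idle_cu_capacity(cus, per_cu=1.0, front_end_cap=14.0):
    """ALU capacity that graphics leaves unused and compute could fill."""
    return cus * per_cu - graphics_throughput(cus, per_cu, front_end_cap)

for n in (10, 14, 18):
    print(n, graphics_throughput(n), idle_cu_capacity(n))
```

With these made-up numbers, going from 14 to 18 CUs adds no triangle/raster throughput but leaves 4 CUs' worth of ALU for compute. A real GPU degrades more gradually than a hard `min()`, which is why people talk about a "knee" rather than a wall.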

One would expect the PS4's 18 CUs to be functionally identical.
 
I don't see anywhere in Cerny's comment a confirmation of 14+4; based on how I read it, it could be 12+6, or all 18 used for graphics. Insisting it's 14+4, as some are, seems to be driven by some other motive. Cerny is clearly saying that the hardware is not 100% balanced at 18 CUs, but that does not mean 14+4; it's open to the demands of the developer.
 
It was posted above (and has been posted numerous times before):

"Digital Foundry: Going back to GPU compute for a moment, I wouldn't call it a rumour - it was more than that. There was a recommendation - a suggestion? - for 14 cores [GPU compute units] allocated to visuals and four to GPU compute...
Mark Cerny: That comes from a leak and is not any form of formal evangelisation. The point is the hardware is intentionally not 100 per cent round. It has a little bit more ALU in it than it would if you were thinking strictly about graphics. As a result of that you have an opportunity, you could say an incentivisation, to use that ALU for GPGPU."

The takeaways:

- 14 + 4 originated with leaked Sony slides
- these were confirmed to be leaked by Cerny (see above)
- "It has a little bit more ALU in it than it would if you were thinking strictly about graphics"
- By "graphics" he's almost certainly referring to stuff that goes through the triangle setup engines
- (meaning you can still do some graphics stuff as compute, so don't panic)
- The 14 + 4 isn't a hardware split, it's just a recommendation from a particular point in time

Also, this whole issue of "balance" (as talked about by the likes of Cerny and the MS guys) has been horribly misunderstood by "the internet".
 
I wouldn't interpret that as meaning you get anything less out of the last four CUs if you were to use them for graphics. You just have all those extra ACEs there, so you're not taking advantage of them if you aren't doing some compute on the GPU.
 
We can all agree that the PS4 has 18 active CUs, all similar in design, with none sectioned off or removed outside of the actual GPU logic, can we agree? Can we also agree that 14+4, 12+6 or whatever combination floats your boat is the developer's choice, based on their compute needs? If so, then this "required" 14+4 combination that's supposedly physically drawn in the sand is just a misunderstanding of a statement, can we agree?
 
Good god, not 14+4 again. The point of the original slide was just that there is a concept of diminishing returns, and some sort of knee in the performance curve given the rest of the system.

There are lots of things that can limit performance. If the system is vertex-, triangle-setup- or draw-call-bound, then having GPGPU-like tasks queued up will allow you to exploit the unused cycles.

You only have to look at the 64 compute queues to see that Mark/Sony bet pretty heavily on compute being a big deal going forward. MS, not so much; they seem to have concentrated on reducing latency in the system to better exploit the resources they have.

FWIW, as far as I am aware there is no exposed way in the system to actually partition the CUs; there are 18 of them, connected like in any other AMD part.
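A rough Python sketch of the "exploit the unused cycles" idea: graphics has priority, and queued compute work greedily takes whatever CU time graphics leaves idle in each time slice. The per-slice occupancy numbers are invented, and `fill_bubbles` is a hypothetical helper; real GCN scheduling happens at wavefront granularity through the ACEs, so treat this as an illustration of the principle, not of the hardware.

```python
TOTAL_CUS = 18

def fill_bubbles(graphics_busy, compute_work):
    """Greedily hand idle CU-slices to queued compute work (graphics first).

    graphics_busy: CUs kept busy by graphics in each time slice (made-up values;
                   dips stand for stalls on vertex work, triangle setup or draw calls).
    compute_work:  total CU-slices of compute waiting in the queues.
    Returns (compute CU-slices scheduled in each slice, compute work left over).
    """
    scheduled = []
    remaining = compute_work
    for busy in graphics_busy:
        spare = TOTAL_CUS - busy          # CUs graphics is not using this slice
        take = min(spare, remaining)      # compute never displaces graphics
        scheduled.append(take)
        remaining -= take
    return scheduled, remaining

scheduled, left = fill_bubbles([18, 12, 9, 18, 15], compute_work=20)
print(scheduled, left)  # [0, 6, 9, 0, 3] 2
```

Note that no CU is reserved for compute here: every CU serves graphics first and picks up compute only when it would otherwise idle.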
 
I wouldn't interpret that as meaning you get anything less out of the last four CUs if you were to use them for graphics. You just have all those extra ACEs there, so you're not taking advantage of them if you aren't doing some compute on the GPU.

And again I don't see how you get that from this:

Mark Cerny: That comes from a leak and is not any form of formal evangelisation. The point is the hardware is intentionally not 100 per cent round. It has a little bit more ALU in it than it would if you were thinking strictly about graphics. As a result of that you have an opportunity, you could say an incentivisation, to use that ALU for GPGPU.
 

ERP for the win, on clearing up this nonsense once again! :smile:
 
A couple of questions:

  1. Cerny mentions the CUs have "a little bit more ALU." I'm assuming he means all CUs are that way. If so, a little bit more ALU compared to what? A standard GCN CU? Although, given that he says "if you were thinking strictly about graphics", perhaps the "extra" ALU is just a general characteristic of GCN CUs and not a specific customization?
  2. Can a workload (wavefront?) be assigned to specific CUs or to a specific number of CUs? I was under the impression (perhaps falsely) that you can't split the load in such a manner (e.g. X CUs assigned to a rendering task while Y CUs are assigned to a compute task during the same slice of time).

EDIT

Looks like ERP answered my second question:

FWIW, as far as I am aware there is no exposed way in the system to actually partition the CUs; there are 18 of them, connected like in any other AMD part.
 

1) Unfortunately, that isn't what he said. He was speaking on a more general level.
2) At a low level, some amount of control exists or will. The exposure of lower-level details and possible customizations for the consoles and possibly Mantle might reveal what sorts of controls there are.
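For question 2, this Python sketch shows what a hard 14+4 partition would look like if it were expressed as a per-queue CU bitmask, one bit per CU. The `cu_mask` helper and the mask layout are hypothetical; as ERP notes, no such partitioning is known to be exposed on PS4, and in the shared case every queue can use all 18 CUs.

```python
TOTAL_CUS = 18
ALL_CUS = (1 << TOTAL_CUS) - 1   # shared case: every queue may use every CU

def cu_mask(first, count):
    """Hypothetical bitmask selecting `count` consecutive CUs starting at `first`."""
    return ((1 << count) - 1) << first

# What a literal "14+4" hardware split would mean (NOT how PS4 is described):
graphics_mask = cu_mask(0, 14)   # CUs 0-13
compute_mask = cu_mask(14, 4)    # CUs 14-17

assert graphics_mask & compute_mask == 0        # the two partitions are disjoint
assert graphics_mask | compute_mask == ALL_CUS  # together they cover all 18 CUs
print(f"{graphics_mask:018b}")
print(f"{compute_mask:018b}")
print(f"{ALL_CUS:018b}")
```

The design point is that a split like this would have to be set up explicitly; without it, graphics and compute wavefronts can land on any CU.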
 

I think "14+4" was just a really poor choice of wording on VGLeaks' part. It's easily misconstrued and has negative connotations. There's really nothing negative about adding more CUs than can be efficiently utilized for graphics when they have "balanced" that by making GPGPU so easy to exploit.
 
Good god, not 14+4 again. The point of the original slide was just that there is a concept of diminishing returns, and some sort of knee in the performance curve given the rest of the system.

So the question would be the sharpness of the knee, or its placement, depending on the game. For the predicted "average" game, 14 CUs would likely be where the knee is.
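One way to put a number on "where the knee is", sketched in Python: walk a throughput-vs-CU-count curve and stop at the first CU whose marginal gain falls below some fraction of the first CU's gain. Both curves and the 50% threshold are made up for illustration, and `find_knee` is a hypothetical helper; a real game would have to be profiled.

```python
def find_knee(throughput, threshold=0.5):
    """Return the CU count after which one more CU adds less than
    `threshold` times the gain of the first CU.

    throughput[i] is the (hypothetical) frame throughput with i + 1 CUs.
    """
    base_gain = throughput[0]
    for i in range(1, len(throughput)):
        gain = throughput[i] - throughput[i - 1]   # gain from CU number i + 1
        if gain < threshold * base_gain:
            return i                               # i CUs are "worth it" for graphics
    return len(throughput)                         # no knee within the CUs available

# Front-end-bound game: linear up to 14 CUs, then only small gains.
front_end_bound = [10 * n for n in range(1, 15)] + [140 + 3 * k for k in range(1, 5)]
# ALU-heavy game: keeps scaling across all 18 CUs.
alu_heavy = [10 * n for n in range(1, 19)]

print(find_knee(front_end_bound), find_knee(alu_heavy))  # 14 18
```

Same hardware, different knees: that is why a 14+4 figure can only describe a predicted "average" game, not a fixed property of the GPU.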
 
1) Unfortunately, that isn't what he said. He was speaking on a more general level.
2) At a low level, some amount of control exists or will. The exposure of lower-level details and possible customizations for the consoles and possibly Mantle might reveal what sorts of controls there are.

Thanks. Although that is quite literally what he said.

Mark Cerny: That comes from a leak and is not any form of formal evangelisation. The point is the hardware is intentionally not 100 per cent round. It has a little bit more ALU in it than it would if you were thinking strictly about graphics. As a result of that you have an opportunity, you could say an incentivisation, to use that ALU for GPGPU.

What I was wondering was what "a little bit more" was in reference to. It sounds like he was speaking in general, as you suggested, but I wasn't sure.
 
I think he means the CUs have more ALU than they need if they're being used as part of the graphics pipeline, and it is better utilized by compute, which is why they added all of those ACEs. I don't think he means they have extra ALU over what is normally part of a GCN CU.
 
I think "14+4" was just a really poor choice of wording on VGLeaks' part. It's easily misconstrued and has negative connotations. There's really nothing negative about adding more CUs than can be efficiently utilized for graphics when they have "balanced" that by making GPGPU so easy to exploit.

The 14 + 4 thing came from a Sony slide (though not one meant for public consumption of course).

Thankfully ERP has stepped in now and no-one can argue any more. Yes, diminishing returns and a knee in the performance curve for "graphics". That's what Cerny meant, and that's what the MS Fellows were talking about too.

Cerny and the MS Fellows aren't contradicting each other when they talk about "balance", so it's pretty amazing that the internet has managed to whip up a shit storm and hate campaign.
 
The DF article today mentions that MS said something similar about priority and compute queues. I don't have it in front of me, but it seems the graphics pipeline is going to have spare cycles that compute can use. Since the PS4 has 8 ACEs instead of 2, it can better utilize these holes. I wonder if the 14+4 was a simplification: you don't dedicate 4 CUs, you just spend 4/18 (22%) of the CUs' ALU time on compute jobs.
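The arithmetic behind reading "+4" as a time share rather than four dedicated CUs, in Python: four CUs' worth of compute out of 18 is 2/9 (about 22%) of total ALU time, and spreading that share across every CU spends exactly the same ALU budget as dedicating four CUs. The 1000-cycle window is an arbitrary illustrative value.

```python
from fractions import Fraction

TOTAL_CUS = 18
COMPUTE_CUS_WORTH = 4   # the "+4" from the leaked slide

# Share of total ALU time that "4 CUs worth" of compute represents.
share = Fraction(COMPUTE_CUS_WORTH, TOTAL_CUS)
print(share, f"{float(share):.1%}")  # 2/9 22.2%

# Two ways to spend the same compute budget over an arbitrary 1000-cycle window:
cycles = 1000
dedicated = COMPUTE_CUS_WORTH * cycles      # 4 CUs doing nothing but compute
time_sliced = TOTAL_CUS * cycles * share    # all 18 CUs spend 2/9 of their time on compute
assert dedicated == time_sliced
```

Exact fractions are used so the equality check is not disturbed by floating-point rounding.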
 

Pretty much spot on, I think, except you forgot about the Onion+ bus, which also helps developers exploit GPGPU without affecting the performance of their regular GPU jobs.
 