Does PS4 have excess graphics power intended for compute? *spawn*

mosen

Regular
The prior page of discussion? The subject arose again because a dev mentioned the '14+4 configuration'. There is no 14+4 configuration. There is only an 18 CU configuration. How devs choose to use that is down to them. The notion of diminishing returns is tangential to the investigation of PS4's technical hardware. PS4's hardware is 18 CUs that devs can use however they want. The discussion of whether spending 18 CUs on graphics in PS4 is wasteful or not belongs in its own thread.

It seems the "14+4" term has become taboo here. The developer talks about using the suggested 14+4 set-up (not configuration) and said that if the number of CUs was higher they could do even more with the additional CUs. So there is nothing wrong with his claim. Did he say that there are different CUs on PS4? Or that using the 14+4 set-up is mandatory for all developers?

14+4 is a suggested set-up for using those 18 CUs, nothing more, for reasons that Sony knows better than we do (maybe it's the diminishing returns you mentioned).

I'm sorry if I ruined the topic, this will be my last post on this subject.
 
As far as I can tell, 14+4 isn't even a recommended setup, considering that things tend to be scheduled very fluidly; it makes no sense for a developer to try to lock down 14 CUs for graphics and 4 for other stuff when the system can freely allocate anything to whatever when it's needed.

14+4 is, at best, just a wishy-washy representation of the design decisions in light of expected reasonable workloads (and/or a recommendation to use GPU compute).
 
14+4 is a suggested set-up for using those 18 CUs...
It's not a suggested setup because it's not possible to set up the CUs that way. You don't decide to set 4 CUs working on compute. You send jobs to the CUs and the GPU scheduler arbitrates what does what, when. 14+4 is conceptually nonsense as far as is known, and so it's frustrating that it keeps making an appearance. At best, in terms of workload, devs can think about dedicating 2/9ths of GPU time to compute and 7/9ths to graphics, because of diminishing returns or whatever, but that's not a technical design principle of PS4.

Until some (first party) dev posts a slide showing some CUs walled off from the rest working on compute, this discussion is redundant and needn't be repeated.
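To make the 'jobs, not partitions' point concrete, here's a minimal toy sketch (Python, purely illustrative: the queue model, job mix and unit durations are invented, and this is nothing like GNM or the real GCN hardware scheduler). Graphics and compute jobs simply land on whichever of the 18 CUs frees up first, so no static 14/4 split ever exists.

```python
import heapq

NUM_CUS = 18

def schedule(jobs):
    """Toy greedy scheduler: each job runs on whichever CU becomes idle first."""
    cus = [(0.0, i) for i in range(NUM_CUS)]      # (time when free, CU index)
    heapq.heapify(cus)
    usage = {"graphics": [0] * NUM_CUS, "compute": [0] * NUM_CUS}
    for _name, kind, duration in jobs:
        free_at, cu = heapq.heappop(cus)          # first CU to go idle
        heapq.heappush(cus, (free_at + duration, cu))
        usage[kind][cu] += 1
    return usage

# One frame's worth of mixed work: mostly graphics, some compute (made-up numbers).
frame = [(f"draw_{i}", "graphics", 1.0) for i in range(70)] + \
        [(f"physics_{i}", "compute", 1.0) for i in range(20)]

usage = schedule(frame)
for cu in range(NUM_CUS):
    print(f"CU{cu:02d}: graphics={usage['graphics'][cu]}, compute={usage['compute'][cu]}")
# Every CU ends up running a mix of both kinds of work; none is "reserved" for either.
```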
 

What was the decision behind the inclusion of the extra 4 CUs? So far they appear useless/redundant. What is the conclusion? Bad design that ended with wasted silicon?
 
What extra 4 CUs? There's 18 CUs. That's the design. They are 18 massively wide, parallel processors that can be used for graphics or compute. Calling it an extra 4 is daft, like calling the CPU a 5+3 core CPU with 5 cores designed for doing graphics work and 3 for game code. If devs want a really simple game with awesome graphics, they can have that, or they can shift the workload towards computation, on either or both processors in the system.
 
What was the decision behind the inclusion of the extra 4 CUs? So far they appear useless/redundant. What is the conclusion? Bad design that ended with wasted silicon?
There's no "extra 4 CUs." If you're referring to Sony suggesting that 14 would have been more in line with a traditional balance, the answer is basically exactly what it says on the tin: Sony wanted a GPU balanced toward more programmable computation power. That doesn't mean the silicon was wasted, it means that they were targeting a slightly different balance of workloads than what GPUs usually target (perhaps a balance encouraging the use of non-graphical GPU compute tasks).
 
What was the decision behind the inclusion of the extra 4 CUs? So far they appear useless/redundant. What is the conclusion? Bad design that ended with wasted silicon?

I feel like I've stepped off the train into Crazytown right now.
 
(I don't know if rendering at higher resolution needs more ALUs or not, or it's only ROPs/Bandwidth/Memory related).
Not just resolution, but framerate too: PS4 has a general lead in fps even when the resolution is the same (e.g., Tomb Raider), so ALUs definitely play a part in that advantage.
Comparing the PS4's GPU to PC GPUs that are balanced for what they are isn't the right thing to do,
PC GPUs are not balanced to do anything; the manufacturer balances the top chip, picks it apart and makes the rest of the lineup from its scraps... they try to make things as even as they can, but they don't go out on a limb to force extreme balance across the board.
we don't have enough information about PS4's GPU or PC GPUs to compare them.
Yes we do; they are the same thing, and the GCN architecture has been discussed to death in the PC section of this forum.

Sony said that they have balanced the PS4 GPU for 14 CUs, or in Cerny's terms, there are more ALUs in PS4 than they needed if they were only aiming for graphics rendering.
Probably figurative talk; there is no such thing as balancing a chip for graphics rendering in general. You can always scale your code to do whatever you like, and shader-heavy games will make use of whatever resources you have got. And with the low clocks of the PS4's GPU, the number of available ALUs becomes crucial in raising or lowering performance.
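A back-of-envelope sketch of that last point, using the commonly quoted GCN figures (64 ALUs per CU, FMA counted as 2 FLOPs per ALU per clock) and PS4's ~800MHz GPU clock; treat it as rough arithmetic, not an official spec breakdown:

```python
def gcn_peak_gflops(num_cus, clock_mhz, alus_per_cu=64, flops_per_alu_clock=2):
    """Peak single-precision throughput for a GCN-style GPU (FMA = 2 FLOPs/clock)."""
    return num_cus * alus_per_cu * flops_per_alu_clock * clock_mhz / 1000.0

print(gcn_peak_gflops(18, 800))    # ~1843 GFLOPS: all 18 CUs at PS4's clock
print(gcn_peak_gflops(14, 800))    # ~1434 GFLOPS: the hypothetical "14 for graphics" share
print(gcn_peak_gflops(14, 1029))   # roughly the clock 14 CUs would need to match 18 @ 800MHz
```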
 
What was the decision behind the inclusion of the extra 4 CUs? So far they appear useless/redundant. What is the conclusion? Bad design that ended with wasted silicon?

I needed that laugh... :LOL:

Someone needs to call AMD/Nvidia, letting them know anything over 14 compute units is a waste... 12 CUs being balanced (irony).
 
PC GPUs are not balanced to do anything; the manufacturer balances the top chip, picks it apart and makes the rest of the lineup from its scraps... they try to make things as even as they can, but they don't go out on a limb to force extreme balance across the board.

Yes we do; they are the same thing, and the GCN architecture has been discussed to death in the PC section of this forum.

Then explain the differences within each series to me (especially the "peak GPU bandwidth" section, which obviously shows a trend with the # of CUs):

7xxx series: http://www.upload7.ir/imgs/2014-04/58514407304172332613.png

68xx-69xx series: http://www.upload7.ir/imgs/2014-04/37659473897437140574.png

According to this PDF (Pages 228-229).

None of them are the same. Every series uses the same architecture for its compute units, but there are some differences as well. If PS4 has 18 CUs but is balanced for 14 CUs, what should I expect according to this plot (I'm referring to the 7xxx series)?

[attached image]


Compare it with PC GPUs.
 
The first graph says that they designed about two times the bandwidth for an 18 CU part compared to an 8 CU part.

The second is an older architecture and doesn't belong in the discussion.

Comparing the two graphs, the conclusion is pretty clear that the 7xxx series design is better thought out than the 6xxx series, assuming that more CUs require more bandwidth and that the requirement scales linearly with the number of CUs, i.e. if you need 1GB/s for 1 CU, you'll need 2GB/s for 2 CUs, and that's how the 7xxx series got designed. The 6xxx series obviously didn't scale the bandwidth in this manner, while the 7xxx series did.

So how does this argue anything related to our topic? If anything, it tells us that the 7xxx architecture PS4 uses is a design with scaling up in mind, and that the 4 "extra" CUs are designed to contribute just as much as the 14 "native" CUs by being fed enough bandwidth.
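As a quick sanity check on that reading, here's the same per-CU bandwidth arithmetic in code, using the round figures already quoted in this thread (≈150GB/s for the 16/20 CU 7850/7870-class parts, 176GB/s for PS4); they're approximations, and PS4's pool is shared with the CPU:

```python
# (name, CU count, peak memory bandwidth in GB/s) -- approximate figures from the thread
parts = [
    ("HD 7850 (16 CU)", 16, 153.6),
    ("HD 7870 (20 CU)", 20, 153.6),
    ("PS4 GPU (18 CU)", 18, 176.0),   # shared with the CPU, so the GPU sees somewhat less
]

for name, cus, bw in parts:
    print(f"{name}: {bw / cus:.1f} GB/s per CU")
# PS4's 18 CUs get at least as much bandwidth per CU as the PC parts in its class,
# i.e. nothing here suggests the 4 "extra" CUs are starved.
```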
 

I can't understand what you want to say. What do you mean by saying "PS4 uses a design with scaling up in mind"?! You mean it has the same bandwidth as a balanced 18 CU set-up?

By linking the second graph I wanted to say that every GPU of each series has its own specifics and is optimized for what it is.
 

Are you saying that the PS4 does not have the approximate specs of a 7850~7870?

[7xxx series chart linked above]


Because the PS4 looks damn close to a 7850, only exchanging a few MHz of clock speed for 2 more CUs.


The 6000 series is obviously older than the 7000 series, and it is arguable that they realized having bandwidth scale linearly with the number of CUs is important, thus resulting in the differences between these two graphs.

I still don't get your argument??? What exactly are you trying to say in relation to the 18 or 14+4 CUs?
 
You mean it has the same bandwidth as a balanced 18 CU set-up?
This conversation really needs people to stop talking about balance and instead define it. What is imbalanced about PS4 with 18 CUs? Compare 14 CUs @ 1080p with 18 CUs and show that four more can't contribute much. Because that's not true. In a highly tuned console game, devs will use whatever resources. If there's BW to spare, they'll use it. If there's ALU power to spare, they'll up the complexity of their shaders. If they want more physics, they may scale back their visual design. The balancing always happens in the software. Look at the total imbalance of consoles like PS2, which still produced 'balanced' games.

This topic is not going to gain any traction or serve any purpose until those asking questions can really clarify them. I'm thinking that the real question here is, "what did Cerny mean by 'balance'?" and it's that single talking point that's spawned this rather odd view of PS4's graphics processor.
 
There's no "extra 4 CUs." If you're referring to Sony suggesting that 14 would have been more in line with a traditional balance, the answer is basically exactly what it says on the tin: Sony wanted a GPU balanced toward more programmable computation power. That doesn't mean the silicon was wasted, it means that they were targeting a slightly different balance of workloads than what GPUs usually target (perhaps a balance encouraging the use of non-graphical GPU compute tasks).

Yes I was referring to the balance thing
 

The process for determining an architecture specification to achieve goal X is convoluted and relies on looking backwards as much as it does looking forward. I've observed the process of forward hardware planning for our server farms, and taking that as a basis, this is how I envisage Sony decided on Jaguar cores and 18 Compute Units.

Sony approached AMD with a bag of $$$, the budget for their CPU and GPU. To expedite the explanation let's assume they've already discounted discrete CPU/GPU solutions and have settled on AMD's GCN architecture in an APU, and that the timing of the project means they'll be using Jaguar and not Bobcat or Puma. Let's also assume that they've already decided on GDDR5 - lots of discussions with AMD have already taken place at this point.

How did they reach 18 Compute Units at Z MHz?

Sony would have had a ballpark graphics performance target for PS4 e.g. 30/60fps at 1080p and in part this would have been derived from earlier discussions with AMD about what hardware their $$$ could actually buy, i.e. die size, expected yields, number of CPU cores and the overall GPU performance.

Working out how many CPU cores and compute units were needed would have been a process of reviewing the performance profiles from a lot of older and current games and the CPUs and GPUs driving them. AMD would have been reviewing DirectX/OpenGL profiling data and perhaps leveraging early experience from Mantle. Sony would have been profiling as many PlayStation 3 games as possible, looking at the number and types of API calls and measuring what the GPU was actually doing. Sony are also a PC developer (Sony Online Entertainment) so have that experience to draw on as well. Between AMD, who are experienced in what hardware is required to run games well on DirectX and OpenGL, and Sony, who are experienced in graphics hardware use with a low-level API, they can determine pretty accurately the number of CUs needed for Sony's target, allowing for some unknowns, like the API (GNM) being incomplete.

From the VG leak and what Cerny has alluded to, this 'balance' appears to be 14 CUs for graphics. The reason to include more compute, which Cerny has been more explicit about, is that Sony think GPGPU is going to be big in a few years and they didn't want developers to have to cut back on graphics rendering to make room for what will in future be routine compute for other game processing:

Mark Cerny said:
"The vision is using the GPU for graphics and compute simultaneously. Our belief is that by the middle of the PlayStation 4 console lifetime, asynchronous compute is a very large and important part of games technology."

Having a performance target and determining that X number of Compute Units is sufficient/optimum for graphics is predicated on the performance of the available CPU cores, their ability to feed the GPU (and run the rest of the game) using the buses in the system and the available memory and its bandwidth - all executed efficiently. If you are not being efficient in programming the GPU then 18 CUs will produce better results than 14 CUs unless you are constrained elsewhere - CPU or bandwidth, for example.
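To put the 'graphics and compute simultaneously' idea into frame-budget terms, here's a toy model (the 75% ALU occupancy figure and the 60fps frame budget are invented for illustration, not measured PS4 data): whatever ALU time graphics leaves idle is time asynchronous compute can soak up.

```python
FRAME_MS = 1000.0 / 60.0      # 16.7ms frame budget at 60fps (illustrative target)
NUM_CUS = 18

def spare_cu_ms(gfx_alu_occupancy):
    """CU-milliseconds per frame left idle by graphics, available to async compute."""
    total_cu_ms = NUM_CUS * FRAME_MS
    return total_cu_ms * (1.0 - gfx_alu_occupancy)

# If graphics only keeps the ALUs ~75% busy across the frame (a made-up figure),
# roughly a quarter of the GPU's time is free for compute without costing graphics anything.
print(f"{spare_cu_ms(0.75):.0f} CU-ms spare out of {NUM_CUS * FRAME_MS:.0f} CU-ms per frame")
```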
 
Also... the chart above lists, for the 7850/7870 generation, a bandwidth of ~150GB/s for the 16/20 CU chips at higher clock speeds.

Given that PS4 has ~176GB/s and has to feed the CPU as well, that also makes sense.

In what way would this design be unbalanced anyhow? I mean... you can always use more X or Y if it's there. You'll eventually run into another bottleneck. But comparing it to PC GPUs, there's no obvious one.
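For reference, the ~176GB/s figure is just the memory configuration multiplied out (256-bit GDDR5 at an effective 5.5Gbps per pin); how much of that the CPU actually consumes is the open question, so it isn't modelled here:

```python
bus_width_bits = 256      # PS4's GDDR5 memory bus width
data_rate_gbps = 5.5      # effective transfer rate per pin
peak_gb_per_s = bus_width_bits / 8 * data_rate_gbps
print(peak_gb_per_s)      # 176.0 GB/s, shared between CPU and GPU
```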
 
How did they reach 18 Compute Units at Z MHz?

Sony would have had a ballpark graphics performance target for PS4 e.g. 30/60fps at 1080p and in part this would have been derived from earlier discussions with AMD about what hardware their $$$ could actually buy, i.e. die size, expected yields, number of CPU cores and the overall GPU performance.
I doubt there's anything like that going on, at least not to a big degree. You can't design a modern console as a 30 or 60 fps machine*, as what appears on screen is down entirely to software. Look at last gen. Was PS3 a 720p60 console? Or a 720p30? Or a 1080p30? Or 1080p60-ish? Because there are games of all those types on there. Whatever hardware you give to developers, they'll choose themselves what resolution and framerate to target, and scale their engine accordingly. It's impossible to define those parameters in hardware. Even looking at XB360 with eDRAM spec'd for 720p + MSAA, it wasn't used as such.

So given that Sony were not going to have any control over what devs do with their hardware, and there's no way to design a '1080p60' box that won't in some games be reduced to 20-30 fps and in others fail to hit 1080p, there's no point in factoring in the performance relative to game targets. Instead, you target performance for cost. Sony pick a price-point, look at hardware options that fit that pricepoint (basically die size), and go with what looks good value. AMD offer a decently balanced part based on their GPU options with some feedback from Sony about what tweaks they'd like (32 ROPs being one obvious decision. We have Sebbbi telling us that's basically overkill and compute is more important going forwards, so that seems imbalanced). Why the choice of 18 CUs instead of 20 or 16 or 14? Because that hit the price range that Sony wanted. If they could have put in 20 CUs at the same price, they would have - it has nothing to do with balancing with the CPU. Thanks to compute, the GPU doesn't need to be balanced with anything. If you can't drive enough graphics to saturate it (and anyone who wants to can), put something else on there. The reason Sony didn't pick 14 CUs is they wanted more grunt and could afford to get that.

Probably the only balancing consideration that came into it was what CPU cores to have. Sony could have gone Piledriver, maybe held off for Steamroller, but that'd have taken up more die space, meaning less room for CUs or a higher cost. So Sony looked at workloads and decided the CPU wasn't as important. If they go weaker CPU and stronger GPU, they can use an existing APU design as a base and always have the option to shift compute workloads to the GPU. The alternative would have been stronger CPU performance in general but weaker graphics performance. It's worth noting that there are plenty of PC parts that aren't well balanced themselves, being weak in CPU relative to a lot of game requirements out there. AMD doesn't carefully analyse games, determine where the bottlenecks are and produce highly balanced parts that spread the bottlenecks evenly through the system - that's not even possible because games work in so many different ways. They just put together the best CPU and GPU parts they can for given price and power envelopes, and then we see how well the software does or doesn't run on those parts.

Ultimately, the design for PS4 should have been based on what hardware could be provided regards flexibility and performance at a given pricepoint. If Sony made their choice based on what devs were trying to accomplish on DX9 hardware and what they may be trying to target in future, they'd have gone about the task poorly.

* assuming no esoteric fixed-function hardware like the 8 and 16 bit consoles with fixed 60Hz sprite hardware, etc.
 
This conversation really needs people to stop talking about balance and instead define it. What is imbalanced about PS4 with 18 CUs? Compare 14 CUs @ 1080p with 18 CUs and show that four more can't contribute much. Because that's not true. In a highly tuned console game, devs will use whatever resources. If there's BW to spare, they'll use it. If there's ALU power to spare, they'll up the complexity of their shaders. If they want more physics, they may scale back their visual design. The balancing always happens in the software. Look at the total imbalance of consoles like PS2, which still produced 'balanced' games.

This topic is not going to gain any traction or serve any purpose until those asking questions can really clarify them. I'm thinking that the real question here is, "what did Cerny mean by 'balance'?" and it's that single talking point that's spawned this rather odd view of PS4's graphics processor.

I mean a balance between internal components, not a balance for 1080p/60fps... Rayman was running at 1080p/60fps on last-gen consoles. These things aren't a good measure for calling a GPU balanced or unbalanced.

For example, normally the bandwidth of the L1, LDS and registers (in every CU) is tailored to the total number of CUs in the GPU (going by the graph). What happens if the per-CU L1, LDS and register bandwidth and the L2 bandwidth are tailored for fewer CUs than the GPU actually has? For example, a GPU that has 20 CUs but whose L1, LDS, register and L2 bandwidth is tailored for 15 CUs. Does the efficiency of the GPU decrease, or does nothing change? If it does hurt performance, would all workloads (rendering, GPGPU, ...) be affected equally? And are the different L1, LDS and register bandwidths on different GPUs the result of a physical limitation (cache/memory bus width) in each design, of clock differences, or of a mixture of both?
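One note that may help frame the question: in GCN the L1, LDS and register files live inside each CU, so their aggregate bandwidth grows automatically with CU count (and clock); only shared resources like the L2 and the memory controllers could be 'tailored' for fewer CUs than the chip has. A minimal sketch of that scaling, assuming the commonly quoted GCN per-CU figures of 64 bytes/clock for the vector L1 and 128 bytes/clock for the LDS (assumptions, not PS4-specific disclosures):

```python
def aggregate_bw_gb_s(num_cus, clock_mhz, bytes_per_clock_per_cu):
    """Aggregate bandwidth of a per-CU resource: it scales linearly with CU count."""
    return num_cus * bytes_per_clock_per_cu * clock_mhz * 1e6 / 1e9

for cus in (14, 18, 20):
    l1 = aggregate_bw_gb_s(cus, 800, 64)     # vector L1 cache (assumed 64 B/clk per CU)
    lds = aggregate_bw_gb_s(cus, 800, 128)   # local data share (assumed 128 B/clk per CU)
    print(f"{cus} CUs @ 800MHz: L1 ~{l1:.0f} GB/s, LDS ~{lds:.0f} GB/s")
```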
 