PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

The answer to this, like almost every other "could X happen" question, is yes.
It can, because there are only a few things that are physically forbidden in the whole vast universe of computation.

Yes doesn't mean "should", however. It sounds like they will be busy on non-graphics things.
 
I also wonder if this ties into the "dual camera". Just as Durango has resources dedicated to Kinect, Orbis might have dedicated resources to their own solution.
 
The specifics of what the 4 CUs do will help us understand the system. I'm certainly curious how it relates to PC, as it seems a concept that could only be pulled off in a console and may be quite valuable. GPGPU augmentations, especially when audio and video are being taken care of by custom silicon, could mean quite a lot of versatile compute resources for non-direct-graphics work. Maybe light propagation systems, AI sorting, or whatever.

Computers have the option of just brute forcing through it. Especially going forward, PC GPUs will have more and more compute resources available. In other words, there's no real reason for a PC to reserve any, with regard to what can be done on the rumored console specs.

Just to add to Shifty's post (even though this is not DvO): even if you have "only" 14 CUs available for rendering, whatever these 4 CUs are doing will still have to "fit" somewhere in Durango.

I think this situation is even worse for MS than the previous one, since Sony would practically be inviting devs to push the extra compute in any direction they want, and Orbis would still have more shading performance than Durango in that case. Not to mention ROPs, texture units etc. I doubt Durango matches it there anyway.

It "could" be, or it might not be. One of the specific use cases mentioned was to assist in physics calculations. On PC, even with a huge monetary and marketing push by Nvidia, GPGPU-assisted physics calculations are still extremely rare. And while it enabled additional effects, at faster speeds (sometimes) than hardware that couldn't do it (GPGPU-accelerated PhysX), the difference for many wasn't enough to justify going with one graphics vendor over another.

In other words, most titles may not ever make use of it, but first party titles probably will. It's arguable whether multiplatform titles will bother to put significant effort into something which only one platform could benefit from.

And going at least by the rumors, if they really do mean a "minor" speed increase in graphics rendering when the extra CUs are used for that, it certainly sounds like a lot less than 30%, which I wouldn't regard as minor. And even then, a 30% shader advantage only shows up when you're completely compute bound, which is rarely the case (look at the Radeon X1950 XTX versus the GeForce 7900 GTX: the X1950 XTX had significantly more shader power but couldn't effectively leverage it because games aren't always shader bound), so it isn't going to lead to 30% better graphics rendering performance.
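To put rough numbers on that (the 18 vs 14 CU counts are the rumored split; the shader-bound fractions below are assumptions for illustration, not measurements):

Code:
# Amdahl-style estimate: extra shader ALUs only help the shader-bound part of a frame.
cu_ratio = 18 / 14                      # ~1.29x raw ALU throughput from the extra 4 CUs
for shader_bound in (1.0, 0.6, 0.4):    # assumed fraction of frame time limited by shaders
    speedup = 1 / ((1 - shader_bound) + shader_bound / cu_ratio)
    print(f"shader-bound {shader_bound:.0%}: overall speedup ~{speedup:.2f}x")

Even a fully shader-bound frame only gains ~29%, and at a more typical 40-60% shader-bound mix the overall win shrinks to roughly 10-15%.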

So, yes, the potential is there to do something the other console may not do as well at. How impressive that will be remains to be seen. Coming from PC gaming, I'm a bit jaded with regards to something like this getting widespread use. But we'll see.

As well, not all games require robust and extensive physics simulations, in which case it may or may not get used at all, and to varying degrees.

Regards,
SB
 

I get where you are coming from, but I would say that console development does benefit PC games a lot. The move to DX11+ GPUs will change the baseline graphical makeup of most games coming out post next-gen launch. So the fact that the PS4 has dedicated hardware for compute jobs means that benefit will carry over to the PC.
 
Computers have the option of just brute forcing through it.
Sure, but I was talking in terms of efficiency from customised hardware. It's possible (how much so, I don't know) that the general-purpose compute from these 4 CUs is equivalent to 5 or 6 CUs on a PC GPU. There's no point drawing a comparison on raw performance as PC will always win, but as consoles are all about best bang-per-buck, I'll be interested to see if Sony/AMD have managed to latch onto a customisation that wouldn't work in the PC space but pays worthwhile dividends. Similar to the value of eDRAM and blitters and such in other machines.

In other words, most titles may not ever make use of it...
Unless there's an SDK library, which is something I'd expect. We may not see custom solutions in ports (eg. no hair simulation in UE4 ports even if it features in Orbis exclusives), but then again, we may. And to me, augmenting light and sound through physics seems an obvious plug-in solution, similar to the MLAA library on PS3 that devs were free to use. Or it may be used for the camera stuff, as others suggest.
 
Do that, however, and you risk junking the chip if one of those special CUs has a flaw, or increasing the number of redundant CUs the chip already has.
I suspect that at least 2 regular CUs are physically present but fused off for yield purposes.

Using the same hardware makes the reserved CUs less of a manufacturing risk, since they can draw from the same redundant units.
To elaborate further, the leaked figure for texturing power (in the ballpark of an HD 7850) would let one assume that the figure is for 18 CUs running at a slightly lower clock than the ones in an HD 7850 (only 16 CUs).
It is unlikely that the figure comes from 14 CUs running at a higher clock speed than the HD 7850.
To me all the CUs are the same, and indeed a 20 CU part could make sense for yields.
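For what it's worth, the texel-rate arithmetic backs that reading up (assuming GCN's standard 4 texture units per CU and the oft-rumoured 800 MHz clock, neither of which is confirmed):

Code:
# Texel-rate sanity check for the "ballpark of an HD 7850" texturing figure.
def gtexels_per_s(cus, mhz, tmus_per_cu=4):
    return cus * tmus_per_cu * mhz / 1000   # GTexels/s

print(gtexels_per_s(16, 860))   # HD 7850: ~55.0 GT/s
print(gtexels_per_s(18, 800))   # 18 CUs at a slightly lower clock: ~57.6 GT/s, same ballpark
print(gtexels_per_s(14, 800))   # 14 CUs: ~44.8 GT/s, which would need a much higher clock to match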

The magic has to be in how all those CUs are linked to the command processor and the ACE(s).
I could see the command processor having control of all 18 CUs in the design and the ACEs (for some reason) having control of only 4 of them.
Either way it is a completely "software thing" in the way Sony presents the hardware to developers.
 
So there are these execution units that can, at least in theory, work on different workloads, say graphics computations versus more general computations, but in reality having one of those units do both kinds of work at the same time can degrade performance, because it's not fast to switch context at will and the different workloads may require different tweaks to the execution units themselves.

Two approaches you could use to tackle this problem are: create separate "dedicated" units, tweak each for its workload, and avoid the performance pits that having one unit work on both could yield; or make the units bigger (suited to both workloads) but fewer (assuming a fixed budget), and work on architectural changes that allow them to switch very quickly between those workloads...

Am I the only one who thinks this is eerily similar to Pixel Shaders + Vertex Shaders vs Unified Shaders?
 
Am I the only one who thinks this is eerily similar to Pixel Shaders + Vertex Shaders vs Unified Shaders?

That's one way of looking at it, though they are 'unified' from an instruction set POV (I guess).

I wonder what level of changes they'd bring to the 4 CUs. A bigger set of registers? More cache? Could they possibly even work more closely with the CPU, or are we getting into the realm of (unlikely) physical separation from the other CUs..? It'd be lovely if those CUs could look at the CPU cache in a fast way, although, not being a hardware designer, I realise that might sound very naive... :)
 
So there are these execution units that can, at least in theory, work on different workloads...
Well, I don't have the time to search for that particular post, but it is from Sebbbi and it is in the "software rendering" thread.
He states that at some points during a frame the GPU ALUs are mostly idle while the "fixed function" parts of the GPU do their thing.

It makes me wonder if Sony could present the GPU as it does (assuming it is a software split) so that, for example, a small part of the GPU handles filling the G-buffer and other render targets for frame x, whereas the other part of the GPU operates "sort of asynchronously" on frame x+1 without ever stalling.
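As a toy model of why that overlap would help (the two-queue split and the millisecond costs below are made up purely for illustration, not anything from the leaks):

Code:
# Toy frame-pipelining model: compute for frame x+1 runs on the reserved CUs
# while the rest of the GPU renders frame x, so neither side stalls the other.
GFX_MS, COMPUTE_MS = 12.0, 4.0          # invented per-frame costs

def serial_frame_time():
    # one queue: rendering waits for the compute work (or vice versa)
    return GFX_MS + COMPUTE_MS          # 16 ms

def overlapped_frame_time():
    # steady state, with one frame of latency between the two halves of the GPU
    return max(GFX_MS, COMPUTE_MS)      # 12 ms

print(serial_frame_time(), overlapped_frame_time())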

Time for work, a long shitty night to come for me :(
 
Not really sure what to feel about this yet, since we don't know how many compute tasks we'll see in the average game. In light of this news we can finally see where those strange comments about performance being nearly equal came from. It's not that one was significantly weaker than the other, but rather one was less powerful than we thought... for tasks related to general rendering. And since I imagine most games devs are making now probably aren't doing much fancy compute stuff, it all makes sense.

And then libGCM +10-20% console magic optimization brings us the rest of the way, for the hopes and dreams.
 
Reading the updated GPU info, there could be a wording misunderstanding: it could be that the 4 special CUs have an extra ALU tailored for GPGPU ops, and that when these 4 CUs are used for rendering, this special ALU gives only a minor boost over a normal CU's graphics performance.
 
The magic has to be in how all those CUs are linked to the command processor and the ACE(s).
I could see the command processor having control of all 18 CUs in the design and the ACEs (for some reason) having control of only 4 of them.
Either way it is a completely "software thing" in the way Sony presents the hardware to developers.

I don't recall reading something that nails down the exact relationship, but I think the ACEs are in charge of allocating work and resources in the CU array. The command processor itself, and the graphics pipelines, may not have direct access to the array.

There may be a static split of physical CUs to one ACE or the other for some GPUs, since there are two banks of CUs in most of the GCN designs, except for the small ones. However, there may be an interconnect that allows for communication to both sides, and even the small GPUs have two ACEs despite having only one bank of CUs.

I'm leaning towards the idea that the system sets up kernels before apps can access the GPU, and those contexts run persistently as generic compute contexts that take up a fixed amount of resources and have their priority set so they get a guaranteed amount of issue capability on the four CUs they exist on.
The ACEs and the scheduling firmware would control the allocation of any other programs, and they would force other programs running on the reserved CUs to throttle to some minimum timeslice if the reserved program contexts are given work.
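A minimal sketch of that reservation idea (the 80/20 split and the slot counts are invented purely for illustration; nothing here comes from the leaks):

Code:
# Toy issue-slot split for one reserved CU: a persistent compute context is
# promised a share of issue slots whenever it has work queued, and the remainder
# is the minimum timeslice left to any other wavefronts resident on that CU.
GUARANTEED_SHARE = 0.8                  # invented figure

def split_issue_slots(reserved_has_work, slots=100):
    if reserved_has_work:
        reserved = int(slots * GUARANTEED_SHARE)
        return {"reserved_context": reserved, "other_work": slots - reserved}
    # Reserved context idle: the whole CU is available to everything else.
    return {"reserved_context": 0, "other_work": slots}

print(split_issue_slots(True))    # {'reserved_context': 80, 'other_work': 20}
print(split_issue_slots(False))   # {'reserved_context': 0, 'other_work': 100}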
 
Am I the only one who thinks this is eerily similar to Pixel Shaders + Vertex Shaders vs Unified Shaders?
Pixel and vertex shaders were combined because they shared very similar operations when running; there wasn't enough difference between the modern programmable flavours to justify keeping them separate. Any hardware that is going to remain discrete and not part of the open programmability has to show its value by being vastly more efficient. eg. We see a DSP and a video en/decoder: operations that could be performed in a unified processing architecture (CPU), but for which custom hardware that operates differently is more efficient.
 
Not really sure what to feel about this yet, since we don't know how many compute tasks we'll see in the average game.
If the processor is there, devs will use it. That's the joy of closed-box hardware! The question is how it will be used (and what exactly it can do), and as others say, that should push the tech forwards in general, so PC starts to see DX11 games running compute functions lifted from PS4 across their GPUs in a conventional fashion.
 
I get that, but it just feels like another round of the wait-for-improving-devkits, developer-coddling-and-nudging, squeeze-the-last-ounce-of-blood-from-the-stone scenario before we get something "whoa-like" again. And I stopped being impressed by console graphics around Uncharted 2 times, which was ages ago. Doesn't look like any Agni's out of the box here without some hard, hard work or IQ downgrades.

(I reserve the right to change my judgement depending on how nice these tech demos will be come E3. Also after watching Agni's a couple times I'm not that impressed with it either anymore :) )
 
If this is what Orbis is, I don't think any epiphany will be required to extract very good performance from it. It looks simple and powerful. Durango looks a little weird, and it might take some thought.

Overall, I like the look of Orbis. Sounds like Sony got their tools in order last gen, so things should be running smoothly out of the gate. I know some people were expecting a supercomputer (unreasonable expectations based on PC hardware), but this thing will be a big leap from PS360.
 
What extra things (apart from DP) are in the Tahiti CUs that are missing from the Pitcairn and Cape Verde CUs? Don't they have extra logic dedicated to GPGPU? Better branch prediction etc.?

Could the 4 'balanced' CUs be more like Tahiti?
 
What extra things (apart from DP) are in the Tahiti CUs that are missing from the Pitcairn and Cape Verde CUs? Don't they have extra logic dedicated to GPGPU? Better branch prediction etc.?
Aside from DP, no differences are mentioned, and GCN does not predict branches.
 