The capabilities of the 4 special CUs in Orbis

ultragpu

This thread is for speculating about the possibilities of the 4 special CUs and giving examples of what can be achieved when they are used purely for compute.

By now we're all aware that those 4 special CUs inside Orbis are made for compute, assuming the rumor is true. We have 400 GFLOPS of raw processing power under the hood, ready to calculate physics, AI, lighting, animation, particles, etc. So if we combine them with the remaining 14 CUs, which are obviously for rendering, just what kind of results can we see at maximum utilization? What would be possible if 400 GFLOPS of computing power were focused purely on one aspect such as lighting, physics or particles? Is it possible to see particles at the level of the Killzone 2 CGI target render, or destruction far beyond what's available in Frostbite 2.0?

9a4g3k.jpg
 
If they were dedicated to compute, that is more raw performance than RSX and Xenos (taking architecture into consideration, 4 CUs ~ Xenos + RSX). From that perspective, look at games that have great shadows or lighting or fantastic particle systems and consider what would happen if the entire visual budget of last gen were dedicated to ramping up just that. Pretty sweet.

That said, and not to derail, I think 3dil. had the best theory on the rumor thus far imo: the reason for a slight performance bump is that they are designed to service the OS first (e.g. OS blade, streaming to a tablet like SmartGlass or to the Vita, etc.), and whatever threads/cycles are left open can be used by the game, but with no guarantee of service timeliness. I could be wrong, but based on the MS docs and how they were looking at QoS guarantees for background services, which included GPU resources, I think there may be a strong argument for this. For gamers I hope this isn't true. Of course I would hope it was 18 CUs, period, so developers could use them however they wish. It could also come down to using the CUs for their Eye camera, especially if it is stereo and doesn't have an extra channel for native 3D.
 
This thread is for speculating about the possibilities of the 4 special CUs
I would claim with utmost certainty that there is nothing "special" about some 4 CUs. There is a shader array, there is the usual command processor, and there is (at least) one ACE. It will ultimately be in the developers' hands how to distribute all the shader resources between graphics and compute tasks (save for some reservation by the OS). It's possible to let the hardware itself figure it out (which could be helped by setting priorities for tasks), and there is the possibility of doing a static split of the CUs for slightly more predictable runtimes (but less overall efficiency). The number of 4 CUs is probably just an example to give developers a hint of what Sony deems reasonable or, following 3dillettante, it may be that 4 CUs are shared with the OS and the remaining ones can be used exclusively.
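
To put rough numbers on that trade-off, here is a minimal Python sketch. It is only an idealized model and not how the hardware actually schedules anything: it assumes work divides perfectly across CUs and ignores scheduling overhead, bandwidth and any OS reservation. The function names and the workload figures (in CU-milliseconds per frame) are made up for illustration.

```python
def static_split_ms(graphics_work, compute_work, graphics_cus=14, compute_cus=4):
    """Frame time when each task is pinned to its own fixed group of CUs.

    The frame is done when the slower of the two groups finishes.
    """
    return max(graphics_work / graphics_cus, compute_work / compute_cus)


def shared_pool_ms(graphics_work, compute_work, total_cus=18):
    """Frame time when all CUs pull from both tasks (idealized, zero overhead)."""
    return (graphics_work + compute_work) / total_cus


# Hypothetical workloads: (graphics, compute) in CU-milliseconds per frame.
workloads = [(196.0, 56.0), (210.0, 20.0), (140.0, 80.0)]

for g, c in workloads:
    print(f"graphics={g} compute={c} "
          f"static={static_split_ms(g, c):.2f} ms "
          f"shared={shared_pool_ms(g, c):.2f} ms")
```

In this toy model the static 14+4 split only matches the shared pool when the workload happens to sit exactly at a 14:4 ratio; any other mix leaves one group idle while the other finishes. What the model cannot show is the upside of the split, which is that the compute side never has to wait behind graphics work.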
 
400 Gflops is not that much for GPU physics. It's horribly inefficient. It's not the same as having 400 Gflops on a CPU, and it's not going to make any magic happen, if it's even true.
 
I would much prefer it go towards post processing, or something else that's generally in the graphics pipeline.

And to say 400 GFLOPS isn't a lot is a bit disingenuous depending on what you're talking about; after all, it's more than the GPU performance of the PS3 and 360 combined.
 
400 Gflops is not that much for GPU physics. It's horribly inefficient. It's not the same as having 400 Gflops on a CPU, and it's not going to make any magic happen, if it's even true.
By your logic, why wouldn't they keep it as a unified 18 CU setup in the first place if the 4 CUs are so inefficient? If they really wanted the extra computing resources, why wouldn't they simply upgrade the CPU instead? Something just doesn't add up here for them to go through all this hassle.
 
So there are Special CUs in the PS4? Unless I missed a new rumor, we don't even know if they are any different, and if they are we have no idea what's different. Right now we have a weird wording from a rumor that suggests there's something about them... or maybe not. We might as well speculate about how the Special Scan Out Engine can improve the graphics.

We're speculating into the void.
 
I've got inside sources which say the 4 CUs are infused with ponies and rainbows.


No, I don't think the 4 CUs should be any different from the other 14.
First of all, these "special" CUs wouldn't be missing stuff from the usual CUs as it's simply illogical from any standpoint. Why gimp them? Saving silicon space? Doesn't sound possible.

Thus there are 2 possibilities: A regular CU, or a "buffed" CU.

If "upgrading" the 4 CUs is worth it, why not just upgrade all 18 CUs and maintain flexibility in manufacturing?

Taking us back to the 4 CUs being most likely the same as the other 14.
 
People are reading VGLeaks wrong and making false claims about what these 4 CUs actually do... for example NeoGAF, talking about things they don't understand.
 
By your logic, why wouldn't they keep it as a unified 18 CU setup in the first place if the 4 CUs are so inefficient?

That's why I don't believe the rumor. "Reserving" 4 CUs doesn't make sense on any level, nor does "customizing" only 4 CUs on a pre-existing design; it would cost a bunch for little difference from just using them as they are.
 
Is it possible to emulate CELL via these CUs?
Realistically? No way in hell.

In theory yes: as long as the emulating device is Turing complete, it can emulate any other computer. It's not going to be anywhere near realtime emulation though, which would make emulation pointless.
 
The number of 4 CUs is probably just an example to give developers a hint of what Sony deems reasonable or, following 3dillettante, it may be that 4 CUs are shared with the OS and the remaining ones can be used exclusively.

It is most likely that 14 of the 18 CUs are automatically scheduled for use by the unified shader pipeline.

In effect 'you throw shaders at the GPU' and 14 of the 18 CUs run those shaders, perfectly synchronized and with data flowing in a nice & predictable manner. That's your "rendering pipeline".

The other 4 CUs are not plugged in to that pipeline, nor is there any obvious way to plug them in. Instead they can be programmed/scheduled separately - in a similar manner to methods used on the PS3 Cell.

The 4CUs are likely intended to be used by:
- PS Move (if enabled). The algorithm was apparently very optimized for running on the Cell, and this should be a good match.
- stereoscopic 3d vision stuff (if enabled). It's a lot of data being compared with a lot of data as output.

But if you aren't doing either, it probably leaves 4 CUs "free".

As a non-pro who doesn't know what CUs are capable of these days (nor the advanced techniques), my list of things that seem like possibilities would be:
- hair/cloth modelling.
- AI pathfinding/line-of-sight?
- "static" shadows.
- dynamic textures/terrain (e.g. water)
- particles.
- physics collision checks.
 
400 Gflops is not that much for GPU physics. It's horribly inefficient. It's not the same as having 400 Gflops on a CPU, and it's not going to make any magic happen, if it's even true.

Uncharted 3 has some of the most advanced physics I have seen in a game, including PC games, and it was done on Cell, which doesn't have 400 Gflops. In fact the entire PS3 doesn't push 400 Gflops. ;)

I refuse to dismiss 4 CUs with 410 Gflops as "not that much" and "horribly inefficient". What is your argument based on? Have you worked on Orbis? Do you know for a fact that the 410 Gflops won't help much? How do you know they are not efficient?
 
Not all Gflops are created equal. To simplify, a GPU with 1 Tflops might be awesome for graphics tasks, but it might be slower than a 100 Gflops CPU at another, more complex task. Physics is a complex task. Sure, simple physics, like what you see PhysX used for most of the time, can be done well enough on a GPU, but the more complex stuff is taxing on GPU performance.

Look up some PhysX benchmarks with various Nvidia GPUs and you'll notice that the low end cards, with 300-400 Gflops, tend to choke on the data and drop frame rates. If a full card with dedicated RAM, bandwidth, resources, etc. is choking on simple PhysX effects with 400 Gflops, then what do you really expect from 400 Gflops of "just shaders" on a chip doing lots of other tasks, using the same RAM, memory controllers, bandwidth and the same front end hardware as the graphics side?
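
One commonly cited reason that peak Gflops don't carry over to branchy code is SIMD divergence: lanes in a group run in lockstep, so when they take different branches the group pays for every branch taken. The Python sketch below is a deliberately simplified model of that effect, not a measurement of any real GPU. The group width of 64 is an assumption based on AMD's GCN wavefront size, and the branch names and cycle costs are invented for the example.

```python
from collections import Counter


def simd_efficiency(lane_paths, path_cost):
    """Fraction of lane-cycles doing useful work in one SIMD group.

    lane_paths: the branch each lane takes, e.g. ["simple", "complex", ...]
    path_cost:  cycles needed to run each branch (hypothetical numbers)

    Simplified model: the group executes every branch that at least one lane
    takes, one after another, while lanes not on that branch sit idle.
    """
    lanes_per_path = Counter(lane_paths)
    group_cycles = sum(path_cost[p] for p in lanes_per_path)
    useful_lane_cycles = sum(path_cost[p] * n for p, n in lanes_per_path.items())
    return useful_lane_cycles / (group_cycles * len(lane_paths))


cost = {"simple": 10, "complex": 90}           # made-up cycle counts

coherent = ["simple"] * 64                      # every lane agrees
half_split = ["simple"] * 32 + ["complex"] * 32
one_straggler = ["simple"] * 63 + ["complex"]   # a single lane needs the slow path

for name, lanes in [("coherent", coherent),
                    ("half_split", half_split),
                    ("one_straggler", one_straggler)]:
    print(name, simd_efficiency(lanes, cost))
```

Under these assumptions a single lane taking the expensive branch is enough to drag the whole group down to a small fraction of its peak, which is roughly the kind of gap between paper Gflops and delivered work being described here.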
 
If you fully use the Cell (which is hard to do, but has been proven to be possible in practice, unlike with most processors), it has 227 GFlops. That's not really a trivial amount, but it is very hard to reach. CUs are easier to use, because most programmers are used to them and there are more 'common' APIs. Also, the Jaguar CPU should be quite a bit more capable at physics than the PPE. So in total we have the equivalent of 100 GFlops of OoOE CPU cores plus 400 GFlops of CUs. It is still not that much more than the Cell in theory, but we're going to have to assume that it will be better at many of the tasks that are asked of it, and easier to use. The really difficult thing to ascertain is efficiency: SPEs are far better at certain tasks than PPEs, so not all GFlops are created equal. The same holds for Jaguar's cores and CUs, though they shouldn't be too much of an unknown quantity at this point either.
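
For anyone wondering where the 400 (or 410) and 100 GFlops figures come from, a back-of-the-envelope calculation gets you there. The Python sketch below assumes 64 lanes per GCN CU with one fused multiply-add (2 flops) per lane per cycle, and 8 single-precision flops per cycle per Jaguar core; the 800 MHz GPU clock and the 8 cores at 1.6 GHz are the rumored figures, not confirmed specs. These are theoretical peaks only.

```python
def peak_gflops(units, lanes_per_unit, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak = units * lanes * flops per lane per cycle * clock (GHz)."""
    return units * lanes_per_unit * flops_per_lane_per_cycle * clock_ghz


# Assumed GPU figures: 64 lanes per CU, FMA counted as 2 flops, 0.8 GHz (rumored).
compute_cus = peak_gflops(4, 64, 2, 0.8)
render_cus = peak_gflops(14, 64, 2, 0.8)
all_cus = peak_gflops(18, 64, 2, 0.8)

# Assumed CPU figures: 8 Jaguar cores, 8 SP flops per cycle each, 1.6 GHz (rumored).
jaguar = peak_gflops(8, 1, 8, 1.6)

print(f"4 CUs:  {compute_cus:.1f} GFlops")
print(f"14 CUs: {render_cus:.1f} GFlops")
print(f"18 CUs: {all_cus:.1f} GFlops")
print(f"Jaguar: {jaguar:.1f} GFlops")
```

With those inputs the 4 CUs come out at roughly 410 GFlops and the CPU at roughly 100 GFlops, matching the numbers quoted in this thread. How much of that peak a real workload can reach is a separate question.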

The really big question is whether there is physically something special about the CUs. In theory, Sony could be saying that they've estimated 14 CUs to be necessary for the graphics pipeline, and so they decided to add 4 more just for physics. That could literally be all there is to it. However, it is also possible that the CUs are located somewhere else on the bus, giving them closer access to the Jaguar so that Jaguar cores can work more efficiently together with these CUs on physics, animation, image processing and so on, closer to how the PPE and SPE cores could work together in the Cell.

If and how this is done, and what the impact of that is, is the big question here.
 
Oh, I know that.

The only problem with your point is that you are talking about a 300-400 Gflops card which is handling everything, so those 400 Gflops are not just being used for PhysX, they are also being used for rendering, which is why it chokes.

We are talking about 400 Gflops outside the CUs used for rendering. Having a 14 CU GPU with another 4 CUs doing physics is not the same as having a GPU with just 14 CUs handling both physics and rendering.


I just don't think Sony put those there to be useless or inefficient. We all know now what Cell is capable of doing, but back in 2006 most people dismissed it as just another CPU, and a bad one for general purpose code and branching.
 
No, I'm referring to 400 Gflops on a dedicated PhysX card, with another, much more powerful GPU as the graphics unit. That's literally a whole card with 400 Gflops doing nothing but physics, and it's still not enough, even for 3-4 year old PhysX games.
 
Yes, I guess that when the PS Eye is not used, the 4 CUs can help like any other CU in the graphics pipeline, adding flexibility and power to the system.

Particles and so on will benefit greatly from those CUs (Crysis 2 at top settings on a PC is a great example of this).

I think the first image in the thread is not quite right though. I remember @Laa Yosh saying that the trailer that picture comes from used 128-bit HDR rendering, which I don't think next gen consoles are going to pull off.
 