What is PS4's 14+4 CU thing all about? *spawn

He says:




Seems reasonable to this layman.

Maybe, if he has inside info, as that level of detail was never released publicly; it could be feasible if true. However, in the latest DF article they talked about the concurrent render pipes, and they were pretty clear that the main (game) pipe had priority over the lower-priority OS pipe. That seems contrary, at least without any further info, to the idea of CU operations with specific priorities.

Perhaps adev can elaborate on where that assessment comes from?!
 
A ring of truth? We already know the high-priority VFX pipe can't dispatch compute wavefronts, and how does it make any sense for the OS to reserve parts of the CU register space? His posts smell of someone who knows something but is just making it up as he goes along.

That was a mistype in my first post.

The graphics queue obviously queues graphics jobs, not GPGPU.

The resource reservation system is accurate, it can reserve SGPRs, VGPRs, LDS and wavefront slots.
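To put that list in concrete terms, here is a rough sketch in C of what a per-CU reservation descriptor might carry. The names and layout are purely hypothetical, since the actual PS4 interface is not public; it just mirrors the resource types named above (SGPRs, VGPRs, LDS and wavefront slots).

```c
#include <stdint.h>

/* Hypothetical sketch only: mirrors the resource types named above.
 * The real PS4 reservation interface is not public. */
struct cu_reservation {
    uint32_t cu_mask;     /* one bit per CU the reservation applies to (18 used) */
    uint16_t sgprs;       /* scalar registers held back per SIMD */
    uint16_t vgprs;       /* vector registers held back per SIMD */
    uint32_t lds_bytes;   /* local data share bytes held back per CU */
    uint8_t  wave_slots;  /* wavefront slots kept free for high-priority work */
};
```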
 
That was a mistype in my first post.

The graphics queue obviously queues graphics jobs, not GPGPU.

The resource reservation system is accurate, it can reserve SGPRs, VGPRs, LDS and wavefront slots.

Why would it reserve registers? It makes no sense; the OS doesn't require any GPU registers whilst the game is running.
 
Has Sony stated what percentage of their GPU will go towards the OS and camera etc.?

I mean, it seems to me that if Sony are allocating around 3 GB of RAM for their OS, wouldn't they also have to allocate a percentage of their GPU to do some of the features they are offering?
 
Why would it reserve registers? It makes no sense; the OS doesn't require any GPU registers whilst the game is running.

Awww... let him say more first before shooting.

adev said:
The resource reservation system is accurate, it can reserve SGPRs, VGPRs, LDS and wavefront slots.

"It can" doesn't mean it will or must. It would be great if you could say where you got this info from. :)

In your opinion, under what circumstances do these resources get reserved ?
 
The PS4 can mask which CUs are used for certain jobs.

The PS4 also can reserve GPU memory (LDS/register space, etc.) on individual CUs for use by specific tasks.

The VSHELL has reserved memory on "several" CUs 100% of the time and I wouldn't be surprised if VSHELL tasks are masked to those CUs.

There's a graphics command queue dedicated to VSHELL and it can create high priority GPGPU tasks.

This was something I speculated about as a reason why there could be a different performance profile for a portion of the CUs, instead of the other speculation of physically distinct CUs.

This makes more sense now that we've seen leaks about the VSHELL pipe.
One portion of the info on Vgleaks indicated the VSHELL command processor couldn't perform compute; however, a later part seems to state that it can arbitrate in the same pool as the 8 compute pipes, instead of getting its own compute pipe like the primary GFX ring.

I let the topic drop because, even though reserved resources fit what was being done, no additional information was forthcoming to say this particular choice had been made.
If true, it would allow for a lower cycle overhead for idled OS tasks, but occupancy and prioritization shifts could cause subtle differences for games.
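For illustration only, the CU-masking part of this can be pictured as a plain bitmask over the 18 CUs, with the OS/VSHELL pinned to a couple of them and game compute masked to the rest. The specific numbers below (CUs 0-1 for the OS reserve) are an assumption, not disclosed values.

```c
#include <stdio.h>

#define NUM_CUS 18u

int main(void)
{
    unsigned all_cus  = (1u << NUM_CUS) - 1u;  /* bits 0..17, one per CU */
    unsigned os_cus   = 0x3u;                  /* assumed: CUs 0-1 host the VSHELL reserve */
    unsigned game_cus = all_cus & ~os_cus;     /* game compute masked to the remaining CUs */

    printf("OS/VSHELL CU mask: 0x%05X\n", os_cus);
    printf("Game CU mask:      0x%05X\n", game_cus);
    return 0;
}
```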

Who can issue commands to the VSHELL ? Only the OS ?
Other posters indicated that VSHELL is what the OS is called.

EDIT:
I vaguely recall 3dilettante suggested that the PS4 OS may reserve some GPU resources 100% of the time, perhaps for serving game/app requests ? What might those (high priority jobs) be ? ^_^

At the time, I had speculated the OS and/or runtime could put long-running shaders in before the game was permitted to arbitrate for access. The disclosure about the VSHELL pipe makes the reserve more strongly enforced by the GPU's allocation logic.

Reserved storage and wavefront slots allow high-priority tasks to have an immediate landing spot for requests. In the absence of preemption, long-running shaders could sit on a CU for an unpredictable amount of time.

The impact wouldn't be directly measured in cycles reserved, rather in terms of changed behavior where certain CUs would not treat game shader priorities in the same way, or would never schedule shaders whose resource requirements made them too big to fit in the same storage as the OS reserve.
The ALUs would be potentially available when the VSHELL is not actively executing something, so a percentage of GFLOPs wouldn't be automatically lost so much as there would be non-uniform behavior from the CU array.

I wonder if the latency of this scheme would be predictable enough and low enough for the audio processing Sony thinks is possible on the GPU.
Aside from that, until the decision was made to not make the camera standard, some of the processing could go there as well. It may very well still have a camera reserve of sorts, just in case.


Why would it reserve registers? It makes no sense; the OS doesn't require any GPU registers whilst the game is running.
If a CU's registers have already been allocated, no new wavefronts can be initiated on it until one of the other allocations is freed.
In a loaded scenario, it may be that all 18 won't have enough spare capacity for an indeterminate period of time. That would mess with system tasks that require predictable or lower latency.
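As a back-of-the-envelope illustration of why allocated registers block new wavefronts, assuming standard GCN figures (256 VGPRs per SIMD, at most 10 wavefront slots per SIMD): the number of resident waves is the smaller of the register-file limit and the slot limit, so a register-hungry shader can pin a SIMD with just a couple of long-running waves.

```c
#include <stdio.h>

/* Waves that fit on one GCN SIMD, limited by the 256-entry VGPR file
 * and the 10 wavefront slots. Standard GCN figures, used here only
 * to illustrate the point about register allocation. */
static int waves_per_simd(int vgprs_per_wave)
{
    int by_regs = 256 / vgprs_per_wave;
    return by_regs > 10 ? 10 : by_regs;
}

int main(void)
{
    /* A 128-VGPR shader fills the SIMD with just 2 waves; until one of them
     * retires and frees its registers, no new wavefront can start there. */
    printf("24 VGPRs  -> %d waves per SIMD\n", waves_per_simd(24));
    printf("128 VGPRs -> %d waves per SIMD\n", waves_per_simd(128));
    return 0;
}
```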
 
Yeah, at least we got some new (fake or real) things to talk about, rather than the same old same old 14+4 bullet.
 
If a CU's registers have already been allocated, no new wavefronts can be initiated on it until one of the other allocations is freed.
In a loaded scenario, it may be that all 18 won't have enough spare capacity for an indeterminate period of time. That would mess with system tasks that require predictable or lower latency.

That makes sense, but surely rendering the OS overlay / the GUI does not require 4 CUs' worth of power. Not to mention that reserving GPRs would affect GPGPU just as much as FF GFX.
 
It wouldn't need to reserve them all, but it would probably need to be conservative this early in the generation.
Scalar registers can't be fully allocated to any single wavefront, but vector registers on a SIMD may.
The goal would be to have enough storage on hand as soon as possible, and since register allocation and de-allocation are done at wavefront initialization and termination, opportunities to increase the register count may not come in a timely fashion.
It does seem like enough is free for at least some game shaders to migrate there.

Allowing for enough allocations ready to go and potentially keeping some wavefront slots in a sleep state or ready for immediate use would keep other shaders from starting on the affected CUs.
Camera processing or system libraries could be a source of work.
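Continuing the same back-of-the-envelope math, a reserve on the affected CUs simply subtracts from those limits. The reserve sizes below are invented for illustration; the point is only that game shaders on those SIMDs see a smaller register file and fewer wavefront slots.

```c
#include <stdio.h>

/* Waves a game shader can keep resident on a SIMD once an OS reserve
 * (VGPRs + wavefront slots) is subtracted. Reserve sizes are made up. */
static int game_waves(int vgprs_per_wave, int reserved_vgprs, int reserved_slots)
{
    int by_regs  = (256 - reserved_vgprs) / vgprs_per_wave;
    int by_slots = 10 - reserved_slots;
    return by_regs < by_slots ? by_regs : by_slots;
}

int main(void)
{
    /* Holding back 64 VGPRs and 2 wave slots drops a 48-VGPR game shader
     * from 5 resident waves to 4 on the affected SIMDs. */
    printf("no reserve:   %d waves\n", game_waves(48, 0, 0));
    printf("with reserve: %d waves\n", game_waves(48, 64, 2));
    return 0;
}
```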
 
That makes sense, but surely rendering the OS overlay / the GUI does not require 4 CUs' worth of power. Not to mention that reserving GPRs would affect GPGPU just as much as FF GFX.

I wasn't thinking about reserving resources in 4 or all 18 CUs. The exact number is not so important to me at this point. Without more info, I don't think it's necessary to hog a large chunk of resources either.

I am more interested in the run-time policies (allocation, priority, scheduling, pre-emption, memory behavior, and such). Specifically, the strings a low-level programmer can pull to make things work well together.

EDIT:
It wouldn't need to reserve them all, but it would probably need to be conservative this early in the generation.
Scalar registers can't be fully allocated to any single wavefront, but vector registers on a SIMD may.
The goal would be to have enough storage on hand as soon as possible, and since register allocation and de-allocation are done at wavefront initialization and termination, opportunities to increase the register count may not come in a timely fashion.
It does seem like enough is free for at least some game shaders to migrate there.

Allowing for enough allocations ready to go and potentially keeping some wavefront slots in a sleep state or ready for immediate use would keep other shaders from starting on the affected CUs.
Camera processing or system libraries could be a source of work.

I was wondering if some of these OS tasks were requested by the games/apps. The game process has its own memory + system heap shared with the OS.
 
Using reserved resources to provide a consistent baseline that doesn't require explicit carving out of a game's performance budget or worries about interference from other processes was a benefit that was mooted for the other console.
It seems like it would work here as well.
 
Using reserved resources to provide a consistent baseline that doesn't require explicit carving out of a game's performance budget or worries about interference from other processes was a benefit that was mooted for the other console.
It seems like it would work here as well.

I thought X1 DID carve out a hard amount of resources?

Or maybe I'm misunderstanding; you just mean that game developers know they have "X" amount available to them that will never be tampered with?
 
It does, which provides for easier development and consistent performance.

Possible examples for the PS4: the method to call the audio decode hardware could be run through an OS-reserved core and parts of the camera's processing pipeline could run through a CU.

Possibly, a set of base services could be baked in for convenience and consistency. Some of them might be abstracted as a result of the secondary processor's role as load balancer for disk and network I/O, in which case the reason for reserved resources may also be to keep certain areas hidden (DRM) while still providing their services.

This is more pie in the sky, but I've thought of schemes where developers may someday opt out of specific service slots with the understanding that they'd have to provide them. In theory, if they can do better or integrate it better, they can have memory or performance to use elsewhere, while the system knows it could reduce certain reservations knowing they won't be used.
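Staying firmly in pie-in-the-sky territory, the opt-out idea could look something like the sketch below: the title declares which service slots it will cover itself, and the system shrinks its reservation to match. Every name and number in it is invented.

```c
#include <stdint.h>
#include <stdio.h>

/* Invented service flags for the opt-out idea above; nothing here is a real API. */
enum service_flags {
    SERVICE_CAMERA_PIPELINE = 1u << 0,
    SERVICE_AUDIO_DECODE    = 1u << 1,
    SERVICE_VIDEO_ENCODE    = 1u << 2,
};

int main(void)
{
    uint32_t opted_out = SERVICE_CAMERA_PIPELINE;  /* title promises to handle this itself */
    int reserved_wave_slots = 4;                   /* assumed baseline system reserve */

    if (opted_out & SERVICE_CAMERA_PIPELINE)
        reserved_wave_slots -= 2;                  /* assumed share attributed to camera work */

    printf("system wave-slot reserve after opt-out: %d\n", reserved_wave_slots);
    return 0;
}
```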
 
At possibly less visual return... (if Sony is to be believed)


The context in which they say "minor boost if used for graphics" is not even clear; they could have been talking about the so-called extra ALU and not the CUs as a whole.

But I don't think anything changes even if it was really 14+4, because you have 400+ GFLOPS for compute and that is a lot of compute power.

It doesn't matter if the PS4 has 18 CUs for anything (like Cerny himself says) or 14+4; in the end it is 18 CUs, either being used for one thing or another.
 
This is more pie in the sky, but I've thought of schemes where developers may someday opt out of specific service slots with the understanding that they'd have to provide them. In theory, if they can do better or integrate it better, they can have memory or performance to use elsewhere, while the system knows it could reduce certain reservations knowing they won't be used.

Doesn't the PS3 work that way? It did it very inefficiently at first, but all the OS stuff like the friends list etc. were different modules that could be loaded or not, and if loaded, would take a certain fixed portion of RAM or some such.
 
The 14 + 4 thing came from a Sony slide (though not one meant for public consumption of course).

Thankfully ERP has stepped in now and no-one can argue any more. Yes, diminishing returns and a knee in the performance curve for "graphics". That's what Cerny meant, and that's what the MS Fellows were talking about too.

Neither Cerny nor the MS fellows are contradicting each other when they talk about "balance", so it's pretty amazing that the internet has managed to whip up a shit storm and hate campaign.


I am not sold on the whole diminishing returns part, especially when there are GCN cards with more than double the CUs of the 7790 that perform quite a bit better.

It's not that I don't believe in diminishing returns; it's a question of how much diminishing returns.

Every GCN card out there with more CUs performs better than those with fewer CUs.
 
I am not sold on the whole diminishing returns part, especially when there are GCN cards with more than double the CUs of the 7790 that perform quite a bit better.

It's not that I don't believe in diminishing returns; it's a question of how much diminishing returns.

Every GCN card out there with more CUs performs better than those with fewer CUs.

It would probably be trivial to create a CPU-limited scenario where a GCN-based card with more CUs performed no better than one with fewer. If your performance is being held back by a lack of one resource, throwing more of another resource at it isn't going to add to your performance.
 
It would probably be trivial to create a CPU-limited scenario where a GCN-based card with more CUs performed no better than one with fewer. If your performance is being held back by a lack of one resource, throwing more of another resource at it isn't going to add to your performance.

That's a bottleneck, not diminishing returns.
 
Evidently Cerny and his team at Sony very much disagree. There would seem to be bottlenecks somewhere along the line that prevent effective utilization of those 4 CUs for purely graphical rendering tasks.



Yeah, that is why the 7790 performs better than the 7770, and the 7850 better than the 7790...

7970 > 7950 > 7870 > 7850 > 7790 > 7770

This is a PC lineup, each card with more or fewer CUs than the next, and in all scenarios the one with more CUs performs better. This is 100% accurate and we have tons and tons of benchmarks proving it beyond a shadow of a doubt.

Unless something is horribly wrong inside any of the next-gen consoles, this should hold pretty well.

This is what makes the whole theory of diminishing returns so hard to swallow.
 
Can anyone provide any insight into what elements of the GPU are in play when executing regular GPU operations that are not in play (or are utilized differently) when executing GPGPU operations? More to the point, maybe: what is it about GPGPU operations that would allow them not to be affected (or affected less) by whatever is bottlenecking the regular GPU performance?
 