The capabilities of the 4 special CUs in Orbis

I suspect it comes down to software. Existing PC software is optimized for copying and "brute force" style GPU computing; it has already paid the price for copying data between GPU and CPU.

Console programming can be optimized differently. Look at games running on RSX + Cell vs. the equivalent nVidia part (7800?).
 
What I never understood is why separate these 4 CUs for anything specific if they are no different from normal CUs?

Maybe, being part of the CPU, they are better suited for compute tasks than sitting in the GPU, separated by an on-chip bus. Just guessing.
 
One concern for maintaining consistent performance with the current formulation of GCN is the latency in getting a kernel set up and running on the GPU.

If a new task is requested, it has to get on a queue and the GPU's allocation hardware has to allocate the necessary resources, such as buffers and CUs with enough register space and local memory available to handle the kernel.

If the system is under load, queues may be filling up, or the CUs are running other shaders that occupy most of their storage. There are signs in the SI ISA manual (or whatever it gets renamed to) that AMD is increasing the number of queues to help with that possibility.
In terms of a kernel having to wait on resources and CUs to be freed up, preemption is not yet possible. If you don't want your high-priority functions sitting around until a CU finally gets released, in the absence of preemption you can instead reserve those resources ahead of time.
If you control what jobs can get to those resources, you can keep them freed up enough that your high-priority tasks don't find themselves stuck waiting on shaders that can run for an indeterminate period of time.

The price is an underutilization of the reserved resources, since some portion of their capabilities must be kept off-limits even when they're not needed.
The system functions also can't dream too big, since they only get this slice of the resources guaranteed.
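The reservation trade-off described above can be sketched as a toy scheduler. This is purely illustrative: the class name, pool sizes, and return strings are hypothetical, not how the actual hardware or PS4 scheduler works. It just shows why, without preemption, reserving a slice of CUs guarantees latency for high-priority work at the cost of leaving those CUs idle otherwise.

```python
# Toy model of CU reservation without preemption (all names and
# numbers here are hypothetical, for illustration only).

TOTAL_CUS = 18
RESERVED_CUS = 4  # kept off-limits to normal jobs, for high-priority work


class ToyScheduler:
    def __init__(self):
        self.free_general = TOTAL_CUS - RESERVED_CUS
        self.free_reserved = RESERVED_CUS

    def dispatch(self, high_priority: bool) -> str:
        """Without preemption a kernel must wait for a free CU; it cannot
        evict a running shader. Reservation keeps CUs free for
        high-priority kernels regardless of general-pool load."""
        if high_priority:
            if self.free_reserved > 0:
                self.free_reserved -= 1
                return "runs immediately on reserved CU"
            return "waits (reserved pool exhausted)"
        # Normal jobs only see the general pool, so reserved CUs sit
        # underutilized when no high-priority work exists -- the price paid.
        if self.free_general > 0:
            self.free_general -= 1
            return "runs on general CU"
        return "waits for a general CU to free up"


sched = ToyScheduler()
# Saturate the general pool with long-running shaders...
for _ in range(14):
    sched.dispatch(high_priority=False)
# ...yet a high-priority kernel still launches without waiting.
print(sched.dispatch(high_priority=True))
```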
 

Or manual manipulation of the resources to include them in a larger job utilizing maximum resources. My guess, though, is that would happen only if absolutely necessary. Mostly it'll be batching up jobs to keep them busy, kind of like how it ended up happening with Cell.
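The batching idea above can be sketched in a few lines. Again, this is a hypothetical illustration (the class and method names are made up): the point is simply that many small jobs get accumulated into one submission so the dispatch overhead is paid once per batch rather than once per job.

```python
# Hypothetical sketch of batching small compute jobs into one dispatch,
# so the reserved units stay busy instead of paying launch latency per job.
class JobBatcher:
    def __init__(self, batch_size=8):
        self.batch_size = batch_size
        self.pending = []
        self.submitted = []  # batches handed to the (imaginary) GPU queue

    def add(self, job):
        self.pending.append(job)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One submission covers the whole batch of small jobs.
        if self.pending:
            self.submitted.append(list(self.pending))
            self.pending.clear()


b = JobBatcher(batch_size=4)
for i in range(10):
    b.add(f"physics_chunk_{i}")
b.flush()  # submit the stragglers
print([len(batch) for batch in b.submitted])  # -> [4, 4, 2]
```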

Nice post, I think you really captured the heart of what the configuration is and why it's there.
 
Yeah, my guess is this is one of the reasons Sony had to "split up" the CUs for compute and rendering.

Are there any leaked libGCM documents on the net? I'd love to see what exactly they let developers do with the hardware.
 
I still stand by my opinion that the CUs are meant as CPU resources, based on Sony's history of being CPU-focused in their home console hardware.

The PS4's current supposed specs suggest a Vita-like dramatic shift to GPU reliance, with the CPU being the possible bottleneck for once, as opposed to the other way around. This possible CU sharing may be some sort of equivalent to Durango's eSRAM bandwidth sharing for ITS possible bottleneck with the main RAM...
 
I don't think we'll be hearing any further details of possible CU customizations... at least according to some in the know.

The more esoteric operations will probably be NDA'd forever (like certain RSX elements) and confined to tech docs. The 14+4 split may still technically be in. I guess we'll just have to wait for the die shots and see if those 4 extra SIMDs are truly nestled in there, or if there's a physical split.
 
Or it's a logical reservation of resources, which doesn't matter physically and can be done with little visibility to game software.
 
Ha ha, not enough details in the PR to decide either way. It only says we can use all 18 or a mixture of the CUs for compute and graphics.

EDIT: I'm playing the presentation stream in the background. Mark Cerny did say the GPU is highly enhanced, whatever that means.
 
I imagine it is based on AMD's next generation of GPUs, or at least it is what it is: a 7850-based GPU paired with a Jaguar CPU in an APU arrangement, and thus "enhanced".
 
Cerny also referred to it as a "next-gen GPU", whatever that means. We'll have to wait for GDC or E3 to find out.
 
The official PR says: http://www.scei.co.jp/corporate/release/pdf/130221a_e.pdf

The Graphics Processing Unit (GPU) has been enhanced in a number of ways, principally to allow for easier use of the GPU for general purpose computing (GPGPU) such as physics simulation. The GPU contains a unified array of 18 compute units, which collectively generate 1.84 Teraflops of processing power that can freely be applied to graphics, simulation tasks, or some mixture of the two.

Sounds like they loosened the 14+4 configuration.
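For what it's worth, the 1.84 TFLOPS figure in the PR falls straight out of the CU count, assuming the widely reported 800 MHz GPU clock and the standard GCN layout of 64 ALUs per CU (neither of which is stated in the PR itself):

```python
# Sanity-checking the PR's 1.84 TFLOPS claim.
cus = 18                     # from the PR
alus_per_cu = 64             # standard GCN CU: 4 x SIMD-16
flops_per_alu_per_cycle = 2  # one fused multiply-add = 2 FLOPs
clock_hz = 800e6             # assumption: commonly reported, not in the PR

flops = cus * alus_per_cu * flops_per_alu_per_cycle * clock_hz
print(flops / 1e12)  # -> 1.8432 (TFLOPS, matching the quoted 1.84)
```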
 