The capabilities of the 4 special CUs in Orbis

I suspect it comes down to software. Existing PC software is optimized for copying and "brute force" style GPU computing; it has already paid the price for copying data between GPU and CPU.

Console programming can be optimized differently. Look at games running on RSX + Cell vs. the equivalent nVidia part (7800?).
 
What I never understood is why separate these 4 CUs for anything specific if they are no different from normal CUs?

Maybe, being part of the CPU, they are better suited for compute tasks than sitting in the GPU, separated by an on-chip bus. Just guessing.
 
One concern for maintaining consistent performance with the current formulation of GCN is the latency in getting a kernel set up and running on the GPU.

If a new task is requested, it has to get on a queue and the GPU's allocation hardware has to allocate the necessary resources, such as buffers and CUs with enough register space and local memory available to handle the kernel.

If the system is under load, queues may be filling up, or the CUs are running other shaders that occupy most of their storage. There are signs in the SI ISA manual (or whatever it gets renamed to) that AMD is increasing the number of queues to help with that possibility.
In terms of a kernel having to wait on resources and CUs to be freed up, preemption is not yet possible. If you don't want your high-priority functions sitting around until a CU finally gets released, in the absence of preemption you can instead reserve those resources ahead of time.
If you control what jobs can get to those resources, you can keep them freed up enough that your high-priority tasks don't find themselves stuck waiting on shaders that can run for an indeterminate period of time.

The price is an underutilization of the reserved resources, since some portion of their capabilities must be kept off-limits even when they're not needed.
The system functions also can't dream too big, since they only get this slice of the resources guaranteed.
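The reservation trade-off described above can be sketched as a toy scheduler. This is purely illustrative: the class name, pool sizes, and return strings are hypothetical, not how the actual hardware or PS4 scheduler works. It just shows why, without preemption, reserving a slice of CUs guarantees latency for high-priority work at the cost of leaving those CUs idle otherwise.

```python
# Toy model of CU reservation without preemption (all names and
# numbers here are hypothetical, for illustration only).

TOTAL_CUS = 18
RESERVED_CUS = 4  # kept off-limits to normal jobs, for high-priority work


class ToyScheduler:
    def __init__(self):
        self.free_general = TOTAL_CUS - RESERVED_CUS
        self.free_reserved = RESERVED_CUS

    def dispatch(self, high_priority: bool) -> str:
        """Without preemption a kernel must wait for a free CU; it cannot
        evict a running shader. Reservation keeps CUs free for
        high-priority kernels regardless of general-pool load."""
        if high_priority:
            if self.free_reserved > 0:
                self.free_reserved -= 1
                return "runs immediately on reserved CU"
            return "waits (reserved pool exhausted)"
        # Normal jobs only see the general pool, so reserved CUs sit
        # underutilized when no high-priority work exists -- the price paid.
        if self.free_general > 0:
            self.free_general -= 1
            return "runs on general CU"
        return "waits for a general CU to free up"


sched = ToyScheduler()
# Saturate the general pool with long-running shaders...
for _ in range(14):
    sched.dispatch(high_priority=False)
# ...yet a high-priority kernel still launches without waiting.
print(sched.dispatch(high_priority=True))
```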
 

Or manual manipulation of the resources to include them in a larger job utilizing maximum resources. My guess, though, is that would happen only if absolutely necessary. Mostly it'll be batching up jobs to keep them busy, kind of like how it ended up happening with Cell.
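The batching idea above can be sketched in a few lines. Again, this is a hypothetical illustration (the class and method names are made up): the point is simply that many small jobs get accumulated into one submission so the dispatch overhead is paid once per batch rather than once per job.

```python
# Hypothetical sketch of batching small compute jobs into one dispatch,
# so the reserved units stay busy instead of paying launch latency per job.
class JobBatcher:
    def __init__(self, batch_size=8):
        self.batch_size = batch_size
        self.pending = []
        self.submitted = []  # batches handed to the (imaginary) GPU queue

    def add(self, job):
        self.pending.append(job)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One submission covers the whole batch of small jobs.
        if self.pending:
            self.submitted.append(list(self.pending))
            self.pending.clear()


b = JobBatcher(batch_size=4)
for i in range(10):
    b.add(f"physics_chunk_{i}")
b.flush()  # submit the stragglers
print([len(batch) for batch in b.submitted])  # -> [4, 4, 2]
```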

Nice post, I think you really captured the heart of what the configuration is and why it's there.
 
Yeah, my guess is this is one of the reasons Sony had to "split up" the CUs for compute and rendering.

Are there any leaked libGCM documents on the net? I'd love to see what exactly they let developers do with the hardware.
 
I still stand by my opinion that the CUs are meant as CPU resources, based on Sony's history of being CPU-focused in their home console hardware.

The PS4's current supposed specs suggest a Vita-like dramatic shift to GPU reliance, with the CPU being the possible bottleneck for once, as opposed to the other way around. This possible CU sharing may be some sort of equivalent to Durango's eSRAM bandwidth sharing for ITS possible bottleneck with the main RAM...
 
I don't think we'll be hearing any further details of possible CU customizations... at least according to some in the know.

The more esoteric operations will probably be NDA'd forever (like certain RSX elements) and confined to tech docs. The 14+4 split may still technically be in. I guess we'll just have to wait for the die shots and see if those 4 extra SIMDs are truly nestled in there, or if there's a physical split.
 
Or it's a logical reservation of resources, which doesn't matter physically and can be done with little visibility to game software.
 
Ha ha, not enough details in the PR to decide either way. It only says we can use all 18 or a mixture of the CUs for compute and graphics.

EDIT: I'm playing the presentation stream in the background. Mark Cerny did say the GPU is highly enhanced, whatever that means.
 
I imagine it is based on AMD's next generation of GPUs, or at least it is what it is: a 7850-based GPU paired with a Jaguar CPU in an APU arrangement, and thus "enhanced".
 
Cerny also referred to it as a "next-gen GPU", whatever that means. We'll have to wait for GDC or E3 to find out.
 
The official PR says: http://www.scei.co.jp/corporate/release/pdf/130221a_e.pdf

The Graphics Processing Unit (GPU) has been enhanced in a number of ways, principally to allow for easier use of the GPU for general purpose computing (GPGPU) such as physics simulation. The GPU contains a unified array of 18 compute units, which collectively generate 1.84 Teraflops of processing power that can freely be applied to graphics, simulation tasks, or some mixture of the two.

Sounds like they loosened the 14+4 configuration.
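For what it's worth, the 1.84 TFLOPS figure in the PR falls straight out of the CU count, assuming the widely reported 800 MHz GPU clock and the standard GCN layout of 64 ALUs per CU (neither of which is stated in the PR itself):

```python
# Sanity-checking the PR's 1.84 TFLOPS claim.
cus = 18                     # from the PR
alus_per_cu = 64             # standard GCN CU: 4 x SIMD-16
flops_per_alu_per_cycle = 2  # one fused multiply-add = 2 FLOPs
clock_hz = 800e6             # assumption: commonly reported, not in the PR

flops = cus * alus_per_cu * flops_per_alu_per_cycle * clock_hz
print(flops / 1e12)  # -> 1.8432 (TFLOPS, matching the quoted 1.84)
```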
 