PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

14 CUs under control of the regular hardware thread scheduler ('hardware balanced')

4 CUs on a separately scheduled regime, perhaps under explicit programmer control

Make sense, or..?

Are the 4 CUs the 'GPU-like compute module' Eurogamer referred to?

What would be the reasoning for this? A means to segregate task types on the GPU, if hardware scheduling and switching of mixed compute/graphics contexts is less than optimal under GCN? At the expense of some extra programmer burden?

The idea of splitting graphics work across two different groups of resources may not have quite died with PS3 ;) But at least these 'SPEs' will run regular shader code...
 
The CUs in AMD's GCN architecture correspond to groups of stream processors. If my math is correct, 18 CUs would contain 1,152 stream processors.

Edit - As you found yourself.

Yep, that is right, I believe.

18 compute units, each having 16 stream processors, with each stream processor doing the same instruction for 4 cycles. I could be wrong with this info, but I also come up with 1,152 stream processors.

Edit: It should probably be that each compute unit has 4 vector units (16 lanes each), but I'm not sure anymore :(
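If it helps, the arithmetic above can be sketched out, assuming the standard GCN layout of 4 SIMDs of 16 lanes per CU and the rumoured 800 MHz clock (both are assumptions, not confirmed specs):

```python
# Back-of-the-envelope GCN shader math (all figures are rumoured, not confirmed).
CUS = 18                # total compute units in the leak
SIMDS_PER_CU = 4        # GCN: four vector units per CU
LANES_PER_SIMD = 16     # each vector unit is 16 lanes wide
CLOCK_MHZ = 800         # rumoured clock

stream_processors = CUS * SIMDS_PER_CU * LANES_PER_SIMD
peak_gflops = stream_processors * 2 * CLOCK_MHZ / 1000  # FMA counted as 2 FLOPs

print(stream_processors)  # 1152
print(peak_gflops)        # 1843.2
```

Each 16-lane SIMD executes a 64-wide wavefront over 4 cycles, which is where the 'same instruction for 4 cycles' part comes from.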
 
updated info on gpu for ps4

UPDATE: some people are confused about the GPU; here is more info about it:

Each CU contains dedicated:

- ALU (32 64-bit operations per cycle)

- Texture Unit

- L1 data cache

- Local data share (LDS)

About 14 + 4 balance:

- 4 additional CUs (410 Gflops) “extra” ALU as resource for compute

- Minor boost if used for rendering

Dual Shader Engines:

- 1.6 billion triangles/s, 1.6 billion vertices/s

18 Texture units

- 56 billion bilinear texture reads/s

- Can utilize full memory bandwidth

8 Render backends:

- 32 color ops/cycle

- 128 depth ops/cycle

- Can utilize full memory bandwidth
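For what it's worth, most of those headline numbers fall out of the clock-rate arithmetic, again assuming the rumoured 800 MHz clock (the per-clock rates below are standard GCN figures, not confirmed for this chip):

```python
# Cross-checking the leaked figures (rumoured 800 MHz clock assumed throughout).
CLOCK_MHZ = 800

# Dual shader engines, one triangle per engine per clock:
triangles_per_s = 2 * CLOCK_MHZ / 1000          # billions/s
# 8 render backends, 4 color ops each per clock; depth runs at 4x the color rate:
color_ops_per_clock = 8 * 4
depth_ops_per_clock = color_ops_per_clock * 4
# The 4 "extra" CUs at 64 stream processors each, FMA counted as 2 FLOPs:
extra_cu_gflops = 4 * 64 * 2 * CLOCK_MHZ / 1000

print(triangles_per_s)                           # 1.6
print(color_ops_per_clock, depth_ops_per_clock)  # 32 128
print(extra_cu_gflops)                           # 409.6, i.e. the "410 Gflops"
```

The one figure that doesn't quite line up is texturing: 18 CUs with 4 filter units each at 800 MHz would give a theoretical 57.6 billion bilinear reads/s, a bit above the 56 quoted, so either the slide rounds differently or some other limit applies.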
 
So, if true, pretty much a confirmation of what was leaked a few weeks ago.

I find that part of the leak bothersome: bigger than any AMD APU, and along with GDDR5 I could definitely see the system being costlier than Durango.
I would put the die size somewhere between 280 mm^2 and 300 mm^2.

At least it should perform better than Durango. I think pricing and online policies are going to be the factors that decide which system gets the biggest user base; business-wise (income), the picture could be different, though.
 
Can someone explain the 922 gigaops?

When calculating FLOPs it is customary, for marketing reasons, to count a fused Multiply and Add as two FLOPs. However, that is not the rate at which the processor can process instructions in more common lingo, hence the gigaops being half the maximum FLOPs rate.

Pretty redundant information, I'd say, but it does make (semi-)explicit that the processor in question supports FMADD at full instruction rate.
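In numbers, under the assumption of a 1,152-SP part at 800 MHz:

```python
# FLOPS vs. "gigaops": an FMA is one instruction but is marketed as two FLOPs,
# so the instruction rate is half the headline FLOPS figure.
peak_gflops = 1152 * 2 * 800 / 1000  # 1843.2 GFLOPS with FMA counted twice
gigaops = peak_gflops / 2            # actual peak instruction rate
print(round(gigaops))                # 922
```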
 

thanks :smile:

Dual Shader Engines:

- 1.6 billion triangles/s, 1.6 billion vertices/s

How does it stack up compared to the PS3's triangles/s count?
 
http://m.neogaf.com/showpost.php?p=46998393

I think I have an explanation for the '18 texture unit' thing. I think it's just another semantics issue about how things are referred to in different contexts.

In some AMD documentation, they talk about each CU having one texture unit e.g. in here: http://www.siliconwolves.net/frames/...ming_Guide.pdf

Each compute unit contains 64 kB local memory, a texture unit with 16 kB of L1
cache, and four vector units.
The texture unit has 4 texture filter units (what we usually refer to as 'texture units' or TMUs) and 16 texture load/store units.

So, like the render backend/ROPs thing, I think it's talking about 18 'texture units' in this sense - the group of 4 texture filter units in each GCN CU.
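So the two counting conventions, side by side (per-CU layout per the AMD doc quoted above; the per-chip totals are the rumoured figures):

```python
# "Texture units" counted two ways, per the AMD documentation quoted above.
cus = 18
texture_units = cus * 1     # one texture unit per CU -> the "18 texture units" line
tmus = cus * 4              # four filter units per texture unit -> the usual TMU count
print(texture_units, tmus)  # 18 72
```

72 TMUs is the number you'd normally see quoted for an equivalent PC part.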
 
No mention of big Blu-ray discs for 4K. Think the camera is part of the bundle?

Looks like they've updated the GPU info with the same '14 + 4 balance' breakdown quoted further up the thread.


Edit:
I like that a 500 GB HDD will be the standard storage. I was expecting 250 GB at the low end, but this is much better.
 
Well I think it means that 14 CUs is enough for rendering in this setup and the last 4 CUs are better spent on other tasks, but can also be used for rendering.
 
Looks like they will perform worse for rendering than the other 14? A minor boost?

Some additional information would be needed to explain why that would be the case. GCN is structured such that the rest of the system really doesn't know which CU results came from. The rest of the specs show a system just as balanced for 18 CUs as regular GPUs.

That doesn't rule out some design-specific reason why they can't contribute equally by providing ~30% more shader throughput, but the data given so far provides no reason.
 
But they didn't have to split the CUs in order to do that.

Are they split in some way? I just think that past 14 CUs the CPU or something else in the system is starting to be a bottleneck and you get overall better performance/balance by dedicating a few CUs to help/boost the CPU.
 
The 14+4 CU split could be related to graphics context switching. When not gaming, only the 4 CUs would be running with the CPU.

I think the Liverpool GPU is designed as an 18-CU GPU. One exception is that the 14 CUs can be turned off when not needed, or all 18 CUs can be used for graphics if the developer chooses to.

Not exactly - it seems like that 'hardware balanced' is outlined here:
http://en.wikipedia.org/wiki/Unified_shader_model

As I understand it:
- when you 'run random shaders', they will be automatically scheduled/run across 14 CUs, the same as a normal modern graphics card.
- the other 4 CUs are independently controlled by the developer/their API.

I'm guessing that it might be better for physics or something like that, as it should be more reliable? (Or doing AA there? I don't know much about that.)

I assume you could just use them as 4 more CUs, although it wouldn't be as efficient.

EDIT: an obvious use might be something involving that dual-camera setup, or move...?
 