PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

By "not completely round", could it be that the "balanced at 14 units" refers to 14 units having rops and textures and all the usual , and 4 units not having them to save silicon which could explain why using these 4 units doesn't fully contribute to graphx?
 
By "not completely round", could it be that the "balanced at 14 units" refers to 14 units having rops and textures and all the usual , and 4 units not having them to save silicon which could explain why using these 4 units doesn't fully contribute to graphx?

Ousting the ROPs and texturing units would not save a lot of silicon; in the long run it would require extra work for a net negative gain, with only a tiny reduction in overall silicon. IIRC the biggest consumer of area in a GPU, just as in a CPU, is the caches, and that's something you can't just chop off to benefit compute only.
 
By "not completely round", could it be that the "balanced at 14 units" refers to 14 units having rops and textures and all the usual , and 4 units not having them to save silicon which could explain why using these 4 units doesn't fully contribute to graphx?

No, no, no, and no.

The whole 14+4 thing, as many have said, is quite likely just a misunderstanding of an example Sony gave of how you could use graphics and compute at the same time in an application.

Regards,
SB
 
You are right, we are both wrong:
PS4: 20 GB/s (10 GB/s x 2)
X1: 41.6 GB/s (20.8 GB/s x 2)
68 GB/s is to the northbridge...

Also, the coherent GPU-CPU access is 10 GB/s on PS4 vs. 30 GB/s on X1.

You are quoting dissimilar numbers. You can't double the Xbox One figures because those already include both directions, whereas the PS4 numbers are for a single direction. Same for the coherent link.

Well, actually, in reality it seems that the PS4 doesn't have a dedicated audio block. Cerny is great at PR, but he is simply referring to a compression/decompression unit, similar to the one present on the Xbox 360, and all the heavy audio tasks will be performed by the CPU (or CUs).
For instance, the X1 has a huge audio block with 4/5 different chips/CPUs.
Pay a visit to the Versus Audio topic; you will learn a lot there (at least I have)!

Yes, in actuality and with no evidence it's easy to conclude something doesn't exist even when we've been told it does. Amazing! What we don't know is what the PS4's audio hardware's full capabilities are or how they compare to the Xbox One's.
 
By "not completely round", could it be that the "balanced at 14 units" refers to 14 units having rops and textures and all the usual , and 4 units not having them to save silicon which could explain why using these 4 units doesn't fully contribute to graphx?

No, as has been said numerous times before, all 18 CUs are identical.

The gist of the 14+4 thing is that there is a point where throwing additional ALU resources at rendering gets you diminishing returns and so devs might get more value using those additional resources for compute work.

Incidentally, you'll see what I mean later today.;)
 
By "not completely round", could it be that the "balanced at 14 units" refers to 14 units having rops and textures and all the usual , and 4 units not having them to save silicon which could explain why using these 4 units doesn't fully contribute to graphx?

There's much less that can be removed than is being claimed. GCN already separates out much of this.

GPUs aren't architected so that the ROPs are directly tied to the ALUs. They send data to the ROPs over an export bus, so they've been physically separated already.

Texture hardware is a component of the memory pipeline for a CU, so it can't be ripped out without providing a replacement design. I do not see major gains from doing this. The rest of the CU isn't changing, and the rest of the cache and load/store units would still be there.
Four slightly more compact CUs don't save much in the big picture. It would be tiny slivers of freed silicon in a part of the GPU that would have to be reworked to reclaim the area; otherwise the chip remains the same size, and all we have is money flushed on redesigning a reduced-functionality CU, more complicated management, and a set of CUs that complicate matters for redundancy.

The rest of the graphics pipeline is also physically separate and works with the CUs via some kind of bus, leaving any subset of the CUs just as free of it as the rest.
 
You are quoting dissimilar numbers. You can't double the Xbox One figures because those already include both directions, whereas the PS4 numbers are for a single direction. Same for the coherent link.

You are wrong; it seems that the PS4 has less than 20 GB/s R&W to the CPU, while the X1 has more than 40 GB/s R&W:

http://beyond3d.com/showthread.php?t=63317


Yes, in actuality and with no evidence it's easy to conclude something doesn't exist even when we've been told it does. Amazing! What we don't know is what the PS4's audio hardware's full capabilities are or how they compare to the Xbox One's.

Sorry to say it, but you are wrong again. The PS4 only has a compression/decompression unit for audio.
Again, read the Audio topic if you wish. Veeeery good read.
But aren't you curious to know how things really are?
 
You are wrong; it seems that the PS4 has less than 20 GB/s R&W to the CPU, while the X1 has more than 40 GB/s R&W:

http://beyond3d.com/showthread.php?t=63317

I would hardly take that as gospel. Do you really think they went to the effort of custom-engineering the interfaces on the CPU modules? You would then have to change the behaviour of things like the L2 prefetcher, etc. If I were a betting man, I'd say the interconnect interfaces on the CPU modules of both consoles are exactly the same. Do you really think AMD doesn't know how much bandwidth they need to feed 8x 128-bit AVX instructions at their target range of IPC?
 
You are wrong; it seems that the PS4 has less than 20 GB/s R&W to the CPU
Not quite true. There's <20 GB/s marked, and also 10 GB/s shared Onion/Onion+, meaning <30 GB/s total for the CPU. That diagram also shows the XB1 has 20.8 GB/s each way for two CPU modules, so it actually has 83.2 GB/s total BW to the CPUs, which can be fed from the DDR3 and CPU memory system according to the arrows. Although they may just be inconsistent with their labelling and have two arrows representing a 20.8 GB/s bidirectional bus, which seems far more likely.
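To make the two readings of those arrows concrete, here's a quick back-of-the-envelope sketch (a hypothetical Python calculation using only the rumoured figures quoted above; nothing here is a confirmed spec):

```python
# Sketch of the two possible readings of the leaked XB1 diagram.
# All inputs are the rumoured figures discussed above, not confirmed specs.

PER_ARROW_GBS = 20.8   # GB/s per arrow, per XB1 CPU module
CPU_MODULES = 2

# Reading 1: each arrow is an independent one-way link (separate read + write per module).
xb1_if_unidirectional = PER_ARROW_GBS * 2 * CPU_MODULES   # 83.2 GB/s aggregate

# Reading 2: the two arrows just depict one 20.8 GB/s bidirectional bus per module.
xb1_if_bidirectional = PER_ARROW_GBS * CPU_MODULES        # 41.6 GB/s aggregate

# PS4 for comparison: <20 GB/s marked to the CPU, plus ~10 GB/s shared Onion/Onion+.
ps4_cpu_total = 20 + 10                                    # < 30 GB/s

print(xb1_if_unidirectional, xb1_if_bidirectional, ps4_cpu_total)
```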
 
I don't think that you can double the bandwidth of the CPU <-> memory controller, based on the two arrows shown.

Unlike, for instance, PCI Express, in which each lane is composed of two differential pairs going in opposite directions, giving (for Gen 3) 8 Gb/s in each direction and 16 Gb/s aggregate (http://en.wikipedia.org/wiki/PCI_Express#Lane), the buses inside SoCs are typically bidirectional and can only transfer in one direction each clock, like the buses used for RAM.

You can, however, have multiple buses; the PS2, for example, had one 1024-bit bus for write, one 1024-bit bus for read, and one 512-bit bus for read/write to/from the eDRAM, but I don't think that's the case here. Those were special-purpose buses, not a general-purpose one like a CPU bus.

Taking numbers out of my ass, let's say there's a 256-bit bus at 800 MHz, i.e. 25.6 GB/s (yes, I know these numbers don't match up with the leaks; maybe they reserve some bandwidth for Onion/Onion+, etc.).

If they instead were to put in one 256-bit read bus and one 256-bit write bus, they would double the size and complexity of routing those signals, and it would be wasteful not to use all those wires as a single 512-bit read/write bus instead.
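As a sanity check on those made-up numbers, the peak figure falls straight out of width x clock; a minimal sketch (hypothetical values only, not actual PS4 specs):

```python
def bus_bandwidth_gbs(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a simple parallel bus doing one transfer per clock."""
    return (width_bits / 8) * clock_mhz * 1e6 / 1e9  # bytes per transfer x transfers per second

# Hypothetical 256-bit bus at 800 MHz, one direction per clock:
print(bus_bandwidth_gbs(256, 800))      # 25.6 GB/s, shared between reads and writes

# Separate 256-bit read and 256-bit write buses (same wire count as one 512-bit bus):
print(2 * bus_bandwidth_gbs(256, 800))  # 51.2 GB/s aggregate, but only if both stay busy
```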

On another note, the various (leaked?) diagrams make it look like PS4 has a single memory controller access bus for the 2 CPU modules, and XB1 has two independent buses, thereby doubling the CPU BW.
 
I don't think that you can double the bandwidth of the CPU <-> memory controller, based on the two arrows shown.
I generally agree. The only reason I question it is because other arrows in the diagram are double-headed. If there's no difference (a 20 GB/s bidirectional bus can be shown by one arrow with 2 heads labelled read/write, or with two separate arrows for read and write), why does the diagram bother to use different representations? Why not just have double-headed arrows for all interconnects?

I'm not saying it's one thing or the other - just that my understanding is a little more shaky now that I've looked at the diagram again, because I remember referencing those arrows before when talking about the buses in the XB1 BW boost rumour discussion.
 
I think it's because the GPU bus is asymmetric: it has 170 GB/s read and 102 GB/s write. So they show the other read and write BWs separately for clarity.

But this is getting OT, unless someone leaks a similar diagram for the PS4 :oops:
 
Edit: NVM. Reading about how Garlic and Onion operate on Llano changes everything. I'll have to try this again.
 
No, as has been said numerous times before, all 18 CUs are identical.

The gist of the 14+4 thing is that there is a point where throwing additional ALU resources at rendering gets you diminishing returns and so devs might get more value using those additional resources for compute work.

That's exactly what the VGLeaks doc stated months ago with regard to 14+4 (the +4 provide a "minor boost if used for rendering"). So there's something limiting about the design of the PS4, or about the resolution the platform is targeting (1080p), that makes using more than 14 CUs 'wasteful.' The rendering performance must dip significantly in efficiency for this to surface in the leaked docs and be mildly confirmed by Cerny in his interview.

Clearly there is not an inherent dip in all graphics subsystems at 14 CUs, or else AMD and Nvidia would've focused their energy elsewhere once they hit this magic number on their desktop cards. It's either a bottleneck somewhere else in the PS4 rendering pipeline or just the fact that 1080p is 'maxed out' at this point.
 
That's exactly what the VGLeaks doc stated months ago with regard to 14+4 (the +4 provide a "minor boost if used for rendering"). So there's something limiting about the design of the PS4, or about the resolution the platform is targeting (1080p), that makes using more than 14 CUs 'wasteful.' The rendering performance must dip significantly in efficiency for this to surface in the leaked docs and be mildly confirmed by Cerny in his interview.

Clearly there is not an inherent dip in all graphics subsystems at 14 CUs, or else AMD and Nvidia would've focused their energy elsewhere once they hit this magic number on their desktop cards. It's either a bottleneck somewhere else in the PS4 rendering pipeline or just the fact that 1080p is 'maxed out' at this point.
The GPU can use all CUs when there is enough work for them.
There are cases in the rendering pipeline when the ALUs are not fully used, and the idea is to give them something else to do during those times.
E.g. G-buffer and shadow-map generation, where the ROPs and texture units get a workout but the ALUs are mostly twiddling their thumbs.
 
I've just been looking at the AnandTech article on Jaguar, and the TDP details are quite ambiguous; it shows that 4 active cores @ 1 GHz on Kabini consume 4 watts of power, so presumably 8 watts for two Jaguar modules (8 CPU cores). Unfortunately, there's no indication of the power requirements for the processor @ 1.6 GHz, only that 1.6 is the 'sweet spot'. If we assume it's 66% for a 400 MHz increase (from 1.6 GHz to 2 GHz), it'd be fairly safe to assume the increase is approximately 16.5% per 100 MHz. A 600 MHz increase from 1 GHz to 1.6 GHz would maybe be around a 100% increase in consumption, so the total for the 8-core CPU would be around 16 watts @ 1.6 GHz (probably a bit lower). An increase to 2 GHz should take it to about 26.5 watts in total (16 + 66%).

I think it's been speculated recently that the total TDP for the whole system is 100 watts, so the GPU would presumably take somewhere between 50 and 85 watts, as all of the other components have to be taken into consideration too. If we take approximately 75 watts as the total for the GPU (leaving 10 watts for the rest of the system) and increase that by the same percentage difference as the CPU (75 watts / 8 = 9.375, + 16.5% ≈ 14.5 watts for every 100 MHz increase), the additional 200 MHz from 800 MHz to 1 GHz should give a total of about 104 watts (75 + 14.5 + 14.5) for the GPU overclock.

If the system were overclocked to a 1 GHz GPU and a 2 GHz CPU, the TDP could be around 140 watts (including the 10 watts for the rest of the system)?
I'm almost certainly talking out of my ass though, so anyone (everyone?) who's brighter than me should correct my misinterpretation of the numbers. I would imagine that the GDDR5 memory must take a significant amount of power, but I can't speculate on that number.
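For what it's worth, here's the same rough arithmetic as a small Python sketch; every input (the 75 W GPU share, the 14.5 W-per-100 MHz step, the scaling percentages) is a guess from the post above, not a measured figure:

```python
# Speculative console power estimate, reproducing the back-of-the-envelope numbers above.

# CPU: 8 Jaguar cores, starting from ~8 W @ 1 GHz (2 x 4 W per AnandTech's Kabini data).
cpu_w_1_0 = 8.0
cpu_w_1_6 = cpu_w_1_0 * 2.00   # assume ~+100% going from 1.0 to 1.6 GHz
cpu_w_2_0 = cpu_w_1_6 * 1.66   # assume ~+66% going from 1.6 to 2.0 GHz

# GPU: guessed 75 W share at 800 MHz, guessed ~14.5 W per extra 100 MHz.
gpu_w_800 = 75.0
gpu_w_1000 = gpu_w_800 + 2 * 14.5

rest_of_system = 10.0

total = gpu_w_1000 + cpu_w_2_0 + rest_of_system
print(f"CPU @ 2.0 GHz: {cpu_w_2_0:.1f} W")   # ~26.6 W
print(f"GPU @ 1.0 GHz: {gpu_w_1000:.1f} W")  # 104.0 W
print(f"System total : {total:.1f} W")       # ~140.6 W
```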
 
The fact that the PS4 will have 4.5 GB of direct memory + 0.5 GB of flexible (?) memory for gaming gives me a strong suspicion that the memory reservation will also be used for speech recognition and some kind of simplified NUI (maybe via Move).

So, if this is true, I come back to my theory (which is starting to gain renowned representatives..). I assume that a percentage of the CUs would also be involved in some kind of reservation, as the PS4 lacks dedicated hardware for speech recognition & motion sensing.

Yes, to me, CUs are the new SPUs.
 
If Sony's reservation is similar to MS's, which in CPU and RAM it appears to be, then they'll reserve about 2 CUs for the system. Although it may not be 2 actual CUs, it might be something nebulous like 10%.

Did PS3 and 360 have such a GPU reservation? (I honestly don't know..)

And what would be a reasonable allotment of resources for the OS, GPU-wise? Or is it even necessary?
 