PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Some info (and speculation) on the Secondary Custom Chip from Watch Impress.

The translation is a bit sketchy, but it would seem that the secondary custom chip and the video encode/decode hardware sit outside of the APU, and it looks like the video isn't written to main memory.
I thought there was a better translation of Mark Cerny's statements already:
Built-In Video Encoder for Video Sharing and Vita Remote-Play
Cerny: The PS4 has a dedicated encoder for video sharing and such. There are a few dedicated encoder and decoder functions which are available and use the APU minimally. This is also used for playback of compressed in-game audio in MP3 and audio chat.

When the system is fully on, the x86 CPU core controls the video sharing system. However the Southbridge has features to assist with network traffic control.
Nothing indicates the encoding hardware is in the extra chip. It wouldn't make much sense.

Edit:
After looking at Hiroshige Goto's drawings, I would say his sketches are at least misleading, especially how and where he draws the "front end (queues)" (it's not any kind of front end as one usually knows it; it's probably the coherent request queue, providing the coherent connection [routing snoops to the other CPU module] of the two CPU modules to memory, and very likely also the coherent accesses of the GPU through the Onion link) and the "Fusion Compute Link (Onion)" (it should go from the GPU core to the coherent request queue).

Edit2:
The simplest explanation is that the video recording (or sharing) uses the VCE unit integrated in the main PS4 chip for encoding. Located there, it has easy access to the framebuffer content (the same is true for the display outputs). The encoded stream is then routed to the second chip, which controls the write to the HDD (or to some flash buffer connected to it, which I already suggested for the connected standby/background download function so it doesn't congest the game's hard drive accesses) and the streaming over the net. That would fit completely with Mark Cerny's comment and would also reduce the size and power consumption in standby mode (when only the small secondary chip is powered and the encoding hardware is not needed).
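To make that proposed flow concrete, here's a rough sketch of the routing (purely illustrative; every name below is made up, none of it is an actual Sony/AMD API):

```python
import zlib

def vce_encode(framebuffer: bytes) -> bytes:
    """Stand-in for the VCE hardware encoder inside the APU
    (zlib is just a placeholder for a real video codec)."""
    return zlib.compress(framebuffer)

def secondary_chip_sink(encoded: bytes, staging: list) -> None:
    """Stand-in for the secondary chip: it stages the encoded stream in its
    own buffer and would handle the HDD writes / network uploads, keeping
    that traffic off the game's hard drive accesses."""
    staging.append(encoded)

def share_frame(framebuffer: bytes, staging: list) -> None:
    # Encode where the framebuffer lives (the APU); only the much smaller
    # encoded stream ever has to cross over to the secondary chip.
    secondary_chip_sink(vce_encode(framebuffer), staging)

# Usage: "share" one dummy 1080p RGBA frame.
staging = []
share_frame(bytes(1920 * 1080 * 4), staging)
```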
 
The interesting thing about that WatchImpress article is its speculation regarding DisplayPort in PS4. ^_^

The article reads like the author's attempt to reverse engineer PS4 from info he gathered during GDC.
 
I thought there was a better translation of Mark Cerny's statements already:

Nothing indicates the encoding hardware is in the extra chip. It wouldn't make much sense.

Edit:
After looking at Hiroshige Goto's drawings, I would say his sketches are at least misleading, especially how and where he draws the "front end (queues)" (it's not any kind of front end as one usually knows it; it's probably the coherent request queue, providing the coherent connection [routing snoops to the other CPU module] of the two CPU modules to memory, and very likely also the coherent accesses of the GPU through the Onion link) and the "Fusion Compute Link (Onion)" (it should go from the GPU core to the coherent request queue).

Edit2:
The simplest explanation is that the video recording (or sharing) uses the VCE unit integrated in the main PS4 chip for encoding. Located there, it has easy access to the framebuffer content (the same is true for the display outputs). The encoded stream is then routed to the second chip, which controls the write to the HDD (or to some flash buffer connected to it, which I already suggested for the connected standby/background download function so it doesn't congest the game's hard drive accesses) and the streaming over the net. That would fit completely with Mark Cerny's comment and would also reduce the size and power consumption in standby mode (when only the small secondary chip is powered and the encoding hardware is not needed).

It seems they feel the encode/decode hardware is outside of the APU based on something talked about during GDC (the Google translation is pretty rough about that). From that they conclude that it only makes sense to have the display output go through the custom chip instead of the standard display output of AMD APUs (diagrammed out), since that should allow the encoder to use the framebuffer directly for the Share functions. That's my interpretation of their interpretation, at least.
 
It's just a one-liner. "In GDC, it became clear that the gameplay (video) encoding is handled by another chip."

It is an ambiguous statement. Gipsel could still be right. The custom chip delivers encoded frames over the network or to the HDD (traffic control).

Overall, he thinks the secondary chip has rich functionality. He also wonders if the PS4 output system goes through the secondary chip. I don't quite understand his paragraphs about DisplayPort vs AMD APU. He seems to be saying DisplayPort is very compatible with (and optimized in some way for) the AMD APU.
 
Could low-power mode warrant not firing up the GPU, hence the display out and video compression could be on the southbridge, assuming the SB can operate as a low-power SoC? I doubt that, as you'd need some display and some interface, which is surely going to be more power-hungry for stuff like video conferencing. That's the best reason to move the display there, though, IMO.
 
Could low-power mode warrant not firing up the GPU, hence the display out and video compression could be on the southbridge, assuming the SB can operate as a low-power SoC? I doubt that, as you'd need some display and some interface, which is surely going to be more power-hungry for stuff like video conferencing. That's the best reason to move the display there, though, IMO.

The framebuffer is normally going to be in GDDR5; transferring it over an interconnect to this southbridge chip so it can access the display controller there seems like a bad idea.

You don't have to be on a separate chip to be powered separately. If the APU is an even remotely modern design - and with GCN and Jaguar there's no reason it shouldn't be - it should be able to power gate the GPU and the majority of the CPU cores when not in use. Honestly, I'm not sure what the real motivation behind the southbridge chip is (as opposed to putting it on the same chip as everything else). Sony has been promoting it as a power saver, but I think there were other motivations.
 
Without massive architectural changes, it's not really suited to a primarily gaming device. It certainly indicates that Sony is aiming for a more general purpose box.

I think it's safe to assume that on PS4 hardware all memory accesses go through the MMU. It should be possible to run/map most of the stuff game developers need in user space and avoid context switching between kernel and user space. It shouldn't be a big rewrite to begin with, as Sony would have had to write custom drivers for their hardware anyway.

DRM-related stuff would probably be inside the kernel/hypervisor and slow things down to some extent when reading data from mass storage. Though I doubt the I/O speed is high enough to begin with for context switching to be a problem here in practice.

So in short, I don't think it would be a huge rewrite. Sony doesn't have legacy code on BSD that would need to be rewritten; most drivers would be brand new and written from scratch. And the icing on the cake is unified memory access, which would allow running most APIs used in games in user mode while still retaining platform security and DRM.
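As a rough illustration of that user-space idea (none of this is Sony's actual interface; an anonymous mapping stands in for whatever device node the OS would really expose):

```python
import mmap

# An anonymous mapping stands in for whatever device node the OS would
# actually expose (the real PS4 interfaces aren't public).
RING_SIZE = 64 * 1024
ring = mmap.mmap(-1, RING_SIZE)            # pretend: a device command ring

def submit(packet: bytes, offset: int) -> None:
    """Write a command packet straight into the mapped buffer.
    Once the mapping exists, the write itself needs no kernel/user
    context switch; only setup and any doorbell/interrupt path do."""
    ring[offset:offset + len(packet)] = packet

submit(b"\x01\x02\x03\x04", 0)             # no syscall per submission
```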
 
The framebuffer is normally going to be in GDDR5; transferring it over an interconnect to this southbridge chip so it can access the display controller there seems like a bad idea.

You don't have to be on a separate chip to be powered separately. If the APU is an even remotely modern design - and with GCN and Jaguar there's no reason it shouldn't be - it should be able to power gate the GPU and the majority of the CPU cores when not in use. Honestly, I'm not sure what the real motivation behind the southbridge chip is (as opposed to putting it on the same chip as everything else). Sony has been promoting it as a power saver, but I think there were other motivations.

Some wild possibilities:

(1) It's a precursor to the Gaikai chip. It RemotePlays any media (At least one of the interviews mentioned that RemotePlay is handled by hardware). This can be reused by other Sony devices.

or/and

(2) As a secure traffic controller, it can prioritize, repackage and deliver network (progressive download), Blu-ray or HDD content to the AMD APU, or to a Cell module for PS3 BC :runaway:
Progressive download is part of Gaikai's offerings.


I think video encoding should happen inside the APU. 4K upscaling can also occur inside the APU (as part of the video unit, or the scan-out engine?).
 
The framebuffer is normally going to be in GDDR5; transferring it over an interconnect to this southbridge chip so it can access the display controller there seems like a bad idea.

If I were paranoid about diverting resources from gaming to the record/share functionality, I'd wonder if the southbridge could act as a pass-through for the digital out of the APU. It could, at its discretion, copy or encode on the fly; the APU wouldn't know the difference, although this may inject some latency into the display output path.


You don't have to be on a separate chip to be powered separately. If the APU is an even remotely modern design - and with GCN and Jaguar there's no reason it shouldn't be - it should be able to power gate the GPU and the majority of the CPU cores when not in use. Honestly, I'm not sure what the real motivation behind the southbridge chip is (as opposed to putting it on the same chip as everything else). Sony has been promoting it as a power saver, but I think there were other motivations.

Perhaps it's not standby, but background updates and downloads while the console is in sleep mode?
That might be 2-4 Watts total for Energy Star.
Power gating cuts off a lot of leakage, but chips on the scale of desktop APUs still burn a little power from the remaining uncore, and even power gating has a small leakage component. It's tiny relative to an 80-100W desktop, but large relative to the 0.5-4W range.
AMD's Temash, with maybe 2 cores and a minuscule GPU, may not leave enough margin even at idle. And that's not considering that Temash is heavily binned to hit the low range.

The likely need to bin the Orbis APU may be a reason to have a dedicated low-power core. Fewer functional APUs need to go into the reject pile for excessive static power, and the design isn't constrained by the task of bringing the APU down two (edit: possibly three) orders of magnitude from peak, if the one power regime where that matters is handled by a core whose whole yield curve fits within it with margin to spare.
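Rough numbers, just to illustrate the orders-of-magnitude point (the peak figure is an assumption, not a known Orbis number):

```python
import math

# Illustrative only: the peak figure is an assumption, not a known Orbis
# number; the standby band is the 0.5-4 W range mentioned above.
apu_peak_w = 100.0
for standby_w in (4.0, 0.5):
    ratio = apu_peak_w / standby_w
    print(f"{apu_peak_w:.0f} W -> {standby_w} W: {ratio:.0f}x, "
          f"~{math.log10(ratio):.1f} orders of magnitude")
# Prints ~1.4 and ~2.3 orders of magnitude; a higher peak or a sub-0.5 W
# floor pushes it toward the two to three orders mentioned above.
```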
 
Kind of makes sense from a multitasking perspective as well, if not only for power saving. If someone is playing a game on the console, it would be a bit shit if it couldn't stream video to another source (tablet, Vita, PC). In that case they'd want the encoding process to touch as few of the resources used for gaming as possible, keeping in mind that while gaming the player could be live streaming at the same time. Supposedly half a CPU core is reserved for the OS, and maybe none of the GPU?

It seems they went to some lengths to avoid having to reserve large amounts of CPU/GPU resources to support the OS features, so that might support this separate-chip or divided-output/encoding theory. Also, if the console is in standby you could still access the media on it without having to fire up the main APU and memory.
 
Some info (and speculation) on the Secondary Custom Chip from Watch Impress.

The translation is a bit sketchy, but it would seem that the secondary custom chip and the video encode/decode hardware sit outside of the APU, and it looks like the video isn't written to main memory.



Which likely means it's writing to the HDD, or to some separate buffer dedicated to the video encode/decode hardware, or one perhaps even shared with the secondary custom chip. A smaller secondary storage location of currently undecided size would explain why we've not seen firm confirmation of how much gameplay video is stored for upload. We've heard 15 minutes from rumor, but all official Sony statements seem to say "the last few minutes". Wherever it's stored, it appears that the video can be kept until the system goes into standby and then uploaded automatically (perhaps a good option while playing online games).
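To put a rough number on that buffer (duration and bitrate are assumptions here, not confirmed figures), a "last N minutes" capture buffer would look something like this:

```python
from collections import deque

MINUTES = 15            # the rumored figure; officially just "a few minutes"
BITRATE_MBPS = 10       # assumed encode bitrate, not a confirmed number
CHUNK_SECONDS = 1

buffer_bytes = MINUTES * 60 * BITRATE_MBPS * 1_000_000 // 8
print(f"~{buffer_bytes / 2**30:.2f} GiB needed")   # ~1.05 GiB at these numbers

# Ring buffer of one-second encoded chunks: old footage falls off the back,
# so only the most recent MINUTES of gameplay is ever held.
chunks = deque(maxlen=MINUTES * 60 // CHUNK_SECONDS)

def on_encoded_chunk(chunk: bytes) -> None:
    """Hypothetically called as the encoder emits each second of video."""
    chunks.append(chunk)

def flush_clip() -> bytes:
    """Grab whatever is buffered, e.g. when the player hits Share or the
    console drops to standby and uploads in the background."""
    return b"".join(chunks)
```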

They also speculate that the display output goes through the secondary chip instead of the APU itself (along with the supposition that the secondary custom chip includes the video encode/decode hardware as well):



As others have speculated, Watch Impress thinks there may be separate memory for the Custom Chip (and video hardware):



They also go quite a bit deeper into why they think the custom chip sitting outside of the APU with the video encode/decode hardware is a Sony design, and not AMD's. Good read overall; a solid translation would be very useful, though.


Starsha?

DCE 7.0
UVD 4.0
VCE
 
Power gating cuts off a lot of leakage, but chips on the scale of desktop APUs still burn a little power from the remaining uncore, and even power gating has a small leakage component. It's tiny relative to an 80-100W desktop, but large relative to the 0.5-4W range.

Do you have a reference for how much static leakage decent power gating can actually allow? Because on current mobile SoCs the power-down states are very, very low power (even relative to their overall much smaller peak power budget). Or maybe AMD's just not this aggressive yet. Perhaps the problem isn't the leakage of the stuff that'd be off, but that the static leakage of the stuff that's in the southbridge would be higher in the APU because it's on a more performance-optimized process.

I can still think of a lot of other reasons why Sony did this, though.
 
Do you have a reference for how much static leakage decent power gating can actually allow? Because on current mobile SoCs the power-down states are very, very low power (even relative to their overall much smaller peak power budget). Or maybe AMD's just not this aggressive yet. Perhaps the problem isn't the leakage of the stuff that'd be off, but that the static leakage of the stuff that's in the southbridge would be higher in the APU because it's on a more performance-optimized process.

I thought I had an article or presentation with some definite numbers for power gates from when Nehalem introduced them to the desktop, but I must have been mistaken.

Discounting imprecise website idle power estimates of at least several watts for desktop processors, I've inferred a rough floor based on claims of improvement with power gating on CPUs with peak draws of 80-100 Watts, of which a significant fraction is leakage.
AMD claims a greater than 20x improvement in leakage, and Intel something around 30x.

I had a link for Intel's 30x claim, which also included a distribution of power gated leakage versus ungated, but now I can't find it.
AMD had a bunch of slides here, including the source of the 20x claim:
http://www.hotchips.org/wp-content/uploads/hc_archives/hc23/HC23.17.1-tutorial1/HC23.17.111.Practical_PGandDV-Kosonocky-AMD.pdf

An AMD patent claims as low as 100mW for a core in a CC6 state:
http://www.faqs.org/patents/app/20120146708

The turbo description sounds like it could be the one used by Richland or some soon-to-be-released chip. It leaves the question of which definition of core is being used, and how many.
Best case, it's quietly assuming a whole Piledriver module is a core and consumes 100 mW when gated; otherwise a Richland APU (possibly ULV) would burn 400 mW in its power-gated cores alone.
That might explain why Piledriver isn't in Orbis.

At least for AMD chips with the size and TDP of a desktop CPU, I can see Sony being leery of trying to fit an APU into the 0.5-4W range.
It may be possible to get an APU to gate down to that range, with no guarantee it is useful for anything at that point--compounded by its likely reliance on at least part of the GDDR5 pool.
On top of that, the AMD chips that might power gate that low tend to be heavily binned or very stripped-down Bobcat/Jaguar chips with nowhere near as much heft as the PS4 chip.

If the theory is that Sony wants to be able to get things like background updates and downloads within a very constrained standby/sleep mode, we know there are chips capable of doing that at power levels AMD's larger chips have not demonstrated they can even power gate down to.
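Putting those figures together (only the ~100 mW CC6 number and the ~20x gating claim come from the links above; the core count and budget band are my assumptions):

```python
# Only the ~100 mW CC6 figure and the ~20x gating claim come from the links
# above; the core count and the standby budget band are assumptions.
gated_core_mw = 100
cores = 4
gated_total_w = gated_core_mw * cores / 1000
print(f"Power-gated cores alone: {gated_total_w:.1f} W")            # 0.4 W

# A 20x improvement implies ungated leakage on the order of 2 W per core.
print(f"Implied ungated leakage per core: ~{gated_core_mw * 20 / 1000:.0f} W")

# Against a 0.5-4 W standby budget that residual is already a big slice,
# before counting the uncore, the GDDR5 interface, I/O and so on.
for budget_w in (0.5, 4.0):
    print(f"Share of a {budget_w} W budget: {gated_total_w / budget_w:.0%}")
```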
 
It's not just sleep mode. The secondary chip may also be active during gameplay for background download, install and possibly streaming at the same time.


Starsha?

DCE 7.0
UVD 4.0
VCE

I have no idea.

I don't quite see the need to stream or encode video in low-power mode, unless people are talking about a low-power video conferencing/phone function.

If there is no such app on the PS4, then we may not need the video units in the secondary CPU. For RemotePlay over the WAN, as mentioned in some reports, I suppose they can turn on the full unit (but start it up more quickly).

That CPU alone (without further GPU-like parts) can perform background updates and installs just fine.

I was trying to think of a scenario where a game may use the APU's video encoder and also a video encoder in the secondary CPU (e.g., RemotePlay) at the same time, but I can't nail one down. So I still can't see a Starsha in the secondary CPU setup at the moment.

EDIT:
e.g., when a video call comes in right in the middle of an Uncharted video cutscene, I am assuming the APU should be able to handle both the video conferencing and the cutscene playback without a visible/drastic drop in quality. In the worst case, they can suspend the gameplay/movie.
 
It's not just sleep mode. The secondary chip may also be active during gameplay for background download, install and possibly streaming at the same time.
That's fine; there's no rule that a chip that can operate in a very low-power state can't perform functions when the device is active. Since it is the networking and storage arbiter, it's going to have to be on.

The difference is that if Sony wants a fair amount of background activity to still happen while the device is in the same power band as sleep mode, it can ensure there's a component that can affordably hit the target. If pad requirements mean there was going to be a chip interfacing with all the I/O and networking anyway, a small core added to the southbridge would be an incremental increase in cost that would spare the main APU from being binned for both high performance and low-power functionality. So it's a few mm² on a cheap chip to prevent throwing out X% of the big, expensive chip's production, or hobbling the big chip so it can work at power levels as low as a phone processor's.
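A back-of-the-envelope version of that tradeoff, with every number being a placeholder guess rather than anything known about Orbis:

```python
# Every number here is a placeholder guess; only the shape of the tradeoff
# matters, and the "X%" above stays unknown.
apu_cost = 100.0          # assumed cost of a good main APU die, in $
extra_yield_loss = 0.05   # hypothetical fraction of APUs binned out if they
                          # also had to hit a phone-class standby leakage spec
small_core_mm2 = 5.0      # assumed area of a low-power core on the southbridge
cost_per_mm2 = 0.10       # assumed cost per mm^2 on a cheap, mature process

saving_per_console = apu_cost * extra_yield_loss    # roughly, APU cost recovered
added_per_console = small_core_mm2 * cost_per_mm2   # extra southbridge silicon
print(f"Saved: ${saving_per_console:.2f}  Added: ${added_per_console:.2f}")
# With these guesses: Saved: $5.00  Added: $0.50
```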
 
I think most expect some kind of bump with the change to 8GB. Question is how much.

1GB kinda seems like a nice round tidy number more than anything else to me.

Then again sometimes I think the prevailing number is often influenced by the truth behind the scenes, so maybe that's the case here.
 
I think most expect some kind of bump with the change to 8GB. Question is how much.

1GB kinda seems like a nice round tidy number more than anything else to me.

Then again sometimes I think the prevailing number is often influenced by the truth behind the scenes, so maybe that's the case here.

Yes, and they have to be consistent with the number of CPU cores reserved. If it is one core or less, as was rumored along with the 1 GB, it will be enough. It would be a waste to reserve more.
 