PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Pixel · Apr 27, 2014

If you look at the die, the 32 ROPs take up a fair amount of die space...

I think this was part of that "4K" initiative that they wanted PS4 to support, regardless of the fact it would only support limited image quality at that resolution.

GravityX · Apr 27, 2014

Shifty Geezer said:
The prior page of discussion? The subject arose again because a dev mentioned the '14+4 configuration'. There is no 14+4 configuration. There is only an 18 CU configuration. How devs choose to use that is down to them. The notion of diminishing returns is tangential to the discussion of PS4's technical hardware investigation. PS4's hardware is 18 CUs that devs can use however they want. The discussion of whether 18 CUs on graphics in PS4 is wasteful or not belongs in its own thread.

Curious.

Aren't the CPU cores kinda "split" too? In the sense that 6 are used for games and 2 for the OS. They're not necessarily different or divided, but they are reserved.

So can't the GPU cores be used the same way? 14 for graphical effects and the other 4 used for whatever else the developer needs?

Just throwing it out there.

Brad Grenz · Apr 27, 2014

GPUs are already massively parallel arrays of execution hardware being issued tasks automatically from jobs broken down into hundreds of threads. There's no point in thinking of the hardware in a topographic sense and reserving certain parts because of where they are located on the physical die. You find the parts that are idle and issue them a task. Increasingly, this has been the approach with CPUs as well, rather than manually assigning threads to cores.

Shifty Geezer · Apr 27, 2014

GravityX said:
So can't the GPU cores be used the same way? 14 for graphical effects and the other 4 used for whatever else the developer needs?

Just throwing it out there.

No. Read this thread. Heck, just read the past couple of pages!

3dilettante · Apr 27, 2014

The PS4 potentially exposes the more low-level hooks available to help direct the GPU's scheduler. There was pre-release discussion about using specially crafted shaders or control code to soft-partition the CU load.

I haven't seen further disclosures that put any weight in that direction.
Some recent presentations, such as the Infamous Second Son engine tech slides, describe long-running compute as something that is seriously challenged by software requirements and/or potential architectural issues.

Dedicating a subset of CUs can be considered a form of that, since it's CUs running a workload independent and exclusive of the others for an indeterminate period of time. To the rest of the GPU, it would look like the dedicated CUs are running kernels that have taken up the full share of the allocated resources at all times.
It might well be that an informal split could be done by doing just that (assuming the PS4 won't freak out at some point in the lifetime of such a shader, as PC drivers are wont to do), but currently it doesn't look like it's considered a good idea.

Grall · Apr 27, 2014

Why would you want to soft-partition the GPU to run several simultaneous jobs? You'd thrash the caches that way, wouldn't you? Surely it's better to get one job out of the way as quickly as possible to make room for the next.

3dilettante · Apr 27, 2014

The GPU would thrash the caches dynamically anyway.
Whether a few CUs are available as frequently for the 64 compute queues and the graphics front end isn't likely to change that.

What a reservation might be able to do is provide a predictable amount of resources at any given time. CU allocation is shown in Sony's GDC slides as being very spiky, and possibly some future game that doesn't have Sucker Punch's desire to override compute with graphics spikes would want a consistent and reliable simulation kernel that doesn't get jerked around by the occasional particle-fest.

There is also the desire to avoid queuing and launch delays under load, and I was surprised at how bad they can be. At AMD's APU event, there was a presentation from a Sony dev concerning using HSA for audio. The conclusion was that current methods only made the GPU reliable for workloads that were tolerant of multi-frame delays when the system was under load. Just starting a wavefront could take 33ms, never mind it completing.
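To put the 33ms figure in perspective, here is a back-of-the-envelope sketch. The 30 and 60 fps targets and the 48 kHz audio sample rate are my assumptions for illustration, not numbers from the presentation, and the function names are made up.

```python
def frames_spanned(delay_ms: float, fps: float) -> float:
    """How many display frames elapse during a delay of delay_ms."""
    return delay_ms * fps / 1000.0


def samples_spanned(delay_ms: float, sample_rate_hz: int = 48_000) -> int:
    """How many audio samples elapse during the delay (48 kHz is an assumed rate)."""
    return round(delay_ms * sample_rate_hz / 1000.0)


launch_delay_ms = 33.0  # worst-case wavefront start-up time quoted above

for fps in (30, 60):  # assumed frame-rate targets
    frames = frames_spanned(launch_delay_ms, fps)
    print(f"{fps} fps: launch delay spans {frames:.2f} frames")

print(f"audio: {samples_spanned(launch_delay_ms)} samples pass before the kernel even starts")
```

Under those assumptions the start-up delay alone is roughly one whole frame at 30 fps or two at 60 fps, which is why only work that tolerates multi-frame latency looks safe.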

imaxx · Apr 28, 2014

3dilettante said:
CU allocation is shown in Sony's GDC slides as being very spiky, and possibly some future game that doesn't have Sucker Punch's desire to override compute with graphics spikes

Not sure if the Sony composer allows you to do that, but can't one just put compute threads in the hiperf 3D queue? It's just a command buffer, in the end. Whether the shader does 3D or not, it should not matter...

Grall · Apr 28, 2014

3dilettante said:
Just starting a wavefront could take 33ms, never mind it completing.

I assume the above scenario is inserting a standalone compute job to run simultaneously with 3D rendering?

I thought the extra compute queues introduced in the latest GCN revision were meant to deal with stuff like that.

In other unrelated news, if I run Folding@home simultaneously with a game (Diablo 3 in this particular case), my PC will bluescreen, usually within minutes. Or this was the case with the WHQL driver from last year; I haven't gotten around to trying with the new release from a couple of days ago.

imaxx · Apr 28, 2014

Grall said:
In other unrelated news, if I run Folding@home simultaneously with a game (Diablo 3 in this particular case), my PC will bluescreen

You cannot compare the complexity of a PS4 (or Linux) kernel module with its Windows counterpart... come on!!

The extra compute stuff was added to prevent compute rounds from killing 3D rounds, not the other way around: the goal is (I suppose) to use compute kernels to fill the shader resources that sit unused at various stages of rendering, in order to squeeze more performance from the same hardware.

3dilettante · Apr 28, 2014

imaxx said:
Not sure if the Sony composer allows you to do that, but can't one just put compute threads in the hiperf 3D queue? It's just a command buffer, in the end. Whether the shader does 3D or not, it should not matter...

Did you mean high-priority 3D? That one is diagrammed as not being compute-capable in the VGLeaks docs, and it's reserved for VSHELL.

Grall said:
I assume the above scenario is inserting a standalone compute job to run simultaneously with 3D rendering?

It's in the context of audio effects running on the GPU. Latencies were generally considered acceptable, but became very high when the GPU was heavily utilized.

I thought the extra compute queues introduced in the latest GCN revision were meant to deal with stuff like that.

The queue improvements cover a different part of the GPU compute process.
Having many queues reduces contention when multiple threads are trying to send commands to the GPU, and it reduces the chance that the in-order queues will stall a ready kernel because its commands are mixed with others that are not.
The expanded number of ACEs also boosts the number of wavefronts that can launch or receive commands in a cycle, and I'm assuming (dangerous to do, but I hope this is the case) the wavefront completion logic is also scaled up.

This significantly enhances the process of getting a wavefront to the point of allocating resources and launching, which is where Orbis, as it has been disclosed, starts becoming inconsistent.
The front end processors need to be able to allocate the necessary resources for their wavefronts, and those become available when the CUs release them and the ready wavefronts win out in the arbitration phase.
Wavefront execution is a pretty coarse thing, and the audio presentation's lament is that a latency-sensitive load cannot count on those resources being ready in a timely fashion.
Audio is particularly sensitive, but spiking up to 33ms for startup time in other workloads is going to become noticeable.

It's still the early days, so perhaps some of the extra QoS tweaks or better tools might provide a way around some of the problems faced at launch.
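As a rough illustration of the point about in-order queues stalling a ready kernel, here is a toy model. It is only a sketch: the tick-based timing, the one-issue-per-queue-per-tick rule, and the kernel names are all invented for the example and are not how the ACEs actually arbitrate.

```python
from collections import deque


def issue_times(queues, max_ticks=1000):
    """Simulate in-order queues of (name, ready_at) commands.

    Each tick, every queue may issue only its head command, and only if
    that command is ready. Returns {name: tick at which it was issued}.
    """
    pending = [deque(q) for q in queues]
    issued = {}
    tick = 0
    while any(pending) and tick < max_ticks:
        for q in pending:
            if q and q[0][1] <= tick:
                name, _ = q.popleft()
                issued[name] = tick
        tick += 1
    return issued


# Hypothetical workloads: "physics" waits on data until tick 10,
# "audio" is ready immediately.
physics = ("physics", 10)
audio = ("audio", 0)

shared = issue_times([[physics, audio]])      # one queue, audio stuck behind physics
separate = issue_times([[physics], [audio]])  # one queue per submitter

print("shared queue:   ", shared)
print("separate queues:", separate)
```

In the shared queue the ready audio command cannot issue until the physics command ahead of it does; with separate queues it goes out at once. This only models the queuing side, not the later wait for CU resources, which is where the latency problem described above remains.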

In other unrelated news, if I run Folding@home simultaneously with a game (Diablo 3 in this particular case), my PC will bluescreen, usually within minutes. Or this was the case with the WHQL driver from last year; I haven't gotten around to trying with the new release from a couple of days ago.

I'm not sure of the reasons, since this among other things depends on the quality of the card's driver development.
However, one tweak that has been used to reduce the chance of this happening is to find a driver setting to extend the timeout for the card to respond. Long-running kernels can take longer to complete and generate responses than the standard timeout. The OS assumes they've locked up, which is something I alluded to earlier.

pMax · Apr 30, 2014

3dilettante said:
...it's reserved for VSHELL.

Interesting, indeed. That would then allow for a very fast overlay of the graphical interface, making the system very responsive.

3dilettante said:
Latencies were generally considered acceptable, but became very high when the GPU was heavily utilized.

I suspect this will become one of the 'issues' of this generation.
Hmmm... using GPU performance monitors to know when it's a good idea to fire a compute task? I don't think you can easily change AMD's arbitration for queues inside the chip... or maybe you can? Maybe by making shaders shorter, allowing more arbitration rounds and more chances to get your kernel executed?
Mah.

Boxercide · May 1, 2014

I played around with the PS4 video after the update and found that it saves the video as 720p @ 8000 kbps for Lego Marvel Superheroes, with 128 kbps stereo sound. 15 min of video ended up at 872 megs for the entire clip.

Did we ever find out what part of the system is doing this? i.e. part of the system reservation or a dedicated ARM chip?

I saw that devs can enable watermarking and strip out audio to avoid issues with copyrighted music. Can anybody add some more info?

Shifty Geezer · May 1, 2014

There's dedicated video encoding hardware. We don't know if there's a large file in RAM or if it's streamed from HDD.

London Geezer · May 1, 2014

Shifty Geezer said:
There's dedicated video encoding hardware. We don't know if there's a large file in RAM or if it's streamed from HDD.

Any other set top box simply records straight to HDD as it's a relatively slow process (8-10MB/s?). I'd be very, very surprised if PS4 reserved 1GB of its RAM to record video.

Shifty Geezer · May 1, 2014

That was one of the arguments back when discussing RAM utilisation. However, with 3+ GBs available for OS, maybe they are using it for video encoding at this point? There doesn't seem much need to, but it would reduce HDD interrupts for games. Is there an HDD light on PS4? If so, if you capture a game clip, does it show more accessing?

London Geezer · May 1, 2014

Shifty Geezer said:
That was one of the arguments back when discussing RAM utilisation. However, with 3+ GBs available for OS, maybe they are using it for video encoding at this point? There doesn't seem much need to, but it would reduce HDD interrupts for games. Is there an HDD light on PS4? If so, if you capture a game clip, does it show more accessing?

I haven't noticed an HDD light but again, using RAM for this function would be really, really retarded on their part. Which of course doesn't mean it wouldn't happen, as we've seen Sony do some seriously silly things in the last few years.

Betanumerical · May 1, 2014

Shifty's Bitch said:
I haven't noticed an HDD light but again, using RAM for this function would be really, really retarded on their part. Which of course doesn't mean it wouldn't happen, as we've seen Sony do some seriously silly things in the last few years.

It being 'retarded' really depends on what you're aiming for. If you have excess RAM and want games to be able to have a lower latency (on average) to the hard drive, then storing the recorded video in RAM is not that bad an idea. If you don't have enough RAM then maybe it is a bad idea. Maybe some sort of hybrid solution would be better.

London Geezer · May 1, 2014

Betanumerical said:
It being 'retarded' really depends on what you're aiming for. If you have excess RAM and want games to be able to have a lower latency (on average) to the hard drive, then storing the recorded video in RAM is not that bad an idea. If you don't have enough RAM then maybe it is a bad idea. Maybe some sort of hybrid solution would be better.

Well, if we are counting the RAM which is inexplicably reserved as 'available RAM' - since there seems to be little reason so far for 3GB to be seemingly lost - then yeah, I'd agree with you.

pMax · May 1, 2014

damienw said:
15 min of video ended up @ 872 megs for the entire clip

15*60 = 900 seconds, so 872 MB / 900 s --> ~1MB/s.

I cannot imagine any HDD having issues with that.
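The reported numbers can be cross-checked with a quick calculation. This is only a sketch under my own assumptions: that "kbps" means 1000 bits per second, that the "872 megs" shown is in MiB (2^20 bytes), and that container overhead is negligible. The function name is made up.

```python
def clip_size_bytes(video_kbps: int, audio_kbps: int, seconds: int) -> int:
    """Payload size of a clip, assuming 1 kbps = 1000 bits/s and no container overhead."""
    total_bits_per_second = (video_kbps + audio_kbps) * 1000
    return total_bits_per_second * seconds // 8


duration_s = 15 * 60                        # 15 minutes of capture
size = clip_size_bytes(8000, 128, duration_s)

size_mib = size / 2**20                     # size in MiB (assumed unit of "megs")
write_rate_mb_s = size / duration_s / 1e6   # sustained rate in decimal MB/s

print(f"clip size: {size_mib:.1f} MiB")
print(f"write rate: {write_rate_mb_s:.3f} MB/s")
```

Under those assumptions the 8000 + 128 kbps stream over 15 minutes comes out at about 872 MiB, matching the file size reported above, and the sustained write rate is about 1 MB/s.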
