PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Still, that figure is dreadful.

...it is 190 cycles at the supposed PS4 Jaguar core frequency, not desktop core frequencies (3-4 GHz)...
If you normalise them both to a 1 GHz clock, they look even worse, I think.

Anyway, for the sake of comparison, you should not take Intel's L2 (plainly too small to be comparable in function) but its L3.
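Back-of-the-envelope, just to put numbers on it (the 1.6GHz Jaguar clock and the ~36-cycle/3.5GHz Intel L3 figures are my own assumptions, purely for illustration):

```python
# Back-of-the-envelope: convert cache latency from cycles into nanoseconds.
# Assumed numbers, for illustration only: the ~190-cycle figure discussed
# above at ~1.6 GHz for the PS4's Jaguar cores, and roughly 36 cycles at
# ~3.5 GHz for a desktop Intel L3 (ballpark, varies by part).
def latency_ns(cycles, freq_ghz):
    # frequency in GHz == cycles per nanosecond
    return cycles / freq_ghz

jaguar = latency_ns(190, 1.6)   # ~119 ns
intel  = latency_ns(36, 3.5)    # ~10 ns

print(f"Jaguar figure: {jaguar:.0f} ns")
print(f"Intel L3     : {intel:.0f} ns")

# Normalised to a 1 GHz clock, nanoseconds and cycles are the same number,
# so the ~5x gap in raw cycles widens to ~11x once clock speed is factored in.
print(f"at 1 GHz: {jaguar:.0f} vs {intel:.0f} cycles")
```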
 
L2 cache blocks for Jaguar are half-rate IIRC.

edit: the interface seems full-speed though. :s

That would also apply to the Xbox One and Jaguar-based PC APUs.

Wouldn't the L2 caches ramp up to full speed when needed?

I found this:

http://techreport.com/review/24856/amd-a4-5000-kabini-apu-reviewed
AMD has put some work into the L2 interface, which makes sense since it's the cores' only path to the rest of the system. The L2 interface runs at the full speed of the CPU cores and has built-in smarts, including the ability to store L2 tags, so it knows which portion of the cache to light up when the time comes to access one of its four 512KB banks. When those L2 cache banks aren't needed, they're clock gated to save power. AMD further conserves power by clocking the L2 arrays at half the frequency of the CPU cores.
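A toy sketch of what that description amounts to (not AMD's actual design; the four 512KB banks come from the quote, everything else is made up):

```python
# Toy model of the shared, banked L2 described in the quote: the full-speed
# interface holds the tags, so on each access it can wake exactly one of the
# four 512KB banks and leave the others clock-gated. Illustrative only; real
# indexing, associativity and timing are nothing like this.
CACHE_LINE = 64                     # bytes
NUM_BANKS  = 4                      # four 512KB banks, per the quote

class L2Interface:
    def __init__(self):
        self.tags = set()                       # tag directory lives in the interface
        self.bank_awake = [False] * NUM_BANKS   # clock-gated when False

    def access(self, addr):
        line = addr // CACHE_LINE
        bank = line % NUM_BANKS                 # simple address interleaving
        hit = line in self.tags                 # tag check without touching the data arrays
        self.bank_awake = [i == bank for i in range(NUM_BANKS)]  # wake one bank only
        self.tags.add(line)                     # fill on miss (no eviction modelled)
        return hit, bank

l2 = L2Interface()
print(l2.access(0x1000))   # (False, 0) - miss, only that bank lit up
print(l2.access(0x1000))   # (True, 0)  - hit, same single bank, others stay gated
```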
 
After Sony themselves confirmed it's not 14+4 but 18, why should we value the opinion of any third party dev?

Sony said that all 18 CUs can be used for graphics rendering, but that doesn't change what they said before. If a developer uses all 18 CUs, they would only get a minor boost from the "extra" ALUs in those 4 CUs, as this Japanese dev said. This could be some kind of marketing strategy that Sony has been getting for free to date. If they didn't say something like that to developers, why would we be hearing about it from a developer at this point?

Even Cerny said as much, while downplaying the amount of that extra ALU.

Mark Cerny: That comes from a leak and is not any form of formal evangelisation. The point is the hardware is intentionally not 100 per cent round. It has a little bit more ALU in it than it would if you were thinking strictly about graphics. As a result of that you have an opportunity, you could say an incentivisation, to use that ALU for GPGPU.
http://www.eurogamer.net/articles/digitalfoundry-face-to-face-with-mark-cerny

And if you take a look at the slide I linked above, you will see "formal evangelisation" from Sony.
 
Sony said that all 18 CUs can be used for graphics rendering, but that doesn't change what they said before. If a developer uses all 18 CUs, they would only get a minor boost from the "extra" ALUs in those 4 CUs, as this Japanese dev said. This could be some kind of marketing strategy that Sony has been getting for free to date. If they didn't say something like that to developers, why would we be hearing about it from a developer at this point?
That dev also said:
"If the number of units increased, the possibilities of what we could do would increase as well."
 
At the risk of getting yelled at for bringing this up again...

This thing made an appearance in the recent Naughty Dog presentation as well...

https://www.youtube.com/watch?v=f8XdvIO8JxE

I've had it bookmarked and been meaning to watch the whole 45 minute thing for ages, but still have yet to get around to it.

Anyways, again, I meant to watch it, but AFAIK all the ND guy did was make a passing reference to it (don't think he ever mentioned 14+4 though) and say 18 CUs was more than needed for 1080P/60 FPS graphics, so they try to use some on compute, or something like that.

Which doesn't really clarify the issue (or make sense) at all, because clearly most PS4 games are not 1080P and 60 FPS simultaneously.

And if we take it to PC, I imagine there are games that can more than max out a 7850 at 1080P. So it's not some magical barrier where 1080P simply cannot usefully use more than 18 CUs, as far as I know.

I guess. I mean the bottlenecks on PC could be anywhere, and I don't count ramping up to 8X MSAA or something as a way to stress the GPU. I would want the highest settings at modest or even no AA to overwhelm the GPU at 1080P.

**Alstrong shrug**

Double fake edit: Thinking about it, the MS system architects claim they have "extra ALU" in the One too, at its lowly 12 CUs. Hmmm.

Could it be something as simple as this: you get so many mathematical operations per pixel at 1080P with, say, 12 CUs that you really enter into heavy diminishing returns?

The problem with this idea is that it kind of means PC GPUs with more than 12-18 CUs would be somewhat useless at 1080P, which seemingly is not the case. I mean, there are vastly more powerful GPUs than that, none of which would be overkill for my 1080P monitor, I don't think. Then again, maybe PC is just really inefficient in this case?
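For what it's worth, the "maths per pixel" idea works out roughly like this (my own back-of-the-envelope, assuming the usual GCN figures of 64 lanes per CU and 2 FLOPs per lane per cycle, and the commonly reported clocks):

```python
# Rough "maths operations per pixel per frame" for different CU counts.
# Assumes GCN-style CUs: 64 ALU lanes, 2 FLOPs/lane/cycle (FMA). Ignores
# overdraw, off-screen passes, and everything else that matters in practice.
def flops_per_pixel(cus, clock_ghz, width, height, fps):
    peak_flops = cus * 64 * 2 * clock_ghz * 1e9   # peak FLOPs per second
    pixels_per_sec = width * height * fps
    return peak_flops / pixels_per_sec

for cus, clock in [(12, 0.853), (14, 0.8), (18, 0.8)]:
    ops = flops_per_pixel(cus, clock, 1920, 1080, 60)
    print(f"{cus:2d} CUs @ {clock} GHz -> ~{ops/1000:.1f}k FLOPs per pixel per frame at 1080p60")
```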

Third edit: Here is the MS quote about that, not quite as I remembered it:

The experiments we did showed that we had headroom on CUs as well. In terms of balance, we did index more in terms of CUs than needed so we have CU overhead. There is room for our titles to grow over time in terms of CU utilization.
 
It's a good point, but why would a dev, whose game just came out, still state something conflicting about a console's specs?

Sony said that all 18 CUs can be used for graphics rendering, but that doesn't change what they said before...
How people choose to use it is down to them. YewYew's dev comment is plain wrong though. It's not a 14+4 configuration. It's an 18-CU, fully equal, fully programmable topology. Perhaps, lost in translation, they said they were using the equivalent of 4 CUs for non-graphics work? Even then, you'd be better off interleaving workloads with graphics rather than dedicating four CUs to non-graphics work, I imagine. Run graphics on 18 CUs for 7/9ths of the frame time, and then run 18 CUs for 2/9ths of the frame time on non-graphics work.
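On paper the two schemes come out identical in CU-time; a quick sketch, using a 60fps frame purely as an example:

```python
# CU-time budget: dedicating 4 of 18 CUs to compute for the whole frame vs
# running all 18 CUs on graphics for 7/9 of the frame and all 18 on compute
# for the remaining 2/9. The 60fps frame time is just an example.
FRAME_MS = 1000 / 60
TOTAL_CUS = 18

# Scheme A: static split, 14 CUs graphics + 4 CUs compute, all frame long
gfx_a, cmp_a = 14 * FRAME_MS, 4 * FRAME_MS

# Scheme B: timesliced, all 18 CUs on each workload in turn
gfx_b = TOTAL_CUS * FRAME_MS * (7 / 9)
cmp_b = TOTAL_CUS * FRAME_MS * (2 / 9)

print(f"graphics CU*ms: static={gfx_a:.1f}  timesliced={gfx_b:.1f}")
print(f"compute  CU*ms: static={cmp_a:.1f}  timesliced={cmp_b:.1f}")
# Same totals either way; in practice interleaving tends to win because wide
# jobs keep more of the GPU busy at once.
```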
 
Anyways, again, I meant to watch it, but AFAIK all the ND guy did was make a passing reference to it (don't think he ever mentioned 14+4 though) and say 18 CUs was more than needed for 1080P/60 FPS graphics, so they try to use some on compute, or something like that.

The message is simple, although apparently not simple enough, judging by the 14+4 stuff!

Sony/AMD's engineers believe that, based on the PS4's APU and its 176GB/s of bandwidth to GDDR5, a competently executed game targeting 1080p will use up to 14 CUs for graphics alone, leaving 4 for compute. In the meantime, while devs are optimising their graphics code for PS4 or until compute usage increases, go nuts and use all 18.
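As a rough sanity check of that balance argument, the bytes-per-FLOP ratio at 14 vs 18 CUs looks like this (another back-of-the-envelope, assuming 800MHz and the usual GCN per-CU figures; peak numbers say nothing about real workloads):

```python
# Bandwidth-per-FLOP balance for the PS4 GPU at different CU counts.
# Assumes 176 GB/s to GDDR5, an 800 MHz GPU clock, and 64 lanes x 2 FLOPs/cycle per CU.
BANDWIDTH_GBPS = 176.0

def peak_tflops(cus, clock_ghz=0.8):
    return cus * 64 * 2 * clock_ghz / 1000.0   # GFLOPS -> TFLOPS

for cus in (14, 18):
    tf = peak_tflops(cus)
    bytes_per_flop = BANDWIDTH_GBPS / (tf * 1000)   # GB/s over GFLOP/s
    print(f"{cus} CUs: {tf:.2f} TFLOPS peak, ~{bytes_per_flop:.3f} bytes of bandwidth per FLOP")
```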

Which doesn't really clarify the issue (or make sense) at all, because clearly most PS4 games are not 1080P and 60 FPS simultaneously.
Naughty Dog's first PS4 game, the remastered The Last of Us, is aiming for 60fps.

14+4 should be a bannable offense at this point.
Amen!
 
14+4 should be a bannable offense at this point.

Secret sauce as well. :LOL:

Maybe this (14+4) is the secret sauce for those who need to believe the PS4 is somehow crippled in order to validate their purchase. The art of projection... secret sauce found! :oops:
 
How people choose to use it is down to them. YewYew's dev comment is plain wrong though. It's not a 14+4 configuration. It's an 18-CU, fully equal, fully programmable topology. Perhaps, lost in translation, they said they were using the equivalent of 4 CUs for non-graphics work? Even then, you'd be better off interleaving workloads with graphics rather than dedicating four CUs to non-graphics work, I imagine. Run graphics on 18 CUs for 7/9ths of the frame time, and then run 18 CUs for 2/9ths of the frame time on non-graphics work.

It's from Sony's GDC presentation and represents the workload across all 18 CUs.

[Slide image: GPU workload across the 18 CUs, from Sony's GDC presentation]
 
It's from Sony's GDC presentation and represents the workload across all 18 CUs.
Yes, but that's for filling up empty space in the GPU workload. If you are dedicating 14 CUs to graphics and 4 CUs to compute, it should be better to use the whole GPU for each function instead and deal with timeslices.

Is it even possible to allocate CUs to dedicated, independent jobs? I thought a job is sent across available CUs, and you can't divide a GPU to handle parallel tasks other than how it's scheduled.
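A very schematic sketch of what "filling up empty space" means, if it helps (all the pass lengths, gaps and job names are invented):

```python
# Schematic frame timeline: graphics passes leave small idle bubbles (waiting
# on fixed-function stages, memory, etc.), and asynchronous compute jobs get
# slotted into those bubbles instead of being given CUs of their own.
# All durations (ms) are made up for illustration.
graphics = [("shadow pass", 3.0), ("gbuffer", 4.0), ("lighting", 5.0)]
gaps     = [0.8, 1.5, 0.0]                       # idle time after each pass
compute  = [("particle update", 0.6), ("cloth sim", 1.2)]

timeline, pending, t = [], list(compute), 0.0
for (name, dur), gap in zip(graphics, gaps):
    timeline.append((t, name)); t += dur
    budget = gap                                 # fill the bubble with async work
    while pending and pending[0][1] <= budget:
        cname, cdur = pending.pop(0)
        timeline.append((t, "async: " + cname))
        budget -= cdur; t += cdur
    t += budget                                  # whatever is left stays idle

for start, name in timeline:
    print(f"{start:5.2f} ms  {name}")
```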
 
The dialogue accompanying the presentation probably puts more context on things. In that slide it looks like you squeeze asynchronous compute jobs in around the graphics usage; however, Cerny explained there are 64 command queues for the 18 compute units, which allow prioritising any job, graphics or compute.

I wouldn't be surprised if several years down the line certain compute jobs, critical AI or physics for example, take priority over some less critical graphics jobs. Architecting a graphics rendering pipeline that includes 'do if you can but it's not essential' effects would be interesting.
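Something along these lines, I imagine (completely hypothetical job names and budgets, nothing to do with Sony's actual API):

```python
# Hypothetical frame scheduler: essential jobs always run; optional effects
# run in priority order only while there's frame budget left over.
import heapq

FRAME_BUDGET_MS = 16.6

essential = [("AI update", 2.0), ("physics", 3.0), ("base render", 8.0)]
# (priority, name, cost in ms) - lower number = more important optional effect
optional = [(1, "volumetric fog", 2.5), (2, "extra particles", 1.5),
            (3, "high-quality blur", 2.0)]

spent = sum(cost for _, cost in essential)
heapq.heapify(optional)

ran, skipped = [], []
while optional:
    prio, name, cost = heapq.heappop(optional)
    if spent + cost <= FRAME_BUDGET_MS:
        spent += cost
        ran.append(name)
    else:
        skipped.append(name)

print(f"essential work: {sum(c for _, c in essential):.1f} ms")
print(f"ran optional  : {ran} ({spent:.1f} ms total)")
print(f"skipped       : {skipped}")
```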
 