PlayStation 4K - Codename Neo - Technical analysis

Thought I would come over here and see what difference you guys think this update will make to overall performance. I know true 4K will be impossible, but upscaling is going to be a good option. My question is about the specs. From my understanding the PS4 is heavily bottlenecked by its CPU, so I am surprised to see only a slight speed bump there, yet they have gone with what looks to be a Polaris GPU.

Will the GPU really make much difference to raw fps? Will it help with things such as shadow resolution in The Division, as I always thought this was CPU intensive?

Article with all the specs http://www.eurogamer.net/articles/digitalfoundry-2016-sonys-plan-for-playstation-4k-neo-revealed

 
This thread should cater to just technical discussion, whereas the other covers rumours, business impact, and fall-out. I guess the original title was a bit misleading given the context, so I'll make it more focussed.
 
GCN has full fill rate when writing 64 bit render targets. Modern console games bit-pack two 32 bit render targets into one 64 bit render target, effectively doubling the fill rate (compared to most GPUs).

8 bytes/pixel (64 bpp) * 16 pixels/clock * 853 MHz = 109 GB/s.

Now this is just the ROP color writes. If you use depth buffering or sample any textures, you obviously need even more bandwidth to fully utilize 16 ROPs.

PS4 memory bandwidth is 176 GB/s. Subtract the CPU (up to 20 GB/s) and convert the theoretical maximum to actually achievable bandwidth and you see that 16 ROPs are enough. More than 16 ROPs gives a real benefit only when you are not bandwidth bound. All 64 bpp modes are bandwidth bound on GCN (as GCN has full speed 64 bpp ROP output). This means that HDR output (4x 16 bit float, such as lighting, post processing, etc.) and/or g-buffer rendering (bit-packed to 64 bpp RTs) does not gain anything from extra ROPs. More than 16 ROPs gives you benefits when you are rendering to a 32 bpp (R8G8B8A8, R11G11B10F, etc.) target or when you are rendering depth only (such as shadow maps).
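
To put numbers on the claim above, here is a minimal sketch (Python, using only the figures already quoted in this post) comparing peak 64 bpp ROP write bandwidth against the memory bandwidth left over once the CPU has taken its share:

```python
# ROP write bandwidth vs. available memory bandwidth, using only the figures from this post.
BYTES_PER_PIXEL = 8        # 64 bpp render target (e.g. two bit-packed 32 bpp RTs)
PIXELS_PER_CLOCK = 16      # 16 ROPs
GPU_CLOCK_HZ = 853e6       # clock used in the calculation above

rop_write_bw = BYTES_PER_PIXEL * PIXELS_PER_CLOCK * GPU_CLOCK_HZ / 1e9
print(f"Peak 64 bpp ROP colour write rate: {rop_write_bw:.0f} GB/s")   # ~109 GB/s

TOTAL_BW = 176.0           # PS4 GDDR5 bandwidth, GB/s
CPU_BW = 20.0              # worst-case CPU share, GB/s
print(f"Theoretical bandwidth left for the GPU: {TOTAL_BW - CPU_BW:.0f} GB/s")  # 156 GB/s

# Add depth traffic and texture sampling on top of the ~109 GB/s of colour writes and the
# practically achievable bandwidth, not the 16 ROPs, becomes the limit for 64 bpp targets.
```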

Also, you can use compute shaders to completely avoid the ROP bottlenecks. Most games already do this for post processing. Traditionally high-bandwidth rasterizer jobs such as particle rendering can nowadays also be done with compute shaders. See here: http://amd-dev.wpengine.netdna-cdn....dering-using-DirectCompute-Gareth-Thomas.ppsx. This technique requires zero memory bandwidth for backbuffer operations or blending, because it renders the particles in tiles using LDS (64 KB of local memory inside each CU) to hold the backbuffer tile for blending. The final output is written to memory (once) using raw memory writes instead of ROPs.
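
As a very rough illustration of the tiled blending idea (a CPU-side Python sketch of the concept only, not the actual DirectCompute implementation from the linked slides; the particle data and image size are made up):

```python
import numpy as np

# Conceptual sketch of tile-based particle blending: blend every particle touching a tile
# into a small scratch buffer (the GPU would keep this in LDS), then write the finished
# tile to memory exactly once instead of doing per-particle read-modify-write blends.
TILE = 32
backbuffer = np.zeros((256, 256, 3), dtype=np.float32)

# Hypothetical test particles: (x, y, radius, colour, alpha)
particles = [(40, 50, 12, (1.0, 0.2, 0.1), 0.5),
             (60, 48, 20, (0.1, 0.6, 1.0), 0.3)]

for ty in range(0, backbuffer.shape[0], TILE):
    for tx in range(0, backbuffer.shape[1], TILE):
        tile = backbuffer[ty:ty + TILE, tx:tx + TILE].copy()   # "LDS" copy of the tile
        ys, xs = np.mgrid[ty:ty + TILE, tx:tx + TILE]
        for (px, py, r, colour, alpha) in particles:
            mask = (xs - px) ** 2 + (ys - py) ** 2 <= r * r
            # classic "over" blend, done entirely in the scratch tile
            tile[mask] = tile[mask] * (1.0 - alpha) + np.array(colour) * alpha
        backbuffer[ty:ty + TILE, tx:tx + TILE] = tile          # single write back to memory
```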

Having more ROPs is of course nice, but you can avoid the performance hit in most cases... Except in shadow map rendering. 16 ROPs sucks for shadow map rendering :(

What do you think of the rumored spec of a high-end PS4? So little bandwidth for so many CUs. I know they will probably use colour compression, but it does not work in every case...

Edit: I am only talking about the rumor, nothing beyond the rumored specs. There are still many mysteries, about the CUs for example...
 
I don't see how any analysis can take place when it's still in the rumor state.
 
What do you think of the rumored spec of a high-end PS4? So little bandwidth for so many CUs. I know they will probably use colour compression, but it does not work in every case...

Edit: I am only talking about the rumor, nothing beyond the rumored specs. There are still many mysteries, about the CUs for example...

Yes, they need more efficient CUs and other compression, but the bandwidth and CUs compare favorably to a 380X.

New: 36 CUs, 4.2 TFLOPS and 214 GB/s vs R9 380X: 32 CUs, 3.9 TFLOPS and 182 GB/s.

Assuming they're using even more up to date GCN, I don't think bandwidth will be an issue.
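
For what it's worth, a quick back-of-the-envelope sketch (Python) of bandwidth per TFLOP from the figures quoted above, ignoring delta colour compression and any architectural differences:

```python
# Bandwidth per TFLOP from the figures quoted above.
neo_bw, neo_tflops = 214.0, 4.2        # rumoured Neo: GB/s, TFLOPS
r380x_bw, r380x_tflops = 182.0, 3.9    # R9 380X: GB/s, TFLOPS

print(f"Neo:  {neo_bw / neo_tflops:.1f} GB/s per TFLOP")     # ~51.0
print(f"380X: {r380x_bw / r380x_tflops:.1f} GB/s per TFLOP")  # ~46.7
```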
 
I think balance was a non-issue. PS4 is the base hardware for games and Sony seems to intend to keep it that way. So that's the balanced one; PS4K is the "overclocked" version, so in went as many CUs as they could fit. I think the fact that none of that space went to extra Jaguar cores is telling: Sony isn't too bothered about framerate improvements. They'll sure take 'em if they happen in a title here or there, but the way the 4K hardware was designed tells me the focus was mostly on upping the resolution of PS4 games.
 
What we reasonably know for sure now is the 36 CUs figure. Right?

But how will they be integrated? Are we talking 36 CUs as in a regular GPU, or some kind of dual-GPU trick? Is it possible to use a standard 36 CU configuration while having half of them completely deactivated for compatibility with base PS4 games, giving perfect BC?
 
I think balance was a non-issue. PS4 is the base hardware for games and Sony seems to intend to keep it that way. So that's the balanced one; PS4K is the "overclocked" version, so in went as many CUs as they could fit. I think the fact that none of that space went to extra Jaguar cores is telling: Sony isn't too bothered about framerate improvements. They'll sure take 'em if they happen in a title here or there, but the way the 4K hardware was designed tells me the focus was mostly on upping the resolution of PS4 games.
Why do you see the need for extra Jaguar cores? If the process benefits from more parallel processing then it would benefit more from running as compute, and I still don't see the benefit to framerate. Increased clocks for quicker execution time, on the other hand, may have some benefit to framerates, though I'd like to see a discussion on what exactly greater/better CPU capability contributes to framerate/resolution versus more CUs.
 
I think balance was a non-issue. PS4 is the base hardware for games and Sony seems to intend to keep it that way. So that's the balanced one; PS4K is the "overclocked" version, so in went as many CUs as they could fit. I think the fact that none of that space went to extra Jaguar cores is telling: Sony isn't too bothered about framerate improvements. They'll sure take 'em if they happen in a title here or there, but the way the 4K hardware was designed tells me the focus was mostly on upping the resolution of PS4 games.

I also think the primary use of the extra power is going to be rendering at the highest resolution (with the highest quality presentation) they can while achieving an equivalent framerate to the base version, and while leaving some resources to upscale the rest of the way to 4K if needed. I think 60/30 fps splits are possible, but I don't think they will be common.
 
What we reasonably know for sure now is the 36 CUs figure. Right?

But how will they be integrated? Are we talking 36 CUs as in a regular GPU, or some kind of dual-GPU trick? Is it possible to use a standard 36 CU configuration while having half of them completely deactivated for compatibility with base PS4 games, giving perfect BC?

The lower bandwidth-to-CU ratio means that this is almost certainly a single APU based on newer GCN tech, at least Tonga level and possibly Polaris. As for BC, it seems that the architecture is similar enough, or the APIs are abstracted enough, that it doesn't seem to be an issue. As you say, the system will probably downclock and not expose the additional CUs to non-Neo-aware games.
 
Why do you see the need for extra Jaguar cores? If the process benefits from more parallel processing then it would benefit more from running as compute, and I still don't see the benefit to framerate. Increased clocks for quicker execution time, on the other hand, may have some benefit to framerates, though I'd like to see a discussion on what exactly greater/better CPU capability contributes to framerate/resolution versus more CUs.

I've said this before, but when going from 30 to 60 fps you are asking the CPU to do the same amount of frame-critical work in half the time. I'd be surprised if the increase in CPU workload from additional resolution scales anywhere near as steeply as that.
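
To put that in concrete terms, the per-frame budgets (a trivial calculation, nothing assumed beyond the frame rates):

```python
# Frame-time budgets: going from 30 to 60 fps halves the time available for the
# CPU's frame-critical work, regardless of rendering resolution.
for fps in (30, 60):
    print(f"{fps} fps -> {1000 / fps:.1f} ms per frame")
# 30 fps -> 33.3 ms per frame
# 60 fps -> 16.7 ms per frame
```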
 
Why do you see the need for extra Jaguar cores? If the process benefits from more parallel processing then it would benefit more from running as compute, and I still don't see the benefit to framerate. Increased clocks for quicker execution time, on the other hand, may have some benefit to framerates, though I'd like to see a discussion on what exactly greater/better CPU capability contributes to framerate/resolution versus more CUs.
I think without a doubling of CPU performance, it'd be hard to double framerates in most games. I also doubt most developers will put much effort into refactoring their engines for the PS4K. That's why I think this was built with higher resolutions in mind more than higher fps. Extra Jaguar cores, I believe, would be almost as helpful for CPU tasks as extra CUs are for GPU ones in PS4 applications; I think all games today are built with highly jobified code. Though admittedly all this is out of my depth. My technical understanding of this is very superficial, so don't take my layman's opinion too seriously.
 
The lower bandwidth-to-CU ratio means that this is almost certainly a single APU based on newer GCN tech, at least Tonga level and possibly Polaris. As for BC, it seems that the architecture is similar enough, or the APIs are abstracted enough, that it doesn't seem to be an issue. As you say, the system will probably downclock and not expose the additional CUs to non-Neo-aware games.

For one, I think if games were really CPU limited we'd see parity when comparing the XB1 with the PS4. No, scratch that: because the XB1 has a higher CPU clock, the XB1 should win rather than lose.

The fact that we see framerate differences, with the PS4 winning in almost every case, points to framerates being limited by the GPU rather than the CPU in the majority of games.

That said, I don't see a real need to upgrade the CPU that much.

Not to mention that we're also seeing a mild upclock on the GPU, so even if they limit past games to 18 CUs we should still see a minor improvement. I don't see why they'd downclock the hardware just to hit parity.
We've seen upclocks with other hardware (think 3DS, PSP) and devs/games were perfectly fine with it.

I've also seen more comparisons with people aligning the PS4.5 with the 480 instead of the 380. I wonder what everybody else's take on that is?
 
For one, I think if games were really CPU limited we'd see parity when comparing the XB1 with the PS4. No, scratch that: because the XB1 has a higher CPU clock, the XB1 should win rather than lose.

By how much?
Unless someone knows otherwise (these were the numbers quoted at the time), the Xbox One's 7th core is up to 80% unlocked, while the PS4's is up to 100%.
If this is true, since we have 112 GFLOPS on the Xbox One CPU and 102.4 GFLOPS on the PS4, we can see how much is available for games by dividing that number by 8 cores and multiplying by 6.8 on Xbox One and 7 on PS4.
This would give 95.2 GFLOPS on Xbox One and 89.6 on PS4. That is a 6.25% difference! Not very significant!
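
Reproducing that arithmetic (the 80%/100% seventh-core availability figures are the previous poster's assumption):

```python
# CPU GFLOPS available to games, using the figures quoted above.
xb1_total, ps4_total = 112.0, 102.4    # total CPU GFLOPS across 8 Jaguar cores
xb1_cores, ps4_cores = 6.8, 7.0        # cores usable by games (7th core: 80% vs 100%)

xb1_games = xb1_total / 8 * xb1_cores  # 95.2 GFLOPS
ps4_games = ps4_total / 8 * ps4_cores  # 89.6 GFLOPS
print(f"XB1: {xb1_games:.1f} GFLOPS, PS4: {ps4_games:.1f} GFLOPS")
print(f"Difference: {(xb1_games / ps4_games - 1) * 100:.2f}%")   # 6.25%
```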
 
Also, I feel like this needs to be elaborated on further: assuming the 36 CUs are Polaris cores with 2.5x more efficiency (of course not necessarily reflected in all applications), what sort of ballpark would people put it in? Roughly a GTX 970?
 
I wonder if Mark Cerny is again leading the architecture and engineering teams around the PS4 Neo.

The design is certainly very deliberate, from people more experienced and knowledgeable than pretty much all of us in balancing cost and performance and creating an all-around balanced microarchitecture (in cooperation with AMD's own engineering teams, no doubt). As the PS4 Neo certainly does have memory bandwidth limitations, perhaps there are cost and architecture constraints preventing further improvements beyond the 30% bandwidth increase.

Perhaps they believe the extra CUs, normally idle in some rendering scenarios due to bandwidth constraints, could be used to further enhance GPU compute?

Sebbbi also mentioned rendering techniques that scale resolution and are not so demanding on ALU. Perhaps these techniques reduce typical bandwidth costs as well.
 
As the PS4 Neo certainly does have memory bandwidth limitations, perhaps there are cost and architecture constraints preventing further improvements beyond the 30% bandwidth increase.

I think we're jumping the gun a little in believing these are finalized specs and/or performance deltas. These are "target specs" for devs to work within the bounds of the current (more robust) PS4 SDK, because there is no finalized Neo hardware to work with yet. I'm almost certain the documentation leaks pertain more to PS4 hardware (SDK) that's been overclocked in places (CPU), with the latest GDDR5 memory modules and maybe some early AMD Polaris/APU engineering samples inside these beta Neo kits. From what's being said, the finalized kits will not be in devs' hands until mid-October.

That being said, I'm not expecting a huge leap in performance from the finalized kits either... just some solid step-ups in certain areas that Sony hasn't disclosed to devs yet. Hell, there isn't any mention of UHD drives and/or any integration of PSVR hardware (if any) in the documentation. Anyhow, I think we're only seeing 80% of the picture; it's the other 20% that might be more intriguing than what we know.
 