PlayStation 4K - Codename Neo - Technical analysis

Well, yes :). But even that. How can devs use the power to its fullest?

Also, I feel like this needs to be elaborated further: assuming the 36 CUs are Polaris cores with 2.5x more efficiency (of course not necessarily reflected in all applications), what sort of ballpark would people put it in? ~= GTX 970?

The 2.5x efficiency refers to performance per watt rather than performance per CU (at a given clock speed). Polaris could end up being no faster at all than GCN 1.0-1.2 on a per-CU/clock basis. We haven't really seen any improvement at the performance-per-CU level since GCN first launched (other areas such as memory bandwidth use have improved significantly, though). If Polaris has the same performance per CU and other elements as GCN 1.2, then the PS4K's GPU will fall closely in line with the R9 380x - probably a few percent faster. That puts it around the same speed as a GTX 770 in modern games at 1080p (the GTX 970 is about 50% faster):

http://www.techpowerup.com/reviews/Gigabyte/GTX_980_Ti_XtremeGaming/23.html

Of course it could be any amount faster than that depending on how much, if any more performance Polaris brings at the per CU (and other basic elements) level.
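
For anyone who wants to sanity-check the flops side of that comparison, here's a back-of-envelope sketch in Python. It assumes the standard GCN layout (64 shaders per CU, 2 FLOPs per shader per clock via FMA) and the rumoured ~911MHz PS4K clock - assumptions, not confirmed specs:

```python
# Back-of-envelope peak FP32 throughput, assuming the standard GCN
# layout (64 shaders per CU, 2 FLOPs per shader per clock via FMA).
# The ~911MHz PS4K clock is the rumoured figure, not a confirmed spec.

def gcn_tflops(cus, clock_mhz, shaders_per_cu=64, flops_per_clock=2):
    """Peak FP32 TFLOPS for a GCN-style part."""
    return cus * shaders_per_cu * flops_per_clock * clock_mhz * 1e6 / 1e12

ps4k = gcn_tflops(36, 911)   # rumoured PS4K GPU: 36 CUs @ ~911MHz
r380x = gcn_tflops(32, 970)  # R9 380x: 32 CUs @ 970MHz

print(f"PS4K ~{ps4k:.2f} TFLOPS, 380x ~{r380x:.2f} TFLOPS "
      f"({(ps4k / r380x - 1):+.0%} for the PS4K)")
# -> PS4K ~4.20 TFLOPS, 380x ~3.97 TFLOPS (+6% for the PS4K)
```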
 
Also, I feel like this needs to be elaborated further: assuming the 36 CUs are Polaris cores with 2.5x more efficiency (of course not necessarily reflected in all applications), what sort of ballpark would people put it in? ~= GTX 970?
Stock GTX 980 is 4.6 TFlops and has 224 GB/s memory bandwidth, new PS4 is 4.2 TFlops with 218 GB/s memory bandwidth. I'd say these two are going to be comparable in performance.
 
I find it hard to believe the ALUs are notably faster. FMADDs already execute one calculation per clock - you can't improve that, only add more processing (hence the development of insanely wide GPUs in the first place). The only area of improvement, AFAICS (unless I'm well wrong in my understanding), is data efficiency and keeping things active, approaching the peak possibilities. Which we're supposed to be close to with well-written code using compute anyway.

Thus I can only see moderate improvements per CU for new architectures, as with any processor. Big improvements come from going wider, raising clocks, or fixing bottlenecks.
 
Maybe under the right conditions, such as DX12 titles or ones that heavily utilise those ACE units, we'll see drastic improvements over the Nvidia GPUs, or at least matching them. If that's the case then it should be even more future-proof.
 
I've also seen more comparisons with people aligning the PS4.5 with the 480 instead of the 380. I wonder what everybody else's take on that is?

If it's Polaris-based then it will almost certainly be a 480 variant - probably an underclocked version. If it's Tonga-based then the 380x is a very good performance comparison. The problem is that we have no idea what performance improvements, if any, Polaris will bring over Tonga on a per-CU / GB-of-bandwidth basis, and thus for now all we can do is take the 380x performance as a baseline comparison point.

If you take out the 20GB/s of bandwidth that can be allocated to the CPU, then the rumoured specs of the PS4K put it at 6% more CU throughput (flops and texturing) and 8% more memory bandwidth than the 380x, but only 94% of its overall fill rate and geometry throughput. Obviously a move to Polaris may make those comparisons meaningless.
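
The same arithmetic for those bandwidth and fill-rate figures, as a quick sketch. The PS4K numbers and the 20GB/s CPU reservation are rumoured/assumed; the 380x figures are board specs, and 32 ROPs on both parts is an assumption:

```python
# The same back-of-envelope arithmetic for the bandwidth and fill-rate
# claims above. PS4K numbers and the 20GB/s CPU reservation are
# rumoured/assumed; the 380x figures are board specs.

ps4k_bw = 218 - 20   # GB/s left for the GPU after the CPU's share
r380x_bw = 182.4     # GB/s on the 380x (256-bit GDDR5 @ 5.7Gbps)
print(f"memory bandwidth: {ps4k_bw / r380x_bw - 1:+.1%}")  # -> +8.6%

# Fill rate and geometry throughput scale with clock if the ROP count
# and front end are unchanged (32 ROPs assumed on both parts).
print(f"fill rate: {911 / 970:.0%} of the 380x")           # -> 94%
```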

Stock GTX 980 is 4.6 TFlops and has 224 GB/s memory bandwidth, new PS4 is 4.2 TFlops with 218 GB/s memory bandwidth. I'd say these two are going to be comparable in performance.

They're completely different architectures and so not at all comparable in this way. Nvidia is much more efficient than AMD's current architectures in terms of both performance/flop and performance/GB of bandwidth. The 960, for example, is only a few percent slower than the 380x at 1080p, and it achieves that with 2.3 TFLOPS and 112 GB/s. The 980 is double that in both areas and as such clocks in at over 70% faster than the 380x.

Polaris may greatly improve performance per CU & GB of bandwidth compared with Tonga, but a 70% increase in both areas seems like a very unrealistic expectation.
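
To put some rough numbers on that efficiency gap, here's a sketch normalising the relative 1080p performance figures quoted above by each card's peak flops and bandwidth. The relative performance values are rough: the 960's "few percent slower" is pencilled in as 3% and the 980's "over 70% faster" as 70%, purely for illustration:

```python
# Normalising relative 1080p performance by peak flops and bandwidth.
# Relative perf values are rough illustrations of the figures quoted
# above, not benchmark data.

cards = {
    # name: (relative 1080p perf vs 380x, peak TFLOPS, bandwidth GB/s)
    "R9 380x": (1.00, 3.97, 182.4),
    "GTX 960": (0.97, 2.30, 112.0),
    "GTX 980": (1.70, 4.60, 224.0),
}

for name, (perf, tflops, bw) in cards.items():
    print(f"{name}: {perf / tflops:.2f} perf/TFLOP, "
          f"{perf / bw * 100:.2f} perf per 100GB/s")
# The Maxwell parts come out roughly 40-70% higher on both metrics,
# which is why a 4.2 TFLOPS GCN part shouldn't be pencilled in at
# GTX 980 level on flops alone.
```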
 
I get that Nvidia cards are popular, but I think you get a more accurate estimate comparing to AMD cards. My guess is the GPU will land somewhere in the range from a bit faster than a 380X up to halfway between a 380X and a 290.
 
I find it hard to believe the ALUs are notably faster. FMADDs already execute one calculation per clock - you can't improve that, only add more processing (hence the development of insanely wide GPUs in the first place). The only area of improvement, AFAICS (unless I'm well wrong in my understanding), is data efficiency and keeping things active, approaching the peak possibilities. Which we're supposed to be close to with well-written code using compute anyway.

Thus I can only see moderate improvements per CU for new architectures, as with any processor. Big improvements come from going wider, raising clocks, or fixing bottlenecks.

I don't disagree with you there; what I'm really saying when I refer to CU efficiency improvements is "the ability to achieve more with fewer CUs or GB/s". As you say, that would probably be achieved by improvements to other areas of the chip which increase utilisation of the available CUs and bandwidth.

I'd guess we'll see some modest improvements from Polaris in this regard over Tonga, so I wouldn't be surprised to see the PS4K GPU performing more in line with a 780 (non-Ti) on the Nvidia side rather than the 770, or, as mpg1 says above, somewhere between the 380x and 290.
 
How can you even compare performance on console and PC?! One is low-level, close-to-the-metal, low-overhead; the other is a PC with all its API shortcomings and limits. Take a best-case DX12-on-PC scenario for AMD, add to that, and maybe you come close to what you can extract out of a console APU.

Using TPU and somehow trying to extrapolate performance is ridiculous.
 
How can you even compare performance on console and PC?! One is low-level, close-to-the-metal, low-overhead; the other is a PC with all its API shortcomings and limits. Take a best-case DX12-on-PC scenario for AMD, add to that, and maybe you come close to what you can extract out of a console APU.

Using TPU and somehow trying to extrapolate performance is ridiculous.

It's been shown numerous times, particularly by Digital Foundry, that PC GPUs of a similar spec to the console GPUs perform similarly. The days of massive DX9 overheads are long gone; DX11 is far more efficient, and DX12 will pretty much eliminate all remaining overhead (where developers choose to fully leverage it). Obviously there are still things that can be done on console that can't on PC, but the combined advantage of those in most cases (if developers even bother to optimise to that level) will probably show up as single-digit percentage increases. If you're expecting a GPU with 380x specs to perform like a Fury because it's in a console, then you're setting yourself up for disappointment.
 
I find it hard to believe the ALUs are notably faster. FMADDs already execute one calculation per clock - you can't improve that, only add more processing (hence the development of insanely wide GPUs in the first place). The only area of improvement, AFAICS (unless I'm well wrong in my understanding), is data efficiency and keeping things active, approaching the peak possibilities. Which we're supposed to be close to with well-written code using compute anyway.

Thus I can only see moderate improvements per CU for new architectures, as with any processor. Big improvements come from going wider, raising clocks, or fixing bottlenecks.
A clue as to how they're improving performance:

https://forum.beyond3d.com/posts/1909150/

1. Culling non-visible triangles. Kyro 2 is back!
2. Wavefront management.

I think both are fields in which Nvidia has always been ahead.
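
For the curious, here's a minimal sketch of the sort of test an early triangle-culling stage performs: rejecting back-facing and degenerate triangles before they occupy rasteriser and wavefront resources. Purely illustrative; the actual Polaris primitive-discard logic isn't publicly documented:

```python
# Minimal sketch of an early triangle-culling test: reject back-facing
# and degenerate (zero-area) triangles before they consume rasteriser
# and wavefront resources. Illustrative only; the real Polaris
# primitive-discard hardware isn't publicly documented.

def signed_area_2d(a, b, c):
    """Twice the signed area of a screen-space triangle."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def should_cull(a, b, c, eps=1e-6):
    area2 = signed_area_2d(a, b, c)
    if abs(area2) < eps:  # degenerate: covers no pixels
        return True
    return area2 < 0      # back-facing under CCW winding

print(should_cull((0, 0), (1, 0), (0, 1)))  # False: front-facing
print(should_cull((0, 0), (0, 1), (1, 0)))  # True: back-facing
```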
 
Those who waited for Polaris 10 to be an awesome PC card wanted ~Fury-like performance for a cheap price. :D

Well, the full Polaris 10 is rumoured to feature 40 CUs, and we don't know what clock speed it's running at yet. It may only be running in the 900MHz range in the PS4K, but Pascal suggests that the new node allows for significantly increased clock speeds on the PC side. So if it were clocked at 1400MHz, for example, it could be pretty close to Fury performance from the core point of view. Memory is also an unknown; GDDR5X doesn't seem unrealistic given that it's rumoured to feature in GP104-400.
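
A quick sanity check on the "close to Fury" idea, reusing the peak-flops arithmetic from the earlier sketch (the 1400MHz clock is hypothetical, as stated above):

```python
# Sanity check on the "close to Fury" idea. 1400MHz is a hypothetical
# clock, not a known Polaris 10 spec.

def gcn_tflops(cus, clock_mhz):
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

print(gcn_tflops(40, 1400))  # hypothetical full Polaris 10 -> ~7.17
print(gcn_tflops(56, 1000))  # R9 Fury (non-X): 56 CUs @ 1000MHz -> ~7.17
```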
 
Yeah, I posted it in the Polaris thread. Seems iffy. If it's true, then the only thing that would make sense when Sony says "improved GCN" is what? Tonga with its color compression?


This is speculation territory, and they haven't announced anything yet.

If their API is as low level as it appears to be, maybe changing from GCN 1.1 to 3 or 4 could break things. Tonga is GCN 1.2, versus Sea Islands which was 1.1, if I recall.

The new PS4 could just be a shrink of Orbis with double the CU count. We'll see.
 
Yeah, I posted it in the Polaris thread. Seems iffy. If it's true, then the only thing that would make sense when Sony says "improved GCN" is what? Tonga with its color compression?

Improved to have HEVC decoding, possibly adding encoding as well if the game DVR is 1080p60?
 
This is speculation territory, and they haven't announced anything yet.

If their API is as low level as it appears to be, maybe changing from GCN 1.1 to 3 or 4 could break things. Tonga is GCN 1.2, versus Sea Islands which was 1.1, if I recall.

The new PS4 could just be a shrink of Orbis with double the CU count. We'll see.

I can't see it. Not when paired with that amount of memory bandwidth.
 
In addition, PC CPUs are generally faster than the console CPUs at present.

In PC land, the CPUs (at least from Intel) deliver discrete CPU power without sharing a heat spreader with a GPU, while the consoles went the APU route because of the economics, thermals and wattage.

I have a question. Most of the PDFs I've read from Sony devs imply that they (being the coders working on the hardware) believed 60fps at 1080p (or thereabouts) was definitely possible, but discovered some bottlenecks. However, those PDFs are based on the six available cores, NOT on the recently announced unlocked seventh core, which the Xbox One unlocked earlier.

We know it takes time to code and revise code. Would it make sense to believe that the regular PS4 and Xbone will see a boost, or much more stable target frame rates, in future games?

Uncharted 4 has yet to be released, so do we know if there was time for it to receive a boost? Usually when internal stuff gets revised, it's done way ahead of any announced reports.

Then, since the PS4K (I prefer PS4 Kaio-ken) has its "paper" numbers benefiting from the die shrink and new process tech improvements/efficiency, and since all seven cores would be available in the initial dev kits, 1080p 60fps with triple-buffered V-sync obviously becomes the standard, while higher resolutions will be testing grounds.
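
As a rough upper bound on what that seventh core buys, here's a sketch assuming (optimistically) perfectly parallel game code across cores - in practice the OS still reserves a slice of that core and scaling is never perfect:

```python
# Rough upper bound on what unlocking the seventh Jaguar core buys,
# assuming (optimistically) perfectly parallel game code. In reality
# the OS keeps a slice of that core and scaling is never perfect.

frame_ms = 1000 / 60           # 16.7ms budget per frame at 60fps
six_core_ms = 6 * frame_ms     # core-milliseconds before the unlock
seven_core_ms = 7 * frame_ms   # ... and after

print(f"+{seven_core_ms / six_core_ms - 1:.0%} CPU headroom")  # +17%
```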
 