PS4 Pro Official Specifications (Codename NEO)

Combining downsampling with upsampling/reconstruction seems like a very questionable idea to me.
I was interpreting that as rendering at an intermediate resolution using the checkerboarding route, and then downsampling. The base sampling rate of opaque geometry would be higher than straight 1080p, and then there is the data calculated from the projected quads.
Is the additional information with some delta from true sampling necessarily worse than having no data?

edit: The Engadget article states that those titles go to 1080p supersampled for HDTVs. A misstatement or misquote in one of the two articles?
 
2*FP16 throughput double confirmed for the PS4 Pro:

Mark Cerny said:
One of the features appearing for the first time is the handling of 16-bit variables - it's possible to perform two 16-bit operations at a time instead of one 32-bit operation.
In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance.

He's talking here about features introduced first in the console SoCs that will appear later in PC GPUs, which strongly hints at Vega having this too.
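To illustrate what "two 16-bit operations at a time" means, here's a tiny C++ sketch of the packing side of it (pack_halves and the layout are just my own illustration, not Sony's or AMD's API; the actual op doubling happens in the GPU's packed-math ALUs):

```cpp
#include <cstdint>
#include <cstdio>

// One 32-bit register can hold two fp16 ("half") values side by side.
// Packed-math hardware then runs the same ALU op on both 16-bit lanes
// at once, which is where the 2x rate over fp32 comes from.
// (uint16_t stands in for a real half type; this is only the packing
// idea, not actual GPU code.)
static uint32_t pack_halves(uint16_t lo, uint16_t hi) {
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

int main() {
    uint16_t a = 0x3C00;  // 1.0 in IEEE fp16
    uint16_t b = 0x4000;  // 2.0 in IEEE fp16
    uint32_t packed = pack_halves(a, b);
    std::printf("packed register: 0x%08X\n", static_cast<unsigned>(packed));

    // Cerny's peak numbers: 4.2 TF at fp32, doubled when using packed fp16.
    std::printf("fp16 peak: %.1f TF\n", 4.2 * 2.0);
    return 0;
}
```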


There's also an extra 1GB of DDR3 on the Pro, which seems to be used as swap/cache so that less demanding applications like Netflix can be kept alive:
"We felt games needed a little more memory - about 10 per cent more - so we added a gigabyte of slow, conventional DRAM to the console," Cerny reveals, confirming that it's DDR3 in nature. "On a standard model, if you're switching between an application, such as Netflix, and a game, Netflix is still in system memory even when you're playing the game. We use that architecture because it allows for a very quick swap between applications. Nothing needs to be loaded, it's already in memory."
 
I like the ID mapping for better AA and checkerboard rendering, with temporal hints for a better match. Maybe it looks simple, but in theory would this also help object-aware reprojection, parallax, translation, etc. for VR? It could solve the remaining flaw of their reprojection algorithm (as far as I can tell). Maybe even allow 40 to 120 reprojection.
 
I was interpreting that as rendering at an intermediate resolution using the checkerboarding route, and then downsampling. The base sampling rate of opaque geometry would be higher than straight 1080p, and then there is the data calculated from the projected quads.
Is the additional information with some delta from true sampling necessarily worse than having no data?

There is no intermediate resolution with checkerboard rendering. You are sampling at half your targeted framebuffer, but you never "upsample" anything to create the final image. In the case of titles that target 1800p instead of the full 2160p and use checkerboarding, that would scale directly down to 1080p on an HDTV set rather than up to 2160p on UHD sets.
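Putting rough numbers on that (the 1920x2160 figure is Cerny's; treating 1800p as a 3200x1800 target with a 1600x1800 checkerboard buffer is my assumption):

```cpp
#include <cstdio>

int main() {
    // Sample counts for the resolutions being discussed (millions of pixels).
    const double native2160 = 3840.0 * 2160 / 1e6;  // ~8.3M, native 4K
    const double cb2160     = 1920.0 * 2160 / 1e6;  // ~4.1M, 4K checkerboard buffer
    const double cb1800     = 1600.0 * 1800 / 1e6;  // ~2.9M, 1800p checkerboard (assumed layout)
    const double native1080 = 1920.0 * 1080 / 1e6;  // ~2.1M, HDTV output

    std::printf("native 2160p:       %.1fM\n", native2160);
    std::printf("2160p checkerboard: %.1fM\n", cb2160);
    std::printf("1800p checkerboard: %.1fM (still above 1080p's %.1fM)\n",
                cb1800, native1080);
    return 0;
}
```

So even the 1800p checkerboard path shades noticeably more than a native 1080p frame before it gets scaled down for an HDTV.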
 
Oh, and they answered the question about DCC
"DCC - which is short for delta colour compression - is a roadmap feature that's been improved for Polaris. It's making its PlayStation debut in PS4 Pro," Mark Cerny shares, confirming in the process that this feature was not implemented in the standard PS4 model.

And Cerny explained how checkerboard rendering is h/w assisted in the pro
Checkerboarding up to full 4K is more demanding and requires half the basic resolution - a 1920x2160 buffer - but with access to the triangle and object data in the ID buffer, beautiful things can happen as technique upon technique layers over the base checkerboard output.

"First, we can do the same ID-based colour propagation that we did for geometry rendering, so we can get some excellent spatial anti-aliasing before we even get into temporal, even without paying attention to the previous frame, we can create images of a higher quality than if our 4m colour samples were arranged in a rectangular grid... In other words, image quality is immediately better than 1530p," Cerny explains earnestly.

"Second, we can use the colours and the IDs from the previous frame, which is to say that we can do some pretty darn good temporal anti-aliasing. Clearly if the camera isn't moving we can insert the previous frame's colours and essentially get perfect 4K imagery. But even if the camera is moving or parts of the scene are moving, we can use the IDs - both object ID and triangle ID to hunt for an appropriate part of the previous frame and use that. So the IDs give us some certainty about how to use the previous frame. "

Seems like the ID buffer is the game changer in the Pro.
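Reading between the lines of those two quotes, a resolve pass along these lines seems plausible. This is purely my own C++ sketch - resolvePixel, SampleID and the exact fill rules are invented for illustration; Sony's real buffer format and heuristics obviously aren't public here:

```cpp
#include <cstdint>
#include <vector>

struct Color { float r, g, b, a; };

// One entry of the ID buffer: which object/triangle produced the sample.
// (The real buffer format is Sony's; this struct is only illustrative.)
struct SampleID { uint32_t objectId; uint32_t triangleId; };

// Fill in one pixel that was NOT shaded this frame, loosely following the
// two steps Cerny describes: temporal reuse guided by the IDs, with an
// ID-aware spatial fill as the fallback. Buffers are shown at full
// resolution purely to keep the indexing simple; in a per-pixel
// checkerboard the 4-connected neighbours of a missing pixel are exactly
// the shaded ones.
Color resolvePixel(int x, int y, int width,
                   const std::vector<Color>& curColor,   // this frame's shaded samples
                   const std::vector<Color>& prevColor,  // last frame, fully resolved
                   const std::vector<SampleID>& curId,   // full-res ID buffer, this frame
                   const std::vector<SampleID>& prevId)  // full-res ID buffer, last frame
{
    const int i = y * width + x;

    // Temporal: same object and triangle as last frame at this location,
    // so last frame's colour is a trustworthy stand-in (exact when the
    // camera and the object are static).
    if (curId[i].objectId == prevId[i].objectId &&
        curId[i].triangleId == prevId[i].triangleId)
        return prevColor[i];

    // Spatial: average the shaded neighbours that belong to the same
    // object, so colour never gets propagated across silhouettes.
    Color sum{0.0f, 0.0f, 0.0f, 0.0f};
    int n = 0;
    const int nbr[4] = { i - 1, i + 1, i - width, i + width };
    for (int j : nbr) {
        if (j < 0 || j >= static_cast<int>(curColor.size())) continue;
        if (curId[j].objectId != curId[i].objectId) continue;
        sum.r += curColor[j].r; sum.g += curColor[j].g;
        sum.b += curColor[j].b; sum.a += curColor[j].a;
        ++n;
    }
    if (n > 0) { sum.r /= n; sum.g /= n; sum.b /= n; sum.a /= n; }
    return sum;
}
```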
 
Seems like the ID buffer is the game changer in the Pro.
Certainly one of the more interesting changes.
I wonder if it's a full 32-bit ID buffer; it should compress very nicely, so it might be plausible.

Should be excellent for TAA and post processing etc.
Actually, it could be interesting in almost anything where you render to a buffer and use it later.
 
So it's just two PS4 GPUs, literally! That's how you got to 4.2 TF.

1.843 × 2 × (911/800) ≈ 4.2.


I mean I guess it stands to reason, and I probably had read the CU count was exactly doubled, but I never thought of it like that before.

But apparently the software API isn't allowing them to do a "just works and takes advantage of extra power available" solution like Xbox S does with Xbox One games, so they have to do a 1.6/800 downclocked compatibility mode for PS4 titles. They dun goofed there. Likely also explains why PS4 slim got no clock bump at all while Xbox S did. It wouldn't have done them any good.
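For reference, the raw math does land on roughly 4.2 TF; this is just CUs x 64 lanes x 2 ops per FMA x clock, nothing assumed beyond the published clock speeds:

```cpp
#include <cstdio>

int main() {
    // FLOPS = CUs x 64 lanes x 2 ops (FMA) x clock
    const double ps4 = 18 * 64 * 2 * 0.800e9;  // ~1.843 TF
    const double pro = 36 * 64 * 2 * 0.911e9;  // ~4.198 TF
    std::printf("PS4: %.3f TF, Pro: %.3f TF, ratio: %.2fx\n",
                ps4 / 1e12, pro / 1e12, pro / ps4);
    return 0;
}
```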
 
So it's just two PS4 GPUs, literally! That's how you got to 4.2 TF.

1.843 × 2 × (911/800) ≈ 4.2.


I mean I guess it stands to reason, and I probably had read the CU count was exactly doubled, but I never thought of it like that before.

But apparently the software API isn't allowing them to do a "just works and takes advantage of extra power available" solution like Xbox S does with Xbox One games, so they have to do a 1.6/800 downclocked compatibility mode for PS4 titles. They dun goofed there. Likely also explains why PS4 slim got no clock bump at all while Xbox S did. It wouldn't have done them any good.
Not so for the GPU at least; not sure about the CPU yet.

We just turn off half the GPU and run it at something quite close to the original GPU.

"quite close" so probably slightly above 800mhz. Seems more complicated than it looks.
 
Nice array of add-ons. The ID buffer sounds like the biggest thing they added, a fully custom part for analysing data across frames.
 
Combining downsampling with upsampling/reconstruction seems like a very questionable idea to me.
They are rendering the '4K' buffer with checkerboarding, and using that for 1080p output by downsampling, instead of rendering a native 1080p buffer. Downsampling checkerboard output should be pretty effective, I imagine.
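As a rough picture of that last step, here's a minimal 2x2 box downsample in C++ from a resolved 3840x2160 buffer to 1920x1080 (downsampleToHD is made up for illustration; a real title would use a better filter than a plain box):

```cpp
#include <cstdio>
#include <vector>

struct Color { float r, g, b; };

// Minimal 2x2 box downsample from a resolved 3840x2160 buffer to 1920x1080.
// Only a sketch of the "checkerboard up to 4K, then downsample for HDTV" path.
std::vector<Color> downsampleToHD(const std::vector<Color>& src)  // src is 3840x2160
{
    const int sw = 3840, sh = 2160, dw = sw / 2, dh = sh / 2;
    std::vector<Color> dst(static_cast<size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x) {
            Color c{0.0f, 0.0f, 0.0f};
            for (int dy = 0; dy < 2; ++dy)
                for (int dx = 0; dx < 2; ++dx) {
                    const Color& s = src[(2 * y + dy) * sw + (2 * x + dx)];
                    c.r += s.r; c.g += s.g; c.b += s.b;
                }
            dst[static_cast<size_t>(y) * dw + x] = { c.r / 4, c.g / 4, c.b / 4 };
        }
    return dst;
}

int main() {
    std::vector<Color> frame(3840 * 2160, Color{0.5f, 0.5f, 0.5f});  // dummy resolved 4K frame
    std::vector<Color> hd = downsampleToHD(frame);
    std::printf("output pixels: %zu\n", hd.size());  // 2073600 = 1920x1080
    return 0;
}
```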
 
Secret Sauce Confirmed! :runaway:
I am also intrigued by one of the other secret sauces, the new wavefront hardware scheduler derived from Vega. I wonder how much efficiency can be obtained from it.
Of course the double-rate half-precision secret sauce is also important, as it will make the GPU behave better than a 1060/RX 480 if devs leverage it and use it where the code allows 32-bit ops to be replaced by 16-bit ones without a quality penalty.
 
Half-precision floating point is a relatively new binary floating-point format. Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and Nvidia was the first to implement 16-bit floating point in silicon, with the GeForce FX, released in late 2002.
https://en.wikipedia.org/wiki/Half-precision_floating-point_format

Wasn't it also the Cell processor whose SPEs could do four 16-bit float ops in parallel? There are limitations to its use though - I could imagine that for 4K rendering you often need the precision of 32-bit? But it is good to have the choice to use half where possible I suppose, and from the PS3 era I got the impression that 16-bit is enough in many cases.

Certainly for HDR colour operations I'd imagine using a 16-bit float internally for every part of the RGBA calculations could be very efficient, which is perhaps why it is also supported in shaders?

I look forward to reading more comments from people unlike me that actually know what they are talking about. ;)
 
I'd imagine the heavier shaders are... well, shaders (i.e. fragment programs), so running colour stuff at 16 bits will improve performance.
Unfortunately I do not have any way to test a similar setup at the moment; I'd like to know how much it would help. HDR is FP16 most of the time.
 
I'd imagine the heavier shaders are... well, shaders (i.e. fragment programs), so running colour stuff at 16 bits will improve performance.
Unfortunately I do not have any way to test a similar setup at the moment; I'd like to know how much it would help. HDR is FP16 most of the time.
PS3 pixel shader floating-point ops were all 16 bits. So imagine the boost the PS4 Pro represents for PS3-style shader code, compared with the jump the OG PS4 offered.
No wonder TLOU remaster can hit native 4K.

I foresee a lot of empty register space in future base PS4 engines, as taking advantage of 16-bit processing on the Pro will force developers to use it on the OG PS4 as well.
 
Wasn't it also the Cell processor whose SPEs could do four 16-bit float ops in parallel? There are limitations to its use though - I could imagine that for 4K rendering you often need the precision of 32-bit? But it is good to have the choice to use half where possible I suppose, and from the PS3 era I got the impression that 16-bit is enough in many cases.

Certainly for HDR colour operations I'd imagine using a 16-bit float internally for every part of the RGBA calculations could be very efficient, which is perhaps why it is also supported in shaders?

I look forward to reading more comments from people unlike me that actually know what they are talking about. ;)

https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-95#post-1933199

Sebbbi gives some answers about int16/fp16 in this thread, July 2016.
 
Wasn't it also the Cell processor whose SPEs could do four 16-bit float ops in parallel? There are limitations to its use though - I could imagine that for 4K rendering you often need the precision of 32-bit? But it is good to have the choice to use half where possible I suppose, and from the PS3 era I got the impression that 16-bit is enough in many cases.
Cell SPU had a 128-bit wide SIMD unit (similar to SSE). Four 32-bit ALU ops (fp32 or int32) per cycle. It could also do eight 16-bit or sixteen 8-bit integer ops. It had no fp16 (half float) support.

PS3 GPU pixel shaders on the other hand had both fp16 and fp32 ALUs. fp32 was slow (in complex shaders). IIRC vertex shaders were pure fp32. It wasn't a unified shading architecture (it was based on the GeForce 7 series).
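Just to spell out the lane arithmetic behind those numbers (nothing SPU-specific, it's the same for any 128-bit SIMD register):

```cpp
#include <cstdio>

int main() {
    // Lane counts for a 128-bit SIMD register, as on the Cell SPU (or SSE):
    const int bits = 128;
    std::printf("fp32/int32 lanes: %d\n", bits / 32);  // 4
    std::printf("int16 lanes:      %d\n", bits / 16);  // 8
    std::printf("int8 lanes:       %d\n", bits / 8);   // 16
    return 0;
}
```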
 
So it's just two PS4 GPUs, literally! That's how you got to 4.2 TF.

1.843 × 2 × (911/800) ≈ 4.2.


I mean I guess it stands to reason, and I probably had read the CU count was exactly doubled, but I never thought of it like that before.

But apparently the software API isn't allowing them to do a "just works and takes advantage of extra power available" solution like Xbox S does with Xbox One games, so they have to do a 1.6/800 downclocked compatibility mode for PS4 titles. They dun goofed there. Likely also explains why PS4 slim got no clock bump at all while Xbox S did. It wouldn't have done them any good.

It's not, "just two PS4 GPUs, literally!" as half the Eurogamer article pointing out additional features on the new GPU shows quite clearly.

That an Xbox One game "just works and takes advantage of extra power available" on the Xbox One S probably has a lot more to do with the hardware differences being minimal than it does with the system's APIs.
 