PS4 Pro Official Specifications (Codename NEO)

Is there a compliant LogLuv/NAO32 type of format that would have the same memory footprint as FP16 and could get decent results?
Would that require ALUs to natively accept something other than ints and standard FP? I mean, it would need to convert the values back to linear light before doing computations with it, and the result would have to be reconverted to this log format before storing?
 
This isn't as revolutionary as a couple of you guys think. There was a time when all 3dfx accelerator PCI cards could only process fp16.

Here is a comparison of ALU processing and blending color at fp32 (left) vs fp16 (right).
[Image: PowerVR SGX comparison, RGBA8888 vs RGB565]


When you start using 16-bit precision in various aspects of your visuals you are limited in what you can do, otherwise you'll end up with visuals reminiscent of early 2000s games. You probably don't want to do too much shading at 16-bit precision.
Not right. 16 bits in this case means 16 bits per pixel (total) = R5G6B5. 32 bits in this case means 32 bits per pixel = 8 bits per channel = R8G8B8A8.
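To make the distinction concrete, here is a minimal C++ sketch (illustrative only, helper names made up) of how the two per-pixel formats pack their channels:

```cpp
#include <cstdint>

// Pack an 8-bit-per-channel color into a 16 bpp R5G6B5 word by dropping the
// low bits of each channel (5 bits red, 6 bits green, 5 bits blue).
uint16_t PackR5G6B5(uint8_t r, uint8_t g, uint8_t b)
{
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Pack the same color into a 32 bpp R8G8B8A8 word: 8 bits per channel, nothing lost.
uint32_t PackR8G8B8A8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t(r) << 24) | (uint32_t(g) << 16) | (uint32_t(b) << 8) | uint32_t(a);
}
```

So "16 bit" in that comparison image is a storage format with 5-6 bit integer channels, not 16 bit floating point math.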

Nobody uses 32 bit float render targets (R32G32B32A32 = 128 bits per pixel). 128 bpp rendering is very slow. ROP output is 1/4 rate and texture filtering is 1/4 rate (on both GTX 1080 and RX 480).

These old GPUs did math at 10/12 bit fixed point precision. Floating point HDR rendering was not supported at all. No fp16 and definitely no fp32. Radeons had only fp24 ALUs until DX10 mandated fp32 math. SM2.0 (DX9) was the first shader model to support floating point processing.

Also, these marketing images have dithering disabled. With dither, the banding is greatly reduced. Dithering is still useful, especially when combined with temporal antialiasing. TAA is excellent at filtering out dither (8xTAA recovers 3 bits of extra color depth from random dither).
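As a rough illustration of the dithering point, here is a small C++ sketch of ordered dithering before quantization (illustrative only, using the standard 4x4 Bayer matrix; not anyone's actual implementation):

```cpp
#include <cstdint>
#include <algorithm>

// Standard 4x4 Bayer ordered-dither matrix, values 0..15.
static const int kBayer4x4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

// Quantize an 8-bit channel down to 'bits' bits, adding an ordered dither
// offset first so banding turns into high-frequency noise that temporal AA
// can average away over several frames.
uint8_t DitherQuantize(uint8_t value, int bits, int x, int y)
{
    const int step      = 256 >> bits;                            // quantization step size
    const int threshold = (kBayer4x4[y & 3][x & 3] * step) / 16;  // dither offset in [0, step)
    const int dithered  = std::min(255, int(value) + threshold);
    return uint8_t((dithered / step) * step);                     // snap to the coarser grid
}
```

With a per-frame random or rotated dither offset instead of a fixed matrix, averaging N frames in TAA recovers roughly log2(N) extra bits, which is where the "8xTAA recovers 3 bits" figure above comes from.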
 
Would that require ALUs to natively accept something other than ints and standard FP? I mean, it would need to convert the values back to linear light before doing computations with it, and the result would have to be reconverted to this log format before storing?
I've no idea ;) but I wonder. The format would be 5 bits and 5 bits (U and V), and the log of the luminance would be stored in 6 bits.
 
Is there a compliant LogLuv/NAO32 type of format that would have the same memory footprint as FP16 and could get decent results?
RGBA16f requires 2x bandwidth compared to LogLUV (and is slightly higher quality). A similar format to LogLUV exists (DXGI_FORMAT_R9G9B9E5_SHAREDEXP). See below.
Would that require ALUs to natively accept something other than ints and standard FP? I mean, it would need to convert the values back to linear light before doing computations with it, and the result would have to be reconverted to this log format before storing?
LogLUV is not directly compatible with fixed point and floating point math. LogLUV luminance (16 bit) is logarithmic, while the UV channels (8 bits each) are normalized integers (= fixed point). Floating point, on the other hand, is a piecewise linear approximation of a logarithmic scale (the exponent is logarithmic and the mantissa is linear). You could get similar results as LogLUV with a 32 bit (per pixel) image format consisting of 16 bit float luminance and 8+8 bit UV (float however loses one bit for sign).
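For intuition, here is a hedged C++ sketch of a LogLuv-style encode (illustrative only: the real LogLuv/NAO32 bit layouts and constants differ, and the log2 range below is an arbitrary assumption):

```cpp
#include <cstdint>
#include <cmath>
#include <algorithm>

// Pack linear HDR RGB into a 32-bit LogLuv-style word:
// 16 bits of log-encoded luminance + 8 + 8 bits of chromaticity (u', v').
uint32_t EncodeLogLuv32(float r, float g, float b)
{
    // Linear Rec.709/sRGB primaries to CIE XYZ.
    const float X = 0.4124f * r + 0.3576f * g + 0.1805f * b;
    const float Y = 0.2126f * r + 0.7152f * g + 0.0722f * b;
    const float Z = 0.0193f * r + 0.1192f * g + 0.9505f * b;

    // CIE 1976 u'v' chromaticity (roughly in [0, 0.62]).
    const float denom = std::max(X + 15.0f * Y + 3.0f * Z, 1e-8f);
    const float u = (4.0f * X) / denom;
    const float v = (9.0f * Y) / denom;

    // Log-encode luminance: map log2(Y) from an assumed [-16, +16] range to [0, 65535].
    const float logY = std::clamp(std::log2(std::max(Y, 1e-6f)), -16.0f, 16.0f);
    const uint32_t Le = uint32_t((logY + 16.0f) / 32.0f * 65535.0f + 0.5f);
    const uint32_t Ue = uint32_t(std::clamp(u / 0.62f, 0.0f, 1.0f) * 255.0f + 0.5f);
    const uint32_t Ve = uint32_t(std::clamp(v / 0.62f, 0.0f, 1.0f) * 255.0f + 0.5f);
    return (Le << 16) | (Ue << 8) | Ve;
}
```

Because the luminance is stored logarithmically and the chromaticities as normalized integers, you cannot blend or filter such a target directly; everything has to be decoded back to linear before any math, which is exactly the incompatibility described above.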

DXGI_FORMAT_R9G9B9E5_SHAREDEXP is similar to LogLUV: 32 bits per pixel with a shared "luminance". The shared exponent is 5 bits (just like the fp16 exponent), and the mantissas (one per RGB channel) are 9 bits (vs 10 bits in fp16). GPUs can natively filter textures of this format, but unfortunately cannot render to it. It is close in quality to RGBA16f, but requires only half the bandwidth.

There are only float and integer ALUs in GPUs, but texture filtering and ROPs have format conversions to other formats. For example sRGB is not linear, but you can still filter and render to sRGB formats. The texture filtering unit converts texels to floating point (or fixed point) before filtering them. ROPs do the same. LogLUV could be supported as a texture/RT format, but I doubt this will happen, since DXGI_FORMAT_R9G9B9E5_SHAREDEXP is practically the same thing and is straightforward to convert to RGBA16f (bitscan left to find the first set bit of each RGB channel to convert denormal numbers to normalized floats). GPUs already have filtering and ROP blend hardware to handle RGBA16f, so this shouldn't cost much extra. Logarithmic math units for texture filtering and blending, on the other hand, would mean new hardware.
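For reference, a minimal C++ encoder for the shared-exponent layout, following the publicly documented RGB9E5 rules (9-bit mantissas with no implicit leading 1, 5-bit exponent with bias 15); treat it as a sketch rather than a drop-in implementation:

```cpp
#include <cstdint>
#include <cmath>
#include <algorithm>

// Pack three non-negative floats into the 32-bit R9G9B9E5 shared-exponent layout.
uint32_t EncodeRGB9E5(float r, float g, float b)
{
    const int   N = 9, B = 15, Emax = 31;                             // mantissa bits, bias, max exponent
    const float maxVal = (511.0f / 512.0f) * float(1 << (Emax - B));  // largest representable value (~65408)

    r = std::clamp(r, 0.0f, maxVal);
    g = std::clamp(g, 0.0f, maxVal);
    b = std::clamp(b, 0.0f, maxVal);
    const float maxC = std::max(r, std::max(g, b));

    // Preliminary shared exponent, bumped by one if the largest mantissa would overflow.
    int   e     = std::max(-B - 1, int(std::floor(std::log2(std::max(maxC, 1e-30f))))) + 1 + B;
    float scale = std::exp2(float(e - B - N));
    if (int(std::round(maxC / scale)) == (1 << N)) { ++e; scale *= 2.0f; }

    const uint32_t rm = uint32_t(std::round(r / scale));
    const uint32_t gm = uint32_t(std::round(g / scale));
    const uint32_t bm = uint32_t(std::round(b / scale));
    return rm | (gm << 9) | (bm << 18) | (uint32_t(e) << 27);
}
```

Decoding is just mantissa * 2^(exponent - 24) per channel, which is why the conversion to RGBA16f in the texture unit is cheap.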
 
These old GPUs did math at 10/12 bit fixed point precision. Floating point HDR rendering was not supported at all. No fp16 and definitely no fp32. Radeons had only fp24 ALUs until DX10 mandated fp32 math. SM2.0 (DX9) was the first shader model to support floating point processing.

The Geforce FX / NV3x (Radeon R300 contemporaries) pixel shaders actually had FP32 ALUs which supported FP16 or FP32 operations.
But they were dirt-slow at FP32, supposedly because of memory bandwidth limitations. And when standard SM2.0 24-bit shaders were used in games, those GPUs had to do all FP24 operations at FP32 precision, which is (one of the reasons) why DX9 performance on NV3x cards was generally pretty bad.

I don't know the exact reason why nvidia went with FP32 ALUs back in the day. Maybe OpenGL 2.0's fragment shaders supported both FP32 and FP16 and nvidia was shooting for full compliance on both APIs?
 
The Geforce FX / NV3x (Radeon R300 contemporaries) pixel shaders actually had FP32 ALUs which supported FP16 or FP32 operations.
But they were dirt-slow at FP32, supposedly because of memory bandwidth limitations. And when standard SM2.0 24-bit shaders were used in games, those GPUs had to do all FP24 operations at FP32 precision, which is (one of the reasons) why DX9 performance on NV3x cards was generally pretty bad.
Yes, GeForce FX 5800 was the first Nvidia card with floating point pixel shader ALUs. It was the first Nvidia DX9 SM 2.0 compatible (SM 2.X actually) card. Nvidia kept their FP16/FP32 design (half rate FP32) in the GeForce 6000 and 7000 series (the PS3 GPU is based on the 7000 series).

The GeForce 4 series was DX8 / SM 1.3 (IIRC) and the Radeon 8000 series was DX8 / SM 1.4. IIRC the fixed point type in SM1 (1.0-1.4) was limited to the [-8,+8] range. 12 bit fixed point math was thus enough. IIRC texture tiling (UV range) was also limited to 8 (you could not repeat a texture more than that, as the ALUs wouldn't have had the range to calculate the UVs).
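To make that range concrete, an illustrative C++ sketch of a signed 12-bit fixed-point value with 8 fractional bits (the actual register formats varied per GPU):

```cpp
#include <cstdint>
#include <cmath>
#include <algorithm>

// Illustrative s3.8 fixed point: 1 sign bit, 3 integer bits, 8 fraction bits.
// Covers [-8, +8 - 1/256] in steps of 1/256, roughly the SM1.x register range
// described above.
int16_t ToFixed12(float x)
{
    x = std::clamp(x, -8.0f, 8.0f - 1.0f / 256.0f);
    return int16_t(std::lround(x * 256.0f));
}

float FromFixed12(int16_t v)
{
    return float(v) / 256.0f;
}
```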

Vertex shaders had 32 bit floating point ALUs in DX8 (I think this was mandated). Coordinate transformation needs good precision float math. Vertex shaders also had a separate instruction set (and GPUs had separate vertex shader and pixel shader hardware). I remember SM 2.0 very well. It brought 64 instruction slots and float math to pixel shaders. A huge improvement over previous shaders. I still remember writing hand tuned DX ASM to fit my lighting math into those 64 instruction slots. It felt like solving a puzzle :D

Result, running on Radeon 9700 Pro (too bad there's no better quality video available):
 
Yes, GeForce FX 5800 was the first Nvidia card with floating point pixel shader ALUs. It was the first Nvidia DX9 SM 2.0 compatible (SM 2.X actually) card. Nvidia kept their FP16/FP32 design (half rate FP32) in the GeForce 6000 and 7000 series (the PS3 GPU is based on the 7000 series).

The GeForce 4 series was DX8 / SM 1.3 (IIRC) and the Radeon 8000 series was DX8 / SM 1.4. IIRC the fixed point type in SM1 (1.0-1.4) was limited to the [-8,+8] range. 12 bit fixed point math was thus enough. IIRC texture tiling (UV range) was also limited to 8 (you could not repeat a texture more than that, as the ALUs wouldn't have had the range to calculate the UVs).

Vertex shaders had 32 bit floating point ALUs in DX8 (I think this was mandated). Coordinate transformation needs good precision float math. Vertex shaders also had a separate instruction set (and GPUs had separate vertex shader and pixel shader hardware). I remember SM 2.0 very well. It brought 64 instruction slots and float math to pixel shaders. A huge improvement over previous shaders. I still remember writing hand tuned DX ASM to fit my lighting math into those 64 instruction slots. It felt like solving a puzzle :D

Result, running on Radeon 9700 Pro (too bad there's no better quality video available):

I felt like I was watching something that over time became Trials


 
That's quite a poorly written article indeed. I don't think the author knows what checkerboard rendering actually is; he still thinks it's 4 pixels extrapolated to 16 pixels.

And the parts where he 'recites' Microsoft PR about their Scorpio spec sheet are quite funny.
 

They're doubling down on the 2x SP rate for FP16 in the Pro spec:

But one comment in particular from a Sony developer I spoke to at the PlayStation Meeting stands out - that there's more to the PlayStation 4 Pro than the checkerboard upscaling alone, along with Mark Cerny's comment that the new hardware has "adopted many new features from the AMD Polaris architecture as well as several even beyond it". We already know that the revised AMD GCN cores available in the PS4 Pro are able to process two 16-bit floating point operations in the time taken for the base PS4 hardware to complete one, meaning that revised, Pro-optimised shader code can be much faster.

I wonder how much of a game changer this could be.
If developers were already using some FP16 calculations on the original PS4 (assuming Liverpool already had GCN2's FP16-specific instructions, IIRC for lower bandwidth requirements and lower latencies?), then maybe this code will just naturally run faster on the Pro.
 
That's quite a poorly written article indeed. I don't think the author knows what checkerboard rendering actually is; he still thinks it's 4 pixels extrapolated to 16 pixels.

And the parts where he 'recites' Microsoft PR about their Scorpio spec sheet are quite funny.

I put my blond wig on.. is this checkerboard stuff something like this:

http://twvideo01.ubm-us.net/o1/vaul...s/El_Mansouri_Jalal_Rendering_Rainbow_Six.pdf

Sounds complex to me. It's not just a spatial interpolation, but temporal as well. Looks like developers have already done this to fix AA issues (and save GPU cycles). So Sony gives the impression some algo has basically been "baked" into the GPU? Is this really true? Isn't it just some post-processing offered via a standard library?
 
I put my blond wig on.. is this checkerboard stuff something like this:

http://twvideo01.ubm-us.net/o1/vaul...s/El_Mansouri_Jalal_Rendering_Rainbow_Six.pdf

Sounds complex to me. It's not just a spatial interpolation, but temporal as well. Looks like developers have already done this to fix AA issues (and save GPU cycles). So Sony gives the impression some algo has basically been "baked" into the GPU? Is this really true? Isn't it just some post-processing offered via a standard library?
Spatial + temporal, just like interlacing, but better. This GDC presentation has already been discussed in another thread. It's a good technique. I am sure many games will adopt it in the future, and not just for 4K rendering. It works fine at 1080p. It is also a nice technique for PC, as it allows older GPUs to reach native 1440p and 4K monitor outputs (without scaling).
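Very roughly, the per-frame reconstruction is something like this C++ sketch (illustrative only, not Ubisoft's or Sony's implementation; it assumes the half of the pixels shaded this frame were written into a full-resolution buffer and ignores reprojection, occlusion tests and sharpening):

```cpp
#include <vector>
#include <cstdint>

// One frame shades only the "black" squares of the checkerboard, the next frame
// only the "white" ones. The missing half is taken from the previous (ideally
// motion-reprojected) resolved frame, so a full-resolution image is rebuilt from
// half a frame's worth of new shading.
void ResolveCheckerboard(const std::vector<uint32_t>& current,   // valid only on this frame's checker cells
                         const std::vector<uint32_t>& previous,  // last frame's resolved image
                         std::vector<uint32_t>&       resolved,
                         int width, int height, int frameParity) // frameParity alternates 0/1
{
    resolved.resize(size_t(width) * size_t(height));
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const size_t i = size_t(y) * size_t(width) + size_t(x);
            const bool shadedThisFrame = ((x + y) & 1) == frameParity;
            resolved[i] = shadedThisFrame ? current[i] : previous[i]; // fresh sample or history
        }
    }
}
```

In practice the history value is reprojected with motion vectors and rejected (falling back to interpolating the neighbouring fresh pixels) when it doesn't pass occlusion/colour checks; that is where most of the work in the GDC presentation goes.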
 
Nobody uses 32 bit float render targets (R32G32B32A32 = 128 bits per pixel). 128 bpp rendering is very slow. ROP output is 1/4 rate and texture filtering is 1/4 rate (on both GTX 1080 and RX 480).

Are you sure about the RX 480 figure? GCN SI is 1/2 rate for non-blended writes, or were you referring specifically to blended 128 bpp writes (where 1/4 rate is correct)?
 
Are you sure about the RX 480 figure? GCN SI is 1/2 rate for non-blended writes, or were you referring specifically to blended 128 bpp writes (where 1/4 rate is correct)?
Yes. Blended writes and bilinear filtered reads. IIRC all non-blended (float, int and unorm) 32 bit & 64 bit writes are full rate and 128 bit writes are half rate. Correct me if I am wrong. I mostly write compute shaders nowadays. If I ROP output something, it is mostly bit packed g-buffer data to a 64 bpp uint target (full rate on GCN).
 
Rainbow Six Siege implemented it already. It seems to rely on the hardware being programmable enough to render on the samples of a 2x MSAA target, plus the flexibility to calculate values for the pixels being projected.

It seems like the Pro might have features at the platform or hardware level that help facilitate it, but it's not new or isolated to refresh consoles.
 
I wonder when checkerboard rendering was invented, and now that the PS3 and X360 era is gone, why hasn't it been used before? Is it only suited for or useful on 4K consoles?
I suggest reading this GDC presentation by Jalal Eddine El Mansouri. It has a detailed description of the checkerboard rendering technique: http://www.gdcvault.com/play/1022990/Rendering-Rainbow-Six-Siege

As far as I know this particular kind of checkerboard rendering was invented a few years ago at Ubisoft. Of course every new rendering technique borrows/adapts ideas from others, so it is hard to say who exactly invented it. Killzone Shadow Fall's interlacing (1080p 60 fps multiplayer mode on PS4) was a similar, but slightly less advanced technique. I believe Drobot's research (he was also working on Killzone SF before joining Ubisoft) influenced the whole real time rendering field. His 2014 article (https://michaldrobot.com/2014/08/13/hraa-siggraph-2014-slides-available/) is a must read. That technique used both MSAA subsample tricks and temporal reconstruction. Brian Karis' (Epic/Unreal) temporal supersampling article was also highly influential (https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-59732822.pdf). Jalal's presentation also mentions my research (as a reference): http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf. Our 8xMSAA trick (two samples per pixel) could be seen as subpixel checkerboarding (with regard to antialiasing).

Rainbow Six Siege was released one year ago on Xbox One and PS4. It used 1080p checkerboard rendering. 4K obviously makes pixels 4x smaller, making checkerboarding an even more valid technique. Even if reprojection fails (= areas not visible last frame), 4K checkerboard still results in 2x higher pixel density than 1080p.
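(For concreteness: a 3840x2160 frame is 8,294,400 pixels, so checkerboarding shades 4,147,200 of them per frame, exactly twice the 2,073,600 pixels of a full 1920x1080 frame.)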
Rainbow Six Siege implemented it already. It seems to rely on the hardware being programmable enough to render on the samples of a 2x MSAA target, plus the flexibility to calculate values for the pixels being projected.
Yes. You need per sample frequency shading if you are going to use the common (2xMSAA) way of implementing it. You don't need programmable sampling patterns, since you can shift the render target by one pixel to the left (0<->1 pixel alternating projection matrix x offset). The standard 2xMSAA pattern is exactly what you want (https://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx). 2xMSAA checkerboarding requires a one pixel wider render target (the first or last column is alternately discarded). Jalal's presentation has some images about this.

DirectX 10 added support for sample frequency shading (SV_SampleIndex). DX10 also added support for reading individual samples from MSAA textures (Texture2DMS.Load). Any DirectX 10 compatible hardware is able to perform checkerboard rendering. Last gen consoles had DX9 feature sets, didn't support per sample shading and didn't have standardized MSAA patterns. As there were no DX10 consoles, this makes Xbox One and PS4 the first consoles capable of this technique.
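A sketch of the bookkeeping behind the 2xMSAA trick (illustrative C++ only; it assumes a quarter-resolution 2xMSAA target, the standard D3D 2xMSAA positions at (0.25, 0.25) and (0.75, 0.75), and a one-pixel x shift applied through the projection matrix on odd frames; the direction of the shift is a convention choice):

```cpp
struct PixelCoord { int x, y; };

// Map (render-target pixel, MSAA sample index, frame parity) to the full-resolution
// pixel that sample shades. Each render-target pixel covers a 2x2 block of output
// pixels; sample 0 lands on the top-left pixel of the block and sample 1 on the
// bottom-right one, i.e. two cells of the same checkerboard colour.
PixelCoord CheckerboardSampleToFullRes(int rtX, int rtY, int sampleIndex, int frameParity)
{
    PixelCoord p;
    p.x = 2 * rtX + sampleIndex + frameParity; // odd frames shift one pixel and cover the other diagonal
    p.y = 2 * rtY + sampleIndex;
    return p;
}
```

On odd frames the shifted pattern misses the first output column and runs one pixel past the last one, which is why the render target is made one pixel wider and a column alternately discarded, as described above.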
 