PS4 Pro Official Specifications (Codename NEO)

Discussion in 'Console Technology' started by Clukos, Sep 7, 2016.

Thread Status:
Not open for further replies.
  1. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    5,424
    Likes Received:
    3,929
    Would that require ALUs to natively accept something other than ints and standard FP? I mean, it would need to convert the values back to linear light before doing computations with them, and then reconvert the result to this log format before storing?
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Not right. 16 bits in this case means 16 bits per pixel (total) = R5G6B5. 32 bits in this case means 32 bits per pixel = 8 bits per channel = R8G8B8A8.

    Nobody uses 32 bit float render targets (R32G32B32A32 = 128 bits per pixel). 128 bpp rendering is very slow. ROP output is 1/4 rate and texture filtering is 1/4 rate (on both GTX 1080 and RX 480).

    These old GPUs did math at 10/12 bit fixed point precision. Floating point HDR rendering was not supported at all. No fp16 and definitely no fp32. Radeons had only fp24 ALUs until DX10 mandated fp32 math. SM2.0 (DX9) was the first shader model to support floating point processing.

    Also these marketing images have dithering disabled. With dither, the banding is greatly reduced. Dithering is still useful. Especially when combined with temporal antialiasing. TAA is excellent in filtering out dither (8xTAA recovers 3 bits of extra color depth from random dither).
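
    A minimal C++ sketch of the idea (my own illustration, with made up values): one channel is quantized to 5 bits as in R5G6B5, once plainly and once with half an LSB of random dither, and averaging several dithered "frames" mimics what temporal filtering does over time.

    Code:
        // Quantize a linear [0,1] value to n bits, optionally with random dither,
        // then average several dithered frames to show the recovered precision.
        #include <cstdio>
        #include <cstdlib>

        float quantize(float x, int bits, bool dither)
        {
            float levels = float((1 << bits) - 1);
            float noise  = dither ? (float(rand()) / RAND_MAX - 0.5f) : 0.0f;
            int   q      = int(x * levels + 0.5f + noise);
            if (q < 0) q = 0;
            if (q > int(levels)) q = int(levels);
            return float(q) / levels;
        }

        int main()
        {
            float x    = 0.3137f;                    // source value between two 5 bit steps
            float hard = quantize(x, 5, false);      // plain quantization: visible banding

            float sum = 0.0f;                        // average 8 dithered "frames",
            for (int frame = 0; frame < 8; ++frame)  // mimicking temporal filtering
                sum += quantize(x, 5, true);
            float filtered = sum / 8.0f;

            printf("source %.4f  quantized %.4f  dither+average %.4f\n", x, hard, filtered);
            return 0;
        }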
     
    Cyan, Clukos, Aaron Elfassy and 4 others like this.
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    I've no idea ;) but I wonder. The format would be 5 bits each for U and V, and the log of the luminance would be stored in 6 bits.
     
    #283 liolio, Oct 5, 2016
    Last edited: Oct 5, 2016
  4. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    RGBA16f requires 2x bandwidth compared to LogLUV (and is slightly higher quality). Similar format to LogLUV exists (DXGI_FORMAT_R9G9B9E5_SHAREDEXP). See below.
    LogLUV is not directly compatible with fixed point and floating point math. LogLUV luminance (16 bit) is logarithmic, while the UV channels (8 bits each) are normalized integers (= fixed point). Floating point, on the other hand, is a piecewise linear approximation of logarithmic (the exponent is logarithmic and the mantissa is linear). You could get similar results to LogLUV with a 32 bit (per pixel) image format consisting of a 16 bit float luminance and 8+8 bit UV (float however loses one bit to the sign).
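
    A quick numeric illustration of that point (my own, not from any spec): interpreting the raw bits of a positive normal IEEE-754 float as an integer and rescaling gives approximately log2(x), which is exactly the "logarithmic exponent, linear mantissa" behaviour described above.

    Code:
        #include <cmath>
        #include <cstdint>
        #include <cstdio>
        #include <cstring>

        int main()
        {
            for (float x : {0.5f, 1.0f, 3.0f, 100.0f, 10000.0f})
            {
                uint32_t bits;
                std::memcpy(&bits, &x, sizeof(bits));           // raw IEEE-754 bits
                float approxLog2 = bits / 8388608.0f - 127.0f;  // bits / 2^23 - exponent bias
                printf("x = %10.2f   log2(x) = %8.4f   bit approximation = %8.4f\n",
                       x, std::log2(x), approxLog2);
            }
            return 0;
        }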

    DXGI_FORMAT_R9G9B9E5_SHAREDEXP is similar to LogLUV. It is 32 bits per pixel and has a shared "luminance" exponent. The shared exponent is 5 bits (just like the fp16 exponent), and the mantissas (one per rgb channel) are 9 bits (vs 10 bits in fp16). GPUs can natively filter textures of this format, but unfortunately cannot render to it. It is close in quality to RGBA16f, but requires only half the bandwidth.
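
    A hedged CPU-side sketch of how a value in this kind of shared exponent layout (9 bit mantissas, 5 bit shared exponent, exponent bias 15) can be packed; it loosely follows the publicly documented shared exponent encoding, the function name and test color are mine, and edge cases such as NaN or negative inputs are simply clamped away.

    Code:
        #include <algorithm>
        #include <cmath>
        #include <cstdint>
        #include <cstdio>

        uint32_t packRGB9E5(float r, float g, float b)
        {
            const int   kMantBits = 9, kExpBias = 15;
            const float kMaxVal   = float(511.0 / 512.0) * 65536.0f;  // largest representable value

            r = std::clamp(r, 0.0f, kMaxVal);
            g = std::clamp(g, 0.0f, kMaxVal);
            b = std::clamp(b, 0.0f, kMaxVal);

            float maxC = std::max({r, g, b});
            if (maxC == 0.0f)
                return 0;                                             // all channels zero

            // Shared exponent chosen so the largest channel fills the 9 bit mantissa.
            int   e     = std::max(0, int(std::floor(std::log2(maxC))) + 1 + kExpBias);
            float scale = std::exp2(float(e - kExpBias - kMantBits));

            // If rounding would overflow the 9 bit mantissa, bump the exponent.
            if (int(maxC / scale + 0.5f) == (1 << kMantBits)) { e += 1; scale *= 2.0f; }

            uint32_t rm = uint32_t(r / scale + 0.5f);
            uint32_t gm = uint32_t(g / scale + 0.5f);
            uint32_t bm = uint32_t(b / scale + 0.5f);
            return rm | (gm << 9) | (bm << 18) | (uint32_t(e) << 27);
        }

        int main()
        {
            printf("0x%08X\n", packRGB9E5(1.0f, 0.25f, 120.0f));      // arbitrary HDR test color
            return 0;
        }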

    There's only float and integer ALUs in GPUs, but texture filtering and ROPs have format conversions to other formats. For example sRGB is not linear, but you can still filter and render to sRGB formats. The texture filtering unit converts texels to floating point (or fixed point) before filtering them. ROPs do the same. LogLUV could be supported as a texture/RT format, but I doubt this will happen, since DXGI_FORMAT_R9G9B9E5_SHAREDEXP is practically the same thing and is straightforward to convert to RGBA16f (bitscan left to find the first set bit of each rgb channel to convert denormal numbers to normalized floats). GPUs already have filtering and ROP blend hardware to handle RGBA16f, so this shouldn't cost much extra. Logarithmic math units for texture
     
    Cyan, BRiT, Heinrich4 and 2 others like this.
  5. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,839
    Likes Received:
    4,455
    The Geforce FX / NV3x (Radeon R300 contemporaries) pixel shaders actually had FP32 ALUs which supported FP16 or FP32 operations.
    But they were dirt-slow at FP32, supposedly because of memory bandwidth limitations. And when standard SM2.0 24 bit shaders were used in games, those GPUs had to do all FP24 operations at FP32 precision, which is (one of the reasons) why DX9 performance on NV3x cards was generally pretty bad.

    I don't know the exact reason why nvidia went with FP32 ALUs back in the day. Maybe OpenGL 2.0's fragment shaders supported both FP32 and FP16 and nvidia was shooting for full compliance on both APIs?
     
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Yes, Geforce FX 5800 was the first Nvidia card with floating point pixel shader ALUs. It was the first Nvidia DX9 SM 2.0 compatible (SM 2.X actually) card. Nvidia kept their FP16/FP32 design (half rate FP32) in the Geforce 6000 and 7000 series (the PS3 GPU is based on the 7000 series).

    The Geforce 4 series was DX8 / SM 1.3 (IIRC) and the Radeon 8000 series was DX8 / SM 1.4. IIRC the fixed point type in SM1 (1.0-1.4) was limited to the [-8, +8] range. 12 bit fixed point math was thus enough. IIRC texture tiling (UV range) was also limited to 8 (a texture could not be repeated more than that, as the ALUs wouldn't have had the range to calculate the UVs).
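
    As an illustration of that range (my own sketch, not the actual SM1 hardware layout): a signed 12 bit fixed point number covering roughly [-8, +8] can be split into 1 sign bit + 3 integer bits + 8 fraction bits, giving a step of 1/256.

    Code:
        #include <cstdint>
        #include <cstdio>

        // 12 bit value stored in a 16 bit integer for convenience.
        int16_t toFixed12(float x)
        {
            if (x >  8.0f - 1.0f / 256.0f) x =  8.0f - 1.0f / 256.0f; // clamp to representable range
            if (x < -8.0f)                 x = -8.0f;
            return int16_t(x * 256.0f);                               // 8 fractional bits
        }

        float fromFixed12(int16_t v)
        {
            return float(v) / 256.0f;                                 // decode back to float
        }

        int main()
        {
            float uv = 7.75f;                                         // e.g. a texture coordinate tiled ~8 times
            printf("%.6f -> %d -> %.6f\n", uv, toFixed12(uv), fromFixed12(toFixed12(uv)));
            return 0;
        }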

    Vertex shaders had 32 bit floating point ALUs in DX8 (I think this was mandated). Coordinate transformation needs good precision float math. Vertex shaders also had a separate instruction set (and GPUs had separate vertex shader and pixel shader hardware). I remember SM 2.0 very well. It brought 64 instructions and float math to pixel shaders. A huge improvement over the previous shaders. I still remember writing hand tuned DX ASM to fit my lighting math into those 64 instruction slots. It felt like solving a puzzle :D

    Result, running on Radeon 9700 Pro (too bad there's no better quality video available):
     
  7. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,771
    Likes Received:
    6,056
    I felt like I was watching something that over time became Trials


    Sent from my iPhone using Tapatalk
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Time goes so fast. Can't believe that was 14 years ago :)
     
    iroboto likes this.
  9. Heinrich4

    Regular

    Joined:
    Aug 11, 2005
    Messages:
    596
    Likes Received:
    9
    Location:
    Rio de Janeiro,Brazil
    Maybe this belongs here (maybe I don't understand / mistranslated it: the PS4 Pro doesn't use 16nm?): http://m.pclab.pl/art71414.html

    Edit: sorry all, my bad... It's only a (good) article about the PS4 Slim.
     
    #289 Heinrich4, Oct 6, 2016
    Last edited: Oct 6, 2016
  10. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,601
    Likes Received:
    11,013
    Location:
    Under my bridge
  11. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,975
    Likes Received:
    3,051
    Location:
    Pennsylvania
    Doesn't paint a very good picture of the imminent Pro.
     
    RootKit likes this.
  12. Globalisateur

    Globalisateur Globby
    Veteran Regular

    Joined:
    Nov 6, 2013
    Messages:
    2,898
    Likes Received:
    1,627
    Location:
    France
    That's quite a poorly written article indeed. I don't think the author knows what checkerboard rendering exactly is; he still thinks it's 4 pixels extrapolated to 16 pixels.

    And the parts where he 'recites' Microsoft PR about their Scorpio spec sheet are quite funny.
     
    sebbbi likes this.
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,839
    Likes Received:
    4,455
    They're doubling down on 2*SP rate on the Pro spec:

    I wonder how much of a game changer this could be.
    If developers were already using some FP16 calculations on the original PS4 (assuming Liverpool already had GCN2's FP16-specific instructions, IIRC for lower bandwidth requirements and lower latencies?), then maybe this code will just naturally run faster on the Pro.
     
  14. loekf

    Regular

    Joined:
    Jun 29, 2003
    Messages:
    613
    Likes Received:
    61
    Location:
    Nijmegen, The Netherlands
    I put my blond wig on.. is this checkerboard stuff something like this:

    http://twvideo01.ubm-us.net/o1/vaul...s/El_Mansouri_Jalal_Rendering_Rainbow_Six.pdf

    Sounds complex to me. It's not just a spatial interpolation, but a temporal one as well. Looks like developers have already done this to fix AA issues
    (and save GPU cycles). So Sony gives the impression that some algorithm has basically been "baked" into the GPU? Is this really true? Isn't it just some post-processing offered via a standard library?
     
    Cyan likes this.
  15. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Spatial + temporal, just like interlacing, but better. This GDC presentation has already been discussed in another thread. It's a good technique. I am sure many games will adopt it in the future, and not just for 4K rendering. It works fine at 1080p. It is also a nice technique for PC, as it allows older GPUs to reach native 1440p and 4K monitor outputs (without scaling).
     
    Heinrich4, BRiT, DavidGraham and 2 others like this.
  16. ajmiles

    Newcomer

    Joined:
    Feb 4, 2014
    Messages:
    7
    Likes Received:
    1
    Location:
    UK
    Are you sure about the RX 480 figure? GCN SI is 1/2 rate for non-blended writes, or were you referring specifically to blended 128bpp writes (where 1/4 rate is correct)?
     
  17. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Yes. Blended writes and bilinear filtered reads. IIRC all non-blended (float, int and unorm) 32 bit & 64 bit writes are full rate and 128 bit writes are half rate. Correct me if I am wrong. I mostly write compute shaders nowadays. If I ROP output something, it is mostly bit packed g-buffer data to 64 bpp uint target (full rate on GCN).
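
    For illustration, here is one hypothetical way (my own layout, not necessarily the one used above) to bit pack g-buffer data into a 64 bpp uint target: albedo and roughness as 8 bit unorms in one 32 bit word, and a [0,1] encoded normal as two 16 bit unorms in the other.

    Code:
        #include <cstdint>
        #include <cstdio>

        struct GBufferTexel { uint32_t x, y; };              // maps to an RG32_UINT render target

        GBufferTexel packGBuffer(float r, float g, float b, float roughness,
                                 float nx, float ny)         // nx, ny: normal already encoded to [0,1]
        {
            auto toU8  = [](float v) { return uint32_t(v * 255.0f + 0.5f); };
            auto toU16 = [](float v) { return uint32_t(v * 65535.0f + 0.5f); };

            GBufferTexel t;
            t.x = toU8(r) | (toU8(g) << 8) | (toU8(b) << 16) | (toU8(roughness) << 24);
            t.y = toU16(nx) | (toU16(ny) << 16);
            return t;
        }

        int main()
        {
            GBufferTexel t = packGBuffer(0.5f, 0.25f, 0.75f, 0.3f, 0.6f, 0.4f);
            printf("0x%08X 0x%08X\n", t.x, t.y);
            return 0;
        }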
     
  18. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    8,572
    Likes Received:
    2,290
    I wonder when checkerboard rendering was invented and, now that the PS3 and X360 era is gone, why it hasn't been used before? Is it only suited to or useful for 4K consoles?
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Rainbow Six Siege implemented it already. It seems to rely on the hardware being programmable enough to render on the samples of a 2x MSAA target, plus the flexibility to calculate values for the pixels being projected.

    It seems like the Pro might have features at the platform or hardware level that help facilitate it, but it's not new or isolated to refresh consoles.
     
    I.S.T. and egoless like this.
  20. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    I suggest reading this GDC presentation by Jalal Eddine El Mansouri. It has a detailed description of the checkerboard rendering technique: http://www.gdcvault.com/play/1022990/Rendering-Rainbow-Six-Siege

    As far as I know this particular kind of checkerboard rendering was invented a few years ago at Ubisoft. Of course every new rendering technique borrows/adapts ideas from others, so it is hard to say who exactly invented it. Killzone Shadow Fall's interlacing (1080p 60 fps multiplayer mode on PS4) was similar, but a slightly less advanced technique. I believe Drobot's research (he was also working for Killzone SF before joining Ubisoft) influenced the whole real time rendering field. His 2014 article (https://michaldrobot.com/2014/08/13/hraa-siggraph-2014-slides-available/) is a must read. This technique used both MSAA subsample tricks and temporal reconstruction. Brian Karis' (Epic/Unreal) temporal supersampling article was also highly influential (https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-59732822.pdf). Jalal's presentation also mentions my research (as a reference): http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf. Our 8xMSAA trick (two samples per pixel) could be seen as subpixel checkerboarding (with regard to antialiasing).

    Rainbow Six Siege was released one year ago on Xbox One and PS4. It used 1080p checkerboard rendering. 4K obviously makes pixels 4x smaller, making checkerboarding an even more valid technique. Even if reprojection fails (= areas not visible in the last frame), 4K checkerboard still results in 2x higher pixel density than 1080p.
    Yes. You need per sample frequency shading if you are going to use the common (2xMSAA) way of implementing it. You don't need programmable sampling patterns, since you can shift the render target by one pixel to the left (a 0<->1 pixel alternating projection matrix x offset). The standard 2xMSAA pattern is exactly what you want (https://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx). 2xMSAA checkerboarding requires a one pixel wider render target (the first or last column is alternately discarded). Jalal's presentation has some images about this.
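
    A small sketch of that alternating offset (my own, following the description above, with made up matrix conventions): even frames render unshifted, odd frames shift the image by one pixel, which in clip space is 2 / renderTargetWidth; which matrix element receives the offset depends on your row/column vector convention.

    Code:
        #include <cstdio>

        struct Mat4 { float m[4][4]; };

        // Apply this frame's checkerboard offset to an existing projection matrix.
        void applyCheckerboardOffset(Mat4& proj, unsigned frameIndex, unsigned rtWidth)
        {
            float pixelShift = (frameIndex & 1) ? 1.0f : 0.0f;      // 0 <-> 1 pixel alternation
            float ndcShift   = 2.0f * pixelShift / float(rtWidth);  // NDC x spans [-1, 1]

            // Column vector convention, m[row][col] storage: clip.x picks up m[0][2] * view.z,
            // which after the divide by w (= view.z for a D3D style projection) becomes a
            // constant NDC offset. Adjust the indexing for your own convention.
            proj.m[0][2] += ndcShift;
        }

        int main()
        {
            Mat4 proj = {};                     // stand-in for a real projection matrix
            applyCheckerboardOffset(proj, 1, 1920);
            printf("x offset term: %f\n", proj.m[0][2]);
            return 0;
        }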

    DirectX 10 added support for sample frequency shading (SV_SampleIndex). DX10 also added support for reading individual samples from MSAA textures (Texture2DMS.Load). Any DirectX 10 compatible hardware is able to perform checkerboard rendering. Last gen consoles had DX9 feature sets, didn't support per sample shading and didn't have standardized MSAA patterns. As there were no DX10 consoles, this makes Xbox One and PS4 the first consoles capable of this technique.
     
    #300 sebbbi, Oct 12, 2016
    Last edited: Oct 12, 2016