General Next Generation Rumors and Discussions [Post GDC 2020]

Discussion in 'Console Industry' started by BRiT, Mar 18, 2020.

  1. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
To be fair, Sony managed to get PS4 games (18 CUs) to work on the Pro's 36 CUs (enhanced mode), but I agree that it seems Sony might have a tougher road ahead for BC than MS. Maybe, as it stands now, they need SEs with a multiple of 18 CUs, like 36, 54, 72 and so on? If so, that will certainly limit their choices. If this really is a problem for Sony, I wonder if BC is worth it at all for them. Time will tell.
     
  2. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,093
    Likes Received:
    954
    Location:
    Earth
How certain are we that the PS5's CU count is due to BC and not due to designing around a specific price point? It seems inevitable that there is going to be a Pro model with a higher CU count. If that model is not BC, then it's all kinds of bad for Sony.

If Sony had a higher CU count, wouldn't they be able to disable some CUs with software when running in BC mode? I.e., limit the CU count with software rather than by limiting the hardware design to 36 CUs.
     
  3. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
It's probably the right time to talk about SFS since we're on texture streaming. I do have a question to pose:

    Some background first:
    https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html

    Sampler feedback is one feature with two distinct usage scenarios: streaming and texture-space shading.
    Use of sampler feedback with streaming is sometimes abbreviated as SFS. It is also sometimes called sparse feedback textures, or SFT, or PRT+, which stands for “partially resident textures”.

The Coles Notes version is that the GPU loads only the portions of textures that are actually demanded. So it's asking the SSD to pull only the parts of a texture it needs into memory, not the whole texture. For streaming worlds, where the textures for a large area can be in the GB range, this will save a lot of bandwidth and space.
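A toy sketch of that idea in Python (the tile/mip layout and the feedback format here are hypothetical stand-ins, not the real D3D12 sampler feedback API):

```python
# Toy model of sampler-feedback-driven streaming (hypothetical layout,
# not the actual D3D12 API). The GPU's feedback tells us which tiles of
# which mips were actually sampled; we request only the missing ones.

TILE_SIZE = 64 * 1024  # typical tiled-resource tile size in D3D12

def tiles_to_stream(feedback, resident):
    """feedback: {(tile_x, tile_y, mip): was_sampled} written by the GPU.
    resident: set of (tile_x, tile_y, mip) already in video memory."""
    requests = []
    for key, sampled in feedback.items():
        if sampled and key not in resident:
            requests.append(key)
    return requests

# A 16K x 16K texture is hundreds of MB uncompressed, but a frame might
# only touch a few dozen tiles of it:
feedback = {(0, 0, 0): True, (1, 0, 0): True, (0, 0, 3): True}
resident = {(0, 0, 3)}
print(tiles_to_stream(feedback, resident))
# -> [(0, 0, 0), (1, 0, 0)]  i.e. stream ~128 KB, not the whole texture
```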

    My question:
I think this is an interesting bit because the textures stay compressed on XSX, and likely won't in the PC space (since PC will, I suspect, just pull from main memory into video memory). XSX is leveraging SFS with compressed textures, so are they:
a) recalling the entire compressed texture, decompressing it, and then picking what it needs and loading that into memory? (not that impressive)
b) only recalling the part of the texture it needs, even while compressed? (very impressive)
     
    #583 iroboto, Mar 25, 2020
    Last edited: Mar 25, 2020
  4. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,543
    Likes Received:
    14,093
    Location:
    Cleveland
We don't, but if it was targeting a price point first, why go chasing TFlops that may require an extensive cooling solution?
     
    egoless, PSman1700 and AzBat like this.
  5. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
They can always butterfly the PS5 Pro to 72 CUs :shock:, but Sony seems to have chosen the path of narrower with higher clocks. I think the disabling would have to be per whole shader engine (SE)?
     
  6. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,093
    Likes Received:
    954
    Location:
    Earth
We don't know enough. It could be that this was the cheapest solution to get the performance Sony needed. It could be that MS surprised Sony and they had to do what they could with the chip they have. It could also be that Sony wanted to leave more room between a hypothetical Pro model and the base model, to make selling the Pro model easier (and make the base model cheaper if possible).
     
    turkey likes this.
  7. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,543
    Likes Received:
    14,093
    Location:
    Cleveland
@manux right, we don't know enough. Unfortunately Sony has not shown us the retail case or the cooling, so we'll just have to wait until we can analyze them. :(
     
    manux likes this.
  8. Silenti

    Regular

    Joined:
    May 25, 2005
    Messages:
    507
    Likes Received:
    94
Looking forward to more on this. Question, and you seem like the person to ask: what about combining the above with the ML texture upscaling? They were talking about shipping with low-res textures and just letting the ML upscale them at runtime, and the result was "scary good". If the ML must be trained on each different set of textures, which is what was stated in the interview, is that something that studios beyond 1st party and the AAA industry will be able to afford? From some of the comments around, this may be quite expensive. Just thought I would pick your brain on this one.
     
  9. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
Assuming they could have a model that would be successful in doing that: you're trading off compute power as well as restricting yourself to a set amount of time for that upscale to complete. ML inference generally costs the same regardless of what the input contains. Upscaling the textures into memory is possible, but you're always going to be dedicating some portion of your compute power to do it. This doesn't necessarily need to be done by the GPU, so it is a function the CPU could do in theory. If the goal is to upscale only a fraction of a texture, this becomes increasingly feasible.

So if you asked to perform an ML upscale on a massive 4K texture, 8MB worth of data, it will take too long to be usable.
If you are virtual texturing and you set the tile size to something manageable, suddenly ML/AI up-resolution becomes more believable from a performance standpoint. The quality of the scale will depend on how good the training is.
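Back-of-envelope numbers to show why the tile granularity matters (the per-texel cost here is an assumed figure, purely for illustration):

```python
# Illustrative cost model (all numbers assumed, not measured): if an ML
# upscaler costs a roughly fixed amount of compute per output texel, the
# win from virtual texturing is that you only pay for the tiles you need.

COST_PER_TEXEL_US = 0.001  # assumed: ~1 ns of compute per output texel

def upscale_cost_ms(width, height):
    return width * height * COST_PER_TEXEL_US / 1000.0

print(f"full 4K texture: {upscale_cost_ms(4096, 4096):.1f} ms")  # ~16.8 ms
print(f"one 128x128 tile: {upscale_cost_ms(128, 128):.3f} ms")   # ~0.016 ms
# Upscaling the whole texture blows a 16.6 ms frame budget on its own;
# a handful of tiles per frame is easily hidden.
```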
     
    Silenti, PSman1700 and AzBat like this.
  10. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    766
    Likes Received:
    527
The memory bandwidth to the GPU is going to be the biggest bottleneck; for rendering, it doesn't matter what the SSD speed is. You can load things from disk twice as fast, but if there isn't bandwidth to the GPU, how are you going to even use it?
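A quick scale check using the publicly quoted 2020 figures for both consoles:

```python
# Even the fast SSD is a rounding error next to GPU memory bandwidth,
# so the SSD refreshes the working set while RAM bandwidth serves the
# frame. Figures are the publicly quoted console numbers.
ssd_gb_s = {"PS5 SSD (raw)": 5.5, "XSX SSD (raw)": 2.4}

for name, bw in ssd_gb_s.items():
    print(f"{name}: {bw} GB/s = {bw / 448 * 100:.1f}% of PS5's 448 GB/s RAM bw")
# PS5 SSD (raw): 5.5 GB/s = 1.2% ...
# XSX SSD (raw): 2.4 GB/s = 0.5% ...
```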
     
  11. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    12,484
    Likes Received:
    7,727
    Location:
    London, UK
All things being equal, I would expect a drop in the size of streaming buffers relative to the available RAM pool size. When your I/O is so slow and you risk ugly LOD transitions and low-res textures, devs are probably quite pessimistic in their algorithms about what data might be needed in 10-30 seconds, so you cast that net wider. You're likely streaming in lots of stuff that you don't need. Removing the I/O constraint should reduce this. There is talk about pulling in data for things directly behind you whilst your avatar is turning on the spot, and this is fairly nuts compared to what we have now.

You could definitely design a game that would do this, but I'm not sure why you would. Imagine if you could move as fast as in Wipeout through Horizon Zero Dawn's world; would 5GB/s cut it? I don't know, but there is always a point at which X more bytes won't fit in RAM, or within your streaming budget, or your available RAM bandwidth.
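Here's a rough sketch of the prefetch-window argument in Python (all figures are assumed, just to show how the terms trade off):

```python
# Rough sketch (all figures assumed): the resident "just in case"
# streaming buffer scales with how far ahead you must speculate, which
# in turn scales with how slow your I/O is.

def speculative_buffer_mb(working_set_mb_per_s, lookahead_s, hit_rate):
    """working_set_mb_per_s: new data the player can bring into view per second.
    lookahead_s: how early you must request data for it to arrive in time.
    hit_rate: fraction of speculatively loaded data actually used."""
    needed = working_set_mb_per_s * lookahead_s
    return needed / hit_rate  # a low hit rate means casting the net wider

# HDD-era: request ~20 s early, and guess broadly (30% of guesses pan out)
print(f"{speculative_buffer_mb(50, 20, 0.3):.0f} MB")  # ~3333 MB resident
# Fast SSD: request ~1 s early, with much better guesses
print(f"{speculative_buffer_mb(50, 1, 0.8):.0f} MB")   # ~63 MB resident
```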
     
    VitaminB6 and iroboto like this.
  12. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    2,612
    Likes Received:
    1,674
Here's the link to the article with a lot more detail:
    https://forum.beyond3d.com/posts/2113795/
     
    Silenti likes this.
  13. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    111
    Likes Received:
    98
Then there's the fact that consoles practice an 'offline' compilation model, where games compile their HLSL/PSSL shaders into native bytecode, so games automatically ship GCN2 binaries.

On PC, developers practice an 'online' compilation model, where they compile HLSL/GLSL shaders into an intermediate representation such as DXIL or SPIR-V. This intermediate representation is then further compiled by each vendor's shader compiler at runtime.

Sometimes it's easier for developers not to be technically curated so often, since constant software maintenance is a burden; otherwise you end up with Apple's ecosystem, where software compatibility just consistently breaks.

It's definitely convenient to make BC software more scalable this way, but at the end of the day Sony is still promising BC with PS4 software, even if it may not necessarily have improved performance.
     
    Barrabas and DavidGraham like this.
  14. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
PCIe 4.0 drives are already hitting 5GB/s, with 7GB/s expected by the time the new consoles hit the market. PCIe 5.0 is due out in 2021, with first-gen drives likely to be hitting 10+ GB/s.
     
    egoless, Proelite and PSman1700 like this.
  15. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,376
    Likes Received:
    1,407
Texture block compression has been around for decades, with support in Windows and Nvidia/AMD GPUs. The utility of block compression is that it offers random access: each texture is broken into tiles during compression and can be read into the GPU and decompressed there. The traditional problem with texture block compression is that quality is easily lost the more you compress the texture in these formats. JPEG is easily more compressible while maintaining quality, but you have to decompress it at runtime. To get around this, new solutions were developed.

One of the most notable is to use multiple compression steps involving block compression + RDO and lossless compression. First the texture is compressed into a block compression format using rate-distortion optimization (RDO). RDO basically acts as a quality metric that helps determine how readily a tile can be compressed with minimal quality loss. You end up with a more highly compressed texture. The texture is then further compressed with a lossless format into what some call a supercompressed texture.
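A sketch of that two-stage pipeline in Python. The RDO-aware BC encode here is a crude placeholder (real encoders such as Basis Universal or bc7enc_rdo do the actual work); only the lossless stage uses a real codec:

```python
# Sketch of the "supercompressed" pipeline described above. The BC+RDO
# stage is simulated by quantization (a stand-in, not a real BC encoder);
# the point is that RDO makes blocks more redundant, so the lossless
# stage squeezes them harder.
import zlib

def rdo_bc_encode(texels, quality):
    """Placeholder for an RDO-aware BC encoder: trades a little block
    quality for output the lossless stage can compress further."""
    step = max(1, 256 // quality)
    return bytes((t // step) * step for t in texels)

texels = bytes(range(256)) * 256          # 64 KB of stand-in "texture" data
bc_blocks = rdo_bc_encode(texels, quality=16)

# Stage 2: lossless "supercompression" over the BC-encoded blocks.
supercompressed = zlib.compress(bc_blocks, level=9)
print(len(texels), len(bc_blocks), len(supercompressed))
# At load time you reverse only stage 2 (zlib.decompress); the GPU then
# consumes the BC blocks directly, with no runtime JPEG-style decode.
```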
     
    turkey, iroboto, VitaminB6 and 2 others like this.
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
Presumably, though, the faster the raw speed of the drive, the more processing power it takes to decompress the stream? So how would current decompression solutions handle streaming from a top-end NVMe drive vs an HDD, for example?
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
    Are you referring to DXT?
     
  18. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,427
    Likes Received:
    5,836
Which happens when you stop turning after you've turned 180 or 90 degrees, which happens all the time. If it takes a quarter of a second more to load the last mipmap level, you get blurring after every turn you make, and it settles back sharp after a quarter second.

In any case there will be a maximum turning speed allowed with 5.5GB/s so that the blurring and artifacting are not perceptible if they continuously evict half of the assets behind the player, and 2.4GB/s will have a maximum turning speed less than half as fast to avoid artifacts.

I don't know what the limits are, nor how compression details will change the raw figures into a real-world benchmark, but it can't be that 5.5 is not enough to do anything other than load faster while at the same time 2.4 is more than enough for streaming frustum assets the way Cerny was presenting at GDC. PS5 can't be both wastefully fast and too slow.
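The arithmetic behind the "maximum turning speed" point, with an assumed asset size just to show the scaling:

```python
# Time to re-stream the hemisphere of assets behind the player scales
# inversely with SSD throughput. The 1 GB figure is assumed for
# illustration; the throughputs are the quoted console numbers.

def refill_time_ms(assets_behind_player_mb, throughput_gb_s):
    return assets_behind_player_mb / (throughput_gb_s * 1024) * 1000

for gb_s in (5.5, 2.4):
    t = refill_time_ms(1000, gb_s)  # assume ~1 GB of assets behind you
    print(f"{gb_s} GB/s -> {t:.0f} ms to refill after a 180-degree turn")
# 5.5 GB/s -> ~178 ms; 2.4 GB/s -> ~407 ms. Same design, but the slower
# drive tolerates a turn rate a bit less than half as fast before the
# blur would become perceptible.
```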
     
    egoless and zupallinere like this.
  19. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,376
    Likes Received:
    1,407
Intel released a paper showing a single core of an i5 decompressing zlib at 4.5 GB/s.

Yep. BC1, BC2 and BC3 are just DXT1, DXT3 and DXT5. People have found new ways to overcome the fixed 6:1 (BC1) to 4:1 (BC2-BC7) compression ratios.
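For what it's worth, you can sanity-check a zlib decompression throughput figure yourself; results vary wildly with data and CPU, and that 4.5 GB/s figure presumably came from a tuned implementation rather than stock zlib:

```python
# Quick single-core zlib decompression micro-benchmark. Highly
# compressible synthetic data flatters the result; real game assets
# will be slower.
import time
import zlib

data = bytes(range(256)) * (4 * 1024 * 1024 // 256)  # 4 MB, compressible
blob = zlib.compress(data, level=6)

start = time.perf_counter()
for _ in range(100):
    zlib.decompress(blob)
elapsed = time.perf_counter() - start

gb = len(data) * 100 / 1e9
print(f"{gb / elapsed:.2f} GB/s decompressed on one core")
```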
     
    BRiT and iroboto like this.
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
You're going to have to give me an example of a game in which turning fast enough means the texturing can't keep up and loads in blurry before regaining focus. I haven't seen it before. I feel like we're discussing mip quality instead of access speed.
     
    PSman1700 likes this.