General Next Generation Rumors and Discussions [Post GDC 2020]

Discussion in 'Console Industry' started by BRiT, Mar 18, 2020.

  1. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
To be fair, Sony managed to get PS4 games (18 CUs) to work on the Pro's 36 CUs (enhanced mode), but I agree that it seems Sony might have a tougher road ahead for BC than MS. Maybe, as it stands now, they need SEs with a multiple of 18 CUs, like 36, 54, 72 and so on? If so, that will certainly limit their choices. If this really is a problem for Sony, I wonder if BC is worth it at all for them. Time will tell.
     
  2. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,093
    Likes Received:
    954
    Location:
    Earth
How certain are we that the PS5's CU count is due to BC and not due to designing around a specific price point? It seems inevitable that there is going to be a Pro model with a higher CU count. If that model is not BC, then it's all kinds of bad for Sony.

If Sony had a higher CU count, wouldn't they be able to disable some CUs with software when running in BC mode? I.e., limit the CU count with software rather than by limiting the hardware design to 36 CUs.
     
  3. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
It's probably the right time to talk about SFS since we're on texture streaming. I do have a question to pose:

    Some background first:
    https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html

    Sampler feedback is one feature with two distinct usage scenarios: streaming and texture-space shading.
    Use of sampler feedback with streaming is sometimes abbreviated as SFS. It is also sometimes called sparse feedback textures, or SFT, or PRT+, which stands for “partially resident textures”.

The Coles Notes version is that the GPU loads only the portions of textures that are actually demanded. So it's asking the SSD to pull only the parts of a texture it needs into memory, not the whole texture. For streaming worlds, where the textures for a large area can be in the GB range, this will save a lot of bandwidth and space.
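A toy sketch of that idea in Python (the tile/mip layout and the feedback format here are hypothetical stand-ins, not the real D3D12 sampler feedback API):

```python
# Toy model of sampler-feedback-driven streaming (hypothetical layout,
# not the actual D3D12 API). The GPU's feedback tells us which tiles of
# which mips were actually sampled; we request only the missing ones.

TILE_SIZE = 64 * 1024  # typical tiled-resource tile size in D3D12

def tiles_to_stream(feedback, resident):
    """feedback: {(tile_x, tile_y, mip): was_sampled} written by the GPU.
    resident: set of (tile_x, tile_y, mip) already in video memory."""
    requests = []
    for key, sampled in feedback.items():
        if sampled and key not in resident:
            requests.append(key)
    return requests

# A 16K x 16K texture is hundreds of MB uncompressed, but a frame might
# only touch a few dozen tiles of it:
feedback = {(0, 0, 0): True, (1, 0, 0): True, (0, 0, 3): True}
resident = {(0, 0, 3)}
print(tiles_to_stream(feedback, resident))
# -> [(0, 0, 0), (1, 0, 0)]  i.e. stream ~128 KB, not the whole texture
```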

    My question:
I think this is an interesting bit because the textures stay compressed on XSX, and likely won't in the PC space (since PC will, I suspect, just pull from main memory into video memory). XSX is leveraging SFS with compressed textures, so are they:
a) recalling the entire compressed texture, decompressing it, and then picking what it needs and loading that into memory? (not that impressive)
b) only recalling the part of the texture it needs, even while compressed? (very impressive)
     
    #583 iroboto, Mar 25, 2020
    Last edited: Mar 25, 2020
  4. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,543
    Likes Received:
    14,093
    Location:
    Cleveland
We don't, but if it was targeting a price point first, why go chasing TFlops that may require an extensive cooling solution?
     
    egoless, PSman1700 and AzBat like this.
  5. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
They can always butterfly the PS5 Pro to 72 CUs :shock:, but Sony seems to have chosen the path of narrower with higher clocks. I think the disabling would have to be per whole shader engine (SE)?
     
  6. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,093
    Likes Received:
    954
    Location:
    Earth
We don't know enough. It could be that this was the cheapest solution to get the performance Sony needed. It could be that MS surprised Sony and they had to do what they could with the chip they have. It could also be that Sony wanted to leave more room between a hypothetical Pro model and the base model, to make selling the Pro model easier (and make the base model cheaper if possible).
     
    turkey likes this.
  7. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,543
    Likes Received:
    14,093
    Location:
    Cleveland
@manux right, we don't know enough. Unfortunately Sony has not shown us the retail case or the cooling, so we'll just have to wait until we can analyze them. :(
     
    manux likes this.
  8. Silenti

    Regular

    Joined:
    May 25, 2005
    Messages:
    507
    Likes Received:
    94
Looking forward to more on this. Question, and you seem like the person to ask: what about combining the above with the ML texture upscaling? They were talking about shipping with low-res textures and just letting the ML upscale them at runtime, and the result was "scary good". If the ML must be trained on each different set of textures, which is what was stated in the interview, is that something that studios beyond 1st party and the AAA industry will be able to afford? From some of the comments around, this may be quite expensive. Just thought I would pick your brain on this one.
     
  9. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
Assuming they could have a model that would be successful in doing that: you're trading off compute power as well as restricting yourself to a set amount of time for that upscale to complete. ML inference generally costs the same regardless of what the input contains. Upscaling the textures into memory is possible, but you're always going to be dedicating some portion of your compute power to do it. This doesn't necessarily need to be done by the GPU, so it is a function the CPU could do in theory. If the goal is to upscale only a fraction of a texture, this becomes increasingly feasible.

So if you asked to perform an ML upscale on a massive 4K texture, 8MB worth of data, it will take too long to be usable.
If you are virtual texturing and you set the tile size to something manageable, suddenly ML/AI up-resolution becomes more believable from a performance standpoint. The quality of the scale will depend on how good the training is.
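Back-of-envelope numbers to show why the tile granularity matters (the per-texel cost here is an assumed figure, purely for illustration):

```python
# Illustrative cost model (all numbers assumed, not measured): if an ML
# upscaler costs a roughly fixed amount of compute per output texel, the
# win from virtual texturing is that you only pay for the tiles you need.

COST_PER_TEXEL_US = 0.001  # assumed: ~1 ns of compute per output texel

def upscale_cost_ms(width, height):
    return width * height * COST_PER_TEXEL_US / 1000.0

print(f"full 4K texture: {upscale_cost_ms(4096, 4096):.1f} ms")  # ~16.8 ms
print(f"one 128x128 tile: {upscale_cost_ms(128, 128):.3f} ms")   # ~0.016 ms
# Upscaling the whole texture blows a 16.6 ms frame budget on its own;
# a handful of tiles per frame is easily hidden.
```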
     
    Silenti, PSman1700 and AzBat like this.
  10. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    766
    Likes Received:
    527
The memory bandwidth to the GPU is going to be the biggest bottleneck; for rendering, it doesn't matter what the SSD speed is. You can load things from disk twice as fast, but if there isn't bandwidth to the GPU, how are you going to even use it?
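A quick scale check using the publicly quoted 2020 figures for both consoles:

```python
# Even the fast SSD is a rounding error next to GPU memory bandwidth,
# so the SSD refreshes the working set while RAM bandwidth serves the
# frame. Figures are the publicly quoted console numbers.
ssd_gb_s = {"PS5 SSD (raw)": 5.5, "XSX SSD (raw)": 2.4}

for name, bw in ssd_gb_s.items():
    print(f"{name}: {bw} GB/s = {bw / 448 * 100:.1f}% of PS5's 448 GB/s RAM bw")
# PS5 SSD (raw): 5.5 GB/s = 1.2% ...
# XSX SSD (raw): 2.4 GB/s = 0.5% ...
```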
     
  11. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    12,484
    Likes Received:
    7,727
    Location:
    London, UK
All things being equal, I would expect a drop in the size of streaming buffers relative to the available RAM pool size. When your I/O is so slow and you risk ugly LOD transitions and low-res textures, devs are probably quite pessimistic in their algorithms about what data might be needed in 10-30 seconds, so you cast that net wider. You're likely streaming in lots of stuff that you don't need. Removing the I/O constraint should reduce this. There is talk about pulling in data for things directly behind you whilst your avatar is turning on the spot, and this is fairly nuts compared to what we have now.

You could definitely design a game that would do this, but I'm not sure why you would. Imagine if you could move as fast as in Wipeout through Horizon Zero Dawn's world; would 5GB/s cut it? I don't know, but there is always a point at which X more bytes won't fit in RAM, or within your streaming budget, or your available RAM bandwidth.
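Here's a rough sketch of the prefetch-window argument in Python (all figures are assumed, just to show how the terms trade off):

```python
# Rough sketch (all figures assumed): the resident "just in case"
# streaming buffer scales with how far ahead you must speculate, which
# in turn scales with how slow your I/O is.

def speculative_buffer_mb(working_set_mb_per_s, lookahead_s, hit_rate):
    """working_set_mb_per_s: new data the player can bring into view per second.
    lookahead_s: how early you must request data for it to arrive in time.
    hit_rate: fraction of speculatively loaded data actually used."""
    needed = working_set_mb_per_s * lookahead_s
    return needed / hit_rate  # a low hit rate means casting the net wider

# HDD-era: request ~20 s early, and guess broadly (30% of guesses pan out)
print(f"{speculative_buffer_mb(50, 20, 0.3):.0f} MB")  # ~3333 MB resident
# Fast SSD: request ~1 s early, with much better guesses
print(f"{speculative_buffer_mb(50, 1, 0.8):.0f} MB")   # ~63 MB resident
```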
     
    VitaminB6 and iroboto like this.
  12. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    2,612
    Likes Received:
    1,674
Here's the link to the article with a lot more detail:
    https://forum.beyond3d.com/posts/2113795/
     
    Silenti likes this.
  13. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    111
    Likes Received:
    98
Then there's the fact that consoles practice an 'offline' compilation model, where games compile their HLSL/PSSL shaders into native bytecode, so games automatically ship GCN2 binaries.

On PC, developers practice an 'online' compilation model, where they compile HLSL/GLSL shaders into an intermediate representation such as DXIL or SPIR-V. This intermediate representation is then further compiled by each vendor's shader compiler at runtime.

Sometimes it's easier for developers not to be technically curated so often, since constant software maintenance is a burden; otherwise you end up with Apple's ecosystem, where software compatibility just consistently breaks.

It's definitely convenient to make BC software more scalable this way, but at the end of the day Sony is still promising BC with PS4 software, even if it may not necessarily have improved performance.
     
    Barrabas and DavidGraham like this.
  14. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
PCIe 4.0 drives are already hitting 5GB/s, with 7GB/s expected by the time the new consoles hit the market. PCIe 5.0 is due out in 2021, with first-gen drives likely to be hitting 10+ GB/s.
     
    egoless, Proelite and PSman1700 like this.
  15. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,376
    Likes Received:
    1,407
Texture block compression has been around for decades, with support in Windows and Nvidia/AMD GPUs. The utility of block compression is that it offers random access: each texture is broken into tiles during compression and can be read into the GPU and decompressed there. The traditional problem with texture block compression is that quality is easily lost the more you compress the texture in these formats. JPEG is easily more compressible while maintaining quality, but you have to decompress it at runtime. To get around this, new solutions were developed.

One of the most notable is to use multiple compression steps involving block compression + RDO and lossless compression. First the texture is compressed into a block compression format using rate-distortion optimization (RDO). RDO basically acts as a quality metric that helps determine how readily a tile can be compressed with minimal quality loss. You end up with a more highly compressed texture. The texture is then further compressed with a lossless format into what some call a supercompressed texture.
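A sketch of that two-stage pipeline in Python. The RDO-aware BC encode here is a crude placeholder (real encoders such as Basis Universal or bc7enc_rdo do the actual work); only the lossless stage uses a real codec:

```python
# Sketch of the "supercompressed" pipeline described above. The BC+RDO
# stage is simulated by quantization (a stand-in, not a real BC encoder);
# the point is that RDO makes blocks more redundant, so the lossless
# stage squeezes them harder.
import zlib

def rdo_bc_encode(texels, quality):
    """Placeholder for an RDO-aware BC encoder: trades a little block
    quality for output the lossless stage can compress further."""
    step = max(1, 256 // quality)
    return bytes((t // step) * step for t in texels)

texels = bytes(range(256)) * 256          # 64 KB of stand-in "texture" data
bc_blocks = rdo_bc_encode(texels, quality=16)

# Stage 2: lossless "supercompression" over the BC-encoded blocks.
supercompressed = zlib.compress(bc_blocks, level=9)
print(len(texels), len(bc_blocks), len(supercompressed))
# At load time you reverse only stage 2 (zlib.decompress); the GPU then
# consumes the BC blocks directly, with no runtime JPEG-style decode.
```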
     
    turkey, iroboto, VitaminB6 and 2 others like this.
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
Presumably, though, the faster the raw speed of the drive, the more processing power it takes to decompress the stream? So how would current decompression solutions handle streaming from a top-end NVMe drive vs an HDD, for example?
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
    Are you referring to DXT?
     
  18. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,427
    Likes Received:
    5,836
Which happens when you stop turning after you've turned 180 or 90 degrees, which happens all the time. If it takes a quarter of a second more to load the last mipmap level, you get blurring after every turn you make, and it settles back sharp after a quarter second.

In any case there will be a maximum turning speed allowed with 5.5GB/s so that the blurring and artifacting are not perceptible if they continuously evict half of the assets behind the player, and 2.4GB/s will have a maximum turning speed less than half as fast to avoid artifacts.

I don't know what the limits are, nor how compression details will change the raw figures into a real-world benchmark, but it can't be that 5.5 is not enough to do anything other than load faster while at the same time 2.4 is more than enough for streaming frustum assets the way Cerny was presenting at GDC. PS5 can't be both wastefully fast and too slow.
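The arithmetic behind the "maximum turning speed" point, with an assumed asset size just to show the scaling:

```python
# Time to re-stream the hemisphere of assets behind the player scales
# inversely with SSD throughput. The 1 GB figure is assumed for
# illustration; the throughputs are the quoted console numbers.

def refill_time_ms(assets_behind_player_mb, throughput_gb_s):
    return assets_behind_player_mb / (throughput_gb_s * 1024) * 1000

for gb_s in (5.5, 2.4):
    t = refill_time_ms(1000, gb_s)  # assume ~1 GB of assets behind you
    print(f"{gb_s} GB/s -> {t:.0f} ms to refill after a 180-degree turn")
# 5.5 GB/s -> ~178 ms; 2.4 GB/s -> ~407 ms. Same design, but the slower
# drive tolerates a turn rate a bit less than half as fast before the
# blur would become perceptible.
```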
     
    egoless and zupallinere like this.
  19. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,376
    Likes Received:
    1,407
Intel released a paper showing a single core of an i5 decompressing zlib at 4.5 GB/s.

Yep. BC1, BC2 and BC3 are just DXT1, DXT3 and DXT5. People have found new ways to overcome the fixed 6:1 (BC1) to 4:1 (BC2-BC7) compression ratios.
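For what it's worth, you can sanity-check a zlib decompression throughput figure yourself; results vary wildly with data and CPU, and that 4.5 GB/s figure presumably came from a tuned implementation rather than stock zlib:

```python
# Quick single-core zlib decompression micro-benchmark. Highly
# compressible synthetic data flatters the result; real game assets
# will be slower.
import time
import zlib

data = bytes(range(256)) * (4 * 1024 * 1024 // 256)  # 4 MB, compressible
blob = zlib.compress(data, level=6)

start = time.perf_counter()
for _ in range(100):
    zlib.decompress(blob)
elapsed = time.perf_counter() - start

gb = len(data) * 100 / 1e9
print(f"{gb / elapsed:.2f} GB/s decompressed on one core")
```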
     
    BRiT and iroboto like this.
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,460
    Likes Received:
    10,138
    Location:
    The North
You're going to have to give me an example of a game in which turning fast enough means the texturing can't keep up and loads in blurry before regaining focus. I haven't seen it before. I feel like we're discussing mip quality instead of access speed.
     
    PSman1700 likes this.