Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

We know from Doom Eternal that it can.
It's just that most of the developers don't bother.
But anyway, game logic will not draw much power, at all.



On PC. On console you can read and write to the same coherent RAM.
But I think I know what you were getting at: multiplatform games on PC indeed cannot use HSA RAM properly, because the CPU->GPU sync will eat up all the dividends.
But I hope this gen PC won't drag consoles down too much, like it did last gen.
Did you just say PC dragged down consoles last gen? Are you for real?
Did you miss all of DF's coverage of multiplatform games, ever?

GPUs with nominally worse performance than the ones in the X1 and PS4 were besting them on day 1.
 
It seems much more straightforward to simply allow the system to adjust dynamically in real time and optimize performance in given scenes as normal
The way games are developed is to budget your resources well in advance and build your game around that. So you determine your player speeds, triangles per second, viewing angles, etc. If the clock is fixed, this is straightforward. The budget can be blown up by the world builders, but generally speaking they have a target to adhere to. If you just let the system balance the budget, your target now has to be set below the fixed number, because you don't know when your code will start reducing clocks. So you need some sort of buffer.
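To make the "buffer" concrete, here's a minimal sketch in Python (my own illustration with made-up numbers, not anything from a real engine): if you budget against the nominal clock, you blow the frame whenever the clock dips, so you have to budget against a conservative floor instead.

# Toy frame-budget math for a variable-clock GPU (illustrative numbers only).
FRAME_TIME_MS = 16.7          # 60 fps target
NOMINAL_CLOCK_GHZ = 2.23      # peak clock
WORST_CASE_CLOCK_GHZ = 2.0    # assumed floor under heavy load (made-up value)

# Work you can schedule per frame scales with clock * time.
budget_at_nominal = NOMINAL_CLOCK_GHZ * FRAME_TIME_MS
budget_at_floor   = WORST_CASE_CLOCK_GHZ * FRAME_TIME_MS

buffer_fraction = 1.0 - budget_at_floor / budget_at_nominal
print(f"Budgeting to the nominal clock overshoots by {buffer_fraction:.0%} "
      f"whenever the GPU drops to {WORST_CASE_CLOCK_GHZ} GHz, so the content "
      f"budget has to sit roughly that far below the fixed-clock case.")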

Are you sure you know what the word "hardware" means?
Because "VRS" and "Mesh Shaders" are both DirectX software APIs.
What part of them is really implemented in hardware, and what it's called in hardware, we don't know yet.

If it were purely software, RDNA1 would also support these features, and so would Pascal. So there is definitely a hardware component requirement.
 

If it were purely software, RDNA1 would also support these features, and so would Pascal. So there is definitely a hardware component requirement.

I think his point is that "mesh shaders" is a DirectX term. PS5 should have the same capabilities; they're just not calling it out by that name.
 
If it were purely software, RDNA1 would also support these features, and so would Pascal. So there is definitely a hardware component requirement.

This statement is not necessarily true, since a driver only has to 'pretend' to be compatible with an API, as we see in the case of Pascal and DirectX Raytracing support. The former doesn't necessarily have any hardware-accelerated component for the latter.

The mesh shader pipeline, to take one example, can be entirely emulated on current hardware with compute shaders, with an indirect dispatch for each of the amplification shader and mesh shader stages.
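Purely as an illustration of the idea (a toy CPU-side Python model, not real GPU code, with all names and data made up): the first pass plays the amplification role and writes out how many groups survive, and the second pass is then launched from that count, which is essentially what the indirect dispatch does in a compute emulation.

# Toy CPU model of emulating the mesh-shader pipeline with two compute passes.
# Pass 1 ("amplification"): decide which input groups survive culling and
# write the dispatch count that an indirect dispatch would read.
# Pass 2 ("mesh"): launched with that count, expands meshlets into triangles.
# All data and names here are hypothetical.

def amplification_pass(groups):
    surviving = [g for g in groups if not g["culled"]]
    indirect_args = {"group_count": len(surviving)}   # would live in an argument buffer
    return surviving, indirect_args

def mesh_pass(meshlets, indirect_args):
    triangles = []
    for meshlet in meshlets[: indirect_args["group_count"]]:
        triangles.extend(meshlet["tris"])             # expand meshlet -> triangles
    return triangles

groups = [
    {"culled": False, "tris": ["t0", "t1"]},
    {"culled": True,  "tris": ["t2"]},                # culled by the "amplification" stage
    {"culled": False, "tris": ["t3", "t4", "t5"]},
]
survivors, args = amplification_pass(groups)
print(mesh_pass(survivors, args))   # ['t0', 't1', 't3', 't4', 't5']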
 
As higher-res mips are needed only for textures seen at oblique angles, and only for a small part of them at that, things like "sampler feedback" would be nice to implement here. Which means that although you can use a 16K texture, only a small part of it will be loaded into RAM at any given time.
Although I would like to see real texture space shading here (people told me that what is called TSS on modern NV and AMD hardware is in fact just a fancy name for access to MSAA samples).
Okay, but then you'll balloon your texture size by 20x and 40x, depending on what compression you are using.
Most games today, as I understand it, use 2K textures.
Not to mention at 8K textures (you said earlier we should be at 5x 4K resolution for oblique angles, so really we should be talking about 16K textures, which are 80x and 160x larger than 2K textures):

PS5's SSD solution can manage at most ~140 MB of compressed data per frame at 60 fps.
So using your math, 80 MB * 0.17 is still ~13 MB, even at an oblique angle at mip 0.
Ten of those and you've used your whole budget. I have my doubts. And we haven't even started on 16K textures, or how large a footprint they would have on an 825 GB drive.

XSX has roughly a 40x faster drive. That means if you jump to 8K textures, we basically have the same loading and streaming issues we have today, though still slightly better (about 2x better loading and streaming remains). On PS5, if you jump to 16K textures, we basically have the same loading and streaming issues we have today. Loading times would be identical if you push to these values. Goodbye instant loads.

TL;DR, even if I got nearly everything else wrong:
Moving to 16K textures as per your suggestion means assets 64x larger than 2K textures. You've ballooned the load by 64x.
The PS5's 100x speed improvement over existing hardware is thereby reduced significantly. At 50x growth you might still have had enough headroom to cut the frame time in half, but you can't at 16K texture sizes.

Choose one:
Instant loads + just-in-time streaming,
or massively insane texture sizes.

You can't do both. The realistic place for both consoles to sit is around 4K and 8K texture resolution.
Even then, if the goal is 60 fps, the bandwidth restrictions will be even tighter. I'm not sure what AF settings would look like at 8K texture resolution.

[Table image: texture resolution / number of mips / compressed size with DXT1 / compressed size with DXT5]
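For reference, here's a rough Python reconstruction of the kind of numbers behind that table, assuming 4 bits/texel for DXT1, 8 bits/texel for DXT5, a full mip chain adding about a third, and the ~140 MB-per-frame figure from above; all of it back-of-envelope, not measured data:

# Back-of-envelope texture and streaming math (illustrative only).
BITS_PER_TEXEL = {"DXT1": 4, "DXT5": 8}
MIP_CHAIN_FACTOR = 4.0 / 3.0            # full mip chain adds roughly one third
FRAME_BUDGET_MB = 140                   # compressed MB per frame at 60 fps (from the post above)

def texture_mb(size, fmt):
    texels = size * size
    return texels * BITS_PER_TEXEL[fmt] / 8 / (1024 ** 2) * MIP_CHAIN_FACTOR

for size in (2048, 4096, 8192, 16384):
    dxt1 = texture_mb(size, "DXT1")
    dxt5 = texture_mb(size, "DXT5")
    print(f"{size:>5}px  DXT1 {dxt1:8.1f} MB   DXT5 {dxt5:8.1f} MB")

# Even if sampler feedback only needs ~17% of mip 0 for an oblique surface,
# that is still a big bite out of the per-frame budget at 8K:
partial = texture_mb(8192, "DXT5") / MIP_CHAIN_FACTOR * 0.17   # mip 0 only
print(f"~{partial:.0f} MB for one partial 8K mip 0, vs a {FRAME_BUDGET_MB} MB frame budget")

# And a 100x faster drive does not help much if assets grow 64x (16K vs 2K):
print(f"Effective speedup after 64x asset growth: {100 / 64:.1f}x")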
 
This statement is not necessarily true, since a driver only has to 'pretend' to be compatible with an API, as we see in the case of Pascal and DirectX Raytracing support. The former doesn't necessarily have any hardware-accelerated component for the latter.

The mesh shader pipeline, to take one example, can be entirely emulated on current hardware with compute shaders, with an indirect dispatch for each of the amplification shader and mesh shader stages.
I hear you, but that's like saying that as long as every video card can run DX WARP, then every feature is supported too.
But if it's not going to be any faster, is there any point to it?
 
I hear you, but that's like saying that as long as every video card can run DX WARP, then every feature is supported too.
But if it's not going to be any faster, is there any point to it?

It is not necessarily much faster with mesh shaders; it costs more memory on the compute side, but performance is nearly the same.

http://www.humus.name/

See the metaballs demo; it is better with mesh shaders because it consumes less memory.
 
I hear you, but that's like saying that as long as every video card can run DX WARP, then every feature is supported too.
But if it's not going to be any faster, is there any point to it?

Compared to a purely software implementation on the CPU? Maybe not in that case, but then again, I know Nvidia half-advertises ASTC support in their Vulkan drivers this way, despite their desktop graphics hardware having no hardware support for it...

Aside from performance, there are also compatibility reasons. Developers might still want to use DXR on GPUs with no hardware support, and having that code work on both sets of hardware sometimes helps ease development.

Again one should never confuse the APIs with real hardware implementation details when talking about feature sets.
 
I seriously question Cerny's honesty in his claims. While they are true, they are only honest if you view them from a manufacturer's POV (costs), but dishonest when seen through the GPU/CPU numbers game.


To elaborate: what use is there in reaching 3.5/2.23 GHz if it means developers will have to cap their code to obey the power draw envelope, i.e. reduce work per cycle?



Suddenly, performance is determined by developers regulating their code, instead of the GPU regulating its own power draw/temperature curve. That is how it became "deterministic" and "predictable". MHz numbers become meaningless for comparisons against Series X.



What's the win/win for Sony:

1 - Frequency numbers on PS5 become superfluous, because it's no longer frequency that determines performance, but they still allow Sony to claim 10 TF and appear to the general public to be in the same ballpark.

2 - Cut the BOM down (or better distribute the BOM to what needs it most), because you no longer need to over-provision for future workloads.
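If it helps, here's a minimal toy model (my own sketch, with made-up constants) of what "deterministic" means in this scheme: the clock is a pure function of how much work per cycle the code generates against a fixed power cap, not of ambient temperature, so the same code always lands on the same frequency.

# Toy model of a fixed power budget with workload-dependent clocks (illustrative).
POWER_CAP = 100.0       # arbitrary units
MAX_CLOCK_GHZ = 2.23

def clock_for(activity):
    """activity ~ average switching work per cycle; power ~ activity * clock (toy model)."""
    return min(MAX_CLOCK_GHZ, POWER_CAP / activity)

for activity in (40.0, 45.0, 50.0, 60.0):   # heavier code -> more work per cycle
    print(f"activity {activity:>4}: clock {clock_for(activity):.2f} GHz")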

It would be better to look at it this way:
- Developers will get better performance out of the box thanks to higher clocks, i.e. happy developers.
- Using instructions like AVX2 will lower the clock, but there is no reason to avoid them other than the extra work of implementing algorithms with those instructions. Those instructions at a lower clock will still get more work done than a naive implementation at higher clocks would.
- We have hit the point where the old strategy of simply adding more transistors doesn't work at console price points. It's very smart to maximize the potential of the given hardware. Expect this to happen even more in the next next gen.

It will be really interesting to eventually see the BOM and retail prices of the new Xbox and PlayStation. If prices are equal, it's an easy choice for those who don't have a PC or don't care about Sony exclusives. If the price difference is $50 or $100, what to buy gets interesting. If the cheapest console is $499, that has big implications for the potential number of consoles sold (hint: the old consoles will stay very relevant for a long time, as they have mass-market appeal and price points).
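On the AVX2 bullet above, the arithmetic is easy to sketch (a rough Python illustration with made-up clock numbers, not measured figures): eight floats per instruction at a slightly lower clock still comfortably beats scalar code at the peak clock.

# Illustrative throughput comparison: scalar at a higher clock vs AVX2 at a lower clock.
SCALAR_CLOCK_GHZ = 3.5     # assumed clock when running narrow code
AVX2_CLOCK_GHZ   = 3.3     # assumed (made-up) reduced clock under heavy AVX2 load
SCALAR_LANES = 1           # one float per op
AVX2_LANES   = 8           # eight 32-bit floats per 256-bit op

scalar_throughput = SCALAR_CLOCK_GHZ * SCALAR_LANES   # rough "float ops per ns"
avx2_throughput   = AVX2_CLOCK_GHZ * AVX2_LANES

print(f"scalar: {scalar_throughput:.1f}   avx2: {avx2_throughput:.1f}   "
      f"ratio: {avx2_throughput / scalar_throughput:.1f}x")   # ~7.5x despite the lower clock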
 
The Zen 2 CPUs in the consoles also support the PDEP/PEXT instructions, but it's common knowledge at this point that they're emulated in microcode, because, hey, guess what, even AMD wants to 'pretend' that its processors are Intel processors so that the same code runs!
 
It would be better to look at it this way:
- Developers will get better performance out of the box thanks to higher clocks, i.e. happy developers.
- Using instructions like AVX2 will lower the clock, but there is no reason to avoid them other than the extra work of implementing algorithms with those instructions. Those instructions at a lower clock will still get more work done than a naive implementation at higher clocks would.
- We have hit the point where the old strategy of simply adding more transistors doesn't work at console price points. It's very smart to maximize the potential of the given hardware. Expect this to happen even more in the next next gen.

It will be really interesting to eventually see the BOM and retail prices of the new Xbox and PlayStation. If prices are equal, it's an easy choice for those who don't have a PC or don't care about Sony exclusives. If the price difference is $50 or $100, what to buy gets interesting. If the cheapest console is $499, that has big implications for the potential number of consoles sold (hint: the old consoles will stay very relevant for a long time, as they have mass-market appeal and price points).

The MHz number is no longer what determines performance for PS5. What's to gain from hitting 2.23 GHz when you have to reduce the workload to reach it? You might as well increase the workload and have a lower clock. The end result will be the same FPS. It's just that in one scenario you can claim 10 TF and in the other you can't.


A game struggling to hit 60 FPS on PS4:

- Devs are limited by the fixed frequency.
- Devs find ways to increase the workload per cycle. More power is drawn and more heat is generated (same as on Series X and every console so far).


A game struggling to hit 60 FPS on PS5:

- Devs are limited by the power budget.
- Increasing the workload per cycle raises power draw, which lowers MHz = 60 FPS unreachable.
- Reducing the workload per cycle lowers power draw, which raises MHz = 60 FPS still unreachable.
- Devs are forced to optimize code without increasing the workload, or to make concessions to picture quality.
 
The MHz number is no longer what determines performance for PS5. What's to gain from hitting 2.23 GHz when you have to reduce the workload to reach it? You might as well increase the workload and have a lower clock. The end result will be the same FPS. It's just that in one scenario you can claim 10 TF and in the other you can't.


A game struggling to hit 60 FPS on PS4:

- Devs are limited by the fixed frequency.
- Devs find ways to increase the workload per cycle. More power is drawn and more heat is generated (same as on Series X and every console so far).


A game struggling to hit 60 FPS on PS5:

- Devs are limited by the power budget.
- Increasing the workload per cycle raises power draw, which lowers MHz = 60 FPS unreachable.
- Reducing the workload per cycle lowers power draw, which raises MHz = 60 FPS still unreachable.
- Devs are forced to optimize code without increasing the workload, or to make concessions to picture quality.

Some workloads benefit from a higher clock. Some workloads benefit from a lower clock and higher-energy-density instructions.

The alternative would be to make a more expensive console, or to always run at a lower clock. When optimizing for a fixed price point, it makes sense to maximize what you can get out of that hardware, and Sony did just that. To avoid having to lower the clock, Sony would have to add a beefier power supply and better cooling, which would add to the price of the console, and/or use a bigger, lower-clocked chip, again adding to the price.

Even MS had to do a similar (not the same) thing. Using SMT lowers the clock of the CPU. As for AVX2 on Xbox, we don't know, but it is probably safe to assume it will not lower the clock.
 
Yes, and on XSX, for example, the only custom hardware is the data decompressor. The remaining features seem to be stock RDNA2 and DX12 Ultimate API things.

We don't know this for sure. In fact, there is at least one other component I am aware of that DF describes as "bespoke".

DF said:
A technique called Sampler Feedback Streaming - SFS - was built to more closely marry the memory demands of the GPU, intelligently loading in the texture mip data that's actually required with the guarantee of a lower quality mip available if the higher quality version isn't readily available, stopping GPU stalls and frame-time spikes. Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later.
 
Even MS had to do a similar (not the same) thing. Using SMT lowers the clock of the CPU. As for AVX2 on Xbox, we don't know, but it is probably safe to assume it will not lower the clock.
A 100 MHz clock drop to have all 8 cores under load while supporting SMT is very good.
Other CPUs would likely drop a lot more, but then again they are super boosted.
 
As higher-res mips are needed only for textures seen at oblique angles, and only for a small part of them at that, things like "sampler feedback" would be nice to implement here. Which means that although you can use a 16K texture, only a small part of it will be loaded into RAM at any given time.
Although I would like to see real texture space shading here (people told me that what is called TSS on modern NV and AMD hardware is in fact just a fancy name for access to MSAA samples).

I think on the hardware side the only real change is probably feedback from the texture samplers. But the APIs should allow for real texture space shading.
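To put rough numbers on "only a small part resident", here's a quick Python sketch assuming 64 KB tiles (the usual tiled-resource granularity), 8 bits per texel, and a made-up 15% of mip 0 actually being sampled:

# Rough residency math for sampler-feedback-style partial streaming (illustrative).
TEXTURE_SIZE = 16384          # 16K x 16K
BITS_PER_TEXEL = 8            # DXT5/BC7-class compression
TILE_BYTES = 64 * 1024        # typical tiled-resource tile size
SAMPLED_FRACTION = 0.15       # assumed fraction of mip 0 actually visible

full_mip0_mb = TEXTURE_SIZE * TEXTURE_SIZE * BITS_PER_TEXEL / 8 / (1024 ** 2)
resident_mb = full_mip0_mb * SAMPLED_FRACTION
tiles = resident_mb * (1024 ** 2) / TILE_BYTES

print(f"full 16K mip 0: {full_mip0_mb:.0f} MB, resident with feedback: "
      f"{resident_mb:.0f} MB (~{tiles:.0f} 64KB tiles)")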

 