Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Okay, but then you'll balloon your texture size 20x to 40x, depending on what compression you are using.
Most games today, as I understand it, have 2K textures.
And that's just for 8K textures; you said in your earlier messaging we should be at 5x 4K resolution for oblique angles, so really we should be talking about 16K textures, which are 80x and 160x larger than 2K textures.

The PS5's SSD solution can manage at most
about 140 MB of compressed data per frame at 60 fps.
So, using your math, 80 MB * 0.17 is still ~13 MB, even at an oblique angle at MIP 0.
Ten of those and you've used your whole budget. I have doubts. And we haven't even started on 16K textures, or how large a footprint they would leave on an 825 GB drive.
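A quick sanity check on those per-frame numbers (my own arithmetic; the ~8.4 GB/s compressed figure is simply the rate that back-solves to 140 MB/frame at 60 fps, not an official spec):

```python
# Per-frame streaming budget implied by the figures above.
# Assumption: ~8.4 GB/s of compressed throughput (my back-solve, not a quoted spec).
compressed_mb_per_s = 8.4 * 1000
frames_per_s = 60

budget_per_frame = compressed_mb_per_s / frames_per_s
print(f"budget: {budget_per_frame:.0f} MB/frame")                    # ~140 MB

# The oblique-angle case quoted above: 17% of an 80 MB MIP-0 texture.
per_texture = 80 * 0.17
print(f"per texture: {per_texture:.1f} MB")                          # ~13.6 MB
print(f"textures per frame: {budget_per_frame / per_texture:.0f}")   # ~10
```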

The XSX's drive is about 40x faster than the X1X's. That means if you jump to 8K textures, you basically have the same loading and streaming issues we have today, though still slightly better (about 2x better loading and streaming). On PS5, if you jump to 16K textures, you basically have the same loading and streaming issues we have today. Loading times would end up roughly where they are now by pushing to these sizes. Goodbye instant loads.

TL;DR, even if I got nearly everything else wrong:
Moving to 16K textures, as per your suggestion, means textures 64x larger than 2K ones. You've ballooned the load by 64x.
The PS5's 100x speed improvement over existing hardware is reduced significantly. At 50x you might still have had enough headroom to cut the frame time in half, but you can't at 16K texture sizes.

Choose 1:
Instant loads + just in time streaming
or massively insane texture sizes.

You can't do both. The realistic place for both consoles to sit is around 4K and 8K texture resolution.
Even then, if the goal is to have 60 fps, bandwidth restrictions will be even tighter. I'm not sure what AF settings would be at 8K texture resolution.

Resolution / # of MIPs / DXT1 / DXT5
[table image: compressed texture sizes by resolution and MIP count for DXT1 and DXT5]
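The table image didn't survive, so here is a small sketch that reproduces the same kind of numbers from the standard block-compression rates (DXT1 = 4 bits/texel, DXT5 = 8 bits/texel), with a full mip chain adding roughly a third on top of MIP 0; this is my recomputation, not the original image:

```python
import math

# Compressed texture sizes by resolution (DXT1 = 4 bpp, DXT5 = 8 bpp).
# A full mip chain adds ~1/3 on top of mip 0.
def size_mb(side, bits_per_texel, with_mips=True):
    size_bytes = side * side * bits_per_texel / 8
    if with_mips:
        size_bytes *= 4 / 3
    return size_bytes / 2**20

print(f"{'res':>6} {'mips':>5} {'DXT1 (MB)':>10} {'DXT5 (MB)':>10}")
for side in (2048, 4096, 8192, 16384):
    mips = int(math.log2(side)) + 1
    print(f"{side:>6} {mips:>5} {size_mb(side, 4):>10.1f} {size_mb(side, 8):>10.1f}")
# 2048: ~2.7 / 5.3 MB ... 16384: ~170.7 / 341.3 MB
```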

That's all cool and stuff.
But do you remember that at oblique angles mip 0 is used on only a pretty small portion of the screen?
All the farther pixels will use higher mips.
That's why, if you could load just a part of the texture, a tiny part of that 16K, life would be so much easier... wait, that's what SFS does, is it not?
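A rough sketch of that partial-residency argument, with my own illustrative numbers (64 KiB tiles, BC7 at 1 byte/texel, and an assumed 5% of mip 0 actually sampled); this is not SFS itself, just the arithmetic behind it:

```python
# Rough sketch of partial residency: only the mip-0 tiles actually sampled near
# the camera need streaming; distant pixels fall to smaller, cheap-to-keep mips.
TILE_BYTES = 64 * 1024            # typical tiled-resource tile size
TEXELS_PER_TILE_BC7 = 256 * 256   # 64 KiB of BC7 at 1 byte/texel

def resident_mb(side, fraction_of_mip0_visible):
    mip0_tiles = (side * side) / TEXELS_PER_TILE_BC7
    needed_tiles = mip0_tiles * fraction_of_mip0_visible
    return needed_tiles * TILE_BYTES / 2**20

# A 16K texture has 4096 mip-0 tiles (~256 MB); if only ~5% of mip 0 is
# sampled this frame, you stream ~13 MB instead of 256 MB.
print(f"{resident_mb(16384, 0.05):.1f} MB")
```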
 
The MHz number is no longer what determines performance for the PS5. What's to gain from hitting 2.23 GHz when you have to reduce the workload to reach it? You might as well increase the workload and have a lower clock. The end result will be the same FPS. It's just that in one scenario you can claim 10 TF and in the other you can't.


A game struggling to hit 60FPS on PS4:

- Devs are limited by fixed frequency
- Devs find ways to increase the workload per cycle. More power is drawn and more heat is generated (same as Series X and every console so far).


A game struggling to hit 60FPS on PS5:

- Devs are limited by TDP
- Increasing workload per cycle increases TDP, reduces MHz = 60 FPS unreachable.
- Reducing workload per cycle decreases TDP, increases MHz = 60 FPS unreachable.
- Devs are forced to optimize code without increasing workloads, or to make concessions to picture quality.
It seems that Sony are making life harder for devs, which goes against the whole Cerny approach of making life easier for devs?
 
Is the PS5's GPU generating so much more heat than the XBSX's that the XBSX can run its GPU and CPU at full speed without overheating but the PS5 can't? If so, the PS5's design really is poor for going with only 36 CUs.
It does sound like the PS5 is pushing things, although we don't know what the relative power budgets are for the consoles. An early decision for a certain die size and power budget could have left the PS5 in this position.

If you take the "a couple of percent drop in clocks can save 10% in power" quote and map it onto any power-scaling curve, you should be able to estimate where on that curve the PS5 is sitting.
I'm assuming that should apply more to the GPU than CPU, since the CPU clocks look like they're more in the linear part of the Zen clock curve.
On the other hand, I have seen some attempts at graphing the 5700 XT's clock and power curve, and some of the overclocks that nominally exceed 2 GHz get a slope like that at the end. I'm not sure how scientific those attempts were, but if representative they also show that RDNA2's clock curve isn't significantly offset at the upper extreme.
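As a rough illustration of that mapping (my assumption: model power locally as P ∝ f^n around the operating point), the quoted trade-off pins down the local exponent:

```python
import math

# If a ~2% clock drop saves ~10% power and we assume P ~ f^n locally,
# then (1 - 0.10) = (1 - 0.02)^n. The 2% / 10% pairing comes from the
# "couple % drop in clocks can save 10% in power" quote above.
clock_drop = 0.02
power_saving = 0.10

n = math.log(1 - power_saving) / math.log(1 - clock_drop)
print(f"implied local exponent: P ~ f^{n:.1f}")   # ~5.2, i.e. a steep part of the curve
```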
 
Sure, and what happens when you're not looking at oblique textures? Have you done the math on what it takes to render 16K textures? Or are you just assuming it's capable because I/O isn't the limiter anymore?

Can someone else provide some thoughts here? I don't want to be dismissive, but I have found that texture performance for a game is mainly a factor of I/O, memory capacity, and available memory bandwidth, and the latter two will be severely lacking relative to the boost we got in I/O.

Some metrics on movie scene quality here:
source: http://theillusionden.blogspot.com/2016/03/
Film characters and models can have 8k+ texture maps or even hundreds of 4-8k maps per model

“Almost every asset rendered by Weta for Avatar was painted to some extent in MARI. A typical character was around 150 to 170 patches, with 30 or more channels (specular, diffuse, sub-surface, etc) and 500k polygons at Sub-division level one. The full texture set ran to several tens of gigabytes, all of which could be loaded in MARI at the same time. The biggest asset I saw being painted was the shuttle, which came in at 30Gb per channel for the fine displacement detail (500, 4K textures). Assets of over 20M polys can be painted.” - Jack Greasley

You think PS5 is capable of this in real time because of their SSD?
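For the memory-capacity side of that question, a rough back-of-envelope (my assumptions: BC7/DXT5-class compression at 1 byte per texel, a full mip chain, and something like 13.5 GB of the 16 GB actually available to a game):

```python
# How many fully-resident 16K textures fit in a console's memory budget?
# Assumptions (mine): 1 byte/texel compressed, full mip chain (+1/3),
# and ~13.5 GB of the 16 GB usable by the game.
tex_16k_mb = 16384 * 16384 * 1 * (4 / 3) / 2**20   # ~341 MB per texture
budget_mb = 13.5 * 1024

print(f"one 16K texture: {tex_16k_mb:.0f} MB")
print(f"fully resident 16K textures that fit: {budget_mb / tex_16k_mb:.0f}")
# ~40 - and that's before geometry, render targets, audio or any other game data.
```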
 
At least PS4 was over a Radeon 5850.

It was closest to a 7850, which was quite a bit more powerful than a 5850, yes. A GPU ranging between 9 and 10 TF might seem low-end compared to what's going to be available in the PC space, but for a console I wouldn't call it bad by any means. In raw TF it's a bit over 5 times more powerful than the base PS4, and a bit over 2 times the Pro.

I suspect that the XSX is lower on the power-frequency curve, just thinking about the exponential-looking graph; it's 400 MHz down. But the two will have different power-frequency curves, so I don't know for sure.

He can't compare the XSX to the PS5 because we haven't seen anything SoC/design/case related to the PS5 yet. MS went with a rather PC-style design: double PCB, a wide/slower GPU, fixed clocks. What kind of power usage have MS and Sony aimed for? They could have aimed for completely different goals from the beginning.
I can imagine that extreme clocks aren't that efficient either (if the 2.3 GHz+ range can even be considered extreme for RDNA2; maybe it's normal, but then why didn't MS clock theirs higher?).

The 2 GHz / 3 GHz with boost most likely has nothing to do with a bad or poor design, but rather with a different design. I guess both MS and Sony had their design plans laid out 4 to 5 years ago, but with different plans in mind.

Where do diminishing returns appear, though? I know that when overclocking my CPUs/GPUs, after a certain overclock/boost the practical results start to diminish; they mostly become useful for benchmarks. For example, an i7 920 clocks easily to 4 GHz or beyond, at least the D0s do, but past 3.6 GHz you have about zero advantage going higher in real-world gaming. I've had the same experience with GPUs in general, although there you don't work with 1+ GHz overclocks.
 
Sure and what happens when you're not looking at oblique textures?

You will get a lower mip. Like mip 2 will be 4K.

Memory Capacity and available Memory bandwidth

Cache. You still use it.
Unless you want to do RT; then your cache is busted by random access all over the place.
That's why RT was never used in real time, not because nobody could do the puny fixed-path intersection.

hundreds of 4-8k maps per model

That's because they can. Not because they need it.
Good luck finding any artist with a good understanding of hardware in the movie industry. :)
 

What textures will be targeted? Because while the memory needed in VRAM for 16K textures may be kept small, the amount of storage needed cannot be. There is over an order of magnitude difference in pixel count between a 4K texture and a 16K texture.
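The arithmetic behind that point, with the same compression assumption as above (1 byte per texel compressed, plus mips); the texture count is purely illustrative:

```python
# Pixel-count gap and on-disk footprint of 16K textures.
print(16384**2 / 4096**2)                     # 16x the texels of a 4K texture

tex_16k_gb = 16384 * 16384 * (4 / 3) / 2**30  # ~0.33 GB each at 1 byte/texel + mips
drive_gb = 825
print(f"{drive_gb / tex_16k_gb:.0f} such textures would fill the whole drive")  # ~2475
```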
 
The Tempest Engine has great similarities to AMD TrueAudio Next.
[image: AMD TrueAudio Next block diagram]

The question is: on RDNA 2 and the Tempest Engine, will it still be a reserved CU on the GPU, or an extra one?
 

No. Why would you want them to reserve a CU from the GPU? And it is different from a CU: it's a hybrid CU/SPU, and they did this to have a dedicated part for audio and 3D audio. You can keep dreaming if you expect graphics developers to give up GPU CUs for audio.

This is what a third-party dev said, and the important part is that it is discrete: nothing shared with the graphics guys.



And here is what a first-party guy from Naughty Dog said as a joke:

[tweet screenshots]
 

I've been intrigued by those comments from MS's Game Stack dude who talked about real-time ML up-rezzing of game textures loaded from disk.

I wonder if you could keep some memory free to up-res appropriate textures for the occasions where they get really, really close.

In fact, I wonder if hints for the ML upscaler could become part of a texture compression scheme... :?:
 

Those were shared the day of (or the day after?) the spec release two weeks ago; why do they keep appearing again and again? By that I mean the tweets :p
There are a boatload of MS devs 'joking' too, but let's spare those maybe. A Sony dev is going to say the PS5 is better and vice versa. I think we get it by now ;)
 
I've been intrigued by those comments from MS's Game Stack dude who talked about real-time ML up-rezzing of game textures loaded from disk.

I wonder if you could keep some memory free to up-res appropriate textures for the occasions where they get really, really close.

In fact, I wonder if hints for the ML upscaler could become part of a texture compression scheme... :?:
Sure, all this is possible, but at the cost of having to run the ML as soon as the textures arrive and store the final outputs before they're called for rendering. I guess if there are spare cycles, perhaps via async compute or some other way to fill in the dips, this might be ideal. If there is a trained model that does this up-res job well, I'd like to see its outputs in an actual title, just to see how well it performs. I know that with enough compute time ML can create wonderful up-resolutions, but a real-time application tends to be much more difficult. I'm not sure how much power is available for this type of thing.

There could be some use cases where this makes sense: games that are naturally slower, where it's easier to determine what to load next. But for twitch-based games this doesn't seem like an ideal fit.
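For scale, here is a deliberately tiny, purely hypothetical ESPCN-style 2x upscaler in PyTorch; it is not MS's method or any console API, just something concrete to reason about per-tile inference cost with:

```python
# Hypothetical sketch only: a tiny 2x texture-tile upscaler.
import torch
import torch.nn as nn

class TinyUpscaler(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a 2x larger image
        )

    def forward(self, x):
        return self.net(x)

tile = torch.rand(1, 3, 256, 256)   # one 256x256 texture tile
up = TinyUpscaler()(tile)           # -> (1, 3, 512, 512)
print(up.shape)
```

Even a network this small has to run once per tile as it streams in, which is where the "spare cycles" question above comes from.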
 
The Tempest Engine is separate; it's based on an unspecified AMD compute unit (so it could be GCN or RDNA, we don't know) which they stripped of its caches. Apparently it's only one unit too, or at least I don't think Cerny ever used the plural for it.
 
I have a question. With int4 and int8 support, can that mean that the XSX offers tensor core like performance with those precision modes?

I ask because the RDNA white paper makes reference to Navi CU variants.

Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows

AMD has a patent for a parallel matrix multiply pipeline using dot product units.

http://www.freepatentsonline.com/y2019/0171448.html
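To make the dot4 mode concrete, here is a small emulation of the arithmetic that instruction performs, per the whitepaper quote above (illustrative values only; this is not AMD's ISA or any real intrinsic):

```python
# int8 "dot4": four int8 x int8 products summed into a 32-bit accumulator.
import numpy as np

def dot4_i8(a, b, acc):
    a = np.asarray(a, dtype=np.int8)
    b = np.asarray(b, dtype=np.int8)
    # widen to int32 before multiplying so the products don't overflow
    return np.int32(acc) + np.sum(a.astype(np.int32) * b.astype(np.int32), dtype=np.int32)

print(dot4_i8([127, -128, 5, 7], [2, 3, -4, 1], acc=10))  # -133
```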
 
I have a question. With int4 and int8 support, can that mean that the XSX offers tensor core like performance with those precision modes?
Not quite the same; it's hard to explain. It's like how a Google Tensor Processing Unit is different from Nvidia's tensor cores.

Nvidia does a single matrix multiply-accumulate in a single clock cycle. It does this with mixed precision, so 4-, 8-, 16-bit etc., and outputs, I believe, a 32-bit value. I could be wrong. It is fast, though. (I wish I had one to play with.)

I'm not sure what Google's does; they made an ASIC for theirs. It's very fast.

As for AMD: unless they can do more operations in a single cycle, it will likely not be as fast as a tensor core (Nvidia).
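As a concrete reference for what is being compared, a rough emulation of a tensor-core-style mixed-precision MMA tile (FP16 inputs, FP32 accumulate); the 4x4 shape is illustrative, and this is plain NumPy, not any GPU intrinsic:

```python
import numpy as np

# One mixed-precision multiply-accumulate tile: D = A @ B + C with FP16
# inputs and an FP32 accumulator (the idea behind a tensor-core MMA op).
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

D = A.astype(np.float32) @ B.astype(np.float32) + C   # accumulate in FP32
print(D.dtype, D.shape)   # float32 (4, 4)
```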
 

I added the extra details I initially came across that prompted my question.
 
Thanks. Reading that, if it completes in a single clock cycle, then yeah, it sounds pretty close at a high level;

it should be very similar to a tensor core. This is nice, thanks for the heads up. I hope the community embraces this (now that the hardware is available).

CUDA has dominated this industry, making Nvidia the card of choice as a result. I hope we see these libraries expand to OpenCL.

Many of us also have Macs ;) and other hardware; I would consider buying an AMD GPU if it supported more ML libraries without the pain (or I guess I could get around to coding my own algorithms).
 
Reading that, if it completes in a single clock cycle, then yeah, it sounds pretty close at a high level;
it should be very similar to a tensor core.
"The SIMDs use separate execution units for double-precision data. Each implementation includes between two and sixteen double-precision pipelines that can perform FMAs and other FP operations, depending on the target market. As a result, the latency of double-precision wavefronts varies from as little as two cycles up to sixteen. The double-precision execution units operate separately from the main vector ALUs and can overlap execution."
https://www.techspot.com/community/...next-navi-configurations.256148/#post-1767329
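One way to read those numbers (my interpretation: an RDNA SIMD issues 32 work-items per wavefront, so a wave's FP64 latency is 32 divided by the number of FP64 pipes):

```python
# Wave32 latency on N dedicated FP64 pipes, for the quoted 2-to-16 pipe range.
for pipes in (2, 4, 8, 16):
    print(f"{pipes:>2} FP64 pipes -> {32 // pipes:>2} cycles per wavefront")
# 16 pipes -> 2 cycles, 2 pipes -> 16 cycles, matching the quoted latency range.
```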
 