Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Actually, when I said above that devs could choose 90% utilization at 2.23 GHz or 100% at 2 GHz, it was just to allude to there being a table in use that prevents 100% at 2.23 GHz. Devs won't have to choose at all.

They will make the game they have to make, and they will have to optimize to reach their frame-rate targets, like on any other console, regardless of whether the GPU runs at 2.23, 2 or 1.8 GHz. These are marketing numbers now and irrelevant to the developers.
 
Do you not understand that the TFLOP figure is a math calculation that says nothing about the actual performance of the machine? And yes, it means Cerny was playing the marketing game as well, knowing very well that people around the web will pay attention to that meaningless number either way.

My bad, forgot that TF doesn't mean anything anymore since March :)

I'm an artist at Activision: I worked on the second revision of the Goon05B 3D model, the main gun, and I've been part of the small team that produced the PS5 theme. It's a nice group of people; they keep cracking jokes because I'm the only bald guy in room 27 on the second floor.
But I want to stay anonymous, so I can't add anything more.

Lol :D

Looks awesome, both game play and graphics wise. Definitely interested.

Next-gen graphics, Chinese style.

Edit:
Has anyone shared this already? I couldn't find the game in the games section.

https://www.eurogamer.net/articles/...z0iH5iLl_f-sI3mpci98G2IG--xUEguXeX7Cgwu1iFdPc

They've got to be joking, right? Also, like many of the comments point out, how many people are going to think their controller is broken? :D
 
Geometry processing and workload distribution come to mind, i.e. the front end of the pipeline, which is the least wide/parallel part of a GPU.

Thanks.

Do you also have any idea whether the CPU and GPU can pass power between each other quickly and frequently?

For example, over the course of a whole second:

0 ms: 3.0 GHz CPU, 10.28 TF GPU
100 ms: 3.5 GHz CPU, 10.00 TF GPU
200 ms: 3.0 GHz CPU, 10.28 TF GPU
300 ms: 3.3 GHz CPU, 10.20 TF GPU
400 ms: 3.1 GHz CPU, 10.15 TF GPU
500 ms: 3.5 GHz CPU, 10.28 TF GPU
600 ms: 2.9 GHz CPU, 10.28 TF GPU
700 ms: 3.5 GHz CPU, 9.90 TF GPU
...

Maybe they can pass power to each other even faster?
 
Are there workloads where higher frequency is more desired but with the same power consumption?
Yes. This is now a growing topic (an old topic getting more attention in the PC space, but a common one in the mobile space) as we hit computing limits: algorithmic optimization for power. If you do it right, you keep the clock speed high and the power requirement the same.

If you do it wrong, the power level goes up and, depending on the budget, you could lose the boost.

The programmer needs to decide which will benefit them more: using more power, or using less and keeping a higher clock rate.

That being said, it's not trivial to find an algorithm that uses less power and gets a result just as fast. A lot of benchmarking needs to be done, and the tendency is towards multi-core parallel algorithms.
There tend to be common trade-offs: you can get faster computation by vastly increasing your memory footprint (pre-computation), and equally the opposite is true: no pre-computation implies a generally low footprint and a lot of computation at run time.
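To make that pre-computation trade-off concrete, here is a minimal Python sketch. The primality example, table size and query range are my own invented illustration, nothing from the consoles or the posts above: the sieve spends memory once up front so every later query is almost free, while trial division keeps the footprint tiny and pays the full computation on every call.

Code:
import time

# Illustrative only: trade memory (a pre-computed sieve) for run-time work,
# versus no pre-computation (trial division) with a tiny footprint.

LIMIT = 1_000_000

def build_sieve(limit):
    """Pre-computation: O(limit) memory bought once, near-free queries later."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime

def is_prime_trial(n):
    """No pre-computation: minimal footprint, full work on every query."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

if __name__ == "__main__":
    queries = range(900_000, 1_000_000)   # 100k queries, invented workload

    t0 = time.perf_counter()
    sieve = build_sieve(LIMIT)            # the memory cost, paid once
    hits_table = sum(sieve[q] for q in queries)
    print(f"sieve : {hits_table} primes in {time.perf_counter() - t0:.2f}s")

    t0 = time.perf_counter()
    hits_trial = sum(is_prime_trial(q) for q in queries)
    print(f"trial : {hits_trial} primes in {time.perf_counter() - t0:.2f}s")

On a GPU the same choice shows up as ALU work versus memory traffic, and which of the two costs more power depends entirely on the workload.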
 
Thanks.

Do you also have any idea whether the CPU and GPU can pass power between each other quickly and frequently?

For example, over the course of a whole second:

0 ms: 3.0 GHz CPU, 10.28 TF GPU
100 ms: 3.5 GHz CPU, 10.00 TF GPU
200 ms: 3.0 GHz CPU, 10.28 TF GPU
300 ms: 3.3 GHz CPU, 10.20 TF GPU
400 ms: 3.1 GHz CPU, 10.15 TF GPU
500 ms: 3.5 GHz CPU, 10.28 TF GPU
600 ms: 2.9 GHz CPU, 10.28 TF GPU
700 ms: 3.5 GHz CPU, 9.90 TF GPU
...

Maybe they can pass power to each other even faster?

I heard 2 ms, if I understand the question correctly, so essentially within a frame and as often as you want.
 
I heard 2 ms, if I understand the question correctly, so essentially within a frame and as often as you want.
https://wccftech.com/amd-frank-azor-interview/
According to the interview it's indeed under 2 ms.
It would be interesting to see how this affects AMD's adaptive frequency scaling, as that is as fast as 2 cycles. I'm not sure whether the time-scale discrepancy between the two is so large that the voltage set by SmartShift can be considered constant.
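For intuition only, here is a rough sketch of budget shifting at a ~2 ms granularity inside a 16.7 ms frame. The total budget, the per-slice demand numbers, the ceilings and the power-to-clock curve are all made-up assumptions of mine; the only figure taken from the posts above is the ~2 ms granularity.

Code:
# Purely illustrative: shift a fixed SoC power budget between CPU and GPU
# every ~2 ms. All numbers below are assumptions, not PS5/SmartShift data.

TOTAL_BUDGET_W = 200.0                 # assumed fixed SoC budget
SLICE_MS = 2.0                         # granularity from the interview
FRAME_MS = 16.7                        # one 60 fps frame

# Hypothetical CPU share of the combined demand, one value per 2 ms slice.
cpu_demand = [0.30, 0.45, 0.25, 0.35, 0.50, 0.30, 0.40, 0.35, 0.30]

def clock_from_power(power_w, ceiling_w, max_ghz):
    """Toy mapping: P ~ f^3 (since P ~ f * V^2 and V roughly tracks f)."""
    return min(max_ghz, max_ghz * (power_w / ceiling_w) ** (1.0 / 3.0))

t = 0.0
for share in cpu_demand:
    if t >= FRAME_MS:
        break
    cpu_w = TOTAL_BUDGET_W * share
    gpu_w = TOTAL_BUDGET_W - cpu_w                    # GPU gets the remainder
    cpu_ghz = clock_from_power(cpu_w, 70.0, 3.5)      # assumed CPU ceiling
    gpu_ghz = clock_from_power(gpu_w, 180.0, 2.23)    # assumed GPU ceiling
    print(f"{t:4.1f} ms  cpu {cpu_w:5.1f} W ~{cpu_ghz:.2f} GHz   "
          f"gpu {gpu_w:5.1f} W ~{gpu_ghz:.2f} GHz")
    t += SLICE_MS

On the question of scales: at roughly 2 GHz there are a few million GPU cycles inside one 2 ms slice, so a 2-cycle frequency scaler would plausibly see whatever operating point SmartShift last set as effectively constant.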
 
My bad, forgot that TF doesn't mean anything anymore since March :)

1 - I love the insinuation that I am part of a console war defending the XSX, when, if you actually read my posts through the mediocrity of a console-war lens, it would be the PS5 my arguments were downgrading.

2 - Do you want to learn, or is every fact turned into a biased argument through that same mediocre lens?


TF means everything only when comparing exactly the same architecture. Tell me how the graphical performance given by the formula (2 * clock speed * stream processors = TF) takes into account CU design, memory performance, etc.? It doesn't, and it never will.
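For reference, a quick plug of the announced specs into that same formula (assuming the usual 64 stream processors per RDNA 2 CU); it reproduces the headline figures and, as said above, captures nothing about CU design, caches or memory bandwidth:

Code:
def teraflops(cus, clock_ghz, sp_per_cu=64):
    """2 FLOPs per stream processor per clock (one fused multiply-add)."""
    return 2 * cus * sp_per_cu * clock_ghz * 1e9 / 1e12

print(f"XSX: {teraflops(52, 1.825):.2f} TF")   # 52 CUs at a fixed 1.825 GHz -> ~12.15
print(f"PS5: {teraflops(36, 2.23):.2f} TF")    # 36 CUs at up to 2.23 GHz    -> ~10.28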

inPZ0uk.png




Learn first, talk later.
 
1 - I love the insinuation that I am part of a console war defending the XSX, when, if you actually read my posts through the mediocrity of a console-war lens, it would be the PS5 my arguments were downgrading.

2 - Do you want to learn, or is every fact turned into a biased argument through that same mediocre lens?


TF means everything only when comparing exactly the same architecture. Tell me how the graphical performance given by the formula (2 * clock speed * stream processors = TF) takes into account CU design, memory performance, etc.? It doesn't, and it never will.

inPZ0uk.png




Learn first, talk later.

My understanding is that there is an argument that early-generation games will not fully utilise the extra CUs of the XSX, and as such the TFs cannot be compared (well, there might be a slight advantage to running the whole system at a faster speed).

I guess, like the earlier examples with the wide road, it's all very well having four lanes for cars, but if you only ever have three cars running then there's no advantage. So even though the other road only has three lanes, because the speed limit is higher the cars get to their destination quicker.

Once devs are pushing the extra CUs, the extra lane helps the four-lane road and the three-lane road falls behind.
 
My understanding is that there is an argument that early-generation games will not fully utilise the extra CUs of the XSX, and as such the TFs cannot be compared (well, there might be a slight advantage to running the whole system at a faster speed).
I think this was probably a much larger issue with the last generation of consoles than with this one. RDNA was changed to address it, which is why you see the smaller 5700 compete so well against a Radeon VII, for instance.
It's a scheduling challenge with GCN: fairly okay for compute shaders but terrible for everything else. IIRC they split into CDNA and RDNA, and I believe CDNA still schedules the same way.

vRtkxV6.jpg


From my understanding, if you look at the "Vega" execution, you don't get to issue a new instruction until 4 cycles later; without enough work to fill those 4 cycles, the cycles are wasted potential. With more and more CUs, the workload needs to be really well optimized to saturate all of them and finish in the right number of clock cycles. This scenario heavily penalizes small workloads that could finish in fewer than 4 clock cycles (spread over the number of CUs).

With RDNA, they can issue new instructions every clock cycle, which solves the problem of being dramatically penalized for having such a large number of CUs. IMO this is a welcome and needed change for there to be progress; increasing clock speed can only take you so far.
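A toy latency model of that issue-cadence difference, in Python. It assumes the commonly cited arrangement (GCN: a wave64 instruction executed over 4 cycles on a SIMD16, four SIMD16s per CU; RDNA: a wave32 instruction in one cycle on a SIMD32, two per CU) and ignores ALU pipeline latency, dependencies and dual-CU behaviour; the shader length is an arbitrary number for illustration.

Code:
# Simplified on purpose: only models how often a single wave can be issued,
# and how many waves a CU needs just to keep its vector units fed.

def single_wave_cycles(chain_length, issue_cadence):
    """Cycles for one wave to get through `chain_length` back-to-back
    instructions when it can only be issued every `issue_cadence` cycles."""
    return chain_length * issue_cadence

SHORT_SHADER = 12   # arbitrary instruction count

print("single wave, GCN-style (issue every 4th cycle):",
      single_wave_cycles(SHORT_SHADER, 4), "cycles")
print("single wave, RDNA-style (issue every cycle)   :",
      single_wave_cycles(SHORT_SHADER, 1), "cycles")

# Waves needed per CU just to have something to issue every cycle:
print("waves to keep a GCN CU's SIMDs busy :", 4)   # 4x SIMD16, 4-cycle cadence
print("waves to keep a RDNA CU's SIMDs busy:", 2)   # 2x SIMD32, 1-cycle cadence

The wider the GPU, the more simultaneous waves it takes to hide that cadence, which is exactly where small or short-lived workloads lose out.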
 
Can someone explain to me what 100% GPU utilization means?
Does it mean that the GPU is under load 100% of the time? Does it mean 100% of the transistors are flipped?

I would assume the former, no?

So I can't for the life of me understand why the PS5 wouldn't be able to be at full clocks (2.23 GHz) while also at 100% GPU utilization.
The only thing that changes the frequency should be the nature of the load, not the "percentage of GPU utilization".

As far as I understood from Cerny, the type of load that is power-hungry will lower the clocks. But is 100% GPU utilization necessarily always "power-hungry"?
 
Can someone explain to me what 100% GPU utilization means?
Does it mean that the GPU is under load 100% of the time? Does it mean 100% of the transistors are flipped?

I would assume the former, no?

So I can't for the life of me understand why the PS5 wouldn't be able to be at full clocks (2.23 GHz) while also at 100% GPU utilization.
The only thing that changes the frequency should be the nature of the load, not the "percentage of GPU utilization".

As far as I understood from Cerny, the type of load that is power-hungry will lower the clocks. But is 100% GPU utilization necessarily always "power-hungry"?
GPU utilization is largely a meaningless metric, at least with respect to graphical complexity or the amount of work done. It's probably more akin to how many threads are running concurrently than a measure of the work to do. Perhaps, from another point of view, it can be seen as whether the CPU is sufficiently feeding the GPU with work: 100% utilized means the GPU, and not the CPU, is the bottleneck for frame rate. So the expectation for nearly all titles is for the GPU to be at 100%.

Power-hungry work tends to be a more reliable metric of the computation being done, but that's not necessarily reflective of how great the graphics will look.
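To illustrate the distinction with an invented trace (not real profiler data): "utilization" usually just means the GPU had work queued whenever it was sampled, which says nothing about how much of the hardware was actually busy.

Code:
# Invented trace: (gpu_had_work_this_sample, fraction_of_ALUs_actually_active)
trace = [
    (True, 0.20), (True, 0.15), (True, 0.90), (True, 0.85),
    (True, 0.10), (True, 0.25), (True, 0.95), (True, 0.30),
]

utilization = sum(busy for busy, _ in trace) / len(trace)
alu_activity = sum(active for _, active in trace) / len(trace)

print(f"utilization : {utilization:.0%}")   # 100% -- the GPU was never idle
print(f"ALU activity: {alu_activity:.0%}")  # ~46% -- far from fully loaded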
 
GPU utilization is largely a meaningless metric. It's probably more akin to how many threads are running concurrently than a measure of the work to do. Perhaps it means there is a long queue of work, but it doesn't necessarily mean the GPU is being fully utilized or is optimized to do it.

Power-hungry work tends to be a more reliable metric of the computation being done, but that's not necessarily reflective of how great the graphics will look.
Yes, and this is what confuses me. There are some in this thread who said it would only be able to stay at 2.23 GHz while at 90% GPU utilization, and then at 100% it would drop to 2 GHz or similar.
I just wanted to know what the logic behind that was.
Was it just a theory and not what they thought was actually going on? :D
 
Yes, and this is what confuses me. There are some in this thread who said it would only be able to stay at 2.23 GHz while at 90% GPU utilization, and then at 100% it would drop to 2 GHz or similar.
I just wanted to know what the logic behind that was.
Was it just a theory and not what they thought was actually going on? :D
Looking at GPU utilization alone, you would be unlikely to be able to guess its impact on clock rate.

It may come down to the writer's point of view of what utilization is. But GPUs do not report their activity level as a percentage. The only way to know whether there is load is to watch the power draw of a specific area go up or down, and to see the frequency move correspondingly.

This should be where the PS5 will differ from the XSX. If all is working correctly, the PS5's power draw should never (or barely) change at the wall; the frequency should move up and down to adapt instead.
The XSX will have its power draw move by large amounts.

Unfortunately it's hard to say more than that. We generally don't expect things to run at peak; that sort of goes against how they were able to reach the peak clock speed in the first place.

Another analogy: if I asked you how high you can jump with nothing on, you can keep jumping at your peak height. But once I put weights on you, and then more weights, you no longer can. That is load. GPU utilization is just the act of jumping on time, not how high or under how much weight. How high you jump is the clock speed, and how much weight I put on you before you jump is the load/activity level.
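A crude sketch of the contrast between the two strategies described here. The activity trace and the P ~ activity * f^3 approximation are my own assumptions, and everything is normalised, so the numbers only show the shape of the behaviour, not real wattages or clocks.

Code:
# Made-up per-frame activity levels; 1.0 means "at the power cap at max clock".
activity = [0.90, 1.00, 1.15, 1.30, 1.05, 0.95]

MAX_GHZ = 2.23       # variable-clock design's ceiling
POWER_CAP = 1.0      # normalised board power budget

print("frame   fixed-power design        fixed-clock design")
for i, a in enumerate(activity):
    # Hold power at the cap and let the clock give way under heavy load.
    clock_fp = min(MAX_GHZ, MAX_GHZ * (POWER_CAP / a) ** (1.0 / 3.0))
    power_fp = min(a, POWER_CAP)
    # Hold the clock and let the power draw follow the load instead.
    power_fc = a * POWER_CAP
    print(f"{i:5d}   {clock_fp:4.2f} GHz @ {power_fp:4.2f} x cap"
          f"     max clock @ {power_fc:4.2f} x cap")

In the fixed-power column the wall draw barely moves while the clock dips on the heavy frames; in the fixed-clock column the clock never moves and the draw swings with the load, which is the behaviour described above.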
 
Are there workloads where higher frequency is more desired but with the same power consumption?
As noted, less-parallel front-end work could be an area where not enough wavefronts can be launched to fill all the CUs. Since the work doesn't scale out onto all the parallel resources, the overall execution time is more sensitive to how fast the existing waves can complete.
Similarly, barriers or events like pipeline or cache flushes can be limited by how fast the GPU can run through existing work or drain its buffers and queues, rather than by launching more parallel waves. If the GPU is waiting for such an event to complete before continuing with new waves, how quickly it can churn through the period where a significant fraction of its hardware cannot be used can affect overall performance.
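An Amdahl-style sketch of that sensitivity: work that cannot spread across the CUs scales with clock alone, while wide work scales with clock times CU count. Only the CU counts and clocks are the announced figures; the work amounts and the serial/parallel splits are invented, and the time units are arbitrary.

Code:
def relative_time(serial_work, parallel_work, cus, clock_ghz):
    """Arbitrary work units: the serial portion ignores the CU count."""
    return serial_work / clock_ghz + parallel_work / (clock_ghz * cus)

WIDE = dict(cus=52, clock_ghz=1.825)    # wider but lower-clocked
NARROW = dict(cus=36, clock_ghz=2.23)   # narrower but higher-clocked

for serial, parallel in ((1, 99), (10, 90), (25, 75)):
    t_wide = relative_time(serial, parallel, **WIDE)
    t_narrow = relative_time(serial, parallel, **NARROW)
    winner = "narrow/fast" if t_narrow < t_wide else "wide/slower-clock"
    print(f"{serial:2d}% serial: wide {t_wide:5.2f}, narrow {t_narrow:5.2f}"
          f"  -> {winner} finishes first")

The crossover point is entirely a property of the invented split, but it shows why clock-sensitive phases like the front end, barriers and flushes matter more than the headline TF number during those moments.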
 
I have a question about the PS5 versus the typical fixed clock rates of most consoles, like the XSX, or even an imaginary PS5 with fixed clocks of 3.5 and 2.23 GHz.

What happens to a fixed-clock PS5 when it's running a severely power-hungry workload? Does the power consumption just go up until it hits its power wall?
Can it still be downclocked even though it has fixed clock rates?
 
I have a question about the PS5 versus the typical fixed clock rates of most consoles, like the XSX, or even an imaginary PS5 with fixed clocks of 3.5 and 2.23 GHz.

What happens to a fixed-clock PS5 when it's running a severely power-hungry workload? Does the power consumption just go up until it hits its power wall?
Can it still be downclocked even though it has fixed clock rates?
Both CPU and GPU will be at nominal clock until the activity monitor determines the dissipation is too high. Developers have the choice of whether the CPU or the GPU clocks are prioritized.

If you're asking whether they could voluntarily limit clocks outside of the power limit: perhaps, but why would you want to? The behavior is completely deterministic and repeatable, so you wouldn't suddenly be guaranteeing a better experience for users.
 
I have a question about the PS5 versus the typical fixed clock rates of most consoles, like the XSX, or even an imaginary PS5 with fixed clocks of 3.5 and 2.23 GHz.

What happens to a fixed-clock PS5 when it's running a severely power-hungry workload? Does the power consumption just go up until it hits its power wall?
Can it still be downclocked even though it has fixed clock rates?
It will reduce clock speeds according to power usage (well, actually following a predefined schema for specific workloads) so that the chip never goes above a specific "power stage".
So it is hard to say when the chip reaches which clock speed. Developers get fixed profiles so they can test what they can do at specific frequencies.
Going by Cerny's words, even >2 GHz is not really possible to hold stable under every workload. So we can just wait and see what happens.
 