Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Quadbitnomial · Aug 15, 2020

iroboto said:
Unfortunately most of the time and near the top are unintentionally vague terminology. I work in data, if something was 80-90% you're already near the top. Anything north of 95 you are at the top. Most of the time really can mean 70% and above. If you ask any person what near the top and most of the time is I would largely believe that it would be around 80+ is near the top, and 70% and above to represent most of the time.

I would like to add that I enjoy a lot of the back and forth that happens in these threads. It would be boring without it, regardless of how constructive it is or isn't. More often than not it is constructive, in my opinion as a long-time lurker and recent member.

But to add some more context to what Mark Cerny said about the bolded part, so it becomes a lot less vague than it may seem:

"There's another phenomenon here, which is called 'race to idle'. Let's imagine we are running at 30Hz, and we're using 28 milliseconds out of our 33 millisecond budget, so the GPU is idle for five milliseconds. The power control logic will detect that low power is being consumed - after all, the GPU is not doing much for that five milliseconds - and conclude that the frequency should be increased. But that's a pointless bump in frequency," explains Mark Cerny.

At this point, the clocks may be faster, but the GPU has no work to do. Any frequency bump is totally pointless. "The net result is that the GPU doesn't do any more work, instead it processes its assigned work more quickly and then is idle for longer, just waiting for v-sync or the like. We use 'race to idle' to describe this pointless increase in a GPU's frequency," explains Cerny. "If you construct a variable frequency system, what you're going to see based on this phenomenon (and there's an equivalent on the CPU side) is that the frequencies are usually just pegged at the maximum! That's not meaningful, though; in order to make a meaningful statement about the GPU frequency, we need to find a location in the game where the GPU is fully utilised for 33.3 milliseconds out of a 33.3 millisecond frame.
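
To make the 'race to idle' point concrete, here is a minimal sketch. It assumes the GPU's busy time scales inversely with clock speed, which is a simplification, and the function name and the 10% bump are made up for illustration:

```python
FRAME_BUDGET_MS = 1000.0 / 30.0  # 30 Hz target, about 33.3 ms per frame


def frame_timing(work_ms_at_base_clock, clock_scale):
    """Return (busy_ms, idle_ms) for one frame at a given clock multiplier.

    Assumption: the same work finishes in 1/clock_scale of the time.
    """
    busy_ms = work_ms_at_base_clock / clock_scale
    idle_ms = FRAME_BUDGET_MS - busy_ms
    return busy_ms, idle_ms


# Cerny's example: 28 ms of GPU work in a ~33 ms frame.
base_busy, base_idle = frame_timing(28.0, 1.00)
# Hypothetical 10% frequency bump triggered by the low power draw.
fast_busy, fast_idle = frame_timing(28.0, 1.10)

print(f"base clock: busy {base_busy:.1f} ms, idle {base_idle:.1f} ms")
print(f"+10% clock: busy {fast_busy:.1f} ms, idle {fast_idle:.1f} ms")
```

The frame still takes the full budget either way; the higher clock only turns busy time into idle time, which is why such a bump says nothing about how the clocks behave in a fully loaded frame.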

"So, when I made the statement that the GPU will spend most of its time at or near its top frequency, that is with 'race to idle' taken out of the equation - we were looking at PlayStation 5 games in situations where the whole frame was being used productively. The same is true for the CPU, based on examination of situations where it has high utilisation throughout the frame, we have concluded that the CPU will spend most of its time at its peak frequency."
Put simply, with race to idle out of the equation and both CPU and GPU fully used, the boost clock system should still see both components running near to or at peak frequency most of the time. Cerny also stresses that power consumption and clock speeds don't have a linear relationship. Dropping frequency by 10 per cent reduces power consumption by around 27 per cent. "In general, a 10 per cent power reduction is just a few per cent reduction in frequency," Cerny emphasises.
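
Both of those figures are consistent with a simple cubic rule of thumb. A minimal sketch, assuming dynamic power goes as frequency times voltage squared and that voltage scales roughly linearly with frequency near the top of the curve (so power goes roughly as frequency cubed); this is an illustrative assumption, not Sony's actual power model, and the function names are invented for the example:

```python
def relative_power(freq_scale, exponent=3.0):
    """Power relative to baseline for a given frequency multiplier (assumed P ~ f^3)."""
    return freq_scale ** exponent


def freq_scale_for_power(power_scale, exponent=3.0):
    """Frequency multiplier needed to reach a given power multiplier."""
    return power_scale ** (1.0 / exponent)


# Dropping frequency by 10% under the cubic assumption.
power_saved = 1.0 - relative_power(0.90)
# Frequency cost of a 10% power reduction under the same assumption.
freq_lost = 1.0 - freq_scale_for_power(0.90)

print(f"10% lower frequency -> {power_saved:.1%} less power")
print(f"10% less power      -> {freq_lost:.1%} lower frequency")
```

Under this assumption a 10% frequency drop saves about 27% power, and a 10% power cut costs only about 3.5% of frequency, which lines up with "just a few per cent".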

Now let's look at AMD's Polaris architecture and its AVFS strategy:

Adaptive Voltage and Frequency Scaling (AVFS)

The most powerful technique deployed to manage power consumption in the Polaris architecture is AMD’s AVFS, which was first developed for the 6th-generation AMD A-Series APUs (“Carrizo”). Modern GPUs operate in an incredibly complex environment with radically different combinations of system configurations (e.g. voltage regulator quality, cooling solution), temperature, and varied and changing workload (e.g. light gaming or the latest AAA games filled with explosions and sophisticated effects). Moreover, even theoretically identical GPUs are subject to subtle variations in silicon manufacturing. Traditional design techniques are fairly pessimistic and account for all these potential differences through guardbands, which reduce the operating frequency and/or increase the voltage – sacrificing performance and increasing power consumption. The central concept of AVFS is to avoid guardbands and instead intelligently measure the behavior of each GPU and choose better combinations of voltage and frequency (fig. 9). AVFS uses power supply monitoring circuits to measure the voltage across different parts of a Polaris GPU in real time as seen by the actual transistors. Polaris GPUs also contain small replica circuits that mimic the slowest circuits in the GPU and are continuously monitored. Together these two blocks can measure how close the GPU is to the voltage limit at a given frequency. Similarly, the GPU can dynamically measure the temperature of the silicon in order to choose the right operating point since temperature affects transistor speed and power dissipation.

When the GPU boots up, the power management unit performs boot time calibration, which measures the voltage that is delivered to the GPU, compared to the voltage measured during the test and binning process. For example, it is fairly common for a voltage regulator to output 1.15V, but the GPU only receives 1.05V due to the system design. In the Polaris architecture, the power management unit can correct for this static difference very precisely, rather than requesting a more conservative (i.e. higher) voltage that would waste power. As a result, platform differences (e.g., higher quality voltage regulators) will translate into higher frequencies and lower power consumption. In addition, the boot-time calibration optimizes the voltage to account for aging and reliability. Typically, as silicon ages the transistors and metal interconnects degrade and need a higher voltage to maintain stability at the same frequency. The traditional solution to this problem is to specify a voltage that is sufficiently high to guarantee reliable operation over 3-7 years under worst case conditions, which, over the life of the processor, can require as much as 6% greater power. Since the boot-time calibration uses aging-sensitive circuits, it automatically accounts for any aging and reliability issues. As a result, Polaris-based GPUs will run at a lower voltage or higher frequency throughout the lifetime of the product, delivering more performance for gaming and compute workloads.

Adaptive Clocking

Another advantage of AVFS is that it naturally handles changes induced by the workload. For example, when a complex effect such as an explosion or hair shader starts running, it will activate large portions of the GPU that suddenly draw power and cause the voltage to “droop” temporarily until the voltage regulators can respond. Conceptually, these voltage droops in a GPU or processor are similar to brownouts in a power grid (e.g. caused by millions of customers turning on their lights when they get home from work around 6pm). The power supply monitors detect the voltage droop in 1-2 cycles, and then a clock-stretching circuit temporarily decreases the frequency just enough so that all circuits will work safely during the droop. The clock stretcher responds to voltage droops greater than 2.5% and can reduce the frequency by up to 20%. These droop events are quite rare, and the average clock frequency decreases by less than 1%, with almost no impact on performance. However, the efficiency benefits are quite large. The clock-stretching circuits enable increasing the frequency of Polaris GPUs by up to 140MHz.
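
As a rough illustration of why rare droops cost so little average clock, here is a minimal sketch. The 2.5% threshold and 20% maximum stretch come from the text above; the nominal clock, the droop trace, and the choice to always apply the full 20% stretch (a worst case) are assumptions made up for the example:

```python
DROOP_THRESHOLD = 0.025  # stretcher reacts to droops greater than 2.5%
MAX_STRETCH = 0.20       # frequency can be reduced by up to 20%


def stretched_frequency(nominal_mhz, droop_fraction, stretch=MAX_STRETCH):
    """Clock for one sample: apply the worst-case stretch when the droop is large enough."""
    if droop_fraction > DROOP_THRESHOLD:
        return nominal_mhz * (1.0 - stretch)
    return nominal_mhz


def average_frequency(nominal_mhz, droop_trace):
    """Mean clock over a trace of per-sample voltage droop fractions."""
    samples = [stretched_frequency(nominal_mhz, d) for d in droop_trace]
    return sum(samples) / len(samples)


# Hypothetical trace: 1000 samples, 20 of which see a 4% droop.
trace = [0.04] * 20 + [0.0] * 980
nominal = 1200.0  # hypothetical nominal clock in MHz
avg = average_frequency(nominal, trace)
loss = 1.0 - avg / nominal

print(f"average clock {avg:.1f} MHz, average loss {loss:.2%}")
```

Even with every droop triggering the full 20% stretch, a 2% droop rate costs well under 1% of average clock in this toy trace.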

AMD uses a similar, but improved, strategy for RDNA:

More Control Over GPU Power and Performance

Up until the AMD Radeon™ RX Vega and the RX 500 series GPUs, the clock speed (and associated voltage) of the GPUs was dictated by a small number of fixed, discrete DPM states. Depending on the workload, and available thermal and electrical headroom, the GPU would alternate between one of these fixed DPM states. As a result, the GPU had a lot less flexibility in finding and residing at the most optimum state since it had to be one of these valid DPM states, and nothing in between. Often, this meant leaving performance on the table if the ideal voltage-frequency (Vf) state happened to be in between two of the fixed DPM states.
In addition, for every single GPU within a SKU family (for example, reference Radeon™ RX Vega 64 GPUs), the DPM states or Vf points were identical. Given that there is always a die-to-die variance in performance even between two pieces of otherwise identical silicon, once again this meant giving up performance while catering to the lowest common denominator within the wafer population.

Starting with the AMD Radeon™ VII, and further optimized and refined with the Radeon™ RX 5700 series GPUs, AMD has implemented a much more granular ‘fine grain DPM’ mechanism vs. the fixed, discrete DPM states on previous Radeon™ RX GPUs. Instead of the small number of fixed DPM states, the Radeon™ RX 5700 series GPUs have hundreds of Vf ‘states’ between the bookends of the idle clock and the theoretical ‘Fmax’ frequency defined for each GPU SKU. This more granular and responsive approach to managing GPU Vf states is further paired with a more sophisticated Adaptive Voltage Frequency Scaling (AVFS) architecture on the Radeon™ RX 5700 series GPUs.

As a result, each AMD Radeon™ RX 5700 GPU can find and run at the most optimum frequency, tailored to the specific workload, electrical, thermal and acoustic conditions – down to the last MHz. Paired with a Vf curve that is optimized for each individual Radeon™ RX 5700 series GPU, Radeon™ Adrenalin software and the Radeon™ WattMan tool provide much more granular control over the power and performance of the GPU.
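
The difference between a handful of fixed DPM states and fine-grain DPM can be sketched as follows. This is only an illustration of the idea, not AMD's firmware logic; the state table, the clock limits, and the function names are all hypothetical:

```python
def best_discrete_state(states_mhz, limit_mhz):
    """Highest fixed DPM state that does not exceed the sustainable limit."""
    eligible = [s for s in states_mhz if s <= limit_mhz]
    return max(eligible) if eligible else min(states_mhz)


def best_fine_grain_state(idle_mhz, fmax_mhz, limit_mhz, step_mhz=1):
    """Highest clock between idle and Fmax, in small steps, under the limit."""
    clamped = max(idle_mhz, min(fmax_mhz, limit_mhz))
    steps = int((clamped - idle_mhz) // step_mhz)
    return idle_mhz + steps * step_mhz


# Hypothetical numbers: six coarse states, and a workload whose thermal and
# electrical headroom would sustain 1740 MHz.
coarse_states = [300, 600, 900, 1200, 1500, 1800]
sustainable_limit = 1740

coarse = best_discrete_state(coarse_states, sustainable_limit)
fine = best_fine_grain_state(300, 1800, sustainable_limit)

print(f"discrete DPM: {coarse} MHz, fine-grain DPM: {fine} MHz")
```

In this made-up case the coarse table leaves 240 MHz on the table because the ideal point falls between two states, which is the situation the quoted text describes.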

Hope this adds more context to what Mark Cerny describes as the strategy they chose for PS5 and how it is different from previous consoles.

Vega86 · Aug 16, 2020

Since not all loads are equal for the CPU, is it also the same with the GPU?

QPlayer · Aug 16, 2020

Hot Chips conference is coming! Xbox POWA!

function · Aug 16, 2020

Vega86 said:
Since not all loads are equal for the CPU, is it also the same with the GPU?

Very much so!

The 5700 / 5700XT have three stated clocks: base, boost and game. 'Base' is what you should normally expect to be the minimum, 'boost'* is the guaranteed maximum possible boost frequency, and 'game' sits in between the two and is representative of the kind of actual clocks they expect you will game at (though individual games will vary). You can see these different clocks in the table on this page:

https://www.anandtech.com/show/14618/the-amd-radeon-rx-5700-xt-rx-5700-review

*"Boost" is interesting, as depending on the qualities of the actual piece of silicon it may, and in AnandTech's case did, go a little higher (this obviously won't be the case in PS5, where all systems will be conservatively set to the same boosting criteria).

The page below contains a table showing what the average clocks achieved in samples of different games were:

https://www.anandtech.com/show/14618/the-amd-radeon-rx-5700-xt-rx-5700-review/15

You can see that the range across games was 150 MHz, with the highest clocks being for the old game GTAV, and the lowest being on heavier, newer games like Metro Exodus and The Division 2. GTAV actually clocked fractionally higher than the guaranteed max boost, as the low load combined with this particular card's improved max boost clock meant it could.

A couple of important things to point out:

- These frequencies aren't varying across different frames, they're altering dynamically based on load and so potentially many times per frame. The same will be true for PS5.

- This system is a fair bit different than Sony's though, in that on PC each card is using its own sensors and its own power settings to determine where it's clocking in order to squeeze every last hertz out of every card at every opportunity. For PS5 everything has to be deterministic, with every PS5 predictably working the same way.

This means building up a baseline model of how the worst acceptable PS5 chip runs and applying that to every PS5 (same is basically true for fixed clocks too). It can't be based off sensors in an individual PS5, as that would break uniformity across the platform. So it's naturally a fair bit more conservative than what we see on PC. In the case of the 5700XT reviewed above, applying baseline figures would have cost this particular card 139 MHz off the measured max boost clock. That would absolutely be a price worth paying in the console space though.

Cerny said something about the chip not being guaranteed to even run above 2.23 GHz, so you can't knock their ambition with clock speeds. Based on what we might infer from RDNA1, it seems likely that while some less demanding games will typically clock around max, some will be below that. But even if a game did average around 100 MHz or even 200 MHz lower in demanding scenes, that's still higher than the 2 GHz they couldn't reach with fixed clocks, it's going to be even better in other games, and it's a win for their system.

Would be great if someone could slip DF their findings once they've got a game out the door.

iroboto · Aug 16, 2020

QPlayer said:
Hot Chips conference is coming! Xbox POWA!

Monday (Times in PDT)
5:00 – 6:30 PM: GPUs and Gaming Architectures

NVIDIA’s A100 GPU: Performance and Innovation for GPU Computing
- Jack Choquette and Wishwesh Gandhi, NVIDIA
The Xe GPU Architecture
- David Blythe, Intel
Xbox Series X System Architecture
- Jeff Andrews and Mark Grossman, Microsoft

Having perused their program, very curious to see these ones actually:
2:00 PM – 5:15 PM: Tutorial 2: Quantum Computing

Introduction
- Misha Smelyanskiy, Facebook
Quantum Supremacy Using a Programmable Superconducting Processor
- John Martinis, UCSB
Applications and Challenges with Near-term Quantum Hardware
- Jarrod McClean, Google
3:15 PM – 3:45 PM: BREAK
Underneath the Hood of a Superconducting Qubit Quantum Computer
- Matthias Steffen and Oliver Dial, IBM
Towards a Large-scale Quantum Computer Using Silicon Spin Qubits
- James S. Clarke, Intel
If Only We Could Control Them: Challenges and Solutions in Scaling the Control Interface of a Quantum Computer

scently · Aug 16, 2020

iroboto said:
Monday (Times in PDT)
5:00 – 6:30 PM: GPUs and Gaming Architectures

NVIDIA’s A100 GPU: Performance and Innovation for GPU Computing
- Jack Choquette and Wishwesh Gandhi, NVIDIA
The Xe GPU Architecture
- David Blythe, Intel
Xbox Series X System Architecture
- Jeff Andrews and Mark Grossman, Microsoft

Having perused their program, very curious to see these ones actually:
2:00 PM – 5:15 PM: Tutorial 2: Quantum Computing

Introduction
- Misha Smelyanskiy, Facebook
Quantum Supremacy Using a Programmable Superconducting Processor
- John Martinis, UCSB
Applications and Challenges with Near-term Quantum Hardware
- Jarrod McClean, Google
3:15 PM – 3:45 PM: BREAK
Underneath the Hood of a Superconducting Qubit Quantum Computer
- Matthias Steffen and Oliver Dial, IBM
Towards a Large-scale Quantum Computer Using Silicon Spin Qubits
- James S. Clarke, Intel
If Only We Could Control Them: Challenges and Solutions in Scaling the Control Interface of a Quantum Computer

These talks always contain detailed specifications of the makeup of the chips, so we should learn a lot more about the raw numbers behind certain ambiguous stuff like the 76 MB eSRAM.

iroboto · Aug 17, 2020

Quadbitnomial said:
I would like to add that I enjoy a lot of the back and forth that happen in these threads. It would be boring without it regardless of how constructive it is or isn't. More often constructive in my opinion as a long time lurker and recent member.

But to further add some more context to what Mark Cerny said about the bolded so it becomes a lot less vague than it may seem;

Ehh, I guess I've become more of a kumbaya type of person as of late. But I'm glad people are okay with the discourse.

I've given Mark's words quite a run-through, and even then, it's still vague to me. If I use AMD's latest technology on the 5700 as a comparison, the 'game clock', the expected average clock for AAA games, is still about 7-10% below the max boost number, and that's with race to idle incorporated into the metrics, which should be giving the average a nice boost in this case.

I'd still suggest that if the PS5 were operating at 99% (both in clock speed and in how often it reaches that target), Cerny would have just said 99%.
A win is a win in this case. Being able to go variable allowed them to hit a frequency 200+ MHz higher than fixed, and it holds 99% of the time.

I wouldn't be saying "most of the time" or "near the top". That's just my take, but really we won't know until we get some benchmarks against a PC proxy.

Vega86 · Aug 17, 2020

function said:
Very much so!


Thanks but maybe I should've phrased my question better.

Does the 256-bit AVX load example also apply to the GPU?

It's said here that certain loads on the CPU consume a lot of power and that frequency can be irrelevant to the power drawn. You can draw 100 watts at 3 GHz and you can draw 70 watts at 3 GHz?

Is that the same for the GPU as well? What instruction sets or workloads are on the GPU?

Thank you.

Silent_Buddha · Aug 17, 2020

ToTTenTranz said:
Less correct than claims that the GPU was around 10.5 TFLOPs (with no reference to CU count), which the github inquisitors promptly ridiculed at the time.

Eh? The 10.5 TFLOPs (10.28 actual) claim was 2.14% off. The up to 2.2 GHz (2.23 actual...most of the time) claim was 1.3% off. Which one was more accurate again?
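
Those two percentages can be checked with a couple of lines. A minimal sketch, measuring each claim relative to the actual figure; the helper name is made up for the example:

```python
def percent_off(claimed, actual):
    """Absolute error of a claim, as a percentage of the actual value."""
    return abs(claimed - actual) / actual * 100.0


tflops_error = percent_off(10.5, 10.28)  # TFLOPs claim vs official figure
clock_error = percent_off(2.2, 2.23)     # GHz claim vs official figure

print(f"10.5 TFLOPs claim: {tflops_error:.2f}% off")
print(f"2.2 GHz claim:     {clock_error:.2f}% off")
```

Measured this way the TFLOPs claim is about 2.14% off and the clock claim about 1.3% off, matching the numbers above.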

And is it really something that is more or less accurate at those tiny percentages?

And considering that the one predicting up to 2.2 GHz got most of the other specs correct while the 10.5 TFLOPs was a number in isolation of anything else...um...

Regards,
SB

goonergaz · Aug 17, 2020

Silent_Buddha said:
Eh? The 10.5 TFLOPs (10.28 actual) claim was 2.14% off. The up to 2.2 GHz (2.23 actual...most of the time) claim was 1.3% off. Which one was more accurate again?

And is it really something that is more or less accurate at those tiny percentages?

And considering that the one predicting up to 2.2 GHz got most of the other specs correct while the 10.5 TFLOPs was a number in isolation of anything else...um...

Regards,
SB

Where did the 2.2 GHz figure come from? GitHub was 2 GHz @ 9.2 TF, and people were saying it would never run at that speed, that they were likely testing the ceiling of the chip and would dial it back, so expect 8 TF vs the XSX's 12 TF.

Allandor · Aug 17, 2020

Vega86 said:
Thanks but maybe I should've phrased my question better.

Does the 256 bit avx load example also applies to the GPU?

It's said here that certain loads on the CPU consume a lot of power and that frequencies can be irrelevant to the power drawn. You can draw 100 watts at 3 gigz and you can draw 70 watts at 3 gigz?

Is that the same for the GPU as well? What instruction sets or workloads are on the GPU?

Thank you.

Yes. It is more or less the same on the GPU side.
E.g. if you fire up a program that fully uses your shaders (~100% load), a GPU can draw >300W quite easily.
Another program that fully uses the shaders (~100% load) can draw <100W on the same GPU. But in both cases the GPU load is ~100%.
It always comes down to what you do with your hardware.

That is one reason why I think dynamic clocks are not good for a console: it is already hard enough to calculate how much "power" you can use, and with a variable clock rate the formula gets much more complicated. In this case (as a developer) I would generally target a lower "base" performance and use dynamic resolution etc. to try to max it out.

The Sony solution really is a method to max out the cooling solution and power draw of the GPU. The smaller the chip gets, the harder it is to cool if it uses the "same" power. It would be easier for developers if Sony just delivered a bit more power to the APU (as Cerny suggested that it is just a bit of power that is "smartshifted") and always clocked it as high as it should go. But as the box is already gigantic (from what we know), I think they have just maxed out the small APU too much, so they needed something a bit more reliable than boost frequencies that still maxes out the power usage that can be cooled on such a small chip.
So I guess the PS5 uses more than 300W, and maybe even more than the Xbox Series X.

Behoemoth · Aug 17, 2020

Allandor said:
The sony solution really is a method to max out the cooling solution and power-draw of the GPU. The smaller the chip gets, the harder it is to cool it if it uses the "same" power. It would be easier for developers if sony would just deliver a bit more power to the APU (as cerny suggestet that it is just a bit of power that is "smartshifted") and always clock as high as it should. But as the box is already gigantic (from what we know) I think they have just maxed the small APU out to much, so they needed something a bit more reliable than boost-frequencies but still on maxing out the power-usage that can be cooled down with such a small chip.
So I guess PS5 uses more than 300W and maybe even more than the xbox sx.

Well, the general motivation behind Sony's solution is to increase performance per area and thus the overall efficiency of the chip. AMD has been using adaptive techniques for several generations and no one is really complaining about them, unlike Intel's dynamic solution, which still causes the chips to throttle a lot.
However, I think gaming consoles have never used such frequency- or voltage-varying methods until now.

PSman1700 · Aug 17, 2020

iroboto said:
If I use AMD's latest technology on the 5700 comparing 'game clock', the expected average clock for their AAA games, is still about 7-10% below the max boost number and that's with race to idle incorporated into the metrics which should be providing a nice boost to the average in this case.

I still would insinuate that if the PS5 was operating at 99% (both in clockspeed and frequency of reaching that target), I think Cerny would have just said 99%.
A win is a win in this case. Being able to go variable allowed them to hit a frequency 200+ Mhz higher than fixed and it holds 99% of the time.

I wouldn't be saying, "most of the time or near the top". That's just my take, but really we won't know until we get some benchmarks against a PC proxy

If it was 99% of the time, and that in the most demanding cases, why even bother mentioning anything related to that at all? That's practically 100%.
He could have just said 'we couldn't achieve 2 and 3 GHz, but now we can reach 2.23 and 3.5' and left out any mention of 99% of the time when things get heavy.
I say 99% because some take it that he meant that. He actually did not; he said 'most of the time', which can be anything. In game X it could run at 2 GHz most of the time, while in another game the GPU and CPU could run at max speed most of the time.

If any of that even matters... Look at what they can do with the hardware in Forbidden West; it doesn't matter anymore that the min is 9.2 TF and the max 10.2 TF.

Betanumerical · Aug 17, 2020

PSman1700 said:
If it was 99% of the time, and that in most demanding cases, why even bother mentioning anything related to that at all? That's practically 100%.
He could have just said 'we couldnt achieve 2 and 3Ghz, but now we can 2.23 and 3.5 and leave out mentioning 99% of the time when things go heavy.
I say 99% cause some take it he ment that. He actually did not, he said 'most of the time', which can be anything. In game x it can run at 2ghz for the most, in another game the GPU and CPU could run at max speed most of the time.

If that all matters.... Look at forbidden west what they can do with the hardware, it doesnt matter the min is 9.2TF and max 10.2TF anymore.

Because if they didn't mention anything and then it came out, people would still be up in arms about it?

PSman1700 · Aug 17, 2020

Betanumerical said:
Because if they didn't mention anything at then it came out people would be up in arms about it still?.

It would never come out because it would never downclock.

iroboto · Aug 17, 2020

Vega86 said:
Thanks but maybe I should've phrased my question better.

Does the 256 bit avx load example also applies to the GPU?

It's said here that certain loads on the CPU consume a lot of power and that frequencies can be irrelevant to the power drawn. You can draw 100 watts at 3 gigz and you can draw 70 watts at 3 gigz?

Is that the same for the GPU as well? What instruction sets or workloads are on the GPU?

Thank you.

So power draw isn't determined by the size of the load. I.e., running compute shaders on, say, a 4K image versus an 8K image isn't going to use any more silicon, even though 8K is a larger load than 4K. What uses more power is how much of your GPU is flipping states. It's called the activity factor, and it's a major component in determining how much power is drawn. Cerny mentions this in the article as well:

Mark Cerny counters. "I think you're asking what happens if there is a piece of code intentionally written so that every transistor (or the maximum number of transistors possible) in the CPU and GPU flip on every cycle. That's a pretty abstract question, games aren't anywhere near that amount of power consumption. In fact, if such a piece of code were to run on existing consoles, the power consumption would be well out of the intended operating range and it's even possible that the console would go into thermal shutdown. PS5 would handle such an unrealistic piece of code more gracefully."

So the code doesn't need to be a particular type of load. AVX256 instructions use significantly more transistors than non-AVX256 ones do. This is why the power on the CPU ramps up dramatically. This is highlighted by the CPU's move to fully leveraging 16 registers (per core) to complete SIMD calculations.
GPUs use SIMD by default, so the question is what type of code would cause all of their CUs and registers to be flipping bits.
I suspect it would have to be things done in parallel. And since a majority of the transistors are located in the CUs, the CUs would likely have to be doing some very efficient processing.
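
The usual textbook approximation for dynamic power makes the activity factor point concrete: power is roughly activity factor times switched capacitance times voltage squared times frequency. A minimal sketch; the capacitance, voltage, and activity values below are made-up placeholders chosen only to show the effect, not measurements of any real chip:

```python
def dynamic_power_watts(activity_factor, capacitance_farads, voltage_volts, frequency_hz):
    """Textbook CMOS dynamic power estimate: P = alpha * C * V^2 * f."""
    return activity_factor * capacitance_farads * voltage_volts ** 2 * frequency_hz


C_EFF = 1.0e-7   # hypothetical effective switched capacitance (farads)
VOLTAGE = 1.0    # hypothetical core voltage (volts)
CLOCK_HZ = 2.0e9  # same 2 GHz clock for both workloads

# Two workloads that both report ~100% utilisation but toggle
# very different fractions of the chip each cycle.
light = dynamic_power_watts(0.3, C_EFF, VOLTAGE, CLOCK_HZ)
heavy = dynamic_power_watts(0.9, C_EFF, VOLTAGE, CLOCK_HZ)

print(f"light workload: {light:.0f} W, heavy workload: {heavy:.0f} W")
```

Same clock, same voltage, same reported load, yet the power differs by the ratio of the activity factors, which is the behaviour described above for both the AVX case and the GPU case.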

London Geezer · Aug 17, 2020

PSman1700 said:
If it was 99% of the time, and that in most demanding cases, why even bother mentioning anything related to that at all? That's practically 100%.
He could have just said 'we couldnt achieve 2 and 3Ghz, but now we can 2.23 and 3.5 and leave out mentioning 99% of the time when things go heavy..

Why wouldn't he mention it? It was a GDC talk; people wanted to hear as much technical info as Sony would provide. Cerny provided, in my opinion, way more in-depth info than would probably have been appropriate. He went on and on about the frequencies, about the Tempest Engine, and all these other things, which didn't "need" to be so in depth, but it was GDC, it was a tech talk, and so he talked about the tech.
How are we now going to complain that he mentioned things that people then of course took on and interpreted in their own way?

Deleted member 13524 · Aug 17, 2020

Silent_Buddha said:
Eh? The 10.5 TFLOPs (10.28 actual) claim was 2.14% off. The up to 2.2 GHz (2.23 actual...most of the time) claim was 1.3% off. Which one was more accurate again?

No one predicted 2.2GHz. The only time you ever saw 2.2GHz was from proelite's maximum limit for the clocks, which came from a "1.8GHz - 2.2GHz" range. As the High Priest of the Github Church, he insisted pretty hard on the 2GHz max up until the official specs came out. Those 2.2GHz were just the ceiling below which he was sure the specs would end up to grant him a win in the specs bet (that he lost).

There was IIRC one pretty known developer claiming 10.5 and another who is making VR games claiming 11-11.5. Perhaps these were simply the targets provided to them, and up until early this year after another of the (apparently many) SoC revisions, Sony could be undecided with having higher max clocks and/or enabling all 40 CUs. At the same 2.23GHz clocks, 40 CUs would result in 11.42 TFLOPs. If the devkits had/have all CUs enabled then that's the spec of the devkit, the same way the OneX devkit has a 6.6TFLOPs GPU.
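
The TFLOPs figures in this discussion follow from the standard peak-FP32 formula for these GPUs: compute units times 64 lanes per CU times 2 FLOPs per lane per clock (one fused multiply-add) times clock speed. A minimal sketch, with a helper name invented for the example:

```python
def fp32_tflops(compute_units, clock_ghz, lanes_per_cu=64, flops_per_lane_per_clock=2):
    """Peak FP32 throughput in TFLOPs (an FMA counts as 2 FLOPs)."""
    gflops = compute_units * lanes_per_cu * flops_per_lane_per_clock * clock_ghz
    return gflops / 1000.0


ps5_official = fp32_tflops(36, 2.23)  # 36 CUs at 2.23 GHz
ps5_all_cus = fp32_tflops(40, 2.23)   # all 40 CUs enabled at the same clock
github_leak = fp32_tflops(36, 2.0)    # 36 CUs at the leaked 2 GHz

print(f"36 CUs @ 2.23 GHz: {ps5_official:.2f} TFLOPs")
print(f"40 CUs @ 2.23 GHz: {ps5_all_cus:.2f} TFLOPs")
print(f"36 CUs @ 2.00 GHz: {github_leak:.2f} TFLOPs")
```

This reproduces the 10.28, 11.42 and roughly 9.2 TFLOPs numbers quoted in the thread.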

What both these developers were pretty insistent on was the fact that the github gospel, although being real, presented outdated specifications. Which it did, despite all the denial that persisted after the official specs came out.
The same github gospel also said the PS5 was going to use a GPU codenamed Navi 10 Lite, pointing to a RDNA1 GPU with a featureset from 2019. We now know this is not the case, and those tests on the github that were apparently made in late 2018 were probably using engineering samples of the Navi 10 discrete GPU we see in the RX5700 family. We know the PS5 is using a SoC with an embedded RDNA2 GPU, which is definitely not a Navi 10.

London-boy said:
How are we now going to complain that he mentioned things that people then of course took on and interpreted in their own way?

Every time Cerny says something about the PS5 specs, there's always someone assuming he's being intentionally deceptive.

Cerny: The PS5 will use SSD tech.
Reaction: He just said SSD tech, it doesn't mean the PS5 is using SSD as main storage duh. It's probably a SSHD with a 32GB cache or something since SSDs are too expensive to put in a console.

Cerny: The PS5 will do raytracing.
Reaction: He said it'll do raytracing but there's probably no raytracing hardware, it's just doing software raytracing.

Cerny: Just to clear any doubts, there is raytracing in the GPU hardware.
Reaction: Nah he's still trying to trick us, there's GPU hardware that is doing software raytracing.

Cerny: The GPU will stay at or close to 2.23GHz most of the time.
Reaction: Haha nice try, 51% is "most of the time" too, and 2GHz is close to 2.23GHz in my book.

¯\_(ツ)_/¯

goonergaz · Aug 17, 2020

ToTTenTranz said:
Every time Cerny says something about the PS5 specs, there's always someone assuming he's being intentionally deceptive.

Cerny: The PS5 will use SSD tech.
Reaction: He just said SSD tech, it doesn't mean the PS5 is using SSD as main storage duh. It's probably a SSHD with a 32GB cache or something since SSDs are too expensive to put in a console.

Cerny: The PS5 will do raytracing.
Reaction: He said it'll do raytracing but there's probably no raytracing hardware, it's just doing software raytracing.

Cerny: Just to clear any doubts, there is raytracing in the GPU hardware.
Reaction: Nah he's still trying to trick us, there's GPU hardware that is doing software raytracing.

Cerny: The GPU will stay at or close to 2.23GHz most of the time.
Reaction: Haha nice try, 51% is "most of the time" too, and 2GHz is close to 2.23GHz in my book.

¯\_(ツ)_/¯

You forgot that it was only audio RT at one point too.

I can't wait for these things to arrive and get some DF reviews... can't wait to see how significant differences get spun into fact when you can only see them by pausing and zooming in on some 3rd-party games.

iroboto · Aug 17, 2020

ToTTenTranz said:
Cerny: The GPU will stay at or close to 2.23GHz most of the time.
Reaction: Haha nice try, 51% is "most of the time" too, and 2GHz is close to 2.23GHz in my book.

Another perspective here, but no one said 51%.
As indicated earlier, we've seen ample evidence that game and boost clocks all sit within 10% of their max.
That means you should expect the game to operate between game clock and boost clock, as per AMD's specifications.
Just because the PS5 sets its frequency based on game code and not on thermals doesn't mean the PS5 isn't bound by the same physics.
Different chips will have different thermal properties. Some chips are bound to run hotter than others running the same code; this is parametric yield.
If you want to keep costs down, you're going to have to allow for larger thermal allowances. If game code keeps frequencies up in the 95-99% range at all times, you're going to have to start dropping the lower-yield chips.
The alternative is to have your frequencies throttle down earlier, to allow for lower quality chips.
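To put a number on that, here is a rough sketch of what a 10% window below the 2.23GHz cap would look like. The 10% figure is just the range mentioned above applied to the PS5 cap, not anything Sony has stated, and the helper names are made up for illustration. The TFLOPs math assumes 36 CUs, 64 lanes per CU and 2 FLOPs per lane per clock.

```python
# Hypothetical illustration: clock floor and peak TFLOPs if the GPU
# stayed within a given fraction of its maximum frequency.
# Assumptions: 36 CUs, 64 lanes per CU, 2 FLOPs per lane per clock.

MAX_CLOCK_GHZ = 2.23
ACTIVE_CUS = 36


def clock_floor(max_ghz: float, window: float) -> float:
    """Lowest clock if frequency stays within `window` (e.g. 0.10) of max."""
    return max_ghz * (1.0 - window)


def peak_tflops(compute_units: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPs at the given clock."""
    return compute_units * 64 * 2 * clock_ghz / 1000.0


if __name__ == "__main__":
    floor = clock_floor(MAX_CLOCK_GHZ, 0.10)
    print(f"Clock floor: {floor:.3f} GHz")
    print(f"Peak at floor: {peak_tflops(ACTIVE_CUS, floor):.2f} TFLOPs")
    print(f"Peak at max:   {peak_tflops(ACTIVE_CUS, MAX_CLOCK_GHZ):.2f} TFLOPs")
```

Under those assumptions the floor works out to roughly 2.0GHz, which is a worst case for that window rather than a prediction of typical behaviour.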

I'm okay with either method. Obviously, with cost being an important factor for consoles, it's natural to lean into lower frequencies. Sony can easily continue to find ways to keep the value at that number; it's just going to cost more to keep poorer-yield silicon performing at that level. If they want more power, they can remove the redundant CUs (enable all of them) and run the chip full. At this point it's only a matter of yield and cost, if you want to ignore everything else.
