Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Yep, that's why Xeons drop to 50% of the freq when AVXing.
While the desktop CPUs don't.
It is getting ridiculous.

What’s ridiculous? That AMD and Intel choose to market their products or design their products to operate in such fashion? Or the assertion that console manufacturers can choose to design their products around fixed clocks?
 
They can; the question is whether it will deliver better performance.
So far it seems it will not.

Are you talking consoles vs. PC? Then no. What would be the point anyway? PC products iterate much faster, and better performance is a year away.

Console vs. console? You always have the option to go wider. Simply allow the design to draw more power and provide a better cooling system.
 
Maybe having large L1 and L2 GPU caches? They are determined per shader engine and per memory channel respectively, going by RDNA 1.
AMD's used SRAM as a marketing point before, when it launched Vega. The presentation included the claim that Vega has 45MB of SRAM on-die, and we never figured out just where all of it came from. The L1 and L2 caches, vector register files, estimated ROP cache and vertex parameter cache capacities were still far from explaining the amount. There's likely a sea of pipeline buffers, internal registers, and internal processors adding to that total.
Microsoft could be doing what AMD did and counting every possible SRAM macro on the chip.
Vega didn't have 8 CPU cores and their attendant L3s, which would be very large single contributors to the total for the console.

Oh, it was in one of the confirmed leaks? Then I have a follow-up question: how exactly is having more CUs per shader engine supposed to be worse? And it's not an especially high count either: Navi 10 has 20 CUs per shader engine, the XSX 13 (14) if there are 4 shader engines.
Cerny mentioned it can be more difficult to fill the larger number of CUs with work in a case where there was a lot of small geometry.
There was a presentation from AMD for GDC 2018 where the topic of CU count versus SE count arose, and larger GPUs with more CUs per SE would do better with longer-running waves. There's a bottleneck in how quickly a shader engine can launch waves in GCN: one wave every 4 cycles. In a GCN GPU with 16 CUs per SE, that's 64 cycles to get one wave per CU, and each CU needs at least 4 waves (one per SIMD) to populate all of its SIMDs, so 256 cycles before the last SIMD in the SE has a wave launched. Then there's usually a preference for at least 2-4 wavefronts per SIMD for latency hiding, so fill time can take a lot of cycles.
If the workload has a lot of simple or short-lived shaders, SIMDs could have sub-optimal occupancy because those waves may terminate long before the SE gets back to them.
Smaller GPUs would be able to reach peak occupancy faster, and while they lack the sustained grunt of the larger GPUs, they would have fewer resources idling when dealing with short shaders.
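
A rough back-of-the-envelope sketch of that fill-time arithmetic (purely illustrative; the 4-cycle launch interval, 4 SIMDs per CU and the 2-4 waves-per-SIMD target are just the figures quoted above, not measurements):

Code:
# Fill time for a single GCN shader engine, assuming one wavefront launched
# every 4 cycles and 4 SIMDs per CU, as described above. Illustrative only.

def fill_cycles(cus_per_se, waves_per_simd, simds_per_cu=4, launch_interval=4):
    """Cycles for the SE to hand every SIMD its target number of waves."""
    total_waves = cus_per_se * simds_per_cu * waves_per_simd
    return total_waves * launch_interval

for cus in (10, 13, 16, 20):           # small GPU ... 16 CUs/SE ... Navi 10-style 20
    for waves in (1, 2, 4):
        print(f"{cus:2d} CUs/SE, {waves} wave(s)/SIMD -> {fill_cycles(cus, waves):5d} cycles")

With 16 CUs per SE and one wave per SIMD this reproduces the 256-cycle figure above; at 2-4 waves per SIMD it grows to 512-1024 cycles, which is the fill cost that short-lived shaders keep paying.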

There was a shift in hardware distribution from shader engine to shader array in Navi, but not much detail on the reason or what didn't transfer. The pressure would be more acute if for some reason the SE:CU ratio was still important rather than SE:SA.
Since the rasterizers and primitive units migrated to the SA, it would seem like the launch function would move with them, but there are other things AMD didn't distribute to the SAs. Not much attention was paid to this change, or detail given about what had really changed.

Hang on, did Sony say that? I don't remember Sony saying that.
Sony would be testing to ensure the drive wouldn't be obviously non-performant, and would check the physical dimensions of the drive and any attached heatsink/fan.
Given how many M.2 SSDs use the same configuration under a random assortment of metal and fans, I would expect some rejects would work fine if their heatsink were removed and the drive could rely on whatever measure the PS5 has for cooling bare drives (if it does). However, I suppose it would be irresponsible for Cerny to comment that gamers could just tear their drives apart before installing.

Probably shut down, just like the PS5 or any device would or should do, you know, to prevent fires or damage to the hardware.
The Xbox One had thermal failsafe modes where it didn't immediately shut down. If there's an anomalous environmental condition, the console isn't obligated to follow its normal operation guarantees.
https://www.eurogamer.net/articles/2013-08-14-xbox-one-built-to-automatically-tackle-overheating

AMD's TDP is rated in Watts, but it's not electrical watts; it's what AMD calls "thermal watts". It's a measure of heat, not power consumption.
AMD's TDP is a somewhat opaque measure of heatsink thermal transfer capability, based on a number of inputs that AMD tweaks at its convenience.
If it were truly a measure of heat, then it would also be a measure of power consumption, because physically they are almost completely the same outside of trace conversions to other forms of energy.
 
These two are helpful on your last point.


and this interview:
https://www.anandtech.com/show/1458...ng-an-interview-with-intel-fellow-guy-therien
 
AMD's TDP is a somewhat opaque measure of heatsink thermal transfer capability, based on a number of inputs that AMD tweaks at its convenience.
If it were truly a measure of heat, then it would also be a measure of power consumption, because physically they are almost completely the same outside of trace conversions to other forms of energy.

I get what you are saying.

Here is AMD formula for TDP

TDP (Watts) = (tCase°C - tAmbient°C)/(HSF ϴca)

You take the temperature between the die and the heatspreader, minus the ambient temperature being pulled into the case, and divide that by the rating of the heatsink and fan.

The TDP is a formula for designers to ensure that they are dissipating the proper amount of heat so AMD processors can optimally operate.

It doesn't have to be a reflection of the maximum power that can be consumed by the processor. AMD's TDP for a processor can be met just by fiddling around with a variable that has nothing to do with power consumption.
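
A quick worked example of that formula (the case/ambient temperatures and the heatsink θca ratings below are made-up illustrative values, not AMD specifications):

Code:
# Worked example of the quoted TDP formula. All inputs are illustrative.

def amd_tdp(t_case_c, t_ambient_c, hsf_theta_ca):
    """TDP (W) = (tCase C - tAmbient C) / (HSF theta_ca, in C per W)."""
    return (t_case_c - t_ambient_c) / hsf_theta_ca

print(amd_tdp(61.8, 42.0, 0.189))   # ~104.8 W with one assumed cooler rating
print(amd_tdp(61.8, 42.0, 0.250))   # ~79.2 W: same chip, weaker assumed cooler

Which is exactly the point: the headline number moves just by changing the assumed cooler, with the silicon's electrical draw unchanged.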
 
Hehehehe... Good solution ;)

Only 6.03% of Steam users have more than 16 GB RAM, and 49.7% of users have quad cores, so I would have to agree...
But how many will do it... that's another matter!

One thing to be careful about when looking at Steam Hardware Survey stats, as I've said many times in the Steam Hardware Survey thread: China has a disproportionate share of the survey (at least 1/3 of Steam users are in China), such that the numbers aren't reflective of the hardware install base in Western countries. And a large majority of machines in China are lower spec, built from used hardware recycled from Western countries. Chinese PC companies are the #1 customer for recycled computer hardware in the West. Basically, when a PC is recycled, either its parts are purchased in bulk by Chinese companies that resell computers to Chinese consumers, or they get scrapped.

Regards,
SB
 
If the speed and latency of the SSD is what it is in the PS5, seemingly a basically NVMe PCIe 4.0 setup with additional customizations to improve latency even further, then how much RAM you need may vary greatly between a PC with an NVMe PCIe 3.0 drive or a lower-spec HDD and a console ... ?

The speeds of the Xbox Series X seem to suggest something PCIe 3.0 based. Do we know of any customizations by Microsoft to improve the latency / seek times here over the base setup? The expansion cards they created would probably need to have the same kind of performance characteristics?
 
Xbox series X seems to suggest something pci-e 3.0

It's PCIe 4.0, with ideal speeds to match the interface.

To your other question, it seems that normal SSDs still aren't a replacement for RAM; there's more in play than raw speeds. Intel Optane seems to be closer to RAM latency, but more expensive.

For PC I guess you should be fine with a fast PCIe 4.0 NVMe drive for console-oriented games.
 
I think you misunderstand. It is not a question whether with this kind of performance you can leave a lot of things out of RAM until you need them, but a given. The only question is how much RAM you can save at what SSD latency and throughput speeds.

This will partly depend on the type of game of course, but we have concrete examples. Games like the latest Spider-Man would presumably benefit a lot because of how fast you travel through an open world, and it has already been explained that right now a lot of data duplication needs to happen because of seek times (similar to what was explained in that great Crash Bandicoot technical deep dive posted recently, which was one of the first games to stream like that, from CD at the time). On top of that, items you need often and quickly currently need to stay in RAM at all times.

Many games would have similar considerations although to varying degrees of course.

The main reason we need fast memory is, as far as I know, no longer retrieving textures or meshes quickly, but the complex repeated calculations and transformations we need to do on them. So if our storage becomes much faster in latency and throughput, bringing in the meshes and textures for a given scene can at some point be done in real time without needing to cache anything in RAM, leaving RAM entirely to the data that needs to be transformed and calculated with in the current scene. Being able to replace the entire contents of RAM in just a second means we can transition between scenes so quickly that we can show a much greater variety of better-quality textures and meshes in a much shorter time, and as such the impact would be far greater than, say, having 32GB with a much slower storage solution.
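
As a rough illustration of that "replace the entire contents of RAM in a second" point (the 13.5 GB game-visible pool and the drive throughputs below are assumptions / commonly quoted raw figures, not confirmed specs):

Code:
# Time to refill an assumed game-visible memory pool at various raw
# (uncompressed) storage throughputs. All figures are assumptions.

GAME_MEMORY_GB = 13.5                  # assumed pool available to a game

drives_gb_per_s = {
    "PS5 SSD (raw)": 5.5,
    "XSX SSD (raw)": 2.4,
    "SATA SSD":      0.5,
    "7200rpm HDD":   0.1,
}

for name, speed in drives_gb_per_s.items():
    print(f"{name:14s}: ~{GAME_MEMORY_GB / speed:5.1f} s to refill {GAME_MEMORY_GB} GB")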

EDIT: oh, and you're comparing Optane with NVMe on PCIe 3.0 performance above ... ?
 
Its pcie4.0, with ideal speeds to match the interface.

I believe Arwin was asking if it was NVME 3.0. I've only looked at the DF articles on it, and in those there was no mention of whether it was NVME 3.0 or NVME 4.0, just that it was an NVME drive.

The speed that MS chose is more along the lines of a NVME 3.0 drive. While it isn't as fast as the fastest NVME 3.0 drives, this is likely due to MS wanting a consistent performance level that won't degrade due to load or thermals. Faster drives will generate more heat making them more difficult to cool such that performance remains consistent. As well, I'd imagine cost was a factor in them choosing a NVME 3.0 speed grade versus the higher speeds that NVME 4.0 allows. NVME 4.0 based drives are almost twice the price of NVME 3.0 based drives.
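
For rough context on why that speed fits comfortably in a PCIe 3.0 x4 link (back-of-the-envelope; per-lane payload rates assume 128b/130b encoding and ignore other protocol overhead):

Code:
# Approximate x4 link bandwidth per PCIe generation vs the 2.4 GB/s raw figure.

XSX_RAW_GB_S = 2.4
per_lane_gb_s = {
    "PCIe 3.0": 8e9  * (128 / 130) / 8 / 1e9,   # ~0.985 GB/s per lane
    "PCIe 4.0": 16e9 * (128 / 130) / 8 / 1e9,   # ~1.969 GB/s per lane
}

for gen, lane in per_lane_gb_s.items():
    x4 = 4 * lane
    print(f"{gen} x4: ~{x4:.1f} GB/s; 2.4 GB/s raw is ~{XSX_RAW_GB_S / x4:.0%} of the link")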

Regards,
SB
 
4.0 vs 3.0 doesn't apply to NVMe, only to PCIe. At the speeds Microsoft uses there is no need for them to support 4.0, unless they want to make faster 'memory cards' in the future.

On the other hand it is very hard to compare what Sony has been doing to improve SSD performance for games to anything in the PC space it seems (have been trying to do some research). Indeed it does seem that it may be closest to the work intel is doing with Optane?
 
Too bad they wouldn't be able to stripe performance between internal and external. :eek:

Would they have hooked up both the internal and external drives to one single custom PCIe 4.0 interface? It could be possible, since 4.0 just doubles the per-lane bandwidth of 3.0?

EDIT: the changes from NVMe 1.3 to 1.4 do look potentially significant and relevant to the work Sony is doing on latency improvements, and to how third-party vendors could communicate whether they are compatible with what Sony needs?

https://nvmexpress.org/changes-in-nvme-revision-1-4/
 
4.0 vs 3.0 doesn't apply to NVMe, only to PCIe. At the speeds Microsoft uses there is no need for them to support 4.0, unless they want to make faster 'memory cards' in the future.

On the other hand it is very hard to compare what Sony has been doing to improve SSD performance for games to anything in the PC space it seems (have been trying to do some research). Indeed it does seem that it may be closest to the work intel is doing with Optane?

Sorry, I'm just so used to looking at NVME 3.0 and 4.0 based on which version of PCIE it supports versus explicitly stating PCIE 3.0 and 4.0.

But basically yes. I suspect they may just be using PCIE 3.0 for NVME even though the SOC likely supports PCIE 4.0. At that speed it doesn't need more than PCIE 3.0. Of course the option is always there for them to go higher with a newer revision of the console.

Regards,
SB
 
4.0 vs 3.0 doesn't apply to NVMe, only to PCIe. At the speeds Microsoft uses there is no need for them to support 4.0, unless they want to make faster 'memory cards' in the future.

On the other hand it is very hard to compare what Sony has been doing to improve SSD performance for games to anything in the PC space it seems (have been trying to do some research). Indeed it does seem that it may be closest to the work intel is doing with Optane?

In theory, it could make memory-mapped files as sparse resources viable. AMD GPUs can translate the process virtual address space & fault via IOMMU already. So perhaps a blazingly fast secondary storage, together with PS4/5’s unified memory model and proprietary stack, could achieve what VK/DX12 sparse resource APIs cannot enable today.
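
For the CPU-side half of that idea only, here's a tiny illustration of the demand-fault behaviour memory-mapped files already give you today (the file name is a placeholder; this doesn't demonstrate the GPU/IOMMU part being speculated about):

Code:
# Pages of the mapped file are faulted in from storage only when touched,
# rather than read up front. Purely a CPU-side illustration of demand paging.

import mmap

with open("assets.bin", "rb") as f:                      # placeholder file
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        header = view[:16]            # first page is faulted in here
        tail = view[-16:]             # last page is faulted in here
        print(header.hex(), tail.hex())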
 