Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

A stupid question
We are all talking about how cool BCPack and Kraken are, and how we would like to compress and decompress stuff all day.
From what I've read, with hardware the overhead is really small; I don't know small compared to what, but small.
So why does nobody use them in a memory controller, even in bespoke products like consoles?
Maybe just on a pair of memory controllers, reserved for non-latency-sensitive data like textures, to store double the data.
 
Sony has intimated that they have invested a lot of silicon into the decompression hardware and whatever else is involved.

Edit: actually not sure if Cerny said that or it was Digital Foundry.
 
I don't know, how many?
Given Valve approaches users to take the survey, I assume they pick people with at least some decent amount of gaming time. It'd be daft to select a large proportion of people who don't even use Steam for their Steam survey. ;)
 
Sony has intimated that they have invested a lot of silicon into the decompression hardware and whatever else is involved.

Edit: actually not sure if Cerny said that or it was Digital Foundry.
Not many; it's still a chip with a 256-bit memory bus, so around 300 mm².
I wish they would at least increase the GDDR6 speed.
 
A stupid question
We are all talking about how cool BCPack and Kraken are, and how we would like to compress and decompress stuff all day.
From what I've read, with hardware the overhead is really small; I don't know small compared to what, but small.
So why does nobody use them in a memory controller, even in bespoke products like consoles?
Maybe just on a pair of memory controllers, reserved for non-latency-sensitive data like textures, to store double the data.

Making a compression/decompression solution that works at memory speeds (30-50 GB/s) with low enough latency and high throughput is a nontrivial task.

Compressing everything would also cause problems for 'random' access: you would have to fetch the entire file, or at least a large enough chunk to decompress the part you want, which wastes bandwidth and adds latency when you only need part of the file.
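
To make that read-amplification point concrete, here is a minimal sketch. It assumes a hypothetical layout where memory is compressed in fixed 64 KiB blocks; the block size and layout are my own illustration, not how either console actually packs data:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical layout: memory is compressed in fixed 64 KiB blocks.
// To read any byte inside a block, the whole block must be fetched and
// decompressed, even if the caller only wanted a cache line.
constexpr size_t kBlockSize = 64 * 1024;

size_t bytesFetchedForRead(size_t offset, size_t length) {
    // Round the request out to whole compressed blocks.
    size_t firstBlock = offset / kBlockSize;
    size_t lastBlock  = (offset + length - 1) / kBlockSize;
    return (lastBlock - firstBlock + 1) * kBlockSize;
}

int main() {
    // A 128-byte read that happens to straddle a block boundary costs
    // two full blocks of fetch + decompress work.
    size_t wanted  = 128;
    size_t fetched = bytesFetchedForRead(kBlockSize - 64, wanted);
    std::printf("wanted %zu B, fetched %zu B (amplification %.0fx)\n",
                wanted, fetched, double(fetched) / wanted);
}
```

That 1024x figure is obviously a worst case for a tiny read; large sequential reads amortize the block cost, which is why this sort of scheme suits streamed texture data better than latency-sensitive data.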
 
That is not how it's described.
DirectStorage – DirectStorage is an all new I/O system designed specifically for gaming to unleash the full performance of the SSD and hardware decompression. It is one of the components that comprise the Xbox Velocity Architecture. Modern games perform asset streaming in the background to continuously load the next parts of the world while you play, and DirectStorage can reduce the CPU overhead for these I/O operations from multiple cores to taking just a small fraction of a single core; thereby freeing considerable CPU power for the game to spend on areas like better physics or more NPCs in a scene. This newest member of the DirectX family is being introduced with Xbox Series X and we plan to bring it to Windows as well.
Is DirectStorage already in use with AI and HPC applications but called GPUDirect Storage?
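
Setting the GPUDirect question aside, the batched-submission model the quote describes can be sketched conceptually. Every type and function name below is made up for illustration; this is not the actual DirectStorage API:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical request/queue types, only to illustrate the model: the game
// enqueues many small asset reads up front, then submits them as one batch,
// so the CPU isn't paying per-read overhead or doing software decompression.
struct AssetRequest {
    std::string file;      // source on the SSD
    uint64_t    offset;    // compressed offset
    uint32_t    size;      // compressed size
    void*       gpuDest;   // destination buffer (decompressed by hardware)
};

struct StreamingQueue {
    std::vector<AssetRequest> pending;
    void enqueue(AssetRequest r) { pending.push_back(std::move(r)); }
    void submit() {
        // In a real system this hands the whole batch to the I/O and
        // hardware-decompression path; here it is just a stub.
        pending.clear();
    }
};

int main() {
    StreamingQueue q;
    for (int tile = 0; tile < 256; ++tile)   // e.g. next chunk of the world
        q.enqueue({"world.pak", uint64_t(tile) * 65536, 65536, nullptr});
    q.submit();                              // one cheap submission, not 256 individual reads
}
```

The claimed win is that the per-request CPU cost (transitions, copies, software decompression) is amortized or moved to dedicated hardware, which is the "small fraction of a single core" figure in the quote.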
 
Given Valve approaches users to take the survey, I assume they pick people with at least some decent amount of gaming time. It'd be daft to select a large proportion of people who don't even use Steam for their Steam survey. ;)
I think this discussion has occurred before. I have Steam installed on all my machines, I never game on my 12" MacBook because it's a 12" MacBook, and I only ever seem to get the Steam hardware survey on that machine. Never on my gaming PC or iMac. I'm pretty sure it's random, but it's definitely not targeted in that regard. :nope:
 
That's just speculation on what they think MS are doing. What MS are doing might be related to that with improvements for the console space...or it might not be. We don't have enough details to make any sort of call on it at the moment.

Regards,
SB

In last year's Project Scarlett E3 teaser, Jason Ronald - partner director of project management at Xbox - described how the SSD could be used as 'virtual memory', a teaser of sorts that only begins to hint at the functionality Microsoft has built into its system.
 
A stupid question
We are all talking about how cool BCPack and Kraken are, and how we would like to compress and decompress stuff all day.
From what I've read, with hardware the overhead is really small; I don't know small compared to what, but small.
So why does nobody use them in a memory controller, even in bespoke products like consoles?
Maybe just on a pair of memory controllers, reserved for non-latency-sensitive data like textures, to store double the data.

They do. Data compression on SSDs isn’t new. It’s been around since before the current gen of consoles.

It’s one way of reducing write amplification as well as increasing performance since there is less data being written and stored on a drive.

SandForce was the first company to offer compression tech on SSD controllers, and their controllers were widely used. However, the tech was patented, LSI acquired it, and it eventually fell under the control of Seagate.
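
A back-of-the-envelope illustration of why compressing before writing helps write amplification; every number here is invented:

```cpp
#include <cstdio>

int main() {
    // Write amplification = bytes written to NAND / bytes written by the host.
    // If the controller compresses host data 2:1 before writing it out, the
    // same host workload causes roughly half the NAND writes (ignoring
    // metadata, wear levelling, and garbage-collection details).
    double hostWritesGB    = 100.0;  // GB written by the host (illustrative)
    double gcOverhead      = 1.3;    // extra NAND writes from garbage collection
    double compressionGain = 2.0;    // assumed 2:1 compressible data

    double waUncompressed = gcOverhead;
    double waCompressed   = gcOverhead / compressionGain;
    std::printf("NAND writes: %.0f GB uncompressed vs %.0f GB compressed\n",
                hostWritesGB * waUncompressed, hostWritesGB * waCompressed);
}
```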
 
Shift, it's not an interrogation or attack, and I'm not expecting anyone here to have the answer. Please accept my apologies if I came across that way, or implied any need to defend Sony or respond at all. At least not on my account. I just happen to believe that asking about a component's base clock, when it's certainly known by the manufacturer, is a fair and legitimate question. I know I wouldn't feel good about buying a GPU with an advertised clock of 2 GHz if under load it was designed to throttle down to 1 GHz without any disclosure of that information. Not saying that's the case here, just providing an example for the point of view. No one needs to agree with it. And I'm not asking for some esoteric research lab data. Every PS5 is going to ship with the exact same clock setting presets to use at various activity monitor levels. It's a simple curve with a top end frequency which can be maintained all the way up to a certain activity level, ramping through a range down to a bottom end frequency at the max possible activity level (a rough sketch of that shape follows this post). It's predefined, known, and baked into every PS5. And I know we aren't going to get that from Sony unfortunately, but I'm guessing it will get leaked at some point.

The argument I'm hearing is that, if they released that information, there would be no context to align actual game loads with that activity level curve. That perhaps worst-case game loads would never reach those levels, so the fact that those settings exist at all is meaningless. And I think that's the thrust of the point Cerny is trying to make when he says he believes the system will run near those max clocks most of the time, which I agree with if you were to profile a broad range of titles. But we know something like God of War, for instance, can push a PS4 Pro to something like 170+ watts, which is right at the limit of system TDP. So while an outlier, getting close is definitely possible. And as such, it's worth understanding how the system is configured to run in such conditions. Just my opinion of course. I won't say anything more on it. Hoping you won't judge me too harshly in the future for having that perspective.

Thanks for all your contributions here, they are most appreciated. Respectfully.
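
For what it's worth, the shape being described is easy to express. A minimal sketch, with the hold threshold and floor clock invented purely for illustration (only the 2230 MHz top figure matches the advertised maximum; nothing here is a real PS5 preset):

```cpp
#include <cstdio>

// Purely illustrative preset curve: the top clock is held up to some activity
// threshold, then ramps linearly down to a floor clock at the maximum
// possible activity level. The threshold and floor are made up.
double gpuClockMHz(double activity /* 0.0 .. 1.0 */) {
    const double topClock = 2230.0, floorClock = 2000.0;  // floor is hypothetical
    const double holdUpTo = 0.85;                          // hypothetical threshold
    if (activity <= holdUpTo) return topClock;
    double t = (activity - holdUpTo) / (1.0 - holdUpTo);   // 0..1 across the ramp
    return topClock - t * (topClock - floorClock);
}

int main() {
    for (double a : {0.50, 0.85, 0.95, 1.00})
        std::printf("activity %.2f -> %.0f MHz\n", a, gpuClockMHz(a));
}
```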
You know where God of War can pull that many watts (~170 W)?

On the main screen. That's where that test was done.
 
If someone wants more knowledge on the electromigration being described above, see here:
One point that came up in the video was the 95C operating temperature for the 290 line leading to a higher rate of thermal cycling failures. I wouldn't have the data to know, although I recall at the time that Hawaii launched I theorized that the constant 95C could reduce the impact of cycling. I think there was an article or blurb from AMD that insinuated the same, but it's been so long that I can't find it.
Part of the impact of thermal cycling is the cycle part of the process, which a constant 95C wouldn't be doing. The power-up and power-down cycles are some of the most extreme transitions, but they are relatively infrequent. Spiky utilization of a high-power chip and the back-and-forth trips up and down the temp/fan curve can happen many times between system power-on events, which is something vendors have an eye on as a more persistent threat to the mechanical reliability of the package.
There's a wide array of optimization points for the choice of materials, their arrangement, and the power behavior of the chip. There's the coefficient of thermal expansion that vendors can try to match, or try to handle when it doesn't. On top of that, there are the properties of the connections and layers like the underfill. Over the whole operating range, the physical properties can shift. Layers can expand/contract, and they can be stiff/soft or brittle/flexible as well. Selecting a target range or operating limit can influence what materials are chosen, and mistakes can lead to materials that weaken too much at a high temp, or remain too stiff at some temperatures, causing them to transmit excessive force up or down the stack. In theory, a chip package with materials that matched well at 95C and didn't have overly stiff adhesive or support layers could exist at a comfortable balance at a fixed operating temperature, with only the rarer power-up or power-down ramps being the place where stresses rise. Taking the same stack and putting it out of its range or not being consistent could actually increase the rate of wear, even if cooler.

That was the theory at the time, although I don't have the long-term data to know if that turned out to be the case. It's possible it wasn't that helpful, or there could have been other reasons AMD moved away from that operating point. At the time, AMD indicated it was a design advantage that their DVFS could react quickly enough at 95C to maintain a constant temp and not allow utilization spikes to push hotspot temps into dangerous territory. Competing GPUs needed much more safety margin in order to catch temperature ramps and give their longer driver-controlled loops time to react.
HBM memory, user fears of overheating, cooler variability, iffy leakage and efficiency effects, and possibly concerns of other temperature-driven effects besides cycling may have led to it being a solution appropriate only for that specific set of circumstances.
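
For anyone who wants a number to attach to the CTE-mismatch point, a textbook first-order estimate of the biaxial stress in a fully constrained layer is below. The material values and temperature swing are my own illustrative picks, not anything from the post above:

```latex
\sigma \;\approx\; \frac{E}{1-\nu}\,\Delta\alpha\,\Delta T
\;\approx\; \frac{130\ \mathrm{GPa}}{1-0.28}\times(17-2.6)\times10^{-6}\ \mathrm{K^{-1}}\times 70\ \mathrm{K}
\;\approx\; 180\ \mathrm{MPa}
```

The point is that the stress scales with the size of the temperature swing, not the absolute temperature, which is why a steady 95C with rare power-up/power-down ramps can be gentler on the package than frequent trips up and down the fan curve.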

A stupid question
We are all talking about how cool BCPack and Kraken are, and how we would like to compress and decompress stuff all day.
From what I've read, with hardware the overhead is really small; I don't know small compared to what, but small.
So why does nobody use them in a memory controller, even in bespoke products like consoles?
Maybe just on a pair of memory controllers, reserved for non-latency-sensitive data like textures, to store double the data.
There are two items that come to mind.
First is that IBM has Active Memory Expansion, which works by setting aside part of RAM and treating it more like a storage device. There's the regular set of pages, and then a pool of compressed pages. Less-active pages are moved to the compressed pool, and compressed pages get decompressed and moved to the active pool when they are accessed.
https://www.ibm.com/support/pages/aix-active-memory-expansion-ame
Chips like the Power9 aren't cheap, and while the decompression block's bandwidth is theoretically quite high at 32 GB/s or so, this is far below the normal memory bandwidth of a major SoC (source: Power9 processor manual).
(edit: Correction, the block handles up to 16 GB/s into the compressor and 16 GB/s out of the decompressor. There are other accelerator blocks in the engine, and the total bandwidth they share is 32 GB/s in each direction.)
Given that this is paging memory blocks back and forth in a similar fashion to a disk access, the latency of the operation is significant. Real performance-sensitive operations depend on data remaining in the active pool. The motivation isn't bandwidth savings or outright performance, but is focused on workloads like keeping more VM instances active in memory than would be possible if they weren't compressed. Some workloads like a database might benefit from having more data in DRAM in a big server system because the latency hit for the compressed memory is still smaller than a trip to a storage node or network access to get data.
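
As a toy sketch of that two-pool scheme, assuming nothing about IBM's actual implementation beyond the AME description above (hot pages uncompressed, cold pages compressed, decompress-on-touch):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model of the two-pool idea: hot pages live uncompressed, cold pages are
// compressed in a separate pool, and touching a cold page pays a decompression
// cost before it can be used. Nothing here reflects IBM's implementation.
using PageId = uint64_t;
using Page   = std::vector<uint8_t>;

struct CompressedMemory {
    std::unordered_map<PageId, Page> active;      // uncompressed, directly usable
    std::unordered_map<PageId, Page> compressed;  // smaller, must be expanded first

    Page& access(PageId id) {
        auto hot = active.find(id);
        if (hot != active.end()) return hot->second;      // fast path
        Page expanded = decompress(compressed.at(id));    // slow path, disk-like cost
        compressed.erase(id);
        return active[id] = std::move(expanded);
    }

    static Page decompress(const Page& stored) {
        // Stand-in for the hardware decompressor.
        return Page(4096, stored.empty() ? 0 : stored[0]);
    }
};

int main() {
    CompressedMemory mem;
    mem.compressed[42] = Page{0xAB};   // a cold, compressed page
    mem.access(42);                    // touching it moves it to the active pool
}
```

The slow path in access() is where the disk-like latency mentioned above comes from: the page has to be expanded before anything can use it.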

An alternative form is the in-line compression done by Qualcomm's cancelled server chip.
https://www.qualcomm.com/media/documents/files/qualcomm-centriq-2400-processor.pdf
It's low-latency and able to work on data in memory that is actively being accessed, but it's described as allowing 128B lines to sometimes compress to 64B, so it's not as capable in terms of compression ratio, and the compressed lines in RAM leave gaps that cannot be used, meaning overall RAM consumption would be unchanged. It would, however, save power on data transfers over the DRAM bus.
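
A toy illustration of that trade-off: a line either compresses to half size or it doesn't, the DRAM slot is reserved at full size either way, and only the bus transfer shrinks. The heuristic and sizes here are invented, not Qualcomm's scheme:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>

constexpr size_t kLine = 128, kHalf = 64;

// Stand-in heuristic: a line of all-identical bytes "fits" in 64 bytes.
bool compressesToHalf(const std::array<uint8_t, kLine>& line) {
    return std::all_of(line.begin(), line.end(),
                       [&](uint8_t b) { return b == line[0]; });
}

int main() {
    std::array<uint8_t, kLine> zeros{};            // compressible
    std::array<uint8_t, kLine> noise{};            // not compressible
    for (size_t i = 0; i < kLine; ++i) noise[i] = uint8_t(i * 37 + 11);

    for (auto* l : {&zeros, &noise}) {
        size_t busBytes = compressesToHalf(*l) ? kHalf : kLine;
        std::printf("bus transfer: %zu B, DRAM slot reserved: %zu B\n",
                    busBytes, kLine);   // capacity unchanged, bus traffic halved at best
    }
}
```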
 
There is no limitation. As always, Microsoft likes to talk about theoretical max numbers as if those were typical, average numbers. The PS5 SSD also reaches >20 GB/s in ideal conditions. 6 GB/s (or 4.8 GB/s?) would also be ideal conditions for MS. Some data cannot be compressed as much when you use a lossless algo.
The PS5 SSD's typical speed of 9 GB/s is only 40% of its decompressor speed (22 GB/s), but the Xbox SSD's 4.8 GB/s is 80% of its decompressor speed (6 GB/s). I find it very strange; why does Sony need such a high-speed decompressor?


I am curious whether the "8~9 GB/s" number for the PS5 SSD is a theoretical max or real game performance. The PS5 SSD is 5.5 GB/s theoretical raw speed x 1.6~1.7 compression ratio = 9 GB/s. However, in Sony's SSD patents the real-world speed can reach 80~90% of the theoretical max. Can the PS5 reach more than 8 GB/s in real games, so that the 22 GB/s decompressor is there for the best-case scenario?

On the Xbox side we have seen an X1X game load in about 8~9 seconds. In other words, the real-world SSD speed is probably 1~1.5 GB/s, which is about half of the 2.4 GB/s raw speed. If we count the 2x compression rate, then we can expect 2~3 GB/s for XSX games, so the XSX only needs a 6 GB/s decompressor.

If the PS5 can't reach 8 GB/s (for example, 4 GB/s in real-world games), then I find it very strange to use a 22 GB/s decompressor.
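
Putting the arithmetic from this post in one place (the figures are the post's own, not official spec-sheet numbers):

```cpp
#include <cstdio>

// Effective SSD throughput is raw throughput times compression ratio, and the
// decompressor only has to keep up with that effective rate.
int main() {
    struct Console { const char* name; double rawGBs, ratio, decompPeakGBs; };
    Console consoles[] = {
        {"PS5", 5.5, 1.65, 22.0},   // 5.5 GB/s x ~1.6-1.7 Kraken ratio
        {"XSX", 2.4, 2.00,  6.0},   // 2.4 GB/s x ~2x compression ratio
    };
    for (const auto& c : consoles) {
        double effective = c.rawGBs * c.ratio;
        std::printf("%s: ~%.1f GB/s effective, %.0f%% of the %.0f GB/s decompressor peak\n",
                    c.name, effective, 100.0 * effective / c.decompPeakGBs, c.decompPeakGBs);
    }
}
```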
 
The PS5 SSD's typical speed of 9 GB/s is only 40% of its decompressor speed (22 GB/s), but the Xbox SSD's 4.8 GB/s is 80% of its decompressor speed (6 GB/s). I find it very strange; why does Sony need such a high-speed decompressor?
The designs use different algorithms with different implementation costs. There could also be a difference in how the blocks are hooked into the on-die fabric for transporting data.
Maybe Sony's algorithm has a lower cost in silicon for the higher ratio, even if it's rare, so it wasn't a big deal to add it. If Microsoft's method had a higher hardware cost for a rare case, or the algorithm optimized for more general compression versus rare high-compression cases, then the design may have opted for a lower peak.

Even if the hardware blocks had high throughput in rare cases, it's possible the consoles have different numbers of units in their IO block that share links with the system in different ways. A link that is narrower and less expensive in terms of power and area might free up resources for other parts of the chip, rather than targeting a very rare case. Perhaps Sony's choices created additional link capacity, and once it was there, there was little reason not to use it even for rare compression rates.
 
If that were the case then the PS4 Pro and XBO-X would garner the majority of sales over their base counterparts, but that isn't the case. The XBO-X is over 300% more performant than the XBO-S and generally only about 100 USD more, yet the XBO-S sells significantly more units.

Price is important to the majority of console buyers. It's also one of the reasons why the majority of console buyers are unlikely to ever buy a new gaming PC.

Regards,
SB
Not seeing the marketing issue at all; it's all in how you market something.
Simple equation: would you buy a $400 console...
Or buy a $500 console that was much more powerful.
 