Digital Foundry Article Technical Discussion [2022]

I'm not sure what the future holds for dedicated decompression hardware on the PC.

It's definitely the future. When you design any fixed hardware you have to accept that it can never be fully future-proofed, but that's not a reason to do nothing. However, lossless data compression is something of a special case, because the real advances are in the compression algorithm, which doesn't require changes to the decompressor. This is why you can take a zlib/LZ decompressor from 15 years ago and it will happily decompress a zlib/LZ stream from a modern compressor using the latest compression techniques.
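As a quick illustration of that point, here's a minimal Python sketch; different zlib levels and strategies just stand in for "a modern compressor using the latest techniques":

```python
import zlib

# The same payload compressed three different ways - a stand-in for a
# compressor that keeps improving its encoding decisions over time.
payload = b"repetitive game asset data " * 4096

fast = zlib.compress(payload, 1)                    # quick, low-effort encoder
best = zlib.compress(payload, 9)                    # far more effort spent encoding
c = zlib.compressobj(9, zlib.DEFLATED, 15, 9, zlib.Z_FILTERED)
tuned = c.compress(payload) + c.flush()             # different encoding strategy

# One unchanged decoder handles all of them, because the stream *format* is fixed.
for stream in (fast, best, tuned):
    assert zlib.decompress(stream) == payload
print(len(fast), len(best), len(tuned))             # sizes differ; the decoder doesn't care
```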

The same is true of Oodle Kraken, which Sony put into the PS5. Whilst the base decompressor pre-dates the latest compression tech, the data format is constant, so it can be decompressed just fine.

These IC blocks are also fairly basic in complexity and transistor count, because decompressing a stream is pretty straightforward - which is why even really underpowered CPUs can do it at pace.

Looking at the PC space specifically, PCs are designed to operate entirely without a GPU, whereas consoles must have one. The inclusion of hUMA and shared memory creates a scenario on console that could never occur on PC: quite simply, I/O to memory is as direct as it gets.
I think you've misunderstood my post. I'm repeating it here - minus typos, with underlines for emphasis:

In X years, this functionality will definitely be part of the PC's I/O chipset, which is where it should be. Data is read, decompressed and routed to main RAM or video RAM with no CPU or GPU intervention. But that's a long road to drive: it means working with OS vendors to build support for this into the OS.

Putting decompression blocks in the I/O logic is insignificant in terms of transistor count, and it doesn't matter what memory setup you have because, as on today's consoles, this all happens after the data is read from the drive and passed on by the drive/interface controller, but before it hits the northbridge/memory controller.
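To put that data path in rough code form, here's a toy Python model of the proposed flow - not real hardware; the names, chunk IDs and destination tag are purely illustrative:

```python
import zlib

# Toy model: the drive/interface controller hands the I/O block a compressed
# chunk plus a destination tag, and the block inflates it straight into the
# tagged pool before anything reaches the memory controller.
system_ram, video_ram = {}, {}

def io_block_transfer(chunk_id, compressed, dest):
    """Decompress in the I/O logic and route to main RAM or VRAM."""
    data = zlib.decompress(compressed)        # stands in for the fixed-function decoder
    (video_ram if dest == "vram" else system_ram)[chunk_id] = data

io_block_transfer("level1/geometry", zlib.compress(b"mesh data " * 100), "vram")
io_block_transfer("level1/ai_state", zlib.compress(b"nav data " * 100), "ram")
print(len(video_ram["level1/geometry"]), len(system_ram["level1/ai_state"]))
```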
 
I agree with the bolded. It was the paragraph before that I didn't agree with. Though I'm unsure which company will have a hold over the compression algorithm; it's not a worry for me as long as it is an industry standard and used widely enough to be functional.
 
I agree with the bolded. It was the paragraph before that I didn't agree with.

When I said: "For this generation of consoles, this was entirely about chasing performance and not keeping costs down (money). Engineering and manufacturing this custom silicon, and creating APIs for it to work, cost time and money. Both Microsoft and Sony could have let the APUs in the consoles do the decompression as they always have done, by the CPU or via the zlib decompressors that they had in the previous generation."

Did you perhaps misread it, or could you explain further?

Though I'm unsure which company will have a hold over the compression algorithm; it's not a worry for me as long as it is an industry standard and used widely enough to be functional.

There are a number of people developing compression algorithms, although most are built on a small number of core algorithms. It would be Intel and AMD who determine which decompression algorithms to support, but there aren't so many that they couldn't support LZ, Oodle Kraken and any emerging ones. Genuinely new lossless algorithms are exceedingly rare, and it would be in those companies' interest to allow hardware decompression support, because that would help drive interest/support/adoption, which means they can license the compression side of the tech.
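To sketch what "supporting a handful of formats" might look like behind one interface, here's a hypothetical Python dispatch table - the format IDs are invented, and bz2/lzma merely stand in for Kraken or any emerging format, since Oodle isn't publicly available:

```python
import bz2
import lzma
import zlib

# Hypothetical dispatch table for a decompression block/driver. Format IDs and
# the choice of codecs are invented for illustration only.
DECODERS = {
    0x01: zlib.decompress,   # classic LZ/deflate
    0x02: lzma.decompress,   # stand-in for a denser modern format (e.g. Kraken)
    0x03: bz2.decompress,    # stand-in for some future/emerging format
}

def hw_decompress(format_id, compressed):
    """Route a compressed stream to whichever fixed decoder matches its format ID."""
    try:
        return DECODERS[format_id](compressed)
    except KeyError:
        raise ValueError(f"format 0x{format_id:02x} not supported by this block")

print(hw_decompress(0x01, zlib.compress(b"asset data " * 64))[:10])
```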
 
And in my post I explained that they could not just use standard GPU decompression because of bandwidth contention constraints. The cost is in the bandwidth limits. There was really only one solution, and this method is cheaper per console than the alternative.

Let's say it costs $100M to make the custom compression silicon. You sell 100M consoles; that's $1 per console.
The cost of increasing bandwidth by 30-50% to allow for GPU decompression would be more than $1 per console.
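Back-of-envelope, the argument looks like this in Python; the $100M and the recurring bandwidth cost are hypothetical figures from this post, not real BOM numbers:

```python
# Hypothetical figures from the argument above, not real BOM data.
nre_cost = 100e6                  # one-off cost to design the custom silicon
units_sold = 100e6                # consoles sold over the generation
asic_cost_per_console = nre_cost / units_sold        # = $1.00, shrinking with volume

extra_bandwidth_cost_per_console = 1.50              # assumed recurring cost of a wider/faster bus
print(asic_cost_per_console, extra_bandwidth_cost_per_console)
# The fixed engineering cost amortises away as consoles sell; a wider/faster
# bus is paid for again on every single unit manufactured.
```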

You said it was just chasing performance and not keeping costs down but I disagree with that.
 
And in my post I explained that they could not just use standard GPU decompression because of bandwidth contention constraints. The cost is in the bandwidth limits. There was really only one solution, and this method is cheaper per console than the alternative.

Yeah, from the PC side, having decompression in the CPU's I/O is the simpler option from an R&D and industry-adoption angle. It's cheaper in that sense. But if you want to maximize bandwidth, then having decompression on the GPU as well - so the bus isn't hampered with uncompressed data - could potentially be cheaper than brute-forcing it with a wider/faster bus. I'm not sure.
 

I think there are good arguments for and against both solutions without there being a clear winner (in the PC space). I've had a stab at breaking them down below.

Separate ASIC

Pros:

  • Allows CPU targeted data to be decompressed without burdening the CPU or having to copy it back and forth over the GPU's PCIe bus
  • Places no additional burden on the GPU
  • Potentially more guaranteed JIT performance as per @iroboto's post (although we don't yet know how GPU-based decompression works under the hood, so it's quite possible it's implemented in such a way as to give a guaranteed minimum latency).
Cons:
  • Adds cost and eats up die space (however little) on the CPU IO chip.
  • Performance limited - i.e. it might be fast enough for today's PCIe 5 drives but become a bottleneck for PCIe 6 or 7 drives in a couple of years.
  • Inflexible - it can only support the compression formats for which it was built. This was one of the main reasons the decompression block in the PS4/XBO didn't get used: it ended up faster in many cases to use more advanced compression formats (e.g. Kraken) on the CPU than it was to do LZ on the decompression block (a rough illustration of this is sketched after this breakdown). This will act as a disincentive both to game developers adopting newer, more advanced compression formats as they become available and to those looking to develop new compression formats. GPU compute based lossless decompression is a fairly new thing, so it's conceivable we might see big advances in it over the next few years.
  • You're potentially creating a monopoly (or at least restricting competition) in the compression format market, depending on what is supported. There's a reason Sony gives Kraken away for free to PS5 devs, but the same wouldn't hold true for PCs. PC devs still have to pay a licence fee to use those formats as far as I'm aware.

GPU Decompression

Pros:

  • It's utilising potentially idle compute power via async compute without the need for a separate chip
  • Potentially less performance limited, and so may scale better with faster SSDs. Where it isn't fast enough, that's likely because the GPU is older or low end, and thus a GPU upgrade also improves your decompression performance (both gaming-related functions on the single gaming-oriented add-in card).
  • More flexible and presumably able to work with any GPU based decompression format that comes along giving developers the freedom to use whichever they choose.
  • Allows the GPU targeted data to be sent over the GPU's PCIe bus compressed, potentially making for very significant PCIe bandwidth savings - although it's arguable if we actually need them
  • It may be possible to store game assets in VRAM in a compressed state ready for GPU decompression on the fly resulting in a potential multiplier for VRAM
Cons:
  • Still leaves the decompression of CPU-targeted data to the CPU itself; only GPU-targeted data is decompressed on the GPU, so the CPU cycle savings are smaller
  • May have some appreciable impact on gaming performance - yet to be seen
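To make the Kraken-vs-hardware-LZ point a bit more concrete, here's a rough benchmark sketch in Python. It assumes the third-party zstandard package is installed as a publicly available stand-in for a modern format like Kraken, and the numbers will obviously vary by machine and data:

```python
import time
import zlib

import zstandard  # third-party stand-in for a modern format; pip install zstandard

data = open(__file__, "rb").read() * 2000          # any moderately compressible payload

def bench(name, compress, decompress):
    blob = compress(data)
    t0 = time.perf_counter()
    for _ in range(20):
        decompress(blob)
    dt = (time.perf_counter() - t0) / 20
    print(f"{name}: ratio {len(data) / len(blob):.2f}, decode {len(data) / dt / 1e9:.2f} GB/s")

bench("zlib", lambda d: zlib.compress(d, 9), zlib.decompress)
zc, zd = zstandard.ZstdCompressor(level=19), zstandard.ZstdDecompressor()
bench("zstd", zc.compress, zd.decompress)
# Typically the newer format both packs tighter and decodes faster in software,
# which is exactly the scenario that left the old fixed-format block unused.
```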
 
  • It may be possible to store game assets in VRAM in a compressed state ready for GPU decompression on the fly resulting in a potential multiplier for VRAM

Unless you are decompressing that data every frame, I can't see the VRAM savings. You could save system RAM this way by keeping it compressed there and sending it through the bus still compressed, to be decompressed on the GPU only when needed - so that is a plus, but it saves system RAM rather than VRAM.

Theoretically, one could keep some compressed cached data in VRAM, decompress only the parts that are needed when they are needed, and then keep the uncompressed data around for future frames and such. But the whole promise of SSDs and fast I/O is to NOT waste fast VRAM on unused cached data, so I figure that goes against the "way of the future".
 

Good point, I hadn't thought of it that way. But if we consider all levels of PC performance, then systems with slower SSDs but decent amounts of RAM/VRAM could benefit from this.
 
I can see why console manufacturers went with ASICs for decompression, as consoles are stuck with their hardware for the next 4-5 years. There is a need to maximize bandwidth and minimize the work of the CPU and GPU as much as possible from the start.

But PCIe 5.0 SSDs are due out this year, PCIe 6.0 SSDs are due out in 12-18 months, and the PCIe 7.0 spec is due in roughly 24 months. Intel 12th-gen CPUs are already available and Ryzen 7000-series CPUs are due at the end of the month. What's the point of decompression ASICs for PC when an upgrade in a year or two will get you 10-15 GB/s of raw bandwidth alone?
 
Good point, I hadn't thought of it that way. But if we consider all levels of PC performance, then systems with slower SSDs but decent amounts of RAM/VRAM could benefit from this.

True as a theoretical, but if the industry truly adopts the idea of leveraging fast SSD I/O more heavily, then such a system - which will have a modern enough CPU/GPU to support future hardware decompression - will most likely also have a fast enough SSD, no?
 
And in my post I explained that they could not just use standard GPU decompression because of bandwidth contention constraints. The cost is in the bandwidth limits. There was really only one solution, and this method is cheaper per console than the alternative.

I haven't mentioned the GPU at all. The decompression blocks would go into the I/O - the glue that traditionally comprises the south/northbridge. That is why the CPU, GPU and memory layout are irrelevant.

Let's say it costs $100M to make the custom compression silicon. You sell 100M consoles; that's $1 per console. The cost of increasing bandwidth by 30-50% to allow for GPU decompression would be more than $1 per console.

You don't need custom compression silicon at all; the dev compresses the data and that's how it's installed. That's no different to now. All that changes is the addition of a modest transistor block that handles decompression. This would represent a tiny transistor footprint on what is already a massive chunk of the die.

You said it was just chasing performance and not keeping costs down but I disagree with that.

What I said was in response to you saying "With consoles it always comes down to money". Consoles are budget-conscious designs, and these I/O decompression approaches cost time and money to develop and implement. So it was not 'always about money', but about performance for Microsoft and Sony. Like I said in an earlier post, both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades. But they chose to spend money to have better performance.
 
True as a theoretical, but if the industry truly adopts the idea of leveraging fast SSD I/O more heavily, then such a system - which will have a modern enough CPU/GPU to support future hardware decompression - will most likely also have a fast enough SSD, no?

From what we know so far, the only GPU requirement for DirectStorage is Shader Model 6.0, which means Maxwell (GTX 9xx) and Arctic Islands (RX 4xx) level GPUs or above. That doesn't preclude them from adding new requirements when the GPU decompression update is released, but hopefully not.

So it's quite conceivable that you could have an older system using a SATA SSD that's not fast enough for the on-demand streaming model, but is able to mitigate that by caching more game data into RAM/VRAM.

I think even in the case of a modern system with an NVMe drive, caching data into memory still has its place. If, for example, a game is designed to take full advantage of the PS5's I/O, then unless the PC is running a 5.5+ GB/s NVMe drive it may be unable to run the game at PS5 settings with the same level of streaming performance. But that PC might have 32GB RAM and 16GB VRAM and be in a position to cache much of the data that the PS5 would have to go back out to the SSD to fetch, thus allowing the PS5 settings to be equalled or exceeded.
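A minimal sketch of what that caching could look like on the software side - purely hypothetical, in Python, with zlib standing in for whatever format the assets actually ship in:

```python
import zlib
from collections import OrderedDict

class CompressedAssetCache:
    """Hypothetical sketch: keep assets compressed in spare system RAM and only
    inflate them on demand, so the (slower) SSD is hit less often."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.blobs = OrderedDict()                 # asset_id -> compressed bytes, LRU order

    def put(self, asset_id, raw):
        blob = zlib.compress(raw, 6)               # zlib as a stand-in for the shipped format
        while self.blobs and self.used + len(blob) > self.capacity:
            _, evicted = self.blobs.popitem(last=False)   # evict least-recently-used asset
            self.used -= len(evicted)
        self.blobs[asset_id] = blob
        self.used += len(blob)

    def get(self, asset_id, load_from_ssd):
        if asset_id in self.blobs:
            self.blobs.move_to_end(asset_id)       # cache hit: no SSD read at all
            return zlib.decompress(self.blobs[asset_id])
        raw = load_from_ssd(asset_id)              # cache miss: the slow path we're avoiding
        self.put(asset_id, raw)
        return raw

cache = CompressedAssetCache(capacity_bytes=512 * 1024 * 1024)   # half a gig of spare RAM
```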

And they will have the same problem current NVME drives have.

No software to support their capability = They're useless.

Isn't the point here that games are now being designed to take advantage of high-speed NVMe drives? If they're architected in such a way as to minimise CPU time at loading screens, then with sufficiently fast CPUs the NVMe may become a bottleneck again. Granted, when even a 5.5GB/s NVMe is capable of filling a top-end GPU's VRAM in about 1.5 seconds (with decompression), there's only so much gain to be had from a faster NVMe. But if the CPU can keep up, then PCIe 5.0 drives would be needed for sub-1-second load times in that scenario, and PCIe 6.0 would be needed for sub-0.5-second/instant loads. Obviously the question of whether the CPU can keep up is a major one, because it seems to be the bottleneck in pretty much every example we have today of fast loading.
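The arithmetic behind that works out roughly as below; the per-generation drive speeds are ballpark assumptions, as is the 2:1 average compression ratio:

```python
# Ballpark figures only: sequential read speed per drive generation and an
# assumed 2:1 average compression ratio; also assumes decompression keeps pace.
VRAM_GB = 16
RATIO = 2.0

drives = {
    "PCIe 4.0 (5.5 GB/s)": 5.5,
    "PCIe 5.0 (~12 GB/s)": 12.0,
    "PCIe 6.0 (~25 GB/s)": 25.0,
}

for name, raw_gbps in drives.items():
    print(f"{name}: ~{VRAM_GB / (raw_gbps * RATIO):.2f} s to fill {VRAM_GB} GB of VRAM")
# 5.5 GB/s -> ~1.5 s, PCIe 5.0 -> sub-1 s, PCIe 6.0 -> sub-0.5 s, matching the
# figures above - provided the CPU/GPU decompression isn't the bottleneck.
```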
 
Like I said in an earlier post, both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades. But they chose to spend money to have better performance.

Both PS4 and XBO had hardware decompression units like the PS5 and XBS do. The current gen solution is really just a necessary evolution of that due to the faster drives they're using.
 
Both PS4 and XBO had hardware decompression units like the PS5 and XBS do. The current gen solution is really just a necessary evolution of that due to the faster drives they're using.
The decompression hardware in last-gen consoles required the data to first be loaded into RAM before it was decompressed - needing memory to hold both the compressed input and the decompressed output. It's definitely an evolution of the architecture and not trivial to implement on PC. A technical PC consortium will need to assemble, discuss the technical aspects and agree on an implementation.
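To illustrate the difference in memory behaviour, here's a software analogy only - Python with zlib, arbitrary buffer sizes; real inline hardware would land the data straight in its destination:

```python
import zlib

compressed = zlib.compress(b"level data " * 200_000)   # pretend this came off the drive

# Last-gen style: the whole compressed blob is loaded into RAM first, and the
# decompressed output needs a second full-size allocation alongside it.
staging = compressed
output = zlib.decompress(staging)
print("two-buffer peak footprint ~", len(staging) + len(output), "bytes")

# Inline/streaming style: chunks are pushed through the decompressor as they
# arrive, so only chunk-sized buffers are ever resident at once.
d = zlib.decompressobj()
total = 0
for i in range(0, len(compressed), 64 * 1024):          # 64 KiB chunks "from the drive"
    total += len(d.decompress(compressed[i:i + 64 * 1024]))
total += len(d.flush())
print("streamed", total, "bytes while holding only chunk-sized buffers")
```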

We'll know when this is being considered because, like other changes to what we consider the basic/modern PC specification (such as when TPM was included), the consortium will become public - you can see the sort of thing that will happen by looking at the history of the TCG. The PC will just brute-force this for a while.
 