Digital Foundry Article Technical Discussion [2022]

Like I said in an earlier post, both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades. But they chose to spend money to have better performance.
I don't think I've been explicit enough in my posts, so perhaps there's some mix-up about where my arguments are directed, not to mention a poor typo I'm noticing now. To begin with, I want to affirm that I agree that at some point in the future there will be some form of hardware decompression on PC. But I want to caveat that all of my responses have been directed at the quoted bit about console manufacturers spending money to have better performance.

On console:
It costs significantly more to do decompression on the GPU than to run it through the consoles' custom I/O controllers. As per Brit's notes, 1-2 TF of power can be siphoned from compute to run decompression at a high rate. But the challenge with consoles is bandwidth contention and the fact that neither GPU (PS5/XSX) really has much compute power to spare. If they wanted the GPUs to handle decompression without dedicated decompression hardware, the GPU would need to get slightly larger in compute and significantly larger in bandwidth, because the CPU and GPU are already using that bandwidth for game code and rendering, and now we would be layering in decompression as a third consumer. The additional bandwidth needed to accommodate that, on top of the losses due to bandwidth contention, would significantly increase the cost PER console.
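As a back-of-envelope sketch of the extra traffic GPU-side decompression would put on the shared bus, here's the raw read/write count using assumed PS5-class figures (5.5 GB/s raw SSD, roughly 2:1 compression, 448 GB/s of GDDR6). It deliberately ignores the contention losses discussed above, which are the bigger problem.

```python
# Hypothetical numbers for illustration only -- not official figures.
raw_ssd_gbps      = 5.5    # assumed PS5-class raw SSD throughput, GB/s
compression_ratio = 2.0    # assumed average for Kraken/BCPack-style data
total_mem_bw_gbps = 448.0  # assumed shared GDDR6 bandwidth, GB/s

# GPU-side decompression in shared memory: read the compressed stream
# from RAM, write the decompressed result back to RAM.
extra_traffic = raw_ssd_gbps + raw_ssd_gbps * compression_ratio

print(f"extra bus traffic: {extra_traffic:.1f} GB/s "
      f"({extra_traffic / total_mem_bw_gbps:.1%} of the shared bus), "
      f"before any contention penalty")
```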

So as I was saying earlier, you noted that it takes time and money to make these decompression solutions; but if we assume $100M to make the custom I/O silicon and a fraction of a dollar to place it into each console, the actual cost of that I/O solution is about $1 per console if 100M consoles are sold. Increasing bandwidth by 50% over what it is now would cost way more than $1 per console. So from my perspective, in order to support high-speed, high-fidelity texture streaming on console, the cheaper solution was what they built.
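A minimal sketch of that amortization math; the $100M NRE and 100M-unit figures are the post's hypotheticals, and the per-unit placement cost is my own assumption.

```python
nre_cost   = 100e6  # hypothetical one-off cost to design the custom I/O silicon
units_sold = 100e6  # hypothetical lifetime console sales
placement  = 0.25   # assumed "fraction of a dollar" to put the block in each console

print(f"effective cost per console: ${nre_cost / units_sold + placement:.2f}")
# ~$1.25 per box, versus a recurring BOM increase on every single unit
# if the same capability were bought with a wider or faster memory system.
```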

IMO, if they had chosen to brute force it like on PC, they would have given the consoles both more compute and more bandwidth. That would result in larger wins, because if the game is not leveraging an insane amount of I/O decompression on the GPU, then those compute and bandwidth resources can be put towards graphical rendering.

So this is where I'm at odds with your underlined statement: the brute force method provides better overall performance, at least from a flexible-resource perspective. So let's circle back to your statement here:
both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades
Quite frankly, they didn't because it costs too much. My highlights below. The flash route was discussed as a way to save DRAM costs. If you really want to save DRAM costs, you can't use the GPU to decompress the assets off the SSD, because that will take more DRAM or more expensive DRAM. The move to flash and custom I/O controllers, IMO, is about saving money while still getting the performance they want.
[Attachment: hotchips.jpg – Hot Chips slide referenced above and in later replies]
 
What I said was in response to you saying "With consoles it always comes down to money". Consoles are budget-conscious designs, and these I/O decompression approaches cost time and money to develop and implement. So it was not 'always about money', but about performance for Microsoft and Sony. Like I said in an earlier post, both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades. But they chose to spend money to have better performance.

It does always come down to money. These boxes need to go for around $400-500 max. The custom ASIC for decompression is nothing more than a cost-saving measure. They could have gone the PC way, i.e. a more capable GPU to handle a more powerful decompression solution (and more GPU performance at the same time), but that would have been more expensive. They would also have needed to move away from a shared memory design, since that already hampers anisotropic filtering, so one can imagine what it would do to decompression.
This cheap ASIC block is much cheaper than extra CUs for more TF on the GPU. Yes, it's less capable than the GPU solution, and less flexible and scalable, but it's the cost that matters.

And they will have the same problem current NVME drives have.

No software to support their capability = They're useless.

It's not like there's much support on the console side either at the moment.
 
I've never heard of this. Can you explain?
Anisotropic filtering comes in the following options:
2x, 4x, 8x, 16x
These represent the number of texel samples it will take. Each step doubles the number of samples, thus doubling the bandwidth requirement per texture; 4x takes double the bandwidth of 2x, and so forth.
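A trivial sketch of that scaling; the "worst case" caveat in the comment is my own, since real GPUs only take the full tap count on steeply angled surfaces.

```python
for af_level in (2, 4, 8, 16):
    # Each AF step doubles the maximum texel sample count, so the
    # worst-case texture bandwidth doubles with it.
    print(f"{af_level:>2}x AF: up to {af_level}x the texel samples of 1x filtering")
```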

Because bandwidth on consoles is at a significant premium (the CPU and GPU share the bandwidth, contention causes asymmetrical loss, and CPU bandwidth is prioritized) and on PCs it is not, AF is often dialed back; for some titles more than others, the hit to bandwidth isn't worth the improvement. Some titles can look really good with 8x, others can get away with as low as 4x, and some just look terrible without 16x. It's not exactly clear to me why this is the case, but it's something we've noticed. In the PC space, though, there's always ample bandwidth available to handle 16x AF. That's just not true on consoles.

I do suspect that in the future, with Infinity Cache, this will be a solved problem on consoles.

If you were around back in the day when we were going through color modes, the move to 32bpp color was a massive hit to the GPU. Most people couldn't see the difference between 16bpp and 32bpp for some reason, but it became clear for most people when Homeworld arrived. That being said, it's just one of those growing pains on consoles that the PC has dealt with through sheer brute force; on something like the Steam Deck, where the platform is more constrained, I can still see AF being an issue.
 
So as I was saying earlier, you noted that it takes time and money to make these decompression solutions; but if we assume $100M to make the custom I/O silicon and a fraction of a dollar to place it into each console, the actual cost of that I/O solution is about $1 per console if 100M consoles are sold. Increasing bandwidth by 50% over what it is now would cost way more than $1 per console. So from my perspective, in order to support high-speed, high-fidelity texture streaming on console, the cheaper solution was what they built.

There is no way these chips cost less than the price of a cup of coffee. What on earth makes you think that? :-? PS5's CXD90062GG SSD controller is bigger than each of the flash chips.

IMO, if they had chosen to brute force it like on PC, they would have given the consoles both more compute and more bandwidth.

Brute forcing it like on PC would have been a step back from PS4 and Xbox One, both of which have zlib hardware decompression and neither of which requires its own compute. I'm struggling to see the consistency in your arguments. More compute and more bandwidth cost more money, and more compute on the APU would certainly impact the overall yields of these APUs. A discrete I/O block was the way to go here.

But I was talking about PC and where you would integrate hardware decompression if you wanted to do it like on the console, and there really is only one place, pretty much the same architectural locale as on consoles: between the solid-state storage and the main memory buses.
 
Brute forcing it like on PC would have been a step back from PS4 and Xbox One, both of which have zlib hardware decompression and neither of which requires its own compute. I'm struggling to see the consistency in your arguments. More compute and more bandwidth cost more money, and more compute on the APU would certainly impact the overall yields of these APUs. A discrete I/O block was the way to go here.

But I was talking about PC and where you would integrate hardware decompression if you wanted to do it like on the console, and there really is only one place, pretty much the same architectural locale as on consoles: between the solid-state storage and the main memory buses.

It's not really brute-forcing on PC, though. It's efficient usage of the compute capabilities on the GPU, which is where it should happen since the GPU is doing the rendering (not the SSD). More compute and more bandwidth don't just mean a more capable I/O solution for the SSD; they also mean more raster and RT power (on AMD). On consoles, that's not really a good idea due to cost. A simple, not-as-capable dedicated ASIC is going to be the cheaper solution.
For the next console they'd need a faster, more capable ASIC block... that is, if the consoles don't follow suit and use the GPU for decompression, since next-gen consoles should have more capable hardware. Though that might break BC?
 
I can see why console manufacturers went with ASICs for decompression, as consoles are stuck with their hardware for the next 4-5 years. There is a need to maximize bandwidth and minimize the work of the CPU and GPU as much as possible from the start.

But PCIe 5 based SSDs are due out this year, PCIe 6 SSDs are due out in 12-18 months, and the PCIe 7 spec is due in roughly 24 months. Intel 12th-gen parts are already available and Ryzen 7000 CPUs are due at the end of the month. What's the point of decompression ASICs for PC when an upgrade in a year or two will get you 10-15 GB/s of raw bandwidth alone?
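For reference, here's a rough sketch of what an x4 NVMe link delivers per PCIe generation, ignoring protocol overhead beyond line encoding; the PCIe 6.0 per-lane figure is an approximation on my part.

```python
lanes = 4
per_lane_gb_s = {
    "PCIe 3.0": 0.985,  # 8 GT/s,  128b/130b encoding
    "PCIe 4.0": 1.97,   # 16 GT/s, 128b/130b encoding
    "PCIe 5.0": 3.94,   # 32 GT/s, 128b/130b encoding
    "PCIe 6.0": 7.56,   # 64 GT/s, PAM4 + FLIT (approximate)
}
for gen, bw in per_lane_gb_s.items():
    print(f"{gen}: ~{bw * lanes:.1f} GB/s over x{lanes}")
```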
To be fair, I think your timeline for when customers actually get PCIe 6 and 7 is a bit off-kilter.
 
There is no way these chips cost less than the price of a cup of coffee. What on earth makes you think that? :-? PS5's CXD90062GG SSD controller is bigger than each of the flash chips.
True, but it's nowhere close to the cost of increasing bandwidth by 50%+. For that, the bus would have to be widened significantly or the memory chips would have to run sufficiently faster. In either case it's going to be cheaper to run the SSD controller.
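To put a number on that: GDDR bandwidth is roughly bus width times per-pin data rate, so a +50% target means either a much wider bus or much faster (pricier) memory chips. A quick sketch with assumed PS5-like figures:

```python
def gddr_bandwidth(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits / 8 * gbps_per_pin

current = gddr_bandwidth(256, 14)   # assumed 256-bit bus @ 14 Gbps -> 448 GB/s
target  = current * 1.5             # the +50% figure discussed above

print(f"current: {current:.0f} GB/s, +50% target: {target:.0f} GB/s")
print(f"reachable via a {int(256 * 1.5)}-bit bus at 14 Gbps, "
      f"or 256-bit at {14 * 1.5:.0f} Gbps memory")
```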

Brute forcing it like on PC would have been a step back from PS4 and Xbox One, both of which have zlib hardware decompression and neither of which requires its own compute. I'm struggling to see the consistency in your arguments. More compute and more bandwidth cost more money, and more compute on the APU would certainly impact the overall yields of these APUs. A discrete I/O block was the way to go here.
The argument is quite simply that a brute force method was never an option for consoles. It's in response to your statement:

| both console manufacturers could have chosen to do nothing but include fast SSDs and leave the decompression/check-in model as it has been on consoles and PC for decades. But they chose to spend money to have better performance.

IMO, they couldn't have chosen that. zlib alone is not sufficient for what was needed. Compute offers an alternative, but cost is an issue. The cheapest alternative is still a discrete I/O block; they didn't choose to spend money to have better performance. I think we're agreeing it was the only way forward if this is the functionality required. It's certainly cheaper.

But I was talking about PC and where you would integrate hardware decompression if you wanted to do it like on the console, and there really is only one place, pretty much the same architectural locale as on consoles: between the solid-state storage and the main memory buses.
I don't have a problem with this. But I also don't have an issue with saying that in the PC space, where power and price are not really the major concern (unlike ray tracing, where without dedicated hardware we have no chance), I'm OK with there not being a hardware solution, because decompression isn't that heavy a task. That leaves it up to the market to decide which decompression algorithm to use in software, versus coming to a consensus on industry standards for hardware. Either is okay with me in this space; I suspect it will take a while for the latter to occur, and the former can be done as DirectStorage matures.
 
Brute forcing it like on PC would have been a step back from PS4 and Xbox One, both of which have zlib hardware decompression and neither of which requires its own compute. I'm struggling to see the consistency in your arguments. More compute and more bandwidth cost more money, and more compute on the APU would certainly impact the overall yields of these APUs. A discrete I/O block was the way to go here.

But I was talking about PC and where you would integrate hardware decompression if you wanted to do it like on the console, and there really is only one place, pretty much the same architectural locale as on consoles: between the solid-state storage and the main memory buses.

In console terms where the hardware is fixed and determined by the total cost to manufacture, it would certainly be a step back.

However, for PC where you generally have resources to spare (especially in enthusiast gamer machines) it's a method to use available resources in order to accomplish something similar at no additional cost.

Sure, budget PC gamers generally wouldn't benefit to nearly the same degree as enthusiast PC gamers, but budget PC gamers are also quite used to making some compromises in gaming performance compared to enthusiasts. I.e., you don't hear budget PC gamers complain to remotely the same degree as, say, console gamers if they encounter the occasional drop in performance, or if they need to drop graphical IQ settings below console levels.

Basically, on PC it's often more cost-efficient for gaming to brute force things, because most gamers who actually care about performance generally have an over-abundance of computing resources (GPU, CPU, memory, etc.) to throw at a problem.
  1. So for enthusiast gamers, dedicated silicon would only serve to increase the cost while at the same time limiting what could be done to what the dedicated silicon can do.
  2. For budget gamers, dedicated silicon would still increase the cost but could accomplish things more efficiently than brute forcing things. Of course, then you're likely to be limited by the CPU, GPU and memory in a budget system such that a budget gamer would likely not see the full benefit of the dedicated decompression silicon. Which then begs the question of the usefulness of the dedicated silicon.
Outside of gaming, there would likely be benefits even in high spec machines depending on the non-gaming workload. But for gaming, there isn't much reason to not brute force it, IMO. Yes, it is less efficient, but it is also far more flexible.

Regards,
SB
 
True, but it's nowhere close to the cost of increasing bandwidth by 50%+. For that, the bus would have to be widened significantly or the memory chips would have to run sufficiently faster. In either case it's going to be cheaper to run the SSD controller.

Why does bandwidth need to increase by 50%? Where is this coming from? There was no expectation on the part of consumers that consoles would have some futuristic hardware decompression built into the I/O that wasn't even a thing on PC. I'm not following your reasoning. I never mentioned the bus or about 90% of the other things you keep including in your replies to me. I am utterly stumped; it feels like you're replying to somebody else about something else entirely.

Just to be clear, because I'm not sure why you're posting cost curves over time of DRAM vs solid-state storage: I only take issue with the post where you said it was about money. Again, and I'm keeping this as simple as possible, if the consoles were focused only on the cost aspect they could have put SSDs in them and kept everything else the same as last generation. They would have had a massive boost over 5400 rpm drives.

IMO, they couldn't have chosen that. zlib alone is not sufficient for what was needed. Compute offers an alternative, but cost is an issue. The cheapest alternative is still a discrete I/O block; they didn't choose to spend money to have better performance. I think we're agreeing it was the only way forward if this is the functionality required. It's certainly cheaper.

I don't follow this at all. Sony chose zlib and Oodle/Kraken; Microsoft chose zlib and BCPack. Both PS5 and Xbox Series have hardware decompression for these formats, just like they both had hardware zlib last gen. zlib (for LZ) remains the industry standard for most general data types. Oodle does texture and general data compression too, while BCPack is designed for graphics/textures. Both companies chose a pair of solutions: one industry standard, and one particularly well suited to textures.
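As a point of reference for the software path those hardware blocks replace, here's a tiny CPU-side zlib inflate sketch. Kraken and BCPack are proprietary, so plain zlib stands in as the industry-standard baseline, and the synthetic payload makes the throughput figure optimistic.

```python
import time
import zlib

payload = bytes(range(256)) * (64 * 1024)   # ~16 MiB of synthetic, compressible data
blob = zlib.compress(payload, level=6)

start = time.perf_counter()
restored = zlib.decompress(blob)
elapsed = time.perf_counter() - start

assert restored == payload
print(f"ratio {len(blob) / len(payload):.2f}, "
      f"~{len(payload) / elapsed / 2**30:.2f} GiB/s inflate on one CPU core")
```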

The implementation of hardware decompression is all that has changed. This is what Microsoft and Sony spent money on. They could have had a discrete hardware decompression block on the APU like the previous gen, which would have been the least-expensive middle ground, but both companies pushed the boat out.
I don't have a problem with this. But I also don't have an issue with saying that in the PC space, where power and price are not really the major concern.
This is such a meme-level PCMR thing to say. Price and power absolutely are an issue for a great many people, because they have to pay for their hardware.

I think I'm out here though. I'm happy to disagree with people but I don't even think we're talking about the same things.
 
This is such a meme-level PCMR thing to say.

He's not a master race forum poster, that's for sure.

Price and power absolutely are an issue for a great many people, because they have to pay for their hardware.

Then they shouldn't get consoles either; they're less expensive upfront but more expensive long-term, unless all you do is play a very few games a year. Not counting MS there, since Game Pass offers just as much value on Xbox as on PC.
There's good reason for Sony going with a dedicated hardware block for decompression; it's best for a console where you target a certain BOM and a fixed design for seven years. On PC, however, things evolve all the time; you don't want to be stuck at a certain level of performance, hampering future interfaces and drives. Decompression on the GPU is also good for BC and future developments.
To minimally meet PS5 levels of performance, you'd want a 3060 (Ti), which should still have ample headroom to decompress enough to play at PS5-level settings. Seeing as RT has become standard in current-gen software, a 3060 as an entry point into the current generation should be good enough. The 3060 was a lower-midrange or even low-end GPU from two years ago.

I wouldn't want a hardware ASIC for decompression in the PC space; it hampers performance and flexibility. Full GPU saturation practically never happens anyway.
 
But I was talking about PC and where you would integrate hardware decompression if you wanted to do it like on the console, and there really is only one place, pretty much the same architectural locale as on consoles: between the solid-state storage and the main memory buses.

I don't think it's that straightforward. The GPU may be a more viable location for it in the PC market, primarily because it's a gaming-related feature that CPU makers may not wish to spend the effort and die space on when a large portion of their consumer base would see no benefit from the tech. If we look back at the pros for the ASIC I noted earlier:

  • Allows CPU-targeted data to be decompressed without burdening the CPU or having to copy it back and forth over the GPU's PCIe bus
  • Places no additional burden on the GPU
  • Potentially more guaranteed just-in-time performance, as per @iroboto's post (although we don't yet know how GPU-based decompression works under the hood, so it's quite possible it's implemented in such a way as to give a guaranteed minimum latency).
The bottom two would still hold if it were located on the GPU. For the first one, it's not clear what the remaining load on the CPU would be once GPU data decompression has been removed (i.e., is it even relevant), and on top of that, it may be feasible to send the CPU data to the GPU for decompression and then back to system RAM. It would probably be a net win in terms of PCIe bandwidth, because you would be able to send the much larger GPU data stream to the GPU in compressed form.
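A rough sketch of that PCIe-traffic trade-off, with made-up stream sizes and compression ratio purely for illustration:

```python
gpu_stream_gbps = 4.0   # assumed compressed GPU-bound asset stream off the SSD
cpu_stream_gbps = 0.5   # assumed compressed CPU-bound data stream off the SSD
ratio           = 2.0   # assumed compression ratio for both streams

# CPU decompresses everything; only the (now decompressed) GPU data crosses PCIe.
cpu_side = gpu_stream_gbps * ratio

# GPU decompresses everything; compressed data goes up, decompressed CPU data comes back.
gpu_side = (gpu_stream_gbps + cpu_stream_gbps) + cpu_stream_gbps * ratio

print(f"CPU-side decompression: ~{cpu_side:.1f} GB/s over PCIe")
print(f"GPU-side decompression: ~{gpu_side:.1f} GB/s over PCIe")
```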

That is, if the consoles don't follow suit and use the GPU for decompression, since next-gen consoles should have more capable hardware. Though that might break BC?

That's a really interesting point. Presumably the use of this hardware decompression block has now locked the console vendors into including something similar in at least the next console generation even if those compression formats are outdated by then. Either that or go with a more generalised GPU compute based approach.

The implementation of hardware decompression is all that has changed. This is what Microsoft and Sony spent money on. They could have had a discrete hardware decompression block on the APU like the previous gen, which would have been the least-expensive middle ground, but both companies pushed the boat out.

In what way do you think they've pushed the boat out in terms of the hardware decompression unit vs last gen (other than supporting additional formats at higher performance levels)? It's presumably still on the APU in the PS5 since the Zen2 IO hub is part of the CPU.
 
Just to be clear, because I'm not sure why you're posting cost curves over time of DRAM vs solid-state storage: I only take issue with the post where you said it was about money. Again, and I'm keeping this as simple as possible, if the consoles were focused only on the cost aspect they could have put SSDs in them and kept everything else the same as last generation. They would have had a massive boost over 5400 rpm drives.

In the slide that @iroboto posted, MS were stating that the motivation behind Velocity Architecture (which is more than just the SSD) was to allow for a reduced DRAM footprint. Without the full suite of benefits that VA offers - which includes high on-the-fly decompression throughput without loading the CPU, and the removal of some intermediate buffers in DRAM - MS would not have been in as strong a position to realise their DRAM footprint / DRAM cost reduction goals.

I don't think you can really separate performance and cost. VA both reduces overall cost over time of MS's target capability level (clearly a huge consideration for MS) and increases performance of a hugely important part of the system. They're two sides of the same coin - you can't get the overall best bang for buck without investing in the whole of VA (or Sony's equivalent suite of capabilities).

I've also just remembered Series S as I was about to post. Oh boy! Probably wouldn't want to be using up much GPU on decompression for that little fella, and MS are already working to free up every MB they can for developers (by optionally disabling things iirc).
 
I don't think it's that straightforward. The GPU may be a more viable location for it in the PC market, primarily because it's a gaming-related feature that CPU makers may not wish to spend the effort and die space on when a large portion of their consumer base would see no benefit from the tech. If we look back at the pros for the ASIC I noted earlier:
Exactly, on PC it's not straightforward at all, and that's why I prefaced that statement with "if you wanted to do it like on the consoles...". This absolutely requires an architectural tweak on PC - the insertion of a semi-autonomous controller - assuming you want the simplest I/O implementation and you don't want the CPU or the GPU impacted in any way.

In what way do you think they've pushed the boat out in terms of the hardware decompression unit vs last gen (other than supporting additional formats at higher performance levels)?

Not in the hardware decompression itself, but the solution is brilliant. On both consoles, zlib- and Oodle/BCPack-compressed data is read off the storage and decompressed in real time before it's written to RAM. Compare that with all other implementations, like previous-generation consoles or DirectStorage on PC, where you have to read the data off the storage and write it to memory first, where either the CPU or GPU decompresses it and writes it back to memory. If the CPU is decompressing data intended for the GPU, it then gets sent over PCIe to VRAM. If the GPU is decompressing data intended for the CPU, it gets sent over PCIe to main RAM.
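A quick count of the main-memory traffic per GB of decompressed assets under the two models described above, assuming a 2:1 compression ratio and ignoring caches and any PCIe hops:

```python
ratio = 2.0
compressed_per_gb = 1.0 / ratio   # GB of compressed input per GB of output

# Inline (console) path: the decompressor sits between storage and RAM,
# so only the decompressed output is written to memory.
inline_traffic = 1.0

# Staged path: write the compressed copy to RAM, read it back for the
# CPU/GPU decompressor, then write the decompressed output to RAM.
staged_traffic = compressed_per_gb * 2 + 1.0

print(f"inline decode: {inline_traffic:.1f} GB of RAM traffic per GB of assets")
print(f"staged decode: {staged_traffic:.1f} GB of RAM traffic per GB of assets")
```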

It's presumably still on the APU in the PS5 since the Zen2 IO hub is part of the CPU.

I'm not sure about Xbox Series, but in PS5 the decompression hardware is part of the custom control chip on its own die (p/n CXD90062GG), which you can see in pictures of the motherboard.
 
Exactly, on PC it's not straightforward at all, and that's why I prefaced that statement with "if you wanted to do it like on the consoles...". This absolutely requires an architectural tweak on PC - the insertion of a semi-autonomous controller - assuming you want the simplest I/O implementation and you don't want the CPU or the GPU impacted in any way.

No, it's not needed at the hardware level at all (considering 2020+ hardware), and there's no reason to 'do it like on the consoles', at least hardware-wise. Software does need an architectural tweak, though, and that's what's being worked on right now.
 
Considering https://wccftech.com/fsr-2-0-added-to-deep-rock-galactic-coming-soon-to-saints-row/, I don't see why they wouldn't implement it on consoles as well.
Even though framerate limits seem beyond them, I could see every platform benefiting easily, including the current gen. Maybe XO wouldn't be worthwhile.
It could even give a decent quality mode and performance mode for XSS, though probably still without RT.

I've seen consoles miss out on that kind of update even when they don't have their own solution.
 
I've also just remembered Series S as I was about to post. Oh boy! Probably wouldn't want to be using up much GPU on decompression for that little fella, and MS are already working to free up every MB they can for developers (by optionally disabling things iirc).
Probably would've stuck with LHA, or studios would use Kraken/Oodle on the CPU?
I think they said it runs OK on last gen.
Did I see that they have an AVX implementation?

It's a shame that everything about BCPack is NDA'd, even the documentation.
 
To be fair I think your timeline for when customers actually get PCIe 6 and 7 is a bit off
The timing for 7 relates just to the spec release. The timing for PCIe 6 is not just my made-up opinion but is based on an assertion by PCI-SIG, which gave a 12-18 month figure for PCIe 6 hardware back in January. It's obvious they were off, but they gave that figure with PCIe 5 still waiting for availability.

Apparently there is demand for a quick transition to PCIe 6, as it will fully support data centers' transition to 800G Ethernet. ML/AI needs are also helping to drive that push.

None of it is a given, which is why I said an upgrade in one to two years will give 10-15 GB/s of raw bandwidth. That's in the PCIe 5 SSD range.
 