Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

I’m pretty sure it won’t do that. There’s nothing to gain if you are compressing individual blocks.

You're not losslessly compressing individual blocks; you're losslessly compressing tiles of blocks. Each tile is composed of multiple blocks of lossy-compressed pixels.

You can selectively load each tile into VRAM, but you won't be able to randomly access pixel blocks within a tile, so you need to undo the lossless layer first and just keep the pixels in the texture block compression format while in VRAM.
 
We'll need to see if either format is capable of this. You would need a different compression algorithm, IMO. AFAIK Kraken isn't designed for incremental streaming or random access. BCPack is unknown: it's designed for textures, but I'm still unsure what that means.
 
Will need to see if either format is capable of this. You would need a different compression algorithm imo. I don't think Kraken is designed for incremental streaming AFAIK or random access. BCPack is unknown since it's designed for textures, but I'm still unsure as to what that means.

I think you are misunderstanding.

Kraken or Zlib is performing lossless compression. It's simply serving as a way to minimize bandwidth consumption between the SSD and VRAM, as well as the capacity used on the SSD. Textures have an additional layer of compression that's lossy and block-based.

The block-based compression also helps with bandwidth from SSD to VRAM and minimizes storage usage on the SSD. But it also serves to minimize VRAM usage, since the data stays compressed while still allowing random access.

This layered or two-stage compression approach has been available for quite some time. It's not really new. In fact there are newer derivative approaches that allow better compression.

http://www.jacobstrom.com/publications/StromWennerstenHPG2011.pdf
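
To make the two stages concrete, here is a minimal sketch, with zlib standing in for the lossless layer (Kraken's actual API is proprietary) and the BC encode step elided; the function names are my own illustration, not anyone's shipping pipeline:

```cpp
// Two-stage texture compression sketch. Stage 1 (offline, lossy): texels are
// encoded to fixed-rate BC blocks by some BC1/BC7 encoder (elided here).
// Stage 2 (lossless): the BC blocks are packed with a general-purpose
// compressor for storage. At load time only stage 2 is undone; the BC data
// stays compressed in VRAM, where the GPU can randomly address its
// fixed-size blocks.
#include <zlib.h>
#include <cassert>
#include <vector>

using Bytes = std::vector<unsigned char>;

Bytes pack_for_disk(const Bytes& bcBlocks)              // stage 2, offline
{
    uLongf destLen = compressBound(bcBlocks.size());
    Bytes out(destLen);
    int rc = compress(out.data(), &destLen, bcBlocks.data(), bcBlocks.size());
    assert(rc == Z_OK);
    out.resize(destLen);    // smaller on the SSD and across the bus
    return out;
}

Bytes unpack_to_vram(const Bytes& disk, size_t bcSize)  // at load time
{
    Bytes bc(bcSize);
    uLongf destLen = bcSize;
    int rc = uncompress(bc.data(), &destLen, disk.data(), disk.size());
    assert(rc == Z_OK && destLen == bcSize);
    return bc;              // uploaded as-is; the GPU samples BC directly
}
```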
 
If other parts of the system could benefit from this new compression system then definitely. That would probably be worth the cost of ~2x the texture traffic over PCIe to the GPU (PCIe 4 and then 5 should have plenty of headroom).
This is the dilemma with the PC: many of the individual components and bus controllers have their own drivers and are connected to other components over a bus. Consoles are different in that there is effectively one I/O path into a unified RAM pool which both the CPU and GPU can access.

If the compression scheme is only really suitable for graphics, then I still think the GPU side of the PCIe bus would be a good place for it: you get the benefits of reduced traffic and a reduced memory footprint for streaming pools in main RAM, and crucially the GPU vendors can implement it at will, without waiting for mobo chipsets or CPUs, with the GPU teams controlling implementation and drivers.
Kraken is optimised for images, Zlib is not. PS5 has hardware decompressors for both. Likewise XSX supports BCPack and Zlib.
 
That's understood. The concern is whether Kraken-type compression on the SSD allows PRT tiles to be read. If not compressed beyond DXTC, we could store and read texture tiles. If we crunch a whole texture down to its smallest size for fast loading, can we then load individual tiles within that archive? One would assume not; you can't dive into a .zip of a text document and pull out the letters at positions 15 and 27 without unzipping the whole thing.
 
Data split into 256k chunks (for seek/random access). So the maximum read overhead for a part of a file is 256k - 1 byte.

MS have not said BCPack is lossless, have they?
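
A quick back-of-envelope on that worst case (the 256k figure is from the statement above; how a reader maps a byte range to chunks is my assumption):

```cpp
// To read an arbitrary byte range you must fetch every 256 KiB chunk the
// range touches, so the worst case wastes chunkSize - 1 bytes of SSD reads.
#include <cstdint>
#include <cstdio>

constexpr uint64_t kChunk = 256 * 1024;

struct ChunkSpan { uint64_t first, count; };

ChunkSpan chunks_for(uint64_t offset, uint64_t size)   // size >= 1 assumed
{
    uint64_t first = offset / kChunk;
    uint64_t last  = (offset + size - 1) / kChunk;
    return { first, last - first + 1 };
}

int main()
{
    // 2 bytes straddling a chunk boundary force two full chunk reads.
    ChunkSpan s = chunks_for(kChunk - 1, 2);
    uint64_t fetched = s.count * kChunk;
    printf("chunks=%llu fetched=%llu KiB wasted=%llu bytes\n",
           (unsigned long long)s.count,
           (unsigned long long)(fetched / 1024),
           (unsigned long long)(fetched - 2));
}
```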
 
Too large for a tile, I think. Effective VT solutions should have smaller tiles to reduce bandwidth and footprint in VRAM.

There are proprietary texture compression schemes that will beat block compression and support VT, but I don't think 256K blocks are the right size. That's 1/4 MB.
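
Some rough numbers behind that, since the BC formats are fixed-rate (8 bytes per 4x4 block for BC1, 16 bytes for BC7; the tile dimensions are just example values):

```cpp
// Footprint of a square VT tile in fixed-rate block compression.
#include <cstdio>

int tile_bytes(int dim, int bytesPerBlock)   // dim x dim texels, 4x4 blocks
{
    int blocks = (dim / 4) * (dim / 4);
    return blocks * bytesPerBlock;
}

int main()
{
    printf("128x128 BC1: %d KiB\n", tile_bytes(128, 8)  / 1024);  //  8 KiB
    printf("128x128 BC7: %d KiB\n", tile_bytes(128, 16) / 1024);  // 16 KiB
    printf("256x256 BC7: %d KiB\n", tile_bytes(256, 16) / 1024);  // 64 KiB
}
```

So a 256K chunk holds 16 separate 128x128 BC7 tiles, or one whole 512x512 BC7 tile - much coarser than the tile sizes usually quoted for VT.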
 
256k chunking is both desirable for seeking/patching and means that all matches in that mode come from a fast local cache, no extra DRAM traffic. Minimizing impact on system mem BW was a design goal.


What does he mean by "fast local cache"? Is this the decompression block's local SRAM?

Yes, the SRAM inside the I/O Complex.

The other comments are very interesting too.


 
How do you manage data access with irregular compression? 256K of data on the SSD will hold varying amounts of texture data - one block may contain 10 tiles and another 15.

If Kraken is transparent at the system level, you will get SSD read (but not RAM write) overhead when the required data is smaller. He also said the "primary mode of use is on data split into 256k chunks", so maybe there are other modes?
 
Theoretically you can request 1 byte, and the SSD controller reads 256K but only DMAs 1 byte to memory.
I do not think 256K is a lot, however.

As for irregular compression: the data format can hold the offsets and the real size of each piece in a table.
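
Something like this, presumably - a sketch of how a container could allow random access despite variable compressed sizes. The layout and field names are my assumption, not a documented Kraken or BCPack format:

```cpp
// A per-texture directory maps each tile to its compressed byte range.
// The table is tiny, read once, and kept in RAM; each tile can then be
// fetched and decompressed independently.
#include <cstdint>
#include <vector>

struct TileEntry {
    uint64_t diskOffset;  // where this tile's compressed data starts
    uint32_t diskSize;    // compressed size (varies per tile)
    uint32_t rawSize;     // decompressed size (fixed for same-format tiles)
};

struct TextureArchive {
    std::vector<TileEntry> table;

    // The exact byte range the SSD read must cover for one tile; the
    // decompressor then inflates just that range into VRAM.
    const TileEntry& locate(uint32_t tileIndex) const {
        return table[tileIndex];
    }
};
```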
 
What's the size of pages in Sony's SSD? I don't think we know. The minimum page size on an SSD is often quite large - the flash pages could be 256kB or even 512kB. The Optane argument really is valid for small reads, as SSDs have overhead on the minimum read size while Optane can be read much like RAM. That's also the reason TRIM was invented: early SSD drives could hitch when the disk started to get full(ish). There would be free space on the disk but no free blocks, so on writes the drive would have to read many blocks that were not completely full, rearrange the data, and write it back. TRIM does this behind the scenes.

Games would have to pack their data into sensibly sized chunks to optimize SSD bandwidth. In the best case the SSD block size and the Kraken compression block size are the same. I assume this is no problem, because most data is large, and the very small data can be cached in RAM anyway if it turns out to be an issue (it's small: pack the small pieces together into larger blocks and cache them in RAM as needed).
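
A rough sketch of that packing step, assuming 256K chunks and a naive greedy strategy (purely illustrative; a real pipeline would also group assets by access pattern):

```cpp
// Greedily assign assets to 256 KiB chunks so small files share a chunk and
// reads stay chunk-aligned. An asset larger than one chunk simply spans
// several consecutive chunks.
#include <cstdint>
#include <vector>

constexpr uint64_t kChunk = 256 * 1024;

struct Asset { uint64_t size; };

std::vector<uint32_t> pack(const std::vector<Asset>& assets)
{
    std::vector<uint32_t> chunkOf(assets.size());
    uint32_t chunk = 0;
    uint64_t used  = 0;                       // bytes used in current chunk
    for (size_t i = 0; i < assets.size(); ++i) {
        if (used > 0 && used + assets[i].size > kChunk) {
            ++chunk;                          // doesn't fit: start a new chunk
            used = 0;
        }
        chunkOf[i] = chunk;                   // first chunk this asset touches
        used += assets[i].size;
        chunk += (uint32_t)(used / kChunk);   // big assets roll into new chunks
        used  %= kChunk;
    }
    return chunkOf;
}
```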
 
The offset table idea reminds me of Sony talking about address tables in their I/O patents.
 
I'm not sure. Every VT system is different, capable of different things, and has different restrictions. I really don't know what is and isn't possible; VT systems can get fairly complex, as I understand it, in trying to solve and tackle various edge and limit cases.

It would be ideal for the decompression hardware to be usable with VT, and it would seem a small miss if it weren't. But, as with many things, companies can do compression and VT purely in compute shaders, so I'm not sure if they are using Kraken or something else.

But in non-VT scenarios this compression hardware is still very effective - which is basically everything else.

Some of the actual rendering programmers here can probably give a proper response, but I suspect the rabbit hole goes deep.
 
A byte addressable SSD? The only serious suggested implementation that I know of is one proposed by Samsung R&D in an IEEE paper in 2018.
Byte-addressable storage is already a thing in server space, but there aren't - or weren't (and I think that's still the case) - any commercial byte-addressable filesystems, so it tends to get utilised just like slow RAM, with the inherent advantages/disadvantages you would expect. Its use is pretty niche; generally, if you need byte-addressable storage you're working in a field where funding is sufficient to shove terabytes of RAM into your servers. In my previous job, some of our servers had 2 petabytes of RAM. And it wasn't enough! :no:
 