Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Just wanted to point out something. The thread started out about shader compilation; now it's moved on to decompression.

Yeah, this latest discussion is due to the reveal of Nvidia RTX IO which will use DirectStorage, so the discussion moved past one bottleneck and onto the next.
 
do we even know how direct storage even works? security models ? file systems ? access methods etc?
Seems a bit premature to make such grand superiority statements when Driver/OS/security play can have such a large impact on performance
 
do we even know how direct storage even works? security models ? file systems ? access methods etc?
Seems a bit premature to make such grand superiority statements when Driver/OS/security play can have such a large impact on performance

We have been there pre-NV showcase. I think MS and NV/AMD can work something out. MS and nvidia worked together already for the Windows implementations. I guess their journey will continue from here. Hardware providers also have a stake in this.
 
do we even know how direct storage even works? security models ? file systems ? access methods etc?
Seems a bit premature to make such grand superiority statements when Driver/OS/security play can have such a large impact on performance

While most details of DirectStorage are not public yet, it’s not hard to imagine how some of its features might be done. For example, it’s likely to be read-only, which makes security and file systems much easier to handle. Just lock a file and fix all of its sectors in place for the duration; then DirectStorage only has to know which sectors to read from. Security and file system details can be handled at the time of locking. The mechanisms are already in the OS.
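To make the "lock once, then read by sector" idea concrete, here is a minimal Python sketch. It is not how DirectStorage works (those details are not public); `LockedAsset`, its fixed-size extents, and `demo_locked_asset` are all hypothetical names used only to illustrate paying the access check once and keeping the read path down to plain offset reads.

```python
import os
import tempfile


class LockedAsset:
    """Hypothetical read-only handle: all checks happen once, at lock time."""

    def __init__(self, path, chunk_size=4096):
        # Access check is paid once, up front (stand-in for OS security checks).
        if not os.access(path, os.R_OK):
            raise PermissionError(path)
        self._file = open(path, "rb")
        size = os.fstat(self._file.fileno()).st_size
        # Stand-in for a sector list: fixed (offset, length) extents that are
        # assumed not to move while the asset stays locked.
        self.extents = [(off, min(chunk_size, size - off))
                        for off in range(0, size, chunk_size)]

    def read_extent(self, index):
        # Hot path: no path lookup or permission check, only an offset read.
        offset, length = self.extents[index]
        self._file.seek(offset)
        return self._file.read(length)

    def close(self):
        self._file.close()


def demo_locked_asset():
    # Write a 10240-byte scratch file, lock it, and read its last extent.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 40)
        path = tmp.name
    asset = LockedAsset(path)
    try:
        return len(asset.extents), asset.read_extent(2)
    finally:
        asset.close()
        os.remove(path)
```

A real implementation would resolve actual on-disk sectors through the file system and keep them pinned; the sketch only shows why a read-only contract lets that work be done once rather than per request.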
 
No. Nvidia did not specify the algorithm or the maximum bandwidth, but if they're using LZ-family block compression, CUDA-based libraries are typically reaching a few GByte/s according to academic papers, so 28 GByte/s would be too high even when you consume the entire GPU, and not just 'a fraction of GPU'.

They didn't state maximum bandwidth but they did state a lower bound. 14GB/s output is explicitly stated as something they can exceed and which consumes a very small percentage of GPU resources.

The entire idea of DirectStorage is to free the CPU from loading and decompression tasks by streaming the data directly to video memory as fast as possible, using a dedicated hardware chip (on the Xbox) or compute units (on the PC) - which means their block compression algorithm has to be designed for simplicity and low decompression overhead, not for the best possible compression efficiency or processing bandwidth. Not sure why it is so hard to understand.

Compression efficiency is 2:1, if that's what you're referring to? Interestingly, that matches BCPACK (which is not part of the LZ-family), which I'm not sure is coincidental. In terms of processing bandwidth/requirements, I'd say that would be critical, considering that whatever the decompression costs is subtracted from your rendering capabilities. So efficiency in terms of processing requirements would have to be a high priority.

do we even know how direct storage even works? security models ? file systems ? access methods etc?
Seems a bit premature to make such grand superiority statements when Driver/OS/security play can have such a large impact on performance

Nvidia have been pretty explicit about the performance levels. At this stage there's no more evidence or reason to suggest they're lying than there is to suggest the same of Sony or Microsoft. So until we have some evidence to the contrary I don't see any reason not to take them all at their word.
 
No word on Silicon Motion SM2264 and SM2267 products though ADATA showed a few prototypes at CES-2020

ADATA announced two PCIe 4.0 SSDs based on Silicon Motion controllers:

  • XPG Gammix S50 Lite - SM2267, 3.9/3.2 Gbyte/s sequential read/write, 490/540K IOPS (codename Pearl)
https://www.techpowerup.com/272182/...s50-lite-pcie-gen4-m-2-2280-solid-state-drive

  • XPG Gammix S70 - SM2264, 7.4/6.4 GByte/s sequential read/write, ~1M IOPS, (codename Indigo)
https://www.techpowerup.com/272381/...mmix-s70-pcie-gen4-m-2-2280-solid-state-drive

Both support NVMe 1.4 and come in capacities of 1 TB or 2 TB.


The updated ASUS Hyper M.2 x16 Gen4 card
https://www.asus.com/us/Motherboard-Accessories/HYPER-M-2-X16-GEN-4-CARD/

Using a 4x SSD RAID with a x16 GPU is only possible on HEDT/Server platforms.
https://www.asus.com/us/support/FAQ/1037507
https://blog.donbowman.ca/2017/10/06/pci-e-bifurcation-explained/
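The lane arithmetic behind that statement can be sketched in a few lines of Python. The function names are made up for illustration, and the platform lane counts passed in (24 and 64) are example inputs rather than figures from this thread, so check the actual CPU and board specs.

```python
def lanes_required(gpu_lanes=16, ssd_count=4, lanes_per_ssd=4):
    """Total CPU PCIe lanes needed for a GPU plus a bifurcated x4/x4/x4/x4 SSD card."""
    return gpu_lanes + ssd_count * lanes_per_ssd


def fits(platform_lanes, **config):
    """True if the platform exposes enough CPU lanes for the given configuration."""
    return lanes_required(**config) <= platform_lanes


# A x16 GPU plus four x4 SSDs needs 32 lanes.
print(lanes_required())
# Example inputs only: a 24-lane platform cannot host both, a 64-lane one can.
print(fits(24), fits(64))
```

Dropping the GPU to x8 and using two SSDs (`fits(24, gpu_lanes=8, ssd_count=2)`) fits in the smaller budget, which is the usual compromise on mainstream boards that support bifurcation.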
 
interestingly matching BCPACK (which is not part of the LZ-family)
XBTC/BCPack in the Xbox Series X SDK tools, just like the Oodle RDO and BC7Prep in PlayStation 5, is not a general-purpose lossless data compression algorithm - it's a lossy texture compression format similar to the S3TC/DXTn/BCn texture compression which is strictly limited to specific resource formats, unlike the Oodle Kraken algorithm licensed for PlayStation 5 SSD controller which is a LZMA-family decoder that runs solely on the CPU threads and only goes up to about 1 Gbyte/s.

14GB/s output is explicitly stated as something they can exceed and which consumes a very small percentage of GPU resources.
Compression efficiency is 2:1 if that's what you're referring to
These numbers come from the RTX IO slides above, but 14 Gbyte/s with 2:1 compression ratio using only 32 compute cores (<1% of the total) just doesn't add up for a lossless data compression algorithm working on binary (non-text) data, like textures, normal maps, and geometry/meshes.

If they're referring to lossy texture compression algorithms like the XBTC/BCPack, that's another story because it's likely just a low-complexity conversion to the legacy S3TC/DXTn/BCn texture formats which were designed for 2000-era fixed-function hardware.

There is wild speculation that XBTC/BCPack would achieve 2:1 compression ratio (~50% reduction) over existing assets processed with RDO/BC7Prep (which can achieve a 10-20% reduction from DXTn/BCn formats).


In terms of processing bandwidth/requirements. I'd say that would be critical considering whatever the decompression costs is, is subtracted from your rendering capabilities. So efficiency in terms of processing requirements would have to be a high priority.
High coding efficiency (compression rate) requires high complexity; vice versa, low complexity will result in low compression rate. It's that simple.
 
XBTC / BCPack, just like BC7Prep, is not a general-purpose lossless data compression algorithm - it's a lossy texture compression format similar to the S3TC (BC/BCH),

But BC7Prep is lossless. And I've not seen any confirmation as to whether BCPACK is lossy or not. Microsoft have been very tight-lipped on specifics, unless you've spotted something I've missed?

so that's strictly limited to specific resource formats - unlike the Oodle Kraken algorithm licensed for PlayStation 5, which is a LZMA-family decoder that runs solely on the CPU and only goes up to several megabytes per second.

I think it works on all BCn formats, which make up the vast majority of a modern game's texture set as far as I'm aware. Non-texture data (around 20% of total) is handled by zlib in the XSX decompression unit, I believe.

These numbers come from the botched RTX IO slides, but 14 Gbyte/s with 2:1 compression ratio using only 32 compute cores (<1% of the total) just doesn't add up for a lossless data compression algorithm working on binary (non-text) data, like textures, normal maps, and geometry/meshes.

Nvidia has been explicit about the 2:1 compression ratio, which would result in 14GB/s on the fastest PCIe 4.0 drives (even more on the one you linked above). And Jensen has also stated they can go beyond the limits of PCIe 4.0, so I don't think there's any reason to doubt that at this stage.
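For reference, the arithmetic being argued over is simple enough to write down. This is only a sketch of the relationship between drive read speed, compression ratio and decompressed output; the helper names are hypothetical, and the 7.0 and 7.4 GB/s inputs are the drive figures quoted earlier in the thread.

```python
def effective_output_gbps(drive_read_gbps, compression_ratio):
    """Decompressed output rate if the drive is the only bottleneck."""
    return drive_read_gbps * compression_ratio


def required_ratio(target_output_gbps, drive_read_gbps):
    """Compression ratio needed to hit a target output rate on a given drive."""
    return target_output_gbps / drive_read_gbps


# 7 GB/s drive at 2:1 gives the 14 GB/s figure from the slides.
print(effective_output_gbps(7.0, 2.0))
# The 7.4 GB/s SM2264 drive at the same ratio lands slightly higher.
print(effective_output_gbps(7.4, 2.0))
# Reaching 28 GB/s on a 7 GB/s drive would need 4:1.
print(required_ratio(28.0, 7.0))
```

This says nothing about whether the GPU can decompress at that rate, which is the actual point of disagreement; it only shows where the 14 GB/s number comes from.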

I'm not sure where the 32 compute cores come from?
 
But BC7Prep is lossless
It's a lossless transform for BC7 compressed textures, and BCn algorithms are lossy. Have you read the links for Oodle Texture tools posted above by @BRiT?

https://cbloomrants.blogspot.com/2020/06/oodle-texture-slashes-game-sizes.html
https://cbloomrants.blogspot.com/2020/06/oodle-texture-bc7prep-data-flow.html
http://www.radgametools.com/oodletexture.htm


I've not seen any confirmation as to whether BCPACK is lossy or not
BCPack is a block texture compression used by Xbox Texture Compressor (XBTC) tool - what's the point of making the compression algorithm lossless when it needs to produce textures in lossy DXTn/BCn compression formats for the hardware TMUs to consume?


I think it works on all BCn formats which make up the vast majority of a modern games texture set as far as I'm aware. Non-texture data (around 20% of total) is handled by zlib in the XSX decompression unit I believe.
I'd guess BCn textures would be further processed by LZ compression. For example in Oodle tools, the original RGB texture resources are first encoded by the lossy BCn compression tools, which use RDO (rate-distortion optimisation) metrics to trade off some image quality for improved compression ratios. Then the entire game assets are further compressed by LZ-family lossless data compression (Oodle Kraken/Mermaid/Selkie), and specifically BC7-compressed resources can be additionally rearranged by BC7Prep to improve LZ-family compression ratios.


Nvidia has been explicit about the 2:1 compression ratio which would result in 14GB/s on the fastest PCIe 4.0 drives
Again, without giving any details about the compression algorithm(s) used. So far only the names of BCPack block texture compression algorithm and Xbox Texture Compressor tool were disclosed, but it's not known how DirectStorage handles LZ format decoding on the PC - whereas we know that Sony licensed the entire Oodle Texture Compression (RDO/BC7Prep) and Oodle Data Compression (Kraken) toolset for the PlayStation 5 developer kit, and included a hardware Kraken decompressor chip in the disk I/O data path.

where the 32 compute cores come from?
Just an approximation of <1% GPU compute resources.
 

Yes, so as I said, the BC7Prep part is lossless. Once the transform is reversed on the GPU, the end result is a BC7 texture identical to that which would have existed if no BC7Prep transform was performed at all. The fact that it's working to further compress an already lossy compressed texture isn't relevant as the same logic would apply to Kraken, i.e. Kraken would also further losslessly compress a lossy compressed BC7 texture.
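The "lossless transform, then general-purpose compressor" idea can be illustrated with a toy example. To be clear, this is not BC7Prep or Kraken: the byte regrouping below is a made-up stand-in transform, `zlib` stands in for the LZ stage, and the synthetic 16-byte blocks are far more regular than real BC7 data.

```python
import zlib


def split_fields(blocks: bytes, block_size: int) -> bytes:
    """Regroup byte i of every block together (a lossless reordering)."""
    return b"".join(blocks[i::block_size] for i in range(block_size))


def merge_fields(planes: bytes, block_size: int) -> bytes:
    """Exact inverse of split_fields; input length must be a multiple of block_size."""
    count = len(planes) // block_size
    out = bytearray(len(planes))
    for i in range(block_size):
        out[i::block_size] = planes[i * count:(i + 1) * count]
    return bytes(out)


def make_blocks(count: int, block_size: int = 16) -> bytes:
    # Synthetic stand-in for texture blocks: byte 0 changes per block,
    # the remaining bytes are a fixed pattern.
    return b"".join(
        bytes([(n * 37) & 0xFF] + list(range(1, block_size)))
        for n in range(count)
    )


original = make_blocks(2048)
packed = zlib.compress(split_fields(original, 16))         # transform, then compress
restored = merge_fields(zlib.decompress(packed), 16)       # decompress, then undo transform
print(restored == original, len(zlib.compress(original)), len(packed))
```

Whatever the two compressed sizes turn out to be on a given input, the round trip returns the original blocks bit for bit, which is the sense in which such a transform is lossless even when the blocks themselves came from a lossy encoder.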


Thanks yes, it was the topic of quite a detailed discussion in another thread some time ago.

BCPack is a block texture compression used by Xbox Texture Compressor (XBTC) tool - what's the point of making the compression algorithm lossless when it needs to produce textures in lossy DXTn/BCn compression formats for the hardware TMUs to consume?

Because just like Kraken, BCPACK is further compressing already lossy compressed BCn textures to reduce their size below that which BCn already achieves. It's not used instead of BCn, it's used on top of it. Both BCPACK and Kraken take a lossy compressed texture as a starting point (or any other file in Kraken's case), compress that down further (Kraken losslessly) and then decompress back to the original lossy compressed texture for the GPU to consume. The two key differences that have been confirmed so far are that BCPACK only works on BCn textures whereas Kraken works on all file types, while BCPACK gets a much higher compression rate.

We don't know whether BCPACK is lossless like Kraken or not yet, but since it's doing exactly the same job (albeit focused on a particular file format to achieve a better compression rate than a more generalised algorithm), I'm not seeing a good reason to make the assumption at this stage that it's lossy.

I'd guess BCn textures would be further processed by LZ compression. For example in Oodle tools, the original RGB texture resources are first encoded by the lossy BCn compression tools, which use RDO (rate-distortion optimisation) metrics to trade off some image quality for improved compression ratios. Then the entire game assets are further compressed by LZ-family lossless data compression (Oodle Kraken/Mermaid/Selkie), and specifically BC7-compressed resources can be additionally rearranged by BC7Prep to improve LZ-family compression ratios.

Yes, this is what I've described above. And BCPACK works in exactly the same way, except we don't know whether it's lossless or not, but my money is on that it is. We also don't know that it can work in conjunction with BC7Prep, but I'd guess not.

Again, without giving any details about the compression algorithm(s) used.

The 2:1 compression ratio cited gives a hint that it may also use BCPACK but that's far from a given. It'll certainly be interesting to learn what's going on here when the information is made public.

So far only the names of BCPack block texture compression algorithm and Xbox Texture Compressor tool were disclosed, but it's not known how DirectStorage handles LZ format decoding on the PC - whereas we know that Sony licensed the entire Oodle Texture Compression (RDO/BC7Prep) and Oodle Data Compression (Kraken) toolset for the PlayStation 5 developer kit, and included a hardware Kraken decompressor chip in the disk I/O data path.

Yes this will be interesting to find out. i.e. does RTXIO support general compression (LZ family) routines on the GPU for decompression of all data types or is it purely for texture data only (perhaps using BCPACK)? And if RTXIO does support general compression routines, does it decompress everything streamed from the SSD and pass the relevant data sets back to system memory for the CPU? Or does everything go via the CPU first where the GPU data is separated out and sent on before decompression?
 
Samsung finally released their PCIe gen4 NVMe SSD. 7GB/s read, 5GB/s write (2GB/s write on TLC), 229$+tax for 1TB. Not too bad. I know this is the wrong thread, but by the time I run out of disk space on the PS5 the SSD upgrade will not be too bad, assuming this Samsung drive or something cheaper works.

This drive in 2TB size would be decent enough for my next pc build. I still have dreams of optane as boot/apps drive, but maybe that is not anything but fun with specs.

https://www.anandtech.com/show/16087/the-samsung-980-pro-pcie-4-ssd-review
 
Yes, so as I said, the BC7Prep part is lossless.
My point was that BC7Prep performance at 100+ Gbyte/s is not relevant for assessment of RTX IO, because BC7Prep is not a compression algorithm at all.

The fact that it's working to further compress an already lossy compressed texture isn't relevant as ... Kraken would also further losslessly compress a lossy compressed BC7 texture.
But these three technologies (RDO-aware BCn lossy texture compression, BC7Prep transform, and LZ-family lossless data compression) work in accord with each other - RDO texture compression is aware of both image quality and compression ratio metrics, so it can choose a color encoding that makes the resulting BCn texture more compressible by the LZ family algorithms, while BC7Prep rearranges BC7 data format to additionally increase the compression ratio of the LZ pass.


just like Kraken, BCPACK is further compressing already lossy compressed BCn textures to reduce their size
BCPACK only works on BCn textures whereas Kraken works on all file types, while BCPACK gets a much higher compression rate.
It's not used instead of BCn, it's used on top of it.

They said it's a block texture compression algorithm, as in BCn block compression - not data compression algorithm like LZ/Deflate.

What if BCPack is actually a combination of lossy and lossless stages? The first stage is a more efficient lossy texture compression which works to improve compression efficiency of LZ-based algorithms, and the second stage is a general data compression with a LZ/DEFLATE stream format that requires low decoding complexity, but encoded with high computation complexity to extract additional efficiency. The lossless step would be handled by a decompressor in the I/O path, while the lossy step is handled by shaders and decodes into native DXTn/BCn formats directly in video memory.


It'll certainly be interesting to learn what's going on here when the information is made public.
does RTXIO support general compression (LZ family) routines on the GPU for decompression of all data types or is it purely for texture data only (perhaps using BCPACK)? And if RTXIO does support general compression routines, does it decompress everything streamed from the SSD and pass the relevant data sets back to system memory for the CPU? Or does everything go via the CPU first where the GPU data is separated out and sent on before decompression?
My point exactly. We know how LZ/DEFLATE compression works on the consoles - i.e. using a hardware decompressor in the I/O controller - but the PC/Windows implementation is still a mystery.

yes, it was the topic of quite a detailed discussion in another thread some time ago
Ah. Sorry, missed that discussion entirely.
 
As usual samsung msrp is way higher than retail price. 980 pro 1TB is 150$ at amazon versus the 229$ msrp price

https://www.amazon.com/SAMSUNG-980-...eywords=samsung+980+pro&qid=1600887706&sr=8-2

edit. Actually, it might be 512GB model price even though I selected 1TB model. Someone brave could try to get discount ssd. Newegg shows 229$ for 1TB model.

edit2. No dice, checkout shows it's 512GB model. well, it was cheap until reality hit in :/
 
Samsung finally released their PCIe gen4 NVMe SSD. 7GB/s read, 5GB/s write (2GB/s write on TLC), 229$+tax for 1TB. Not too bad. I know this is the wrong thread, but by the time I run out of disk space on the PS5 the SSD upgrade will not be too bad, assuming this Samsung drive or something cheaper works.

This drive in 2TB size would be decent enough for my next pc build. I still have dreams of optane as boot/apps drive, but maybe that is not anything but fun with specs.

https://www.anandtech.com/show/16087/the-samsung-980-pro-pcie-4-ssd-review

I wouldn't hold your breath on Samsung Pro drives getting significantly cheaper before their replacement comes out. Samsung's Pro drives tend to stay at a high premium throughout their life with little to no decrease in price.

That may possibly change this generation as they've moved to TLC versus MLC on their Pro drives and people aren't happy about it. But I doubt it will change much as their EVO drives were still popular using TLC and those didn't really drop much in price either.

Regards,
SB
 
I wouldn't hold your breath on Samsung Pro drives getting significantly cheaper before their replacement comes out. Samsung's Pro drives tend to stay at a high premium throughout their life with little to no decrease in price.

That may possibly change this generation as they've moved to TLC versus MLC on their Pro drives and people aren't happy about it. But I doubt it will change much as their EVO drives were still popular using TLC and those didn't really drop much in price either.

Regards,
SB

If my google mojo didn't turn out all sour it looks like 970pro 1TB msrp was 449$. Current selling price in newegg is 313$. If 980pro follows similar path it's going to get a bit cheaper but not necessarily cheap.
 
If my google mojo didn't turn out all sour it looks like 970pro 1TB msrp was 449$. Current selling price in newegg is 313$. If 980pro follows similar path it's going to get a bit cheaper but not necessarily cheap.

That's a bit more of a drop than I was thinking, but then the last time I looked was about 6 months ago.

Since Samsung switched to TLC (lower endurance lifespan) for the 980 Pro, then looking at the price of the 970 EVO over time might give you a better idea.

Also, in the US, the 970 PRO launched at 629.99 USD for the 1 TB while the 970 EVO launched at 449.99 USD.

https://www.anandtech.com/show/12674/samsung-announces-970-pro-and-970-evo-nvme-ssds

OK, so it looks like you were already inadvertently looking at the EVO price. :)

That's actually a decent drop from the 970 EVO launch price to the 980 PRO launch price. TLC to TLC makes the cost comparisons easier.

Regards,
SB
 
Just an update. 1 month ago I disabled my virtual memory (have 32gb) so far I have had zero problems
Just an update: I was installing some games and the setup.exe on some of them would just exit with no error. I did some troubleshooting (clearing the temp folder, compatibility modes, run as admin, disabling antivirus, etc.) and nothing worked. Then I remembered I had disabled my swap file, so I set it to system managed and everything worked fine.
Setup.exe (I believe it's Installshield from Macrovision)


Edit: Thinking about it, I was pretty sure I've installed InstallShield games before with no problems, so I tried installing some other games with the swap file disabled and they worked fine. Then I tried the problem game, Fear 3, and it crashed to desktop with a swap file enabled, so I don't think the missing swap file was the culprit.
 