Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,692
    Likes Received:
    2,131
    Yes, that went fast with SSD tech: the consoles announced it, then PC came out with something superior shortly after.
     
  2. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    Just wanted to point out something. The thread started out about shader compilation; now it's moved on to decompression.
     
    PSman1700 likes this.
  3. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,938
    Likes Received:
    21,387
    Yeah, this latest discussion is due to the reveal of Nvidia RTX IO, which will use DirectStorage, so the discussion has moved past one bottleneck and on to the next.
     
    PSman1700 likes this.
  4. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,330
    Likes Received:
    444
    Location:
    Australia
    Do we even know how DirectStorage works? Security models? File systems? Access methods, etc.?
    Seems a bit premature to make such grand superiority statements when driver/OS/security play can have such a large impact on performance.
     
  5. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,692
    Likes Received:
    2,131
    We have been there pre-NV showcase. I think MS and NV/AMD can work something out. MS and Nvidia have already worked together on the Windows implementations. I guess their journey will continue from here. Hardware providers also have a stake in this.
     
  6. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,914
    Likes Received:
    419
    Location:
    Taiwan
    While most details of DirectStorage are not public yet, it's not hard to imagine how some of its features might be done. For example, it's likely to be read-only, which makes security and file systems much easier to handle. Just lock a file and fix all of its sectors in place for the duration; then DirectStorage only has to know which sectors to read. Security and file system details can be handled at the time of locking. The mechanisms are already in the OS.
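    The lock-then-read-sectors idea can be sketched like this: once the file is locked, its on-disk layout is captured as a list of extents, and each read request is translated into raw sector ranges with no further file-system work. This is a minimal sketch assuming 4 KiB sectors and a hypothetical `Extent` record; how DirectStorage actually does this is not public.

```python
from dataclasses import dataclass
from typing import List, Tuple

SECTOR_SIZE = 4096  # assumption: 4 KiB logical blocks

@dataclass(frozen=True)
class Extent:
    """One contiguous on-disk run of a file (hypothetical record)."""
    file_offset: int   # byte offset in the file where the run starts
    start_lba: int     # first device sector of the run
    sector_count: int  # length of the run in sectors

def resolve_read(extents: List[Extent], offset: int, length: int) -> List[Tuple[int, int]]:
    """Map a file-relative byte range to (start_lba, sector_count) device reads.

    Assumes the extent list was captured while the file was locked,
    so the layout cannot change between lookup and read.
    """
    reads = []
    end = offset + length
    for ext in extents:
        ext_end = ext.file_offset + ext.sector_count * SECTOR_SIZE
        lo, hi = max(offset, ext.file_offset), min(end, ext_end)
        if lo >= hi:
            continue  # request doesn't touch this extent
        first = (lo - ext.file_offset) // SECTOR_SIZE
        last = (hi - 1 - ext.file_offset) // SECTOR_SIZE
        reads.append((ext.start_lba + first, last - first + 1))
    return reads

# A 32 KiB file stored in two fragments on disk.
layout = [Extent(0, 1000, 4), Extent(16384, 5000, 4)]
print(resolve_read(layout, 12288, 8192))  # [(1003, 1), (5000, 1)]
```

    Security checks would only happen when the extent list is built, which is the point of locking first.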
     
  7. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,605
    Likes Received:
    2,993
    Location:
    Guess...
    They didn't state maximum bandwidth but they did state a lower bound. 14GB/s output is explicitly stated as something they can exceed and which consumes a very small percentage of GPU resources.

    Compression efficiency is 2:1, if that's what you're referring to? Interestingly, that matches BCPACK (which is not part of the LZ family), and I'm not sure that's a coincidence. As for processing bandwidth/requirements, I'd say that would be critical, considering that whatever the decompression costs is subtracted from your rendering capabilities. So efficiency in terms of processing requirements would have to be a high priority.
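    The headline figure is simply the drive's raw read rate multiplied by the compression ratio, capped by whatever the decompressor can sustain. A back-of-envelope sketch, assuming a ~7 GB/s PCIe 4.0 drive; the decompressor rates below are placeholders, since Nvidia has only said it can exceed 14GB/s.

```python
def delivered_gb_s(drive_gb_s: float, ratio: float, decompressor_gb_s: float) -> float:
    """Uncompressed data delivered per second.

    drive_gb_s:        rate the SSD reads compressed data
    ratio:             uncompressed:compressed size (2.0 for 2:1)
    decompressor_gb_s: uncompressed output the decompressor can sustain
    """
    if ratio < 1.0:
        raise ValueError("ratio is uncompressed:compressed and should be >= 1")
    # Whichever side is slower sets the pace.
    return min(drive_gb_s * ratio, decompressor_gb_s)

print(delivered_gb_s(7.0, 2.0, 100.0))  # drive-bound: 14.0
print(delivered_gb_s(7.0, 2.0, 10.0))   # decompressor-bound: 10.0
```

    Whatever GPU time is needed to keep the decompressor above the drive-bound figure is the cost that comes out of the rendering budget.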

    Nvidia have been pretty explicit about the performance levels. At this stage there's no more evidence or reason to suggest they're lying than there is to suggest the same of Sony or Microsoft. So until we have some evidence to the contrary I don't see any reason not to take them all at their word.
     
    PSman1700 likes this.
  8. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    904
    Likes Received:
    1,081
    Location:
    55°38′33″ N, 37°28′37″ E
    ADATA announced two PCIe 4.0 SSDs based on Silicon Motion controllers:

    • XPG Gammix S50 Lite - SM2267, 3.9/3.2 Gbyte/s sequential read/write, 490/540K IOPS (codename Pearl)
    https://www.techpowerup.com/272182/...s50-lite-pcie-gen4-m-2-2280-solid-state-drive

    • XPG Gammix S70 - SM2264, 7.4/6.4 Gbyte/s sequential read/write, ~1M IOPS (codename Indigo)
    https://www.techpowerup.com/272381/...mmix-s70-pcie-gen4-m-2-2280-solid-state-drive

    Both support NVMe 1.4 and come in capacities of 1 TB or 2 TB.


    The updated ASUS Hyper M.2 x16 Gen4 card
    https://www.asus.com/us/Motherboard-Accessories/HYPER-M-2-X16-GEN-4-CARD/

    Using a 4x SSD RAID alongside an x16 GPU is only possible on HEDT/server platforms.
    https://www.asus.com/us/support/FAQ/1037507
    https://blog.donbowman.ca/2017/10/06/pci-e-bifurcation-explained/
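    The HEDT requirement comes down to CPU lane counting: the card needs the slot bifurcated x4/x4/x4/x4, i.e. 16 lanes, on top of the GPU's 16. A rough sketch; the lane totals are assumptions (20 usable lanes is roughly a mainstream desktop with 16 for graphics plus 4 for one M.2 slot, and 48 is an illustrative HEDT figure, not a specific CPU).

```python
from typing import List, Tuple

# (device name, PCIe lanes it needs)
Device = Tuple[str, int]

def spare_lanes(cpu_lanes: int, devices: List[Device]) -> int:
    """Lanes left over after wiring every device straight to the CPU.

    A negative result means the configuration does not fit.
    Chipset lanes are ignored: they share one uplink to the CPU.
    """
    return cpu_lanes - sum(lanes for _, lanes in devices)

# GPU at x16 plus a Hyper M.2 card bifurcated x4/x4/x4/x4
config = [("GPU", 16)] + [(f"NVMe {i}", 4) for i in range(4)]

MAINSTREAM_LANES = 20  # assumption: 16 graphics + 4 for one M.2 slot
HEDT_LANES = 48        # assumption: illustrative HEDT figure

print(spare_lanes(MAINSTREAM_LANES, config))  # -12: doesn't fit
print(spare_lanes(HEDT_LANES, config))        # 16: fits with room to spare
```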
     
    #408 DmitryKo, Sep 21, 2020
    Last edited: Oct 7, 2020
    Jawed, PSman1700 and BRiT like this.
  9. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    904
    Likes Received:
    1,081
    Location:
    55°38′33″ N, 37°28′37″ E
    XBTC/BCPack in the Xbox Series X SDK tools, just like the Oodle RDO and BC7Prep in PlayStation 5, is not a general-purpose lossless data compression algorithm. It's a lossy texture compression format similar to S3TC/DXTn/BCn texture compression, strictly limited to specific resource formats. That's unlike the Oodle Kraken algorithm licensed for the PlayStation 5 SSD controller, which is an LZ-family decoder; when it runs in software on CPU threads, it only goes up to about 1 Gbyte/s.

    These numbers come from the RTX IO slides above, but 14 Gbyte/s with a 2:1 compression ratio using only 32 compute cores (<1% of the total) just doesn't add up for a lossless data compression algorithm working on binary (non-text) data such as textures, normal maps, and geometry/meshes.
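    The doubt is easier to see per unit of hardware. A back-of-envelope sketch, assuming "compute cores" means CUDA cores and taking the RTX 3080's 8704 CUDA cores as the total (the slides don't say which GPU or unit is meant, and the 32-core figure is DmitryKo's own approximation):

```python
def per_unit_rate_mb_s(total_gb_s: float, units: int) -> float:
    """Output rate each unit must sustain, in MB/s (1 GB = 1000 MB)."""
    return total_gb_s * 1000 / units

CLAIMED_OUTPUT_GB_S = 14.0   # from the RTX IO slides (2:1 on a ~7 GB/s drive)
UNITS = 32                   # approximation of "<1% of GPU compute"
TOTAL_CUDA_CORES = 8704      # assumption: RTX 3080 as the reference GPU

rate = per_unit_rate_mb_s(CLAIMED_OUTPUT_GB_S, UNITS)
share = 100 * UNITS / TOTAL_CUDA_CORES
print(f"{rate:.1f} MB/s per core, {share:.2f}% of the GPU")
```

    That is roughly 440 MB/s of lossless decode per CUDA core, compared with the ~1 Gbyte/s quoted above for Kraken on CPU threads, which are far more capable than a single CUDA core.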

    If they're referring to lossy texture compression algorithms like the XBTC/BCPack, that's another story because it's likely just a low-complexity conversion to the legacy S3TC/DXTn/BCn texture formats which were designed for 2000-era fixed-function hardware.

    There is wild speculation that XBTC/BCPack would achieve a 2:1 compression ratio (~50% reduction) over existing assets processed with RDO/BC7Prep (which can achieve a 10-20% reduction from DXTn/BCn formats).


    High coding efficiency (compression ratio) requires high complexity; conversely, low complexity will result in a low compression ratio. It's that simple.
     
    #409 DmitryKo, Sep 21, 2020
    Last edited: Sep 21, 2020
    Jawed and BRiT like this.
  10. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,605
    Likes Received:
    2,993
    Location:
    Guess...
    But BC7Prep is lossless. And I've not seen any confirmation as to whether BCPACK is lossy or not. Microsoft have been very tight-lipped on specifics, unless you've spotted something I've missed on this?

    I think it works on all BCn formats, which make up the vast majority of a modern game's texture set as far as I'm aware. Non-texture data (around 20% of the total) is handled by zlib in the XSX decompression unit, I believe.

    Nvidia has been explicit about the 2:1 compression ratio, which would result in 14GB/s on the fastest PCIe 4.0 drives (even more on the one you linked above). And Jensen has also stated they can go beyond the limits of PCIe 4.0, so I don't think there's any reason to doubt that at this stage.

    I'm not sure where the 32 compute cores come from?
     
    Remij and PSman1700 like this.
  11. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    904
    Likes Received:
    1,081
    Location:
    55°38′33″ N, 37°28′37″ E
    It's a lossless transform for BC7-compressed textures, and BCn algorithms are lossy. Have you read the links for the Oodle Texture tools posted above by @BRiT?

    https://cbloomrants.blogspot.com/2020/06/oodle-texture-slashes-game-sizes.html
    https://cbloomrants.blogspot.com/2020/06/oodle-texture-bc7prep-data-flow.html
    http://www.radgametools.com/oodletexture.htm


    BCPack is a block texture compression used by Xbox Texture Compressor (XBTC) tool - what's the point of making the compression algorithm lossless when it needs to produce textures in lossy DXTn/BCn compression formats for the hardware TMUs to consume?


    I'd guess BCn textures would be further processed by LZ compression. For example, in the Oodle tools the original RGB texture resources are first encoded by the lossy BCn compression tools, which use RDO (rate-distortion optimisation) metrics to trade off some image quality for improved compression ratios. Then the entire set of game assets is further compressed by LZ-family lossless data compression (Oodle Kraken/Mermaid/Selkie), and BC7-compressed resources specifically can additionally be rearranged by BC7Prep to improve LZ-family compression ratios.
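    Why a rearranging pass helps the LZ stage can be shown with a toy: if the low-entropy fields of each 16-byte block are pulled together into one stream, an LZ coder finds far more matches, and the pass is exactly reversible. A minimal sketch with zlib standing in for Kraken and fabricated blocks whose first 4 bytes are assumed to be repetitive; it is not BC7Prep's actual transform, and real BC7 fields don't split neatly on byte boundaries.

```python
import random
import zlib

BLOCK = 16   # bytes per block, as in BC7
HEADER = 4   # hypothetical: leading bytes holding low-entropy mode data

def split_planes(data: bytes) -> bytes:
    """Reorder so all block headers come first, then all payloads."""
    assert len(data) % BLOCK == 0
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return b"".join(b[:HEADER] for b in blocks) + b"".join(b[HEADER:] for b in blocks)

def merge_planes(data: bytes) -> bytes:
    """Exact inverse of split_planes."""
    n = len(data) // BLOCK
    tail = BLOCK - HEADER
    headers, payloads = data[:n * HEADER], data[n * HEADER:]
    return b"".join(headers[i * HEADER:(i + 1) * HEADER] + payloads[i * tail:(i + 1) * tail]
                    for i in range(n))

def make_fake_blocks(count: int, seed: int = 1) -> bytes:
    """Fabricated test data: repetitive headers, noisy payloads."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(count):
        out += bytes([0x40 + rng.randrange(2), 0, 0, 0])
        out += bytes(rng.randrange(256) for _ in range(BLOCK - HEADER))
    return bytes(out)

original = make_fake_blocks(4096)
plain = len(zlib.compress(original, 9))
prepped = len(zlib.compress(split_planes(original), 9))
assert merge_planes(split_planes(original)) == original  # transform is lossless
print(plain, prepped)
```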


    Again, without giving any details about the compression algorithm(s) used. So far only the names of BCPack block texture compression algorithm and Xbox Texture Compressor tool were disclosed, but it's not known how DirectStorage handles LZ format decoding on the PC - whereas we know that Sony licensed the entire Oodle Texture Compression (RDO/BC7Prep) and Oodle Data Compression (Kraken) toolset for the PlayStation 5 developer kit, and included a hardware Kraken decompressor chip in the disk I/O data path.

    Just an approximation of <1% GPU compute resources.
     
    #411 DmitryKo, Sep 21, 2020
    Last edited: Dec 12, 2020
    chris1515 and BRiT like this.
  12. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,605
    Likes Received:
    2,993
    Location:
    Guess...
    Yes, so as I said, the BC7Prep part is lossless. Once the transform is reversed on the GPU, the end result is a BC7 texture identical to that which would have existed if no BC7Prep transform was performed at all. The fact that it's working to further compress an already lossy compressed texture isn't relevant as the same logic would apply to Kraken, i.e. Kraken would also further losslessly compress a lossy compressed BC7 texture.

    Thanks yes, it was the topic of quite a detailed discussion in another thread some time ago.

    Because just like Kraken, BCPACK is further compressing already lossy compressed BCn textures to reduce their size below that which BCn already achieves. It's not used instead of BCn, it's used on top of it. Both BCPACK and Kraken take a lossy compressed texture as a starting point (or any other file in Kraken's case), compress that down further (Kraken losslessly) and then decompress back to the original lossy compressed texture for the GPU to consume. The two key differences that have been confirmed so far are that BCPACK only works on BCn textures whereas Kraken works on all file types, and that BCPACK gets a much higher compression rate.
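    The layering described here can be shown with toy stand-ins: a lossy quantization step playing the role of BCn, and zlib playing the role of a lossless LZ coder such as Kraken. A minimal sketch; neither function is BCn, Kraken or BCPACK, and it says nothing about whether BCPACK's own stage is lossy.

```python
import zlib

def lossy_quantize(samples: bytes, step: int = 16) -> bytes:
    """Toy stand-in for BCn: throws away low bits (lossy, not invertible)."""
    return bytes((s // step) * step for s in samples)

original = bytes(range(256)) * 4
lossy = lossy_quantize(original)      # what the GPU would actually consume
packed = zlib.compress(lossy, 9)      # lossless layer on top (stand-in for Kraken)
restored = zlib.decompress(packed)

assert restored == lossy              # lossless stage returns the lossy data bit-exactly
assert restored != original           # detail lost in the first stage stays lost
print(len(original), len(packed))
```

    Only the first stage decides image quality; the second stage only decides how many bytes travel over the bus.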

    We don't know whether BCPACK is lossless like Kraken or not yet, but since it's doing exactly the same job (albeit focused on a particular file format to achieve a better compression rate than a more generalised algorithm), I'm not seeing a good reason to make the assumption at this stage that it's lossy.

    Yes, this is what I've described above. And BCPACK works in exactly the same way, except we don't know whether it's lossless or not, but my money is on it being lossless. We also don't know whether it can work in conjunction with BC7Prep, but I'd guess not.

    The 2:1 compression ratio cited gives a hint that it may also use BCPACK but that's far from a given. It'll certainly be interesting to learn what's going on here when the information is made public.

    Yes this will be interesting to find out. i.e. does RTXIO support general compression (LZ family) routines on the GPU for decompression of all data types or is it purely for texture data only (perhaps using BCPACK)? And if RTXIO does support general compression routines, does it decompress everything streamed from the SSD and pass the relevant data sets back to system memory for the CPU? Or does everything go via the CPU first where the GPU data is separated out and sent on before decompression?
     
    PSman1700 and BRiT like this.
  13. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,858
    Likes Received:
    2,061
    Location:
    Earth
    Samsung finally released their PCIe Gen4 NVMe SSD: 7 GB/s read, 5 GB/s write (2 GB/s once it's writing directly to TLC), $229 + tax for 1 TB. Not too bad. I know this is the wrong thread, but by the time I run out of disk space on the PS5, the SSD upgrade won't be too bad, assuming this Samsung drive or something cheaper works.

    This drive in a 2TB size would be decent enough for my next PC build. I still dream of Optane as a boot/apps drive, but maybe that's nothing more than fun with specs.

    https://www.anandtech.com/show/16087/the-samsung-980-pro-pcie-4-ssd-review
     
  14. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    904
    Likes Received:
    1,081
    Location:
    55°38′33″ N, 37°28′37″ E
    My point was that BC7Prep performance at 100+ Gbyte/s is not relevant for assessment of RTX IO, because BC7Prep is not a compression algorithm at all.

    But these three technologies (RDO-aware BCn lossy texture compression, BC7Prep transform, and LZ-family lossless data compression) work in accord with each other - RDO texture compression is aware of both image quality and compression ratio metrics, so it can choose a color encoding that makes the resulting BCn texture more compressible by the LZ family algorithms, while BC7Prep rearranges BC7 data format to additionally increase the compression ratio of the LZ pass.


    They said it's a block texture compression algorithm, as in BCn block compression - not data compression algorithm like LZ/Deflate.

    What if BCPack is actually a combination of lossy and lossless stages? The first stage is a more efficient lossy texture compression which works to improve the compression efficiency of LZ-based algorithms, and the second stage is a general data compression with an LZ/DEFLATE stream format that requires low decoding complexity, but is encoded with high computational complexity to extract additional efficiency. The lossless step would be handled by a decompressor in the I/O path, while the lossy step is handled by shaders and decodes into native DXTn/BCn formats directly in video memory.


    My point exactly. We know how LZ/DEFLATE compression works on the consoles, i.e. using a hardware decompressor in the I/O controller, but the PC/Windows implementation is still a mystery.

    Ah. Sorry, missed that discussion entirely.
     
    #414 DmitryKo, Sep 22, 2020
    Last edited: Sep 23, 2020
  15. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    #415 Davros, Sep 23, 2020
    Last edited: Sep 23, 2020
  16. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,858
    Likes Received:
    2,061
    Location:
    Earth
    As usual, Samsung's MSRP is way higher than the retail price. The 980 Pro 1TB is $150 at Amazon versus the $229 MSRP.

    https://www.amazon.com/SAMSUNG-980-...eywords=samsung+980+pro&qid=1600887706&sr=8-2

    Edit: Actually, it might be the 512GB model's price even though I selected the 1TB model. Someone brave could try to get a discounted SSD. Newegg shows $229 for the 1TB model.

    Edit 2: No dice, checkout shows it's the 512GB model. Well, it was cheap until reality set in :/
     
    #416 manux, Sep 23, 2020
    Last edited: Sep 23, 2020
    BRiT likes this.
  17. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,844
    Likes Received:
    7,916
    I wouldn't hold your breath on Samsung Pro drives getting significantly cheaper before their replacement comes out. Samsung's Pro drives tend to stay at a high premium throughout their life with little to no decrease in price.

    That may possibly change this generation as they've moved to TLC versus MLC on their Pro drives and people aren't happy about it. But I doubt it will change much as their EVO drives were still popular using TLC and those didn't really drop much in price either.

    Regards,
    SB
     
  18. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,858
    Likes Received:
    2,061
    Location:
    Earth
    If my Google mojo didn't turn out all sour, it looks like the 970 Pro 1TB MSRP was $449. The current selling price at Newegg is $313. If the 980 Pro follows a similar path, it's going to get a bit cheaper, but not necessarily cheap.
     
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,844
    Likes Received:
    7,916
    That's a bit more of a drop than I was thinking, but then the last time I looked was about 6 months ago.

    Since Samsung switched to TLC (lower endurance lifespan) for the 980 Pro, then looking at the price of the 970 EVO over time might give you a better idea.

    Also, in the US, the 970 PRO launched at 629.99 USD for the 1 TB while the 970 EVO launched at 449.99 USD.

    https://www.anandtech.com/show/12674/samsung-announces-970-pro-and-970-evo-nvme-ssds

    OK, so it looks like you were already inadvertently looking at the EVO price. :)

    That's actually a decent drop from the 970 EVO launch price to the 980 PRO launch price. TLC to TLC makes the cost comparisons easier.

    Regards,
    SB
     
    manux and BRiT like this.
  20. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    Just an update: I was installing some games and the setup.exe on some of them would just exit with no error. I did some troubleshooting (cleared the temp folder, tried compatibility modes, ran as admin, disabled antivirus, etc.) and nothing worked. Then I remembered I had disabled my swap file, so I set it to system managed and everything worked fine.
    Setup.exe (I believe it's InstallShield from Macrovision)

    Edit: Thinking about it, I was pretty sure I've installed InstallShield games before with no problems,
    so I tried installing some other games with the swap file disabled and they worked fine. Then I tried the problem game, F.E.A.R. 3, and it CTD'd with the swap file enabled, so I don't think the missing swap file was the culprit.
     
    #420 Davros, Oct 27, 2020
    Last edited: Oct 28, 2020


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.