Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    I think you know ;)
     
    pharma likes this.
  2. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/

    really interesting
     
  3. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    Well, the SSD tech for PCs seems darn impressive, however we put it.
     
  4. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    It seems like you need an NVMe drive. Not sure if it has to be PCIe 4.0 or higher, or if the older ones will do.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Other than DirectStorage support, isn't this something AMD's HBCC offered years ago, GPU direct access to storage medium(s)?
     
    BRiT likes this.
  6. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    Well, I don't think anyone would expect anything different :) I think you need a rather modern PC, of course.

    Perhaps, refined and designed for the gamer/consumer market.
     
  7. I think you only focused on the second part of my post.
    If the Turing support for RTX IO is the same as Ampere's, then there's no dedicated hardware for data decompression on Ampere and both architectures are using just the shader ALUs.
    If there is no dedicated hardware for data decompression then we shouldn't expect performance similar to the new consoles. If GPU compute shaders were great for data decompression then Microsoft or Sony wouldn't have bothered with dedicated units, since compute shaders offer more functionality than fixed-function hardware.


    The alternative to this is Ampere has dedicated decompression hardware that Turing doesn't have, in which case we should expect very different IO performance between these two architectures (first half of my previous post).

    Another alternative is that Turing has had secret-sauce data decompression hardware hidden from us all this time, which I think is very unlikely.


    This is not related to you, but I'm not sure why my post made some people so defensive. I thought it was "common sense" that the PC wouldn't have anything similar to the consoles on IO performance for a long time, short of the new GPUs getting fixed-function hardware in the GPU SoC and an M.2 slot on the graphics card.
     
  8. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain

    https://on-demand.gputechconf.com/gtc/2016/posters/GTC_2016_Algorithms_AL_11_P6128_WEB.pdf

    GPUs are good at decompression, and better than CPUs, but the cost is not negligible.
     
  9. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
    For PS5 it was said their decompression would require 9 of the PS5's Zen 2 cores: 5.5 GB/s
    For Xbox Series X it was said to be 5: 2.4 GB/s
    What kind of beefy CPUs are in that chart where it only needs 2 cores to handle 7 GB/s?
     
  10. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    It is not compressed data. It is raw data.
     
  11. Dictator

    Regular

    Joined:
    Feb 11, 2011
    Messages:
    682
    Likes Received:
    3,969
    I do not understand why you write that it not being dedicated hardware somehow makes it "worse". A different design achieving the same goal of accelerating decompression is not inherently worse just because it is different. If I were to take a guess, MS and Sony went the route of hardware decompression blocks because of the combined cost (thermal, monetary, yield, etc.) of using more generalised transistors on the GPU for the same purpose. If MS or Sony did it on the GPU like NV, it would mean fewer resources for graphics, or conversely more die space or more heat and power usage directly on the SoC. Using a dedicated GPU within a system with swappable parts for such a purpose does not seem inferior, just a smart way to enforce a standard via a swappable part. Basically, it makes a lot of sense on the PC to do it differently and not with a hardware block on a motherboard. Think of the difference between using an ASIC, an FPGA or a generalised CPU for the same task: one may make a lot more sense under a certain budget, manufacturing limitation, or based upon the actual computation necessary and the overhead.
    I see no reason why it needs to be deemed inferior for achieving the same goal in a different manner.
     
    egoless, pharma, iroboto and 4 others like this.
  12. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    So it looks like they've brought GPUDirect Storage to the desktop. Awesome.

    I found this quote particularly interesting from Microsoft on Nvidia's web page:

    The emphasis is mine, and it implies that using advanced compression will be a standard feature of DirectStorage games. It would make sense if that compression scheme were the same as the one used in the XSX, i.e. BCPack. A possible further hint towards that is that the compression ratio advertised for the XSX and the one used by Nvidia in their slide are the same at 2:1.

    I'm not sure why. 22GB/s is the theoretical limit of the PS5 decompressor, not what you're going to achieve with normal game code and Kraken. Sony advertised 8-9GB/s from a 5.5GB/s raw throughput for a reason: it's in line with typical Kraken compression ratios.

    Microsoft are advertising more than that for BCPack at 2:1, and that's the same compression ratio used by Nvidia in their example too. So starting with a higher raw throughput (7GB/s vs 5.5GB/s) and adding a higher compression ratio makes the 14GB/s advertised by Nvidia reasonably comparable to the 8-9GB/s advertised by Sony. Note that in their presentation Nvidia also mentioned the GPU decompression could run faster than the limits of a 7GB/s SSD. That's analogous to Sony's mention of the hardware decompression block being capable of 22GB/s peak.
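
    As a rough check of the arithmetic above, here is a minimal sketch that uses only the marketing figures quoted in this thread (they are vendor claims, not measurements). The function names are made up for illustration.

```python
# Figures as quoted in this thread; vendor claims, not benchmarks.

def effective_throughput(raw_gb_s: float, ratio: float) -> float:
    """Decompressed output rate for a given raw read rate and compression ratio."""
    return raw_gb_s * ratio


def implied_ratio(raw_gb_s: float, effective_gb_s: float) -> float:
    """Compression ratio implied by a raw rate and an advertised effective rate."""
    return effective_gb_s / raw_gb_s


# Nvidia slide: 7 GB/s raw at 2:1.
nvidia_effective = effective_throughput(7.0, 2.0)

# XSX: 2.4 GB/s raw at the same advertised 2:1.
xsx_effective = effective_throughput(2.4, 2.0)

# PS5: 5.5 GB/s raw, 8-9 GB/s advertised typical, 22 GB/s decompressor peak.
ps5_typical_low = implied_ratio(5.5, 8.0)
ps5_typical_high = implied_ratio(5.5, 9.0)
ps5_peak = implied_ratio(5.5, 22.0)

print(f"Nvidia effective: {nvidia_effective:.1f} GB/s")
print(f"XSX effective:    {xsx_effective:.1f} GB/s")
print(f"PS5 typical ratio: {ps5_typical_low:.2f}:1 to {ps5_typical_high:.2f}:1")
print(f"PS5 peak ratio:    {ps5_peak:.2f}:1")
```

    The point is only that Sony's 8-9GB/s corresponds to a ratio of roughly 1.5:1, while the 22GB/s peak would need 4:1, so the peak figures and the typical figures should not be compared against each other.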

    Nvidia clearly showed a 14GB/s decompression rate in their slide (2:1 compression ratio on the fastest PCIe 4.0 NVMe drives), and they go on to state that the GPUs are capable of more. It seems pretty open and shut to me, and I'm not seeing the basis for assuming a hardware decompression block is going to be more performant than a GPU packing tens of TFLOPS. The console manufacturers could easily have included them simply because they were relatively cheap compared to including a slightly bigger, more powerful GPU capable of doing the same job in shaders.

    EDIT: damn... @Dictator beat me by 60 seconds and said it better too.
     
  13. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    Thanks Alex, also covers the XSX in some ways.
     
    pharma likes this.
  14. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
    Any ideas what those cores are in the Nvidia slide?
    2 cores handling 7 GB/s raw
    PS5: 9 of its Zen 2 cores for 5.5 GB/s raw
    XSX: 5 of its Zen 2 cores for 2.4 GB/s raw
     
  15. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    DavidGraham, xpea, pharma and 3 others like this.
  16. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    It's probably the more flexible and efficient path too. Edit: by that I mean the NV/GPU solution, of course.
     
    #237 PSman1700, Sep 1, 2020
    Last edited: Sep 1, 2020
    pharma, iroboto and Dictator like this.
  17. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    @chris1515 already answered that above. They're talking about different workloads. The 2-core requirement in NV's slide is purely to handle the IO, with no decompression involved. The PS5/XSX numbers include decompression. For comparable numbers, look to the 14 cores NV mentioned as being required to handle both the IO and decompression at 7GB/s.
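
    To put the core counts quoted in this thread on a common footing, a small sketch that normalises them to "CPU cores per GB/s of raw input". The numbers are the ones posted above; the labels and the helper name are illustrative, and the core types behind Nvidia's figures are not stated in the thread, so this is only a like-for-like sanity check.

```python
# (cores claimed, raw GB/s) exactly as quoted in the thread.
claims = {
    "PS5, IO + decompression": (9, 5.5),
    "XSX, IO + decompression": (5, 2.4),
    "NV slide, IO + decompression": (14, 7.0),
    "NV slide, IO only": (2, 7.0),
}


def cores_per_gb_s(cores: int, raw_gb_s: float) -> float:
    """CPU cores needed per GB/s of raw (compressed) input."""
    return cores / raw_gb_s


normalised = {name: cores_per_gb_s(*figures) for name, figures in claims.items()}

for name, value in normalised.items():
    print(f"{name:32s} {value:.2f} cores per GB/s")
```

    On these figures the three "IO + decompression" claims all land at roughly 1.6 to 2.1 cores per GB/s, while the IO-only figure is about 0.3, which is why the 2-core number cannot be compared with the console numbers.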
     
  18. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    Then also, what kind of CPU cores? Zen 2, Zen 3, Jaguar, Intel i9?
     

  19. GPUs excel at highly parallel tasks (many parallel ALUs with generally lower clocks and lower IPC than CPUs).
    From what we saw with the new consoles, data decompression can't be very parallel. Zlib is apparently single threaded, and Kraken is 2 threads max.
    You could think that simply splitting the data into chunks would work, but if you look at how compression algorithms work, the more you split the lower the compression ratio will be, which in the end hurts the effective IO throughput.
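
    The chunking trade-off can be sketched with Python's standard zlib module. This is only an illustration on synthetic, text-like data (a made-up vocabulary, fixed seed); real game assets and other codecs such as Kraken will behave differently, and the chunk sizes are arbitrary.

```python
import random
import string
import zlib


def chunked_ratio(data: bytes, chunk_size: int) -> float:
    """Compression ratio when each chunk is compressed independently."""
    compressed = 0
    for start in range(0, len(data), chunk_size):
        compressed += len(zlib.compress(data[start:start + chunk_size], 6))
    return len(data) / compressed


# Synthetic input: 50,000 words drawn from a 500-word random vocabulary,
# so redundancy exists but is spread across the whole buffer.
rng = random.Random(0)
vocabulary = [
    "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 9)))
    for _ in range(500)
]
data = " ".join(rng.choice(vocabulary) for _ in range(50_000)).encode()

# One single stream versus progressively smaller independent chunks.
ratios = {
    size: chunked_ratio(data, size)
    for size in (len(data), 65_536, 4_096, 256)
}

for size, ratio in ratios.items():
    print(f"chunk size {size:>7d} bytes: ratio {ratio:.2f}:1")
```

    Each independent chunk starts with an empty match window and carries its own stream overhead, so smaller chunks give more parallelism at the price of a lower ratio.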



    In general, I'd say any method that uses general purpose hardware instead of fixed function is the superior one, as general purpose hardware can be used for other tasks.
    I just don't see how data decompression can use highly parallel general purpose hardware. Not with the most commonly used compression algorithms, at least.
     