Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,780
    Likes Received:
    12,697
    Location:
    London, UK
    I think this is the tricky bit. What is a game-centric filesystem? Sony on PlayStation, and Microsoft on Xbox, heavily steer the game creation process with their devkits and SDKs. Games are developed exactly for the target platform, which includes how assets are stored. You need to put the cart before the horse (games). But then what happens when you have a game installed on a system where the user doesn't want, or can't, use a game-centric filesystem? Does the game now run worse on NTFS?

    I think so too. I think Microsoft will want a filesystem that provides benefits to all Windows applications, not just games. But it would be quicker to deploy and could even serve as an easy opt-in for willing beta testers - without having to hose your whole drive.

    These exist as constructs within the current filesystem. I think you'd want a space that exists wholly outside the existing software stack; otherwise you have the overhead of the original system plus the new one.
     
    function and BRiT like this.
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,104
    Likes Received:
    16,896
    Location:
    Under my bridge
    Maybe make it a requirement. Introduce a next-gen, SSD-only filesystem that can work externally from normal Windows. If that's what it takes to move the PC forwards, it needs to be done, because we can't be tied to legacy hardware forever. We've had fundamental changes in hardware, like CPU sockets and RAM sockets changing over the years. Even Apple will ditch outmoded hardware after a while when it's holding them back. If it takes another major hardware shift to solve the 1970s-based filesystem, go for it. Include a FastIO port/bay/something or other in new systems, and once adapted, move over to it completely. Surely MS doesn't want to be using decades-old tech for the next 30 years? They must be thinking of ways to go forwards, somehow or other.
     
    PSman1700 and BRiT like this.
  3. milk

    milk Like Verified
    Veteran

    Joined:
    Jun 6, 2012
    Messages:
    3,977
    Likes Received:
    4,101
    Switch "game-centric" for "whatever-XBSX-is-doing-like" and there it goes. Would that be a missed opportunity to do something more broad, robust and useful? Yes. But it would be the quickest and easiest thing to implement to allow next-gen-like performance on PC.

    The robust solution can still be developed in parallel aiming to come 5 years from now or even more...

    Arguably, releasing the quick and dirty band-aid Direct Storage version might provide valuable feedback for the development of the robust new file system, if that actually is a thing.

    "System Requirements: An SSD with a Direct to Storage™ partition with 70 GB free."

    done.
     
  4. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,780
    Likes Received:
    12,697
    Location:
    London, UK
    I don't think we know what XBSX is doing. We have better insight into PS5, which suggests more individually managed files - the polar opposite of assets being consolidated into large multi-gigabyte data packs.

    But more individually managed files come with an overhead. We'll still have filesystem minimum block sizes (e.g. a 1 KB file on a 32 KB block takes up 32 KB of disk space) and the need to address and manage all of these files, so we could be looking at an order-of-magnitude increase in filesystem management. I wonder if PS5 still has traditional Linux file permissions?
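    A quick back-of-envelope sketch of that cluster slack (the file size matches the example above; the file count is a made-up illustration):

    ```python
    # Every file occupies a whole number of filesystem clusters, so a small
    # file wastes most of its last cluster ("slack" space).
    def on_disk_size(file_bytes: int, cluster_bytes: int) -> int:
        clusters = -(-file_bytes // cluster_bytes)  # ceiling division
        return clusters * cluster_bytes

    # A 1 KB file on 32 KB clusters occupies a full 32 KB on disk:
    assert on_disk_size(1024, 32 * 1024) == 32 * 1024

    # 100,000 individually managed 1 KB assets (hypothetical count):
    n = 100_000
    print(on_disk_size(1024, 32 * 1024) * n / 2**30)  # ~3.05 GiB on disk
    print(n * 1024 / 2**30)                           # ~0.095 GiB of real data
    ```

    So at 32 KB clusters, tiny individually managed assets can blow up on-disk footprint by ~32x, which is exactly the management overhead being worried about here.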

    It may be that Series X isn't doing anything radical on the filesystem front and much of its performance comes from the built-in decompression, only having to deal with one bus, plus a thinner software stack. These are fairly substantial savings.

    Hopefully we'll find out more about both systems but I'm not optimistic.
     
  5. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,780
    Likes Received:
    12,697
    Location:
    London, UK
    You've just described PCI, IDE, SCSI, SATA, PCIe, M.2, U.2, mSATA and SATA Express! :lol2: They all started out fast enough then they weren't. The reason the standards keep changing is because you don't want the issue of legacy support slowing a new standard. A new interface generally means a new motherboard. There is always your local bus but you're sticking one bus on top of another which is the last thing you want.

    The reason it's not a solved problem is because it's a fiendishly difficult problem to solve.
     
    Unknown Soldier, chris1515 and BRiT like this.
  6. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411
    Who said anything about cache? I'm talking a permanent file, permanently put in place and non-movable on the normal filesystem. Think VHD, where all games are installed inside of it, if you don't want to deal with partitions.

    As for being game-only, that's one way to avoid having to retest every single application that has ever existed in the WinOS ecosystem since the beginning of time. You make it something new and start off being game- and application-centric.

    Edit: To clarify what I said earlier. It removes the native filesystem overhead by requiring the new gamefile.sys to be contiguous and non-movable. You allocate the file on the existing partition starting at Sector N for Length Z. You let the filesystem do whatever it has to do to reserve that space, but all APIs for it deal with reading between Sector N and N+Z. Think of it as a Partition within a Partition.
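    A rough sketch of that "partition within a partition" access model. The sector numbers, region size, and the bounds-checked read helper are all illustrative - there is no real gamefile.sys API, this just shows the Sector N / Length Z idea:

    ```python
    import os

    SECTOR = 512                # logical sector size (assumed)
    BASE_SECTOR = 1_000_000     # "Sector N": start of the reserved region (made up)
    REGION_SECTORS = 4_000_000  # "Length Z": region size in sectors (made up)

    def read_region(fd: int, sector: int, count: int) -> bytes:
        """Read `count` sectors at `sector`, relative to the reserved region.
        Every access is bounds-checked against [N, N + Z), so this API can
        never touch the surrounding filesystem's data."""
        if sector < 0 or sector + count > REGION_SECTORS:
            raise ValueError("access outside reserved region")
        return os.pread(fd, count * SECTOR, (BASE_SECTOR + sector) * SECTOR)
    ```

    The point of the design is in the last line: once the file is contiguous and pinned, a read is a single offset calculation and one raw read, with no filesystem metadata walk in between.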
     
    #46 BRiT, May 22, 2020
    Last edited: May 22, 2020
    tinokun and milk like this.
  7. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    4,024
    Likes Received:
    2,851
    What comes to my mind is requiring a dedicated DirectStorage drive. It would only be accessed via the DirectStorage APIs and therefore compatibility with existing applications becomes a non-issue.
     
    DavidGraham, PSman1700 and BRiT like this.
  8. JoeJ

    Veteran

    Joined:
    Apr 1, 2018
    Messages:
    1,523
    Likes Received:
    1,772
    The cache behavior would be fine with other applications, e.g. to cache some video footage, a huge open world, some volume simulation for offline render, etc.
    It could also be fine for games. Install the game to a big HDD, and the game caches your current level or region of the world to the SSD. Also cache the entire game state so it loads up in a second. And most important: cache your vendor-specific BVH for RT :) It really has to be writable from the game.

    I do not disagree with the general idea, but the issue I see is this: I do not want to spend money on an SSD I can only use for games. You'd have a hard time selling me this. And at the least I'd want a cheap and small model, probably too small to install any number of games.
    In that sense the cache thing would be not that bad, maybe.
     
    Per Lindstrom likes this.
  9. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    Although given most systems only have one M.2 slot, that would restrict NVMe use for most users to gaming only. Everything else would have to operate from good old SATA or HDD. Unless you use a PCIe expansion card, of course.
     
    PSman1700 likes this.
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    This. A new API could also give rise to a new type of NVMe drive equipped with a dedicated compression/decompression chip if necessary.
     
    BRiT and PSman1700 like this.
  11. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    For series x they claim decompression will save 3 cpu cores and directstorage will save 2 cpu cores. They’re claiming a pretty massive reduction in io overhead. Should be one tenth of one core for io and decompression.
     
    iroboto, DavidGraham, Rootax and 2 others like this.
  12. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
  13. BlackAngus

    Newcomer

    Joined:
    Apr 2, 2003
    Messages:
    134
    Likes Received:
    31
    DSoup and everyone,

    I wanted to talk about a few things with PC architecture vs PS5.
    It seems to me there are roughly 3 areas where improvement could come from:
    1. Mechanical (less physical signal path)
    2. Greater SSD controller capability vs standard PC (more transactions per second)
    3. Software layer (less overhead per IO)

    Mechanical:
    On most modern systems, the integrated north bridge on the CPU has 4x PCIe 4.0 lanes which can provide a little less than 8 GB/s directly to the socket/die (half that for PCIe 3.0). These NVMe lanes are dedicated lanes, separate from the 16x PCIe to the dGPU.
    So I don't *think* you have to worry about anything for this scenario that is outside the socket/die/main memory/dGPU system (i.e. the south bridge and shared lanes from SB to CPU).
    This means you can get consistent access to the high-speed NVMe SSD w/o much overhead mechanically. The PS5 likely still needs a bus that connects the NVMe SSD to the socket/die; I doubt the SSD is on-chip with PS5 or we would see greater raw throughput (and cost).

    Even going through main memory isn't a significant slowdown on modern systems from an access latency standpoint. You're still talking ~1000x less latency from main memory than the SSD portion of this transaction: nanoseconds for main memory vs microseconds for SSD access.
    So bandwidth and latency don't really seem to be a big concern from the PC side, unless I am missing something in my thinking through this (which I could be, for sure; things are never as simple as they seem =)

    It looks like this to me at this point: SSD (10us) -> PCIe to CPU (300ns) -> Main Memory (write+read ~30ns) -> PCIe to dGPU (300ns) -> dGPU memory write (15ns) - Total: 10.35us of latency per transaction of a given size.
    The above covers the access-latency portion of Little's law for throughput (throughput = parallelism x transaction size / access latency).
    A console has an on-die GPU, so likely ~250-300ns less latency per transaction, which is a very small amount vs the PC but still helps drive SSD -> GPU throughput a little higher.

    Mechanically, it would appear that consoles are not going to be all that different from a PC from a latency perspective, even with the extra hop on the PC to the dGPU across the much higher bandwidth GPU PCIe lanes. (Latency here will likely be dominated by the SSD retrieval times.)
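    Tallying the stages in that chain (figures taken straight from the estimates above, in nanoseconds) actually comes to ~10.65us rather than 10.35us, which matches the correction posted further down the thread:

    ```python
    # Per-transaction latency chain for the PC path (stage estimates from
    # the post above, in nanoseconds):
    stages = {
        "SSD access":          10_000,  # ~10 us; dominates the whole chain
        "PCIe SSD -> CPU":        300,
        "main memory (w + r)":     30,
        "PCIe CPU -> dGPU":       300,
        "dGPU memory write":       15,
    }
    total_ns = sum(stages.values())
    print(total_ns / 1000)  # 10.645 us for the PC

    # A console's on-die GPU skips the second PCIe hop (~300 ns saved):
    print((total_ns - stages["PCIe CPU -> dGPU"]) / 1000)  # 10.345 us
    ```

    Either way the conclusion above holds: the SSD access itself is ~97% of the total, so the extra dGPU hop barely moves the number.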

    SSD Controller capability:
    PS5 seems to have a potential advantage in a few places:
    1. SSD Controller request handling:
    A more robust SSD controller that is able to access the storage chips in a more optimized fashion (possibly a more highly parallel fashion, from how some of the descriptions read).
    This, in theory, could be applied to the PC as well if the SSD controllers added features. However, at this point we don't really know how much better/different the PS5 controller is than a standard high-end PCIe NVMe SSD controller today. This shouldn't be a problem to produce/enable on the PC architecture, as it's part of the NVMe SSD and its OS-level driver.
    From reading about the PS5 controller I *think* that it improves the ability to have multiple outstanding accesses satisfied in parallel; again, I could be wrong here. This helps improve sustained throughput across various transaction sizes (less queue and more do!).
    This is the throughput part of Little's law.
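    Little's law in that form can be made concrete. A sketch with illustrative numbers: the latency figure reuses the per-transaction estimate from earlier in the post, and the 5.5 GB/s target is PS5's quoted raw throughput:

    ```python
    import math

    def queue_depth_needed(target_bps: float, xfer_bytes: int, latency_s: float) -> int:
        """Little's law: throughput = outstanding requests * transfer size
        / latency, solved for the number of outstanding requests."""
        per_request_bps = xfer_bytes / latency_s
        return math.ceil(target_bps / per_request_bps)

    TARGET = 5.5e9        # PS5's quoted raw throughput, bytes/s
    LATENCY = 10.645e-6   # end-to-end per-transaction latency from earlier

    for kb in (4, 16, 64):
        print(kb, "KB ->", queue_depth_needed(TARGET, kb * 1024, LATENCY))
    # Small transactions need many requests in flight to hit the target;
    # large ones need almost none - hence "less queue and more do".
    ```

    This is why a controller that can keep more accesses outstanding in parallel matters most for small, scattered reads.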

    2. Possible ability to bypass main memory and CPU with DMA-like functionality
    Does the architecture allow for a DMA-type method directly to the GPU from the SSD, avoiding CPU/memory and decreasing latency per transaction?
    I'm not sure this can happen today on a PC (I think it can), but it could easily happen with the next generation of PC hardware.
    From a throughput standpoint I'm not sure this would make much difference, but it really depends on the software IO layer efficiency, so it might make a huge difference or nearly none at all =)
    RDMA on Ethernet is great because it avoids a large amount of additional processing, but I don't think that processing is anywhere near as large an issue with SSD access.

    Software Layer:
    The software IO layer is (as everyone stated) very much in need of a large optimization in Windows, which is hopefully what the DirectStorage enhancements will improve upon greatly, and where PS5/XSX have a potentially great advantage at the moment: they can insert a custom IO layer which is likely thinner from a latency standpoint than Windows is today, and possibly more parallel.

    I'm guessing even with PS5/XSX there is still a driver needed, which is likely more optimized than current storage drivers on Windows, but again I could be wrong; I am somewhat uncertain how this is handled on consoles today and whether those requirements change with PS5/XSX.

    The filesystem could be something new and far more optimized than the current systems we have today, though I'm not sure how much more efficient these can really become; there is basic functionality needed to understand where the data you need is and how to request it.

    The compression is likely pretty interesting from a current perspective, but its purpose seems to be allowing for higher overall throughput (more efficient use of the available storage bus bandwidth), so the PC can pretty easily brute-force this as time goes by with the current 4x PCIe 4.0 lanes - but PCIe 3.0 certainly won't cut the mustard here w/o compression.

    Now questions to the crowd:
    From a mechanical perspective - what have I missed in my summary above, and what additional possible improvements would PS5/XSX have?
    From a software perspective - what have I missed in my summary above, and what additional possible improvements would PS5/XSX have?

    Thanks all for your time and thoughts!
     
  14. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,406
    Location:
    Wrong thread
    I think they could probably do both - implement DirectStorage for both your standard Windows/NTFS partition, and also for a new filesystem (accessed with a modified IO stack) using a dedicated SSD partition.

    The game wouldn't know / care which it was running on, but it would automagically perform better using the more optimal arrangement.
     
    tinokun and PSman1700 like this.
  15. BlackAngus

    Newcomer

    Joined:
    Apr 2, 2003
    Messages:
    134
    Likes Received:
    31
    As it would appear I cannot edit my post above: my math was a bit off on the latency calc for the PC. It should have been 10.65us.
     
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    Great post, I just have a couple of bits to add.

    I think this is only the case for Zen 2 at the moment (not sure about Zen). Intel still accesses the NVMe drive through the chipset. That will change with Rocket Lake though (due this year), which should match Zen 2's 20 spare lanes of PCIe 4.0 direct to CPU.

    Yep Sony confirmed the SSD connects to the APU via PCIe 4.0 4x.

    PS5 uses a 12 channel interface vs 8 for the current and next gen top end controllers on the PC. It conversely uses slower memory though to hit the 5.5GB/s throughput. Additionally they implement 6 priority levels for data requests vs NVMe's 3. No idea how much real world difference that would make!

    Nvidia certainly have a solution to this as per Xpeas post above although it's not yet implemented on commercial gaming GPU's. I'm fairly sure AMD's HBCC is also capable of this but isn't currently implemented in drivers.
     
    PSman1700 likes this.
  17. BlackAngus

    Newcomer

    Joined:
    Apr 2, 2003
    Messages:
    134
    Likes Received:
    31
    Yeah by modern I meant PC hardware that will be available in the PS5/XSX time frame, could have said that more clearly =)
     
    PSman1700 and pjbliverpool like this.
  18. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    I was late to this. But, sort of a reminder, the move to Windows 10 away from 7 was a larger departure than most people believe. A lot of people ragged on MS for locking DX12 onto Windows 10. But when they released 12on7 and people tried to play Gears 5, it wasn't exactly working great.
    So heads up: there could be a lot of changes under the hood, and a lot of legacy I/O stuff can be changed wrt W10. There is a lot of stuff happening with Windows at the base level. Windows 10 at launch is a very different Windows 10 from today's patch. I'm not sure how much legacy knowledge will continue to apply come the newer releases that are supposed to be due out this year.
     
  19. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    Interesting discussion regarding potential DirectStorage API improvements, though some points were already raised in the thread about Sony PlayStation 5 filesystem patent - specifically, the effect of large allocation blocks on SSD performance.

    It's still a standard NVMe SSD, with 2.4 GByte/s throughput - compared to ~7 GByte/s from top-end SSD controllers in 2020 Q3 (Phison PS5018-E18, Silicon Motion SM2264, Samsung 980 Pro).

    Every disk I/O benchmark shows how large blocks maximize SSD throughput. Therefore the I/O subsystem needs to respect the native size of the flash memory write page (8-16 KB) and erase block (1-2 MB). You can do that by either increasing the LBA sector size from the default 512 bytes - with potentially detrimental effects on backward compatibility - or following the NVMe controller's hints for optimal I/O block size in the StorNVMe miniport driver, which requires updates to the filesystem block allocation algorithm.
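    Following those controller hints is mostly alignment arithmetic. A minimal sketch - the 16 KB granularity here is a hypothetical hint value, standing in for the preferred-granularity figures an NVMe controller can report:

    ```python
    def align_request(offset: int, length: int, granularity: int):
        """Widen a byte-range I/O request so it starts and ends on the
        controller's preferred granularity (e.g. the flash write-page size).
        Returns the (aligned_offset, aligned_length) to actually issue."""
        start = (offset // granularity) * granularity
        end = -(-(offset + length) // granularity) * granularity  # round up
        return start, end - start

    # Hypothetical 16 KB write page: an unaligned 3 KB read at offset 5000
    # becomes one full-page request instead of straddling partial pages.
    print(align_request(5_000, 3_000, 16 * 1024))  # (0, 16384)
    ```

    The filesystem-side change is the same idea applied at allocation time: hand out clusters that are multiples of the write page, so requests are born aligned.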

    In the end, a mid-range PC from 2021 would have about the same throughput as next-gen consoles (for an additional cost, though).

    Yes, it's possible to store it all in a large compressed file - and actually you won't even need to run Defrag, since the new 'CompactOS' NTFS compression performs contiguous allocation automatically, so there's no file fragmentation.

    There could be additional compression algorithms, suited for specific types of game data, with offline tools selecting the best possible variation.


    Flash-memory-aware filesystems with write-once logic were researched and implemented three decades ago in the era of PCMCIA cards, and they simply did not live up to expectations. It turned out standard filesystems work better in 100% of the use cases when you implement the translation layer inside the SSD controller, and not at the OS driver level. This way, LBA emulation and automatic background garbage collection can work around write amplification more efficiently by taking into account all the performance details of the onboard flash memory.
     
    #59 DmitryKo, May 29, 2020
    Last edited: May 30, 2020
  20. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    I don't really think it's a new filesystem - this would break compatibility with a lot of existing tools, and would require either some 'smart' automated disk management tools, or the end-users themselves, to proactively manage a separate partition or a separate SSD just for games. Everything they announced could be done on top of existing file I/O and filesystems, maybe with some specific SSD settings and API hooks for hardware-accelerated decompression.

    I think these are rather improvements to disk space allocation strategies, file compression algorithms, and read/write I/O performance with deep queues and optimal block sizes - something like a higher-level transaction layer for NTFS and the LZX 'CompactOS' compression. Combined with enough system memory, these could overcome some of the most obvious obstacles for Windows developers.

    And if they really need a low-overhead filesystem, exFAT is already there and it supports 'flash memory parameters' block (but it cannot be used for system partitions).


    It's only because the basic I/O block (the disk sector size) did not really change since the 1983 IBM PC XT, while disk storage sizes (and game asset file sizes) increased by six orders of magnitude (10^6 = 1,000,000), from 10 MBytes to 10 TBytes. Make the OS allocate and process these basic blocks in their thousands, problem solved.


    They've managed to implement LZX file compression in Windows 10 and Advanced Format 4Kn (4 KByte native sectors) in Windows 8 without throwing anything in the bin, and similarly, 2 MB clusters in exFAT and NTFS simply required an updated release of Windows 10.

    They just need to add support for 64 KB sectors (either native or emulated with deep queue 512 Byte I/O requests) and make this the default I/O granularity and disk allocation unit, then the disk throughput will skyrocket (although for best efficiency, this would probably need an x86_64 CPU with native support for 64 KB virtual memory pages, which is not even announced yet).
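    Emulating a 64 KB sector over today's 512 B LBAs, as suggested, is just fanning one request out across a deep queue. A toy sketch of the arithmetic (the request tuples stand in for NVMe submission-queue entries):

    ```python
    SECTOR = 512          # legacy LBA size
    EMULATED = 64 * 1024  # the proposed 64 KB I/O granularity

    def fan_out(offset: int, length: int):
        """Split a 64 KB-aligned request into 512 B sector requests that
        can all sit on an NVMe submission queue at once, as (offset, size)
        pairs."""
        assert offset % EMULATED == 0 and length % EMULATED == 0
        return [(offset + i * SECTOR, SECTOR) for i in range(length // SECTOR)]

    reqs = fan_out(0, EMULATED)
    print(len(reqs))  # 128 queued sector requests per emulated 64 KB sector
    ```

    Native 64 KB sectors would collapse those 128 entries into one, which is where the per-request OS overhead savings come from.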
     
    #60 DmitryKo, May 29, 2020
    Last edited: Jun 11, 2020