Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    99
    Likes Received:
    120
    I mean, it sounds like DirectStorage is directly going to address this issue. So we'll have to wait and see.
     
    PSman1700 likes this.
  2. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    798
    Likes Received:
    823
    Location:
    55°38′33″ N, 37°28′37″ E
    'Direct access to the NVMe controller' is certainly an interesting point. On Windows, this could be implemented with a new NVMe storage port driver designed around the NVMe command interface and controller hints for optimal I/O block size - instead of the StorPort port driver / StorNVMe miniport driver model, which is based on a generalization of the SCSI command set.

    I still think the DirectStorage API would be a user-mode layer designed to issue large file I/O requests with deeper queues, which should be far more efficient on NVMe storage. This would still be based on the Windows I/O Manager driver stack, the virtual Memory Manager and the file Cache Manager, as well as existing Installable File System drivers and filters.

    This way they can tweak the I/O subsystem to reliably support large block sizes and use new or updated internal structures to reflect NVMe control flow, while also remaining compatible with the StorPort driver model for legacy SATA devices.

    They could also intercept ReadFile/WriteFile requests from legacy applications and rearrange them into similar deep-queue, large-block transactions when the new storage drivers are installed.
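
    Just to illustrate the request pattern I have in mind, here is a minimal user-mode sketch using plain Win32 overlapped, unbuffered reads - this is not the actual DirectStorage interface (which hasn't been published), and the file name, block size and queue depth are arbitrary examples:

        /* Sketch: keep several large, unbuffered reads in flight at once.
           NOT the DirectStorage API - just ordinary Win32 overlapped I/O. */
        #include <windows.h>
        #include <stdio.h>

        #define BLOCK_SIZE  (8 * 1024 * 1024)   /* 8 MB per request */
        #define QUEUE_DEPTH 16                  /* requests kept in flight */

        int main(void)
        {
            HANDLE file = CreateFileA("D:\\assets\\level0.pak", GENERIC_READ,
                                      FILE_SHARE_READ, NULL, OPEN_EXISTING,
                                      FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);
            if (file == INVALID_HANDLE_VALUE) return 1;

            OVERLAPPED ov[QUEUE_DEPTH];
            void *buf[QUEUE_DEPTH];
            LARGE_INTEGER offset = { 0 };
            ZeroMemory(ov, sizeof(ov));

            for (int i = 0; i < QUEUE_DEPTH; i++) {
                /* VirtualAlloc returns page-aligned memory, which satisfies the
                   sector-alignment requirement of unbuffered I/O. */
                buf[i] = VirtualAlloc(NULL, BLOCK_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
                ov[i].hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
                ov[i].Offset = offset.LowPart;
                ov[i].OffsetHigh = (DWORD)offset.HighPart;
                /* Queues the read; it normally returns FALSE with ERROR_IO_PENDING
                   and completes asynchronously. */
                ReadFile(file, buf[i], BLOCK_SIZE, NULL, &ov[i]);
                offset.QuadPart += BLOCK_SIZE;
            }

            for (int i = 0; i < QUEUE_DEPTH; i++) {
                DWORD bytes = 0;
                GetOverlappedResult(file, &ov[i], &bytes, TRUE);  /* wait for completion */
                printf("request %d: %lu bytes\n", i, bytes);
                CloseHandle(ov[i].hEvent);
                VirtualFree(buf[i], 0, MEM_RELEASE);
            }
            CloseHandle(file);
            return 0;
        }

    In a real streaming loop you would re-queue each request as soon as it completes to keep the queue full, which is exactly where deep NVMe queues pay off.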


    It's because applications are not designed to efficiently utilize this enormous bandwidth. Did you really expect to get a different answer for the same question?


    It's not just decompression or other processing overhead, it's also overall program flow and the data set.

    Imagine you have a 1970s-era computer system with a tape archive application that reads 80-character lines from punch cards and writes them to text files on the magnetic tape, and a TTY application that sends text files over a 300 bit/s modem line.

    If you port these applications and OS interfaces to a modern computer with SATA disks and Gigabit Ethernet and run them on the same set of text data from the 1970s - do you really expect to max out network and disk bandwidth?

    Processing would only take a fraction of a second on modern hardware, so your theoretical bandwidth is hundreds of megabytes per second. Unfortunately you only have several hundred kilobytes of text to transfer and then your program stops - so your real-life bandwidth is even less than a megabyte per second.

    That's the difference between maxing out at 3 GBytes/s in synthetic disk benchmarks and averaging 30 MBytes/s in real-world applications.


    It should be possible to plug hardware processing into a filesystem minifilter driver.
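
    As a very rough sketch of what I mean - Filter Manager boilerplate in kernel-mode C, where the hand-off to a decompression engine is only a hypothetical comment and registration details/error handling are omitted:

        /* Skeleton of a filesystem minifilter that could pass completed reads to a
           (hypothetical) hardware decompression engine. Boilerplate only. */
        #include <fltKernel.h>

        static PFLT_FILTER gFilter;

        static FLT_POSTOP_CALLBACK_STATUS
        PostRead(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
                 PVOID Context, FLT_POST_OPERATION_FLAGS Flags)
        {
            UNREFERENCED_PARAMETER(Data);
            UNREFERENCED_PARAMETER(FltObjects);
            UNREFERENCED_PARAMETER(Context);
            UNREFERENCED_PARAMETER(Flags);
            /* Here the read buffer (Data->Iopb->Parameters.Read) could be queued
               to a decompression accelerator before the I/O is completed. */
            return FLT_POSTOP_FINISHED_PROCESSING;
        }

        static const FLT_OPERATION_REGISTRATION Callbacks[] = {
            { IRP_MJ_READ, 0, NULL, PostRead },   /* post-operation hook on reads */
            { IRP_MJ_OPERATION_END }
        };

        static const FLT_REGISTRATION FilterRegistration = {
            sizeof(FLT_REGISTRATION), FLT_REGISTRATION_VERSION, 0,
            NULL,          /* no context registration */
            Callbacks,
            NULL,          /* unload callback omitted in this sketch */
        };

        NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
        {
            UNREFERENCED_PARAMETER(RegistryPath);
            NTSTATUS status = FltRegisterFilter(DriverObject, &FilterRegistration, &gFilter);
            if (NT_SUCCESS(status)) {
                status = FltStartFiltering(gFilter);
                if (!NT_SUCCESS(status)) FltUnregisterFilter(gFilter);
            }
            return status;
        }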
     
    #122 DmitryKo, Jun 9, 2020
    Last edited: Jun 10, 2020
  3. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    798
    Likes Received:
    823
    Location:
    55°38′33″ N, 37°28′37″ E
    Premiere Pro is a good example of a very demanding non-linear editing application that works with multi-gigabyte 4K video files requiring fast disk I/O, and its workloads are well approximated by sequential disk I/O benchmarks with large file and block sizes.

    But even if we assume that the Windows I/O subsystem has six times as much overhead as Linux, and it's not some specific RAID driver issue with the I/O Manager or the SMB redirector - would it really help the majority of real-world applications, which average 30 MBytes/s in random disk access patterns with small file and block sizes, if processing overhead were reduced six- or ten-fold?


    They would probably leave it 'as is' rather than break anything, if breaking things were the only option. Microsoft goes to great lengths to provide binary compatibility even for very old software that's no longer maintained, to the point of replacing standard APIs with custom code to work around application bugs.

    The Windows file I/O subsystem scaled from 32-bit Windows NT 3.1 Advanced Server running on a quad-processor 200 MHz Pentium Pro with a dozen megabytes of memory and RAID volumes of a dozen hard disks and gigabytes of storage space, to 64-bit Windows Server 2016 Datacenter running on 100+ 3 GHz cores with terabytes of NUMA memory and fiber-optic links to clusters of hundreds of disks and petabytes of storage space. All while using essentially the same filesystem and I/O subsystem, and maintaining nearly 100% compatibility with existing applications.

    During that same 25-year span, how many times have the Linux kernel and its I/O subsystems and file systems been tweaked, substantially changed, or totally rewritten with breaking changes to the APIs and drivers?


    How can I? They show me some server hardware and guys staring at fancy monitors, but there are no details on software configuration, drivers, or workloads used, no performance analysis or synthetic benchmarks - just some embedded ads and rambling about. They could just as well have made it a trash talk show episode, with videographers debating Linux and then engaging in staged fights.
    The NVMe protocol only uses bus-mastering DMA and MSI interrupts.
    Hardware time-out is the exact opposite of what you have described here.
    Or it could be just about anything one can imagine.
     
    #123 DmitryKo, Jun 9, 2020
    Last edited: Jun 9, 2020
    iroboto and PSman1700 like this.
  4. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    12,440
    Likes Received:
    7,691
    Location:
    London, UK
    The issue crops up too often to be some specific RAID driver issue. It's occurring on different PC hardware (a server, a generic Windows PC) with different storage solutions, but the one common factor where hardware performance crumbles is Windows.

    As for your second question, the answer is that it very much depends on how data is stored. This discussion spawned from the console forum and the discussion around the PS5, which has a very different storage and I/O paradigm where everything, hardware and software stack, has been redesigned to minimize load times. That can work on a new closed platform like a console, where games can be bundled and distributed to take advantage of this, but making a radical change like this in Windows? I don't think it would be all good.

    And Linux applications are? What about just the disk I/O benchmark, which is the most optimal scenario you'll ever get? You're just reading data and not doing anything with it, unlike a real-world scenario where an application is reading data for a purpose.
     
  5. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    798
    Likes Received:
    823
    Location:
    55°38′33″ N, 37°28′37″ E
    These same applications and data patterns (i.e. console games and art assets) will come over to the PC - the question is whether mid-range PCs will be able to run them with sufficient performance in 2021.

    I still think it could be done with the existing Windows I/O subsystem. It wouldn't be as efficient, just like I said in an earlier thread, but at least Windows users are ready to trade off some performance and efficiency for broad extensibility and compatibility on both software and hardware levels - so in the end, installing more powerful hardware would improve performance of existing software, which is rarely possible for embedded devices.

    There is not a lot of benchmarking software for Linux - in Phoronix Test Suite, which uses some common command-line utilities or scripts, sequential disk read and write tests typically have much lower peak bandwidth numbers for the same model/make. Not sure if these results are comparable to Windows benchmarks in regard to block sizes and queue depths.


    Depends on your definition of Linux application.

    The Linux kernel is open source, so breaking changes are less of a concern since the source code is always available for updates. This seems like a better support model for HPC/server applications tailored for the cloud, because you have qualified engineers, programmers and system administrators to manage your hardware and software stack. There are hundreds of vendors who submit changes to the Linux kernel to support their platforms, and it's their responsibility, and in their best interest, to update the source code for their proprietary applications and drivers and configure them to extract the best possible performance.

    On the other hand, Windows traditionally used binary executable files and proprietary closed source code, and valued binary compatibility above all else - which carries lower support costs and requires less qualified staff, which is best for their once-typical on-premises file server and database server setups.

    For example, backward compatibility is the main reason why Windows sets a hard limit of 4 KByte for virtual memory pages, even though the x86-64 architecture also supports 2 MByte and 1 GByte pages, and ARM64 additionally supports 16 KByte and 64 KByte pages. There is limited support for 2 MByte large pages in Windows - they can only be allocated physically, with no support for paging or kernel-mode use, for a lot of different reasons.
    All the while Linux does support large 2 MByte pages in kernel mode with Transparent Hugepages (THP), and there is even a background service that converts contiguous physical memory regions into huge pages for old applications. It isn't enabled by default either, but at least some applications and drivers can use large pages if they can benefit from them.
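
    For illustration, the documented user-mode path on Windows looks roughly like this - the process must hold SeLockMemoryPrivilege, and the allocation is locked in RAM:

        /* Sketch: allocating one 2 MByte large page on Windows. The calling process
           needs SeLockMemoryPrivilege, and large-page memory cannot be paged out. */
        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            SIZE_T large = GetLargePageMinimum();   /* typically 2 MB on x86-64 */
            if (large == 0) return 1;               /* large pages not supported */

            /* The size must be a multiple of the large-page minimum. */
            void *p = VirtualAlloc(NULL, large,
                                   MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                                   PAGE_READWRITE);
            if (!p) {
                printf("VirtualAlloc failed: %lu (is SeLockMemoryPrivilege enabled?)\n",
                       GetLastError());
                return 1;
            }
            printf("Got a %zu-byte large page at %p\n", large, p);
            VirtualFree(p, 0, MEM_RELEASE);
            return 0;
        }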

    So in theory, free open-source 'desktop' Linux applications could also be written to extract as much performance as possible, just like the Linux Kernel.

    Unfortunately there are almost no 'desktop' applications besides some SteamOS-ported games and a few basic productivity apps, and most of them use multi-platform frameworks with similarly less-than-optimal file access patterns, so virtually none of them can saturate disk I/O.
    It's even worse for embedded/mobile applications, where the actual GUI and Linux device drivers use proprietary closed code that ships as binaries - once the vendor releases the initial firmware, it's almost always 'ship and forget' support mode, and even the most tedious bugs and vulnerabilities are rarely fixed. Embedded systems also use cheaper eMMC flash memory, which is several times slower than NVMe drives.
     
    #125 DmitryKo, Jun 9, 2020
    Last edited: Jun 11, 2020
    PSman1700 likes this.
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    They can drill down to the hardware as far as they want; it's their code. If they want to bypass WDM and make a whole new I/O layer designed for peer-to-peer DMA, they can obviously do so. Sure, they need to reimplement things built on top of it, like, say, filesystems, but for a single one that's not a big deal ... they don't need to support every flavour of legacy filesystem for DirectStorage, they can simply demand a partition be formatted specifically for it, filesystem and all.
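
    For illustration, user mode can already read a dedicated volume raw and skip the filesystem entirely, something like the sketch below (example drive letter, needs admin rights, and it still goes through the normal storage stack rather than the peer-to-peer DMA path I mean):

        /* Sketch: sector-aligned raw reads straight from a dedicated volume,
           bypassing the filesystem. "\\.\E:" is just an example volume. */
        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            HANDLE vol = CreateFileA("\\\\.\\E:", GENERIC_READ,
                                     FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                     OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
            if (vol == INVALID_HANDLE_VALUE) return 1;

            /* Read the first 1 MB of the volume; buffer and length must be
               sector-aligned, which VirtualAlloc guarantees. */
            const DWORD size = 1 << 20;
            void *buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
            DWORD bytes = 0;
            if (ReadFile(vol, buf, size, &bytes, NULL))
                printf("read %lu raw bytes from the volume\n", bytes);

            VirtualFree(buf, 0, MEM_RELEASE);
            CloseHandle(vol);
            return 0;
        }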
     
    #126 MfA, Jun 11, 2020
    Last edited: Jun 11, 2020
    PSman1700 likes this.
  7. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    12,440
    Likes Received:
    7,691
    Location:
    London, UK
    Of course Microsoft can do this, yet there is no evidence in Microsoft's forward-looking technology roadmap that this is imminent. Nor is it something you would want them to rush. Testing this in the Xbox Series X is far less risky, with fewer repercussions. And flipping to a new stack and filesystem optimised for SSDs is an additional stream of support; Microsoft are not about to abandon the hundreds of millions of Windows users who still use a spinning HDD.

    I wouldn't categorise the effort of making a new filesystem as "no big deal". Ask Sun how long it took ZFS to get traction. Or ask Apple about APFS, which was in development for almost a decade before it was ready to put on people's actual devices. Changing the stack, I/O and filesystem all at once? Brave.
     
  8. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    I assume they'll just lock a file, hand the GPU a block list and let the devs use whatever; the filesystem isn't very critical. A complex filesystem is more trouble than it's worth.
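
    That mechanism more or less exists already: NTFS will hand you a file's extent list through FSCTL_GET_RETRIEVAL_POINTERS, so a runtime could open the file exclusively and pass the cluster runs on to whatever does the DMA. Rough user-mode sketch (example path, and what the GPU side does with the extents is left out):

        /* Sketch: open a file exclusively ("lock" it) and query its block list via
           FSCTL_GET_RETRIEVAL_POINTERS. The VCN/LCN runs are what could, in principle,
           be handed to a DMA engine. Example path only. */
        #include <windows.h>
        #include <winioctl.h>
        #include <stdio.h>

        int main(void)
        {
            /* Share mode 0: no one else can open the file while we hold this handle. */
            HANDLE file = CreateFileA("D:\\assets\\level0.pak", GENERIC_READ, 0, NULL,
                                      OPEN_EXISTING, 0, NULL);
            if (file == INVALID_HANDLE_VALUE) return 1;

            STARTING_VCN_INPUT_BUFFER in;
            in.StartingVcn.QuadPart = 0;                 /* start from the first cluster */
            union { RETRIEVAL_POINTERS_BUFFER rp; BYTE raw[4096]; } out;
            DWORD bytes = 0;

            if (DeviceIoControl(file, FSCTL_GET_RETRIEVAL_POINTERS,
                                &in, sizeof(in), &out, sizeof(out), &bytes, NULL)) {
                LONGLONG vcn = out.rp.StartingVcn.QuadPart;
                for (DWORD i = 0; i < out.rp.ExtentCount; i++) {
                    LONGLONG next = out.rp.Extents[i].NextVcn.QuadPart;
                    printf("extent %lu: %lld clusters at LCN %lld\n",
                           i, next - vcn, out.rp.Extents[i].Lcn.QuadPart);
                    vcn = next;
                }
            }
            CloseHandle(file);
            return 0;
        }

    Multiply the LCN and run length by the cluster size to get byte offsets on the volume.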
     
    tinokun, CeeGee and PSman1700 like this.
  9. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    Abandoning them would be withholding important tools from PC developers when providing them is perfectly possible. Obviously they can't really afford to advantage AMD over NVIDIA, so on the pure graphics side they can't allow devs as low-level access as on the Xbox ... but a 3 GB/s NVMe drive isn't expensive, so they should provide PC devs with an efficient peer-to-peer DMA system if they do it for console devs.

    IMO they should start an "Optimized For X" certification system for complete PCs and for GPUs/drives, and get as much feature parity between certified PCs and the Xbox Series X as feasible. I doubt they'll do it, and that's why we need a real desktop alternative to Windows for PCs.
     
  10. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    10,843
    Likes Received:
    2,032
    Yeah, HDDs are basically gone now unless you're at the very low end of pricing or you're picking a machine that's going to be used for data storage or video editing. SSD prices keep dropping.
    I don't think MS wants to set up an 'Optimized for X' program on the PC. It will just lead to confusion in the end, and a lot of companies will try to skirt around the specification as much as possible.
     
    PSman1700 likes this.
  11. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    I didn't say specification, I said certification.

    A certification mark is well protected legally, and they can test the components against their certification database to display in Windows whether a system is certified. Some third-world nothing-brand might pretend to be a proper brand, but that's not much of a problem.
     
    #131 MfA, Jun 13, 2020
    Last edited: Jun 13, 2020
    PSman1700 likes this.
  12. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    2,595
    Likes Received:
    1,660
    This alternative will give you what, precisely?
    It won't give you XSX or PS5 certification either.

    MS is bringing PC and Xbox into alignment in many ways: tools, APIs, dev environment, etc.
    They don't control the PC hardware, and many of the things that give PCs their benefits also have their downsides.

    I've always expected them to bring DirectStorage to Windows, but there are parts they don't control, so the PC won't get the full Velocity Architecture.
    It all seems reasonable to me, and it goes beyond what they have to do on PC.
     
  13. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,029
    Likes Received:
    1,614
    Location:
    Maastricht, The Netherlands
    No matter how, though - whether through just hooking the SSD up to the GPU directly or otherwise - to quote Jeff Goldblum/Malcolm: PC will ... find a way.
     
    PSman1700 likes this.
  14. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,831
    Likes Received:
    3,021
    Terrible idea. It may look good at console launch, but imagine a PC with a sticker on it saying basically "as good as an Xbox" - I wouldn't want it.
     
    PSman1700 likes this.
  15. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    It gives an indication of performance levels, and for a complete system it ensures there's nothing dragging it down - like, for instance, some shit NVMe drive which lacks, say, the random access or write throughput to get close to what console devs will expect in order to use their Xbox streaming code.

    Parents who buy their kid a PC/laptop probably won't do the research to estimate its performance level. Lazy people don't want to be bothered by it. You could do it by year - Optimized for X 20/21/etc. - with 20 being designed to run Xbox One X games without significant compromise, 21 the Xbox Series X, and 22 starting to improve on that. I'd prefer the Chromebook model, with Microsoft creating PCs which Just Werk, but it's better than nothing.
    They've had certification programs for the Designed for Windows logos; they could have a more specific one for gaming. Precisely because they are trying to bring PC gaming into alignment with Xbox, at least for their own store, it would make PC gaming more accessible.
    Peer-to-peer DMA has been done already on PCs - Linux obviously, but still.
     
    PSman1700 likes this.
  16. sir doris

    Regular

    Joined:
    May 9, 2002
    Messages:
    674
    Likes Received:
    131
    I don't think HDDs are gone at all; most people with a "gaming" PC will have a small SSD to boot from (256/512 GB) and a multi-TB HDD to store games on. Sure, people who own 2080 Tis will have all-SSD storage, but the majority, with cards like the 2070 and under, will be using a hard drive to load games. If you can't afford a 2080 but can afford a 2070, you are not going to spend $300 on a 2TB SSD. Plus they will likely have only 1 M.2 slot, and even a 1TB boot M.2 doesn't leave you much space for modern games. If you are running a QLC drive, it slows down as you fill it up too. 2TB SSDs are still expensive, especially the faster non-QLC models, and the trusted brands (Corsair, Crucial, Samsung) are even more so.
     
    Davros likes this.
  17. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,025
    Likes Received:
    5,562
    You think most people with an RTX 2070 aren't even using SATA SSDs?

    I don't personally know a single person that is using a HDD to play PC games. A 256GB SATA3 SSD goes for what, 40€? And people with towers can usually put at least 4 of those in any motherboard.
     
    PSman1700 likes this.
  18. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,614
    Likes Received:
    3,677
    Location:
    Pennsylvania
    I haven't built any gaming PCs with HDDs for my friends in years. The only HDDs I've bought in the last 6+ years were for my NAS.
     
    CeeGee, PSman1700 and BRiT like this.
  19. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,793
    Likes Received:
    1,077
    Location:
    Guess...
    Now you do :) I have a SATA 256GB SSD for Windows + 4TB HDD for games. It's really not a problem at the moment.

    I do have a hungry M.2 slot on my X570 board though, waiting for a 7 GB/s DirectStorage-compatible drive in the 2TB range for when it becomes necessary. Even then, though, the HDD will still be there hosting a good few dozen current-generation games and emulators.
     
    PSman1700 likes this.
  20. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,831
    Likes Received:
    3,021
    Here's a second person you know - 10 TB of mechanical goodness....
     
    London-boy likes this.