Download more.

That's all fine and great. What happens when you run out of VRAM?
In order for DirectStorage's GPU decompression to work, the entire asset management chain has to be collapsed into a single, preprocessed blob. An example implementation of this process is the GLTF format the presenter describes starting where I timestamped the AMD presentation above. This ensures maximum read speed from the storage layer, and also a "known size" for both the compressed and uncompressed assets so that the VRAM can be reserved via the driver (see timestamp 12:15).
This precomputed blob stores all the linked resources for the asset; according to the AMD presentation, those are called subresources (he talks about these around the 14m mark; also see the API documentation on-screen, which specifically calls out the SubResources class).
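As a rough illustration of that blob layout, here's a minimal sketch in Python. It is not the actual DirectStorage/GDeflate format: zlib stands in for the GPU-decompressible codec, and all the names (`pack_blob`, `read_table`, `unpack`) are invented for this example. The point it demonstrates is that the header records both compressed and uncompressed sizes per subresource, so a loader can reserve the full destination allocation before streaming a single payload byte.

```python
import json
import struct
import zlib

# Hypothetical sketch of a "single preprocessed blob": each subresource is
# compressed ahead of time, and the header records both compressed and
# uncompressed sizes so the loader can reserve destination memory (VRAM in
# the real pipeline) up front. zlib is a stand-in for GDeflate.

MAGIC = b"BLOB"

def pack_blob(subresources: dict) -> bytes:
    """Compress each subresource and prepend a JSON size table."""
    table = []
    payload = b""
    for name, data in subresources.items():
        comp = zlib.compress(data)
        table.append({
            "name": name,
            "offset": len(payload),
            "compressed_size": len(comp),
            "uncompressed_size": len(data),
        })
        payload += comp
    header = json.dumps(table).encode()
    # Fixed-size prefix (magic + header length) so a reader can find the table.
    return MAGIC + struct.pack("<I", len(header)) + header + payload

def read_table(blob: bytes) -> list:
    """Read only the size table -- enough to reserve memory before any I/O."""
    assert blob[:4] == MAGIC
    (hlen,) = struct.unpack("<I", blob[4:8])
    return json.loads(blob[8:8 + hlen].decode())

def unpack(blob: bytes, entry: dict) -> bytes:
    """Decompress one subresource described by a table entry."""
    table_end = 8 + struct.unpack("<I", blob[4:8])[0]
    start = table_end + entry["offset"]
    data = zlib.decompress(blob[start:start + entry["compressed_size"]])
    assert len(data) == entry["uncompressed_size"]
    return data
```

A loader using this layout would sum the `uncompressed_size` fields from the table alone to size its VRAM reservation, then stream and decompress the payload afterwards.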
So I'll reiterate my point again: the performance upswing of GPU decompression still comes at a cost of VRAM. Is it a worthy tradeoff? For maximum performance on large assets, where the video card has sufficient VRAM, absolutely. Is it the default case for a whole lot of the video cards that exist today? Not really.
Also, the GPU isn't initiating any disk transfers at all; it's still handled by the video driver and actioned by the main CPU as a load request into system memory, which is then moved into VRAM. The deck is 100% clear on that transaction flow, even in BypassIO mode.
At the same time, all of that was still true when the resources were in system RAM. It was always billed (even in the presentation) as "up to 5x", and that number doesn't change between the GPU decompression method and the CPU decompression method. So at best, you're +256MB of VRAM consumed on AMD gear, and +128MB consumed on NVIDIA gear, versus workloads today, in a world where we're already running out of VRAM.
An increased use of tiled resources, and the use of something like SFS (made more usable thanks to faster decompression), might be able to offset this to some degree: less stays resident in VRAM, and stuff gets dumped faster when it's not needed.
Trouble is, we might not see quick adoption of techniques like that. Cards like the RX 570, 5700 XT, GTX 1060, etc. are all still pretty popular, and they don't support DX12U. The tools to handle increasing VRAM pressure seem to be arriving, but their use might not be high enough up the priority list for a while.
Pascal supports Shader Model 6, so the 1060 should be fine. Those cards are probably too slow in other areas anyway.
Exactly. It's simply a much more efficient use of the architecture that's already there, and a useful improvement which acts as a stop-gap of sorts to buy time until proper architectural improvements can be made and adopted by the market.

In the end, all of this DirectStorage tech is good stuff and nobody should think otherwise. In the same breath, a lot of it is future-looking, so several of the coolest features will work better on later cards with higher capabilities and capacities. There's nothing wrong with this at all; some of these technologies will be useful even beyond gaming, and the BypassIO and IORing methods are absolutely applicable to general-purpose applications.
Addlink has made waves in the tech industry with the announcement of its S95 8TB Gen4x4 SSD. The drive has achieved a remarkable sequential read speed of 28GB/s in a 32TB NVMe RAID array, tested on an AMD Threadripper workstation with four Addlink 8TB SSDs using an MSI M.2 XPANDER-AERO RAID card on an MSI TRX40 Creator motherboard.
The S95 SSD features TLC 3D NAND technology and offers exceptional read speeds of up to 7GB/s, making it at least two times faster than Gen 3 NVMe SSDs and more than 14 times faster than SATA SSDs.
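Those multiples roughly check out against typical real-world ceilings, assuming about 3.5 GB/s for a Gen3 x4 NVMe drive and about 0.5 GB/s for SATA III (baseline figures are my assumptions, not Addlink's):

```python
# Sanity check of the marketing multiples, using assumed typical real-world
# ceilings: ~3.5 GB/s for a Gen3 x4 NVMe drive, ~0.5 GB/s for SATA III.
s95_gbps = 7.0
gen3_gbps = 3.5
sata_gbps = 0.5

print(f"vs Gen3 NVMe: {s95_gbps / gen3_gbps:.0f}x")  # prints "vs Gen3 NVMe: 2x"
print(f"vs SATA:      {s95_gbps / sata_gbps:.0f}x")  # prints "vs SATA:      14x"
```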
...
In addition to its outstanding performance, the S95 SSD offers impressive endurance, with 2800TBW for the 4TB version and 5600TBW for the 8TB version.
Can any application other than high-end video editing saturate any-bandwidth storage?
I'd wager that some dev tools, especially game dev tools and environments, would go close to saturating such setups.
That's the "big" problem. There's only so much data CPUs and GPUs can process before the sheer volume of data just drives up latencies. Also, not every bit of data has to be read into memory; most of it is already there, because normally not that much changes between frames. So only the initial bandwidth burst on a complete location change might create problems, and even those cases were handled quite well in the past.

Yeah, we've been able to put SSDs into striped RAID configurations for a while now; the problem is that applications need to be written specifically for fast storage for it to be a meaningful improvement in most use cases. Obviously there are some that will benefit regardless, like simply copying large volumes of data, but real-world consumer applications (like games) generally need to be written specifically to take advantage of it, and even then you'll likely run into other bottlenecks before you get close to needing more than, say, 7 GB/s or even 4 GB/s drives.
Regards,
SB
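The bottleneck argument above can be put into a toy model: if reading, decompression, and VRAM upload are fully overlapped, total load time is governed by the slowest stage of the pipeline. The stage throughputs here are illustrative assumptions, not measurements of any real hardware:

```python
# Toy pipeline model: with fully overlapped stages, throughput is bounded by
# the slowest stage (read, decompress, or upload). All numbers are assumed,
# illustrative figures -- not measurements.
def load_time_s(size_gb: float, read_gbps: float,
                decompress_gbps: float, upload_gbps: float) -> float:
    """Seconds to load `size_gb` through a fully overlapped pipeline."""
    return size_gb / min(read_gbps, decompress_gbps, upload_gbps)

level_gb = 2.0  # assumed level size
for read_speed in (4.0, 7.0, 28.0):
    t = load_time_s(level_gb, read_speed, decompress_gbps=6.0, upload_gbps=12.0)
    print(f"{read_speed:>4} GB/s drive -> {t:.2f} s")
```

With decompression capped at an assumed 6 GB/s, the 7 GB/s and the 28 GB/s drives load the level in exactly the same time, which is the point: past a certain speed, the drive is no longer the limiting stage.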
If it breaks disk performance counters, then perhaps there's no weird read behavior in Forspoken to begin with?

I finally fixed my NVMe drives to properly support BypassIO. There are a couple of things that I haven't seen mentioned yet but are probably worth knowing about.
First of all, it breaks disk performance counters in Windows for applications that use DirectStorage. To be clear, the counters work perfectly normally for applications that do not use DirectStorage; but applications that do use it effectively become invisible to the OS as far as those disk counters are concerned. In other words, you can have a DS application that completely maxes out your SSD while Task Manager, Resource Monitor, HWinfo, etc. all report your disk as idle. This obviously has implications for measuring DS performance, since we're effectively blind to how the disk itself is performing.
Secondly, BypassIO seems to work transparently with DirectStorage and doesn't need any specific work done. As long as the application uses DS and the drive supports BypassIO then it seems to work automatically.
Also, I was kinda hoping that it would fix that weird read behavior in Forspoken but that doesn't seem to be the case. It was still invisible to the above performance counters but I managed to confirm it via the drive's SMART info.