Why do you say this?
I think you know
Why do you say this?
https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
also MS confirmed that a new update of Windows 10 is going to add those features soon
- Today, Microsoft announced they're bringing a part of this revolutionary architecture to Windows gaming PCs.
- DirectStorage will massively reduce load times for games, allowing PC games to more effectively use high-speed storage solutions.
- If you have a speedy SSD and a game supports DirectStorage, load and wait times will be much shorter.
It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion.
Why NVMe?
NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads. The DirectStorage programming model essentially gives developers direct control over that highly optimized hardware.
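To make the "many requests per queue" point concrete, here is a minimal sketch of how a PC game can already keep an NVMe drive's queues fed with plain Win32 overlapped IO: it issues a whole batch of unbuffered reads up front and only then waits, instead of issuing one blocking read at a time. The file name, chunk size and request count are invented for illustration, and error handling is omitted.

```cpp
// Minimal sketch: keep many reads in flight at once so the drive's NVMe
// queues stay full, instead of issuing one blocking ReadFile at a time.
// Win32 overlapped IO; illustrative only, no error handling.
#include <windows.h>
#include <vector>

int main()
{
    // FILE_FLAG_NO_BUFFERING + FILE_FLAG_OVERLAPPED lets requests reach the
    // drive without the OS cache in the way and without blocking the caller.
    HANDLE file = CreateFileW(L"assets.pak", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING,
                              FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING,
                              nullptr);

    const DWORD chunk    = 1 << 20;   // 1 MiB per request (sector-size aligned)
    const int   inFlight = 64;        // many requests outstanding at once

    std::vector<OVERLAPPED> ov(inFlight);
    std::vector<void*>      buf(inFlight);

    // Issue the whole batch before waiting on anything.
    for (int i = 0; i < inFlight; ++i)
    {
        buf[i] = VirtualAlloc(nullptr, chunk, MEM_COMMIT | MEM_RESERVE,
                              PAGE_READWRITE);            // sector-aligned memory
        ov[i] = OVERLAPPED{};
        ov[i].hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);
        ULONGLONG offset = ULONGLONG(i) * chunk;
        ov[i].Offset     = static_cast<DWORD>(offset);
        ov[i].OffsetHigh = static_cast<DWORD>(offset >> 32);
        ReadFile(file, buf[i], chunk, nullptr, &ov[i]);   // returns immediately
    }

    // One wait per request at the end, rather than stalling between reads.
    for (int i = 0; i < inFlight; ++i)
    {
        DWORD bytes = 0;
        GetOverlappedResult(file, &ov[i], &bytes, TRUE);
        CloseHandle(ov[i].hEvent);
        VirtualFree(buf[i], 0, MEM_RELEASE);
    }
    CloseHandle(file);
    return 0;
}
```

DirectStorage takes the same idea further by stripping the remaining per-request overhead and batching completions, which is what the next paragraphs describe.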
In addition, existing storage APIs also incur a lot of ‘extra steps’ between an application making an IO request and the request being fulfilled by the storage device, resulting in unnecessary request overhead. These extra steps can be things like data transformations needed during certain parts of normal IO operation. However, these steps aren’t required for every IO request on every NVMe drive on every gaming machine. With a supported NVMe drive and properly configured gaming machine, DirectStorage will be able to detect up front that these extra steps are not required and skip the unnecessary checks/operations, making every IO request cheaper to fulfill.
For these reasons, NVMe is the storage technology of choice for DirectStorage and high-performance next generation gaming IO.
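As a rough illustration of what that programming model looks like to a game, here is a hedged C++ sketch based on the DirectStorage for Windows API Microsoft later published (dstorage.h): a queue is filled with a whole batch of read requests targeting a GPU buffer, and the game asks for a single fence signal when the entire batch has landed. The file name, sizes and request count are invented, the struct usage is simplified from the real headers, and error handling is omitted.

```cpp
// Rough sketch of batched DirectStorage reads straight into a GPU buffer.
// Based on the public dstorage.h API; simplified, no error handling.
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void LoadAssets(ID3D12Device* device, ID3D12Resource* gpuBuffer,
                ID3D12Fence* fence, UINT64 fenceValue)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    // One queue can hold many requests at once; this is the "batched,
    // many-at-a-time" part that maps onto the NVMe hardware queues.
    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets.pak", IID_PPV_ARGS(&file));   // hypothetical file

    // Enqueue a whole batch of reads; nothing hits the drive until Submit().
    const UINT32 kChunk = 1024 * 1024;
    for (UINT32 i = 0; i < 256; ++i)
    {
        DSTORAGE_REQUEST r{};
        r.Options.SourceType          = DSTORAGE_REQUEST_SOURCE_FILE;
        r.Options.DestinationType     = DSTORAGE_REQUEST_DESTINATION_BUFFER;
        r.Source.File.Source          = file.Get();
        r.Source.File.Offset          = UINT64(i) * kChunk;
        r.Source.File.Size            = kChunk;
        r.Destination.Buffer.Resource = gpuBuffer;
        r.Destination.Buffer.Offset   = UINT64(i) * kChunk;
        r.Destination.Buffer.Size     = kChunk;
        queue->EnqueueRequest(&r);
    }

    // One notification for the whole batch instead of 256 tiny completions.
    queue->EnqueueSignal(fence, fenceValue);
    queue->Submit();
}
```

The game then waits on (or polls) that single fence before using the data, which is the "finer grain control over completion" the blog describes.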
Well, not everything is about TFLOPs.
It seems like you need an NVMe drive. Not sure if it has to be PCIe 4.0 or higher, or if the older ones will do it.
Other than DirectStorage support, isn't this something AMD's HBCC offered years ago: GPU direct access to the storage medium(s)?
Well, the SSD tech for PCs seems darn impressive, however we put it.
I think you only focused on the second part of my post.
If the Turing support for RTX IO is the same as Ampere's, then there's no dedicated hardware for data decompression on Ampere and both architectures are using just the shader ALUs.
If there is no dedicated hardware for data decompression, then we shouldn't expect performance similar to the new consoles. If GPU compute shaders were great for data decompression, Microsoft or Sony wouldn't have bothered with dedicated units, since compute shaders offer extra flexibility over fixed-function hardware.
The alternative to this is Ampere has dedicated decompression hardware that Turing doesn't have, in which case we should expect very different IO performance between these two architectures (first half of my previous post).
Another alternative to this is Turing having secret-sauce data decompression hardware hidden from us all this time, which I think is very unlikely.
This is not related to you, but I'm not sure why my post made some people so defensive. I thought it was "common sense" that the PC wouldn't have anything similar to the consoles in IO performance for a long time, short of the new GPUs getting fixed-function hardware in the GPU SoC and an M.2 slot on the graphics card.
That article mentions LZW only.
https://on-demand.gputechconf.com/gtc/2016/posters/GTC_2016_Algorithms_AL_11_P6128_WEB.pdf
GPUs are good at decompression, and better than CPUs, but the cost is not negligible.
Nvidia's RTX IO has integration with Microsoft DirectStorage to accelerate loading into the GPU.
Here's the Nvidia RTX IO slide:
For the PS5 it was said their decompression requires 9 of the PS5's Zen 2 cores: 5.5 GB/s
Xbox Series X said it was 5: 2.4 GB/s
What kind of beefy CPUs are in that chart where it only needs 2 cores to handle 7 GB/s?
I do not understand why you write that it not being dedicated hardware somehow makes it "worse". A different design achieving the same goal of accelerating decompression is not inherently worse just because it is different. If I were to take a guess, MS and Sony went the route of hardware decompression blocks because of the combined cost (thermal, monetary, yield, etc.) of using more generalised transistors on the GPU for the same purpose. If MS or Sony did it on the GPU like NV, it would mean fewer resources for graphics, or conversely more die space, more heat and more power usage directly on the SoC. Using a dedicated GPU within a system with swappable parts for such a purpose does not seem inferior, just a smart way to enforce a standard via a swappable part. Basically, it makes a lot of sense on the PC to do it differently and not with a hardware block on the motherboard. Think of the difference between using an ASIC, an FPGA or a generalised CPU for the same task: one may make a lot more sense under a certain budget or manufacturing limitation, or based upon the actual computation necessary and the overhead.
Microsoft said: “Microsoft is delighted to partner with NVIDIA to bring the benefits of next generation I/O to Windows gamers. DirectStorage for Windows will let games leverage NVIDIA’s cutting-edge RTX IO and provide game developers with a highly efficient and standard way to get the best possible performance from the GPU and I/O system. With DirectStorage, game sizes are minimized, load times reduced, and virtual worlds are free to become more expansive and detailed, with smooth & seamless streaming.” - Bryan Langley - Group Program Manager for Windows Graphics and Gaming
It might still be below PS5, given its 22GB/s best case scenario.
I see no reason why it should be deemed inferior for achieving the same goal in a different manner.
It is not compressed data. It is raw data.
To follow up on this: GPUs are very good at decompression. Many opt to do this on the GPU for data science. Streaming the data in was probably the part that needed to be addressed.
https://on-demand.gputechconf.com/gtc/2016/posters/GTC_2016_Algorithms_AL_11_P6128_WEB.pdf
The console manufacturers could easily have included them simply because they were relatively cheap compared to including a slightly bigger more powerful GPU capable of doing the same job in shaders.
EDIT: damn... @Dictator beat me by 60 seconds and said it better too.
Any ideas what those cores are in the Nvidia slide?
2 cores handling 7 GB/s raw
PS5: 9 of its Zen 2 cores for 5.5 GB/s raw
XSX: 5 of its Zen 2 cores for 2.4 GB/s raw
@chris1515 already answered that above. They're talking about different workloads. The 2-core requirement in NV's slide is purely to handle the IO, with no decompression involved. The PS5/XSX numbers include decompression. For comparable numbers, look to the 14 cores NV mentioned being required to handle both the IO and decompression at 7 GB/s.
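Back-of-the-envelope, and taking the figures quoted in this thread at face value, the per-core cost implied by the decompression numbers lands in the same rough ballpark across NV, PS5 and XSX once the IO-only row is separated out. A tiny sketch of the arithmetic:

```cpp
// Back-of-the-envelope per-core cost implied by the figures quoted above.
// All numbers are the thread-quoted values, taken at face value.
#include <cstdio>

int main()
{
    struct { const char* what; double gbps; double cores; } figs[] = {
        { "NV slide, IO only",            7.0,  2.0 },
        { "NV slide, IO + decompression", 7.0, 14.0 },
        { "PS5 (raw)",                    5.5,  9.0 },
        { "XSX (raw)",                    2.4,  5.0 },
    };
    for (auto& f : figs)
        std::printf("%-30s %.1f GB/s / %2.0f cores = %.2f GB/s per core\n",
                    f.what, f.gbps, f.cores, f.gbps / f.cores);
    return 0;
}
// Roughly 3.5, 0.5, 0.6 and 0.5 GB/s per core respectively, i.e. the
// decompression figures line up once the IO-only row is set aside.
```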