State of GDeflate

It wouldn't be something Joe Bloggs would see a benefit from, but someone working on music/video production might.
They already can. Audio production doesn't tax bandwidth at all - audio files just aren't bandwidth-heavy enough. Video can, but it's a niche occupation, particularly for people needing multiple GBs a second, where the video streams are ordinarily compressed down to much lower bitrates. For professional lossless video editing, you'd be better off with a video acceleration expansion card. Remember those things? Lots of different cards you could put into PCI slots to add specific functionality?

As a functional block in an everyday PC, all the IO unit would achieve is lower power utilisation on those workloads. It seems more like something you'd want on a gaming motherboard or gaming-focussed GPU.
 
But if they've never had the ability to decompress GBs of data per second, then the software isn't going to use it.

If you gave every CPU the ability to decompress 10GB/s without taxing the cores, I imagine some data-heavy programs would look to take advantage of it.
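To put rough numbers on that gap, here's a minimal sketch using single-threaded zlib (DEFLATE) as a stand-in for CPU-side lossless decompression; the payload contents and size are arbitrary illustration, not a real benchmark corpus:

```python
import time
import zlib

# Measure single-core DEFLATE decompression throughput, to compare
# against a hypothetical 10 GB/s fixed-function decompression unit.
payload = b"example game asset data " * 500_000   # ~12 MB of compressible data
compressed = zlib.compress(payload, level=6)

start = time.perf_counter()
restored = zlib.decompress(compressed)
elapsed = time.perf_counter() - start

assert restored == payload
print(f"decompressed {len(payload) / 1e6:.0f} MB in {elapsed * 1e3:.1f} ms "
      f"({len(payload) / elapsed / 1e9:.2f} GB/s on one core)")
```

On typical desktop cores this lands well under 10 GB/s per core, which is the gap a dedicated unit would close.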

It wouldn't be something Joe Bloggs would see a benefit from, but someone working on music/video production might.

People in music/video production will want faster lossy compression, not faster lossless compression/decompression. Furthermore, if you put a lossless decompression unit into a CPU, it will likely eat die area that could be used for other units. Considering that compression/decompression doesn't generally require any special operations, I think such a unit really isn't that useful. To put it simply, a decompression unit capable of 10GB/s is not going to be very small. You could put in another integer unit or something like that instead, and that'd probably be more useful in most situations.

Note that it's a different story on a GPU, because GPUs are not well suited to handling decompression work. That's why a decompression unit there is potentially useful.
 
NVME drives connect directly to the PCIe lanes on a CPU, which is why it makes more sense to me to have the I/O block on the CPU itself.

Adding it as a separate chip on the motherboard is a possibility, but where would it go?

NVME slots have taken over the space on a motherboard that used to be taken up by the old Northbridge chips (I miss those days and the beautiful heatsinks they sometimes had).

You could always add it to the Southbridge, but that typically doesn't give connected NVME devices the same speed as connecting them directly to the CPU does.

So where would it go? It would just add extra complexity to the motherboard trace layout.
 

If you were going to put it on the CPU side of the CPU<->GPU PCIe link, then the CPU's IO complex would make the most sense (like the PS5). But for the reasons noted above, that's probably wasteful.

So it's better to put it on the GPU side of that link - on the GPU itself. That way you benefit from data compression over the CPU<->GPU PCIe link, as well as the hardware unit's functionality being more focussed on those that need it (gamers). The sacrifice, though, is that the CPU still has to decompress its own workloads without the help of the hardware unit. But that's arguably not an issue, those workloads being relatively much smaller than the GPU workloads.
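The link-bandwidth benefit is easy to sketch as back-of-the-envelope arithmetic; the numbers below are assumptions for illustration, not vendor specs:

```python
# If data crosses the CPU<->GPU link compressed and is decompressed on the
# GPU side, the effective asset bandwidth of the link is multiplied by the
# compression ratio.
pcie_link_gbps = 32.0     # assumed usable bandwidth of a PCIe 4.0 x16 link, GB/s
compression_ratio = 2.0   # assumed average lossless ratio for mixed game assets

effective_bandwidth = pcie_link_gbps * compression_ratio
print(f"effective asset bandwidth over the link: {effective_bandwidth:.0f} GB/s")
```

With those assumed figures, the same link carries twice the assets per second, which is the whole appeal of decompressing on the far side of it.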
 
Can't you have a read-only scratchpad area dedicated to the GPU/games? The file system duplicates the necessary game data onto that partition and you can ignore file-access responsibilities, because it'll just be flat out "anything can read, nothing can write", and the GPU can read whatever bits it wants. I presume that's technically feasible but not practical, due to existing legacy structures imposing a certain system-wide way of doing things.
I think you've discussed several of the bigger points on why this doesn't necessarily help. Just to further elucidate:

Creating a sub-partition means you have to re-partition your existing storage as a function of installing your GPU. Obviously not a problem for those building a new-from-scratch PC today, but a bit of a pain for those upgrading their existing GPU, or perhaps updating their version of Windows and/or DirectX 12 to support this new DirectStorage partition functionality. Repartitioning an existing disk is entirely feasible, but it also makes assumptions about the free space available on your storage, and then we're back to managing unique storage just for your game data, which will just be duplicate data from your "main" partition(s).

You also hinted at another curious challenge: we need to be able to write data into that partition, but it also must be block-mapped so the storage driver knows how to translate GPU reads into the specific blocks of storage where the requested bits and bytes are stored. We are functionally creating another file system; fortunately the GPT spec already provides for this unique type of workload. However, the act of moving data from the "main" partition(s) to this new partition will require some modicum of translation, and of course you're reading from and writing to the same SSD at full tilt as you go. This ultimately bottlenecks the process to half of what a common user might expect from a disk benchmark.
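The block-mapping part of that translation can be sketched with a toy extent table; all the names, sizes, and LBA values here are invented for illustration, not anything from an actual DirectStorage or GPT implementation:

```python
# Toy sketch of the translation step: mapping a file-relative read from the
# GPU onto the physical blocks of the dedicated partition.
BLOCK_SIZE = 4096

# Hypothetical extent table built when the asset was copied into the
# GPU partition: (file_offset_in_blocks, physical_lba, length_in_blocks)
extents = [(0, 10_000, 8), (8, 52_000, 4)]

def logical_to_lba(file_offset: int) -> int:
    """Translate a byte offset within the asset file to a physical LBA."""
    block = file_offset // BLOCK_SIZE
    for start, lba, length in extents:
        if start <= block < start + length:
            return lba + (block - start)
    raise ValueError("offset not mapped")

print(logical_to_lba(0))          # block 0 -> LBA 10000
print(logical_to_lba(9 * 4096))   # block 9 -> LBA 52001
```

Something has to own and persist that table, which is exactly the "we are functionally creating another file system" point above.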

Additionally, creating a unique partition inside the existing main storage means the space will be very constrained; the worst case would require a pre-processing step (wipe the GPU partition, transfer + translate data from the main partition to the GPU storage partition) every time you load a new game with new assets. And what do you do with games whose assets are enormous? Devs would probably use the GPU storage partition as a cache of sorts, holding the most frequently used assets, and keep the more unique stuff in the main data partition. Now we have a hybrid approach, with hybrid performance pitfalls and caveats and code challenges for the devs again.
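That "partition as a cache" idea is essentially LRU eviction over assets; a toy sketch with invented asset names and a deliberately tiny capacity:

```python
from collections import OrderedDict

# Hypothetical sketch: keep the most recently used assets resident in the
# GPU partition, evicting the oldest when the partition fills up.
PARTITION_CAPACITY = 3  # assets, for illustration

cache = OrderedDict()   # asset name -> size, in least-to-most recently used order

def touch(asset: str, size: int) -> None:
    """Load an asset into the partition cache, evicting the LRU entry if full."""
    if asset in cache:
        cache.move_to_end(asset)      # already resident: mark most recently used
        return
    if len(cache) >= PARTITION_CAPACITY:
        cache.popitem(last=False)     # evict the least recently used asset
    cache[asset] = size

for name in ["terrain", "hero", "skybox", "terrain", "boss"]:
    touch(name, 1)

print(list(cache))  # ['skybox', 'terrain', 'boss'] - 'hero' was evicted
```

Every eviction and refill is more of the same read-translate-write traffic on the one SSD, which is where the hybrid performance pitfalls come from.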

Finally, protecting that partition from malware is going to be of high interest. If you simply create a partition where "anything goes", it's going to become a fast, easy target for malware to hide in. Now we have to think about how EDR solutions will protect this area, to ensure it doesn't become a breeding ground for nasties.

And for what? The performance gain would be arguably minimal compared to what DS can deliver today. There are still two trips to main memory in the current DS implementation; this might remove one or both. And while removing those steps might yield incrementally more performance, it's going to be hard to argue that the arguably tiny enhancement to fetch latencies and additional loading throughput is worth the squeeze of creating a whole new partition type, security methods, data cleanliness and management processes, and a requirement to "stage" game data before it can be used effectively.
 
To be clear, as I understand it, the issue with GDeflate-based GPU decompression at present is more about how it handles the operation in such a way that control is taken away from the application and can cause it to stall (i.e. stuttering, etc.), rather than actual resource contention - i.e. it's just implemented poorly. Resource contention would still be a thing of course, but I don't think that's why we haven't seen much use of GDeflate so far. So the hardware-based decoder would need to be implemented in such a way as to avoid that kind of stalling. And it would of course solve any resource contention issues as well.
Sorry, but that is what I meant by "resource contention" - when DirectStorage can interrupt the scheduling of rendering work, causing the application to stall. What should I call it if it's not that?
 