Digital Foundry Article Technical Discussion [2020]

We have no data; I can counter with the UE5 demo running on an NVMe / RTX 3080 Max-Q (desktop RTX 2070 class dGPU) laptop. It was supposedly an SSD-showcasing tech demo. It ran better on the laptop (higher res).
 
Not really, at least not from the R&C demo -- we don't know when the loading starts, when it stops, or how much it loads.

Like I said before, there are three duplicated frames at the beginning of each purple portal transition. I suppose there is some latency between the I/O request and the moment the data actually starts loading. They wouldn't need those three duplicated frames if the loading began earlier.

All of this is consistent with SSD technology, with latency in the tenths to hundredths of a millisecond. ;) At 8 to 9 GB of data loaded per second, that's one to 1.6 seconds to load a level.

The Spider-Man level in the PS5 demo loaded in 0.8 seconds, but the RAM doubled on PS5. It all adds up... ;)
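
As a rough sanity check, here's the back-of-envelope maths behind those figures (a sketch only: the 30 fps presentation is my assumption, and the 8-9 GB/s rate and 1-1.6 s swap times are just taken at face value from the posts above):

Code:
# Back-of-envelope sketch of the portal-loading numbers discussed above.
# Assumptions: a 30 fps presentation (not confirmed), and the quoted
# 8-9 GB/s effective rate and 1-1.6 s level swaps taken at face value.
FPS = 30
DUPLICATED_FRAMES = 3

latency_ms = DUPLICATED_FRAMES / FPS * 1000  # ~100 ms from I/O request to new data on screen
print(f"Implied request-to-data latency: {latency_ms:.0f} ms")

for rate in (8, 9):             # GB/s, effective (decompressed) rate
    for seconds in (1.0, 1.6):  # quoted portal transition times
        print(f"{rate} GB/s over {seconds} s -> {rate * seconds:.1f} GB per level swap")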

We have no data; I can counter with the UE5 demo running on an NVMe / RTX 3080 Max-Q (desktop RTX 2070 class dGPU) laptop. It was supposedly an SSD-showcasing tech demo. It ran better on the laptop (higher res).

Again, we don't know if it was the same quality, and resolution has nothing to do with the SSD.
 
We have no data; I can counter with the UE5 demo running on an NVMe / RTX 3080 Max-Q (desktop RTX 2070 class dGPU) laptop. It was supposedly an SSD-showcasing tech demo. It ran better on the laptop (higher res).

This is not correct, IIRC. No concrete performance info was given out about UE5 running on anything other than a PS5. An Epic employee said it would run pretty well on an RTX 2070 from a GPU standpoint. It was never shown running on a laptop; the laptop was playing back video of the PS5 version.
 
The compression ratio of RTX IO is 2:1. That's perfectly in line with the XSX's BCPACK. I strongly suspect that's not coincidental.

Nvidia said this is best-case compression. It will often be less than this. The compression ratio will vary from level to level depending on the set of textures. There is no such thing as a single compression ratio, even within the same game.

Although at the disk I/O level, ones and zeroes are still being moved at up to 7 GB/s, the decompressed data stream at the CPU level can be as high as 14 GB/s (best-case compression). Add to this that each I/O request comes with its own overhead: a set of instructions for the CPU to fetch resource x from file y and deliver it to buffer z, along with instructions to decompress or decrypt the resource.
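
To make that concrete, here's a minimal sketch of how the delivered rate scales with whatever ratio a given asset set actually achieves (the 2:1 figure is Nvidia's best case; the lower ratios are placeholders for illustration):

Code:
# Delivered (decompressed) rate = raw disk rate x achieved compression ratio.
# 7 GB/s is Nvidia's quoted PCIe 4 NVMe best case; ratios below 2:1 are
# illustrative placeholders, since the real ratio varies per asset set.
RAW_RATE = 7.0  # GB/s at the disk level

for ratio in (1.2, 1.5, 2.0):
    print(f"{ratio}:1 compression -> {RAW_RATE * ratio:.1f} GB/s delivered")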
 
RTX IO is the fastest SSD tech of the bunch now. Yes, it's marketing, but so was Sony's. Ratchet doesn't provide any data; UE5 doesn't either, but a PCIe NVMe laptop did the same thing at a higher fps.
 
RTX IO is the fastest SSD tech of the bunch now. Yes, it's marketing, but so was Sony's. Ratchet doesn't provide any data; UE5 doesn't either, but a PCIe NVMe laptop did the same thing at a higher fps.
Not even on paper, as PS5's best case is 22GB/s. Besides, there are no bottlenecks on PS5, while there are still plenty on PC + RTX IO. In practice, PS5 I/O will still be quite a bit faster. Currently the loads are so quick, almost too quick (from 0.8 to 1.6 seconds), that many people are in denial: it can't be, the loading must be happening before and after, that kind of thing.

We'll have a better assessment with games like The Witcher 3, which has quite long loading times. It's going to be hard to deny anything then, with such a fair comparison.

Also, don't forget that the 22GB/s figure uses the custom hardware decompression block on PS5. They could get even better compression if they used the GPU shaders, like RTX IO does.
 
Not even on paper, as PS5's best case is 22GB/s. Besides, there are no bottlenecks on PS5, while there are still plenty on PC + RTX IO. In practice, PS5 I/O will still be quite a bit faster. Currently the loads are so quick, almost too quick (from 0.8 to 1.6 seconds), that many people are in denial: it can't be, the loading must be happening before and after, that kind of thing.

We'll have a better assessment with games like The Witcher 3, which has quite long loading times. It's going to be hard to deny anything then, with such a fair comparison.

Also, don't forget that the 22GB/s figure uses the custom hardware decompression block on PS5. They could get even better compression if they used the GPU shaders, like RTX IO does.
I would imagine the difference between them would be inconsequential past a point. PCs throughout the generation will have far more RAM and VRAM and can keep more in memory, reducing the need to swap full sets of assets in and out at any given time. And why would they even bother using compute resources for decompression on PS5 if they have a dedicated hardware block? It's already plenty fast enough. They're going to want to keep all those precious resources for the GPU.

Also, with RTX I/O, since the decompression is done on the shader cores, there are plenty of resources to decompress assets, which Nvidia has already stated has a negligible performance hit. It should also scale as more powerful GPUs and faster storage drives are released. There's also a chance that Nvidia and AMD could include a dedicated decompression block right on the GPU at some point in the future as well.

That of course doesn't change the fact that PS5's implementation is more efficient... but efficiency doesn't always mean "faster" or "better".
 
Nvidia said this is best-case compression. It will often be less than this. The compression ratio will vary from level to level depending on the set of textures. There is no such thing as a single compression ratio, even within the same game.

I could have done with Nvidia's slide on PC architecture bottlenecks about six months back for the folks who struggled to understand it (or even believe it). Unless I'm missing something, RTX I/O only solves half the problem, and requires a new approach to game data structuring to achieve that.

For compressed data read off storage that is for the sole use of the GPU (textures, geometry, shaders, data for compute tasks, etc.), rather than the data flow being 1) storage, 2) bus, 3) main memory, 4) CPU (unpacking), 5) bus, and 6) GPU/VRAM, that data flow can now be 1) storage, 2) bus, 3) GPU/VRAM.

But for compressed data read off storage that is for the sole use of the CPU, or is needed by both CPU and GPU - like geometry data where AI, collision detection and any other interactions are handled by the CPU - you're still waiting on the CPU, which will have had some load lifted. I think the conundrum comes down to how games, or rather installers, package data. If you have a supported GeForce RTX card (Nvidia's site only shows the GeForce 30xx series, but surely first-generation RTX cards must be included), a PCIe 4.x board and a fast NVMe drive, every game ever released still has all its data shoved together in one pack, stored for the CPU to pick apart. You now need GPU-only and CPU-only data stored separately so they can be routed to the appropriate RAM pool right up front.

If this takes off, will it result in a proliferation of game patches that re-organize game data to support Nvidia's brand of DirectStorage? What if AMD comes up with a different implementation?
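
To spell out the two data flows from above, here's a toy comparison (the stage lists come straight from my post; counting each arrow as a "hop" is purely illustrative, not a measured cost model):

Code:
# Toy comparison of the two data flows for GPU-bound assets. The paths are
# from the post above; hop counting is illustrative only.
legacy_path = ["storage", "bus", "main memory", "CPU (unpack)", "bus", "GPU/VRAM"]
direct_path = ["storage", "bus", "GPU/VRAM (unpack on shader cores)"]

for name, path in (("Legacy", legacy_path), ("DirectStorage/RTX IO", direct_path)):
    print(f"{name}: {' -> '.join(path)} ({len(path) - 1} hops)")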


Not even on paper, as PS5's best case is 22GB/s. Besides, there are no bottlenecks on PS5, while there are still plenty on PC + RTX IO. In practice, PS5 I/O will still be quite a bit faster. Currently the loads are so quick, almost too quick (from 0.8 to 1.6 seconds), that many people are in denial: it can't be, the loading must be happening before and after, that kind of thing.
I like Nvidia's approach, but you can only do so much when most games store all their data together and the CPU and GPU have their own RAM pools. Once games start to support it, it should be an overall win, but it's still a distant step from the simplified, unified architecture of nextgen consoles that don't have this problem to solve. This is an architectural design trade-off where one model's advantages are the other model's disadvantages and vice versa.
 
...
If this takes off, will it result in a proliferation of game patches that re-organize game data to support Nvidia's brand of DirectStorage? What if AMD comes up with a different implementation?



....

My understanding was that DirectStorage is sort of an API / standard, like Direct3D? So if Nvidia's solution uses DirectStorage, and AMD's does too, it shouldn't be a problem for devs, since they only need to make it work with DirectStorage?
 
Not even on paper, as PS5's best case is 22GB/s.

That's the peak rate of the decompression block, relevant only to extreme corner cases, not to the average. The average is 8-9GB/s; Sony have been unambiguous about that.

Also, don't forget that the 22GB/s figure uses the custom hardware decompression block on PS5. They could get even better compression if they used the GPU shaders, like RTX IO does.

So you're saying that Sony's 10TF GPU could outperform the 22GB/s of its hardware block, but for some reason a 30TF Ampere couldn't?

Besides, there are no bottlenecks on PS5, while there are still plenty on PC + RTX IO. In practice, PS5 I/O will still be quite a bit faster.

I assume you have detailed insider knowledge of exactly how RTX IO and Direct Storage work in order to know this. Would you care to share the details?

We'll have a better assessment with games like The Witcher 3, which has quite long loading times. It's going to be hard to deny anything then, with such a fair comparison.

It depends how much they've re-architected the game to take advantage of the new I/O paradigms. Sony certainly has a better chance of that, but even Cerny has said that games won't automatically benefit from ultra-fast loading times, because older software simply isn't designed with these new paradigms in mind.
 
I could have done with Nvidia's slide on PC architecture bottlenecks about six months back for the folks who struggled to understand it (or even believe it).

I seem to remember some members struggling to believe something like RTX IO was even possible on a PC ;)

Unless I'm missing something, RTX I/O only solves half the problem, and requires a new approach to game data structuring to achieve that.

Yes, they've specifically stated that games need to be "Direct Storage enabled" to take advantage of this. The good news is that PCs don't need to be Direct Storage capable in order to run Direct Storage enabled games. So developers can use it without worrying about whether a given PC can run it or not. That should help adoption significantly.

But for compressed data read off storage that is for the sole use of the CPU, or is needed by both CPU and GPU - like geometry data where AI, collision detection and any other interactions are handled by the CPU - you're still waiting on the CPU, which will have had some load lifted.

It'll be interesting to see how RTX IO handles this, i.e. does everything go through the GPU for decompression first and then get doled out to the CPU or GPU as required? Or does it work as you say, with the CPU handling the decompression of its own data? Nvidia's claims about the overhead reduction suggest the former, but if it's the latter then you're still talking about an 80%+ reduction in the load on the CPU (80% being the typical share of streamed game content made up of textures, according to Microsoft).
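
For a rough sense of scale either way, here's a sketch combining Microsoft's 80% texture figure with the ~480MB/s-per-Zen-2-core number that comes up later in the thread (all vendor claims, not measurements):

Code:
# Sketch: if ~80% of streamed data is GPU-bound textures (MS figure) and the
# GPU decompresses only that share, how much CPU decompression work remains?
# 0.48 GB/s per Zen 2 core is derived from MS's "5 cores at 2.4 GB/s" claim.
STREAM_RATE = 7.0     # GB/s drive, Nvidia's example
TEXTURE_SHARE = 0.80  # Microsoft's figure for streamed content
PER_CORE = 0.48       # GB/s of decompression per Zen 2 core

cores_all_cpu = STREAM_RATE / PER_CORE                     # ~14.6 cores
cores_left = STREAM_RATE * (1 - TEXTURE_SHARE) / PER_CORE  # ~2.9 cores
print(f"All on CPU: ~{cores_all_cpu:.1f} cores; GPU takes textures: ~{cores_left:.1f} cores")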

I think the conundrum comes down to how games, or rather installers, package data. If you have a supported GeForce RTX card (Nvidia's site only shows the GeForce 30xx series, but surely first-generation RTX cards must be included), a PCIe 4.x board and a fast NVMe drive, every game ever released still has all its data shoved together in one pack, stored for the CPU to pick apart. You now need GPU-only and CPU-only data stored separately so they can be routed to the appropriate RAM pool right up front.

PCIe 4.x isn't a requirement, and yes Turing is also supported. Time for me to upgrade!

If this takes off, will it result in a proliferation of game patches that re-organize game data to support Nvidia's brand of DirectStorage? What if AMD comes up with a different implementation?

Can you have your own brand of Direct Storage? The whole point of an API like this is that any game that supports it will run on any hardware that supports it, regardless of how it's implemented. I'm sure AMD will have their own implementation of Direct Storage, but I don't expect games to have to cater to one or the other.
 
My understanding was that DirectStorage is sort of an API / standard, like Direct3D? So if Nvidia's solution uses DirectStorage, and AMD's does too, it shouldn't be a problem for devs, since they only need to make it work with DirectStorage?
Then why is Nvidia branding this as Nvidia RTX I/O and not just DirectStorage? Perhaps it's just marketing. But some standard needs adhering to for interoperability with the GPU. Maybe this is the equivalent of Nvidia API extensions for DirectStorage.

I seem to remember some members struggling to believe something like RTX IO was even possible on a PC ;)
Touché ;-). I know this is in jest, but it's solving half of the problem; lifting the load should help in equal measure, I'd have thought.

Yes, they've specifically stated that games need to be "Direct Storage enabled" to take advantage of this. The good news is that PCs don't need to be Direct Storage capable in order to run Direct Storage enabled games. So developers can use it without worrying about whether a given PC can run it or not. That should help adoption significantly.
This may take a while to gain much traction, or support will transition in over time. But you have to start somewhere; Direct3D took a while to gain traction too. I'd expect DirectStorage 1.0 to be the beginning of a more comprehensive solution, which will require further tweaks to the PC's architectural arrangement.

It'll be interesting to see how RTX IO handles this, i.e. does everything go through the GPU for decompression first and then get doled out to the CPU or GPU as required? Or does it work as you say, with the CPU handling the decompression of its own data? Nvidia's claims about the overhead reduction suggest the former, but if it's the latter then you're still talking about an 80%+ reduction in the load on the CPU (80% being the typical share of streamed game content made up of textures, according to Microsoft).

This will be interesting to watch. By flipping the role of who decompresses, the worst case should be no worse than the situation now (with the CPU taking this role), and better because the GPU is faster at decompressing. But decompression is only half of the equation; it depends how much of the data you decompressed needs to end up in the other RAM pool. Having literally no data at all to look at, I would be really quite surprised if more data were required by the CPU/main RAM than the GPU/VRAM. We know how massive geometry and texture data are. There may be some edge cases, but this surely also has to be a win.

But there is a question of how much data in existing games is packed optimally for the GPU decompressors. It's not always the case that textures are here, shaders are there, geometry data is over there - I've seen data packed in very weird ways; some games pack shader data and other graphics tech into the world geometry data (thank you, Witcher 3 and Infamous Second Son, that's genius :nope:).

PCIe 4.x isn't a requirement, and yes Turing is also supported. Time for me to upgrade!
PCIe 4.x isn't a technical requirement, but without it you're losing a lot of potential bandwidth.

Can you have your own brand of Direct Storage? The whole point of an API like this is that any game that supports it will run on any hardware that supports it, regardless of how it's implemented. I'm sure AMD will have their own implementation of Direct Storage, but I don't expect games to have to cater to one or the other.
Many of the DirectX family of APIs have a core set plus a method of extension. You want an API to set the standard but not limit future hardware. Nvidia- and AMD-specific graphics extensions have been common on graphics cards for many years, remember. :yes:
 
Nvidia said this is best-case compression. It will often be less than this. The compression ratio will vary from level to level depending on the set of textures. There is no such thing as a single compression ratio, even within the same game.

I imagine that in stating "best case compression" they're referring to the physical speed of the drive rather than the compression ratio. I'm fairly sure I've seen them state elsewhere that the typical compression ratio is 2:1 (just like MS claims for BCPACK), but that will only achieve 14GB/s on a best-case 7GB/s PCIe 4 NVMe drive.
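
Putting the vendor figures quoted in this thread side by side (all marketing claims rather than measurements; PS5's 5.5GB/s raw rate is Sony's published spec):

Code:
# Vendor-claimed figures only, in GB/s: (raw drive rate, typical effective, peak effective).
specs = {
    "PS5 (Sony figures)":      (5.5, 9.0, 22.0),
    "RTX IO (Nvidia figures)": (7.0, 14.0, None),
}

for name, (raw, typical, peak) in specs.items():
    line = f"{name}: {raw} raw -> {typical} typical ({typical / raw:.1f}:1)"
    if peak:
        line += f", {peak} peak ({peak / raw:.1f}:1 corner case)"
    print(line)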
 
I would be really quite surprised if more data were required by the CPU/main RAM than the GPU/VRAM. We know how massive geometry and texture data are. There may be some edge cases, but this surely also has to be a win.

Microsoft claims 80% of streamed game data is textures. So yes, it certainly seems like a big win.

But there is a question of how much data in existing games is packed optimally for the GPU decompressors.

My guess would be none as Nvidia have already stated a game needs to be Direct Storage compatible to work with RTX IO. Looks like we're looking at a whole new paradigm for developers.

Many of the DirectX family of APIs have a core set plus a method of extension. You want an API to set the standard but not limit future hardware. Nvidia- and AMD-specific graphics extensions have been common on graphics cards for many years, remember. :yes:

Fair point, it'll be interesting to see how that turns out. It'd certainly be a massive handicap if games have to specifically support RTX IO rather than just Direct Storage. I don't see it going that way personally but it does seem to be a possibility.
 

This is particularly interesting and possibly gives us a hint at the peak performance of RTX IO.

Andrew Goossen told Digital Foundry that the XSX was doing work equivalent to 5 Zen 2 cores at the drive's max speed (2.4GB/s) if Direct Storage and the hardware decompression block weren't involved. That comes out at 480MB/s per CPU core. Nvidia's claim of needing 14 CPU cores to do the same for a 7GB/s drive falls right in line with that (anyone else wonder if they're singing from the same hymn book?).

RTX IO is 3.33x faster than 24 Zen 2 cores in this example. Or the equivalent of 80(!) Zen 2 cores. 80*480MB/s = 38.4GB/s.


Actually, since only ~14 cores are required to max out the decompression requirements of a 7GB/s drive, the above makes no sense: a 24-core Threadripper should already completely remove the decompression bottleneck, unless they're using some kind of crazy RAID setup to achieve 23GB/s+ of SSD throughput. So perhaps what we're seeing here is the non-decompression-related benefits of RTX IO / Direct Storage.
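
For anyone wanting to check it, here's the arithmetic from the posts above spelled out (all inputs are MS/Nvidia marketing claims, not benchmarks):

Code:
# Reproducing the arithmetic above. All inputs are vendor claims.
XSX_RATE = 2.4   # GB/s, XSX drive max speed
XSX_CORES = 5    # Zen 2 cores' worth of decompression work (Goossen/DF)

per_core = XSX_RATE / XSX_CORES   # 0.48 GB/s per Zen 2 core
cores_for_7gbps = 7.0 / per_core  # ~14.6, matching Nvidia's "14 cores"
rtx_io_equiv = 24 * 3.33          # ~80 cores' worth, per the 3.33x claim

print(f"{per_core:.2f} GB/s per core; ~{cores_for_7gbps:.1f} cores for a 7 GB/s drive")
print(f"RTX IO: ~{rtx_io_equiv:.0f} core equivalents -> {rtx_io_equiv * per_core:.1f} GB/s")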
 
Then why is Nvidia branding this as Nvidia RTX I/O and not just DirectStorage? Perhaps it's just marketing. But some standard needs adhering to for interoperability with the GPU. Maybe this is the equivalent of Nvidia API extensions for DirectStorage.

What I understand is that RTX I/O is the marketing name for the changes/drivers Nvidia has made to be DirectStorage compatible.
 
My guess would be none as Nvidia have already stated a game needs to be Direct Storage compatible to work with RTX IO. Looks like we're looking at a whole new paradigm for developers.
Yup, and normally this would be a big ask for developers - asking them to change the way data is organised, structured and compressed. If Nvidia had rolled this out in isolation I'd be skeptical of its adoption (speaking as a 2080 Ti owner), but this benefits nextgen consoles as well, so hopefully devs will make the effort to embrace the paradigm shift.

It is a shame that few existing games will benefit, but we don't yet know what performance leaps console games will get on nextgen hardware either. Given their dog-slow HDDs I'd expect massive leaps forward, but equally I'm not expecting miracles.
 