Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Kugai Calo

Regular
"When asked about the performance hit of RTX IO on the GPU itself, an NVIDIA representative responded that RTX IO utilizes only a tiny fraction of the GPU, “probably not measurable”."
That makes a lot of sense. Consider how a few Zen 2 cores with only a 256-bit vector length can deliver considerable throughput; a GPU with far wider cores should indeed have “probably not measurable” utilization.
 

PSman1700

Legend
That makes a lot of sense. Consider how a few Zen 2 cores with only a 256-bit vector length can deliver considerable throughput; a GPU with far wider cores should indeed have “probably not measurable” utilization.

That, and GPUs are typically excellent at such tasks. I'd say it's the more flexible solution as well, aside from having the potential to be faster.
 

snc

Veteran
I can't find the link right now but Nvidia have claimed that the performance impact on the GPU is negligible. It would presumably be done via async compute.

EDIT: Found it:

https://back2gaming.com/guides/nvidia-rtx-io-in-detail/

"When asked about the performance hit of RTX IO on the GPU itself, an NVIDIA representative responded that RTX IO utilizes only a tiny fraction of the GPU, “probably not measurable”."
possible, future and benchmarks will verify it as always
 

BRiT

(>• •)>⌐■-■ (⌐■-■)
Moderator
Legend
Supporter
There's a handful of posts on PC GPU impacts from the Kraken compression providers themselves, from months ago if not last year when we had that discussion. It's not a concern in the least. They even said they've improved things further since.
 

snc

Veteran
There's a handful of posts on PC GPU impacts from the Kraken compression providers themselves, from months ago if not last year when we had that discussion. It's not a concern in the least. They even said they've improved things further since.
Sorry, I'm not tracking RTX IO that closely since this tech has already been on consoles since last year, but can you show me an example of RTX IO usage while rendering a game (like the portals in Ratchet)? thx
 

Karamazov

Legend
Veteran
I don't think PC needs RTX IO to match what's done in Ratchet, as PCs have access to more RAM than the PS5, and RAM is obviously a lot faster than any SSD.
 

BRiT

(>• •)>⌐■-■ (⌐■-■)
Moderator
Legend
Supporter
Sorry, I'm not tracking RTX IO that closely since this tech has already been on consoles since last year, but can you show me an example of RTX IO usage while rendering a game (like the portals in Ratchet)? thx

They have even used Kraken GPU-based decompression on last-gen hardware like the PS4. Let me see what I can find through a search.
 

BRiT

(>• •)>⌐■-■ (⌐■-■)
Moderator
Legend
Supporter
Link to a summary of earlier posts from 2020-09-19 @ https://forum.beyond3d.com/posts/2158006/ If you haven't read the RadTools twitter thread, it's worth a read. They may have updates and newer discussions about it since then.

GPU decompression won't take much to exceed even the PS5's hardware rate, maybe 2 TF or so. I posted it before, but on an early version 1.0 of GPU-based decompression they were getting 60-120 GB/s on a PS5. There are still possibilities for improvement, but even at first pass that's 6-12 GB/s per GPU TF used.
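Rough arithmetic behind those figures (the TF and GB/s numbers are the ones quoted above; the linear-scaling helper is just illustrative, not a measurement):

```python
# Back-of-envelope check of the decompression figures quoted above.
# Assumes throughput scales linearly with FP32 compute, which is a
# simplification, not something the vendors have stated.

def decompress_rate_gbps(gpu_tflops: float, gbps_per_tflop: float) -> float:
    """Estimated decompression throughput for a given amount of compute."""
    return gpu_tflops * gbps_per_tflop

# PS5-class GPU (~10 TF) at the quoted 6-12 GB/s per TF:
low = decompress_rate_gbps(10, 6)    # 60 GB/s
high = decompress_rate_gbps(10, 12)  # 120 GB/s
print(low, high)  # consistent with the 60-120 GB/s figure above
```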

Oodle is so fast that running the compressors and saving or loading compressed data is faster than just doing IO of uncompressed data. Oodle can decompress faster than the hardware decompression on PS4 and XBox One.
 

snc

Veteran
Link to a summary of earlier posts from 2020-09-19 @ https://forum.beyond3d.com/posts/2158006/ If you haven't read the RadTools twitter thread, it's worth a read. They may have updates and newer discussions about it since then.
Yeah, but that's not what I was asking for. I thought there were already examples of RTX IO usage while rendering a game, like in Ratchet; the only info so far is that it's super fast on the GPU doing only decompression and apparently it's a light background async task.
 

t0mb3rt

Newcomer
Yeah, but that's not what I was asking for. I thought there were already examples of RTX IO usage while rendering a game, like in Ratchet; the only info so far is that it's super fast on the GPU doing only decompression and apparently it's a light background async task.
I always assumed RTX IO would hook into DirectStorage, and that we wouldn't see games making use of it until we get games built with DirectStorage.
 

dobwal

Legend
Sorry, I'm not tracking RTX IO that closely since this tech has already been on consoles since last year, but can you show me an example of RTX IO usage while rendering a game (like the portals in Ratchet)? thx

RTX IO uses the DirectStorage API, which as of last month wasn't available (has something changed?). I doubt you can find real-world examples of it.
 

snc

Veteran
@dobwal @t0mb3rt yes, but when I wrote that the future and benchmarks will verify it, I got a message that we already know the results ;)
 

t0mb3rt

Newcomer
@dobwal @t0mb3rt yes, but when I wrote that the future and benchmarks will verify it, I got a message that we already know the results ;)
I don't really know what you're asking...

It's a pretty safe bet that the combination of DirectStorage, fast SSDs, and GPU decompression will offer performance that beats the PS5.

The PS5 is only impressive because nobody expected a console to have such fast storage; compared to the relatively unrestrained world of PC hardware, it's really not that impressive.

I'm just happy that games being built with fast storage in mind is now a baseline.
 

BRiT

(>• •)>⌐■-■ (⌐■-■)
Moderator
Legend
Supporter
@dobwal @t0mb3rt yes, but when I wrote that the future and benchmarks will verify it, I got a message that we already know the results ;)

We already know enough to say it won't genuinely have an impact when 1 TF of GPU provides 6-12 GB/s of decompression, even using early revisions from 2020; at least not in comparison to console setups, as PCs have GPUs well in excess of 10-12 TF.
 

snc

Veteran
We already know enough to say it won't genuinely have an impact when 1 TF of GPU provides 6-12 GB/s of decompression, even using early revisions from 2020; at least not in comparison to console setups, as PCs have GPUs well in excess of 10-12 TF.
Yes, and that will probably be the case, but as PhysX showed, sometimes things are more complicated and switching computing contexts can be problematic. I'm not saying I have any inside knowledge or that it will be a problem with RTX IO (it will probably be fine), but let's wait for some results before making declarative statements.
 

BRiT

(>• •)>⌐■-■ (⌐■-■)
Moderator
Legend
Supporter
Yes, and that will probably be the case, but as PhysX showed, sometimes things are more complicated and switching computing contexts can be problematic. I'm not saying I have any inside knowledge or that it will be a problem with RTX IO (it will probably be fine), but let's wait for some results before making declarative statements.

Perhaps, but even an Nvidia RTX 3070 is 20 TF, and using only the excess TF (relative to a PS5) would provide 60-120 GB/s of decompression with early-revision code. That's enough of a performance cushion that I'm not concerned in the least about an impact, even less so when it only needs to target 12-24 GB/s max to be on par with or better than the PS5 experience.
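Another way to frame the cushion: how much of the GPU would decompression actually occupy at the PS5-equivalent target rates? (The TF and GB/s figures are the ones from this thread; the linear-scaling assumption is mine.)

```python
# Fraction of a GPU's compute needed to hit a target decompression rate,
# assuming linear scaling with FP32 TF (a simplification for illustration).

def gpu_fraction_needed(target_gbps: float, gpu_tflops: float,
                        gbps_per_tflop: float) -> float:
    """Share of the GPU busy with decompression at a sustained target rate."""
    tf_needed = target_gbps / gbps_per_tflop
    return tf_needed / gpu_tflops

# 20 TF RTX 3070-class GPU, 12-24 GB/s target, 6-12 GB/s per TF:
worst = gpu_fraction_needed(24, 20, 6)   # 0.2  -> 20% busy, worst case
best = gpu_fraction_needed(12, 20, 12)   # 0.05 -> 5% busy, best case
print(worst, best)
```

And that is for sustained peak streaming; typical loads would sit well below the 12-24 GB/s target, so the steady-state cost shrinks accordingly.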
 

Kugai Calo

Regular
… and switching computing contexts can be problematic. I'm not saying I have any inside knowledge or that it will be a problem with RTX IO (it will probably be fine), but let's wait for some results before making declarative statements.
? Async compute has been widely adopted since the Xbox One/PS4; to the GPU, the context switching you're worried about is no more than another command list in its many compute & DMA command queues. If you're referring to switching resource bindings, a game would have many, many of those in a single render pass. GPUs today can be quite different from those of the D3D10 era.
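The "just another queue" idea can be sketched with a CPU-side analogy: decompression is submitted to a background queue and overlaps with rendering instead of stalling it. This is a Python sketch of the scheduling pattern, not GPU code (on D3D12 the analogue would be submitting to a second command queue of the compute type):

```python
# CPU-side analogy for async compute: decompression runs on a separate
# "queue" (a worker thread) and overlaps with ongoing frame rendering.
import zlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a compressed asset (zlib here; the consoles use Kraken).
payload = zlib.compress(b"texture data " * 10000)

def decompress(blob: bytes) -> bytes:
    return zlib.decompress(blob)

def render_frame(n: int) -> str:
    return f"frame {n}"  # stand-in for real per-frame work

with ThreadPoolExecutor(max_workers=2) as pool:
    # "compute queue": kick decompression off in the background...
    future = pool.submit(decompress, payload)
    # ..."graphics queue": keep rendering frames while it runs.
    frames = [render_frame(n) for n in range(3)]
    data = future.result()  # asset ready; no frame had to wait on it

print(frames[-1], len(data))  # frame 2 130000
```

The key point the analogy captures is that the background work only costs throughput it actually consumes; it doesn't serialize with the rendering work.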
 

snc

Veteran
? Async compute has been widely adopted since the Xbox One/PS4; to the GPU, the context switching you're worried about is no more than another command list in its many compute & DMA command queues. If you're referring to switching resource bindings, a game would have many, many of those in a single render pass. GPUs today can be quite different from those of the D3D10 era.
Wasn't async compute also around in the PhysX era? PhysX still works much worse on the best single GeForce on the market than on a performance card + a low-end GPU, and to this day I don't know why. With every new GeForce iteration Nvidia claims they've improved async, and still it's the same story. edit: also I'm not worried ;d just a theoretical discussion
 