Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Shortbread

I wanted to start a thread focusing on the capabilities of, and questions surrounding, the next generation of NVMe SSD and I/O technology within the PC and console gaming space. Please feel free to post any additional information or videos that should be added to the following posts. And of course, mods can add or change any information within these posts.

This thread is a work in progress.
----------------------------------------------------------------------------------------------------

PC

General SSD & I/O Architecture Overview
Nvidia RTX IO Technology
Leveraging the advanced architecture of our new GeForce RTX 30 Series graphics cards, we’ve created NVIDIA RTX IO, a suite of technologies that enable rapid GPU-based loading and game asset decompression, accelerating I/O performance by up to 100x compared to hard drives and traditional storage APIs. When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your GeForce RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open world games.

Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.

“Microsoft is delighted to partner with NVIDIA to bring the benefits of next generation I/O to Windows gamers. DirectStorage for Windows will let games leverage NVIDIA’s cutting-edge RTX IO and provide game developers with a highly efficient and standard way to get the best possible performance from the GPU and I/O system. With DirectStorage, game sizes are minimized, load times reduced, and virtual worlds are free to become more expansive and detailed, with smooth & seamless streaming.” - Bryan Langley - Group Program Manager for Windows Graphics and Gaming

NVIDIA RTX IO plugs into Microsoft’s upcoming DirectStorage API, which is a next-generation storage architecture designed specifically for gaming PCs equipped with state-of-the-art NVMe SSDs, and the complex workloads that modern games require. Together, the streamlined and parallelized APIs, specifically tailored for games, allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSD to your RTX IO-enabled GPU.

Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed while being delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in its more efficient, compressed form, and improving I/O performance by a factor of 2.

GeForce RTX GPUs are capable of decompression performance beyond the limits of even Gen4 SSDs, offloading dozens of CPU cores’ worth of work to deliver maximum overall system performance for next generation games.
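To make the "keep it compressed until the last hop" idea concrete, here is a minimal sketch. It assumes zlib as a stand-in lossless codec (RTX IO's actual format is not stated above), a made-up repetitive payload, and a hypothetical 7 GB/s drive. It only shows that the effective read rate scales with whatever ratio the data happens to achieve.

```python
import zlib

def amplified_read_rate(raw_gbps, payload):
    """Return (compression ratio, effective GB/s) for a lossless round trip.

    zlib is only a stand-in codec here; the ratio depends entirely on the data.
    """
    compressed = zlib.compress(payload, level=6)
    # Lossless: the decompressed bytes must match the original exactly.
    assert zlib.decompress(compressed) == payload
    ratio = len(payload) / len(compressed)
    return ratio, raw_gbps * ratio

# Hypothetical asset data: repetitive, so it compresses well.
sample = (b"grass_tile_albedo" * 64 + bytes(range(256))) * 200
ratio, effective = amplified_read_rate(7.0, sample)  # 7.0 GB/s drive is an assumption
print(f"ratio {ratio:.1f}:1 -> about {effective:.1f} GB/s delivered")
```

Real game assets compress far less than this synthetic payload, which is why the quoted figures assume roughly 2:1.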

Articles
TomsHardware 09/04/2020 Article
Microsoft this week said that it would bring preview of its DirectStorage application programming interface that powers the company’s Xbox Velocity Architecture to Windows 10 developers in 2021. The API is designed to speed up game loading times and improve performance of games by eliminating storage API-related bottlenecks and reducing CPU involvement, but on a client PC it can do much more than that. Nvidia has also adopted the technology, branded Nvidia RTX IO, for its Ampere graphics cards.
Modern PC games use tens of gigabytes of storage and to load them quickly one needs an SSD that supports the NVMe protocol and boasts a high sequential read speed. To further optimize performance by ensuring that all the necessary data like textures and sounds fits into memory (both system RAM and GPU RAM), contemporary game engines break the assets into blocks and load only those that are needed for the scene being rendered. These blocks may be rather small, but they are still larger than 4 KB blocks used to rate random input/output (I/O) performance of SSDs.
According to Microsoft, the custom SSD used in the upcoming Xbox Series X console generates well over 35,000 64 KB I/O requests per second to hit its peak sequential read speed of 2.4 GB/s. The NVMe protocol and modern SSDs can handle multiple queues simultaneously (which is called queue depth) and each of them can contain many requests. But raw performance of the drive is only a part of the equation.
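The request-rate figure above can be checked with quick arithmetic. This sketch assumes "64 KB" means 64 × 1024 bytes and "2.4 GB/s" means 2.4 × 10^9 bytes per second; the function name is just for illustration.

```python
def requests_per_second(throughput_bytes_per_s, request_bytes):
    """How many fixed-size requests per second are needed to sustain a throughput."""
    return throughput_bytes_per_s / request_bytes

# 2.4 GB/s of sequential reads issued as 64 KB requests.
rate = requests_per_second(2.4e9, 64 * 1024)
print(round(rate))  # 36621, i.e. "well over 35,000"
```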
Existing storage APIs require the application to manage its I/O requests sequentially: submit the request, wait for it to complete, handle its completion, move on to another request. Older games that generated hundreds of requests (as they were designed primarily with hard drives in mind) did not produce a significant overhead and therefore did not use too much CPU time. But with upcoming titles that generate tens of thousands of requests that overhead gets so substantial that it might prevent modern systems from taking full advantage of modern SSDs and/or leave no CPU horsepower for other tasks.
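The difference between the two submission patterns can be sketched without any special API. This is not DirectStorage: it assumes an ordinary thread pool as a stand-in for batched submission, and the 64 KB chunk size, worker count, and temp file are all illustrative choices.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # 64 KB requests, matching the size quoted above

def read_chunk(path, index):
    """One I/O request: open, seek to the chunk, read it."""
    with open(path, "rb") as f:
        f.seek(index * CHUNK)
        return f.read(CHUNK)

def load_sequential(path, count):
    """Classic pattern: submit, wait, handle completion, move to the next."""
    return [read_chunk(path, i) for i in range(count)]

def load_batched(path, count, workers=8):
    """Submit every request up front and collect completions in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: read_chunk(path, i), range(count)))

# Build a throwaway 2 MB "asset file" to read back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(CHUNK * 32))
    asset_path = tmp.name

sequential = load_sequential(asset_path, 32)
batched = load_batched(asset_path, 32)
os.remove(asset_path)
print(sequential == batched)
```

Both paths return the same bytes; the point is only that the batched version keeps many requests in flight, which is the shape NVMe queues are built for.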
PCWorld 09/03/2020 Article
Nvidia’s Huang said that RTX IO offers “APIs for fast loading and streaming directly from SSD to GPU memory” and GPU lossless decompression. It’s unclear yet whether that’s a special sauce, or just Nvidia glomming onto the benefits of DirectStorage itself. Nvidia’s marketing did a killer job of tying real-time ray tracing to its RTX branding, but the technology is actually built on Microsoft’s underlying Direct Raytracing API, which is why you’ll be seeing it in the Xbox Series X and AMD’s RDNA 2-based “Big Navi” graphics cards later this year.
Microsoft’s post makes it clear that you’ll need an NVMe drive to tap into DirectStorage’s benefits, however. That’s because NVMe drives offer both extremely high bandwidth compared to traditional SATA-based storage, as well as multiple “NVMe queues” that can contain multiple IO requests, making them “a perfect match to the parallel and batched nature of modern gaming workloads”—and GPU capabilities.
DirectStorage looks like it’ll change that—when it arrives on PCs, that is. While the technology will be part of the Velocity Architecture inside the Xbox Series X this fall, Microsoft says it’s hoping to get a DirectStorage preview in the hands of PC developers sometime in 2021. If the dream of instantly loading worlds turns into a gaming reality, the wait will be worth it.
HotHardware
A demo to show the theoretical benefits of NVIDIA RTX IO, that works in conjunction with Microsoft's DirectStorage API, was also shown. During the demo, handling the level load and decompression took about 4X as long on a PCIe Gen 4 SSD using current methods and used significantly more CPU core resources. The demo was run on a 24-core Threadripper system and the standard load / decompress took over 5 seconds. With RTX IO, that time was cut to just 1.61 seconds. We won’t even talk about the hard drive’s performance here. Ouch – it hurts just to look at the chart.

Reveal and Deep Dive Videos
Queued @22:50
Queued @23:45

NVMe SSD & I/O Performance Showcase Videos (pending)
 
Sony PlayStation 5

Official SSD & I/O Architecture Overview


Articles

Wired 04/16/2019 Article
When Spidey reappears in a totally different spot in Manhattan, 15 seconds have elapsed. Then Cerny does the same thing on a next-gen devkit connected to a different TV. (The devkit, an early “low-speed” version, is concealed in a big silver tower, with no visible componentry.) What took 15 seconds now takes less than one: 0.8 seconds, to be exact.
Wired 10/08/2019 Article
"If you look at a game like Marvel's Spider-Man," Cerny says, "there are some pieces of data duplicated 400 times on the hard drive." The SSD sweeps away the need for all that duping—so not only is its raw read speed dramatically faster than a hard drive, but it saves crucial space. How developers will take advantage of that space will likely differ; some may opt to build a larger or more detailed game world, others may be content to shrink the size of the games or patches. Either way, physical games for the PS5 will use 100-GB optical disks, inserted into an optical drive that doubles as a 4K Blu-ray player.
Eurogamer 10/9/2019 Article
In terms of the advantages, there are speed improvements, efficiency advantages (specifically, not having to replicate data across the drive owing to the virtual elimination of seek-time in solid-state media) while Bluepoint president Marco Thrush describes how instant access to data reduces friction in-game - meaning the elimination of locked doors that exist only to slow the player down while new data streams in behind the scenes (Microsoft made the exact same point in its Scarlett teaser).
Digital Foundry 03/29/2020 Article
Sony is doubling down on solid-state storage in providing a truly transformative next generation experience. Every couple of years, Mark Cerny travels the world, meeting dozens of developers and publishers and the integration of the SSD was the number one next-gen request. Sony's actual implementation is something else, with performance rated at two orders of magnitude faster than PlayStation 4. 2GB of data can be loaded in one quarter of a second, meaning that in theory, the entirety of PS5's 16GB can be filled in just two seconds. "As game creators, we go from trying to distract the player from how long fast travel is taking - like those Spider-Man subway rides - to being so blindingly fast that we might even have to slow that transition down," says Cerny.
Delivering two orders of magnitude improvement in performance required a lot of custom hardware to seamlessly marry the SSD to the main processor. A custom flash marries up to the SSD modules via a 12 channel interface, delivering the required 5.5GB/s of performance with a total of 825GB of storage. This may sound like a strange choice for storage size when considering that consumer SSDs offer 512GB, 1TB or more of capacity, but Sony's solution is proprietary, 825GB is most optimal match for the 12-channel interface and there are other advantages too. In short, Sony had more freedom to adapt its design: "We can look at the available NAND flash parts and construct something with optimal price performance. Someone constructing an M.2 drive presumably does not have that freedom, it would be difficult to market and sell if it were not one of those standard sizes," Mark Cerny says.
The controller itself hooks up to the main processor via a four-lane PCI Express 4.0 interconnect, and contains a number of bespoke hardware blocks designed to eliminate SSD bottlenecks. The system has six priority levels, meaning that developers can literally prioritise the delivery of data according to the game's needs.
The controller supports hardware decompression for the industry-standard ZLIB, but also the new Kraken format from RAD Game Tools, which offers an additional 10 per cent of compression efficiency. The bottom line? 5.5GB/s of bandwidth translates into an effective eight or nine gigabytes per second fed into the system. "By the way, in terms of performance, that custom decompressor equates to nine of our Zen 2 cores, that's what it would take to decompress the Kraken stream with a conventional CPU," Cerny reveals.
A dedicated DMA controller (equivalent to one or two Zen 2 cores in performance terms) directs data to where it needs to be, while two dedicated, custom processors handle I/O and memory mapping. On top of that, coherency engines operate as housekeepers of sorts.
"Coherency comes up in a lot of places, probably the biggest coherency issue is stale data in the GPU caches," explains Cerny in his presentation. "Flushing all the GPU caches whenever the SSD is read is an unattractive option - it could really hurt the GPU performance - so we've implemented a gentler way of doing things, where the coherency engines inform the GPU of the overwritten address ranges and custom scrubbers in several dozen GPU caches do pinpoint evictions of just those address ranges."
All of this is delivered to developers without them needing to do anything. Even the decompression is taken care of by the custom silicon. "You just indicate what data you'd like to read from your original, uncompressed file, and where you'd like to put it, and the whole process of loading it happens invisibly to you and at very high speed," Cerny explains.
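The six priority levels mentioned in the excerpt can be pictured as a priority queue of read requests. The sketch below is purely illustrative: the class, the asset names, and the convention that 0 is the most urgent level are all assumptions, not Sony's API.

```python
import heapq
from itertools import count

PRIORITY_LEVELS = 6  # the article's figure; 0 = most urgent is an assumption

class ReadQueue:
    """Hypothetical read scheduler: lowest level first, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that preserves submission order

    def submit(self, priority, asset):
        if not 0 <= priority < PRIORITY_LEVELS:
            raise ValueError("priority must be between 0 and 5")
        heapq.heappush(self._heap, (priority, next(self._order), asset))

    def next_request(self):
        _, _, asset = heapq.heappop(self._heap)
        return asset

reads = ReadQueue()
reads.submit(5, "distant_terrain.tex")
reads.submit(0, "player_weapon.mesh")
reads.submit(2, "street_audio.bank")
reads.submit(0, "hud_icons.tex")
served = [reads.next_request() for _ in range(4)]
print(served)
```

Urgent assets are served first regardless of when they were submitted, and requests at the same level keep their submission order.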
CBloom Rants
The Sony PS5 will have the fastest data loading ever available in a mass market consumer device, and we think it may be even better than you have previously heard. What makes that possible is a fast SSD, an excellent IO stack that is fully independent of the CPU, and the Kraken hardware decoder. Kraken compression acts as a multiplier for the IO speed and disk capacity, storing more games and loading faster in proportion to the compression ratio.
Sony has previously published that the SSD is capable of 5.5 GB/s and expected decompressed bandwidth around 8-9 GB/s, based on measurements of average compression ratios of games around 1.5 to 1. While Kraken is an excellent generic compressor, it struggled to find usable patterns on a crucial type of content : GPU textures, which make up a large fraction of game content. Since then we've made huge progress on improving the compression ratio of GPU textures, with Oodle Texture which encodes them such that subsequent Kraken compression can find patterns it can exploit. The result is that we expect the average compression ratio of games to be much better in the future, closer to 2 to 1.
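The numbers quoted for PS5 follow from simple multiplication. This sketch only restates the figures above (5.5 GB/s raw, 1.5:1 and 2:1 ratios, 16 GB of memory) and assumes decompression never becomes the bottleneck.

```python
def effective_gbps(raw_gbps, compression_ratio):
    """Decompressed data rate when the drive delivers compressed bytes."""
    return raw_gbps * compression_ratio

def seconds_to_fill(memory_gb, gbps):
    """Time to deliver memory_gb of decompressed data at a given rate."""
    return memory_gb / gbps

typical = effective_gbps(5.5, 1.5)    # the 8-9 GB/s range Sony published
improved = effective_gbps(5.5, 2.0)   # with the hoped-for 2:1 average
fill = seconds_to_fill(16, typical)   # roughly the "two seconds" quoted above
print(typical, improved, round(fill, 2))
```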

Reveal and Deep Dive Videos


NVMe SSD & I/O Performance Showcase Videos
Queued @2:01
 
Microsoft Xbox Series S/X

Official SSD & I/O Architecture Overview

Xbox.com
Xbox Velocity Architecture – The Xbox Velocity Architecture is the new architecture we’ve created for the Xbox Series X to unlock new capabilities never-before seen in console development. It consists of four components: our custom NVMe SSD, a dedicated hardware decompression block, the all new DirectStorage API, and Sampler Feedback Streaming (SFS). This combination of custom hardware and deep software integration allows developers to radically improve asset streaming and effectively multiply available memory. It will enable richer and more dynamic living worlds unlike anything ever seen before. It also effectively eliminates loading times, and makes fast travel systems just that: fast.
Xbox.com
The Xbox Velocity Architecture was designed as the ultimate solution for game asset streaming in the next generation. This radical reinvention of the traditional I/O subsystem directly influenced all aspects of the Xbox Series X design. If our custom designed processor is at the heart of the Xbox Series X, the Xbox Velocity Architecture is the soul. Through a deep integration of hardware and software innovation, the Xbox Velocity Architecture will power next-gen gaming experiences unlike anything you have seen before.
The Xbox Velocity Architecture comprises four major components: our custom NVME SSD, hardware accelerated decompression blocks, a brand new DirectStorage API layer and Sampler Feedback Streaming (SFS).
Let’s dive deep into each component:
Custom NVME SSD: The foundation of the Xbox Velocity Architecture is our custom, 1TB NVME SSD, delivering 2.4 GB/s of raw I/O throughput, more than 40x the throughput of Xbox One. Traditional SSDs used in PCs often reduce performance as thermals increase or while performing drive maintenance. The custom NVME SSD in Xbox Series X is designed for consistent, sustained performance as opposed to peak performance. Developers have a guaranteed level of I/O performance at all times and they can reliably design and optimize their games removing the barriers and constraints they have to work around today. This same level of consistent, sustained performance also applies to the Seagate Expandable Storage Card ensuring you have the exact same gameplay experience regardless of where the game resides.
Hardware Accelerated Decompression: Game packages and assets are compressed to minimize download times and the amount of storage required for each individual game. With hardware accelerated support for both the industry standard LZ decompressor as well as a brand new, proprietary algorithm specifically designed for texture data named BCPack, Xbox Series X provides the best of both worlds for developers to achieve massive savings with no loss in quality or performance. As texture data comprises a significant portion of the total overall size of a game, having a purpose built algorithm optimized for texture data in addition to the general purpose LZ decompressor, both can be used in parallel to reduce the overall size of a game package. Assuming a 2:1 compression ratio, Xbox Series X delivers an effective 4.8 GB/s in I/O performance to the title, approximately 100x the I/O performance in current generation consoles. To deliver similar levels of decompression performance in software would require more than 4 Zen 2 CPU cores.
New DirectStorage API: Standard File I/O APIs were developed more than 30 years ago and are virtually unchanged while storage technology has made significant advancements since then. As we analyzed game data access patterns as well as the latest hardware advancements with SSD technology, we knew we needed to advance the state of the art to put more control in the hands of developers. We added a brand new DirectStorage API to the DirectX family, providing developers with fine grain control of their I/O operations empowering them to establish multiple I/O queues, prioritization and minimizing I/O latency. These direct, low level access APIs ensure developers will be able to take full advantage of the raw I/O performance afforded by the hardware, resulting in virtually eliminating load times or fast travel systems that are just that . . . fast.
Sampler Feedback Streaming (SFS): Sampler Feedback Streaming is a brand-new innovation built on top of all the other advancements of the Xbox Velocity Architecture. Game textures are optimized at differing levels of detail and resolution, called mipmaps, and can be used during rendering based on how close or far away an object is from the player. As an object moves closer to the player, the resolution of the texture must increase to provide the crisp detail and visuals that gamers expect. However, these larger mipmaps require a significant amount of memory compared to the lower resolution mips that can be used if the object is further away in the scene. Today, developers must load an entire mip level in memory even in cases where they may only sample a very small portion of the overall texture. Through specialized hardware added to the Xbox One X, we were able to analyze texture memory usage by the GPU and we discovered that the GPU often accesses less than 1/3 of the texture data required to be loaded in memory. A single scene often includes thousands of different textures resulting in a significant loss in effective memory and I/O bandwidth utilization due to inefficient usage. With this insight, we were able to create and add new capabilities to the Xbox Series X GPU which enables it to only load the sub portions of a mip level into memory, on demand, just in time for when the GPU requires the data. This innovation results in approximately 2.5x the effective I/O throughput and memory usage above and beyond the raw hardware capabilities on average. SFS provides an effective multiplier on available system memory and I/O bandwidth, resulting in significantly more memory and I/O throughput available to make your game richer and more immersive.
Through the massive increase in I/O throughput, hardware accelerated decompression, DirectStorage, and the significant increases in efficiency provided by Sampler Feedback Streaming, the Xbox Velocity Architecture enables the Xbox Series X to deliver effective performance well beyond the raw hardware specs, providing direct, instant, low level access to more than 100GB of game data stored on the SSD just in time for when the game requires it. These innovations will unlock new gameplay experiences and a level of depth and immersion unlike anything you have previously experienced in gaming.
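The idea behind Sampler Feedback Streaming, loading only the parts of a mip level the GPU actually sampled, can be sketched with plain sets. Everything here is hypothetical: the tile size, the texture size, and the sample pattern are illustrative, and this is not the Direct3D API or the hardware feedback format.

```python
def tiles_touched(samples, tile_size):
    """Map sampled texel coordinates to the set of tiles that contain them."""
    return {(x // tile_size, y // tile_size) for x, y in samples}

def streamed_fraction(samples, mip_size, tile_size):
    """Fraction of a square mip level that must be resident for these samples."""
    tiles_per_side = mip_size // tile_size
    total_tiles = tiles_per_side ** 2
    return len(tiles_touched(samples, tile_size)) / total_tiles

# Hypothetical 4096x4096 mip split into 256-texel tiles (16 x 16 = 256 tiles).
# Only a 1024x1024 corner of the texture is visible, sampled every 64 texels.
samples = [(x, y) for x in range(0, 1024, 64) for y in range(0, 1024, 64)]
fraction = streamed_fraction(samples, mip_size=4096, tile_size=256)
print(fraction)  # 0.0625: 16 of 256 tiles need to be loaded
```

In this contrived case only 1/16 of the mip level is needed. Microsoft's own measurement quoted above is that the GPU often touches less than 1/3 of the loaded texture data.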
Articles
WindowsCentral
Microsoft's new custom SSD storage is central to Velocity Architecture on Xbox Series X, adopting an in-house NVMe solution, delivering unseen speeds in past generations. That provides 2.4 GB/s raw I/O throughput — or 4.8 GB/s compressed, enabled by a custom decompression block. Compared to the 120MB/s offered by Xbox One X, quick maths reveals up to 40 times increases could be a reality.
The hardware decompression block plays a vital role, allowing games to consume less space via compression on the SSD. That hardware is devoted to tackling run-time decompression, keeping games running smoothly without giving more work to the CPU. It uses Zlib, a general-purpose data-compression library, and a mysterious new system named "BCPack," geared to GPU textures.
We also have DirectStorage, building upon DirectX, and aimed at further reducing CPU workloads. The new Microsoft-built API seeks to optimize the efficiency of Xbox Series X asset streaming, with plans to expand to Windows devices moving forward. That couples with Sampler Feedback Streaming (SFS), streamlining GPU usage and loading only portions of textures demanded by a setting. Both provide software solutions that enhance the efficiency of games on Xbox Series X, taking full advantage of CPU and GPU gains.
PureXbox.com
As reported by IGN, the internal SSD provides you with 802GB to store your next-gen games, with the rest being taken up by OS and system files. It's not entirely unexpected, but it's a bit of a downer.
CNet
In an unscientific test, loading a Red Dead Redemption 2 save took around two minutes on an Xbox One X and about 30 seconds on the Xbox Series X.
Quick Resume feels like a game-changer. You can jump between games in about 10 seconds and Mike had four different games running at once. Games resume in the exact state you left them in, no reloading saves or returning to menu screens required.

Reveal and Deep Dive Videos


NVMe SSD & I/O Performance Showcase Videos
 
I believe NV had a demo of RTX IO NVMe storage showing improved loading times. I think it was the Marbles at Night demo, but I'm not sure.

That demo was mostly covering enhanced RT techniques and scaling/resolution performance when compared to the prior Turing architecture.
 
Nice, very competitive with RTX IO 14GB/S then.
You do realize that the 14GB/s number is just an assumption? They are not giving a real-world example, just as the Xbox Velocity Architecture figure assumes a 2:1 compression ratio and so doubles the speed, while Sony's 8-9GB/s is their typical number measured from games. Basically, Sony is showing a realistic number while RTX IO and Velocity are showing ideal numbers. We still know very little about RTX IO, such as whether the throughput is limited to 14GB/s or can scale higher if the compression ratio is higher (probably the latter). If it can scale, how far? Would a 4:1 compression ratio give 28GB/s, or is there a limit?
Afaik, for Xbox Series, the maximum is over 6GB/s (source), so if you feed it something with a 3:1 compression ratio it will not get a 3x multiplier to its speed but around 2.5x, because the decompressor in the Series X becomes the bottleneck. For PS5, that limit is 22GB/s (source). For RTX IO? The chance of RTX IO ending up faster than PS5's I/O, even in an apples-to-apples comparison, is high, but right now we still don't know much about it.
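The bottleneck argument in this post amounts to taking a minimum. A rough sketch, using only the figures quoted here (2.4 GB/s raw and an "over 6 GB/s" decompressor limit for Series X, 5.5 GB/s raw and 22 GB/s for PS5), with 6.0 treated as a conservative stand-in for "over 6":

```python
def delivered_gbps(raw_gbps, ratio, decompressor_cap_gbps):
    """Output rate is limited by whichever is slower: the drive or the decompressor."""
    return min(raw_gbps * ratio, decompressor_cap_gbps)

# Series X with hypothetical 3:1 data: the decompressor, not the SSD, sets the limit.
xbox = delivered_gbps(2.4, 3.0, 6.0)
xbox_multiplier = xbox / 2.4  # about 2.5x rather than 3x

# PS5 with the same 3:1 data stays under its quoted 22 GB/s limit.
ps5 = delivered_gbps(5.5, 3.0, 22.0)
print(xbox, round(xbox_multiplier, 2), ps5)
```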

Having said all of this, all of them use compression mainly to save space rather than to gain speed. The extra speed that comes from compression is just a bonus. You still make Xbox Series games expecting at least 2.4GB/s from the SSD, not 4.8GB/s, since 4.8GB/s can't be guaranteed. If we take the RTX IO number at face value, then the install size of a game targeting RTX IO (and Xbox Series) should be 50% smaller than an uncompressed install, while the same game on PS5 will only be around 30%-ish smaller. Or, to look at it from a different angle, Xbox Series install sizes should be around 25% smaller than the PS5 versions, at least in theory.
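The install-size percentages above come straight from the compression ratios. A quick check, assuming 2:1 for RTX IO and Xbox Series and 1.5:1 for PS5 as stated earlier in the thread:

```python
def size_reduction(ratio):
    """Fraction of space saved versus an uncompressed install."""
    return 1 - 1 / ratio

two_to_one = size_reduction(2.0)    # RTX IO / Xbox Series: 50% smaller
ps5_typical = size_reduction(1.5)   # PS5: about 33% smaller, the "30%-ish"

# Xbox install relative to the PS5 install of the same game.
xbox_vs_ps5 = 1 - (1 / 2.0) / (1 / 1.5)  # about 25% smaller
print(two_to_one, round(ps5_typical, 3), round(xbox_vs_ps5, 3))
```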
Also, if 2.4GB/s is the guaranteed speed for Xbox Series and 5.5GB/s for PS5, then what is the guaranteed speed for RTX IO games? I can tell you it will not be 7GB/s, probably not 3.5GB/s, and probably 100MB/s for the near future, because nobody will make RTX IO-only games. When an SSD (and DirectStorage) becomes a requirement, games on RTX IO will still not target 7GB/s. The future storage-speed target for PC games is probably either 600MB/s (because of SATA SSDs) or 2GB/s (around Xbox Series speed). If you have 7GB/s, everything will load quicker, but anything that affects gameplay, like PS5 RnC, will need to work on those slower SSDs.
 
Basically Sony is showing a realistic number while RTX IO and Velocity is showing an ideal number.

RTX IO is as 'realistic' as PS5's; we only have data from the companies behind the products. You'd have to measure each, which hasn't been done for any of them. I don't believe NV is lying, or MS or Sony; there can be other threads about corruption and lies, perhaps.
They all talk ideal situations.

Also, because of the Xbox/PC Velocity Architecture integration and the teaming up with NV, we'll probably see games taking advantage of NVMe/RTX tech on PC.
 
It's pretty clear that Microsoft is banking on Sampler Feedback Streaming to be the hero here and not just sheer throughput. It's pretty unfair to talk about the PS5's IO solution and then ignore a big part of the Xbox Series X/S's IO solution.
 
Only time and the games will tell...
Yup. Things like loading times and hitches in in-game streaming will be visible in games like Assassin's Creed Valhalla day one, i.e. in five-to-six weeks! :runaway: What we'll never know, unless a dev speaks out, is how any of these smart systems impacts performance because there's just no way to measure it outside the dev environment.
 
Does the new Nvidia card perform the decompression on dedicated hardware that's present on the card?

I'm wondering if it'd need to partition some CUDA cores for the decompression, if so then presumably an open world game requiring constant loading would impact performance.
 
You do realize that the 14GB/s number is just an assumption. They are not giving real world example, just like Xbox Velocity architecture assume a 2:1 compression ratio, thus you get double the speed while Sony 8-9GB/s number is their typical number using games. Basically Sony is showing a realistic number while RTX IO and Velocity is showing an ideal number.

This is not correct.

Nvidia have been explicit that 2:1 is the expected typical compression ratio with that resulting in a 2x effective uplift to IO. See here:

Nvidia said:
Compression ratios are typically 2:1, so that would effectively amplify the read performance of any SSD by 2x.

I see no reason to doubt this, especially with RAD Game Tools' recent announcement that the PS5 can achieve a similar real-world compression ratio. Heck, given that RTX IO uses shaders for decompression, and thus is not necessarily limited to one specific compression routine, it's entirely possible that it can also use Oodle Texture + Kraken just like the PS5.

We still know very little about RTX IO, whether the throughput is limited to 14GB/s or it can scale higher assuming the ratio is higher (probably the later).

Again, Nvidia have been explicit that it can:

Nvidia said:
RTX GPU's are capable of decompression performance beyond the limits of even Gen4 SSDs,

If it can scale then how far can it scale?

This we don't know; however, we can infer some scenarios. Nvidia have been quoted as saying the performance impact of RTX IO on the GPU is tiny:

Nvidia Interview said:
When asked about the performance hit of RTX IO on the GPU itself, an NVIDIA representative responded that RTX IO utilizes only a tiny fraction of the GPU, “probably not measurable”.

Which GPU, we don't know. But since we do know that RTX IO will work on all RTX GPUs, including Turing, we can assume a 2060 is sufficient to handle full-rate decompression at 14GB/s without crippling game performance, so a 3080 should be capable of 2-3x that.

is a 4:1 compression ratio will be 28GB/s or there is a limit on that? Afaik, for Xbox series, the maximum is over 6GB/s (source) so assuming that you put something in it that has 3:1 compression ratio it will not have a 3x multiplier to its speed but around 2.5 because the decompressor in series X will become the bottleneck. For PS5, that limit is 22GB/s (source). For RTX IO? Of course the possibility of RTX IO ended up being faster than PS5 IO even in apple vs apple comparison is high, but right now we still don't know much about it.

Compression ratios are determined by the routine that's used. This is fixed on the consoles because they use hardware decompression blocks. So Microsoft is limited by the compression ratio of BCPack and Sony is limited by the compression ratio of Kraken (with RDO-encoded textures). Both end up around 2:1 and that's unlikely to go any higher. RTX IO appears to use compute shaders, so it may be able to leverage new, higher-ratio compression algorithms in the future if such a thing becomes available, but I wouldn't count on it. It's probably a good assumption that we'll be at 2:1 for the remainder of this generation unless ML texture upscaling becomes a thing (in which case Nvidia would have a massive advantage thanks to the Tensor cores).

So scaling will come from the drive speed itself. That's fixed in both consoles, and effectively fixed on PC until the launch of PCIe 5.0, at which point we'll probably be onto RTX IO2 anyway!
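
To put rough numbers on the exchange above, here's a minimal sketch of the arithmetic being discussed: effective throughput is raw drive speed times compression ratio, clamped by whatever the decompressor can output. The function names are made up for illustration, the Xbox cap is taken as a flat 6.0GB/s from the "over 6GB/s" figure quoted above, and RTX IO is left uncapped only because no cap has been published, not because there isn't one.

```python
def effective_throughput(raw_gbps, ratio, decompressor_cap_gbps=None):
    """Decompressed GB/s: raw speed * compression ratio, clamped by the
    decompressor's maximum output rate when one is known (assumed model)."""
    uncapped = raw_gbps * ratio
    if decompressor_cap_gbps is None:
        return uncapped
    return min(uncapped, decompressor_cap_gbps)


def effective_multiplier(raw_gbps, ratio, decompressor_cap_gbps=None):
    """How many times faster than the raw drive speed the data arrives."""
    return effective_throughput(raw_gbps, ratio, decompressor_cap_gbps) / raw_gbps


if __name__ == "__main__":
    # Figures quoted in this thread; the Xbox cap of 6.0 is an assumption
    # standing in for "over 6GB/s".
    print("Series X @ 3:1:", effective_throughput(2.4, 3.0, 6.0), "GB/s")
    print("PS5      @ 4:1:", effective_throughput(5.5, 4.0, 22.0), "GB/s")
    print("RTX IO   @ 2:1:", effective_throughput(7.0, 2.0), "GB/s (no known cap)")
```

With these inputs the Series X case comes out at the cap (a 2.5x multiplier rather than 3x), which matches the quoted post. The 4:1 = 28GB/s case for RTX IO only holds if nothing else caps it, and that's exactly the part we don't know.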

Also, if 2.4GB/s is the guaranteed speed for Xbox Series and 5.5GB/s for PS5, then what is the guaranteed speed for RTX IO games? I can tell you, it will not be 7GB/s, probably not 3.5GB/s, and probably 100MB/s for the near future, because nobody will make RTX IO-only games. When SSDs (and DirectStorage) become a requirement, games on RTX IO will still not target 7GB/s. The storage speed target for PC games in the future will probably be either 600MB/s (because of SATA SSDs) or 2GB/s (around Xbox Series speed). If you have 7GB/s, everything will load quicker, but for something involving gameplay like PS5 RnC, they will need to make it work on those slower SSDs.

That's why games scale. Just as cutting-edge AAA titles can run on PC graphics hardware much weaker than the consoles while scaling up beyond console graphics at the high end, so too will storage requirements scale. There are plenty of ways to scale storage requirements, e.g. reduce texture resolution, reduce LOD/draw distance, add in-game load screens where required (like Oblivion), pre-cache more on systems with plenty of RAM, etc.

There's no reason why games can't take full advantage of the highest speeds available from RTX IO while still scaling all the way down to SATA SSDs and maybe even mechanical HDDs - although I do expect SSDs to become a minimum requirement on many PC games in the near-to-mid term.
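
As a rough illustration of how wide that scaling range is, here's a small sketch that works out how long the same chunk of compressed data takes to read at each of the storage speeds mentioned in this exchange. The tier names and the 6GB payload are assumptions picked purely for the example; only the speeds come from the posts above.

```python
# Raw read speeds in GB/s, taken from the figures quoted in this thread.
TIERS_GBPS = {
    "mechanical HDD": 0.1,
    "SATA SSD": 0.6,
    "NVMe at ~Xbox Series speed": 2.0,
    "Gen4 NVMe": 7.0,
}


def load_seconds(payload_gb, raw_gbps):
    """Seconds to read payload_gb of on-disk data at raw_gbps."""
    return payload_gb / raw_gbps


def load_table(payload_gb):
    """Map each storage tier to its read time for the same payload."""
    return {name: load_seconds(payload_gb, speed)
            for name, speed in TIERS_GBPS.items()}


if __name__ == "__main__":
    # 6GB is a hypothetical payload, not a figure from any real game.
    for name, seconds in load_table(6.0).items():
        print(f"{name:28s} {seconds:6.1f} s")
```

The gap between the slowest and fastest tier is 70x, which is why the options listed above (lower texture resolution, load screens, more pre-caching into RAM) matter at the bottom end.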
 
Does the new Nvidia card perform the decompression on dedicated hardware that's present on the card?

I'm wondering if it'd need to partition some CUDA cores for the decompression, if so then presumably an open world game requiring constant loading would impact performance.

No, it's done in shaders (async). I don't think anything is specifically partitioned off for this, especially given that it scales back to the Turing generation. It's simply down to the developer to allocate whatever portion of the GPU they feel is appropriate for their game (although it's unclear how much control they'll have over this; my guess is not much). Either way, the performance impact is supposedly very small:

https://www.back2gaming.com/guides/nvidia-rtx-io-in-detail/

When asked about the performance hit of RTX IO on the GPU itself, an NVIDIA representative responded that RTX IO utilizes only a tiny fraction of the GPU, “probably not measurable”. Developers will have full freedom how they utilize RTX IO especially for games that are GPU-intensive, the developer understands the needs best and will have the best knowledge in which method to do.
 
If it's both ~14GB/s decompression and only a "tiny fraction" of the GPU, then that's very impressive. Unfortunately we don't know what proportion of the GPU is required for 100MB/s compared to 14GB/s.

The IO hardware on the Series X doesn't take up much space on the APU. It will be interesting to see how it compares to the PS5: if that console is estimated to be twice as fast, will the IO hardware be twice as large?

I wonder why they decided to have dedicated hardware for decompression if the GPUs are so good at it. At least then a non-open world game could have more shader power when not decompressing. Well, a little bit extra.
 

Attachment: mU23LQCtEe9ePVZe3DpU7S-1200-80.jpg
Nvidia have been explicit that 2:1 is the expected typical compression ratio with that resulting in a 2x effective uplift to IO. See here:
Except compression ratios in games typically don't even reach 2:1 on average.
Unless they're claiming those compression ratios already count the delta color compression that is natively processed by the GPU, at which point the 14GB/s statement is just dishonest.

Ampere's 14GB/s claim is similar to the PS5's claimed theoretical max of 22GB/s. Don't be surprised if the PS5 gets closer to 22GB/s than any Ampere gets to 14GB/s*
* - assuming the graphics card hasn't died of a faulty PCB yet.


Besides, Sony is showing measured throughput numbers backed up by developers, with actual games running on it.
So far Nvidia has shown a slide and PR answers to an online FAQ.
 