AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Long time, no see... :)

Does today's screen-refresh hardware (RAMDACs are long gone, but that's where I'm coming from here) still need a contiguous memory area to read from for refresh, or can it read individual tiles and assemble the screen out of them?
 
I think at 4K they would almost certainly want to pin the framebuffer, because otherwise it and the texture data would just flush everything every frame, leaving them with only an advantage from locality in address and no benefit from locality in time.
RDNA inherited binning rasterizers from Vega, didn't it?

Moreover, Linux driver patches do seem to hint that memory pages can be marked LLC No Allocate. So in theory, the driver can mark the pages of a resource to skip the LLC, either based on presets or, in a fancier way, on live performance counters from the LLC.
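If that's the case, the decision logic itself could be quite simple. A toy sketch of how such a policy might look (this is not the actual amdgpu driver logic; the preset names and the hit-rate threshold are made up):

```python
# Toy sketch of how a driver might pick which resources get an "LLC no-allocate"
# page attribute. Not real amdgpu code; names and threshold are hypothetical.

PRESET_NOALLOC = {"streaming_upload", "write_once_readback"}  # hypothetical presets

def should_skip_llc(resource_kind: str, llc_hit_rate: float) -> bool:
    """Bypass the LLC if the resource is on a preset list, or if live LLC
    performance counters say it barely hits anyway (0.05 is a made-up cutoff)."""
    if resource_kind in PRESET_NOALLOC:
        return True
    return llc_hit_rate < 0.05

print(should_skip_llc("texture", llc_hit_rate=0.62))            # False: keep cacheable
print(should_skip_llc("streaming_upload", llc_hit_rate=0.90))   # True: preset says bypass
```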
 
I'm not convinced; reading a 4K frame buffer at 100 FPS requires only about 3.2 GB/s, and it could be even less with compression.
GPUs render the frame buffer in a tiled fashion, frequently reusing tiles, which keeps them from being evicted from the cache by other data.
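For reference, the back-of-the-envelope arithmetic behind that 3.2 GB/s figure (a quick sketch, assuming a 32-bit-per-pixel format and no compression):

```python
# Scan-out bandwidth estimate for an uncompressed 4K frame buffer.
WIDTH, HEIGHT = 3840, 2160     # 4K UHD
BYTES_PER_PIXEL = 4            # assuming an 8-bit RGBA / 32 bpp format
FPS = 100

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL    # bytes read per refresh
bandwidth_gbps = frame_bytes * FPS / 1e9          # GB/s needed just for scan-out

print(f"Frame size: {frame_bytes / 2**20:.1f} MiB")      # ~31.6 MiB
print(f"Scan-out read rate: {bandwidth_gbps:.2f} GB/s")  # ~3.32 GB/s at 100 FPS
```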

There are lots of other buffers to worry about too that are read multiple times each frame. I would think tiling is less helpful for chip-wide L2/L3 caches where there is no tile-based affinity.
 
Do you really think AMD, who designed the SoCs for the XSX and PS5, doesn't have a full-fledged implementation of DS? No-one likes proprietary standards, not even Nvidia. The scant adoption of RTX before it became part of DirectX shows as much.

But what is a full-fledged implementation of DS? No-one knows yet; to date, details have been fairly light. Does it mandate that data transfers directly from SSD to GPU via P2P DMA without any involvement from the CPU, and that decompression of the IO stream is done on the GPU? Or is it (as Microsoft's commentary so far suggests) just an API for greatly lowering the system overhead of IO requests, with all that other stuff being a proprietary Nvidia (RTX-IO) solution?

I seriously hope it's the former, and the Anandtech commentary posted above gives us some hope in that regard, but if it were, I would have expected AMD to make that very clear in their own announcement given the amount of attention RTX-IO has received. The fact that they didn't has me worried. This would have been a very, very easy thing for them to shut down in the reveal, but they didn't. I hope I'm just being overly pessimistic.

Either way, DS is not really relevant for the PC market at the moment and won't be for some time due to the fragmented nature of PC hardware and software. It will be a while before an SSD is a hard requirement for a PC game.

I'd definitely disagree on this. Direct Storage is needed right now. Plenty of cross-platform games will be supporting it within weeks on the XSX, and if it were available on PC then those advantages would carry over for anyone who has an NVMe drive. The game doesn't have to make NVMe or even a regular SSD a requirement for those who have them to be able to take advantage of faster load times.

It's also worth noting that AC Valhalla does state an SSD as a requirement in all but the lowest of presets, and even there it recommends one. I expect other games to follow suit in short order.

Maybe they want to upsell 3070 buyers to the 6800. For $80 more you get double the VRAM. Not only is the base VRAM bandwidth faster, but the 6800 also has Infinity Cache. No matter how you look at it, the 6800 is a much better buy than the 3070, especially if you mainly game and don't care about tensor and CUDA stuff. Of course, after that you might think "add $70 more and you get the 6800 XT".....

This is exactly where my thinking is right now. I was mostly settled on a 3070 before both launched, but that performance and those prices have me thinking. That said, there are other factors to consider which we don't have enough information about yet to make a fully informed choice, so I'm going to wait a little longer. Those being RT performance, Direct Storage capabilities (i.e. does RDNA 2 replicate what RTX-IO is doing?), DLSS/AMD's equivalent, and of course Nvidia's potentially very imminent 3070 Ti.


I agree with the overall sentiment, but if this forces NV and Intel to address the same weakness in the PC's dGPU-based architecture then I'm all for it. The more our separate CPUs and GPUs with separate memory pools can act as if they are on a UMA (without the inherent drawbacks), the better for the PC platform as a whole.
 
wow, Nappe1. Haven't seen you for years!
Yep, last time I posted here, it was 2013. Heck, in one of my last threads I talked about "framing memories" in my and my girlfriend's new rental flat... and that was 2011! Even though we got married and moved to our own apartment, the chips are still framed and on the wall as a display. Too bad that few people nowadays know they are looking at a unicorn and its doo doo while watching them. :D
From my real "active" years it has been almost two decades. It was the 2002-2004 time frame when everything I was interested in went horribly, horribly wrong.

I did notice the launch of Iris Pro, but it was not enough to get me coming back. However, when there's mention of a remarkable amount of on-chip RAM in a GFX chip, you bet I am reading the information. :) I pretty quickly calculated the same maths as people have done here: 128 MiB with a 4096-bit bus, which is most likely divided into two 2048-bit wide parts, each serving four 32-bit wide external memory channels. I also started to wonder how they have designed the frame buffer writes and readouts so that they don't ruin the cache efficiency too much....

So, any idea if this is ArtX-style SRAM or perhaps Iris Pro-style eDRAM? I am betting on the first, just because of AMD/ATI's history, and SRAM at least used to be easier to approach than eDRAM.
 
I did notice the launch of Iris Pro, but it was not enough to get me coming back. However, when there's mention of a remarkable amount of on-chip RAM in a GFX chip, you bet I am reading the information. :) I pretty quickly calculated the same maths as people have done here: 128 MiB with a 4096-bit bus, which is most likely divided into two 2048-bit wide parts, each serving four 32-bit wide external memory channels.

That's not the math. Most likely, it's split into 16 entirely separate cache slices, each of which serves 512 bits per cycle. It's instructive to note that it's fundamentally quite similar to the L3 in modern Zen CPUs, just with (probably) cache lines that are twice as long and twice the bus width per slice.

On GDDR6, the external memory channels are 16 bits wide, and there are two of them per chip. So one 8 MB, 512-bit cache slice per memory channel.

And yes, it's definitely SRAM. eDRAM is gone; it's not compatible with modern logic lithography.
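A quick sanity check of that layout against the known totals (a sketch assuming the 16-slice split described above and Navi 21's 256-bit GDDR6 bus):

```python
# Sanity-check the proposed Infinity Cache layout: one slice per GDDR6 channel.
TOTAL_CACHE_MIB = 128
BUS_WIDTH_BITS = 256          # Navi 21's external GDDR6 bus
CHANNEL_WIDTH_BITS = 16       # GDDR6 channels are 16 bits wide, two per chip
SLICE_PORT_BITS = 512         # each cache slice serves 512 bits per cycle

channels = BUS_WIDTH_BITS // CHANNEL_WIDTH_BITS      # 16 memory channels
mib_per_slice = TOTAL_CACHE_MIB / channels           # 8 MiB per slice
aggregate_port_bits = channels * SLICE_PORT_BITS     # total cache port width per cycle

print(f"{channels} slices x {mib_per_slice:.0f} MiB = {channels * mib_per_slice:.0f} MiB")
print(f"Aggregate cache port width: {aggregate_port_bits} bits/cycle")
```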
 
Nvidia's potentially very imminent 3070Ti.
Just buy it. You'll be waiting a year+ before DirectStorage actually makes a difference to any games.

I agree with the overall sentiment, but if this forces NV and Intel to address the same weakness in the PC's dGPU-based architecture then I'm all for it.
How do you know NVidia isn't already doing this? Why would Intel be involved?
 
That's not the math. Most likely, it's split into 16 entirely separate cache slices, each of which serves 512 bits per cycle. It's instructive to note that it's fundamentally quite similar to the L3 in modern Zen CPUs, just with (probably) cache lines that are twice as long and twice the bus width per slice.

On GDDR6, the external memory channels are 16 bits wide, and there are two of them per chip. So one 8 MB, 512-bit cache slice per memory channel.

And yes, it's definitely SRAM. eDRAM is gone; it's not compatible with modern logic lithography.

Thanks for the info... it makes perfect sense. Nevertheless, I am still interested in how they cope with the frame buffers...
Sorry for being soooo outdated, but as I asked in my previous post, how does screen refresh work nowadays? In RAMDAC days, you had to have a contiguous front buffer for the RAMDAC to read the scanlines from, sending out the analog RGB values that the electron tube then painted onto the screen. Do you need such a thing anymore, or can the rendered tiles be read straight out to the screen?
 
On GDDR6, the external memory channels are 16 bits wide, and there are two of them per chip. So one 8 MB, 512-bit cache slice per memory channel.
You sure they've split it into (more than two) slices? I don't think many of the theorized uses for it would really like small slices like that. I'd put my money on either 64 or 32 MB slices.
 
You sure they've split it into (more than two) slices? I don't think many of the theorized uses for it would really like small slices like that. I'd put my money on either 64 or 32 MB slices.

Then you'd have to either deal with more than one access per slice per clock, or with very wide buses. It's much easier to just split the cache into slices that each conveniently serve one request per clock.

It's hinted at in AMD's slide here:

[AMD Infinity Cache slide]

But in the presentation they referred to the cache being based on what's used in Zen, and Zen caches are split into 4 MB slices (for Zen 2 at least).

(edit) And this has no impact on use. All addresses are spread across the slices based on a few of the low bits in the address, so any client will access all of the slices evenly, except the ROPs, which are probably homed to a specific slice.
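Roughly the idea, as a toy model (not AMD's actual hash; the slice count and the 128-byte line size are just the assumptions from above):

```python
# Toy model of interleaving cache lines across slices using low address bits.
NUM_SLICES = 16       # assumed slice count from the discussion above
LINE_BYTES = 128      # assumed line size (twice Zen's 64-byte lines)

def slice_for_address(addr: int) -> int:
    """Pick a slice from the bits just above the line offset, so consecutive
    cache lines land on consecutive slices and any client's traffic spreads
    evenly across all of them."""
    return (addr // LINE_BYTES) % NUM_SLICES

# A linear walk through memory touches every slice in turn:
for addr in range(0, 16 * LINE_BYTES, LINE_BYTES):
    print(f"address {addr:#07x} -> slice {slice_for_address(addr)}")
```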
 
Just buy it. You'll be waiting a year+ before DirectStorage actually makes a difference to any games.

I certainly hope it won't be that long before we see basic DS integration, but we know both RDNA2 and Ampere support that, so no worries there. If CPU bypass and GPU decompression are Nvidia exclusives though, then yes, I agree we could be looking at those timescales before we see games using it. And in fact usage is likely to be limited in that case, so arguably it won't matter that much. Still, it'll make RDNA2 more attractive to me if it does support the same functionality as RTX-IO as a fundamental part of Direct Storage.

How do you know NVidia isn't already doing this? Why would Intel be involved?

I assumed that since you need a 5000-series CPU and a 500-series motherboard to make this work, there is some specific enablement required on the CPU/platform side.
 
NVidia marketing designed to make stupid people think it's something more than DirectStorage.

I'm actually not sure that is the case here. As someone commented earlier, DirectStorage does not seem to have specifics with respect to decompression.

For instance, for the XSX, Microsoft seems to be specific and separates out hardware decompression and DirectStorage as distinct parts of its "Velocity Architecture"

https://news.xbox.com/en-us/2020/07/14/a-closer-look-at-xbox-velocity-architecture/

The Xbox Velocity Architecture comprises four major components: our custom NVME SSD, hardware accelerated decompression blocks, a brand new DirectStorage API layer and Sampler Feedback Streaming (SFS).

The way it's phrased in the DX blog also suggests that decompression is handled as a separate entity, while DirectStorage itself is just an API that is more efficient at handling data I/O requests - https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/

Similarly, going by how it's phrased in the Ampere whitepaper, RTX IO is their term for the GPU hardware decompression that DirectStorage can leverage.

Now presumably AMD will have some similar mechanism in place (as it doesn't seem like a hard challenge); it could just be that they chose not to name it (or haven't) yet.
 
I'm actually not sure that is the case here. As someone commented earlier, DirectStorage does not seem to have specifics with respect to decompression.

DirectStorage has to come with some set of recommended/supported algorithms. Otherwise there would have to be install-time compression to the format preferred by the user's system. This would potentially lead to all kinds of problems, such as making game engine development/optimization tricky or impossible, as the engine wouldn't know how the compressed data is laid out on disc. It would also lead to complications when updating games (decompress game, update, compress again). We will get to know the algorithms used once Microsoft releases the DirectStorage beta some time next year.
 
NVidia marketing designed to make stupid people think it's something more than DirectStorage.

Stupid people and the entire gaming media prior to yesterday, it would seem. Unless I've missed a publication or Microsoft announcement explaining that RTX-IO specifically does nothing outside of the base Direct Storage functionality, because I've certainly seen plenty of articles suggesting the opposite.

Here's Microsoft's explanation of what Direct Storage is. They go into quite a bit of detail here but make no mention of GPU-based decompression or direct SSD-to-GPU data transfers:

https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/

And here's Nvidia's explanation of RTX-IO:

https://www.nvidia.com/en-gb/geforce/news/rtx-io-gpu-accelerated-storage-technology/

Nvidia said:
NVIDIA RTX IO plugs into Microsoft’s upcoming DirectStorage API, which is a next-generation storage architecture designed specifically for gaming PCs equipped with state-of-the-art NVMe SSDs, and the complex workloads that modern games require. Together, the streamlined and parallelized APIs, specifically tailored for games, allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSD to your RTX IO-enabled GPU.

Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed while being delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in its more efficient, compressed form, and improving I/O performance by a factor of 2.

Note the plural "APIs". This isn't a case of RTX-IO = Direct Storage re-branded; these are two separate technologies working in tandem.

Here is what TechPowerUp thinks of it:

https://www.techpowerup.com/271705/...stack-here-to-stay-until-cpu-core-counts-rise

With rise in storage bandwidth, the IO load on the CPU rises proportionally, to a point where it can begin to impact performance. Microsoft sought to address this emerging challenge with the DirectStorage API, but NVIDIA wants to build on this.

....

NVIDIA RTX IO is a concentric outer layer of DirectStorage, which is optimized further for gaming, and NVIDIA's GPU architecture. RTX IO brings to the table GPU-accelerated lossless data decompression,

....

There is, however, a tiny wrinkle. Games need to be optimized for DirectStorage. Since the API has already been deployed on Xbox since the Xbox Series X, most AAA games for Xbox that have PC versions, already have some awareness of the tech, however, the PC versions will need to be patched to use the tech. Games will further need NVIDIA RTX IO awareness, and NVIDIA needs to add support on a per-game basis via GeForce driver updates.

I guess even Digital Foundry would fall into the stupid category:

Digital Foundry said:
Working alongside the DirectStorage API built into the upcoming Xbox Series X, RTX IO "enables rapid GPU-based loading and game asset decompression, accelerating input/out performance by up to 100x compared with hard drives and traditional storage APIs." That should allow for higher frame-rates as well as "near-instantaneous game loading" - not bad if it lives up to that description!

To be clear, I'm not saying that direct data transfers from SSD to GPU, along with GPU-based decompression, aren't a fundamental and mandatory requirement of Direct Storage support - I would be very happy if they were. But I am saying that none of the information we've had on it to date suggests that they are, and much of that information at least hints that they may not be.
 