The point of my comparison: "work smarter, not harder".
AMD appears to be using primitive shaders "to work smarter" in RDNA2, so we already have proof, and that's before developers write mesh shaders.
Primitive shaders at least originally still slot into the traditional pipeline, which mesh shaders dispense with.
Some of the bottlenecks mesh shaders discard remain, although it's possible the description of what primitive shaders do has changed since AMD last discussed them in any depth with Vega. RDNA's primitive shaders have peak figures that are significantly more modest than what Vega promised and failed to deliver.
The primary culling benefit primitive shaders provide is something Nvidia mentioned off-hand as a possibility for mesh shaders, if developers felt the need. Nvidia cited one specific case where they seemed to allow for a benefit, but it sounded like Nvidia's own front-end hardware was capable enough that the additional step wasn't necessary.
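To make "the primary culling benefit" concrete, here's a rough sketch of the kind of per-triangle frustum/backface rejection a primitive or mesh shader can do before the fixed-function rasterizer ever sees the geometry. Python just to show the math; the function name, tuple layout, and winding convention are mine, not any vendor's actual API.

```python
def should_cull(v0, v1, v2):
    """Return True if the clip-space triangle (x, y, z, w) can be discarded
    before rasterization. Illustrative only, not any vendor's pipeline."""
    verts = (v0, v1, v2)
    # Frustum cull: all three vertices outside the same clip plane.
    for axis in range(3):  # x, y, z
        if all(v[axis] > v[3] for v in verts):   # beyond the +w plane
            return True
        if all(v[axis] < -v[3] for v in verts):  # beyond the -w plane
            return True
    # Backface cull via the signed area of the projected triangle.
    x0, y0 = v0[0] / v0[3], v0[1] / v0[3]
    x1, y1 = v1[0] / v1[3], v1[1] / v1[3]
    x2, y2 = v2[0] / v2[3], v2[1] / v2[3]
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    # Assuming counter-clockwise front faces in a y-up projection;
    # real pipelines let you pick the winding.
    return area <= 0.0
```

The payoff is that triangles rejected here never consume fixed-function primitive-setup throughput, which is exactly where the traditional front end bottlenecks.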
Yes, it appears Nvidia is spending way more die area on ray acceleration... and all the comparisons so far are based on code optimised for that hardware.
Without getting a look at the silicon, the assumption is that Nvidia is adding more than the scheme in AMD's patent, which indicates minimal extra area. In absolute terms, it may be a comparison between different single-digit percentages of overall die area.
There's no doubt that you don't do a dumb port of a MIMD algorithm to SIMD hardware; we've got over 10 years of proof of that.
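A toy cost model of why the dumb port hurts: under SIMD execution masking, a wave runs until its slowest lane finishes, while independent MIMD cores each pay only for their own thread. The numbers and function names below are purely illustrative, not modelled on any real ISA.

```python
def simd_wave_cost(lane_iters):
    """Lanes are masked off as they finish, but the wave occupies the
    SIMD until the slowest lane is done: cost is the max iteration count."""
    return max(lane_iters)

def mimd_avg_cost(lane_iters):
    """Independent cores each pay only their own iteration count:
    cost per thread is the average."""
    return sum(lane_iters) / len(lane_iters)

# 32 'rays' with wildly different traversal depths:
iters = [1] * 31 + [100]
# simd_wave_cost(iters) -> 100; mimd_avg_cost(iters) -> ~4.1
```

One long-running lane inflates the whole wave's cost by ~25x in this example, which is why divergent algorithms get restructured (sorting, binning, persistent threads) rather than ported as-is.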
There's a long-standing synergy between rasterization, SIMD, caches, and DRAM. A lot of the sizing of the various elements, like screen tiles and their associated SIMD processors, is driven by the fact that DRAM buses and DRAM arrays work very well at those granularities. There may need to be a more thorough accounting of what can be changed. Rasterization is nice in that there's now an established set of techniques for rapidly building its acceleration structures on the fly, and the common case aligns with the hardware and memory architectures. Cache concepts and DRAM have changed even more slowly than the fixed-function pipeline.
A BVH isn't built at the same time as the geometry is being rasterized, and a lot of the research and complexity goes into trying to fit a divergent workload into the confines of an overall architecture that is not well-suited to it, all the way out to DRAM.
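A minimal traversal sketch shows where the divergence comes from: each ray pops a data-dependent sequence of nodes off its own stack, so neighbouring rays in a wave chase different pointers through memory, which is exactly the access pattern caches and DRAM dislike. This uses 1-D "AABBs" and a point query purely for brevity; the node layout is invented for illustration.

```python
# Node: (lo, hi, left_child, right_child, leaf_id). Internal nodes
# have leaf_id None; leaves have no children.
NODES = {
    0: (0.0, 8.0, 1, 2, None),
    1: (0.0, 4.0, None, None, 'A'),
    2: (4.0, 8.0, 3, 4, None),
    3: (4.0, 6.0, None, None, 'B'),
    4: (6.0, 8.0, None, None, 'C'),
}

def traverse(point):
    """Stack-based BVH walk for a point query.
    Returns (leaves hit, nodes visited) -- the visit count is the
    data-dependent part that diverges between rays."""
    hits, visited, stack = [], 0, [0]
    while stack:
        lo, hi, left, right, leaf = NODES[stack.pop()]
        visited += 1
        if not (lo <= point <= hi):
            continue  # missed this box: prune the whole subtree
        if leaf is not None:
            hits.append(leaf)
        else:
            stack.extend((left, right))
    return hits, visited
```

Two queries landing in different parts of the tree visit different node counts in a different order, so a wave of them shares neither control flow nor cache lines.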
A custom BVH traversal algorithm that can sample from a texture might be useful. Too bad the sampling would compete directly with the BVH program running through the same cache, at least in AMD's case.