AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Are the ray tracing units on Navi21 doing the denoising too?
AMD's Ray Accelerators take the definition of a ray and work out whether the ray passes either:
  • into the space defined by a cuboid
  • onto the surface of a triangle
The idea is to find all the triangles that a ray touches. First the ray is traced through a set of nested cuboids. Once the smallest cuboid has been identified, the set of triangles inside it is examined to find the triangle that the ray hits. (A small software sketch of this two-stage traversal follows below.)
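Here is a minimal software sketch of that two-stage test. The node layout, function names and the use of Python are my own illustration, not AMD's hardware interface: a slab test against axis-aligned boxes, then a Möller-Trumbore test against the triangles in the smallest box the ray enters.

Code:
import numpy as np

def ray_hits_box(origin, inv_dir, box_min, box_max):
    # Slab test: does the ray pass into the space defined by the cuboid?
    t1 = (box_min - origin) * inv_dir
    t2 = (box_max - origin) * inv_dir
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    return t_far >= max(t_near, 0.0)

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-8):
    # Moeller-Trumbore: does the ray land on the surface of the triangle?
    edge1, edge2 = v1 - v0, v2 - v0
    h = np.cross(direction, edge2)
    a = np.dot(edge1, h)
    if abs(a) < eps:
        return None                      # ray is parallel to the triangle
    f = 1.0 / a
    s = origin - v0
    u = f * np.dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, edge1)
    v = f * np.dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * np.dot(edge2, q)
    return t if t > eps else None        # distance along the ray, or no hit

def traverse(node, origin, direction):
    # Descend nested cuboids; only triangles in boxes the ray enters are tested.
    # Assumes direction has no zero components, to keep the slab test short.
    inv_dir = 1.0 / direction
    hits, stack = [], [node]
    while stack:
        n = stack.pop()
        if not ray_hits_box(origin, inv_dir, n["min"], n["max"]):
            continue
        if "tris" in n:                  # leaf: the smallest cuboid on this path
            for v0, v1, v2 in n["tris"]:
                t = ray_hits_triangle(origin, direction, v0, v1, v2)
                if t is not None:
                    hits.append(t)
        else:
            stack.extend(n["children"])
    return min(hits, default=None)       # nearest hit along the ray

# Tiny example: a root box containing one leaf with a single triangle at z = 0.5.
tri = (np.array([0., 0., .5]), np.array([1., 0., .5]), np.array([0., 1., .5]))
leaf = {"min": np.zeros(3), "max": np.ones(3), "tris": [tri]}
root = {"min": np.zeros(3), "max": np.ones(3), "children": [leaf]}
print(traverse(root, np.array([.25, .25, 0.]), np.array([.01, .01, 1.])))  # ~0.5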

Denoising takes a set of ray tracing results and fills in the gaps. It is a computational step after all of the rays have been traced. If you keep the ray results for several frames, then you can also apply temporal denoising, which is similar to temporal anti-aliasing.

So denoising is a computational process that doesn't directly involve tracing rays.

It can be argued that a denoiser could be improved by selectively tracing new rays where the noise is worst. Also, you can argue that by measuring noise over time, you can work out where to "bias" the tracing of rays to reduce noise in the worst parts of the frame most aggressively and as quickly as possible. So you could call this an "adaptive denoising tracer" or something...
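As a toy illustration of that "adaptive denoising tracer" idea (nothing below reflects an actual driver or hardware feature; the noise metric and names are invented for the example): keep a per-pixel variance estimate over recent frames and spend the extra ray budget in proportion to it.

Code:
import numpy as np

def allocate_extra_rays(variance, budget):
    # Hand out a fixed extra-ray budget in proportion to estimated per-pixel noise.
    # variance: 2D array of per-pixel variance accumulated over recent frames.
    # budget:   total number of additional rays affordable this frame.
    weights = variance / variance.sum()
    extra = np.floor(weights * budget).astype(int)
    # Spend the rays lost to rounding on the noisiest pixels.
    leftover = budget - extra.sum()
    noisiest = np.argsort(variance, axis=None)[::-1][:leftover]
    np.add.at(extra.ravel(), noisiest, 1)
    return extra   # extra[y, x] = how many new rays to trace for that pixel

# Example: a 4x4 frame where the lower-right quadrant is by far the noisiest.
variance = np.ones((4, 4))
variance[2:, 2:] = 8.0
print(allocate_extra_rays(variance, budget=64))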

https://alain.xyz/blog/ray-tracing-denoising

[Image: ideal-denoiser.svg]


Denoising can help bridge the gap between a low sample-per-pixel image and ground truth by reusing previous samples through spatio-temporal reprojection, adaptively resampling radiance or statistical information for importance sampling, and using filters such as fast Gaussian/bilateral filters or AI techniques like denoising autoencoders and upscaling through super sampling.

While denoising isn't perfect (temporal techniques can introduce a lag in radiance, and any filter will introduce some loss of sharpness since it blurs the original image), guided filters can help maintain sharpness, and adaptive sampling or increasing samples per pixel for each frame can make the difference between denoised and ground-truth images negligible. Still, there's no substitute for higher samples per pixel, so experiment with these techniques with different sample per pixel (spp) counts.
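One of the filters the quoted post mentions, a bilateral filter, can be sketched in a few lines. This is a plain single-frame version for illustration only; a real denoiser would also be guided by normals, depth and the temporal history.

Code:
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=1.5, sigma_range=0.1):
    # Gaussian blur whose weights are also reduced for neighbours whose value
    # differs a lot from the centre pixel, so edges stay sharp while noise averages out.
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_space**2))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rng = np.exp(-((window - img[y, x]) ** 2) / (2 * sigma_range**2))
            weights = spatial * rng
            out[y, x] = (weights * window).sum() / weights.sum()
    return out

# A noisy step edge: the filter removes most of the noise but keeps the edge.
clean = np.concatenate([np.zeros((8, 8)), np.ones((8, 8))], axis=1)
noisy = clean + np.random.default_rng(0).normal(0.0, 0.05, clean.shape)
print(np.abs(bilateral_filter(noisy) - clean).mean())  # mean abs error vs the clean reference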
 
Looking at these benchmarks, we can say we are heavily frontend bound. Nvidia's high shader count cannot be filled with work because the frontend is too small for it. The more pixels are taken into account (4K), the closer Nvidia gets. I think with 8K benchmarks Nvidia would pull ahead of AMD because of shader power.

It would be nice to see 8K benchmarks on AMD and Nvidia cards with heavy shader effects on and off. With shader effects off, AMD would run circles around Nvidia with its polygon and ROP output because of its higher pixel throughput. If you then activate all the shader effects, the situation turns around and Nvidia runs circles around AMD.
 

So basically, AMD does better in older-style games while NV's GPUs are geared more towards the future of games, à la the UE5 demo?
 

At 8K the cache effect will be substantially smaller, and that will matter more than any shader issues.
 
That can be true in general, but not particularly when talking about Ampere. GDDR6X requires more power and offers higher bandwidth. But if your product isn't able to utilize all of that bandwidth, it just consumes more power and becomes less power-efficient. Look at the RTX 3070 with GDDR6 and the RTX 3080 with GDDR6X. The latter has 70 % (!!!) higher theoretical bandwidth than the former, but offers only 26-32 % (1440p, 4K, ComputerBase) higher performance. The RTX 3080 has the same number of ROPs as the RTX 3070 and a slightly lower boost clock (so its fillrate is probably lower), and the bandwidth cannot be efficiently utilized. But the GDDR6X chips run at full speed and consume more power.
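For reference, the 70 % figure follows directly from the public memory specs (RTX 3070: 256-bit GDDR6 at 14 Gbps; RTX 3080: 320-bit GDDR6X at 19 Gbps):

Code:
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    # Theoretical bandwidth in GB/s: bus width in bits times per-pin data rate, divided by 8.
    return bus_width_bits * data_rate_gbps / 8

rtx3070 = bandwidth_gb_s(256, 14)    # GDDR6  -> 448 GB/s
rtx3080 = bandwidth_gb_s(320, 19)    # GDDR6X -> 760 GB/s
print(rtx3080 / rtx3070 - 1)         # ~0.70, i.e. roughly 70 % more theoretical bandwidth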

No.

When a memory bus is idle, it does not consume a significant amount of power. And the power spent refreshing the DRAM arrays is not greater for GDDR6X than for GDDR6.
 
@trinibwoy if we look at Ampere, it has a huge number of shaders and can barely utilize them. AMD has about 50 % fewer shaders than Ampere but performs at the same level. Even taking clock speed into consideration, AMD only clocks about 15 % higher than Ampere, which means most of AMD's performance comes from actually utilizing its 5,000 shaders, while Ampere has a huge lag in utilizing its 10,000 shaders.

So Ampere is indeed frontend bound.

AMD has built up a huge frontend, if you can trust the driver leaks. AMD has 8 rasterizers at high clocks, and each rasterizer now has 4 packers instead of the 2 in Navi10.

AMD's architecture is totally bandwidth driven, so that the shaders always have something to do.
 

unless it’s fillrate limited.
 
unless it’s fillrate limited.
If it were fillrate limited, Nvidia would not keep gaining more and more performance going up to 4K!

Fun fact: the best generational jumps we have seen came when a GPU manufacturer went from a high shader count design to a low shader count design. Remember Pascal! That was the same situation.
 

Wouldn’t you become more fillrate limited at lower resolutions with an uncapped framerate? If the lower resolution can’t keep shader cores filled because of fewer fragments won’t you push fillrate to max, assuming bandwidth is sufficient and you aren’t geometry bottlenecked first?
 
@Scott_Arm GPUs are pipelines. At higher resolutions you need fewer polygons to fill up the ROPs. At a low resolution, one polygon might be turned into 4 pixels; at a higher resolution, the same polygon is turned into 16 pixels. So at higher resolutions you become more and more ROP bound compared to lower resolutions.
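To put rough numbers on that: a triangle covering a fixed fraction of the screen touches four times as many pixels at 4K as at 1080p, so the pixel/ROP side of the pipeline scales with resolution while the triangle count does not (the 0.01 % triangle below is just an arbitrary example):

Code:
def pixels_covered(screen_fraction, width, height):
    # Pixels a triangle touches if it covers a fixed fraction of the frame.
    return screen_fraction * width * height

tri = 0.0001   # a small triangle covering 0.01 % of the screen
print(pixels_covered(tri, 1920, 1080))   # ~207 pixels of work at 1080p
print(pixels_covered(tri, 3840, 2160))   # ~829 pixels at 4K: same triangle, 4x the pixel work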
 
When a memory bus is idle, it does not consume a significant amount of power. And the power spent refreshing the DRAM arrays is not greater for GDDR6X than for GDDR6.
1. The memory bus is not idle during gaming.
2. Even when playing 4K HDR video, the memory clock (or memory utilization) is so high that the RTX 3090 consumes 40-50 more watts than 256-bit GDDR6 solutions (e.g. RTX 3070 / RTX 2070 / RX 5700 XT).

[Image: geforce_rtx_30906oj2l.png]

https://www.computerbase.de/2020-09...abyte-msi-test/6/#abschnitt_leistungsaufnahme

So, GDDR6X increases power consumption even when not fully utilized (maybe the problem is not in GDDR6X itself but in Ampere's memory controller, who knows), while the GeForce RTX 3080/3090 is not able to utilize all of that bandwidth. The result is obvious: it increases power consumption without a significant effect on performance.
 
If the 3080 cannot utilize its bandwidth, then neither can the Turing and Pascal cards, since their bandwidth deltas correspond with their performance deltas.
Only the 3090 has somewhat excessive bandwidth.
 
@Scott_Arm GPUs are pipelines. At higher resolutions you need fewer polygons to fill up the ROPs. At a low resolution, one polygon might be turned into 4 pixels; at a higher resolution, the same polygon is turned into 16 pixels. So at higher resolutions you become more and more ROP bound compared to lower resolutions.

I sort of forgot that the raster engines lose efficiency as pixel coverage goes down. So they're likely able to generate fragments more efficiently at higher resolution, because primitives will cover more pixels, correct? That means at higher resolution you have more shader cores occupied, but the demands on fillrate increase because you have 4x as many pixels at 4K as at 1080p, so bandwidth requirements also increase per frame. So RDNA2 would handle fragment generation faster because it has higher clocks, and you'll notice it more at lower resolutions because you are less likely to be bandwidth, fillrate or shader limited per frame. That's the logic of it, right?
 
@Scott_Arm GPUs are pipelines. At higher resolutions you need fewer polygons to fill up the ROPs. At a low resolution, one polygon might be turned into 4 pixels; at a higher resolution, the same polygon is turned into 16 pixels. So at higher resolutions you become more and more ROP bound compared to lower resolutions.

Right. Higher resolutions put more pressure on bandwidth and fillrate. So if AMD has Ampere beat on the geometry frontend and fillrate, they have a chance to sweep the board. I don't know of any really ALU bound games.
 