AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
It would be interesting to see the reception if those leaks are true and the deltas carry over to general game performance. What would people value more? A noticeable bump in rasterization or one in RT?
 
The problem is, one may become indistinguishable from the other as more and more games become RT enabled thanks to the consoles. It'll be a shame if Big Navi does hold up to these performance rumours but is let down in RT performance. I'll be genuinely torn as to which to get.
 
I'm in the same boat. I play competitive games where all I want is to be able to play 1440p240 with low settings, but when I play single player I'm going to be looking for 1440p120 with the ray tracing pizzazz.
 
AMD is a polygon monster. It has 8 rasterizers at 2.3 GHz. When I look at the values between the two benchmarks, the old one (Fire Strike) was polygon bound, while the newer benchmark (Time Spy) is shader bound. Interesting that AMD can keep up with 5,000 shaders against Nvidia's 10,000 shaders.

For those wondering about the differences in Time Spy and Fire Strike performance, this is straight from the 3dmark technical guide:

View attachment 4819

My guess would be that on modern GPUs Fire Strike ends up being more of a fillrate test, or something along those lines.

Edit: Looking at clock speed, potential ROP count (128) and these Fire Strike results, AMD has the potential to be the ultimate "competitive settings" GPU for CS:GO, Valorant, Fortnite, Apex etc., where people tend to play on very low settings and lower resolutions. "Ultra settings" comparisons are where things are looking close, but AMD still might win there too, though ray tracing may muddy those waters. I'm curious if Igor's Lab's Port Royal scores included DLSS for the Nvidia GPU. I'd like to see native vs native numbers for ray tracing.
 
AMD is a polygon monster. It has 8 rasterizers at 2.3 GHz. When I look at the values between the two benchmarks, the old one (Fire Strike) was polygon bound, while the newer benchmark (Time Spy) is shader bound. Interesting that AMD can keep up with 5,000 shaders against Nvidia's 10,000 shaders.

Yah, I can see that perspective. Even though Time Spy has way more vertices and triangles per frame, the ratio is heavily skewed towards compute shader invocations, whereas in Fire Strike the ratio of compute shader invocations to triangles is massively lower.

Edit: It'd be interesting to know if there are enough vertices and triangles in Fire Strike to bottleneck before the ROPs max out your bandwidth.
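For a sense of scale, here is a back-of-the-envelope calculation from the numbers floated above (8 rasterizers at 2.3 GHz, a possible 128 ROPs). The 1-primitive-per-rasterizer-per-clock and 1-pixel-per-ROP-per-clock rates are my simplifying assumptions, not confirmed figures; real throughput depends on culling, clipping, blending and memory bandwidth.

```python
# Peak-rate sketch from the rumoured figures (assumptions, not specs):
# 8 rasterizers at 2.3 GHz, 128 ROPs, 1 prim/rasterizer/clk, 1 px/ROP/clk.

clock_hz = 2_300_000_000
rasterizers = 8
rops = 128

tri_rate = rasterizers * clock_hz   # primitives per second
fill_rate = rops * clock_hz         # pixels per second

print(f"peak primitive rate: {tri_rate / 1e9:.1f} Gtri/s")   # 18.4 Gtri/s
print(f"peak pixel fillrate: {fill_rate / 1e9:.1f} Gpix/s")  # 294.4 Gpix/s
```

Even as rough upper bounds, numbers like these suggest why a geometry- and fillrate-heavy test could favour a high-clocked, wide-ROP design.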
 
This seems promising. But if the 6800 XT alone is so fast, why did AMD show benchmarks at the Zen 3 launch where Navi is a decent margin slower than the 3080? I'd love it to be as fast as these rumours suggest, not least for the sake of pricing, but I think it might be a bit too soon to jump on this hype train.

Because they've been searching for a benchmark where the 6800 XT beats the 3080, and Fire Strike Ultra is it? It's little different from Nvidia finding out gaming Ampere runs Doom Eternal super well, far better than the average uplift, and putting out a video benchmark of it after the announcement.

GPU archs seem to be diverging this year in what they run efficiently. There's a large disparity in performance gains per title for both Nvidia and AMD, and even Intel's HPG is fairly title-dependent in performance versus the competition. So going through recognizable titles to see what works best is more relevant than ever for PR.
 
I don't seem to understand this. Is there an issue with the 3DMark13 FireStrike benchmark?
The recent report on the performance of "Navi 21 XT" in the 3DMark13 FireStrike Ultra benchmark has to be tempered somewhat by the fact that RDNA's advantage in this particular benchmark is not just a few percentage points but actually very significant. According to comparative values, a Radeon RX 5700 XT scores about +19% better there than it should relative to a general performance index. Specifically, under this benchmark that AMD card even beats a GeForce RTX 2080, which is usually clearly faster than the Radeon RX 5700 XT. To what extent the upcoming RDNA2 cards show the same affinity for this 3DMark13 test, or whether their performance scaling in the high-end field is as good as the Radeon RX 5700 XT's, can only be speculated about at this time. But at least there is a clear indication that 3DMark13 FireStrike Ultra is probably not particularly suitable for assessing "normal performance" between AMD and Nvidia graphics cards.
https://www.3dcenter.org/news/hardware-und-nachrichten-links-des-22-oktober-2020
 
Depending on how far you want to go back: I remember the family party that was HD 2000 series.
I don't seem to understand this. Is there an issue with the 3DMark13 FireStrike benchmark?

https://www.3dcenter.org/news/hardware-und-nachrichten-links-des-22-oktober-2020
3DCenter says Fire Strike is already a 19% overperformer with RDNA1, so this might carry over to the next generation. They suggest not taking RX 6000 performance in this particular benchmark as indicative of general performance. That's what I take from the original German posting.
 
I'd say it's nothing in particular, just setup code for SKUs with different numbers of shader arrays. It even says so in the next commit:

Skip disabled sa to correct the cu_info and active_rbs for sienna cichlid.
 
The original comparison was with a hypothetical RT block that only gave intersection results while not performing traversal, which would leave the SIMD in a position where determining the next node addresses would require explicit vector memory reads of data that the RT unit had already fetched and parsed. AMD's method is at least less redundant than that.

Ok, yeah it does state that the list of node pointers is provided to the CU so there should be no need to fetch them again.

AMD's patent doesn't clearly outline where the process resides for the intermediate work between node evaluations. It highlights that the SIMD and CU have substantial storage available at no additional cost versus the likely hardware footprint of implementing sufficient storage on an independent unit.
AMD's claims compare their hybrid method against a dedicated unit that might be able to traverse a BVH to arbitrary depths without redoing traversal due to losing the full context of what had been traversed already.
Nvidia's scheme appears to have a traversal stack of finite depth that can lead to redundant node traversal, which makes it less expensive than what AMD was using as its baseline.

Whether AMD's method leverages registers, LDS, or possibly spills to memory isn't spelled out. Even if there were spills to memory, writing out data based on pointers and metadata from completed RT node evaluations to something like a stack seems like it could be less disruptive than the SIMD re-gathering node data on its own.

The stack and result queues on Ampere are likely fixed size and relatively small. Nvidia's patent didn't make any accommodation for spilling out to memory so I assume AMD will handle edge cases more gracefully. Hopefully AMD goes into detail on the inner workings of their RT flow.
 
You don’t need a deep stack to efficiently handle traversal, whether it’s stored on dedicated memory, registers or cache.
For instance see this recent Intel paper:
https://software.intel.com/content/...es/wide-bvh-traversal-with-a-short-stack.html
Compressed wide bounding volume hierarchies can significantly improve the performance of incoherent ray traversal, through a smaller working set of inner nodes and therefore a higher cache hit rate. While inner nodes in the hierarchy can be compressed, the size of the working set for a full traversal stack remains a significant overhead. In this paper we introduce an algorithm for wide bounding volume hierarchy (BVH) traversal that uses a short stack of just a few entries. This stack can be fully stored in scarce on-chip memory, which is especially important for GPUs and dedicated ray tracing hardware implementations. Our approach in particular generalizes the restart trail algorithm for binary BVHs to BVHs of arbitrary widths. Applying our algorithm to wide BVHs, we demonstrate that the number of traversal steps with just five stack entries is close to that of a full traversal stack. We also propose an extension to efficiently cull leaf nodes when a closer intersection has been found, which reduces ray primitive intersections by up to 14%.

Generally speaking a high performance SW implementation of BVH traversal on a GPU is probably not going to use a variable amount of state for the stack, as you want to avoid going off chip (and perhaps even off L2) and keep all of your data on registers and/or shared local memory.
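To make the trade-off concrete, here is an illustrative sketch (my own toy code, not any vendor's or the paper's actual scheme) comparing a classic full-stack BVH traversal with a stackless "restart trail" traversal, which is the degenerate zero-entry case of the short-stack approach: every leaf costs a full root-to-leaf re-descent. A short stack sits in between, caching the most recently deferred children and only falling back to a restart when it underflows.

```python
# Toy binary BVH where every internal node has two children and every
# node "intersects" the ray, so both traversals visit all leaves.

class Node:
    def __init__(self, left=None, right=None, prim=None):
        self.left, self.right, self.prim = left, right, prim  # prim set => leaf

def full_stack_traverse(root):
    """Unbounded stack: each node is visited exactly once."""
    visits, prims, stack = 0, [], [root]
    while stack:
        node = stack.pop()
        visits += 1
        if node.prim is not None:
            prims.append(node.prim)
        else:
            stack.append(node.right)   # defer far child
            stack.append(node.left)    # take near child first
    return prims, visits

def restart_trail_traverse(root):
    """No stack at all: a per-level trail bit (0 = left, 1 = right) acts as
    a path counter, and after each leaf we restart from the root. The extra
    node visits are the redundant traversal discussed above."""
    visits, prims, trail = 0, [], []
    while True:
        node, depth = root, 0
        while node.prim is None:               # descend along the trail
            visits += 1
            if depth == len(trail):
                trail.append(0)                # first visit: go near (left)
            node = node.left if trail[depth] == 0 else node.right
            depth += 1
        visits += 1
        prims.append(node.prim)
        while trail and trail[-1] == 1:        # pop fully-finished levels
            trail.pop()
        if not trail:
            break                              # rightmost leaf reached
        trail[-1] = 1                          # advance to the far child
    return prims, visits

# Tiny demo: 4 leaves under 3 internal nodes.
leaves = [Node(prim=p) for p in "abcd"]
root = Node(Node(leaves[0], leaves[1]), Node(leaves[2], leaves[3]))
print(full_stack_traverse(root))     # (['a', 'b', 'c', 'd'], 7)
print(restart_trail_traverse(root))  # (['a', 'b', 'c', 'd'], 12)
```

The gap (7 vs 12 node visits here) grows with tree depth, which is why even a short stack of a few entries, as in the Intel paper, recovers most of the full-stack behaviour while keeping the working set in on-chip memory.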
 
Yes, the 5700 XT does very well in FSU, though I don't think it's anywhere near 20%; that's probably a cherry-picked AIB OC result. The average score is closer to a 2070 Super. OTOH the leaked score could be a cherry-picked AIB OC result too.
3DCenter cites their calculation basis in their linked article. According to that, it's not an outlier or cherrypicked result, but an average FS Ultra score vs. their calculated 4K performance index, which is based on the results from Kitguru, Overclockers Club and Tweakers.
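The check 3DCenter describes can be sketched as: compare a card's benchmark-score ratio against its general performance-index ratio, and the excess is the benchmark's "overperformance". The numbers below are invented placeholders chosen only to land near the +19% figure; they are not the actual Kitguru, Overclockers Club or Tweakers data.

```python
# Sketch of an overperformance check: how much better a card does in one
# benchmark than its general performance index predicts.
# All scores/index values here are invented placeholders, NOT review data.

def overperformance(card_score, ref_score, card_index, ref_index):
    bench_ratio = card_score / ref_score    # relative benchmark result
    index_ratio = card_index / ref_index    # relative general performance
    return (bench_ratio / index_ratio - 1) * 100

# Placeholder FSU scores and 4K index values, 5700 XT vs a faster reference:
pct = overperformance(card_score=6250, ref_score=5600,
                      card_index=100, ref_index=107)
print(f"FSU overperformance: {pct:+.0f}%")  # FSU overperformance: +19%
```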
 
I'm running a Ryzen 3900X. If you look at the PSU recommendation for the RTX 3080, it is a 750W PSU. The RTX 3070 already recommends a 650W one for 220W. GPU power consumption has been increasing for a while from its usual 150W to 180W for Gx104-class models. My PSU is nothing special, a Corsair VS650 I think.

True, but those recommendations are of course keeping in mind PSU quality and the possibility of being paired with a high-power Intel CPU. With a 3900X, you shouldn't have any problem running either a 3070 or N22 (which I'd expect is <200W).
The problem is, one may become indistinguishable from the other as more and more games become RT enabled thanks to the consoles. It'll be a shame if Big Navi does hold up to these performance rumours but is let down in RT performance. I'll be genuinely torn as to which to get.

Since the consoles are based on RDNA2, the RT performance could actually be better on Big Navi in the long run as games could be optimized for it.
 
Also what is up with that update patch and CU count? Jawed redemption arc or just BS?
Seems to imply that Navi 21 is the only GPU where ROPs and shader arrays are disabled.

I wonder if ROPs are bound to shader arrays.

So Navi 21 could be 60 CUs (WGPs?) with 96 ROPs (48?), with a single shader array entirely turned off along with the corresponding set of ROPs? It would seem AMD has a choice of at least 8 shader arrays when choosing which one to turn off.

Since the consoles are based on RDNA2, the RT performance could actually be better on Big Navi in the long run as games could be optimized for it.
If Navi 2x GPUs have a monster last level cache, then console games will not be built to take advantage of it, because consoles don't have such a monster cache.
 
3DCenter cites their calculation basis in their linked article. According to that, it's not an outlier or cherrypicked result, but an average FS Ultra score vs. their calculated 4K performance index, which is based on the results from Kitguru, Overclockers Club and Tweakers.

The performance drop-off at 4K is pretty big for the 5700 XT. This is really not a 4K GPU. Even if the 5700 XT were as fast as a 2080 in FSU on average, which I doubt based on the review that I posted, you are still looking at maybe 15% according to TPU at 1440p, which is a reasonable res to use.

But anyway, I agree that extrapolating any kind of gaming performance from leaked FSU scores with an unreleased GPU that appears to be a pretty BIG departure from RDNA1 is completely pointless.
 