But the 4080 with 76 SMs can keep up with the 7900 XTX with 96 CUs in most games.

@mr magoo The article does state, at one point in the conclusion, that Nvidia GPUs are designed to have as many cores as possible at the expense of utilization, because more cores generally beat fewer cores with higher utilization.
Most games use smaller shaders so as not to bottleneck on the number of available registers, meaning Nvidia GPUs can run many of them quickly and keep more of their pipelines saturated.
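The "more cores at lower utilization vs fewer cores at higher utilization" trade-off can be made concrete with a back-of-envelope sketch. It uses the SM/CU counts from the quote plus per-unit FP32 lane counts (128 per Ada SM, 64 per RDNA 3 CU), and it assumes equal clocks and ignores RDNA 3 dual-issue. It is a rough model, not how either GPU actually schedules work.

```python
# Simplified model: useful FP32 lanes per cycle = units x lanes per unit x utilization.
# Assumptions: equal clocks for both GPUs, RDNA 3 dual-issue ignored (64 lanes per CU).

def effective_lanes(units: int, lanes_per_unit: int, utilization: float) -> float:
    """Average number of FP32 lanes doing useful work each cycle."""
    return units * lanes_per_unit * utilization


RTX_4080 = (76, 128)    # 76 SMs x 128 FP32 lanes per SM
RX_7900_XTX = (96, 64)  # 96 CUs x 64 FP32 lanes per CU (no dual-issue)


def break_even_utilization(wide, narrow, wide_utilization: float) -> float:
    """Utilization the narrower GPU needs to match the wider one's useful work."""
    target = effective_lanes(*wide, wide_utilization)
    return target / (narrow[0] * narrow[1])


needed = break_even_utilization(RTX_4080, RX_7900_XTX, wide_utilization=0.5)
print(f"4080 at 50% utilization ~ 7900 XTX at {needed:.0%}")
```

Under these assumptions the 4080 at 50% utilization does as much useful FP32 work per cycle as the 7900 XTX at about 79%, which is the kind of gap the "more cores wins" argument relies on.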
As an Nvidia owner, it sucks (for heavily optimized Xbox titles). But going forward, I hope we see frame generation, DLSS, and the hardware ray tracing accelerators put to work to make up for this loss.

Basically, the shoe is on the other foot almost purely because of how heavily the game was optimized for the Xbox Series (XBS) consoles. Again, this isn't developers deliberately making NV GPUs worse; it's just the reality of NV not having a presence on the XBS consoles.
Starfield, OTOH, is heavily optimized for AMD GPUs because it's heavily optimized for the XBS consoles. Considering the release was delayed by a year just to get it into releasable shape, NV GPUs on PC likely got about as much optimization time as AMD GPUs usually get in most other games.
The results imply a high level of tuning for RDNA. Those SIMD utilization numbers are extremely impressive. That kind of utilization doesn’t happen by accident. Despite that, utilization on Ampere and Ada is still pretty good. Better than most games I’ve profiled on a 3090.
Nvidia has been stingy with register capacity, though. Even A100 and H100 are still on 64 KB per SM sub-partition (256 KB per SM). Maybe it’s time for them to upgrade.
The Chips and Cheese article says they keep the register file small so they can fit more SMs. I'd be curious to know how much die space the register file takes as a percentage of the SM. It's 256 KB per SM (64K 32-bit entries). Each thread is limited to 255 registers. Not sure how much they could grow the register file while still increasing the SM count in the next architecture. I'm also curious how the 255-register-per-thread limit is set. I think there are 32 threads per warp and a limit of 48 warps per SM, but I'm not sure how you get from that to the 255-register limit.
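The register arithmetic in the post can be sketched as a simplified occupancy calculation using only the figures given there (64K 32-bit registers per SM, 32 threads per warp, 48 warps per SM). It ignores per-warp allocation granularity and other limits such as shared memory and block size, so treat it as an approximation. It also doesn't explain where the 255 cap itself comes from, only what hitting it costs.

```python
# Simplified register-limited occupancy for one Ampere/Ada SM.
# Assumptions: figures from the post (64K 32-bit registers per SM, 32 threads
# per warp, 48 resident warps per SM). Real hardware also rounds each warp's
# register allocation up to a fixed granularity; that is ignored here.
REGS_PER_SM = 64 * 1024
THREADS_PER_WARP = 32
MAX_WARPS_PER_SM = 48


def resident_warps(regs_per_thread: int) -> int:
    """Warps per SM allowed by the register file, capped by the warp-slot limit."""
    if not 1 <= regs_per_thread <= 255:
        raise ValueError("per-thread register count must be in 1..255")
    by_registers = REGS_PER_SM // (regs_per_thread * THREADS_PER_WARP)
    return min(by_registers, MAX_WARPS_PER_SM)


for regs in (32, 64, 128, 255):
    warps = resident_warps(regs)
    print(f"{regs:3d} regs/thread -> {warps:2d} warps "
          f"({warps / MAX_WARPS_PER_SM:.0%} occupancy)")
```

In this model a shader using the full 255 registers leaves room for only 8 warps (256 threads) per SM, while one using 32 registers can fill all 48 warp slots.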
This dude has always been a grifter. His stupid driver overhead video was debunked almost immediately back in the day, but I still see it pop up on occasion.

FYI, this is the "Nvidia removed their hardware scheduler and that's why Radeons are so much less CPU-dependent" guy. This is just repeating what Chips and Cheese reported*; not sure what value reposting the regurgitated opinions of this antivax loon adds here.
*Actually, I'm being overly generous. He's interpreting what they're saying, and incorrectly.
I think it explains it. Limitation is the register size which lowers
It doesn't. Alex compared 6800 XT to the 3080 in his review, if I recall correctly. RDNA 2 features the same 256 kilobyte register file per Compute Unit (CU) as Ampere or Ada's 256 KB per SM. Chipsandcheese compared the register file size for the 7900 series' CU (384 KB), which isn't applicable to the RDNA 2 and lower-tier RDNA 3 chips that have 256 KB register file per CU. Per partition, RDNA2 features a 128 KB register file for 32 ALUs, whereas Ampere/Ada has just 64 KB for the 16+16 FMA light and heavy pipes. However, Ampere/Ada executes each warp over two cycles for floating-point operations as opposed to a single cycle on RDNA.
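The per-partition comparison above can be turned into a small sketch of register capacity per lane and how many 32-wide waves/warps fit at a given register count. It uses only the 128 KB (RDNA 2 SIMD) and 64 KB (Ampere/Ada sub-partition) figures from the post; wave-slot caps, allocation granularity, and the two-cycle vs one-cycle issue difference are deliberately left out, so it shows raw capacity only.

```python
# Register capacity per partition, using the figures quoted in the post.
# Assumptions: only register capacity is modelled. Wave/warp slot caps,
# allocation granularity and issue rate (two cycles per warp FP op on
# Ampere/Ada vs one on RDNA) are ignored. Both partitions are 32 lanes wide,
# matching a wave32 or a warp.
REG_BYTES = 4  # one 32-bit register

PARTITIONS = {
    "RDNA 2 SIMD": {"regfile_kb": 128, "lanes": 32},
    "Ampere/Ada SM sub-partition": {"regfile_kb": 64, "lanes": 32},
}


def regs_per_lane(regfile_kb: int, lanes: int) -> int:
    """32-bit registers available to each lane of the partition."""
    return regfile_kb * 1024 // REG_BYTES // lanes


def register_limited_waves(regfile_kb: int, lanes: int, regs_per_thread: int) -> int:
    """How many 32-wide waves/warps fit in the partition's register file."""
    return regs_per_lane(regfile_kb, lanes) // regs_per_thread


for name, part in PARTITIONS.items():
    print(f"{name}: {regs_per_lane(**part)} regs per lane, "
          f"{register_limited_waves(**part, regs_per_thread=64)} waves at 64 regs/thread")
```

At 64 registers per thread this gives 16 waves per RDNA 2 SIMD vs 8 warps per Ampere/Ada sub-partition, before accounting for the issue-rate difference the post points out.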
Even assuming the whole game were 100% occupancy-limited (which is very far from the truth), I cannot see how a 4060 Ti, with more registers per chip and more threads in flight, could be slower than an RX 7600, as seen here: https://www.techspot.com/photos/article/2731-starfield-gpu-benchmark/#Ultra-1080p. Therefore, their review unfortunately doesn't provide any conclusive answers.
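The "more registers per chip" point is easy to check with the per-unit sizes from the thread (256 KB per Ada SM, 256 KB per lower-tier RDNA 3 CU). The unit counts (34 SMs on the RTX 4060 Ti, 32 CUs on the RX 7600) are taken from the public specs rather than from the thread.

```python
# Total register file per chip.
# Per-unit sizes come from the thread (256 KB per Ada SM, 256 KB per lower-tier
# RDNA 3 CU); unit counts are the public specs: RTX 4060 Ti = 34 SMs, RX 7600 = 32 CUs.
GPUS = {
    "RTX 4060 Ti": {"units": 34, "regfile_kb_per_unit": 256},
    "RX 7600": {"units": 32, "regfile_kb_per_unit": 256},
}


def chip_regfile_kb(units: int, regfile_kb_per_unit: int) -> int:
    """Register file capacity summed over all SMs/CUs, in KB."""
    return units * regfile_kb_per_unit


for name, gpu in GPUS.items():
    kb = chip_regfile_kb(**gpu)
    print(f"{name}: {kb} KB ({kb / 1024:.2f} MB) of registers")
```

That works out to 8.5 MB vs 8 MB, a modest edge for the 4060 Ti that doesn't fit with it losing if registers were the only limiter.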
Not just RDNA: Vega 64 is 25% faster than the GTX 1080 at 1080p, and the RX 580 is 30% faster than the GTX 1060.

The article isn't conclusive, as it didn't profile every compute dispatch or look at every performance indicator, but it does point to heavy optimization for RDNA.
If I understand the article correctly, occupancy isn’t a requirement for performance, since low occupancy does not correlate with poor performance.
The 7600 likely has much higher L2 bandwidth than the 4060 Ti, based on the earlier analysis here. That could be one reason it outperforms the 4060 Ti. The 6700 XT matches the 7600 in Starfield and has a similarly fast L2.
"In AMD’s favor, they have a very high bandwidth L2 cache. As the first multi-megabyte cache level, the L2 cache plays a very significant role and typically catches the vast majority of L0/L1 miss traffic. Nvidia’s GPUs become L2 bandwidth bound in the third longest shader, which explains a bit of why AMD’s 7900 XTX gets as close as it does to Nvidia’s much larger flagship. AMD’s win there is a small one, but seeing the much smaller 7900 XTX pull ahead of the RTX 4090 in any case is not in line with anyone’s expectations. AMD’s cache design pays off there."
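The quoted point about Nvidia GPUs becoming L2-bandwidth bound can be illustrated with a roofline-style estimate: a dispatch takes roughly as long as its slowest resource, so a GPU with fewer ALUs but a faster L2 can finish first. This is a minimal sketch assuming perfect overlap of memory and compute; the "big" and "small" GPUs and every number below are hypothetical placeholders, not measurements from the article.

```python
# Roofline-style bottleneck check for a single shader dispatch.
# Assumptions: perfect overlap of L2 traffic and ALU work (time = slower of the two);
# all numbers in the example are made-up placeholders, not measured values.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Dispatch:
    l2_bytes: float   # bytes served by L2 (the L0/L1 miss traffic)
    fp32_ops: float   # FP32 operations executed


@dataclass
class Gpu:
    l2_bytes_per_s: float
    fp32_ops_per_s: float


def estimate(d: Dispatch, g: Gpu) -> Tuple[float, str]:
    """Return (estimated seconds, limiting resource)."""
    t_l2 = d.l2_bytes / g.l2_bytes_per_s
    t_alu = d.fp32_ops / g.fp32_ops_per_s
    return (t_l2, "L2 bandwidth") if t_l2 >= t_alu else (t_alu, "ALU throughput")


shader = Dispatch(l2_bytes=1e9, fp32_ops=10e9)
big_gpu = Gpu(l2_bytes_per_s=5e12, fp32_ops_per_s=80e12)     # hypothetical: more ALUs, slower L2
small_gpu = Gpu(l2_bytes_per_s=10e12, fp32_ops_per_s=60e12)  # hypothetical: fewer ALUs, faster L2

for name, gpu in (("big", big_gpu), ("small", small_gpu)):
    seconds, limit = estimate(shader, gpu)
    print(f"{name}: {seconds * 1e6:.0f} us, limited by {limit}")
```

With these placeholder numbers the big GPU is L2-bound at about 200 us while the small one is ALU-bound at about 167 us, which is the shape of the result the quote describes for the third-longest shader.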
Crazy. I think we’ve often theorized that this was possible under very specific conditions of pure optimization; I didn’t really believe it could come true. A hell of a lot of things have to fall into place for this to happen, though.
Starfield: PC Performance Benchmarks for Old Graphics Cards and Processors | RPG/Role Playing | GPU TEST (gamegpu.tech)
In this review, we will consider the release version of Starfield on old families of NVIDIA and AMD video cards at maximum graphics quality settings. We will analyze the performance of these graphics…
This has been the common scenario for quite some time now in these big multiplatform titles.