> ...why Navi 3x is reducing the SEs in general (or increasing the WGPs per SE).
Very different things to ever be mentioned in the same sentence.
I don't get the ARM craze; it's like suddenly everyone decided to believe Apple's marketing team (remember their "stellar" GPU presentation) and some toy tests from an ARM fan at Anandtech. Pretty sure if it was so good, the big boys would already be doing it (and it seems Keller's K12 didn't quite pan out, so the magic ARM performance and efficiency wasn't really there).
Well, it's good for what it is, but it's too wide to be scalable to proper HEDT/enterprise level, it seems.
Kinda meh that ATi again abandons the mid-range market; hopefully a potentially successful RDNA3 won't be followed by an R600 2.0...
> Few chips could ever be as disappointing as R600. Vega comes close though.
Hey, we still have PVC to laugh at.
> ...more that's a viable alternative.
Wut.
> There are other makers betting on ARM CPUs.
Wut.
> Soon x86 may lose its supremacy in the market.
How?
> How long until Intel and AMD start making ARM CPUs to not lose market share?
You're a fuckton years late, since Seattle 'shipped' like 5 years ago.
> And when that happens, how long will it take for x86 to become niche?
Major hopium moment.
Probably both but more of the latter? The 6700 XT clocks ~12% higher than the 6900 XT on average. The VRAM bandwidth-per-WGP and LLC-amount-per-WGP (and probably the LLC bandwidth too) are all 50% higher on Navi 22 vs. Navi 21.
OTOH, it doesn't look like Navi 22 is losing all that much from halving the number of Shader Engines, which might be an indicator why Navi 3x is reducing the SEs in general (or increasing the WGPs per SE).
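For what it's worth, those per-WGP ratios check out against the commonly quoted specs. A quick sketch; the bandwidth and cache figures below are the usually cited ones, so treat them as assumptions:

```python
# Per-WGP resource comparison for Navi 21 vs Navi 22.
# Bandwidth/cache figures are the commonly cited specs (assumptions).
chips = {
    # name: (WGPs, VRAM bandwidth GB/s, Infinity Cache MB)
    "Navi 21": (40, 512, 128),  # 80 CU, 256-bit GDDR6 @ 16 Gbps
    "Navi 22": (20, 384, 96),   # 40 CU, 192-bit GDDR6 @ 16 Gbps
}

per_wgp = {name: (bw / wgps, llc / wgps) for name, (wgps, bw, llc) in chips.items()}
for name, (bw, llc) in per_wgp.items():
    print(f"{name}: {bw:.1f} GB/s per WGP, {llc:.2f} MB LLC per WGP")

print(f"Navi 22 / Navi 21 bandwidth per WGP: {per_wgp['Navi 22'][0] / per_wgp['Navi 21'][0]:.2f}x")
print(f"Navi 22 / Navi 21 LLC per WGP:       {per_wgp['Navi 22'][1] / per_wgp['Navi 21'][1]:.2f}x")
# Both ratios come out to 1.50x, matching the "50% higher" figure above.
```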
> Few chips could ever be as disappointing as R600. Vega comes close though.
Maybe from a pure high-end gamer's point of view, and when limiting "Vega" to mean only Vega 10, but that's the only view where Vega might be considered even "close" to R600.
I was contrasting a workgroup of 64 pixels (work items), which is allocated to a single SIMD, with a non-pixel-shader workgroup of 64 work items, which will be allocated across both SIMDs. I was pointing out that the pixel shader allocation to SIMDs is unusual, and that LDS appears to the two SIMDs in a CU the same as it does in GCN (where it is shared by four SIMDs). Half of the LDS (one array out of the two) is dedicated to each CU in CU mode.
But note that a workgroup of 128 work items bound to an RDNA CU necessarily results in 2 work items sharing a SIMD lane, with increasing multiples for larger workgroups up to size 1024. So the pixel shader configured as wave64 appears to be a special case of workgroup. As a special case it's designed explicitly to gain from VGPR, LDS and TMU locality for pixels, whose quad work-item layout and use of quad-based derivatives is a special case of locality.
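To make the lane-sharing arithmetic concrete, here's a trivial sketch, assuming wave32 SIMDs and an even spread of work items across the CU's two SIMDs (the even-spread policy is my reading of the discussion above, not documented behaviour):

```python
# Work items stacked per SIMD lane when a workgroup is bound to one
# RDNA CU (2 x wave32 SIMDs = 64 lanes), assuming an even spread
# across both SIMDs as described above.
CU_LANES = 2 * 32

for workgroup in (64, 128, 256, 512, 1024):
    per_lane = workgroup // CU_LANES
    print(f"workgroup of {workgroup:>4}: {per_lane} work item(s) per lane")
# A 64-item compute workgroup gets 1 item per lane across both SIMDs;
# the wave64 pixel-shader case instead packs all 64 onto one SIMD,
# i.e. 2 per lane, which is what makes it look like a special case.
```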
This then leads me to believe that "wave64" for pixel shading is a hack that ties together two hardware threads on a SIMD. Not only is there a clue in the gotcha that I described earlier, but one mode of operation (alternating halves per instruction) is just like two hardware threads that issue on alternating cycles - which is a feature of RDNA. You can see the problem AMD has introduced with RDNA: two different wavefront sizes. Are they both hardware threads? Or is wave64 an emulated hardware thread, simulated by running two hardware threads?
All this came up because I'm trying to work out how RDNA 3, with no CUs inside a WGP, configures TMUs, RAs and LDS. Bearing in mind that a "WGP" then seems as if it would actually be a "compute unit", and a compute unit generally has a single TMU and a single LDS. So my question was whether a CU with 8 SIMDs can have adequate performance with a single TMU and a single LDS.
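One way to put numbers on that question is ALU lanes per texture filter unit. A rough sketch below; the filter-unit counts, and the hypothetical single-TMU 8-SIMD config, are my assumptions for illustration, not confirmed specs:

```python
# ALU lanes per texture filter unit, as a crude texture-throughput
# proxy. Filter-unit counts, and the single-TMU 8-SIMD config, are
# assumptions for illustration, not confirmed specs.
configs = {
    # name: (SIMDs, lanes per SIMD, texture filter units)
    "GCN CU":                  (4, 16, 4),
    "RDNA WGP (2 CUs)":        (4, 32, 8),
    "8-SIMD WGP, single TMU?": (8, 32, 4),
}

for name, (simds, lanes, filters) in configs.items():
    total = simds * lanes
    print(f"{name}: {total} lanes, {total // filters} lanes per filter unit")
# GCN and RDNA both land at 16 lanes per filter unit; an 8-SIMD WGP
# with only one CU's worth of TMU would quadruple that to 64, which is
# the crux of the question above.
```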
One major caveat of this 2.5X~3X raster scaling is that 1440p and lower is going to be irrelevant; our current CPUs are not strong enough to produce frames that fast. A 3090 is already CPU-limited at 1440p, and 4K is going to be CPU-limited in a significant way as well.
> Yeah that's expected though, given the rasterizer and triangle setup isn't the bottleneck the vast majority of the time. Navi 22 is still doing 2 tris per clock at a very high clock speed, which is plenty of geometry-crunching power. It's a mystery why Nvidia continues to scale up their tessellation and triangle throughput. GA102 has 42 primitive setup units and 7 rasterizers, while Navi 21 does just fine with only 4 of each.
Perhaps the die space/power requirements are small enough that they don't deem it worthwhile to change the SM structure yet.
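For a rough sense of scale, peak setup rate is just triangles per clock times clock speed. A quick sketch; the clock figures are approximate boost clocks and should be treated as assumptions:

```python
# Peak triangle setup rate = triangles per clock * clock speed.
# Clock figures are approximate boost clocks (assumptions).
gpus = {
    # name: (tris per clock, clock GHz)
    "Navi 22": (2, 2.5),
    "Navi 21": (4, 2.25),
    "GA102":   (7, 1.7),  # bounded by 7 rasterizers, 1 tri/clock each
}

for name, (tris, ghz) in gpus.items():
    print(f"{name}: ~{tris * ghz:.1f} Gtris/s peak setup")
# Even Navi 22's 2 tris/clock is ~5 Gtris/s; at 144 fps that is ~35M
# triangles per frame of setup headroom, far beyond typical scenes.
```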
Can anybody answer my question about the imbalance between rasterizers and Scan Converters?
My second question: the frontend is also a black box to me. In the driver you always find the hint that there are 4 Rasterizers but 8 Scan Converters. The Scan Converter is the main part which transforms polygons into pixels. So when 1 polygon comes from a Rasterizer but there are 2 Scan Converters, is 1 Scan Converter running empty?
> One major caveat of this 2.5X~3X raster scaling is that 1440p and lower is going to be irrelevant...
Just in time for Zen4 V-cache CPUs!
> I think it is possible to check it now with N21 XTXH SKUs, which apparently have a memclock limit of 2450 MHz instead of 2150 MHz (although it seems that either the memory chips themselves or the IMC can't do much more than 2170 MHz or so).
All the air-cooled 6900 XTs have 2000 MHz memory. Some (all?) of the liquid-cooled cards have 2250 MHz, according to this list:
> The default assumption is that an RDNA 3 WGP is at least as fast as 2x RDNA 2 WGPs in flops-dependent workloads. How games will scale though is a different matter. Ampere doubled flops and L1 bandwidth per SM but that didn't result in 2x gaming performance. The 46SM 3070 is only 30% faster than the 46SM 2080.
One way to read it: the 6900 XT has 33% more bandwidth and 33% more power than the 6700 XT and is ~50% faster...
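The paper math behind that 3070 vs. 2080 comparison, assuming typical boost clocks (the clock figures are assumptions):

```python
# Theoretical FP32 throughput: SMs * FP32 lanes per SM * 2 (FMA) * clock.
# Boost clocks are approximate (assumptions).
def tflops(sms, fp32_per_sm, ghz):
    return sms * fp32_per_sm * 2 * ghz / 1000

rtx2080 = tflops(46, 64, 1.71)    # Turing: 64 FP32 lanes per SM
rtx3070 = tflops(46, 128, 1.725)  # Ampere: 128 FP32 lanes per SM

print(f"2080: {rtx2080:.1f} TFLOPS, 3070: {rtx3070:.1f} TFLOPS "
      f"({rtx3070 / rtx2080:.2f}x)")
# ~2x on paper, yet only ~1.3x in games per the post above.
```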
RDNA 2 scaled very well with clock speed vs RDNA 1. Comparing the similar 40CU configs of the 6700 XT and 5700 XT, there was a 35% improvement on paper due to higher clocks, and actual results in games were pretty close to that number. This is a great result, especially considering the lower off-chip bandwidth on the 6700 XT. Scaling up RDNA 2 didn't quite hit the same mark: comparing the 40CU 6700 XT and 80CU 6900 XT, there was a 75% improvement on paper but only 50% in actual gaming numbers. This leads me to believe the 6700 XT is benefiting from higher clocks on its fixed-function hardware, or the 6900 XT is hitting a bandwidth wall. As mentioned earlier in the thread, it's going to be interesting to see how AMD feeds such a beast.
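And the same back-of-envelope flops math for those RDNA comparisons, again with approximate boost clocks as assumptions:

```python
# Paper scaling for the RDNA comparisons above:
# CUs * 64 lanes * 2 (FMA) * clock. Boost clocks are approximate.
def tflops(cus, ghz):
    return cus * 64 * 2 * ghz / 1000

rx5700xt = tflops(40, 1.905)
rx6700xt = tflops(40, 2.581)
rx6900xt = tflops(80, 2.25)

print(f"6700 XT vs 5700 XT: {rx6700xt / rx5700xt:.2f}x on paper")  # ~1.35x
print(f"6900 XT vs 6700 XT: {rx6900xt / rx6700xt:.2f}x on paper")  # ~1.74x
# The first gap shows up almost fully in games; the second only as ~1.5x.
```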
> I don't expect anyone to stand still.
#1 is GPU-specific, so it's not coming to PC.
I'm fully expecting Nvidia to use their richer developer influence to push for #2 as hard as they can, because that's where they have an architectural advantage, and AMD to focus on "console-multipliers" in the expectation that #1 is widely adopted.
> One major caveat of this 2.5X~3X raster scaling is that 1440p and lower is going to be irrelevant...
Once upon a time a 1070/Vega 56 was "enough" for 1440p gaming.
> All the air-cooled 6900 XTs have 2000 MHz memory. Some (all?) of the liquid-cooled cards have 2250 MHz, according to this list:
I meant the OC limit in Wattman, not the stock memory clocks. Of course, it's possible to check how the GPU scales in supposedly memory-bound scenarios by simply decreasing the core clocks, but that might be confounded by things like some parts of the chip not really scaling with clocks and so on. That's why I thought memory OC is the best way to check it, although Infinity Cache and some form of built-in error correction (as far as I understood) make this hard to gauge.
> One major caveat of this 2.5X~3X raster scaling is that 1440p and lower is going to be irrelevant...
You mean higher-res graphics need more CPU power, up to the point where the CPU throttles?
> #1 is GPU-specific, so it's not coming to PC.
Just thought about this right before.
AMD could patch custom traversal shaders into specific per-game driver updates.
Not what I want, not sure how much sense it makes, but there are options in theory.