AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I'd actually like to see some extensive benchmarks done with no AA and with FXAA, SMAA etc only. It's an unexpected performance issue.
Hardware Unboxed tested CMAA in Dirt4 and there was little performance drop as expected. Only when moving to high levels of MSAA did they experience massive drops.
 
From the whitepaper, the geometry numbers keep going up; I thought this was 11 before, ...
I wonder what's the additional benefit of each extra triangle.
Going 1 to 2, of course. 2 to 4, sure. But other than some pathological corner cases, going 4 to 8 is probably not going to lift the overall frame time by more than a few percent?

Otherwise, things like this would have been done much earlier.
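
For a rough sense of why each extra triangle per clock buys less and less, here's a back-of-the-envelope Amdahl's-law sketch in Python. The 10% "geometry-limited fraction of the frame" is a made-up number purely for illustration, not a measurement from any game:

# Back-of-the-envelope: how much frame time can extra triangles/clock save?
# Hypothetical assumption: only a fixed fraction of the frame is limited by
# primitive throughput; everything else is unaffected by the triangle rate.

def frame_speedup(geometry_fraction, rate_scale):
    # Amdahl's law: only the geometry-limited part scales with triangle rate.
    return 1.0 / ((1.0 - geometry_fraction) + geometry_fraction / rate_scale)

# Suppose 10% of a frame is primitive-rate limited (a guess, not a measurement).
for scale in (2, 4, 8):
    saving = 1.0 - 1.0 / frame_speedup(0.10, scale)
    print(f"{scale}x triangle rate -> frame time shorter by {saving:.1%}")
# Roughly 5%, 7.5% and 8.8%: each doubling buys less than the last, and the
# total gain can never exceed the geometry-limited fraction itself.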
 
I wonder what's the additional benefit of each extra triangle.
Going 1 to 2, of course. 2 to 4, sure. But other than some pathological corner cases, going 4 to 8 is probably not going to lift the overall frame time by more than a few percent?

Otherwise, things like this would have been done much earlier.

I doubt it would be hard to find CAD examples where it helps a lot. For games, I have no idea.
 
The more triangles the better, even for games I guess?
Of course. :) But it's all academic in the context of driver improvements etc, when it has the potential of improving performance by 1%.

I believe that, in the Nvidia architecture, triangle performance is joined at the hip with other functional blocks.

So if you have, say, 2 triangles per clock for some mid range device, and you scale it up, you get 4 or 6 depending on the ratio.

That doesn't mean that those extra units for the higher SKUs will be used at full capacity.

Not sure if that's the case for AMD, but I don't think it is.

Not saying that it was a mistake to have up to 11 or 17 or whatever triangles per clock, it may simply be a free benefit of some architectural choices. But I'm just questioning their value in terms of final gaming performance.
 
Triangles are used a lot with tessellation, no? If so, we have seen some games where Fiji was beaten by the 480 and 1060 only because of the triangle rate. With that in mind, yes, maybe 11, or 17, or whatever is "not worth it", but I guess they have to increase it anyway.
 
I'm kind of wondering if people are looking at some of the new features and wondering why they're not giving the performance improvements they expect, when really it's the MSAA performance that's tanking most of the PC results.
 
Hardware Unboxed tested CMAA in Dirt4 and there was little performance drop as expected. Only when moving to high levels of MSAA did they experience massive drops.

Yah, I'd like to see more games tested with no AA vs high levels of MSAA. Most sites seem to benchmark based on presets like Ultra and High, which makes sense, and I'm assuming for most games that means MSAA, but maybe not.
 
Of course. :) But it's all academic in the context of driver improvements etc, when it has the potential of improving performance by 1%.


I believe that, in the Nvidia architecture, triangle performance is joined at the hip with other functional blocks.

So if you have, say, 2 triangles per clock for some mid range device, and you scale it up, you get 4 or 6 depending on the ratio.

That doesn't mean that those extra units for the higher SKUs will be used at full capacity.

Not sure if that's the case for AMD, but I don't think it is.

Not saying that it was a mistake to have up to 11 or 17 or whatever triangles per clock, it may simply be a free benefit of some architectural choices. But I'm just questioning their value in terms of final gaming performance.

It might be a bit of a chicken-and-egg situation where developers are reluctant to use as many triangles as they'd like because existing GPUs would struggle with them, and IHVs might be reluctant to invest much in triangle throughput because they perceive no urgent need for it—at least that case could be made for AMD, much less for NVIDIA.

But in an environment where AMD's market share is dwindling, I think assuming that developers won't use too much tessellation (or whatever other feature) because it doesn't run well on AMD GPUs would be very risky, as devs might just deem this part of the install base (almost) irrelevant and do whatever works well on NVIDIA's hardware.

Edit: plus there's so much money to be (potentially) made in pro graphics that ignoring this market's needs would be folly.
 
Of course. :) But it's all academic in the context of driver improvements etc, when it has the potential of improving performance by 1%.


I believe that, in the Nvidia architecture, triangle performance is joined at the hip with other functional blocks.

So if you have, say, 2 triangles per clock for some mid range device, and you scale it up, you get 4 or 6 depending on the ratio.

That doesn't mean that those extra units for the higher SKUs will be used at full capacity.

Not sure if that's the case for AMD, but I don't think it is.

Not saying that it was a mistake to have up to 11 or 17 or whatever triangles per clock, it may simply be a free benefit of some architectural choices. But I'm just questioning their value in terms of final gaming performance.
Nvidia seems to scale primitive rate across three things: a block at the front of the pipe feeds the shaders, the cull rate scales with the number of SMs, and the rasterized triangle rate scales with the number of GPCs. AMD has typically scaled triangle performance with the number of shader engines.
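
To put some (entirely hypothetical) numbers on that scaling model, here's a toy calculation; the unit counts and per-unit rates below are illustrative assumptions, not vendor specifications:

# Toy model of the scaling described above: cull rate tied to SM count and
# rasterized-triangle rate tied to GPC count on the Nvidia side, triangle
# rate tied to shader-engine count on the AMD side. Every number here is an
# assumption for illustration only.

def nvidia_rates(gpcs, sms_per_gpc, clock_ghz,
                 cull_per_sm=0.5, raster_per_gpc=1.0):
    sms = gpcs * sms_per_gpc
    return {"cull_gtris_per_s": sms * cull_per_sm * clock_ghz,
            "raster_gtris_per_s": gpcs * raster_per_gpc * clock_ghz}

def amd_rate(shader_engines, clock_ghz, tris_per_se=1.0):
    return shader_engines * tris_per_se * clock_ghz  # Gtris/s

# Hypothetical mid-range vs. high-end parts built from the same blocks:
print("small chip:", nvidia_rates(gpcs=3, sms_per_gpc=5, clock_ghz=1.7))
print("big chip:  ", nvidia_rates(gpcs=6, sms_per_gpc=5, clock_ghz=1.6))
print("4 shader engines:", amd_rate(4, 1.5), "Gtris/s")

The point is just that under this kind of model the triangle rate falls out of how many GPCs or shader engines the chip happens to have, rather than being a target picked for gaming workloads.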
 
I wonder what's the additional benefit of each extra triangle.
Going 1 to 2, of course. 2 to 4, sure. But other than some pathological corner cases, going 4 to 8 is probably not going to lift the overall frame time by more than a few percent?

Otherwise, things like this would have been done much earlier.

I think geometry performance doesn't scale with resolution, and it would have been enough for Vega to have high enough clock speeds, with a doubling per shader engine, so that it didn't face Fiji's problem of underperforming at lower resolutions.

Digital Foundry's review says as much; their numbers show Vega doing much better than the Fury series at 1080p. So even the doubling doesn't seem to be that important. Now AMD claims more than a quadrupling without anything to show for it in practice.
 
I wonder what's the additional benefit of each extra triangle.
Going 1 to 2, of course. 2 to 4, sure. But other than some pathological corner cases, going 4 to 8 is probably not going to lift the overall frame time by more than a few percent?
It's not about rates, it's about saturation.

Producer-consumer intermediate buffers are finite.

For illustrative purposes: if you have a triangle buffer that spends 100% of its time full, with 99% of the triangles culled after they leave the buffer, then anything more than 1% culling before the triangles reach the buffer is going to be a win.

Conventional GPUs have finite buffers for triangles, which have to be assembled before being culled, with culling done by fixed function hardware. Even if this hardware ran at infinite speed, the finite buffer before it would limit throughput due to saturation.

To make the saturated buffer situation worse, the buffer has to hold attribute data, not just vertex position data. The alternative would be to defer non-position attribute-shading until after culling. Which is how the primitive shader kills more than one bird with a stone.
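
A crude way to put numbers on that saturation argument (all figures below are made up for illustration; the only real assumptions are a buffer that is always full and a fixed drain rate on its consumer side):

# Analytic sketch of the saturation argument above. Assume the buffer's
# consumer (fixed-function cull/setup) drains a fixed number of triangles per
# clock and is the bottleneck, 99% of all submitted triangles are ultimately
# culled, and a fraction `pre_cull` of them are already removed before they
# reach the buffer. None of these numbers come from any real GPU.

def visible_tris_per_clock(drain_rate=4.0, total_cull=0.99, pre_cull=0.0):
    # Each triangle that makes it into the buffer stands for 1/(1 - pre_cull)
    # submitted triangles, so a fixed drain rate covers more of the input
    # stream the more culling happens before buffering.
    submitted_per_clock = drain_rate / (1.0 - pre_cull)
    return submitted_per_clock * (1.0 - total_cull)

for pre in (0.0, 0.5, 0.9):
    rate = visible_tris_per_clock(pre_cull=pre)
    print(f"cull {pre:.0%} before the buffer -> {rate:.2f} visible tris/clock")
# 0% -> 0.04, 50% -> 0.08, 90% -> 0.40: same buffer, same fixed-function
# hardware, far more useful work per clock when triangles die early.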

Otherwise, things like this would have been done much earlier.
Who says they weren't?

I can't help thinking that AMD seems to be late to the party. The primitive shader concept is applicable to all GPUs going right back to D3D10 at least.

Does anyone really believe NVidia isn't doing this already?

What bugs me is that AMD could have spent the last few years experimenting with this on Fury X, RX 480 or whatever. There is nothing here that old hardware couldn't do.

EDITED: late night sloppiness fixed
 
It's not about rates, it's about saturation.

Producer-consumer intermediate buffers are finite.

For illustrative purposes: if you have a triangle buffer that spends 100% of its time full, with 99% of the triangles culled after they leave the buffer, then anything more than 1% culling before the triangles reach the buffer is going to be a win.

Conventional GPUs have finite buffers for triangles, which have to be assembled before being culled, with culling done by fixed function hardware. Even if this hardware ran at infinite speed, the finite buffer before it would limit throughput due to saturation.

To make the saturated buffer situation worse, the buffer has to hold attribute data, not just vertex position data. The alternative would be to defer non-position attribute-shading until after culling. Which is how the primitive shader kills more than one bird with a stone.

Otherwise, things like this would have been done much earlier.
Who says they weren't?

I can't help thinking that AMD seems to be late to the party. The primitive shader concept is applicable to all GPUs going right back to D3D10 at least.

Does anyone really believe NVidia isn't doing this already?

What bugs me is that AMD could have spent the last few years experimenting with this on Fury X, RX 480 or whatever. There is nothing here that old hardware couldn't do.
The reason is simple. Lack of resources.
 
So what actually went wrong? Did RTG just spend all of their time developing a "good" pro driver and leave the gaming/consumer one for later? I mean, the card is a beast, but nothing really works as intended.
 
They probably should have offered a Fury/Fury X at 14nm as a stopgap. Vega is both late to the party and not that impressive in terms of performance. Vega 56 is a good deal when compared against the 1070/1080 but it's way too late for it to do anything in the market. You'd expect consumer Volta GPUs to be less than a year away (I'm thinking Spring 2018) and these will surely outclass anything AMD has to offer. I just hope Navi is much better, for the sake of competition.
 
Or a "big polaris" rather than a 14nm Fury. Anyway. Yeah Vega 56 is nice, but for people like me already having a Fury X, the only good thing is more vram, but raw performances when not vram limited aren't worth 399 (or 499 if they up the price by 100 once the first batch is sold like with vega 64...). I've a freesync monitor so my choice is limited. I will stay with my Fury X. It's still a beast@1440p.
I hope Navi (or a 7nm Vega ?) will be good, as you said, we need competition.
 