chavvdarrr
Veteran
1 year later, more power draw and less gamespeed per mm2 ... people Do expect something improved after so many PR events and announcements
Which of the architectural changes should show up its nose under current testing though?
sebbbi seems to think that the ROP/L2 rework can save lots of cache flushes previously necessary [strike]specifically on deferred shading engines[/strike]: https://forum.beyond3d.com/posts/1987712/
How else can Vega's current uncanny similarity to Fiji's performance-per-clock be called despite having 40%/4.2B transistors more, then?
As much as the term "Fiji fallback" feels wrong to people on the inside, it sure looks like every single architectural difference that would make Vega 10 work faster than Fiji clock-for-clock is simply not working in games.
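For anyone who wants to check the clock-for-clock claim against their own benchmark runs, here is a minimal sketch of the normalization. The function names are made up for illustration, and it assumes you feed it the sustained core clock you actually logged during the run, not the advertised boost clock.

```python
def perf_per_clock(avg_fps: float, core_clock_mhz: float) -> float:
    """Frames per second delivered per MHz of sustained core clock."""
    if core_clock_mhz <= 0:
        raise ValueError("core clock must be positive")
    return avg_fps / core_clock_mhz


def clock_normalized_gain(new_fps: float, new_mhz: float,
                          old_fps: float, old_mhz: float) -> float:
    """Relative per-clock gain of the new GPU over the old one.

    0.0 means identical performance per clock, 0.10 means 10% more
    work done per clock cycle.
    """
    return perf_per_clock(new_fps, new_mhz) / perf_per_clock(old_fps, old_mhz) - 1.0
```

If both cards are locked to the same clock and post the same average FPS, the gain comes out as 0.0, which is the "Fiji-like" result being discussed here.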
That's true, and the same goes for the better DCC found in Polaris. But to see a benefit from this you'll need to pit Fury X against Vega FE in a situation where Fury X is bandwidth starved. In a game.
Note that he posted this before receiving his Vega FE - so it's a guess, however educated it may be.
Do frequent cache flushes not also place a heavier tax on maintaining high GPU occupancy? IOW, more register state in flight.
L2 cache flushes hurt even when you are not bandwidth bound. In this particular example case, the whole GPU needs to wait until the L2 cache flush is done before it can start executing the next shader. I would assume that frequent RT->texture transitions hurt Vega less than Fiji.
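To make the "hurts even when not bandwidth bound" point concrete, here is a toy frame-time model. It is only a sketch under stated assumptions: every RT->texture transition is assumed to cost one fixed, fully serialized stall, and the function names and any numbers you plug in are hypothetical, not measurements of Fiji or Vega.

```python
def frame_time_ms(pass_times_ms, rt_to_texture_transitions, flush_stall_ms):
    """Total frame time when every RT->texture transition drains the GPU.

    pass_times_ms: shader/pass execution times, independent of the flush cost.
    rt_to_texture_transitions: how many times per frame a render target is
        subsequently read as a texture.
    flush_stall_ms: assumed idle time per transition while the flush completes
        (hypothetical; set to 0.0 to model hardware that skips the flush).
    """
    if rt_to_texture_transitions < 0 or flush_stall_ms < 0:
        raise ValueError("transition count and stall time must be non-negative")
    return sum(pass_times_ms) + rt_to_texture_transitions * flush_stall_ms


def fps(frame_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms
```

Note that bandwidth never enters the formula: the stall is pure idle time added on top of the passes, so a frame with many transitions gets slower even if memory bandwidth is nowhere near saturated.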
You don't need more than 4 shader engines. The best example is comparing Nvidia's GP102 and GP104: in the polygon output test of the Beyond3D suite you see no difference between GP102 and GP104 once culling comes into play.
Can someone please explain why GCN, and apparently NCU, is limited to a maximum of 4 shader engines? What are the pros and cons of such a limited architecture?
Primitive shader (whatever it is) is again something that's not in the DX description and is presumably something that will have to be explicitly coded for somehow. Are there geometry throughput improvements outside of the primitive shader that we know about?
The primitive shader refers to vs+gs being executed as one shader (tessellation shader stages also get merged with others; with tessellation there's one shader pre-tessellation and one post-tessellation). This cannot be disabled in the driver; it has to be active at all times. (Potentially, with extensions exposing this, you could do some things more efficiently.)
I want to talk about Vega and RX here as much as everyone else since I'm a GPU enthusiast, but that's not in my wheelhouse (unless you're an NDA'd developer of course!), so I can't go into specifics.
About the driver comment: it's normal and completely expected for there to be common code in a GPU driver that applies to some or all of the GPUs a driver supports, alongside the specifics for the GPU being driven. That's hopefully just a given. So it was just a guiding hand to not conflate any commonality with it running the driver for a different ASIC, and then reading things into that.
I'd have said that regardless of working for AMD or not, since the above is true for all GPU vendors.
Sure, but again this is not likely something that happens automatically. There's a "fast geometry shader" logic on NV's part as well, and that's something that has to be coded for specifically (via NvAPI, not standard D3D).
It can only make a difference if geometry and/or tessellation shaders are in use, however.
Yes, but how do you spot this in an FPS number? :smile: From a game where you have no idea how many RT->texture transitions it's doing (or anything else, for that matter). It's basically turning into a discussion about specifically targeted benchmarks just to show that Fury X != Vega FE, which I think is ridiculous on so many levels.
DCC obviously also helps, since it allows skipping decompress operations (which stall the GPU for much longer than cache flushes do). Publicly available DCC documentation for GCN3/4/5, however, is pretty much non-existent. This is the only thing available: http://gpuopen.com/dcc-overview/. I would like to see a more detailed DCC document for AMD PC hardware in the future.
Simple: the FPS number is higher when there are fewer stalls and flushes.
This is what really bothered me about what you said: yes, he can. He has no obligation to say anything and, again, most likely isn't supposed to reveal anything. Your post almost comes off as a demand, and that's not right.
And would you have been satisfied with that?