Practically 100% scaling on any architecture.
Crossfire with high-end Navi in the future is gonna be nice.
From the graph, how much exactly is the Radeon VII card (at 984 mV) consuming?
Sort of useless... but interesting anyway. Radeons (including the VII) with primitive culling implemented on Linux:
https://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Prim-Culling-Async-Com
At first I was happy, like "Oh?! They finally implemented Primitive Shaders??". Then I read "compute shaders" :/
There are some statements from devs saying that using primitive shaders wouldn't be faster, just easier to implement:
zlatan said: The implementation is not hard in the engine. A well designed converter can do 90 percent of job automatically, the last 10 percent is really easy, and the result is much better primitive discard on Vega. But personally I don't like the idea, because GPGPU culling is better. It's uglier, and harder to implement to the engine, but it will work on every hardware that can run compute shader (pretty much everything nowadays). I think this approach might be faster than primitive shader. With rapid packed math and async compute this is almost guaranteed. The main advantage of primitive shader is the easier implementation. That's for sure. But GPGPU culling is just my own egoistic view, because it works on the consoles, so it can be a true cross-platform solution.
(...)
The NGG implementation can be a lot easier. I accept that some devs may not have the money to change the engine, so primitive shader is far better for them.
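For anyone wondering what that GPGPU culling path actually looks like, here's a minimal sketch of the idea. It's written as a CUDA kernel purely for illustration (in an engine this would be a D3D12/Vulkan compute shader, ideally on an async compute queue), the names are made up, and it assumes vertex positions have already been projected to NDC; it's not AMD's or any engine's actual code.

// Minimal sketch of GPGPU triangle culling (illustrative only).
// One thread per triangle: reject back-facing / degenerate / off-screen
// triangles and append survivors to a compacted index buffer that the
// real draw call then consumes.
#include <cuda_runtime.h>

__global__ void cullTriangles(const unsigned int* indices,  // 3 per triangle
                              const float2* ndcPos,         // projected verts (NDC)
                              unsigned int numTris,
                              unsigned int* outIndices,     // compacted output
                              unsigned int* outCount)       // atomic counter, starts at 0
{
    unsigned int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numTris) return;

    unsigned int i0 = indices[3 * tri + 0];
    unsigned int i1 = indices[3 * tri + 1];
    unsigned int i2 = indices[3 * tri + 2];
    float2 a = ndcPos[i0], b = ndcPos[i1], c = ndcPos[i2];

    // Back-face / zero-area test: signed area of the projected triangle
    // (assumes counter-clockwise front faces).
    float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    bool visible = area > 0.0f;

    // Trivial frustum test: reject if all three vertices fall outside the
    // same NDC bound.
    visible = visible && !(a.x < -1.f && b.x < -1.f && c.x < -1.f);
    visible = visible && !(a.x >  1.f && b.x >  1.f && c.x >  1.f);
    visible = visible && !(a.y < -1.f && b.y < -1.f && c.y < -1.f);
    visible = visible && !(a.y >  1.f && b.y >  1.f && c.y >  1.f);

    if (visible) {
        // Stream compaction: reserve a slot and copy the surviving indices.
        // Note the output order of surviving triangles is not preserved.
        unsigned int slot = atomicAdd(outCount, 1u);
        outIndices[3 * slot + 0] = i0;
        outIndices[3 * slot + 1] = i1;
        outIndices[3 * slot + 2] = i2;
    }
}

You still pay to read every triangle, which is presumably why zlatan stresses async compute (to hide the pass behind other rendering work) and rapid packed math (FP16 for the position math feeding the test). But since it only needs plain compute shaders, it runs on consoles and on basically every GPU, which is his cross-platform point.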
Thing is, primitive shaders are not even "adoptable" right now, since they're not exposed in any API.
For me, it could be a mix of broken hardware and not enough human resources to make it work.
In the Gamers Nexus video about primitive shaders, they seem more advanced than compute culling... They never worked anyway, but the principles seem really nice.
We know the primitive shaders work because devs had the opportunity to do performance profiling on them, and AMD showed them working at CES 2017.
They're not working in any public driver release because AMD discontinued their development/support on Vega.
It doesn't even make much sense to assume the hardware is "broken", considering there are no less than 5 distinct Vega GPUs (Vega 10, Raven Ridge, Fenghuang, Vega 11, Vega 20) and the hardware blocks would have been fixed through each iteration.
Well, does this fully explain the NGG weirdness? I don't think so.
AMD Vega architecture footnote:
Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x.
Why is that a weirdness?
Do you have some sources on that?
I quoted and linked to zlatan's post above, who's a dev that was working on a PSVR title a couple of years ago. I don't know what he's working on right now since he's preferred to stay anonymous so far, but at the anandtech forums he's known to have a lot of knowledge to spare when it comes to GPU performance profiling.
So, is each geometry engine processing 2.75 polys per clock?
No, the 11 polygons per clock is the theoretical max you could reach with NGG; without NGG they're doing 4 polygons per clock, just like Fiji.
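Spelling the footnote's numbers out (the ~1.55 GHz below is just an assumed Vega 64 boost clock for illustration, not something from the footnote):

\[
\frac{11\ \text{prims/clk}}{4\ \text{engines}} = 2.75\ \text{prims/clk per engine (NGG theoretical peak)}
\]
\[
\text{without NGG: } 4\ \text{prims/clk} \times 1.55\ \text{GHz} \approx 6.2\ \text{Gprims/s}, \qquad
\text{NGG peak: } 11\ \text{prims/clk} \times 1.55\ \text{GHz} \approx 17\ \text{Gprims/s}
\]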
Whenever I read marketing material like that*, I start to ask myself what's the minimum amount of hardware capability required to not make this an outright lie. Reasoning being: if it were a great thing, it would not need to be hidden away like that.
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?
To me he was talking on a theoretical level. But if I'm wrong, all right then. My last chats with some devs were "We can't even test it, it's not exposed anywhere", meh...