AMD Radeon VII Announcement and Discussion

From the graph, how much exactly is the Radeon VII (at 984 mV) card consuming?

They're not measuring the card's power directly.
In this case they're using the scripted Unigine Heaven test, which doesn't tax the CPU at all AFAIK, so the total system power without the card is probably just some 10-20 W above the idle values (so 70 to 80 W). That said, with the undervolt the card probably consumes around 210-220 W in Heaven (other games like Metro are reported to draw quite a bit more power on GPUs).
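To spell out the estimate (my own notation; the total is only back-calculated from the numbers above, since the exact graph values aren't quoted here):

\[
P_{\text{card}} \approx P_{\text{system}} - P_{\text{rest}}, \qquad
P_{\text{rest}} \approx P_{\text{idle}} + (10\text{–}20)\,\mathrm{W} \approx 70\text{–}80\,\mathrm{W}
\]

so a 210-220 W estimate for the card would imply a measured total system draw of roughly 280-300 W in Heaven.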
 
Sort of useless ...but interesting anyway.. Radeons (including VII) with primitive culling implemented on Linux:


https://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Prim-Culling-Async-Com

 
Sort of useless ...but interesting anyway.. Radeons (including VII) with primitive culling implemented on Linux:


https://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Prim-Culling-Async-Com




But it's taking compute resources, right? What is the difference with what Wolfenstein 2 is doing (there is a culling option too, done with compute)?

At first I was happy, like "Oh?! They finally implemented Primitive Shaders??". Then I read "compute shaders" :/
 
At first I was happy, like "Oh?! They finally implemented Primitive Shaders??". Then I read "compute shaders" :/
There are some statements from devs saying that using primitive shaders wouldn't be faster, just easier to implement.
The consensus would be that smaller devs making a new engine could implement primitive shaders, but for larger dev houses or devs who are using existing engines it wouldn't make much sense to use them instead of compute shaders, which are IHV-agnostic. Particularly on Vega, because of its 2xFP16 throughput and async compute capabilities.


https://forums.anandtech.com/thread...-for-primitive-shaders.2535025/#post-39276531

zlatan said:
The implementation is not hard in the engine. A well designed converter can do 90 percent of the job automatically, the last 10 percent is really easy, and the result is much better primitive discard on Vega. But personally I don't like the idea, because GPGPU culling is better. It's uglier, and harder to implement in the engine, but it will work on every hardware that can run compute shaders (pretty much everything nowadays). I think this approach might be faster than primitive shaders. With rapid packed math and async compute this is almost guaranteed. The main advantage of primitive shaders is the easier implementation. That's for sure. But GPGPU culling is just my own egoistic view, because it works on the consoles, so it can be a true cross-platform solution.
(...)
The NGG implementation can be a lot easier. I accept that some devs may not have the money to change the engine, so primitive shader is far better for them.

It's a shame that RTG has decided to communicate so poorly (or rather not communicate at all) about the reasons for primitive shaders not being adopted. All of a sudden the usual FUDers were claiming "broken hardware", and that doesn't help AMD at all.
I understand that AMD marketed the primitive shaders as something that would bring a substantial boost to geometry performance, which is something nvidia was known to perform much better at. However, compared to GPGPU culling there's apparently no performance advantage, and the only reason for implementing primitive shaders is development time for engines that don't already have the latter. So the primitive shaders were never going to provide what AMD promised they'd provide.
Perhaps primitive shaders will see wide adoption on next-gen consoles. Maybe some exclusive games for the Subor console are using them already, for example. But on the PC, for an IHV that has little over 15% of the GPU market, there is little incentive for a dev to use primitive shaders.
The only missing link here is the automated mode that @Rys mentioned, but I'm guessing the performance for that was below expectations, and the AAAs are all using GPGPU culling already, so again the incentive for RTG to dedicate time and money to it was very small.
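For anyone wondering what that "IHV-agnostic" GPGPU culling path roughly looks like, here's a minimal sketch. It's written as a CUDA kernel purely for illustration (engines actually do this in a D3D/Vulkan compute shader against the index buffer, usually with more tests such as small-primitive culling); all names are made up for this example, and it only does a back-face/zero-area test plus a trivial frustum reject, assuming counter-clockwise front faces and vertices in front of the camera:

#include <cuda_runtime.h>

struct float4x4 { float m[16]; };   // row-major 4x4, hypothetical helper type

__device__ float4 transform(const float4x4& M, float3 p)
{
    // Transform a position (w assumed to be 1) into homogeneous clip space.
    return make_float4(M.m[0]*p.x  + M.m[1]*p.y  + M.m[2]*p.z  + M.m[3],
                       M.m[4]*p.x  + M.m[5]*p.y  + M.m[6]*p.z  + M.m[7],
                       M.m[8]*p.x  + M.m[9]*p.y  + M.m[10]*p.z + M.m[11],
                       M.m[12]*p.x + M.m[13]*p.y + M.m[14]*p.z + M.m[15]);
}

// One thread per input triangle: reject back-facing / zero-area / off-screen
// triangles and compact the survivors into a new index buffer.
// outTriCount must be zeroed before launch.
__global__ void cullTriangles(const float3* pos, const unsigned* idxIn, unsigned triCount,
                              unsigned* idxOut, unsigned* outTriCount, float4x4 mvp)
{
    unsigned t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= triCount) return;

    unsigned i0 = idxIn[3*t], i1 = idxIn[3*t+1], i2 = idxIn[3*t+2];
    float4 a = transform(mvp, pos[i0]);
    float4 b = transform(mvp, pos[i1]);
    float4 c = transform(mvp, pos[i2]);

    // Back-face / zero-area: sign of the projected triangle's area, computed in
    // homogeneous coordinates (sign is only valid while all w > 0, i.e. the
    // triangle is in front of the camera).
    float area = (b.x*a.w - a.x*b.w) * (c.y*a.w - a.y*c.w)
               - (c.x*a.w - a.x*c.w) * (b.y*a.w - a.y*b.w);
    bool visible = area > 0.0f;     // assumes counter-clockwise front faces

    // Trivial frustum reject: all three vertices outside the same x/y clip plane.
    if (visible &&
        ((a.x >  a.w && b.x >  b.w && c.x >  c.w) ||
         (a.x < -a.w && b.x < -b.w && c.x < -c.w) ||
         (a.y >  a.w && b.y >  b.w && c.y >  c.w) ||
         (a.y < -a.w && b.y < -b.w && c.y < -c.w)))
        visible = false;

    if (visible) {
        unsigned slot = atomicAdd(outTriCount, 1u);   // stream compaction
        idxOut[3*slot] = i0; idxOut[3*slot+1] = i1; idxOut[3*slot+2] = i2;
    }
}

The compacted index buffer (idxOut plus outTriCount) is what the subsequent draw call would consume, which is why this approach runs on any hardware that can execute compute shaders.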
 
Thing is, primitive shaders are not even "adoptable" right now, since they're not exposed in any API.

For me, it could be a mix of broken hardware and not enough human resources to make it work.

In the Gamers Nexus video about PS, it seems more advanced than compute culling... It never worked anyway, but the principles seem really nice.
 
Thing is, primitive shaders are not even "adoptable" right now, since they're not exposed in any API.

For me, it could be a mix of broken hardware and not enough human resources to make it work.

In the Gamers Nexus video about PS, it seems more advanced than compute culling... It never worked anyway, but the principles seem really nice.

We know the primitive shaders work because devs had the opportunity to do performance profiling on them, and AMD showed them working at CES 2017.
They're not working in any public driver release because AMD discontinued their development/support on Vega.

It doesn't even make much sense to assume the hardware is "broken", considering there are no less than 5 distinct Vega GPUs (Vega 10, Raven Ridge, Fenghuang, Vega 11, Vega 20) and the hardware blocks would have been fixed through each iteration.
 
What I don't get about this whole affair: if it's working and AMD has a driver drop, why not include it in public drivers so any developer (not only the big ones with corporate NDAs) can toy around with it if they choose to? That way, AMD can get free feedback on what would need to be improved to make it worthwhile. In order to cut down on unnecessary support requests, they can label it a beta implementation or whatnot.
 
Well, does this fully explain the NGG weirdness? I don't think so.

AMD Vega architecture footnote:
Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x.
 
We know the primitive shaders work because devs had the opportunity to do performance profiling on them, and AMD showed them working at CES 2017.
They're not working in any public driver release because AMD discontinued their development/support on Vega.

It doesn't even make much sense to assume the hardware is "broken", considering there are no less than 5 distinct Vega GPUs (Vega 10, Raven Ridge, Fenghuang, Vega 11, Vega 20) and the hardware blocks would have been fixed through each iteration.

Do you have some sources on that? Because the few devs I asked on Twitter about PS performance (vs the compute solution) said they never could work with it. It's not an "I don't believe you" kind of message; I'm genuinely curious about that.

For the broken part, I guess it costs money to fix things, even when releasing new GPUs based on the same arch. So, my guess is they said "screw Vega, we'll focus on Navi now".
 
Well, does this fully explain the NGG weirdness? I don't think so.

AMD Vega architecture footnote:
Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x.
Why is that a weirdness?
 
Do you have some sources on that?
I quoted and linked to zlatan's post above, who's a dev that was working on a PSVR title a couple of years ago. I don't know what he's working on right now since he's preferred to stay anonymous so far, but at the anandtech forums he's known to have a lot of knowledge to spare when it comes to GPU performance profiling.
 
Well, does this fully explain the NGG weirdness? I don't think so.

AMD Vega architecture footnote:
Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x.
So, is each geometry engine processing 2.75 polys per clock?
 
So, is each geometry engine processing 2.75 polys per clock?
No, the 11 polygons per clock is the theoretical max you could reach with NGG; without NGG they're doing 4 polygons per clock, just like Fiji.
 
I quoted and linked to zlatan's post above, who's a dev that was working on a PSVR title a couple of years ago. I don't know what he's working on right now since he's preferred to stay anonymous so far, but at the anandtech forums he's known to have a lot of knowledge to spare when it comes to GPU performance profiling.


To me he was talking on a theoretical level. But if I'm wrong, all right then. My last chats with some devs were "We can't even test it, it's not exposed anywhere", meh...
 
No, the 11 polygons per clock is the theoretical max you could reach with NGG; without NGG they're doing 4 polygons per clock, just like Fiji.
Whenever I read marketing material like that*, I start to ask myself what's the minimum amount of hardware capability required to not make this an outright lie. Reasoning being: if it were a great thing, it would not need to be hidden away like that.
While AMD's geometry engines could each fetch 3 vertices per clock, i.e. set up one independent triangle, the minimum needed to get from 4×3=12 vertices in 4 geometry engines to 12 polygons in 4 engines is a shared geometry cache, hence 11 polygons "handled".

*the quote which yuri posted, not your post which I quoted. Just to be precise.
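To make that counting argument explicit (my notation; the only number from AMD is the "up to 11"):

\[
4~\text{engines} \times 3~\tfrac{\text{vertex fetches}}{\text{clock}} = 12~\tfrac{\text{vertices}}{\text{clock}}
\;\Rightarrow\;
\begin{cases}
12/3 = 4~\text{triangles/clock} & \text{no vertex reuse (independent triangles)}\\
\lesssim 12~\text{triangles/clock} & \text{near-perfect reuse through a shared cache (strip-like)}
\end{cases}
\]

so the marketed "up to 11 polygons handled per clock" sits just under the perfect-reuse ceiling, which is consistent with a shared geometry cache being the minimum hardware needed to make the claim true.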
 
As CarstenS mentioned, the key word in that sentence is "handled".

We don't know how these 11 polygons were "handled" - my guess is that by handle they mean discard, rather than generate; the Vega white paper would point to this as well.
 
To me he was talking on a theoretical level. But if I'm wrong, all right then. My last chats with some devs were "We can't even test it, it's not exposed anywhere", meh...
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?
 