Practically 100% scaling on any architecture.
Crossfire with high-end Navi in the future is gonna be nice.
From the graph, how much exactly is the Radeon VII card (at 984 mV) consuming?
Sort of useless... but interesting anyway. Radeons (including the VII) with primitive culling implemented on Linux:
https://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Prim-Culling-Async-Com
At first I was happy, like "Oh?! They finally implemented Primitive Shaders??". Then I read "compute shaders" :/
There are some statements from devs saying that using primitive shaders wouldn't be faster, just easier to implement:
zlatan said: The implementation is not hard in the engine. A well designed converter can do 90 percent of job automatically, the last 10 percent is really easy, and the result is much better primitive discard on Vega. But personally I don't like the idea, because GPGPU culling is better. It's uglier, and harder to implement to the engine, but it will work on every hardware that can run compute shader (pretty much everything nowadays). I think this approach might be faster than primitive shader. With rapid packed math and async compute this is almost guaranteed. The main advantage of primitive shader is the easier implementation. That's for sure. But GPGPU culling is just my own egoistic view, because it works on the consoles, so it can be a true cross-platform solution.
(...)
The NGG implementation can be a lot easier. I accept that some devs may not have the money to change the engine, so primitive shader is far better for them.
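For anyone wondering what that GPGPU culling path actually looks like, here's a minimal sketch of the idea. It's written as a CUDA kernel purely for illustration (in an engine this would be a D3D12/Vulkan compute shader, ideally on an async compute queue), the names are made up, and it assumes vertex positions have already been projected to NDC; it's not AMD's or any engine's actual code.

// Minimal sketch of GPGPU triangle culling (illustrative only).
// One thread per triangle: reject back-facing / degenerate / off-screen
// triangles and append survivors to a compacted index buffer that the
// real draw call then consumes.
#include <cuda_runtime.h>

__global__ void cullTriangles(const unsigned int* indices,  // 3 per triangle
                              const float2* ndcPos,         // projected verts (NDC)
                              unsigned int numTris,
                              unsigned int* outIndices,     // compacted output
                              unsigned int* outCount)       // atomic counter, starts at 0
{
    unsigned int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numTris) return;

    unsigned int i0 = indices[3 * tri + 0];
    unsigned int i1 = indices[3 * tri + 1];
    unsigned int i2 = indices[3 * tri + 2];
    float2 a = ndcPos[i0], b = ndcPos[i1], c = ndcPos[i2];

    // Back-face / zero-area test: signed area of the projected triangle
    // (assumes counter-clockwise front faces).
    float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    bool visible = area > 0.0f;

    // Trivial frustum test: reject if all three vertices fall outside the
    // same NDC bound.
    visible = visible && !(a.x < -1.f && b.x < -1.f && c.x < -1.f);
    visible = visible && !(a.x >  1.f && b.x >  1.f && c.x >  1.f);
    visible = visible && !(a.y < -1.f && b.y < -1.f && c.y < -1.f);
    visible = visible && !(a.y >  1.f && b.y >  1.f && c.y >  1.f);

    if (visible) {
        // Stream compaction: reserve a slot and copy the surviving indices.
        // Note the output order of surviving triangles is not preserved.
        unsigned int slot = atomicAdd(outCount, 1u);
        outIndices[3 * slot + 0] = i0;
        outIndices[3 * slot + 1] = i1;
        outIndices[3 * slot + 2] = i2;
    }
}

You still pay to read every triangle, which is presumably why zlatan stresses async compute (to hide the pass behind other rendering work) and rapid packed math (FP16 for the position math feeding the test). But since it only needs plain compute shaders, it runs on consoles and on basically every GPU, which is his cross-platform point.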
Thing is, primitive shaders are not even "adoptable" right now, since they're not exposed in any API.
For me, it could be a mix of broken hardware and not enough human resources to make it work.
In the Gamers Nexus video about primitive shaders, they seem more advanced than compute culling... They never worked anyway, but the principles seem really nice.
We know the primitive shaders work because devs had the opportunity to do performance profiling on them, and AMD showed them working at CES 2017.
They're not working in any public driver release because AMD discontinued their development/support on Vega.
It doesn't even make much sense to assume the hardware is "broken", considering there are no less than 5 distinct Vega GPUs (Vega 10, Raven Ridge, Fenghuang, Vega 11, Vega 20) and the hardware blocks would have been fixed through each iteration.
Well, does this fully explain the NGG weirdness? I don't think so.
AMD Vega architecture footnote:
Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x.
Why is that a weirdness?
Do you have some sources on that?
I quoted and linked to zlatan's post above, who's a dev that was working on a PSVR title a couple of years ago. I don't know what he's working on right now since he's preferred to stay anonymous so far, but at the anandtech forums he's known to have a lot of knowledge to spare when it comes to GPU performance profiling.
So, is each geometry engine processing 2.75 polys per clock?
No, the 11 polygons per clock is the theoretical max you could reach with NGG; without NGG they're doing 4 polygons per clock, just like Fiji.
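Spelling the footnote's numbers out (the ~1.55 GHz below is just an assumed Vega 64 boost clock for illustration, not something from the footnote):

\[
\frac{11\ \text{prims/clk}}{4\ \text{engines}} = 2.75\ \text{prims/clk per engine (NGG theoretical peak)}
\]
\[
\text{without NGG: } 4\ \text{prims/clk} \times 1.55\ \text{GHz} \approx 6.2\ \text{Gprims/s}, \qquad
\text{NGG peak: } 11\ \text{prims/clk} \times 1.55\ \text{GHz} \approx 17\ \text{Gprims/s}
\]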
Whenever I read marketing material like that*, I start to ask myself what's the minimum amount of hardware capability required to not make this an outright lie. Reasoning being: if it were a great thing, it would not need to be hidden away like that.
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?
To me he was talking on a theoretical level. But if I'm wrong, all right then. My last chats with some devs were "We can't even test it, it's not exposed anywhere", meh...