Right now, I'm using the following mental analogy for primitive shaders: I see the geometry stages in the pipeline as relatively small functions in a C program that are called in succession. Without an optimizing compiler, there's a lot of overhead just passing function parameters and results around, pushing and popping values on the stack, and so on. An optimizing compiler would simply inline the different subroutines, and all that overhead goes away.
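To make the analogy concrete, here's a toy C sketch; the stage names and the work they do are purely illustrative:

```c
#include <stdio.h>

/* Toy stand-ins for pipeline stages: each stage is a small function
   that receives inputs and returns results through memory, the way
   separate shader stages pass attributes through intermediate buffers. */
typedef struct { float x, y, z; } Vec3;

static Vec3 vertex_stage(Vec3 v)   { v.x *= 2.0f; return v; }  /* transform */
static Vec3 geometry_stage(Vec3 v) { v.y += v.x;  return v; }  /* per-prim work */

/* Un-inlined path: every hand-off spills a result and reloads it. */
static Vec3 pipeline_separate(Vec3 in) {
    Vec3 tmp = vertex_stage(in);   /* result written out...       */
    return geometry_stage(tmp);    /* ...then read back in again  */
}

/* Inlined path: the compiler fuses the stages, the intermediate
   value stays in registers, and the call/return overhead is gone. */
static Vec3 pipeline_fused(Vec3 in) {
    in.x *= 2.0f;
    in.y += in.x;
    return in;
}

int main(void) {
    Vec3 v = {1.0f, 1.0f, 1.0f};
    Vec3 a = pipeline_separate(v);
    Vec3 b = pipeline_fused(v);
    printf("separate: %.1f  fused: %.1f\n", a.y, b.y);  /* same answer */
    return 0;
}
```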
The merging of shader stages, the multiple combinations of API shader types, the different enablement states, dynamic state, and the DSBR (with its multiple modes) might explain why this optimizing-compiler approach hasn't yet yielded results good enough to enable it generally. Worse, if exposing some of the internals of those subroutines also exposed or created an x87-type oversight or bug, some nasty corner case could wind up pessimizing the overall critical path.
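A rough sketch of why that cross product gets ugly; every count below is an invented placeholder, not a hardware fact:

```c
#include <stdio.h>

/* Illustrative-only enumeration of the state a driver compiler would
   have to account for when deciding whether a fused variant is safe
   and profitable.  The categories come from the paragraph above; the
   counts are made up. */
enum { STAGE_COMBOS  = 5 };  /* e.g. VS, VS+GS, VS+HS+DS, VS+HS+DS+GS, ... */
enum { ENABLE_STATES = 4 };  /* which optional stages are active           */
enum { DYNAMIC_STATE = 8 };  /* state that can change per draw             */
enum { DSBR_MODES    = 3 };  /* binning off / batched / some hybrid        */

int main(void) {
    int variants = STAGE_COMBOS * ENABLE_STATES * DYNAMIC_STATE * DSBR_MODES;
    printf("variants to validate: %d\n", variants);  /* 480 in this toy case */
    return 0;
}
```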
The remaining transition points in the pipeline are also centered on hand-offs between hardware blocks with different amplification and reduction possibilities, so the fixed-function stages might be better characterized as independent threads with access to dedicated libraries, specialized formatting intrinsics, and a hardware-compressed, more complete representation of architectural behavior.
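A sketch of what an amplifying/reducing hand-off looks like in that framing; ff_clip_expand is a hypothetical stand-in for a fixed-function block, not a real interface:

```c
#include <stdio.h>

/* One input primitive may become zero outputs (culled) or several
   (e.g. clipped into multiple triangles), so the downstream block
   can't know its workload until the hand-off happens. */
typedef struct { int id; int visible; } Prim;

/* Hypothetical fixed-function helper: expands one primitive into
   0..2 outputs, with a data-dependent count. */
static int ff_clip_expand(Prim in, Prim out[2]) {
    if (!in.visible) return 0;   /* reduction: culled            */
    out[0] = in; out[1] = in;    /* amplification: split in two  */
    return (in.id % 2) ? 2 : 1;  /* count depends on the data    */
}

int main(void) {
    Prim stream[4] = {{0,1},{1,1},{2,0},{3,1}};
    int produced = 0;
    for (int i = 0; i < 4; i++) {
        Prim out[2];
        produced += ff_clip_expand(stream[i], out);
    }
    printf("in: 4, out: %d\n", produced);  /* 4 in, 5 out here */
    return 0;
}
```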
The primitive shaders are described as running serially with respect to a significant portion of the existing path, and as conservative (imperfect, and optional) representations of the optimized subroutines.
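"Conservative" here can be read as: the early pass may keep work the full path would reject, but must never reject work the full path would keep. A toy illustration, with an invented epsilon rather than any documented hardware value:

```c
#include <stdio.h>

typedef struct { float area; } Tri;

static int precise_cull(Tri t)      { return t.area <= 0.0f; }    /* full path  */
static int conservative_cull(Tri t) { return t.area <= -0.001f; } /* early, safe */

int main(void) {
    Tri tris[3] = {{1.0f}, {0.0f}, {-1.0f}};
    for (int i = 0; i < 3; i++) {
        /* Legal outcome: the early pass forwards a superset of what the
           precise path keeps; the precise path still catches the rest. */
        printf("tri %d: early-cull=%d precise-cull=%d\n",
               i, conservative_cull(tris[i]), precise_cull(tris[i]));
    }
    return 0;
}
```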
The optimizing compiler in this analogy may find that an optimized sequence of cost X translates into X+n on the critical path once it has to feed a pre-existing concurrent machine. The vertex processing path is generally characterized as being less able to hide latency, and the DSBR's related patents indicate a serial component and limited scope for concurrent batching.
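A toy cost model of the X versus X+n point, with invented numbers; the shader gets cheaper, but the non-overlappable portion lengthens the critical path anyway:

```c
#include <stdio.h>

int main(void) {
    int front_old = 100, back = 100;  /* fully overlapped: path = max = 100 */
    int front_new = 80;               /* optimized sequence: cost X          */
    int serial_n  = 30;               /* non-overlappable portion: n         */

    int path_old = front_old > back ? front_old : back;
    int overlap  = front_new - serial_n;  /* part that still hides behind 'back' */
    int path_new = serial_n + (overlap > back ? overlap : back);

    printf("old critical path: %d\n", path_old);  /* 100                          */
    printf("new critical path: %d\n", path_new);  /* 130: faster shader, slower path */
    return 0;
}
```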
The binning process might also have inflection points around batching: closing a batch sooner reduces latency, while a longer window leaves more opportunity for coalescing and culling.
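A toy model of that inflection, with invented curves; the only point is that cost falls and then rises as the window grows:

```c
#include <stdio.h>

int main(void) {
    int work = 1000;  /* raster work per batch, pre-cull */
    for (int batch = 1; batch <= 64; batch *= 2) {
        float culled  = 0.5f * batch / (batch + 8.0f);  /* saturating cull rate */
        float latency = 2.0f * batch;                   /* linear batching delay */
        float cost    = work * (1.0f - culled) + latency;
        printf("batch %2d: cost %.0f\n", batch, cost);  /* minimum near batch 32 */
    }
    return 0;
}
```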
The old way might have burned more resources and bandwidth, but that matters more for a small implementation like Raven Ridge. An implementation like Vega 64 finds that its scads of resources and larger power budget are less able to hide the incrementally higher serial overhead revealed by its much more parallel engine and broader fixed-function resources.
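The Amdahl-style arithmetic behind that, using the real CU counts (11 and 64) as stand-ins for width but an invented serial fraction:

```c
#include <stdio.h>

/* The same fixed serial overhead costs the wide machine proportionally
   more, because the parallel part shrinks faster than the serial part. */
int main(void) {
    float serial = 0.05f;       /* fraction that cannot overlap (made up) */
    int widths[2] = {11, 64};   /* Raven Ridge vs Vega 64 CU counts       */
    for (int i = 0; i < 2; i++) {
        float t = serial + (1.0f - serial) / widths[i];
        printf("width %2d: time %.4f, serial share %.0f%%\n",
               widths[i], t, 100.0f * serial / t);  /* ~37%% vs ~77%% */
    }
    return 0;
}
```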