AMD Vega Hardware Reviews

Discarding is a weak point of Vega without primitive shaders -.- That's why I'm asking. If it comes, I think we will see a huge benefit.
And yet, half a billion tris/sec is laughable for a potent GPU like this. In case you forgot (or are mixing up German Billionen with the English billion): RX Vega can discard around 6 billion vertices/sec even without primitive shaders.
 
It's not about handling them, it's about doing it faster.
I'm not even talking about primitive shaders. I'm talking about the oh-so-massive 500M triangles/sec that was brought up here, which is no problem at all for any Vega card launched yet - depending on what is done with them.
 
Back-face culled, maybe too small to hit a sample, and frustum culled if you don't look there?

That's not how it works. Back-face culling culls triangles not facing the camera. Occluded elements can be both facing the camera and inside the viewing frustum. Not rendering occluded objects is typically solved by rasterization plus a z-buffer. Once you start down that path you will soon notice it's quite complicated, and you would end up implementing big parts of the GPU in primitive shaders. Of course, at the game engine level, where you have more context, it's possible to do better, but primitive shaders would not have information like the bounding box/sphere/whatnot enclosing an object. In the game engine it would be possible and sensible to implement all kinds of occlusion schemes that are completely independent of primitive shaders.
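To illustrate the engine-level side of that: here is a minimal sketch of bounding-sphere frustum culling in C. The structs and the inward-facing plane convention are my assumptions; the point is only that the engine has object-level data that a per-primitive shader stage doesn't.

```c
#include <stdbool.h>

typedef struct { float x, y, z, w; } Plane;   /* plane ax+by+cz+w = 0, normal pointing inward */
typedef struct { float x, y, z, radius; } Sphere;

/* Returns false if the bounding sphere is entirely behind any frustum
   plane, i.e. the whole object can be skipped before a single one of
   its triangles is ever submitted to the GPU. */
bool sphere_in_frustum(const Plane planes[6], Sphere s)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].x * s.x + planes[i].y * s.y
                   + planes[i].z * s.z + planes[i].w;
        if (dist < -s.radius)
            return false;  /* completely outside this plane */
    }
    return true;  /* intersects or is inside: draw it */
}
```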

What you likely can do with primitive shaders is cull triangles not facing the camera, cull triangles outside the viewing frustum, and do some sort of binning into tiles to allow the rasterizer to operate more efficiently. Maybe also something like discarding triangles with zero area, and tessellation?
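As a rough idea of the kind of per-triangle tests meant here, a small C sketch of back-face and zero-area rejection in screen space. This is illustrative only, not AMD's actual primitive shader code; it assumes counter-clockwise front faces.

```c
#include <stdbool.h>

typedef struct { float x, y; } Vec2;  /* post-projection screen coordinates */

/* Twice the signed area of the triangle; the sign encodes winding. */
static float signed_area2(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* True if the triangle can be thrown away before rasterization:
   back-facing (negative area, CCW front faces) or degenerate (zero area). */
bool cull_triangle(Vec2 a, Vec2 b, Vec2 c)
{
    return signed_area2(a, b, c) <= 0.0f;
}
```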
 
Also @MDolenc do you see any practical performance benefit from primitive shaders? If not, why not?

edit - also if you don't think so then why did AMD bother with them?
I think it's helpful to give a pointer to a discussion that I had with Jawed last August. He explains how primitive shaders work around certain bottlenecks in the pipeline.

The meaty part starts here, and the discussion continues from there in the following pages:
https://forum.beyond3d.com/threads/...rs-and-discussion.59649/page-183#post-1996460

From my understanding back then, it indeed has nothing to do with TBDR.

Right now, I'm using the following mental analogy of primitive shaders: I see geometry stages in the pipeline as relatively small functions in a C program that are getting called in succession. Without an optimizing compiler, there's a lot of overhead just passing around function parameters and results, pushing and popping things to and from the stack etc. But an optimizing compiler would simply inline the different subroutines and all that overhead goes away.
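To make the analogy concrete, a toy C sketch of the analogy itself, not of any GPU internals:

```c
/* Three tiny "stages" that pass results to each other through calls. */
static float fetch(float v)     { return v * 2.0f; }
static float transform(float v) { return v + 1.0f; }
static float emit(float v)      { return v * 0.5f; }

/* Without optimization, each call pushes arguments, jumps, and returns.
   With inlining (e.g. gcc -O2), the compiler fuses the three bodies into
   one straight-line computation and the call overhead disappears. */
float pipeline(float v)
{
    return emit(transform(fetch(v)));
}
```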
 
What's with the attitude?

Anyway, TBDRs have been throwing away all opaque overdraw since the 1990s. It's quite possible to do.
I'm a bit tired of having to repeatedly point out the absolute basics of the 3D pipeline on a forum where I would have hoped at least "the life of a triangle" would be clear.
And there are caveats to TBDRs too. The most important one in this part of the discussion is that even in that case triangles are still rasterized. And we know how many triangles Vega can rasterize per clock: it's 4. It can do better than 4 when it can throw something away prior to rasterization. This essentially means when it can determine that a triangle does not contribute anything to the scene without looking at the Z-buffer.
And even then: all recent GPUs are able to reject triangles at a faster rate than they are able to rasterize them. Vega with primitive shaders is just much faster at it than the rest of the AMD GPU line.
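For scale, a quick back-of-the-envelope calculation, assuming a roughly 1.5 GHz clock (the clock is my assumption; the 4 triangles/clock figure is from the post above):

```latex
4\ \frac{\text{triangles}}{\text{clock}} \times 1.5 \times 10^{9}\ \frac{\text{clocks}}{\text{s}} = 6 \times 10^{9}\ \frac{\text{triangles}}{\text{s}}
```

which lines up with the ~6 billion/sec discard figure quoted earlier and dwarfs the 500M triangles/sec under discussion.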

Also @MDolenc do you see any practical performance benefit from primitive shaders? If not, why not?

edit - also if you don't think so then why did AMD bother with them?
There are. Overkill tessellation is one obvious place where they would help. But I see most of the potential in VR tricks, though we don't know enough about it yet.

That's another figure blown out of proportion... Those 220M polygons don't all get submitted to the GPU.
 
Of course, at the game engine level, where you have more context, it's possible to do better, but primitive shaders would not have information like the bounding box/sphere/whatnot enclosing an object. In the game engine it would be possible and sensible to implement all kinds of occlusion schemes that are completely independent of primitive shaders.
Keep in mind the primitive shader descriptions mention "with the right knowledge", so engine-based criteria may very well be passed in to facilitate culling. Binning might also be a possibility, but more documentation is required.
 
Right now, I'm using the following mental analogy of primitive shaders: I see geometry stages in the pipeline as relatively small functions in a C program that are getting called in succession. Without an optimizing compiler, there's a lot of overhead just passing around function parameters and results, pushing and popping things to and from the stack etc. But an optimizing compiler would simply inline the different subroutines and all that overhead goes away.

The merging of shader stages, multiple combinations of API shader types, different enablement states, dynamic state, and the DSBR (and its multiple modes) might explain why this optimizing compiler hasn't provided results good enough to enable it generally. Worse if exposing some of the internals of those subroutines also exposed or created an x87-type oversight or bug, where some nasty corner case winds up pessimizing the overall critical path.

The remaining transition points in the pipeline are also centered around hand-offs between hardware blocks with different amplification and reduction possibilities, so the fixed-function stages might be better characterized as independent threads with access to dedicated libraries, specialized formatting intrinsics, and a hardware-compressed, more complete representation of architectural behavior.

The primitive shaders are described as running serially with respect to a significant portion of the existing path, and as conservative (imperfect and optional) representations of the optimized subroutines.
The optimizing compiler in this analogy may find that an optimized sequence of cost X translates into X+n on the critical path, on top of feeding into a pre-existing concurrent machine. The vertex processing path is generally characterized as being less able to hide latency, and the DSBR's related patents indicate a serial component and limited scope for concurrent batching.
The binning process might also have some inflection points related to batching latency versus a longer window for coalescing/culling.

The old way might have burned more resources and bandwidth, but that might matter more for a small implementation like Raven Ridge, whereas an implementation like Vega 64 finds its scads of resources and larger power budget less able to hide the incrementally higher serial overhead revealed by its much more parallel engine and broader fixed-function resources.
 
What type of binning are you referring to?
Binning of geometry in general, though various reordering and submission schemes may apply. It may be more applicable to coarsely rasterize the bounding box of a patch to facilitate occlusion, then cull or hint with the additional data generated. Patches could be reordered, as opposed to triangles or tessellated geometry. Perhaps primitive shaders could even be chained together. Even with occlusion there is no reason to reorder everything, just to figure out what's on top. That may very well be deferred from the previous frame, or some form of feedback in the system.
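A rough C sketch of what such a coarse bounding-box occlusion test could look like. The coarse depth buffer, tile size, and depth convention (smaller z = nearer) are all assumptions for illustration, not documented Vega behavior.

```c
#include <stdbool.h>

#define TILE 8  /* assumed coarse-tile size in pixels */

/* Coarse depth buffer: one conservative (farthest) depth per tile. */
typedef struct {
    const float *max_depth;   /* max_depth[ty * tiles_x + tx] */
    int tiles_x, tiles_y;
} CoarseZ;

/* Screen-space bounding box of a patch plus its nearest depth. */
typedef struct { int x0, y0, x1, y1; float min_z; } PatchBounds;

/* True if every covered tile's farthest already-drawn depth is nearer
   than the patch's nearest point, i.e. the whole patch is provably
   occluded and can be culled without rasterizing its triangles. */
bool patch_occluded(const CoarseZ *cz, PatchBounds b)
{
    for (int ty = b.y0 / TILE; ty <= b.y1 / TILE; ++ty)
        for (int tx = b.x0 / TILE; tx <= b.x1 / TILE; ++tx)
            if (cz->max_depth[ty * cz->tiles_x + tx] >= b.min_z)
                return false;  /* something may be behind: can't cull */
    return true;
}
```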
 
Binning of geometry in general, though various reordering and submission schemes may apply.
Wouldn't this fall to the DSBR hardware? Also, when programming the graphics pipeline, you are guaranteed that triangles are drawn in submission order. IIRC there is a way to bypass this, but I don't remember if it is IHV-specific or if it exists at all.
 
Wouldn't this fall to the DSBR hardware? Also, when programming the graphics pipeline, you are guaranteed that triangles are drawn in submission order. IIRC there is a way to bypass this, but I don't remember if it is IHV-specific or if it exists at all.
Yes, however DSBR is likely just a higher abstraction of primitive shaders and the pipeline. Much of the data used by primitive shaders for culling would be relevant for binning. So it stands to reason they are intertwined.

Guarantees have a habit of being fungible when drivers detect they can get away with it. As linked above, there are mechanisms to explicitly control it. As an optimization, all drivers likely go out of order where possible; there was a comment to that effect on a recent Linux commit. How aggressive the IHVs are I couldn't say. It's one of those black-box things that could change somewhat randomly.
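One explicit mechanism of the kind referenced above is AMD's VK_AMD_rasterization_order Vulkan extension, which is indeed IHV-specific. A C fragment showing just the extension-specific part (the rest of the pipeline setup is omitted):

```c
#include <vulkan/vulkan.h>

/* With the VK_AMD_rasterization_order device extension enabled, a
   pipeline can opt out of strict submission-order rasterization by
   chaining this struct into VkPipelineRasterizationStateCreateInfo. */
VkPipelineRasterizationStateRasterizationOrderAMD orderInfo = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_RASTERIZATION_ORDER_AMD,
    .pNext = NULL,
    .rasterizationOrder = VK_RASTERIZATION_ORDER_RELAXED_AMD,
};

VkPipelineRasterizationStateCreateInfo rasterState = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,
    .pNext = &orderInfo,   /* relaxed: driver may rasterize out of order */
    /* ... remaining rasterization state as usual ... */
};
```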
 
It does say quite specifically in the GPUOpen article I linked above that:
You won’t see any benefit if the driver had enabled it automatically of course (for instance, depth-only rendering). In nearly all other cases, the driver has to play safe and cannot enable it even though there wouldn’t be any visible artifacts.
 