AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Strange that Resource Binding is still listed as Tier 2 for Pascal. Recent drivers showed it to be Tier 3. And the Vega FE driver at least did not yet support Standard Swizzle - good to see it's finally getting picked up by someone!
 
Have you experienced anything similar to the weird texture fillrate results and unusually low effective memory bandwidth in your work with Vega?
Radeon GPU Profiler doesn't have bandwidth graphs yet. I have requested this feature (along with other bottleneck graphs). The tool has already proven to be a huge improvement for understanding AMD PC GPUs and drivers at a low level. When we eventually get bandwidth graph support, I can analyze some Vega captures. Don't expect this to happen at RX launch, however.
 
My guess is that Vega is likely bandwidth bound as the tiled rasterizer is not enabled.
It's possible, but I'm going to guess not. From the tone of the conversations I had, while DSBR will improve things, everyone was quick to point out that the gains would be higher on a more resource-constrained card. Those aren't the kind of comments I'd expect if they thought performance would make a huge jump with DSBR.

[Attachment: dsbr_savings.png]
 
BW limited or not, a GTX 1080 has about 320 GB/s of bandwidth; Vega has roughly 50% more than that. The lack of DSBR is not a sufficient explanation.
Pascal has a very good tiled rasterizer, very good DCC, and a large L2 cache. Benchmarks have shown Pascal's DCC to be significantly ahead of Polaris (GCN4). Vega FE's current drivers have the tiled rasterizer disabled. Because of all these advantages, Pascal can get away with less raw bandwidth. And obviously these bandwidth-saving mechanisms also save a significant amount of power.
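
As a rough sanity check (every multiplier below is an illustrative assumption, not a measured value), the raw-vs-effective picture might look something like this:

```cpp
// Toy comparison of raw vs. "effective" bandwidth under assumed efficiency
// multipliers for DCC / tiled rasterization / cache reuse. The multipliers
// are made-up illustrative numbers, not measurements.
#include <cstdio>

struct Gpu {
    const char* name;
    double raw_gbps;    // raw memory bandwidth, GB/s
    double efficiency;  // assumed multiplier from bandwidth-saving features
};

int main() {
    const Gpu gpus[] = {
        {"GTX 1080 (tiling + strong DCC)", 320.0, 1.4},  // assumption
        {"Vega FE (binning disabled)",     484.0, 1.1},  // assumption
    };
    for (const Gpu& g : gpus)
        printf("%-32s raw %.0f GB/s -> effective ~%.0f GB/s\n",
               g.name, g.raw_gbps, g.raw_gbps * g.efficiency);
    return 0;
}
```

The exact numbers are beside the point; the takeaway is that efficiency features can close a large chunk of a raw-bandwidth gap.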
 
The GTX 1060 was beating the RX 480 by 12% in TechSpot's launch review. They ran the benchmarks again with the newest driver (at the RX 580 launch), and the difference had dropped to 1% - roughly an 11% relative gain (1.12 / 1.01 ≈ 1.11). That's been AMD's track record for ages, and there are plenty of reviews stating the same thing for different AMD GPUs.
It's been AMD's track record for ages, and it has the track record of competitive success to match.
Improving over time is helpful if one starts from a leading or competitive position; otherwise it's something of a double-edged sword, which brings me to the following:

Vega is the biggest change to the GCN architecture since GCN launched. Would it be wrong to assume that AMD needs more time than Polaris required to get the drivers up to peak performance?
Note that the quoted review was a retrospective nearly a year after launch, and that was for an easier transition.
I don't mind seeing things dissected a long while after the fact, but having a multiplier applied to a year's worth of incremental improvement will leave me struggling to be interested.

AMD today officially told us that the tiled rasterizer has been disabled in the current drivers.
Perhaps if they ever open up as to why, it might be interesting. It points to a potentially fragile or flawed feature, which may mean waiting for version 2.0, or for someone else with somewhat similar tech who is already several versions further along.
 
Strange that Resource Binding is still listed as Tier 2 for Pascal. Recent drivers showed it to be Tier 3. And the Vega FE driver at least did not yet support Standard Swizzle - good to see it's finally getting picked up by someone!
I am glad for standard swizzle too, but we also need Nvidia support to make it a relevant feature. Pascal was upgraded to binding Tier 3 a few days ago; most likely these slides were made before that.
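
For anyone who wants to verify what their own driver actually reports rather than trusting slides, a minimal D3D12 query sketch (error handling kept minimal; MSVC's __uuidof is used so it stays self-contained):

```cpp
// Minimal check of what the installed driver reports for resource binding
// tier and standard swizzle support.
#include <windows.h>
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main() {
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 __uuidof(ID3D12Device), (void**)&device)))
        return 1;

    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    printf("Resource binding tier : %d\n", (int)opts.ResourceBindingTier);  // 1..3
    printf("Standard swizzle 64KB : %s\n",
           opts.StandardSwizzle64KBSupported ? "yes" : "no");

    device->Release();
    return 0;
}
```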
 
It's possible, but I'm going to guess not. From the tone of the conversations I had, while DSBR will improve things, everyone was quick to point out that the gains would be higher on a more resource-constrained card. Those aren't the kind of comments I'd expect if they thought performance would make a huge jump with DSBR.

AMD's patents put the emphasis of a binning deferred rasterizer on the avoidance of pixel shader work in the presence of overdraw. The choice to measure DSBR in terms of bandwidth saved doesn't directly measure that impact, which may align with statements that it matters more for resource-constrained cards.
One potential wrinkle among many is that a decent chunk of the pipeline works at the granularity of wave packing, pixel quads, ROP tiles, or cache lines, and in complex scenarios the DSBR would be prone to spitting out fragments that wind up consuming resources or data movement at those levels regardless. The rumored SIMD-length changes or better packing seem to be a "wait for gen N+1" item at this point.
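
A toy model of that granularity point (purely illustrative, nothing like how real hardware schedules work): even with perfect per-pixel hidden-surface removal, pixel shaders launch in 2x2 quads, so partially covered quads claw back part of the savings.

```cpp
// Toy model: count shader lanes with no HSR, with perfect per-pixel HSR,
// and with perfect HSR but 2x2-quad launch granularity.
#include <cstdio>
#include <vector>

struct Rect { int x0, y0, x1, y1, id; };  // axis-aligned "primitive", back-to-front order

int main() {
    const int W = 64, H = 64;
    // Three overlapping layers drawn back to front (higher index = closer).
    std::vector<Rect> prims = {
        {0, 0, 64, 64, 0},   // full-screen background
        {5, 5, 45, 45, 1},
        {20, 20, 60, 60, 2},
    };

    // owner[y][x] = id of the front-most primitive covering the pixel.
    std::vector<std::vector<int>> owner(H, std::vector<int>(W, -1));
    long noHsrLanes = 0;
    for (const Rect& r : prims)
        for (int y = r.y0; y < r.y1; ++y)
            for (int x = r.x0; x < r.x1; ++x) {
                owner[y][x] = r.id;
                ++noHsrLanes;            // no HSR: every covered pixel is shaded
            }

    // Perfect HSR at 2x2-quad granularity: each primitive that owns at least
    // one pixel of a quad still launches a full 4-lane quad.
    long quadLanes = 0, visiblePixels = 0;
    for (int y = 0; y < H; y += 2)
        for (int x = 0; x < W; x += 2) {
            bool owns[8] = {};           // enough for prims.size() <= 8
            for (int dy = 0; dy < 2; ++dy)
                for (int dx = 0; dx < 2; ++dx)
                    if (owner[y + dy][x + dx] >= 0) {
                        ++visiblePixels;
                        owns[owner[y + dy][x + dx]] = true;
                    }
            for (bool o : owns) if (o) quadLanes += 4;
        }

    printf("no HSR, per-pixel lanes        : %ld\n", noHsrLanes);
    printf("perfect HSR, ideal pixel count : %ld\n", visiblePixels);
    printf("perfect HSR, 2x2-quad lanes    : %ld\n", quadLanes);
    return 0;
}
```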
 
Perhaps if they ever open up as to why, it might be interesting. It points to a potentially fragile or flawed feature, which may mean waiting for version 2.0, or for someone else with somewhat similar tech who is already several versions further along.
Software engineering has grown to be a huge part of any hardware launch. I would guess that this feature requires lots of driver code (some running on the CPU, some on the command processors). You can ship fully functional hardware without good software support. Obviously this is not an ideal situation for marketing, since people tend to judge products at launch. But at least they now have a tiled rasterizer that they can improve upon (whether through software or hardware changes, or both).
 
More efficient shading with the draw-stream binning rasterizer
AMD is significantly overhauling Vega's pixel-shading approach, as well. The next-generation pixel engine on Vega incorporates what AMD calls a "draw-stream binning rasterizer," or DSBR from here on out. The company describes this rasterizer as an essentially tile-based approach to rendering that lets the GPU more efficiently shade pixels, especially those with extremely complex depth buffers. The fundamental idea of this rasterizer is to perform a fetch for overlapping primitives only once, and to shade those primitives only once. This approach is claimed to both improve performance and save power, and the company says it's especially well-suited to performing deferred rendering.

The DSBR can schedule work in what AMD describes as a "cache-aware" fashion, so it'll try to do as much work as possible for a given "bundle" of objects in a scene that relate to the data in a cache before the chip proceeds to flush the cache and fetch more data. The company says that a given pixel in a scene with many overlapping objects might be visited many times during the shading process, and that cache-aware approach makes doing that work more efficient. The DSBR also lets the GPU discover pixels in complex overlapping geometry that don't need to be shaded, and it can do that discovery no matter what order that overlapping geometry arrives in. By avoiding shading pixels that won't be visible in the final scene, Vega's pixel engine further improves efficiency.
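
To make the quoted "fetch once, shade once per bin" idea concrete, here is a deliberately simplified toy model (a conceptual sketch only, not AMD's hardware algorithm): primitives are bucketed into screen-space tiles, and visibility is resolved inside each closed bin before any shading happens, regardless of submission order.

```cpp
// Toy binning model: bucket primitives into screen tiles, resolve the
// front-most fragment per pixel within each bin, then shade once per pixel.
#include <cstdio>
#include <vector>
#include <cfloat>

struct Prim { int x0, y0, x1, y1; float depth; };  // screen-space bounds + constant depth

int main() {
    const int W = 128, H = 128, TILE = 32;
    // Arbitrary overlapping primitives, deliberately submitted in mixed order:
    // binning resolves visibility either way.
    std::vector<Prim> prims = {
        {  0,   0, 128, 128, 0.9f},   // far background
        { 40,  40, 120, 120, 0.2f},   // near occluder, submitted second
        { 10,  10,  90,  90, 0.5f},   // mid layer, submitted last
    };

    long shaded = 0, binned = 0;
    for (int ty = 0; ty < H; ty += TILE)
        for (int tx = 0; tx < W; tx += TILE) {
            // 1. Bin: collect primitives overlapping this tile.
            std::vector<const Prim*> bin;
            for (const Prim& p : prims)
                if (p.x0 < tx + TILE && p.x1 > tx && p.y0 < ty + TILE && p.y1 > ty)
                    bin.push_back(&p);
            binned += (long)bin.size();

            // 2. Resolve visibility inside the closed bin, then shade each
            //    covered pixel exactly once (front-most primitive wins).
            for (int y = ty; y < ty + TILE; ++y)
                for (int x = tx; x < tx + TILE; ++x) {
                    float nearest = FLT_MAX;
                    for (const Prim* p : bin)
                        if (x >= p->x0 && x < p->x1 && y >= p->y0 && y < p->y1 &&
                            p->depth < nearest)
                            nearest = p->depth;
                    if (nearest < FLT_MAX) ++shaded;  // one shade per visible pixel
                }
        }
    printf("primitive-per-tile bin entries : %ld\n", binned);
    printf("pixel shader invocations       : %ld\n", shaded);
    return 0;
}
```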


To help the DSBR do its thing, AMD is fundamentally altering the availability of Vega's L2 cache to the pixel engine in its shader clusters. In past AMD architectures, memory accesses for textures and pixels were non-coherent operations, requiring lots of data movement for operations like rendering to a texture and then writing that texture out to pixels later in the rendering pipeline. AMD also says this incoherency raised major synchronization and driver-programming challenges.


To cure this headache, Vega's render back-ends now enjoy access to the chip's L2 cache in the same way that earlier stages in the pipeline do. This change allows more data to remain in the chip's L2 cache instead of being flushed out and brought back from main memory when it's needed again, and it's another improvement that can help deferred-rendering techniques.
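
For reference, the API-visible side of the render-to-texture-then-sample pattern described above is just a resource transition barrier; the hardware-level difference is whether satisfying it means pushing data out toward memory or letting it stay in L2. A minimal D3D12 sketch of that pattern (the Vega-specific behaviour is the driver's and hardware's concern, not visible in this code):

```cpp
// Standard D3D12 transition between render-target and shader-resource usage.
// Per the description above, earlier GCN parts' colour backends were not
// coherent with L2, so this point implied flushing data toward memory;
// Vega's render back-ends going through L2 lets more of it stay on chip.
#include <d3d12.h>

void transition_rt_to_srv(ID3D12GraphicsCommandList* cmdList,
                          ID3D12Resource* renderTarget) {
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = renderTarget;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &barrier);
}
```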

The draw-stream binning rasterizer won't always be the rasterization approach that a Vega GPU will use. Instead, it's meant to complement the existing approaches possible on today's Radeons. AMD says that the DSBR is "highly dynamic and state-based," and that the feature is just another path through the hardware that can be used to improve rendering performance. By using data in a cache-aware fashion and only moving data when it has to, though, AMD thinks that this rasterizer will help performance in situations where the graphics memory (or high-bandwidth cache) becomes a bottleneck, and it'll also save power even when the path to memory isn't saturated.

By minimizing data movement in these ways, AMD says the DSBR is its next thrust at reducing memory bandwidth requirements. It's the latest in a series of solutions to the problem of memory-bandwidth efficiency that AMD has been working on across many generations of its products. In the past, the company has implemented better delta color compression algorithms, fast Z clear, and hierarchical-Z occlusion detection to reduce pressure on memory bandwidth.

http://techreport.com/review/31224/the-curtain-comes-up-on-amd-vega-architecture/3

Hmm yeah, I doubt we'll see any better performance or power usage once that is enabled...
 
Strange that Resource Binding is still listed as Tier 2 for Pascal. Recent drivers showed it to be Tier 3. And the Vega FE driver at least did not yet support Standard Swizzle - good to see it's finally getting picked up by someone!

Did anyone confirm that they do support Tier 3 and that it wasn't just the driver reporting incorrectly?
 
Did anyone confirm that they do support Tier 3 and that it wasn't just the driver reporting incorrectly?
That being a WHQL driver, I'd try to get a refund on the fees paid to Microsoft for certification if such a big mistake slipped past. :D
 
Those are all very old titles; who knows when that was tested. Maybe it was tested a year ago and something broke that forced them to rework it. We don't know.
Sorry, that's just grasping at straws. AMD has the data, which means they have the feature enabled in their current driver.

They already have the Vega driver up and running (17.30-170711).

[Attachment: slides-raja-52.jpg]

The GTX 1060 was beating the RX 480 by 12% in TechSpot's launch review. They ran the benchmarks again with the newest driver (at the RX 580 launch), and the difference had dropped to 1% - roughly an 11% relative gain (1.12 / 1.01 ≈ 1.11). That's been AMD's track record for ages, and there are plenty of reviews stating the same thing for different AMD GPUs.
This is completely different: the set of games tested changed between the 480 and 580 launches; some new games favored the RX 480 more, others favored NV more.
Vega is the biggest change to the GCN architecture since GCN launched. Would it be wrong to assume that AMD needs more time than Polaris required to get the drivers up to peak performance? AMD today officially told us that the tiled rasterizer has been disabled in the current drivers.
They also didn't claim any significant performance increases after enabling the feature, and they released final performance targets for their hardware that were not that different from Vega FE. We can't draw sweeping conclusions if the vendor isn't willing to draw any.
Expecting 10%+ performance gains during the first year of driver upgrades is plausible.
It's plausible. Just not plausible now at launch when AMD themselves are implying otherwise.
 
Software engineering has grown to be a huge part of any hardware launch. I would guess that this feature requires lots of driver code (some running on the CPU, some on the command processors). You can ship fully functional hardware without good software support.
It's certainly true that the complexity of these systems is massive, but the difference between "could" and "should" is increasing in the current competitive climate.

Obviously this is not an ideal situation for marketing, since people tend to judge products at launch. But at least they now have a tiled rasterizer that they can improve upon (whether through software or hardware changes, or both).
One could judge a vendor by how often and how badly they do this, as a measure of how likely they are to re-offend, or how much they are willing to accept as payment for an incomplete product.


I think it's plausible that Nvidia's and Intel's engineers considered an opportunistic hybrid tiling solution with hidden surface removal around the same time as AMD, or earlier.
They may have also had a good idea of the complexities or shortcomings that would be encountered.

Nvidia, at least, went with a specific form of tiling, but not with the complexity adder of full HSR. That might point to which of the two calls was the better one.
I've been mulling over whether the date of AMD's patents on the concept means the design predates the widespread adoption of more deferred engines, or of things like visibility-buffer-based engine concepts.
 
Is this like the "Fermi is doing tessellation in software" discussion? Or is it coming from the discussion of whether the roughly corresponding OpenCL limit is a hard limit or software-enforced?
As I told you, I only read this somewhere. Anyway, if true, it is probably the second option.
 