AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Or it comes from fixing any bugs that are holding things back. Just two weeks ago one got fixed that would have been a show stopper for certain key aspects of binning and primitive shaders. Not that long ago even FP16 between Vega and pre-Vega wasn't working due to opcode issues. It stands to reason there are some complicated shader compiler issues that are somewhat beyond the scope of just drivers. The LLVM tool chain with SM6+ probably takes some work and would likely be the priority for some contracts.

Would you be able to link to the specific references, to clarify the context of these fixes?
The open-source efforts for the graphics features were stated in some areas to be trailing the closed-source Windows efforts, and the specific project can determine how closely it follows.

Opcode issues would have been obvious when the encoding was decided upon, years ago.
 
I tested around a little bit with Deus Ex: Mankind Divided. Interesting fact: disabling all features and keeping the polygon features at the Ultra level keeps my frames at the same level, 62 vs 60. Not much change, so Vega is definitely polygon bound.
 
I tested around a little bit with Deus Ex: Mankind Divided. Interesting fact: disabling all features and keeping the polygon features at the Ultra level keeps my frames at the same level, 62 vs 60. Not much change, so Vega is definitely polygon bound.

Are those average FPS numbers?
Those numbers seem like they're close to a rounding error or normal variance between runs at the same settings with nothing changed.

It's not immediately clear from what was described if geometry processing has been isolated. Going by a naive assumption that there's geometry/vertex processing at the start of a frame followed by a non-zero period of pixel-dominated processing towards the end, it seems too small a change unless the pixel-related options have minimal impact.

If there's not something like a CPU limitation or the engine not applying settings as expected, it points to some other reasons for a lack of variation.
 
Would you be able to link to the specific references, to clarify the context of these fixes?
The open-source efforts for the graphics features were stated in some areas to be trailing the closed-source Windows efforts, and the specific project can determine how closely it follows.
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177950.html

Efforts were trailing, but much of it was ported directly from a shared codebase. That's how they ended up with so many lines of code in the first place. Stands to reason a messed up bitfield wouldn't change all that much. Although it's also possible the bug was found via some fuzzing and not known to cause any issues. There could be private code we can't see that is affected as well. That wouldn't be unreasonable for DSBR or primitive shaders and the FP16 issue suggests they share compilers as FP16 on Polaris messed up around the same time Vega launched.
 
I tested around a little bit with Deus Ex: Mankind Divided. Interesting fact: disabling all features and keeping the polygon features at the Ultra level keeps my frames at the same level, 62 vs 60. Not much change, so Vega is definitely polygon bound.
You have a tendency to reach conclusions based on minimal experiments.
 
Just looking in one direction with a lot of buildings (polygons)
I don't know about DE:MD, but buildings are usually composed of fewer, larger polygons. Then there are things like LOD, PVS, and occlusion culling that could cull triangles from the buildings, and from stuff inside and behind the buildings.

If you want a lot of polys go somewhere with a lot of characters and/or props.
 
It appears to be a correction for unexpected compiler behavior at an uncommonly large range of texture sizes, and is applied from R600 onwards.
It seems like the code generation from the compiler would have led to a malformed data structure whose impact wouldn't be limited to binning or a primitive shader.

Stands to reason a messed up bitfield wouldn't change all that much. Although it's also possible the bug was found via some fuzzing and not known to cause any issues.
From following the chain of previous patches, there was an attempt to pack the structure better by reducing element sizes where possible. The case of large array textures came up, and the change in base unit for the field, along with handling of the math to get around unexpected code generation behaviors, followed.
If it weren't for the similarly recent repacking attempt, it doesn't seem like this corner case would have existed or done anything to stymie GFX9 features that predate this attempt. There doesn't seem to be strong motivation to figure out to what extent this is a compiler issue or a standards issue, or whether it is a long-standing one. If this was some land mine of significant consequence all along, it apparently didn't come up as a problem for a lot of prior architectures.
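To illustrate the kind of repack being described, here is a minimal sketch with hypothetical names (not the actual Mesa/addrlib structures): a generously sized 64-bit byte count gets shrunk into a 32-bit field by switching to a coarser base unit, with the conversion math kept in 64 bits until the final narrowing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "before": a safe but wasteful 64-bit byte count. */
struct surf_info_old {
    uint64_t slice_size_bytes;
};

/* Hypothetical "after": repacked into 32 bits by expressing the size in a
 * coarser base unit (256-byte blocks here), roughly the kind of change a
 * space-saving patch series would make. */
struct surf_info_new {
    uint32_t slice_size_256b;
};

/* Do the conversion math in 64 bits and only narrow at the very end. */
static uint32_t to_256b_units(uint64_t bytes)
{
    return (uint32_t)((bytes + 255) / 256);
}

int main(void)
{
    /* A large array texture whose byte size doesn't fit in 32 bits. */
    uint64_t bytes = 16384ull * 16384ull * 16ull * 4ull; /* ~17 GB */
    struct surf_info_new info = { .slice_size_256b = to_256b_units(bytes) };
    printf("%llu bytes -> %u blocks of 256 bytes\n",
           (unsigned long long)bytes, info.slice_size_256b);
    return 0;
}
```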

That wouldn't be unreasonable for DSBR or primitive shaders and the FP16 issue suggests they share compilers as FP16 on Polaris messed up around the same time Vega launched.
Which patch provides context on the FP16 issue?
 
If it weren't for the similarly recent repacking attempt, it doesn't seem like this corner case would have existed or done anything to stymie GFX9 features that predate this attempt. There doesn't seem to be strong motivation to figure out to what extent this is a compiler issue or a standards issue, or whether it is a long-standing one. If this was some land mine of significant consequence all along, it apparently didn't come up as a problem for a lot of prior architectures.
It wouldn't necessarily affect prior architectures until they were ported to LLVM, which is where Vega started. With HBCC, Vega could accommodate far larger textures as well, on top of the larger memory capacities. It may very well not have been an issue before. As for the surface formats, it's difficult to know how large the fields get when binning, compression, swizzle, etc. get thrown in. I'm assuming the code for the non-functional parts isn't available here. I'd agree the issues are larger, or they'd have tracked down the bugs already if it was just this.

Which patch provides context on the FP16 issue?
Not a patch, just that at the same time Polaris and Vega had conflicting opcodes, FP16 support broke on Polaris in many titles. That was the premise for that discussion a while back regarding Carsten's benchmark issues. No hard evidence, but it seems a likely culprit and indicates the same or similar compiler being used for the Windows code base.

Also, here's a new Region-based Image Decompression filing I haven't quite figured out. Fairly broad, but compressed mip-trees for binning/occlusion, or perhaps a wireless VR compression? Lossy compression, while not technically accurate, could be a substitute for foveated rendering. Might also be practical for compressing occluded portions of a frame for reprojection, as they do mention motion estimation.
 
Or it comes from fixing any bugs that are holding things back. Just two weeks ago one got fixed that would have been a show stopper for certain key aspects of binning and primitive shaders. Not that long ago even FP16 between Vega and pre-Vega wasn't working due to opcode issues. It stands to reason there are some complicated shader compiler issues that are somewhat beyond the scope of just drivers. The LLVM tool chain with SM6+ probably takes some work and would likely be the priority for some contracts.

It’s been 6 months since the release and how long since the driver development has started? If AMD is committing their software development man hours to bells and whistles like Adrenalin (NICE bells and whistles, but just that nonetheless) while major performance/feature showstoppers are just sitting there, unaddressed, I would seriously question their corporate sanity.

Est quod vides, I think.
 
It wouldn't necessarily affect prior architectures until they were ported to LLVM, which is where Vega started.
It's in the code in header files intended to be applied to multiple architectures. If it's compiled and run on the relevant hardware, it would affect the GPU.

With HBCC, Vega could accommodate far larger textures as well, on top of the larger memory capacities. It may very well not have been an issue before.
It wasn't an issue before because the fields were 64-bit values and the nonsense code generation didn't happen until they tried changing that.
Unless the assertion is that Vega's driver and hardware cannot handle 64-bit values, the final structure would appear to be less onerous for hardware to handle than what came before it.
This issue does not seem to have existed early enough in order to give Vega's missing features an excuse.

Not a patch, just that at the same time Polaris and Vega had conflicting opcodes, FP16 support broke on Polaris in many titles. That was the premise for that discussion a while back regarding Carsten's benchmark issues. No hard evidence, but it seems a likely culprit and indicates the same or similar compiler being used for the Windows code base.
The opcode conflict I can think of stems from changes for the LLVM ROCm stack, dealing with AMD's decision to assign the pre-existing higher-level operation names from prior ISA versions to new variants with slightly different semantics in Vega.
This required introducing new names for tools like disassemblers to use when referencing the unchanged legacy binary encodings.
The problems with purposefully creating a naming collision like that aside:
The idea that doing this for a newer code branch must damage functionality for pre-existing and functional drivers, for a separate functioning architecture, on titles that were likely using higher-level abstractions, and that this wasn't noticed until years after the binary encodings were decided, months to years after the hardware should have been a prototyping target, and that work towards correcting it didn't ramp until 2017, after the silicon was final and either in production, shipping, or at retail, points to a far more massive/fatal problem with RTG.

(edit: Granted, I have serious doubts that it can be that bad. I'd rather chalk the publicly visible indications up to a trailing-edge project coupled with AMD's chronic underinvestment in software or due diligence. The above scenario is crazy.)

Also, here's a new Region-based Image Decompression filing I haven't quite figured out. Fairly broad, but compressed mip-trees for binning/occlusion, or perhaps a wireless VR compression? Lossy compression, while not technically accurate, could be a substitute for foveated rendering. Might also be practical for compressing occluded portions of a frame for reprojection, as they do mention motion estimation.
Per reading at least part of the text, this is a continuation of a 2012 and 2011 filing. If it has VR implications, it probably didn't at the time.
The method in question has a cycle in the encoding path, which means non-deterministic time to encode and an asymmetry in encode/decode complexity. This was considered problematic for in-line latency-sensitive hardware encode/decode for DCC.
The logic also appears to be data-dependent on the content of a tile, not its global position versus something like the region corresponding to the fovea.
 
@3dilettante

AMD is currently looking for a Senior Shader Compiler Engineer: https://jobs.amd.com/job/Frimley-Senior-Shader-Compiler-Engineer/422922700/
According to the PCSX2 dev who maintains the gsdx graphics plugin, Adrenalin broke HW vertex processing: https://github.com/PCSX2/pcsx2/issues/1552 (scroll to the very bottom)
Also, PCGH Germany stated that Vega is using a completely new shader compiler: http://www.pcgameshardware.de/Vega-...als/Architektur-NCU-HBCC-Vorstellung-1217460/

All of these things make it seem likely to me that they are having compiler issues. Does anyone know if the shader compiler in the AMD windows driver is also LLVM based or do they use something completely different?
 
It’s been 6 months since the release and how long since the driver development has started? If AMD is committing their software development man hours to bells and whistles like Adrenalin (NICE bells and whistles, but just that nonetheless) while major performance/feature showstoppers are just sitting there, unaddressed, I would seriously question their corporate sanity.

Est quod vides, I think.
Not all programming or engineering skillsets are equal. Creating graphical overlays is a bit different from heavy compiler work. Something could have gone wrong, fallen through the cracks, or resources were limited and allocated elsewhere. Regardless, Vega is selling as fast as AMD can apparently make them. So not that much of a showstopper.

It wasn't an issue before because the fields were 64-bit values and the nonsense code generation didn't happen until they tried changing that.
Unless the assertion is that Vega's driver and hardware cannot handle 64-bit values, the final structure would appear to be less onerous for hardware to handle than what came before it.
This issue does not seem to have existed early enough in order to give Vega's missing features an excuse.
It didn't happen before because I believe it was using a different compiler and/or simply wasn't encountered. AMD has been working on unifying all the drivers: Windows, Linux, Mac, and some unspecified others. Vega likely started on LLVM with SM6+ as the focus; Polaris and prior were ported later. I wouldn't be surprised if something was missed.

The assertion would be that somewhere there exist 64-bit values. The bug likely wouldn't be limited to just the one commit in question, and it could be situational. Enough so that a feature performs unreliably. VGPR indexing, for example, could be using 49-bit addresses in conjunction with HBCC and various paging mechanisms. That bug could affect shaders as well as drivers. Keep in mind that was a capability noted as "would like to have, doesn't work" and so was commented out. Commenting out code that will never work doesn't seem all that useful.

The idea that doing this for a newer code branch must damage functionality for pre-existing and functional drivers, for a separate functioning architecture, on titles that were likely using higher-level abstractions, and that this wasn't noticed until years after the binary encodings were decided, months to years after the hardware should have been a prototyping target, and that work towards correcting it didn't ramp until 2017, after the silicon was final and either in production, shipping, or at retail, points to a far more massive/fatal problem with RTG.
I'd agree it's not a good sign, but a team bringing up a new toolset independently could very well encounter that problem, and going off the bug report they obviously did encounter it. They just brought up the code with only one architecture in mind, or it was intended to be merged but got pushed public first along with an open source push. Years ago they may not have anticipated Microsoft moving DirectX onto the LLVM stack. Plans changed and staff was short. That tracks with the program manager and director positions also listed, not to mention all the engineering spots.

All of these things make it seem likely to me that they are having compiler issues. Does anyone know if the shader compiler in the AMD windows driver is also LLVM based or do they use something completely different?
It seems likely, which is what I alluded to with the FP16 issue. AMD is maintaining their own internal staging branches that are likely ahead of the public repos, but that may just be the shader compiler for older cards. SM6 using LLVM kind of requires that. The driver side would just pass around compiled binaries. Linux at least has been all LLVM for a few years, but they started open.
 
All of these things make it seem likely to me that they are having compiler issues. Does anyone know if the shader compiler in the AMD windows driver is also LLVM based or do they use something completely different?
AMD could very well be having compiler issues, but the cited case is not reflective of their internal compiler status. It's a case of shrinking a structure defined with overly large fields, catching a possible case of wraparound at 32 bits, and then running across a gray area in gcc and Clang compilation behaviors. Various scenarios that were listed had seemingly correct code generation once the specific length threshold and wraparound case weren't in play, and they weren't prior to this.


It didn't happen before because I believe it was using a different compiler and/or simply wasn't encountered.
I think it didn't happen because the original field lengths were physically unreachable, and so the combination of casting values and operations at a specific length threshold problematic for specific compilers at specific settings didn't happen.

The assertion would be that somewhere there exist 64-bit values.
I think the context is that the structure was defined originally without space efficiency in mind, and the safe default was 64 bit fields for objects or resources that would get nowhere near the limits of the encoding.
 
I think it didn't happen because the original field lengths were physically unreachable, and so the combination of casting values and operations at a specific length threshold problematic for specific compilers at specific settings didn't happen.
So why fix it now and work around it if it was never a problem in the past? The entire argument is that we don't know where else that error would have occurred. Not without searching for 64-bit integers in public and private repos. Being a compiler issue, it won't be limited to that one specific case. Memory addresses, execution masks, even intercepts all could get hit with that error. As I said before, AMD has been rewriting the new drivers from scratch in the process of unifying everything, at least for the Linux portion, and LLVM for graphics would be relatively new. Only with SM6 would that transition really be required.
 
So why fix it now and work around it if it was never a problem in the past?
Going back a few messages, there were originating changes to pack the structure more efficiently.

The entire argument is that we don't know where else that error would have occurred. Not without searching for 64-bit integers in public and private repos.
This specific scenario didn't apply to 64-bit values; rather, there is extra care taken with the shortened 32-bit field, where there could be wraparound and some odd behavior when casting back to 64 as part of the workaround.
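As a generic sketch of the kind of wraparound that extra care guards against (plain C with hypothetical values, not the actual driver code, and the gcc/Clang gray area mentioned earlier is presumably subtler than this), the placement of the widening cast decides whether the product wraps:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Two 32-bit quantities whose product exceeds 32 bits. */
    uint32_t count  = 1u << 20;   /* e.g. number of elements */
    uint32_t stride = 1u << 13;   /* e.g. bytes per element  */

    /* Hazard: the multiply happens in 32 bits and wraps; casting the
     * already-wrapped result to 64 bits doesn't recover anything. */
    uint64_t wrapped = (uint64_t)(count * stride);

    /* Workaround: widen an operand first so the multiply itself is
     * performed in 64 bits. */
    uint64_t correct = (uint64_t)count * stride;

    printf("wrapped: %llu, correct: %llu\n",
           (unsigned long long)wrapped, (unsigned long long)correct);
    return 0;
}
```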

Being a compiler issue, it won't be limited to that one specific case. Memory addresses, execution masks, even intercepts all could get hit with that error.
What value is there in casting a 64-entry execution mask or address pointer down to 32 bits?
These aren't addresses or masks, nor are they necessarily integrated into any shader code. The item of particular concern is a section calculating a size value for a given subsection of a resource, which would be decided long before a shader would be invoked using that surface. This is driver code, and the structs being defined have a lot of metadata such as reserved bits for specific driver projects, GPU device state, and specific checks for whether a surface is a Vulkan resource. I think this is the wrong compiler and the wrong side of the driver/GPU divide for shader compilation.
 

That's a description of the internal subdivision of the vertex and geometry stages, and what has been changed for GFX9.
The items in the GFX9 section all exist in the GFX6 section, just with some of the older stages removed and their functionality merged into the next.

The description of the primitive shaders from AMD seems to indicate that it would exist somewhere in the VS or GS-VS area, if enabled. The items in the table are what I would assume are the non-optional elements.
 