AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Oh yes, they surely would be. Unfortunately, as long as it is arbitrarily disabled in the drivers, developers cannot count on there being an installed base, so those who are ruled by their CFOs would think twice about using it. There needs to be consistent driver support, even if it is only beneficial in terms of register space saved.
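For illustration, a minimal sketch of how an engine would probe that support at runtime instead of counting on it, using D3D12's feature query (assumes a valid ID3D12Device* named device):

```cpp
#include <d3d12.h>

// Minimal sketch: ask the driver whether 16-bit min precision is exposed.
// When the flag is cleared (as described above), min16float silently runs
// at full FP32, so shipping code has to check rather than assume.
bool SupportsFp16MinPrecision(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;
    return (options.MinPrecisionSupport &
            D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT) != 0;
}
```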
Has there been further news indicating if this is based solely on GCN revision or if it also takes into account consumer/pro/instinct product line?
 
Not that I know of, apart from that it's apparently every capable GCN generation except for Vega. I don't have access to a professional-grade non-Vega card at the moment that also provides FP16 precision, even if only as register packing.
 
Wut? What DX12 game cannot be run on NV cards??? You crazy.
Full-blown multi-engine, asynchronous execution, and bindless resources? No. The current selection I wouldn't call comparable, as they follow the traditional models. The GPU-driven approaches I'd imagine are problematic for Nvidia, and part of the reason for Volta's changes.
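To make "multi-engine" concrete, here's a minimal D3D12 sketch (assuming an already-created ID3D12Device* named device) of a graphics queue plus an independent compute queue; whether the two actually execute concurrently is entirely up to the driver and hardware, which is the crux of the issue:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: a direct (graphics) queue plus a separate compute queue.
// The API allows recording work to both; concurrent execution is a
// hardware/driver property, not something the API can guarantee.
HRESULT CreateQueues(ID3D12Device* device,
                     ComPtr<ID3D12CommandQueue>& gfxQueue,
                     ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    HRESULT hr = device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));
    if (FAILED(hr))
        return hr;

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    return device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```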

Does this patent that @Anarchist4000 linked a while back have anything to do with the "programmable" features in Vega? Or maybe something that can be added to a SoC with either Vega or Zen (or both)? Vega has a lot of SRAM, right?

Computer architecture using rapidly reconfigurable circuits and high-bandwidth memory interfaces

A programmable device comprises one or more programming regions, each comprising a plurality of configurable logic blocks, where each of the plurality of configurable logic blocks is selectively connectable to any other configurable logic block via a programmable interconnect fabric. The programmable device further comprises configuration logic configured to, in response to an instruction in an instruction stream, reconfigure hardware in one or more of the configurable logic blocks in a programming region independently from any of the other programming regions.

One embodiment of a programmable device is a Field-Programmable Gate Array (FPGA) having multiple configuration domains that can be reconfigured in parallel and independently of each other based on instructions in an instruction stream. Configuration data for each of the multiple configuration domains can be stored in three-dimensional (3D) stacked memory, which provides high-bandwidth access to the configuration data. The partitioning of the programmable logic in the device in conjunction with the high memory bandwidth allows for reconfiguration of the programmable logic within a few clock cycles, allowing for flexible pipelines that can be reconfigured to accommodate different types of instructions.

In an FPGA, the logic blocks can include elements such as lookup tables (LUTs) and other fixed functions that are programmed by inserting values into small Static Random Access Memories (SRAMs) or registers.

The implementation of flexible pipelines allows for greater flexibility in instruction scheduling, in contrast with fixed function processing pipelines, since any pipeline can be reconfigured to execute any of multiple types of instructions without interrupting execution of the instruction stream. In such a system, different threads executing different functions can be scheduled in a single cycle across multiple execution lanes.

I know basically nothing about this stuff... Just saying.
Really depends how it's used. Originally my thinking was Infinity between CUs, in which case it could allow the chip to be reconfigured to create pipelines, as has been discussed. Currently it just seems it would be the SoC remapping with disabled cores or MCM designs; Epyc is more likely. In Vega you'd be stuck disconnecting a memory controller or video block to salvage a chip. CU-to-CU links might still work, but the documentation suggested Vega had the usual internal crossbar for communication.
 
Not that I know of, apart from that it's apparently every capable GCN generation except for Vega. I don't have access to a professional-grade non-Vega card at the moment that also provides FP16 precision, even if only as register packing.
Speculating here, but it may be part of a greater compiler rewrite. If you look at the LLVM documentation, it wouldn't be unreasonable to unify the drivers around it. That's the basis of the Linux drivers, which are supposedly shared with Windows. The memory model changed, even for old cards, so AMD may still be messing with it. It may not be unrelated to the Vega issues; Vega was just prioritized. It's also possible all these features come back once SM6 hits.
 
Speculating here, but it may be part of a greater compiler rewrite. If you look at the LLVM documentation, it wouldn't be unreasonable to unify the drivers around it. That's the basis of the Linux drivers, which are supposedly shared with Windows. The memory model changed, even for old cards, so AMD may still be messing with it. It may not be unrelated to the Vega issues; Vega was just prioritized. It's also possible all these features come back once SM6 hits.
Then why would AMD not announce this instead of seemingly randomly disabling features at a whim? Makes absolutely no sense to me.
 
The GPU-driven approaches I'd imagine are problematic for Nvidia, and part of the reason for Volta's changes.
Which changes would those be?

Really depends how it's used. Originally my thinking was Infinity between CUs, in which case it could allow the chip to be reconfigured to create pipelines, as has been discussed. Currently it just seems it would be the SoC remapping with disabled cores or MCM designs; Epyc is more likely. In Vega you'd be stuck disconnecting a memory controller or video block to salvage a chip. CU-to-CU links might still work, but the documentation suggested Vega had the usual internal crossbar for communication.
It seems rather specific to a reprogrammable device, where the improvement is the integration of an external memory device via TSVs and no longer requiring exclusive chip-wide configuration and execution modes. Encountering an instruction the FPGA has a configuration for would initiate a localized reprogramming, if necessary.
The static configuration of disabled cores, which is generally fuse-blown permanently, doesn't involve much in the way of needing to interpret an instruction stream.
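In toy-model form, here's how I read the abstract (a sketch of the concept only, not anything from the patent's claims):

```cpp
#include <cstdint>
#include <vector>

// Toy model of the patent's idea as I read it: each programming region
// keeps its own loaded configuration and reprograms itself locally when
// an instruction needs a different one, without touching other regions.
struct ProgrammingRegion {
    uint32_t loadedConfig = 0;  // id of the currently loaded bitstream

    void execute(uint32_t requiredConfig) {
        if (requiredConfig != loadedConfig) {
            // Localized reprogramming: pull the new configuration from
            // the stacked memory; only this region stalls, briefly.
            loadedConfig = requiredConfig;
        }
        // ... run the instruction on the now-matching logic ...
    }
};

struct ProgrammableDevice {
    std::vector<ProgrammingRegion> regions;

    // Regions reconfigure independently and in parallel, which is the
    // "flexible pipeline" notion from the abstract.
    void dispatch(uint32_t requiredConfig, size_t regionIndex) {
        regions.at(regionIndex).execute(requiredConfig);
    }
};
```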
 
Then why would AMD not announce this instead of seemingly randomly disabling features at a whim? Makes absolutely no sense to me.

Maybe there is a bug and Microsoft intervened.
Edit: I assume MS runs unit tests to prove an indicated feature is good, even for non-WHQL.
Edit2: The drivers ought to be unified, so it's not isolated to a specific chip line.
 
Then why would AMD not announce this instead of seemingly randomly disabling features at a whim? Makes absolutely no sense to me.
Doesn't make sense to me either, but are any games using it currently? I'd still guess it's a temporary thing, and communication hasn't been the greatest of late with regard to drivers. It does explain why they listed a marketing coordinator position for drivers.

Which changes would those be?
Configuring the pipeline within the GPU and however they managed to implement a full heap.

It seems rather specific to a reprogrammable device, where the improvement is the integration of an external memory device via TSVs and no longer requiring exclusive chip-wide configuration and execution modes. Encountering an instruction the FPGA has a configuration for would initiate a localized reprogramming, if necessary.
The static configuration of disabled cores, which is generally fuse-blown permanently, doesn't involve much in the way of needing to interpret an instruction stream.
No, but Infinity does possess the ability to route the network around congestion. Reading the patent more carefully, it does seem a bit more than that, with DRAM, SRAM, FPGA, processor, and interposer all stacked within a chip. What the processor and logic dies are doing is unknown, not to mention there's a cache die for configuration storage. That's a lot of configuring. TSVs surely won't provide enough IO for a cache to communicate through either. Maybe the FPGA is indicative of replacing SIMDs with systolic arrays? That would explain why all the cache is for configuration and not data.

EDIT: Maybe it's a free-standing tensor core? Giant networked array, little need for data cache, and known configurations for dimensions. In an MCM it would make more sense.
 
No, but Infinity does possess the ability to route the network around congestion.
It doesn't interpret program instruction streams, and as a data fabric it likely shouldn't.

Reading the patent more carefully, it does seem a bit more than that, with DRAM, SRAM, FPGA, processor, and interposer all stacked within a chip. What the processor and logic dies are doing is unknown, not to mention there's a cache die for configuration storage. That's a lot of configuring. TSVs surely won't provide enough IO for a cache to communicate through either. Maybe the FPGA is indicative of replacing SIMDs with systolic arrays? That would explain why all the cache is for configuration and not data.
I think it's really focused on a run-time configurable FPGA in a die stack. What the DRAM or processor are doing is not particularly important. The various dies are not implementing formerly intra-die or intra-unit functionality; it's just a processor doing what it always does, DRAM doing what it does, and an FPGA doing FPGA things.
 
Maybe there is a bug and Microsoft intervened.
Edit: I assume MS runs unit tests to prove an indicated feature is good, even for non-WHQL.
Edit2: The drivers ought to be unified, so it's not isolated to a specific chip line.
Maybe, but unified or not, it's still supported for Vega, even though you could argue that that's the only GPU with real (so to speak) FP16 support in AMD's lineup.
 
It doesn't interpret program instruction streams, and as a data fabric it likely shouldn't.
As I put in my edit, swizzle for tensor cores in an FPGA? Implementing the broadcast networks would make sense. From the context it's some sort of bus routing, but I'd agree Infinity seems too far removed the more I look at it.

Maybe, but unified or not, it's still supported for Vega, even though you could argue that that's the only GPU with real (so to speak) FP16 support in AMD's lineup.
In the latest LLVM code they somehow ended up with "legacy" FP16 instructions for pre-Vega that I think shared opcodes. Can't recall what the deal was or what was different, but likely the culprit. Only a phone on me, so difficult to search atm.
 
In the latest LLVM code they somehow ended up with "legacy" FP16 instructions for pre-Vega that I think shared opcodes. Can't recall what the deal was or what was different, but likely the culprit. Only a phone on me, so difficult to search atm.
I think in this instance, the latest FP16 functionality slightly modified even the non-packed FP16 semantics.
AMD decided to give the names of the old FP16 instructions to the subset of the new ones they closely resembled.

The pre-existing binary encodings and semantics were re-labeled as legacy, which meant dredging up all affected code references and correcting them. The change notes were somewhat undiplomatic in describing this.
 
Maybe, but unified or not, it's still supported for Vega, even though you could argue that that's the only GPU with real (so to speak) FP16 support in AMD's lineup.

Vega probably uses a forked driver branch until it's stable. If the branch was merged into the mainline, it might have broken all other platforms. Just a possibility. (But I think I withdraw the suggestion based on the thought below.)

Fiji seems not to have full FP16 support; I haven't seen the driver generate FP16 arithmetic ops, just logical ops. I haven't looked at Polaris assembly, and I don't even know if I can now that the flag has been dropped.

What's kind of weird is that the flag might have been allowed in the past as "the driver can eat min16float, so all is fine", and now Microsoft says "no FP16 arithmetic? no IEEE FP16 standard behaviour? drop the flag please". It could be because results between Vega and the other chips would differ (although in predictable ways). If that's the case, I'd assume min16uint could still be possible, as the arithmetic can hardly be imprecise; 24-bit int has been supported in hardware for ages.

Now, I wouldn't exclude the possibility that the driver team actually makes the FP16/32 mixed stuff IEEE compliant, who knows? Packing is still supported, it's an independent intrinsic f16tof32/f32tof16, and it should be safe to assume that the GCN instruction doing that is IEEE conformant.
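At the bit level, the packing those intrinsics express is just two half bit patterns sharing one 32-bit word. A sketch (the float-to-half conversion that f32tof16 itself performs is elided; inputs are assumed to already be IEEE half bits):

```cpp
#include <cstdint>

// Two 16-bit half bit patterns packed into one 32-bit word, mirroring
// what a shader does with f32tof16(a) | (f32tof16(b) << 16).
uint32_t PackHalves(uint16_t lo, uint16_t hi)
{
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// The inverse, mirroring f16tof32 applied to each 16-bit half.
void UnpackHalves(uint32_t packed, uint16_t& lo, uint16_t& hi)
{
    lo = static_cast<uint16_t>(packed & 0xFFFFu);
    hi = static_cast<uint16_t>(packed >> 16);
}
```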

It might be possible to see if GLSL with lowp/mediump is still producing FP16 code using RGA/CodeXL. Oh, BTW, RenderDoc has RGA integrated now. Maybe that makes it a lot easier to test this, no more mad batch files.
 
The current selection I wouldn't call comparable, as they follow the traditional models. The GPU-driven approaches I'd imagine are problematic for Nvidia, and part of the reason for Volta's changes.
Yet there are no DX12 games that don't run on NV cards, and there won't be - ever - or in other words, at least until Pascal-generation hardware is entirely obsolete. You don't lock the market leader's cards out of your software. It's ludicrous to think that would ever happen.

Also, compared to Pascal, Vega statistically doesn't even exist as a GPU yet, and prior AMD hardware is not only poorly represented in the market, it also lacks DX12 features, just as Pascal does. So it's a little odd to name Vega the only 'true' DX12 GPU... :p
 
and prior AMD hardware is not only poorly represented in the market, it also lacks DX12 features, just as Pascal does. So it's a little odd to name Vega the only 'true' DX12 GPU... :p
If any game decided to utilize some of the IQ-enhancing features of DX12, the entire AMD GPU lineup (except Vega) would be left unable to even render such effects, as those chips lack Rasterizer Ordered Views and Conservative Rasterization, which are required for such effects, unlike Maxwell and Pascal, which do support them.
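For reference, a minimal sketch of querying exactly those two features in D3D12 (assuming a valid ID3D12Device* named device):

```cpp
#include <d3d12.h>

// Sketch: both features live in the base options struct. Per the above,
// pre-Vega GCN reports neither, while Maxwell and Pascal report both.
bool SupportsRovAndConservativeRaster(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;
    return options.ROVsSupported &&
           options.ConservativeRasterizationTier !=
               D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;
}
```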
 
In the short term, no, but support isn't that difficult, and there is an upside of future sales. Ryzen Mobile, and future products, will support RPM with a larger market. Not all that different from Skyrim still being ported to mobile platforms like the Switch. Even without RPM, the packed registers are beneficial to a wider range of hardware.

Intel's Gen 9 architecture supports 2x FP16, and they have the majority of the market. Doesn't that indicate there's a significant base for supporting half floats in games?
 
Also, compared to Pascal, Vega statistically doesn't even exist as a GPU yet, and prior AMD hardware is not only poorly represented in the market, it also lacks DX12 features, just as Pascal does. So it's a little odd to name Vega the only 'true' DX12 GPU...
Take a closer look at what the features do, though. Separate the graphics features from the resource management and scheduling ones. The latter are needed for the foundation of a new engine. Packed math, conservative raster, etc. can be worked around, as they have been for a while. Features that involve "full heap" are a good indicator here and have been supported on AMD for a while. Nvidia only recently enabled some, so there is a wider market, but I'd seriously question how well they perform.

Intel's Gen 9 architecture supports 2x FP16, and they have the majority of the market. Doesn't that indicate there's a significant base for supporting half floats in games?
True, but even with that support, many of those systems aren't up to the task of playing much beyond platformers and some esports titles, if that. The hope would be Ryzen Mobile raising the bar a bit, being more affordable, and pushing Intel to do the same. Get more of those integrated systems closer to midrange, likely through more affordable eDRAM or a single HBM2 stack to get the necessary bandwidth.
 