AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

AccVGPRs were mentioned before, for example in the first of your additional links. In fact, they were mentioned as far back as July 2019.

Thanks for mentioning. I should've gone to Specsavers. :)
I did search for "acc", but I did it in the gfx908 target for some odd reason. ¯\_(ツ)_/¯
 
One RDNA 2 bit that interests me is "CPU can cache GPU memory". APU hardware has been claimed to support coherent access to pageable system memory (well... those accesses dodge all levels of the GPU caches though). So this new claim does sound like enabling cache-coherent CPU access to SVM buffers allocated in GPU local memory! That's the missing piece once promised in the good old FSA 2012 roadmap. :D

If you have a device-local buffer, you presumably mean to take advantage of the GPU-local bandwidth. But given that these buffers would be cacheable by CPU cores, naively speaking the GPU would need to probe (and be probed by) the CPUs and its GPU neighbours for all read/write traffic, along with GPU atomics having to work with MDOEFSI states. :runaway:

I figured this might be the root cause of the RDNA-CDNA architecture split in the end. Thinking about it more, they would likely have to put in at least an IF Home Coherence Controller to serve neighbours' memory requests (and probably GPU system-coherent atomics, if the GPU L2 will not cache system-coherent lines). Probe filters would have to be enabled for optimal local-access bandwidth and energy efficiency, because snooping an entire system of 10 NUMA nodes (2 CPUs + 8 GPUs) for every request is not a sustainable idea. Moreover, I wouldn't be surprised if they wanted to allow the GPU L2 to hold system-coherent cache lines, e.g. to reduce traffic via write combining. That would then require the GPU L2 to either serve probes directly, or have extras like shadow tags to absorb the traffic.
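For what it's worth, here's roughly what the software side of this looks like today. This is just a sketch using the public HIP API (the attribute and allocation calls are real HIP functions, but whether a given allocation ends up CPU-cacheable in GPU local memory is exactly the hardware question above, so take it as illustration only):

#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int dev = 0;
    int pageable = 0, concurrent = 0;

    // Can the GPU coherently access ordinary pageable host memory?
    hipDeviceGetAttribute(&pageable, hipDeviceAttributePageableMemoryAccess, dev);
    // Can CPU and GPU touch managed (SVM-style) allocations concurrently?
    hipDeviceGetAttribute(&concurrent, hipDeviceAttributeConcurrentManagedAccess, dev);
    printf("pageable access: %d, concurrent managed access: %d\n", pageable, concurrent);

    // A managed allocation both sides can dereference. Where it physically
    // lives (system RAM vs. GPU local memory) and whether the CPU path is
    // cached is up to the driver/hardware, which is the whole point above.
    float* buf = nullptr;
    hipMallocManaged(reinterpret_cast<void**>(&buf), 1024 * sizeof(float));
    buf[0] = 42.0f;   // CPU write through the coherent path
    hipFree(buf);
    return 0;
}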

The sad fact is that all of this is irrelevant to consumer GPUs for the time being, and hence the split makes sense. No major consumer platform (MSFT/Apple/Android) seems to have an incentive to push heterogeneous computing in the consumer/mobile world. XSX/PS5 are likely not touching this either. I can only hope next-gen consoles in 3-5 years might pick up the torch on the consumer computing front, since console makers have been avid fans of APUs. :-|
 
There was a patch on Arcturus talking about something new called "AccVGPRs". Previously there have been mentions of AGPRs, but to my knowledge it has never been clarified what the "A" stood for. Is it safe to assume that it stands for "Accelerator"?
The "A" is for Accumulator. It is used for accumulating results during matrix FMA.
 
There's Raja in the Architecture Day stream talking (with a smile) about still having scars on his back from trying to bring expensive memory like HBM to gaming at least twice (timestamp 1:26:48).

I believe Fiji was a pipecleaner for HBM. AMD co-financed and co-developed HBM for years, so they had to use it somewhere to prove the concept; that's why it went into Fiji despite the capacity limit. My guess is he's talking about Vega 10 and Kaby Lake-G.


As for Vega 10, there are a lot of clues pointing to Raja / RTG planning for the chip to clock a whole lot higher than it ever did. At an average 1750 MHz (basically the same as GP102, with a similar die size and a supposedly similar 16FF-class process), a full Vega 10 at the standard ~1.05 V vcore would have been sitting closer to the 1080 Ti (like the Radeon VII does), which at the time sold for more than $700.
Even their HBM2 clocks came up short of what they predicted, as Micron (edit: SK Hynix), with whom AMD developed HBM and who would probably supply the memory for significantly less than Samsung, couldn't deliver standard 2 Gbps HBM2; only Samsung got close at the time.

Had Vega 10 clocked as AMD planned from the beginning, they'd have had 64 CUs @ 1750 MHz and 512 GB/s of bandwidth (not to mention some of the stuff that didn't work out as planned, like the primitive shaders), with a performance level that would have allowed them to sell the card for over $700. Instead they had to market the card against the GTX 1080, for less than $500, which in turn gave them much lower profit margins.
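For reference, a quick back-of-the-envelope check on those figures (my own arithmetic, not claimed by anyone above): 64 CUs × 64 lanes × 2 FLOPs per FMA × 1.75 GHz ≈ 14.3 TFLOPS FP32, versus roughly 11.3 TFLOPS for a stock 1080 Ti at its rated boost clock; and two HBM2 stacks at 2.0 Gbps per pin over a 1024-bit interface each give 2 × 1024 × 2.0 / 8 = 512 GB/s.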

Of course, shortly after Vega came out, the crypto craze took off, ballooning the prices of every AMD card out there, so in the end it didn't turn out so badly.


So just to get to my point: I don't think Raja's mistake was implementing HBM in consumer cards; it was implementing HBM in consumer cards that failed to meet their performance targets. I guess if Pascal chips had hit a power-consumption wall above ~1480 MHz, their adoption of GDDR5X would have been considered a mistake as well. Though a lesser one, since they could always scrap the GDDR5X versions and use GDDR5 for everything, of course.
It was a problem of implementation cost vs. average selling price of the final product. Apple seems to be pretty content with HBM2 on their exclusive Vega 12 and Navi 12 laptop GPUs, for example.
 
Had Vega 10 clocked as AMD planned from the beginning
But Vega 20 also clocked like a turd even with a shrink.
They just fucked up.
as Micron (with whom AMD developed HBM and would probably supply them the memory for significantly cheaper than Samsung)
Hynix.
It was Hynix.
Micron did HMC and didn't even enter the HBM race until like last year.
 
I don't think Raja's mistake was implementing HBM in consumer cards; it was implementing HBM in consumer cards that failed to meet their performance targets.
Hmm, nope.

HBM gen1 failed horribly due to the capacity limit at the time: the Hawaii refresh had 8 GB, but the shiny new HBM high-end got 4 GB. Fiji was more like an engineering sample that simply had to be shipped to cover the R&D, as you mentioned.

HBM gen2 was IMO also a huge fail, since they bet the whole Vega roadmap on it. Vega 10 was a horribly bottlenecked, buggy fireball. Vega 11 (the Polaris replacement) got canned completely. Vega 12 was an Apple exclusive. Kaby Lake-G got EOLed pretty quickly. Dual Vega 10 was canned. Vega 10 Nano was canned.

Vega 20 with HBM gen2 allowed AMD to finally refresh their aging HPC offerings, so I guess that one wasn't so bad. However, dual Vega 20 was just an Apple exclusive again...
 
HBM gen2 was IMO also a huge fail
HBM2 is very successful: it's present in over a dozen different products from AMD, Nvidia, NEC, Intel, and maybe more, all of which carry very high profit margins.

If HBM2 had been a huge fail, Intel and Micron wouldn't have scrapped HMC to use and fab HBM2.
 
HBM2 is very successful: it's present in over a dozen different products from AMD, Nvidia, NEC, Intel, and maybe more, all of which carry very high profit margins.

If HBM2 had been a huge fail, Intel and Micron wouldn't have scrapped HMC to use and fab HBM2.
Well, the context was AMD introducing expensive HBM tech to the consumer market. Neither Nvidia, NEC, nor Intel (aside from the very short-lived Kaby Lake-G) employs HBM in their consumer-oriented products.
 