AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

It's not. Confusingly, AMD GPU 'codenames' (based on some sort of internal ordering) and product names (based on CU count) overlap.
Not sure if I understand your post correctly.
My post was indeed referring to the slide CSI PC posted above, and if you're saying the IGP in RR isn't Vega 11, AMD's James Prior has directly referred to the GPU in RR as Vega 11.
 
So, the 7nm Vega 20 has 32GB HBM2 with (edit) 1.28 TB/s bandwidth, after all.



https://wccftech.com/amd-7nm-vega-20-32gb-gpu-3dmark-benchmark-leaked-up-to/
 
That's A LOT of megabytes/sec... Wow. My first (2D, VGA) video card for the PC had 1.2GB/s IIRC, and it still noticeably lost speed at 1600*1280 rez. (It also had shitty analog circuitry which made the image fuzzy at such a high rez, and a 19"/17" effective CRT wasn't ideal either, so I generally settled quite a bit below that.)
 
Vega 11 is also the marketing name for Raven Ridge's iGPU when fully enabled. I'm aware of James Prior's statements, but they could be a curve ball for Vega M.
Maybe AMD changed Vega 11's codename to Vega 12, since that's the codename appearing in the new drivers now, alongside Vega 20.

Possibly; those slides were very accurate. The CPU slides read as if someone had been emailed them from AMD directly. :)
The GPU slides were indeed missing the Vega 10 x2 and Vega 11 codenames (not the marketing names), though.

So, the 7nm Vega 20 has 32GB HBM2 with (edit) 1.28 TB/s bandwidth, after all.



https://wccftech.com/amd-7nm-vega-20-32gb-gpu-3dmark-benchmark-leaked-up-to/

Indeed, and it's also a first for the amdgpu driver: they patched the code to allow the huge DMA requests needed to feed the 32 GB of data from memory.

Code:
Don't enforce DMA32 limits. They don't matter for AMD GPUs that can
access at least 40 bits of physical address space.
+ /*if (mem <= ((uint64_t) 1ULL << 32)) {*/
+ if (1) {
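
For anyone unfamiliar, here's a minimal sketch of what a DMA32 limit conventionally means (a hypothetical illustration, not the actual amdgpu code): allocations are restricted to the first 4 GB of physical address space, which a GPU that can address 40+ bits simply doesn't need, hence the short-circuited check above.

Code:
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only: a DMA32-style check that accepts a buffer
 * only if it sits entirely below the 4 GB boundary. The patch above bypasses
 * this kind of test because the GPU can address at least 40 bits. */
static bool fits_in_dma32(uint64_t phys_addr, uint64_t size)
{
    return phys_addr + size <= (1ULL << 32);
}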

Also, Vega 20 has many new 16-bit math instructions for vector manipulation, suitable for ML.
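
To give a feel for what such instructions target, here's a hedged sketch in plain C (not actual GCN ISA, and the operation shown is just an assumed representative case): two FP16 products accumulated into an FP32 result, which ML-oriented hardware can collapse into a single packed instruction.

Code:
/* Hypothetical illustration: the packed 16-bit multiply-accumulate pattern
 * that dedicated ML instructions execute as one operation. */
float dot2_f16_to_f32(float a_lo, float a_hi, float b_lo, float b_hi, float acc)
{
    /* a real packed instruction would take the four halves in two registers */
    return acc + a_lo * b_lo + a_hi * b_hi;
}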

However,
could anyone shed some light on the HW fixes in Vega 12?
From the patch notes I read, Vega 12 (and possibly Vega 20) has some changes in tiling calculations and additional fixes for HW quirks.
 
So, the 7nm Vega 20 has 32GB HBM2 with (edit) 1.28 TB/s bandwidth, after all.
Provided it has 4 stacks of the latest HBM2, that's not overly surprising. Wouldn't be surprising to see a refresh of Vega 10 with a bandwidth bump either, at least as a highly binned chip for prosumers/enthusiasts. Should just be a matter of using the newer memory chips.
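
As a quick sanity check on the rumored figure (assuming 4 stacks at 2.5 Gbps per pin, i.e. a 1.25 GHz memory clock): 4 stacks x 1024 bits x 2.5 Gbps / 8 = 1280 GB/s, i.e. the 1.28 TB/s in the leak. Vega 64's 2 stacks at 1.89 Gbps work out to about 484 GB/s by the same math.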

However,
could anyone shed some light on the HW fixes in Vega 12?
From the patch notes I read, Vega 12 (and possibly Vega 20) has some changes in tiling calculations and additional fixes for HW quirks.
That doesn't seem overly significant. The number of memory channels is simply different, resulting in different tiling calculations: Vega 20 has 4 stacks of HBM, Vega 10 has 2 stacks, and Vega 12 possibly a single stack, much like the Kaby G design.
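
To illustrate why channel count touches the tiling math, here's a hedged sketch (a made-up helper, not the actual addrlib/driver code): addresses are interleaved across channels, so a 1-stack part (8 channels), a 2-stack part (16 channels) and a 4-stack part (32 channels) each need their own tiling/swizzle parameters.

Code:
#include <stdint.h>

/* Hypothetical illustration: which channel a byte lands in depends on the
 * channel count, so tiling calculations change per memory configuration. */
static unsigned int channel_of(uint64_t addr,
                               unsigned int num_channels,
                               unsigned int interleave_bytes)
{
    return (unsigned int)((addr / interleave_bytes) % num_channels);
}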
 
Isn't 1250MHz even above Samsung's and SK Hynix's currently highest HBM2 clocks?

Were they using the memory at 1200MHz I'd say that rumors were very believable, but that 50MHz "factory overclock" seems rather worthless and makes me suspicious.
It's not like the Vega 64 is starving for bandwidth at 945MHz, and unless Vega 20's core is clocking north of 2GHz I don't really see why AMD would need to clock the HBM2 chip above their standard value.

Unless FP64 is super demanding on bandwidth...
 
Isn't 1250MHz even above Samsung's and SK Hynix's currently highest HBM2 clocks?

Were they using the memory at 1200MHz I'd say that rumors were very believable, but that 50MHz "factory overclock" seems rather worthless and makes me suspicious.
It's not like the Vega 64 is starving for bandwidth at 945MHz, and unless Vega 20's core is clocking north of 2GHz I don't really see why AMD would need to clock the HBM2 chip above their standard value.

Unless FP64 is super demanding on bandwidth...
Manufacturer availability != General availability or mass production.
AMD also had 8 GB HBM2s when supposedly no-one had ever made anything but 4 GB stacks
 
It could just be unannounced memory, or a misreading by the benchmark. And while Vega 64 isn't bandwidth-limited in games, other applications can behave very differently.
Manufacturer availability != General availability or mass production.
AMD also had 8 GB HBM2s when supposedly no-one had ever made anything but 4 GB stacks

That's true, but what I'm questioning is the need to have 1.28TB/s bandwidth.

Even the V100, with 15 TFLOPS FP32 / 7.5 TFLOPS FP64, apparently uses 1 GHz HBM2 downclocked to ~875 MHz, for a total of 900 GB/s.
A 64-CU Vega 20 would need to clock at 1.8 GHz just to reach V100's SP/DP throughput, so why would Vega 20 need over 40% more bandwidth?
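
For scale, a rough bytes-per-FLOP comparison (my own assumptions: FMA counted as 2 FLOPs, half-rate FP64, and the rumored numbers taken at face value): V100 gets 900 GB/s / 7.5 TFLOPS FP64 ≈ 0.12 B/FLOP, while a 64-CU Vega 20 at 1.8 GHz would land around 7.4 TFLOPS FP64, so 1280 GB/s / 7.4 TFLOPS ≈ 0.17 B/FLOP, which is roughly that 40-45% gap.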
 
because it runs at 2.6 der /s

That said, apparently the 7nm GPU tape-outs and initial silicon are very pleasing, according to Lisa Su at the annual stockholders' meeting.
 
Since the target audience for this Vega 20 is different from that of the RX Vega 64, in my opinion it might even be a necessity.

Training an ML model with several thousand features and millions of samples, even with SGD, needs sample data pumped in at an enormous rate. Of course, the more complicated the weight calculations, the longer it takes to consume each batch, and the bandwidth might not be used efficiently.
But with new dedicated ML instructions that operate on multi-level matrices at once, it could be that AMD found that feeding those cores with multi-level matrices, each holding a considerable amount of FP data, necessitates a bandwidth increase. (Hence we also hear about those insane 1K+ Tensor TFLOPS.)

Some models scale almost linearly with core count and clock speed, and keeping those cores fed with data of course decreases training time. I know every model, feature set, training cost calculation etc. is different, so it's definitely a model-by-model thing.
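
As a rough, made-up illustration of the feed rates involved: a minibatch of 1024 samples with 4096 FP16 features each is about 8 MB, so streaming 10,000 such minibatches per second already needs ~80 GB/s for input data alone, before any weight or activation traffic.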
 
So I was digging around the code for the Vega 20 branch and Vega 12 (which is now upstreamed).

Vega 20 adds support for Zero Frame Buffer (which I don't know the purpose of), an emulation mode, and a new Data Fabric.
However, I wish the DKMS package present in this branch would be mainlined, because then we could use any Linux version or distro and load the kernel module without upgrading the distro or waiting for a new release.
The code looks far from prime time; it looks like a HW bring-up driver.

On the Vega 12 side I saw:
It has a new SMU microcode.
It has 16 DPM levels, compared to Vega 10's 8 (see the sketch below).
It has simplified thermal management and no LED control, unlike Vega 10.
It still has fan RPM control, which might suggest it is also meant for desktop.
Funny thing is, getting rid of the complicated fan and LED control inherited from Vega 10 took out so much of the PP code :LOL:
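
For readers who haven't poked at the powerplay code, here's a minimal sketch of what a DPM table boils down to (a hypothetical struct for illustration, not the actual vega12 powerplay definitions): a list of clock/voltage states the SMU switches between, with Vega 12 exposing 16 of them where Vega 10 had 8.

Code:
#define VEGA12_DPM_LEVELS 16    /* Vega 10 exposed 8 */

/* Hypothetical illustration of a DPM (dynamic power management) table. */
struct dpm_level {
    unsigned int gfx_clock_mhz;   /* engine clock for this state */
    unsigned int voltage_mv;      /* requested voltage */
};

struct dpm_table {
    unsigned int count;                          /* populated levels */
    struct dpm_level levels[VEGA12_DPM_LEVELS];  /* finer-grained steps */
};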

Interestingly,
I found this line in vega12_processpptables.c:
#define VEGA12_ENGINECLOCK_HARDMAX 198000
The engine clock hard limit is 198000 / 100 = 1980 MHz = 1.98 GHz.
This value is used in the overdrive engine clock limit calculation. It is read via the SMC for Vega 10 and others, so it is probably temporary.

From my experience last year with the RR and Vega 10 code and how it eventually matched the HW, I would say it is most likely indicative of their target.
It must have escaped Alex Deucher or someone, because this is a completely new file and they probably didn't check it line by line. Usually the definitive set of patches is pushed at announcement.
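
For clarity, here's how that define reads, assuming the usual powerplay convention of storing clocks in 10 kHz units (my assumption; the new file doesn't spell it out):

Code:
#define VEGA12_ENGINECLOCK_HARDMAX 198000   /* in 10 kHz units (assumed) */

/* 198000 * 10 kHz = 1,980,000 kHz, i.e. 1980 MHz = 1.98 GHz */
static const unsigned int hardmax_mhz = VEGA12_ENGINECLOCK_HARDMAX / 100;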

Vega 12 with the new Aquabolt and higher clocks is a perfect recipe to kick up another round of Vega dust ;)
 
SUPER SINGLE INSTRUCTION MULTIPLE DATA (SUPER-SIMD) FOR GRAPHICS PROCESSING UNIT (GPU) COMPUTING
A super single instruction, multiple data (SIMD) computing structure and a method of executing instructions in the super-SIMD is disclosed. The super-SIMD structure is capable of executing more than one instruction from a single or multiple thread and includes a plurality of vector general purpose registers (VGPRs), a first arithmetic logic unit (ALU), the first ALU coupled to the plurality of VGPRs, a second ALU, the second ALU coupled to the plurality of VGPRs, and a destination cache (Do$) that is coupled via bypass and forwarding logic to the first ALU, the second ALU and receiving an output of the first ALU and the second ALU. The Do$ holds multiple instructions results to extend an operand by-pass network to save read and write transactions power. A compute unit (CU) and a small CU including a plurality of super-SIMDs are also disclosed.

No idea if these are Vega or not, but they're published now, so running with it.
 
Interestingly,
I found this line in vega12_processpptables.c:
#define VEGA12_ENGINECLOCK_HARDMAX 198000
The engine clock hard limit is 198000 / 100 = 1980 MHz = 1.98 GHz.
This value is used in the overdrive engine clock limit calculation. It is read via the SMC for Vega 10 and others, so it is probably temporary.

From my experience last year with the RR and Vega 10 code and how it eventually matched the HW, I would say it is most likely indicative of their target.

What is this value for Vega 10 and Raven Ridge?

I'm starting to wonder if Vega 12 isn't a big Vega refresh with higher clocks all around (due to hardware debugging and 12LP) instead of Vega M.


SUPER SINGLE INSTRUCTION MULTIPLE DATA (SUPER-SIMD) FOR GRAPHICS PROCESSING UNIT (GPU) COMPUTING


No idea if these are Vega or not, but they're published now, so running with it.
Sounds like post-Navi...
 
What's the difference from VLIW? I'm not knowledgeable at all, but the description sounds a lot like what I always thought VLIW was.
 