AMD CDNA Discussion Thread

There seem to be 512 VGPRs, compared to 256 on GCN/RDNA. They probably merged the VGPRs and AccVGPRs together, given the existence of ACCUM_OFFSET.

It's likely to make doubles a first-class citizen, since they use 2x the VGPRs.
The registers typically hold single-precision (32-bit) floating-point (FP) data, but are also designed for efficiently handling mixed precision. For larger 64-bit (or double precision) FP data, adjacent registers are combined to hold a full wavefront of data.
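To put rough numbers on the "adjacent registers are combined" part, here is a quick back-of-the-envelope sketch. It assumes wave64 execution and 32-bit VGPR lanes; the helper names are made up for illustration, and it ignores alignment rules and any other allocation constraints the real hardware may have.

```python
# Assumptions (not stated in the thread): wave64 and 32 bits per lane per VGPR.
WAVE_SIZE = 64   # lanes per wavefront (assumed)
VGPR_BITS = 32   # bits held per lane in one VGPR (assumed)


def vgprs_needed(value_bits: int) -> int:
    """Adjacent VGPRs needed to hold one value of this width per lane."""
    return -(-value_bits // VGPR_BITS)  # ceiling division


def values_per_lane(vgpr_budget: int, value_bits: int) -> int:
    """How many live values of this width fit per lane in the VGPR budget."""
    return vgpr_budget // vgprs_needed(value_bits)


def bytes_per_wavefront_value(value_bits: int) -> int:
    """Register storage for one value across a full wavefront."""
    return vgprs_needed(value_bits) * WAVE_SIZE * VGPR_BITS // 8


if __name__ == "__main__":
    for budget in (256, 512):
        print(
            f"{budget} VGPRs: "
            f"{values_per_lane(budget, 32)} floats or "
            f"{values_per_lane(budget, 64)} doubles per lane"
        )
```

Under those assumptions, a 512-VGPR budget leaves room for as many live doubles per lane as a 256-VGPR budget leaves for floats, which fits the "first-class doubles" reading.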
 
It's likely to make doubles a first-class citizen, since they use 2x the VGPRs.
Yea, this would also make sense.

Samples of MI200 already exist. Apparently, these are used in the HPE Cray EX class of supercomputers. It's interesting that the samples are already being labeled as "MCM". So is it a doubled MI100 or a CDNA2-based, MCM-native chip?

Also the "HPC customized" Zen 3-based CPU - Trento - is being paired with the MI200.
 
IMHLO: Nothing is as hard as graphics on a competitively interactive scale.

With basically everything else done on a GPU, you can mask or afford some kind of latency or uneven distribution of workloads. Even the much-cited recommender systems only have to satisfy a single measure of latency once in a while (in terms of GPU timings), not continuously deliver single-digit-ms latency to a single user.

edit: Yeah, that's a big THANK YOU to all hard- and software engineers making those chips so we can enjoy our favourite games on them to the fullest.
 
Being a layman, would it be safe to assume that scientific workloads (most likely on an MI chip) make a better "testing ground" for MCM before progressing to consumer and graphics-oriented GPUs?
To my understanding, those scientific workloads rarely even care whether one chip or several are doing the calculations, so yes and no. Yes, in that it's an easy testing ground for MCM since the workloads don't care. No, in that it's not that useful a testing ground, because games do care.
 
Being a layman, would it be safe to assume that scientific workloads (most likely on an MI chip) make a better "testing ground" for MCM before progressing to consumer and graphics-oriented GPUs?
In the sense that a pure compute workload is a lot easier than graphics to spread across as many processing nodes as possible, yes. But this isn't really related to graphics much, so it's not much of a "testing ground" from a s/w perspective. It is a testing ground for h/w though, and possibly for h/w approaches which would make MCM suitable for graphics.
 