AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

The bigger question is what other tweaks were made to the architecture. They quoted "nearly 60% faster" cores, but clocks explain only about 25 points of that 60%, so the rest has to come from something else.
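As a quick sanity check on that arithmetic (the 60% and 25% figures are from the quote; note that speedup factors compound multiplicatively rather than adding):

```python
# Decompose a quoted "nearly 60% faster" core into clock vs. everything else.
# The two input figures come from the discussion above; the rest is arithmetic.
total_speedup = 1.60   # "nearly 60% faster"
clock_speedup = 1.25   # clocks explain roughly 25%

# Speedups multiply, so the non-clock contribution is the quotient:
other_speedup = total_speedup / clock_speedup
print(f"non-clock speedup: {other_speedup:.2f}x (~{(other_speedup - 1) * 100:.0f}%)")
```

Treated this way, the architectural share works out to roughly 28%, a bit less than the 60 - 25 = 35 points a simple subtraction suggests.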
 
AMD should be in a much better position to implement these changes successfully: they have in-house engineers with decades of experience building multi-socket x86 processors with a coherent inter-processor link (HyperTransport), they have a working PCIe 4.0/xGMI implementation in Vega 20, they have 128 PCIe lanes in the EPYC Milan (Zen 3) processor (and PCIe 4.0 too), and they are involved in several industry consortia developing cache-coherent non-uniform memory access (ccNUMA) protocols, such as Gen-Z, the Cache Coherent Interconnect for Accelerators (CCIX), the Open Coherent Accelerator Processor Interface (OpenCAPI), etc.

... and then in the real world, we have the "The AMD Execution Thread" saga.

FYI, Dr. Bradley McCredie, who recently joined AMD as a corporate vice-president, made a presentation at the 2020 Oil and Gas HPC Conference at Rice University and disclosed that future Radeon GPUs will support cache-coherent shared memory.

https://wccftech.com/amd-next-gen-e...u-accelerator-power-el-capitan-supercomputer/
https://www.tomshardware.com/news/amd-infinity-fabric-cpu-to-gpu
 
Just how different these architectures are (and over time, will be) remains to be seen. AMD has briefly mentioned that CDNA is going to have less “graphics-bits”, so it’s likely that these parts will have limited (if any) graphics capabilities, making them quite dissimilar from RDNA GPUs in some ways. So broadly speaking, AMD is now on a path similar to what we’ve seen from other GPU vendors, where compute GPUs are increasingly becoming a distinct class of product, as opposed to repurposed gaming GPUs.

https://www.anandtech.com/show/1559...a-dedicated-gpu-architecture-for-data-centers
 
Are you afraid that this could backfire? Like, developing two different architectures in parallel stretching RTG's resources too thin?
 
CDNA is essentially a rebranded GCN for now. I fully expect it to switch to the same RDNA base architecture down the road.
These days it might be harder to distinguish what is a "new architecture", but RDNA is still very much GCN too, and it's referred to as "GCN 1.5" (and 1.5.1 for Navi+DLops) at least in certain contexts
 
These days it might be harder to distinguish what is a "new architecture", but RDNA is still very much GCN too, and it's referred to as "GCN 1.5" (and 1.5.1 for Navi+DLops) at least in certain contexts
RDNA is not GCN, it has a completely different execution pipeline. This pipeline isn't any worse for compute than GCN's so I expect AMD to just switch to RDNA base architecture in some "CDNA2" product, maybe with some HPC specific tweaks as well (fast FP64, cut down ROPs and such).
 
Actually, it is worse for compute. The changes implemented in RDNA required a lot of transistors that have no impact on pure compute performance. Vega 20 offers 1.05 TFLOPS per billion transistors, Navi 10 only 0.95 TFLOPS per billion transistors, and that is despite Navi's higher clocks and despite Vega supporting a wider range of precisions, which costs transistors.
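Those per-transistor figures can be reproduced from public specs (a rough sketch; the shader counts, boost clocks, and transistor counts below are the commonly cited numbers for Radeon VII and the RX 5700 XT, not taken from the post, and FP32 peak is shaders x 2 FLOPs x clock):

```python
# Rough reproduction of the TFLOPS-per-billion-transistors comparison.
# Specs are commonly cited public values, assumed here for illustration.
chips = {
    # name: (FP32 shaders, peak clock in GHz, transistors in billions)
    "Vega 20 (Radeon VII)": (3840, 1.800, 13.2),
    "Navi 10 (RX 5700 XT)": (2560, 1.905, 10.3),
}

for name, (shaders, clock_ghz, transistors_b) in chips.items():
    # FP32 peak: one FMA (2 FLOPs) per shader per cycle
    tflops = shaders * 2 * clock_ghz / 1000
    per_b = tflops / transistors_b
    print(f"{name}: {tflops:.2f} TFLOPS, {per_b:.2f} TFLOPS per billion transistors")
```

With those inputs the script lands on roughly 1.05 and 0.95, matching the numbers quoted above.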
 
RDNA is not GCN, it has a completely different execution pipeline. This pipeline isn't any worse for compute than GCN's so I expect AMD to just switch to RDNA base architecture in some "CDNA2" product, maybe with some HPC specific tweaks as well (fast FP64, cut down ROPs and such).
Like I said, it's harder to distinguish what is a "new architecture" and what isn't, as so much DNA of the old architecture always gets carried over. That still doesn't change the fact that in some contexts RDNA is still referred to as GCN.
Like no-X said, it is worse for compute in terms of transistor budget. CDNA looks to be GCN 1.4.x (since it's gfx9xx, gfx908 IIRC); the biggest changes relative to 1.4.1 (Vega 20) should be the removal of some graphics-related blocks
 
Actually, it is worse for compute. The changes implemented in RDNA required a lot of transistors that have no impact on pure compute performance. Vega 20 offers 1.05 TFLOPS per billion transistors, Navi 10 only 0.95 TFLOPS per billion transistors, and that is despite Navi's higher clocks and despite Vega supporting a wider range of precisions, which costs transistors.

There is more to GPGPU performance than the number of TFLOPS on the chip. Otherwise TeraScale would have been the greatest compute architecture known, rather than generally incapable of it. Or look at Nvidia's diverged designs: their compute-focused parts have more SRAM and fewer TFLOPS per area than the gaming ones. Things like larger, higher-bandwidth caches, or needing fewer waves in flight to keep the hardware occupied, bring a larger benefit the more sophisticated a shader is and the less coherent its memory access. And rasterisation has very simple shaders and highly coherent access, in the grand scheme of massively parallel algorithms.
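The point that raw TFLOPS isn't the whole story can be illustrated with a simple roofline-style estimate (the chip numbers below are hypothetical, not real GPUs; attainable throughput is capped by the minimum of peak compute and bandwidth times arithmetic intensity):

```python
# Minimal roofline sketch: a chip with more peak TFLOPS isn't faster once a
# kernel becomes memory-bound. All numbers are illustrative, not real GPUs.
def attainable_tflops(peak_tflops, bandwidth_tbs, flops_per_byte):
    # Attainable throughput = min(compute roof, memory roof)
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)

# Chip A: more raw compute; chip B: less compute but more effective bandwidth
# (e.g. via bigger caches or better data reuse).
for intensity in (1, 4, 16, 64):  # FLOPs performed per byte of DRAM traffic
    a = attainable_tflops(peak_tflops=14.0, bandwidth_tbs=0.45, flops_per_byte=intensity)
    b = attainable_tflops(peak_tflops=10.0, bandwidth_tbs=1.0, flops_per_byte=intensity)
    print(f"intensity {intensity:>2}: chip A {a:5.2f} TFLOPS, chip B {b:5.2f} TFLOPS")
```

At low arithmetic intensity the lower-TFLOPS chip wins outright; only highly compute-bound kernels ever see the bigger peak number.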
 