AMD CDNA Discussion Thread

Is it 1:1 or 2:1 FP32?

“Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a secondary and a primary. It has two dies with each consisting of 8 shader engines for a total of 16 SE's. Each Shader Engine packs 16 CUs with full-rate FP64, packed FP32 & a 2nd Generation Matrix Engine for FP16 & BF16 operations.”
It's 1:1, with support for packed FP32, which delivers 2:1 when the math allows it (apparently most of the time).
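
To illustrate what "when the math allows it" could mean, here is a toy sketch, not a model of the real compiler or hardware scheduler. It assumes a packed instruction can only fuse two adjacent FP32 operations that use the same opcode and are independent of each other. The function name `packed_issue_slots` and the opcode strings are hypothetical.

```python
def packed_issue_slots(ops):
    """Count issue slots for one lane under a toy packing rule.

    ops is a list of (opcode, depends_on_prev) tuples in program order.
    Assumption: two adjacent ops share a slot only if they have the same
    opcode and the second does not depend on the first.
    """
    slots = 0
    i = 0
    while i < len(ops):
        can_pair = (
            i + 1 < len(ops)
            and ops[i + 1][0] == ops[i][0]   # same operation
            and not ops[i + 1][1]            # second op is independent
        )
        i += 2 if can_pair else 1
        slots += 1
    return slots


# Eight independent FMAs pair up perfectly: half the issue slots.
independent = [("fma", False)] * 8

# A dependent chain cannot be packed at all: no gain over unpacked issue.
chain = [("fma", False)] + [("fma", True)] * 7

# Mixed opcodes only pack where two matching ops happen to be adjacent.
mixed = [("fma", False), ("add", False), ("fma", False), ("fma", False)]

if __name__ == "__main__":
    for name, ops in (("independent", independent), ("chain", chain), ("mixed", mixed)):
        print(name, len(ops), "ops ->", packed_issue_slots(ops), "slots")
```

Under this rule, elementwise work over arrays lands close to 2:1, while serial dependency chains stay at 1:1.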
 
Glad to see they're completely removing the "GPU" acronym from their Instinct series of products and announcements.
Because they've gutted out all the GPU parts besides the very lobotomized VCN instance.
They're genuine cray-cray SIMD machines now.
 
No, all kinds of NIC MSS data are wonky as hell.
But Cray using souped-up Ethernet for their newest, shiniest HPC offering tells you a lot.

It looks like Ethernet also accounts for most of Mellanox's revenue and they're pushing Ethernet into the higher speed tiers. No idea if that's relevant for flagship projects like El Capitan.
Because they've gutted out all the GPU parts besides the very lobotomized VCN instance.
They're genuine cray-cray SIMD machines now.

No rasterizers, texture units or ROPs in sight. It’s a big fat FP co-processor.
 
It looks like Ethernet also accounts for most of Mellanox's revenue and they're pushing Ethernet into the higher speed tiers.
Yessssss.
Ethernet truly won.
The non-Mellanox InfiniBand flavours were already killed by it, and Cray's proprietary stuff died in favour of an Ethernet superset, so only the last bastion remains now.
No idea if that's relevant for flagship projects like El Capitan.
Those things have Cray as their prime contractor, so interconnect options other than Slingshot aren't available.
 
For real.

I wonder how Nanite would fare on CDNA. Still a good amount of texture work and framebuffer blending involved I imagine.

For texture sampling, we could skip emulating the texture-compression logic to lower the computational cost, but it would still be slow. Nanite has already solved the blending problem, since it bans translucency altogether. That means the traditional graphics-pipeline restriction of following the API primitive order rule doesn't apply, which makes Nanite's graphics pipeline more parallel by design ...
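
To give a rough idea of the per-sample work that falls on the ALUs without texture units, here is a minimal sketch of software bilinear filtering over an uncompressed, single-channel texture. It assumes clamp-to-edge addressing and texel centres at half-integer coordinates. The function name `bilinear_sample` and the flat row-major layout are illustrative choices, not anything from Nanite or CDNA.

```python
import math


def bilinear_sample(texels, width, height, u, v):
    """Bilinearly filter a row-major, single-channel texture in software.

    u and v are normalized coordinates in [0, 1]. Addressing is
    clamp-to-edge (an assumption made for this sketch).
    """
    # Map normalized coordinates to texel space, centres at +0.5.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def fetch(ix, iy):
        # Clamp-to-edge addressing, then one memory load.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texels[iy * width + ix]

    # Four loads and three linear interpolations per sample.
    top = fetch(x0, y0) * (1 - fx) + fetch(x0 + 1, y0) * fx
    bottom = fetch(x0, y0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


if __name__ == "__main__":
    tex = [0.0, 1.0,
           2.0, 3.0]  # 2x2 texture
    print(bilinear_sample(tex, 2, 2, 0.5, 0.5))
```

Even this simplest case costs four loads plus address math and three interpolations per sample, per channel. Mip selection, anisotropic filtering and block decompression would all add to that.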
 