CDNA2 supposedly does full-rate FP64 and packed FP32 (so some FP32 code can run at twice the speed, but not all) and doubles the CU count compared to MI100 (both chiplets having 128 CUs like MI100).

FP64 will always be inefficient. That is not a matter of focus, it is reality. Nvidia solved this with their Tensor Cores, and that is why GA100 delivers 2.3x more performance than GV100 within the same power consumption.
Believing that CDNA2 will be 5x or 6x more efficient than CDNA1 for HPC workloads is ridiculous. MI100 only delivers ~7 TFLOPs within 300 W, while the A100 as a 250 W PCIe card is around 12.8 TFLOPs with Tensor Cores (rough perf/W math below the links):
MI100: https://www.delltechnologies.com/en-us/blog/finer-floating-points-of-accelerating-hpc/
A100: https://infohub.delltechnologies.co...edge-r7525-servers-with-nvidia-a100-gpgpus-1/
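For a rough sense of the gap, here is a small back-of-the-envelope sketch using only the figures quoted above; the 5x/6x numbers are the claim being disputed, not measurements:

```python
# Back-of-the-envelope efficiency comparison using the figures quoted above.
mi100_tflops, mi100_watts = 7.0, 300    # MI100: ~7 TFLOPs at 300 W (per the Dell blog)
a100_tflops, a100_watts = 12.8, 250     # A100 PCIe: ~12.8 TFLOPs with Tensor Cores at 250 W

mi100_eff = mi100_tflops / mi100_watts  # ~0.023 TFLOPs per watt
a100_eff = a100_tflops / a100_watts     # ~0.051 TFLOPs per watt

print(f"MI100: {mi100_eff:.3f} TFLOPs/W")
print(f"A100:  {a100_eff:.3f} TFLOPs/W ({a100_eff / mi100_eff:.1f}x MI100)")

# What a claimed 5x or 6x efficiency jump would have to mean at the same 300 W:
for factor in (5, 6):
    print(f"{factor}x MI100 efficiency at 300 W -> ~{factor * mi100_tflops:.0f} TFLOPs")
```

Even the A100 ends up only around 2.2x the MI100 in TFLOPs per watt on those numbers, which is why a 5x-6x generational jump looks so implausible.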
All this talk about AMD's 5nm products is so far away from reality. Apple's A14 is only ~40% more efficient than the A12 on 7nm.
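Just to put that node argument into numbers (the ~40% process gain is the A14-vs-A12 comparison above; the architectural uplift is a purely assumed placeholder, not a confirmed figure):

```python
# Illustrative only: even composing a ~1.4x process efficiency gain with an
# assumed architectural gain lands nowhere near a 5x-6x improvement.
process_gain = 1.4        # ~40% efficiency gain, 7nm-class -> 5nm-class (A12 -> A14)
arch_gain_assumed = 1.5   # hypothetical architectural uplift, pure assumption

print(f"Combined efficiency gain: ~{process_gain * arch_gain_assumed:.1f}x")  # ~2.1x
```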
AMD has also already confirmed 128 GB of HBM2e.