With Arcturus on the way this never really made sense to me. Arcturus appears to be a huuuuge chip, apparently running at 1 GHz in a lab right now but no doubt destined to clock much higher by release. And we know Vega is faster per mm² and per watt than RDNA in compute anyway, so the two products seem to overlap, with one being clearly better.
Now if it were RDNA 2 I could see it: "look at our huge ray-tracing-enabled chip with 96 GB of HBM2E" is a pretty good pitch for VFX houses. You could easily fit most scenes that would show up in, say, a Netflix or HBO series in RAM (back of the envelope: a couple hundred million triangles is on the order of tens of GB of geometry, which leaves plenty of the 96 GB for textures and acceleration structures), and then hardware ray tracing gets you 10x or more the render speed. But why there are, per the rumors at least, two DL-oriented HBM chips coming out from AMD in the same year is something I can't find the logic in.
From what I've gleaned from various articles and code commits, Arcturus (possibly the MI100) is a compute-oriented product, perhaps more HPC-targeted than usual.
It has 128 CUs and no graphics command processor. Perhaps there's a hint about what limits GCN's scaling in the fact that compute was scaled up while the graphics command processor was removed: there may be a limit to how much the control logic can directly orchestrate for a single graphics context, whereas a compute device can scale out the number of ACEs with no expectation that they act in concert the way a graphics card's front end must.
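To make the ACE angle concrete: from the programming side, independent compute queues are just streams with no ordering between them, and the driver fans them out across the hardware queues the ACEs service. A minimal HIP sketch of that (nothing Arcturus-specific here, and the stream-to-ACE mapping is entirely the driver's business):

```cpp
#include <hip/hip_runtime.h>

// Trivial kernel standing in for an independent compute batch.
__global__ void busywork(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20, kQueues = 8;
    hipStream_t q[kQueues];
    float* buf[kQueues];
    for (int i = 0; i < kQueues; ++i) {
        hipStreamCreate(&q[i]);
        hipMalloc(&buf[i], n * sizeof(float));
        // No ordering exists between streams, so nothing here has to act
        // "in concert" the way work in a single graphics context does.
        hipLaunchKernelGGL(busywork, dim3(n / 256), dim3(256), 0, q[i],
                           buf[i], n);
    }
    hipDeviceSynchronize();  // the only global sync point we impose
    for (int i = 0; i < kQueues; ++i) {
        hipFree(buf[i]);
        hipStreamDestroy(q[i]);
    }
    return 0;
}
```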
On top of that, there's apparently a new class of acceleration unit alongside the vector hardware, perhaps some sort of large matrix-multiply unit that extends the machine-learning instructions or general math throughput for big compute jobs (a sketch of what that looks like from the software side is below). While I'd need to hunt down the reference, there's code to the effect that some portion of the boost-clocking capability has been disabled, presumably because there's going to be a lot of data movement and highly utilized silicon even at more modest clocks.
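For flavor: the gfx908 commits in LLVM's AMDGPU backend (gfx908 being the target widely assumed to be Arcturus) add a family of "mfma" matrix-multiply-accumulate builtins, which fits that description. A minimal HIP sketch of one such op; the result-to-lane layout at the end is simplified for illustration:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// 32 packed floats: one lane's slice of the accumulator tile.
typedef float v32f __attribute__((ext_vector_type(32)));

// One MFMA step: every lane of a 64-wide wavefront feeds one element of A
// and one of B, and the unit accumulates an outer-product tile spread
// across the wavefront's registers. The builtin name is the one in LLVM's
// gfx908 support; the store layout below is simplified.
__global__ void mfma_tile(const float* a, const float* b, float* c) {
    int lane = threadIdx.x;  // launched as a single 64-wide wavefront
    v32f acc;
    for (int i = 0; i < 32; ++i) acc[i] = 0.0f;  // zero the accumulator
    acc = __builtin_amdgcn_mfma_f32_32x32x1f32(a[lane], b[lane], acc,
                                               0, 0, 0);
    for (int i = 0; i < 32; ++i) c[lane * 32 + i] = acc[i];
}

int main() {
    float *a, *b, *c;
    hipMalloc(&a, 64 * sizeof(float));
    hipMalloc(&b, 64 * sizeof(float));
    hipMalloc(&c, 64 * 32 * sizeof(float));
    hipLaunchKernelGGL(mfma_tile, dim3(1), dim3(64), 0, 0, a, b, c);
    hipDeviceSynchronize();
    printf("done\n");
    hipFree(a); hipFree(b); hipFree(c);
    return 0;
}
```

Something like `hipcc --offload-arch=gfx908` would target it. The takeaway is just that the accumulate happens in a separate unit with its own wide register footprint, which squares with the conservative clocking.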
On the compiler side, the instruction-generation advice is that while it's possible to issue vector instructions in parallel with the new accelerator instructions, doing so is discouraged because the chip will likely throttle.
It will probably aim for lower clocks than prior GPUs, since it's going to have a lot more silicon active across a very wide chip.
As for why it's being released after RDNA: perhaps factors like the HPC contracts AMD is touting call for the higher peak throughput, with the risk reduced somewhat by sticking to a more familiar ISA and base architecture. RDNA has some unappetizing silicon bugs at this point, and its compute software support is not good.
The broader wavefronts and coarser batching requirements may also be more acceptable for workloads dominated by very large matrix multiplies, which decompose into far more tiles than there are wavefronts to run them, so wave64 granularity costs little.
There are elements in RDNA that I'd imagine could improve on Vega, but those might not be enough of a draw until RDNA is more mature.