AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
With Arcturus on the way this never really made sense to me. Arcturus appears to be a huuuuge chip, currently running at 1 GHz in a lab apparently, but no doubt destined to run much faster by release. And we know Vega is faster per mm² and per watt than RDNA in terms of compute anyway, so the two products seem to overlap, with one being clearly better.

Now if it were RDNA 2 I could see it: "look at our huge raytracing-enabled chip with 96 GB of HBM2E" is a pretty good pitch for VFX houses. You could easily fit most scenes that would show up on, say, a Netflix or HBO series in RAM, and then hardware raytracing gets you 10x or more the render speed. But why there would be, at least by rumor, two DL-capable HBM chips coming out in the same year for AMD is something I can't find logic in.
There's no overlap; Arcturus is in a completely different league with its 128 CUs compared to Navi 12's 40 (20 dual).
It could be built for a specific customer's needs too, just like Vega 12 was tailored for Apple.
 
Arcturus (possibly the MI100), from what I've gleaned from various articles and code commits, is a compute-oriented product, perhaps more HPC-targeted than usual.
It has 128 CUs and no graphics command processor. Perhaps there is some hint to what limits GCN's scaling in the removal of the graphics command processor, while scaling compute. Perhaps there's a limit to how much the control logic can directly control for a single graphics context, while a compute device could scale out the number of ACEs with no expectation that they act in concert like a graphics card would.

On top of that, there's an apparently new class of acceleration unit in addition to the vector hardware, perhaps some sort of large matrix multiply unit that might extend the machine learning instructions or general math capabilities for large compute. While I'd need to hunt down the reference, there's some code written to the effect that some portion of the clocking capability for boosting has been disabled, since there's going to be a lot of data movement and highly utilized silicon even at more modest clocks.
For instruction generation in the compiler, there is advice that while it is possible to issue vector instructions in parallel with the new accelerator instructions, it's discouraged due to the likelihood that the chip will throttle.

It will probably aim for lower clocks than prior GPUs, since it's going to have a lot more silicon active over a broad chip.
As for why it is released after RDNA, perhaps there are factors like the HPC contracts AMD is touting that would need the higher peak throughput, but with somewhat reduced risk from sticking with a more familiar ISA and base architecture. RDNA has some unappetizing silicon bugs at this point, and its software support for compute is not good.
The broader wavefronts and more coarse batch requirements may also be more acceptable for workloads dominated by very large matrix multiplies.

There are elements in RDNA that I would imagine could improve on Vega, but that might not be sufficient until RDNA is more mature.
 
With 25 TFLOPS (FP32) basically required for the MI100, and 128 CUs × 64 ALUs given, Arcturus would need to clock slightly north of 1.5 GHz, not unusually low for a Vega GPU.
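For anyone who wants to check that arithmetic, a quick sketch (the CU/ALU counts are the rumored figures, and 2 FLOPs per ALU per clock assumes FMA throughput):

```python
# Clock needed to hit 25 TFLOPS FP32 on the rumored Arcturus configuration.
cus = 128
alus_per_cu = 64
flops_per_clock = cus * alus_per_cu * 2   # 16384 FLOPs/cycle, assuming FMA

target_tflops = 25.0
clock_ghz = target_tflops * 1e12 / flops_per_clock / 1e9
print(f"required clock: {clock_ghz:.3f} GHz")  # ~1.526 GHz
```

Which lands right at the "slightly north of 1.5 GHz" figure above.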
 
Will Arcturus have half rate FP64?

Regardless, perhaps all Arcturus discussions should be in the Vega thread and not this one.
 
https://github.com/CLRX/CLRX-mirror/commit/a4c9fdfd191eda8fb206debe778dc9130caa3545

Navi 12 pieces are starting to fall into place.
Navi 12 is GCN 1.5.1, or "Navi 1.1"; it's a similar upgrade over Navi 10 as Vega 20 (GCN 1.4.1/5.1) was over Vega 10 (GCN 1.4/5), adding support for deep learning instructions.
It also swaps GDDR6 for 2048-bit HBM2, but the CU count is the same as Navi 10's: 40 "old CUs" or 20 "dual CUs".

Navi 12 is looking stranger each time there's news about it.
On one hand it's using two HBM2E stacks, so we should expect around 820 GB/s of bandwidth from it, more than 80% above Navi 10's 256-bit 14 Gbps GDDR6 (448 GB/s). On the other hand, it's still a relatively narrow GPU, with only 20 WGPs like Navi 10.
And then there are those tests showing very low core clocks of 1.15 GHz.

If this were using a single HBM2E stack, I'd say we were looking at Vega 12's successor for MacBooks, with the single stack offering considerably better performance even at low core clocks.
With two HBM2E stacks, this will be a bandwidth monster glued to a relatively small GPU that is, moreover, clocked really low.

Can HBM2E clock significantly lower (e.g. below 2.4 Gbps/pin) to enable significantly lower voltages? Otherwise Navi 12 only makes sense if it's clocked at astronomical, unforeseen speeds like 2.5 GHz, but that would make Big Navi rather redundant.
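The bandwidth figures being thrown around are easy to verify from bus width and per-pin rate (the 3.2 Gbps HBM2E rate here is an assumption matching the ~820 GB/s rumor):

```python
# Peak memory bandwidth: bus width (bits) x per-pin rate (Gbps) / 8 bits-per-byte.
def bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8

navi10_gddr6 = bandwidth_gb_s(256, 14.0)          # 448.0 GB/s
two_hbm2e_stacks = bandwidth_gb_s(2 * 1024, 3.2)  # 819.2 GB/s
print(two_hbm2e_stacks / navi10_gddr6)            # ~1.83x Navi 10
```

The same formula gives the lower ES-board figures mentioned later in the thread: 2.0 Gbps and 2.4 Gbps per pin work out to 512 GB/s and ~614 GB/s on two stacks.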
 
The ES boards have been spotted with 2 Gbps and 2.4 Gbps HBM2E, so not quite 820 GB/s, only 512-614 GB/s (of course this doesn't mean the final product couldn't be higher; that ES board was a mere 200 W board).

Late edit:
My memory is bad, but wasn't there already talk about a Navi with DL ops right after the Navi 10 launch?
 
The GFX1011 "NaviDL" device listed in that github commit looks to be associated with the older commit for GFX1011 AND GFX1012: https://github.com/llvm-mirror/llvm/commit/eaed96ae3e5c8a17350821ae39318c70200adaf0.
That brings various dot product instructions into GFX10, whereas there is a slightly differently numbered set for Vega 20.

GFX1011 is also the Navi version that doesn't have the FeatureLdsMisalignedBug flag, but lists all the other bugs that might have been called teething pains for GFX10.
This family variant has the FeatureDoesNotSupportXNACK flag, which is present for all non-APU products.

GFX1011 does have a smattering of error strings related to BVH instructions, perhaps as errors in their use or some kind of ISA conflict with an unspecified nearby variant with BVH instructions.
 
So how big does AMD need to go with "Big Navi"? Or what die size makes sense for RDNA 2? What size is Navi 12 rumored to be, and how much bigger is it than Vega 20's 331 mm²?

What does "bigger" mean in terms of what Navi needs more of? I'd think for games you need more TMUs, ROPs, etc., but I don't know anymore; RDNA changes things up and is all about feeding the engine unfettered. RDNA 2 is the full architecture and is said to be more efficient at crunching games. I think we all expect this, but to what degree? How far advanced is RDNA 2? (A 25% uplift from architecture alone, RDNA 1 to RDNA 2?)

Seems like RDNA 2 is catering to DX12 and Vulkan and will have a robust front end, sitting on a new fabric. And a few patents?



For argument's sake, if you add 50% to Navi 10's 252 mm² area, you get about 380 mm². What are we looking at with 7nm+ (w/ HBM2E)?

50% more CUs?
ROPs?
TMUs?
 
Navi 12 should be really similar in size to Navi 10, since they're essentially the same chip, minus the GDDR6 memory controller (swapped for HBM2) and with DL ops added in 12.
 
Hm... the main core area of Navi 10 is in the 150 mm² range. The 4x 64-bit memory controllers are about 50 mm² altogether. The uncore stuff is about another 50 mm².

I guess if they just did a naive doubling of everything (512-bit, 40 WGPs, 128 ROPs, 4 SEs), the die size would be in the 450 mm² range?
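Spelling out that estimate (the three area figures are the rough eyeball numbers from the post above, not measured values):

```python
# Naive "double everything" area estimate for a hypothetical Big Navi.
shader_core = 150   # mm^2, Navi 10 main core area (rough estimate)
mem_ctrl    = 50    # mm^2, 4x 64-bit GDDR6 controllers combined (estimate)
uncore      = 50    # mm^2, display/media/IO etc. (estimate)

# Double the shader array and memory controllers; uncore stays roughly fixed.
big_navi = 2 * shader_core + 2 * mem_ctrl + uncore
print(big_navi)  # 450 mm^2
```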
 
From what I've heard about how difficult GDDR6 trace routing can be, a 512-bit bus of that memory could be very hard to achieve.

Besides, Big Navi should be getting into the price bracket where HBM2E is worth implementing, especially with the clock speeds and memory densities attainable by the newly announced stacks from Samsung and SK Hynix. They could get up to 48 GB and 920 GB/s on just two stacks.
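As a quick check on that 920 GB/s headline figure (the 3.6 Gbps/pin rate is an assumption, matching SK Hynix's announced HBM2E spec):

```python
# Two-stack HBM2E bandwidth at the announced 3.6 Gbps/pin rate.
pin_rate_gbps = 3.6     # per-pin data rate (announced spec, assumption)
stack_bus_bits = 1024   # interface width per HBM stack
stacks = 2

bw_gb_s = stacks * stack_bus_bits * pin_rate_gbps / 8
print(bw_gb_s)  # 921.6 GB/s, i.e. the ~920 GB/s quoted above
```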
 
Besides, Intel's 500W Xe needs some competition.
Unfortunately AMD has no plans for 500W boards so far.
Arcturus is 8k ALUs @ 300 W.
I doubt they'll be doing two "big chips".
Oh you should never doubt her majesty.
HBM2E is worth implementing, especially with the clock speeds and memory density attainable by the newly produced stacks from Samsung and SK Hynix
I have some baaaaaaaad~ news for you.
 
What DRAM vendors tell you is total bullshit.
The fastest and densest shit you're getting this year is 8-Hi@2.4Gbps.
Kinda low-key fucks every acc vendor on the market, but it can't be helped.
Where are you getting this info from? Do you have any sources for that?


I don't have anything against Big Navi using GDDR6, especially with speeds reaching 18 Gbps in the near future (or is that a lie too?).
I just pointed out the newest HBM2E spec announcements as good opportunities for a high-end graphics card.
 