AMD: Navi Speculation, Rumours and Discussion [2019-2020]

A couple random things I came across either in the ISA doc or elsewhere.
There are a few null image formats added that allow pixel shaders to execute without actually writing out pixels. The ISA doc also mentions a mode setting that treats exports like NOPs, which seems related. If some pre-processing or side effects are desired from a graphics shader, this apparently allows it to run without actually writing out pixel data.
I forget at this point whether there's a bug flag related to exporting with the EXEC mask set to zero; there seem to be a decent number of such bugs now with Wave32/64 and the various modes.

Another point of comparison between Vega and RDNA is the presence or absence of certain message types for s_sendmsg. Vega's ISA guide mentions two message types related to primitives,
"Early Prim Dealloc" and "GS alloc req", though the guide doesn't delve into their use cases.
However, the RDNA guide describes "GS alloc req" as being necessary for primitive shaders to reserve output buffer space before they can export their results.
For some reason, the "Early Prim Dealloc" message is missing from the version of the RDNA guide we have, though its number in the table is left unused. It isn't clear whether this message also had something to do with primitive shaders, or whether its absence explains why some of the numbers for Navi's primitive shaders don't match Vega's marketing.

Vega and RDNA also took out message type 1, which is an interrupt message documented in earlier guides.
 
I really liked that article; it's a great way to understand the basics of GPU architecture while explaining the differences between the two solutions.

But they should have compared the regular 2070 (2304 shaders, 3 GPCs) against the 5700 (2304 shaders, 2 SEs), as the 2070 Super is a cut-down 2080 with the advantage of a much higher number of 'engines' (here 5 or 6 GPCs).

I think the next big Navi card, hopefully with a higher number of SEs (probably 4), will be more interesting to compare against Nvidia's 2080 and its 5 or 6 GPCs.
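For a rough sense of scale, here's a quick back-of-the-envelope sketch of shaders per 'engine' (GPC vs. shader engine). The shader counts are the public specs; the engine counts are my assumption from the die configurations, and the 2070 Super may ship with 5 or 6 GPCs enabled (6 assumed below).

```python
# Rough shaders-per-"engine" comparison (GPC for Nvidia, shader engine for AMD).
# Shader counts are public specs; engine counts are assumptions for illustration.
configs = {
    "RTX 2070 (TU106)":       {"shaders": 2304, "engines": 3},  # 3 GPCs
    "RTX 2070 Super (TU104)": {"shaders": 2560, "engines": 6},  # cut-down 2080 die, 5-6 GPCs (6 assumed)
    "RX 5700 (Navi 10)":      {"shaders": 2304, "engines": 2},  # 2 shader engines
}

for name, c in configs.items():
    per_engine = c["shaders"] / c["engines"]
    print(f"{name}: {c['shaders']} shaders / {c['engines']} engines "
          f"= {per_engine:.0f} shaders per engine")
```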
 
It seems to be a presentation almost fully dedicated to the workgroup processors and the TMUs attached to each.
There's practically nothing on the ROPs, multimedia, or display engines.

It seems to be more of a general guide for programmers wanting to do low-level optimization and follow best practices. We'll probably have to wait until Hot Chips for a more thorough, architecture-focused overview.
 
But they should have compared the regular 2070 (2304 shaders, 3 GPCs) against the 5700 (2304 shaders, 2 SEs), as the 2070 Super is a cut-down 2080 with the advantage of a much higher number of 'engines' (here 5 or 6 GPCs).
I agree comparisons at this point are great talking points, but ideally the best comparison would be on the same node.
I thought the article was very well done in terms of explaining subtle differences between similar architectural concepts in a very understandable way.
 

I think there's a mistake in the article:

"AMD and Nvidia take a markedly different approach to their unified shader units, even though a lot of the terminology used seems to be the same. Nvidia's execution units (CUDA cores) are scalar in nature -- that means one unit carries out one math operation on one data component; by contrast, AMD's units (Stream Processors) work on vectors -- one operation on multiple data components. For scalar operations, they have a single dedicated unit."

AMD has been scalar as well since GCN; AFAIK all modern GPUs have been 'scalar' for many years. (Assuming the term means adding vec4 + vec4 results in 4 add ops.)
 
I think there's a mistake in the article:
...
AMD has been scalar as well since GCN; AFAIK all modern GPUs have been 'scalar' for many years. (Assuming the term means adding vec4 + vec4 results in 4 add ops.)

Right... both are scalar SIMD.

We call the SIMD unit "vector" to distinguish it from the scalar non-SIMD processor, but each ALU in the SIMD is working on a different thread.
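A minimal Python sketch of that distinction, assuming the usual SIMT model: the "vector" unit is wide, but each lane performs a scalar operation for a different thread, so a vec4 + vec4 add in shader code is issued as four scalar adds over time, each executed across the whole wave.

```python
# Toy SIMT model: a "vector" SIMD unit with N lanes, where every lane
# executes the same scalar instruction for a different thread.
WAVE_SIZE = 32  # e.g. Wave32 on RDNA, or a warp of 32 on Nvidia

def simd_add(a_per_thread, b_per_thread):
    """One SIMD instruction: each lane does one scalar add for its thread."""
    return [a + b for a, b in zip(a_per_thread, b_per_thread)]

# A vec4 + vec4 add in the shader source is not one "vector op" per thread;
# it is issued as 4 scalar adds, each executed across all lanes of the wave.
a = [[t, t, t, t] for t in range(WAVE_SIZE)]  # each thread's vec4 operand a
b = [[1, 2, 3, 4] for _ in range(WAVE_SIZE)]  # each thread's vec4 operand b

result = [simd_add([a[t][c] for t in range(WAVE_SIZE)],
                   [b[t][c] for t in range(WAVE_SIZE)])
          for c in range(4)]  # 4 separate scalar-add instructions

print(len(result), "scalar add instructions issued for the wave")
```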
 
Given what we know of RDNA now, how does it scale?

What are the flexible parts of the new RDNA architecture? I'm wondering how big (or small) designs based on RDNA can go, given all these rumors of a "Big Navi" and then another chip called the "Nvidia killer"?


How is AMD going to incorporate more features (e.g. a ray tracing engine) into RDNA2 architecturally? Can AMD do that in chiplet form, and just use a fast L2 or IF 2.0 bus to feed the RDNA machine?
 
How is AMD going to incorporate more features (e.g. a ray tracing engine) into RDNA2 architecturally? Can AMD do that in chiplet form, and just use a fast L2 or IF 2.0 bus to feed the RDNA machine?
Going by their patents, they're going to do RT a bit differently from NVIDIA, with the accelerator parts incorporated into the TMUs.
 
Maybe

Big Navi is probably now Navi 12 - and still RDNA1

Navi 20, 21 and 23(?), including the "Nvidia killer", are probably RDNA2 and include hardware RT.

I'm thinking Xbox Scarlett is a Navi 10 class GPU in terms of size with some RDNA2 features including hardware RT.
 
This might be a stupid question that has been answered before, but is there any hint that AMD has an equivalent of NVLink or NVSwitch coming out? PCIe 4 is nice, but it's nowhere near the bandwidth of NVSwitch.
 
This might be a stupid question that has been answered before, but is there any hint that AMD has an equivalent of NVLink or NVSwitch coming out? PCIe 4 is nice, but it's nowhere near the bandwidth of NVSwitch.

What use cases need more than 64 GB/s (32 GB/s each direction)?

PCI Express 5.0 is close and would provide 128 GB/s of bandwidth (64 GB/s each direction).
PCI Express 6.0 would double bandwidth again and is targeted for 2021.
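For reference, a quick sketch of where those figures come from, assuming an x16 link, roughly 2 GB/s per lane per direction at Gen4, and a doubling each generation (raw rates, ignoring protocol overhead):

```python
# Approximate PCIe x16 bandwidth per generation, assuming ~2 GB/s per lane
# per direction at Gen4 and a doubling each generation (rough figures only).
LANES = 16
gen_gbps_per_lane = {3: 1.0, 4: 2.0, 5: 4.0, 6: 8.0}  # GB/s per lane, one direction

for gen, per_lane in gen_gbps_per_lane.items():
    one_way = per_lane * LANES
    print(f"PCIe {gen}.0 x16: ~{one_way:.0f} GB/s each way, "
          f"~{2 * one_way:.0f} GB/s bidirectional")
```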
 
What use cases need more than 64 GB/s (32 GB/s each direction)?

PCI Express 5.0 is close and would provide 128 GB/s of bandwidth (64 GB/s each direction).
PCI Express 6.0 would double bandwidth again and is targeted for 2021.

Lots of things. That's why Nvidia makes the DGX-2. And to be clear, PCIe 5 is not close. After the spec is finalized, it will most likely take at least three years before we see it in servers.

Vega 20 already has a pair of 100 GB/s IF links.

That's 1/3 the speed of NVSwitch, and only a ring topology, so it's not really competitive. It isn't clear whether AMD is even trying to compete in that market.
 
Lots of things. That's why Nvidia makes the DGX-2. And to be clear, PCIe 5 is not close.

So you don't know and/or won't list them. Must not be important at all, then.

After the spec is finalized, it will most likely take at least three years before we see it in servers.

PCI Express 4 was officially finalized in June 2017, and we have chipsets in July 2019. That's far less than your "at least three years".

PCI Express 5 was officially finalized in May 2019. I suspect we will have chipsets before June 2021.
 
So you don't know and/or won't list them. Must not be important at all, then.



PCI Express 4 was officially finalized in June 2017, and we have chipsets in July 2019. That's far less than your "at least three years".

PCI Express 5 was officially finalized in May 2019. I suspect we will have chipsets before June 2021.

Deep learning is heavily bound by inter-GPU communication. That's why InfiniBand was so popular, right up until Ethernet matched its speeds. Debating whether inter-GPU transfer speed is the bottleneck is like dismissing all high-speed interconnect improvements, including PCIe. It's also equivalent to saying that a single GPU's memory is large enough for any application. It's not.

Until interconnect speed matches HBM speed, it's not fast enough.

Intel still does not have a PCIe 4-compatible chip. And since Nvidia, by far the market leader, also only supports PCIe 3, the lag to adoption is longer than you allude to. The fact is that memory speeds are increasing faster than interconnect speeds, so AMD is already making a proprietary interconnect to address it. But it's not very fast, relatively speaking.
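As a rough illustration of why this matters for multi-GPU training, here's a sketch of how long it takes to move a 1 GB gradient buffer at a few different bandwidths versus local HBM. All figures are round numbers assumed for illustration, not measured values.

```python
# Time to move a 1 GB gradient buffer at various (assumed, round-number)
# bandwidths, illustrating the gap between local HBM and the interconnect.
buffer_gb = 1.0
bandwidth_gbps = {                    # GB/s, one direction, illustrative only
    "PCIe 3.0 x16":               16,
    "PCIe 4.0 x16":               32,
    "Vega 20 IF link":            100,
    "NVLink/NVSwitch (per GPU)":  300,
    "Local HBM2":                 1000,
}

for name, bw in bandwidth_gbps.items():
    print(f"{name:28s}: {buffer_gb / bw * 1000:6.1f} ms to move {buffer_gb:.0f} GB")
```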
 
Lots of things. That's why Nvidia makes the DGX-2. And to be clear, PCIe 5 is not close. After the spec is finalized, it will most likely take at least three years before we see it in servers.
Intel's Sapphire Rapids, scheduled for Q1 2021, will have PCIe 5.
 