AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Any chance AMD will drop the confusing “dual compute unit” terminology for RDNA2? It seems the two CUs share an L0 instruction cache and scalar data cache but all other resources are CU specific. Not sure those two caches are worth the confusing name.
 

Actually they don't share the L0. Each WGP can access double the LDS because the LDS is part of the WGP.
More Info on WGP mode from AMD's RDNA whitepaper.
The RDNA architecture has two modes of operation for the LDS, compute-unit and work-group processor mode, which are controlled by the compiler. The former is designed to match the behavior of the GCN architecture and statically divides the LDS capacity into equal portions between the two pairs of SIMDs. By matching the capacity of the GCN architecture, this mode ensures that existing shaders will run efficiently. However, the work-group processor mode allows using larger allocations of the LDS to boost performance for a single work-group.
 

I wonder if a CU can mix workgroups that use lots of LDS with others that use only a little? Probably.
I also wonder how this compares with NV, which seems to have more LDS in general (Ampere increased it once more).
 
Considering he's addressed as Devinder throughout the interview, how likely is it that he has since switched his name and told WCCFtech specifically about it?
People with uncommon names (well, uncommon where they currently live) sometimes go by a more familiar name. When I was working with Koreans, some of the more prominent people used English first names, because English is cooler or something. Since his name could be confusing for people around him, he may simply have said something like "You can call me David", and that was that.

Maybe it's also about etymology: some of my English and American clients call me Thomas, because that would be my name in English. I don't know if that's the case for Devinder.
 
I'm aware of some people doing this, even going fully official like Jen-Hsun switching to Jensen, but Devinder is and has been Devinder everywhere but the WCCFtech article.
 
There has been only one ACE since early GCN, and there still is only one. What's shown as 2 ACEs is a single core with 2-way SMT, where each thread polls a number of queues.

AMD's presentation of that implementation detail is more artistic freedom than anything else.

There is a good description of what it is from @bridgman

- "MEC" (Micro Engine Compute, aka the compute command processor)
Some chips have two MECs, other parts have only one. So far one MEC (up to 32 queues) seems to be more than enough to keep the shader core fully occupied.
The MEC block has 4 independent threads, referred to as "pipes" in engineering and "ACEs" (Asynchronous Compute Engines) in marketing. One MEC => 4 ACEs, two MECs => 8 ACEs. Each pipe can manage 8 compute queues, or one of the pipes can run HW scheduler microcode which assigns "virtual" queues to queues on the other 3/7 pipes.

https://www.phoronix.com/forums/for...x/856534-amdgpu-questions?p=857850#post857850
 
I wonder if CU can mix workgroups that use lots of LDS with others that use only a little bit? Probably.
I also wonder how this compares with NV which seems to have more LDS in general (Ampere increased it once more).

That's true for HPC parts, but RDNA and "gaming" Turing have the same LDS:ALU ratio of 1KB per FP ALU. They also share the same 64KB maximum LDS allocation per block/workgroup.

Support for higher maximum LDS allocations per workgroup makes sense, but Nvidia seems to take a simpler approach. It's still not clear to me why a dual CU isn't just a CU with four 32-wide SIMDs plus mode toggles for GCN compatibility. Maybe that's exactly what it is and the terminology is just wonky.

"Turing allows a single thread block to address the full 64 KB of shared memory. To maintain architectural compatibility, static shared memory allocations remain limited to 48 KB, and an explicit opt-in is also required to enable dynamic allocations above this limit."
 
They do share the instruction cache, just not the vector L0.


I wonder if CU can mix workgroups that use lots of LDS with others that use only a little bit? Probably.
Yes
 
Thanks to all for the clarity on the ACE nomenclature and cache structure.

For the more interesting part :oops:

Code:
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
case CHIP_SIENNA_CICHLID:
    adev->gfx.me.num_me = 1;
    adev->gfx.me.num_pipe_per_me = 2;
    adev->gfx.me.num_queue_per_pipe = 1;

What is the main reasoning behind the increase of the GFX pipes to 2? Could API calls theoretically be dispatched across multiple pipes?

I see the MES also got a bunch of updates for Sienna. It would be really interesting to see this put into action, hopefully with the upcoming HW scheduling in Windows (which seems not to be active, based on some reports).
 
Some new "Sienna Cichlid" related commits in radeonSI MESA driver

EDIT: Corrected Phoronix link.

Some interesting stuff:

ac_gpu_info.c
Code:
if (info->chip_class >= GFX10_3)
    info->max_wave64_per_simd = 16;
else if (info->chip_class == GFX10)
    info->max_wave64_per_simd = 20;
else if (info->family >= CHIP_POLARIS10 && info->family <= CHIP_VEGAM)
    info->max_wave64_per_simd = 8;
 
I Ctrl-F'ed the RDNA whitepaper.

The fetched instructions are deposited into wavefront controllers. Each SIMD has a separate instruction pointer and a 20-entry wavefront controller, for a total of 80 wavefronts per dual compute unit. Wavefronts can be from a different work-group or kernel, although the dual compute unit maintains 32 work-groups simultaneously. The new wavefront controllers can operate in wave32 or wave64 mode.
 
Another interesting bit.
Code:
if (ASICREV_IS_SIENNA_M(chipRevision))
{
    m_settings.supportRbPlus   = 1;
    m_settings.dccUnsup3DSwDis = 0;
}
I figured RB could mean a rendering backend.
 
Yeah, that's my conclusion as well. I guess 20 was too much for Navi, so they've shrunk it to shave off some transistors.

Or they’ve managed to reduce pipeline latency and/or increase ILP such that 16 wavefronts per SIMD is enough to hide typical latencies.

For reference, Turing allocates 8 wavefronts per SIMD, down from 16 in Volta/Pascal. Ampere is back up to 16.
 
Does the M in ASICREV_IS_SIENNA_M mean that it will be a mobile part?
I hadn't even noticed it.

Code:
#define ASICREV_IS_VEGA10_M(r)         ASICREV_IS(r, VEGA10)
#define ASICREV_IS_VEGA10_P(r)         ASICREV_IS(r, VEGA10)

Vega 10 has never been released as a mobile part, right? There are also some chips with V. I really can't find the logic behind it; maybe Value, Mid and Performance?

Edit: oh, and Vega M is apparently P.
Code:
#define ASICREV_IS_VEGAM_P(r)          ASICREV_IS(r, VEGAM)
 

I found an old tweet by komachi which says that M stands for "Mainstream", unless things have changed. But that could indicate that Sienna Cichlid may not be the "Big Navi" we're looking for. ¯\_(ツ)_/¯
 