Any chance AMD will drop the confusing “dual compute unit” terminology for RDNA2? It seems the two CUs share an L0 instruction cache and scalar data cache but all other resources are CU specific. Not sure those two caches are worth the confusing name.
The RDNA architecture has two modes of operation for the LDS, compute-unit mode and work-group processor mode, which are controlled by the compiler. The former is designed to match the behavior of the GCN architecture and statically divides the LDS capacity into equal portions between the two pairs of SIMDs. By matching the capacity of the GCN architecture, this mode ensures that existing shaders will run efficiently. However, the work-group processor mode allows using larger allocations of the LDS to boost performance for a single work-group.
People with uncommon names (well, uncommon where they currently live) sometimes go by a more familiar name. When I was working with Koreans, some of the more prominent people used English names as their first names, because English is cooler or something. Since his name could be confusing for people around him, he may simply have said something like "You can call me David", and that's that.

Considering he's addressed as Devinder throughout the interview, how likely is it that he has since switched his name and told WCCFtech specifically about it?
I'm aware of some people doing this, even going all official like Jen-Hsun switching to Jensen, but Devinder is and has been Devinder everywhere but the WCCFtech article.
Maybe it's also because of etymology: some of my English and American clients call me Thomas, because that would be my name in English. I don't know if that's the case for Devinder.
There has been only one ACE since early GCN, and there still is only one. What's shown as 2 ACEs is a single core with 2x SMT, and each thread polls a number of queues.
AMD's presentation of that implementation detail is more artistic freedom than anything else.
- "MEC" (Micro Engine Compute, aka the compute command processor)
Some chips have two MECs; other parts have only one. So far one MEC (up to 32 queues) seems to be more than enough to keep the shader core fully occupied.
The MEC block has 4 independent threads, referred to as "pipes" in engineering and "ACEs" (Asynchronous Compute Engines) in marketing. One MEC => 4 ACEs, two MECs => 8 ACEs. Each pipe can manage 8 compute queues, or one of the pipes can run HW scheduler microcode which assigns "virtual" queues to queues on the other 3/7 pipes.
I wonder if CU can mix workgroups that use lots of LDS with others that use only a little bit? Probably.
I also wonder how this compares with NV which seems to have more LDS in general (Ampere increased it once more).
They do share the instruction cache, just not the vector L0.

Actually, they don't share the L0. Each WGP can access double the LDS because it is part of the WGP.
More Info on WGP mode from AMD's RDNA whitepaper.
Yes. "However, the work-group processor mode allows using larger allocations of the LDS to boost performance for a single work-group."
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

	case CHIP_SIENNA_CICHLID:
		adev->gfx.me.num_me = 1;
		adev->gfx.me.num_pipe_per_me = 2;
		adev->gfx.me.num_queue_per_pipe = 1;

	if (info->chip_class >= GFX10_3)
		info->max_wave64_per_simd = 16;
	else if (info->chip_class == GFX10)
		info->max_wave64_per_simd = 20;
	else if (info->family >= CHIP_POLARIS10 && info->family <= CHIP_VEGAM)
		info->max_wave64_per_simd = 8;
The fetched instructions are deposited into wavefront controllers. Each SIMD has a separate instruction pointer and a 20-entry wavefront controller, for a total of 80 wavefronts per dual compute unit. Wavefronts can be from a different work-group or kernel, although the dual compute unit maintains 32 work-groups simultaneously. The new wavefront controllers can operate in wave32 or wave64 mode.
So, according to that commit, in Sienna there is a 16-entry wavefront controller per SIMD, right?

Yeah, that's my conclusion as well. I guess 20 was too much for Navi, so they've shrunk it to shave off some transistors.
Does the M in ASICREV_IS_SIENNA_M mean that it will be a mobile part?
#define ASICREV_IS_VEGA10_M(r) ASICREV_IS(r, VEGA10)
#define ASICREV_IS_VEGA10_P(r) ASICREV_IS(r, VEGA10)
#define ASICREV_IS_VEGAM_P(r) ASICREV_IS(r, VEGAM)
I haven't even noticed it.
Vega 10 has never been released as a mobile part, right? There are also some chips with V. I really can't find the logic behind it; maybe Value, Mid, and Performance?
Edit: oh, and Vega M is apparently P.
I found some old tweet by komachi which says that M stands for "Mainstream", unless things have changed. But it could indicate that Sienna Cichlid may not be the "Big Navi" that we're looking for. ¯\_(ツ)_/¯

After seeing the 128-bit bus, I never thought it was. I wonder why they pushed this one into the driver before Big Navi.