AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I guess if we actually wanted attention we could drop a couple of patches into llvmpipe or softpipe to support thousands of rendering threads.
 
I think that's unlikely. Do you have a source for that?
If they ripped out all 3D hardware (ROPs? TMUs?) then it probably wouldn't be GFX9 ISA compatible.

The code changes that cover some of this discuss no longer needing to enumerate graphics queues or load the microcode for several of the processors that exist within the command processor block (PFP, ME). However, there's a brief change elsewhere that still enumerates certain things like Hi-Z and primitive FIFO sizes. These could be placeholders that might be rewritten at some point to remove the stubs, or these elements might still be present to some extent because they exist at the SE level (geometry engine, RBEs) instead of being attached to the command processors. Perhaps the higher CU count is enabled in part because there were limitations to the GFX command processor's ability to manage that many CUs. Having multiple compute processors might allow a larger pool to be subdivided internally, since they don't pretend to host a single massive context.

The microcode engines that handle compute still interact with the SE hardware, and perhaps adjustments can be made for that part of the context. Other graphics functionality remains, since there are display controllers enumerated (defaulting to gated-off unless specifically needed) and a streamlined context for surface parameters.
The graphics command processor block specializes in the backwards-compatible support of a large API-defined context at speed, but something not compatible or lower-spec could be useful for visualization or other scenarios.

Mixed in with the changes are some mentions of integer scaling of content, but I didn't see a statement that this was specific to Arcturus.
 
Regarding Arcturus: VCN 2.5, no display IP block, and below you can see how much of a compute card it is:
[Attached image: D_qmFyWU8AElbky.png]
 
A recent change meant to reduce the impact of integrating Arcturus' larger CU count mentioned a term that was briefly discussed a few times in this forum in the past.
https://lists.freedesktop.org/archives/amd-gfx/2019-August/037800.html

This change is because SE/SH layout on Arcturus is 8*1, different from
4*2(or 4*1) on Vega ASICs.

Currently the cu bitmap array is 4x4 size, and besides the bitmap is used widely
across SW stack. To mostly reduce the scale of impact, we make the cu bitmap
array compatible with SE/SH layout on Arcturus. Then the store of cu bits of
each shader array for Arcturus will be like below:
SE0,SH0 --> bitmap[0][0]
SE1,SH0 --> bitmap[1][0]
SE2,SH0 --> bitmap[2][0]
SE3,SH0 --> bitmap[3][0]
SE4,SH0 --> bitmap[0][1]
SE5,SH0 --> bitmap[1][1]
SE6,SH0 --> bitmap[2][1]
SE7,SH0 --> bitmap[3][1]
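
To make the folding concrete, here's a minimal sketch of the indexing described in that message; the bitmap[se % 4][se / 4] arithmetic reproduces the mapping quoted above, while the loop, names, and output are purely illustrative and not taken from the actual amdgpu code:
Code:
/* Sketch (not actual driver code): folding an 8*1 SE/SH layout into the
 * existing 4x4 cu bitmap, per the mapping in the commit message above. */
#include <stdio.h>

#define NUM_SE 8

int main(void)
{
	unsigned int se;

	for (se = 0; se < NUM_SE; se++)
		printf("SE%u,SH0 --> bitmap[%u][%u]\n", se, se % 4, se / 4);

	return 0;
}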

The SE/SH layout seems to address a sub-division that is possible within the CUs of a shader engine.
Going back to documentation in the Southern Islands ISA doc, there is a hardware register called HW_ID that references the SE number and another 1-bit identifier for the shader array within the SE.
What impact having the CUs within an SE form a single array rather than being split into two has is unclear. Perhaps it affects how the CUs can be signaled or how they arbitrate for shared resources like the memory crossbar or export.

In theory, the combination of SE, SH, and CU identifiers could have given enough space to differentiate 128 CUs (2 SE_ID bits x 1 SH_ID bit x 4 CU_ID bits) all the way back in Southern Islands, at least in terms of that element of the architecture.
AMD stopped documenting this hardware register after the SI ISA doc, though even in the recent RDNA ISA doc you can see the jump in bit numbering over where it likely still sits.

This change sheds some light on how Vega might look in terms of that register, and on how Arcturus didn't follow the previously mentioned way of getting to 128 CUs.
Vega apparently had 4 SEs with 2 SHs each, or 4 SEs with 1 SH each for some products.
Arcturus is apparently going for 8 SEs with 1 SH each, but in order to reduce the impact of rewriting a commonly-used layout table that assumed 4 SEs at most, the SH index is being repurposed on Arcturus to serve as an additional bit differentiating the first and second halves of the set of 8 shader engines.
What it means to have 1 SH isn't clear, although if it deals with how the shader engines interface with the rest of the chip it might avoid excess complexity in linking them to their infrastructure (or there's some other barrier to having 16 shader arrays).

Edit:
The table from the SI ISA doc, for reference:
Code:
Table 5.8 HW_ID
Field      Bits    Description
WAVE_ID    3:0     Wave buffer slot number (0-9).
SIMD_ID    5:4     SIMD to which the wave is assigned within the CU.
           7:6     Reserved.
CU_ID      11:8    Compute unit to which the wave is assigned.
SH_ID      12      Shader array (within an SE) to which the wave is assigned.
SE_ID      14:13   Shader engine the wave is assigned to.
TG_ID      19:16   Thread-group ID.
VM_ID      23:20   Virtual memory ID.
RING_ID    26:24   Compute ring ID.
STATE_ID   29:27   State ID (graphics only, not compute).
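
As a side note, here's a quick sketch of decoding those fields from a raw HW_ID value, using only the bit positions in the table above; the helper name and example value are made up for illustration:
Code:
/* Sketch: extract HW_ID fields per the SI Table 5.8 bit positions above.
 * The function name and the sample value are hypothetical. */
#include <stdint.h>
#include <stdio.h>

static uint32_t hw_id_field(uint32_t hw_id, unsigned lo, unsigned hi)
{
	return (hw_id >> lo) & ((1u << (hi - lo + 1)) - 1);
}

int main(void)
{
	uint32_t hw_id = 0x00002F35; /* arbitrary example value */

	printf("WAVE_ID = %u\n", hw_id_field(hw_id, 0, 3));
	printf("SIMD_ID = %u\n", hw_id_field(hw_id, 4, 5));
	printf("CU_ID   = %u\n", hw_id_field(hw_id, 8, 11));
	printf("SH_ID   = %u\n", hw_id_field(hw_id, 12, 12));
	printf("SE_ID   = %u\n", hw_id_field(hw_id, 13, 14));

	/* 4 CU_ID bits * 1 SH_ID bit * 2 SE_ID bits = 7 bits = 128 CU slots. */
	return 0;
}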
 
That looks fairly solid, but I really can't figure out why AMD would keep a Vega GPU on Renoir. By then, Navi will have been out for about a year. I can't make sense of this.
Unless Renoir comes out later than expected, the difference should be half a year.
Regardless, I can't think of any reason not to adopt RDNA other than development time, or maybe AMD being concerned with the iGPU's compute capabilities.
 
Aren't they separate designs for separate uses now? Compute-focused Vega and RDNA for rasterization/gaming?
 
Prior APUs lagged the discrete architectures as well. There seems to be an additional hurdle in integrating the output of the CPU and GPU development pipelines. There are some possible reasons: the teams responsible for implementing the APU need to wait on two different milestone dates from the CPU and GPU groups, there are potentially more involved physical design and engineering trade-offs between the CPU and GPU silicon, possibly more physical work if there's a mobile focus, and risk management on top of all else.

Zen 2's chiplet strategy may be a hindrance here, assuming this is some kind of APU. A selling point of the dis-integration was using process technology and implementation techniques that best matched the type of processor, and an APU would revert things to the less-optimal mixed-use chips of before. Additional time or risk would require locking design elements in earlier, perhaps before Navi was considered suitable.
 
Isn't AMD using GF's 12nm process for the I/O chip of their chiplet CPUs?

If so, that might explain the use of Vega. Get the GPU+I/O chip from GF's 12nm, and pair it with Zen 2 chiplets from TSMC's 7nm.
 
Unless Renoir comes out later than expected, the difference should be half a year.
Regardless, I can't think of any reason not to adopt RDNA other than development time, or maybe AMD being concerned with the iGPU's compute capabilities.

If it's just a six-month gap, then yes, I can see how timing might prevent the integration of a Navi GPU. This still feels like a shame, though. But I guess it's better to have Zen2+Vega in January than Zen2+Navi in July.
 
Prior APUs lagged the discrete architectures as well. There seems to be an additional hurdle in integrating the output of the CPU and GPU development pipelines. There are some possible reasons: the teams responsible for implementing the APU need to wait on two different milestone dates from the CPU and GPU groups, there are potentially more involved physical design and engineering trade-offs between the CPU and GPU silicon, possibly more physical work if there's a mobile focus, and risk management on top of all else.

Zen 2's chiplet strategy may be a hindrance here, assuming this is some kind of APU. A selling point of the dis-integration was using process technology and implementation techniques that best matched the type of processor, and an APU would revert things to the less-optimal mixed-use chips of before. Additional time or risk would require locking design elements in earlier, perhaps before Navi was considered suitable.

Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB
 
Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB
Yes, see Arrandale.
 
Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB

It would be similar to how almost every northbridge with integrated graphics worked back when the chipset was on the motherboard, so I think it would be possible.
The size might be a counter-argument to using it, and unlike most of the IO die, a GPU is very dense and very active silicon.
If the product is expected to make its way into laptops, the benefit of having the GPU on a node with significant power savings would be substantial.

If the CPU and GPU are on a 7nm node and this is going into laptops, an APU might make more sense, since it can discard the IFOP links and maybe shave off some of the IO capability of a Matisse IO die.
If this is on 7nm, GF is excluded, and AMD's cost structure with two separate 7nm dies could be unfavorable.
A custom uncore could also revisit the low-power and idle behaviors of Zen 2, which seem to be somewhat high on the desktop and could really use optimization in portable systems.

Details are still sparse on what differences Renoir has versus Raven. Some of the code shares blocks with Raven Ridge for IDs and settings, although some things like video encoding and the PSP version are more recent.
Navi 10 does have more die area devoted to elements that might be less acceptable in this range. The L1 cache may not make as much sense in a system where a shader array is alone on the chip and not fighting for the L2, and the WGP organization splits a front end between every 2 CUs versus the earlier 1:3 ratio. The geometry engine setup and other elements also seem to take up a fair chunk of area in the center versus what was there in Vega.
In a system with very limited memory bandwidth, die area constraints, and contention with a CPU, perhaps Navi wasn't worth waiting for versus a more compact Vega GPU running at low clocks on 7nm.

I don't remember whether Raven was considered "late", and whether Vega's growing pains might have figured into its release schedule.
 