AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Kaotik · Jul 16, 2019

ToTTenTranz said:
I think that's unlikely. Do you have a source for that?
If they ripped out all 3D hardware (ROPs? TMUs?) then it probably wouldn't be GFX9 ISA compatible.

Not much:

Digging through this code drop today, it further points at a compute accelerator without any 3D support, "It's because Arcturus has not 3D engine."

https://www.phoronix.com/scan.php?page=news_item&px=Arcturus-Linux-Driver-Patches

JoeJ · Jul 16, 2019

Kaotik said:
It's supposedly pure compute chip with anything related to 3D ripped out

I'm excited!

Too bad nobody except me will believe this is the right direction also for gaming :|

Malo · Jul 16, 2019

JoeJ said:
Too bad nobody except me will believe this is the right direction also for gaming :|

Maybe in a couple of decades

bridgman · Jul 16, 2019

I guess if we actually wanted attention we could drop a couple of patches into llvmpipe or softpipe to support thousands of rendering threads.

PizzaKoma · Jul 16, 2019

https://twitter.com/x/status/1151112093631799296

https://twitter.com/x/status/1151130726516924417

https://twitter.com/x/status/1151115231726227456

3dilettante · Jul 17, 2019

ToTTenTranz said:
I think that's unlikely. Do you have a source for that?
If they ripped out all 3D hardware (ROPs? TMUs?) then it probably wouldn't be GFX9 ISA compatible.

The code changes that cover some of this discuss no longer needing to enumerate graphics queues and load the microcode for several of the processors that exist with in the command processor block (PFP, ME). However, there's a brief change elsewhere that discusses enumerating certain things like Hi-Z and primitive FIFO sizes. These could be placeholders that might be rewritten at some point to remove the stubs, or these elements might still be present to some extent because they exist at the SE level (geometry engine, RBEs) instead of being attached to command processors. Perhaps the higher CU count is enabled in part because there were limitations to the GFX command processor's ability to manage that many CUs. Having multiple compute processors might allow for a larger pool to be subdivided internally since they don't pretend to host a massive single context.

The microcode engines that handle compute still interact with the SE hardware, and perhaps adjustments can be made for that part of the context. Other graphics functionality remains, since there are display controllers enumerated (default to gated-off unless specifically needed) and a streamlined context for surface parameters.
The graphics command processor block specializes in the backwards-compatible support of a large API-defined context at speed, but something not compatible or lower-spec could be useful for visualization or other scenarios.

Mixed in the changes are some mentions of integer-scaling of content, but I didn't see a statement that this was specific to Arcturus.

PizzaKoma · Jul 19, 2019

Regarding Arcturus, VCN 2.5, no display ip block with that and below you have how much of a compute card it is:

3dilettante · Aug 2, 2019

A recent change meant to reduce the impact of integrating Arcturus' larger CU count mentioned a term that was briefly discussed a few times in this forum in the past.
https://lists.freedesktop.org/archives/amd-gfx/2019-August/037800.html

This change is because SE/SH layout on Arcturus is 8*1, different from
4*2(or 4*1) on Vega ASICs.

Currently the cu bitmap array is 4x4 size, and besides the bitmap is used widely
across SW stack. To mostly reduce the scale of impact, we make the cu bitmap
array compatible with SE/SH layout on Arcturus. Then the store of cu bits of
each shader array for Arcturus will be like below:
SE0,SH0 --> bitmap[0][0]
SE1,SH0 --> bitmap[1][0]
SE2,SH0 --> bitmap[2][0]
SE3,SH0 --> bitmap[3][0]
SE4,SH0 --> bitmap[0][1]
SE5,SH0 --> bitmap[1][1]
SE6,SH0 --> bitmap[2][1]
SE7,SH0 --> bitmap[3][1]

The SE/SH layout seem like it addresses a sub-division that is possible within the CUs in a shader engine.
Going back to documentation in the Southern Islands ISA doc, there is a hardware register called HW_ID that references the SE number and another 1-bit identifier for the shader array within the SE.
What impact having the CUs within an SE being part of one single array or split into two is unclear. Perhaps it has some impact on how the CUs can be signaled or how they can arbitrate for shared resources like the memory crossbar or export.

In theory, the combination of SE, SH, and CU identifiers could have given enough space to differentiate 128 CUs all the way back in Southern Islands, at least in terms of that element of the architecture.
Since the SI ISA doc, AMD hasn't wanted to keep documenting this hardware register, though even with the recent RDNA ISA doc you can see the jump in numbers over where it likely still is.

This change sheds some light on Vega might look in terms of that register, and how Arcturus didn't follow the previously mentioned way of getting to 128 CUs.
Vega apparently had 4 SEs with 2 SH each, or some products with 4 SEs and 1 SH.
Arcturus is apparently going for 8 SEs and 1 SH each, but in order to reduce the impact of having to rewrite a commonly-used layout table that assumed 4 SEs max, the SH count is being repurposed for Arcturus to serve as an additional bit for differentiating between the first and second halves of the set of 8 shader engines.
What it means to have 1 SH isn't clear, although if it deals with how the shader engines can interface with the rest of the chip it might prevent excess complexity in linking them to their infrastructure (or the there's some other barrier to having 16 shader arrays).

Edit:
The table from the SI ISA doc, for reference:

Code:

Table 5.8 HW_ID
Field Bits Description
WAVE_ID    3:0    Wave buffer slot number (0-9).
SIMD_ID     5:4    SIMD to which the wave is assigned within the CU.
                   7:6     reserved.
CU_ID       11:8    Compute unit to which the wave is assigned.
SH_ID        12      Shader array (within an SE) to which the wave is assigned.
SE_ID       14:13  Shader engine the wave is assigned to.
TG_ID       19:16  Thread-group ID
VM_ID       23:20  Virtual Memory ID
RING_ID   26:24  Compute Ring ID
STATE_ID 29:27  State ID (graphics only, not compute).

Deleted member 13524 · Aug 14, 2019

Looks like Picasso's successor Renoir is still bringing a Vega iGPU.

https://www.tomshardware.com/news/amd-renoir-apu-zen-2-vega-10-graphics,40145.html

If true, I wonder what clocks they'll achieve on 15 to 25W.

Alexko · Aug 14, 2019

ToTTenTranz said:
Looks like Picasso's successor Renoir is still bringing a Vega iGPU.

https://www.tomshardware.com/news/amd-renoir-apu-zen-2-vega-10-graphics,40145.html

If true, I wonder what clocks they'll achieve on 15 to 25W.

That looks fairly solid, but I really can't figure out why AMD would keep a Vega GPU on Renoir. By then, Navi will have been out for about a year. I can't make sense of this.

Deleted member 13524 · Aug 14, 2019

Alexko said:
That looks fairly solid, but I really can't figure out why AMD would keep a Vega GPU on Renoir. By then, Navi will have been out for about a year. I can't make sense of this.

Unless Renoir comes up later than expected, the difference should be half a year.
Regardless, I can't think of any reason to not adopt RDNA other than development time, or maybe AMD being concerned with the iGPU"s compute capabilities.

no-X · Aug 14, 2019

Or maybe AMD offered semi-custom Navi-based APU exclusively to a bountiful customer, e. g. Apple.

Malo · Aug 15, 2019

Aren't they separate use designs now? Compute focused Vega and RDNA for rasterization gaming?

3dilettante · Aug 15, 2019

Prior APUs lagged the discrete architectures as well. There seems to be an additional hurdle for integrating the output of CPU and GPU development pipelines. There are some possible reasons, like the teams responsible for implementing the APU need to wait on two different milestone dates from the CPU and GPU groups, potentially more involved physical design and engineering trade-offs between the CPU and GPU silicon, possibly more physical work if there's a mobile focus, and risk-management on top of all else.

Zen 2's chiplet strategy may be a hindrance here, assuming this is some kind of APU. A selling point of the dis-integration was using process technology and implementation techniques that best matched the type of processor, and an APU would revert things to the less-optimal mixed-use chips of before. Additional time or risk would require locking design elements in earlier, perhaps before Navi was considered suitable.

Tkumpathenurple · Aug 15, 2019

Isn't AMD using GF's 16nm process for the I/O chip of their chiplet CPU's?

If so, that might explain the use of Vega. Get the GPU+I/O chip from GF's 16nm, and pair it with Zen 2 chiplets from TSMC's 7nm.

Alexko · Aug 15, 2019

ToTTenTranz said:
Unless Renoir comes up later than expected, the difference should be half a year.
Regardless, I can't think of any reason to not adopt RDNA other than development time, or maybe AMD being concerned with the iGPU"s compute capabilities.

If it's just a six-month gap, then yes, I can see how timing might prevent the integration of a Navi GPU. This still feels like a shame, though. But I guess it's better to have Zen2+Vega in January than Zen2+Navi in July.

Silent_Buddha · Aug 15, 2019

3dilettante said:
Prior APUs lagged the discrete architectures as well. There seems to be an additional hurdle for integrating the output of CPU and GPU development pipelines. There are some possible reasons, like the teams responsible for implementing the APU need to wait on two different milestone dates from the CPU and GPU groups, potentially more involved physical design and engineering trade-offs between the CPU and GPU silicon, possibly more physical work if there's a mobile focus, and risk-management on top of all else.

Zen 2's chiplet strategy may be a hindrance here, assuming this is some kind of APU. A selling point of the dis-integration was using process technology and implementation techniques that best matched the type of processor, and an APU would revert things to the less-optimal mixed-use chips of before. Additional time or risk would require locking design elements in earlier, perhaps before Navi was considered suitable.

Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB

Bondrewd · Aug 15, 2019

Silent_Buddha said:
Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB

Yes, see Arrandale.

3dilettante · Aug 15, 2019

Silent_Buddha said:
Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

Regards,
SB

It would be similar to how almost every northbridge with integrated graphics worked back when the chipset was on the motherboard, so I think it would be possible.
The size might be a counter-argument to using it, and unlike most of the IO die a GPU is very dense and very active silicon.
If the product is expected to make its way into laptops, the benefits of having the GPU die on a node with significant power savings would be significant.

If the CPU and GPU are on a 7nm node and this is going into laptops, an APU might make more sense since it can discard IFOP links and maybe shave off some of the IO capability of a Matisse IO die.
If this is on 7nm, GF is excluded and AMD's cost structure with two separate 7nm die could be unfavorable.
A custom uncore could also revisit the low-power and idle behaviors of Zen2, which seem to be somewhat higher on the desktop and could really go for optimization in portable systems.

Details are still sparse on what differences Renoir has versus Raven. Some of the code shares blocks with Raven Ridge for IDs and settings, although some things like video encoding and the PSP version are more recent.
Navi 10 does have more die area devoted to elements that might be less acceptable at this range. The L1 cache may not make as much sense in a system where a shader array is alone on the chip and not fighting for the L2, and the WGP organization splits a front end between 2 CUs versus 1:3. The geometry engine setup and other elements also seem like they take up a fair chunk of area in the center versus what was there in Vega.
In a system with very limited memory bandwidth, die area constraints, and contention with a CPU, perhaps Navi wasn't worth waiting for versus a more compact Vega GPU running at low clocks on 7nm.

I don't remember whether Raven was considered "late", and whether Vega's growing pains might have figured into its release schedule.

Kaotik · Aug 15, 2019

Tkumpathenurpahl said:
Isn't AMD using GF's 16nm process for the I/O chip of their chiplet CPU's?

If so, that might explain the use of Vega. Get the GPU+I/O chip from GF's 16nm, and pair it with Zen 2 chiplets from TSMC's 7nm.

Epycs I/O-chip is 14nm, Ryzens 12nm (and X570 chipset 14nm), GloFo doesn't have 16nm

AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Kaotik

Drunk Member

JoeJ

Malo

Yak Mechanicum

bridgman

PizzaKoma

3dilettante

PizzaKoma

3dilettante

Deleted member 13524

Guest

Alexko

Deleted member 13524

Guest

no-X

Malo

Yak Mechanicum

3dilettante

Tkumpathenurple

Alexko

Silent_Buddha

Bondrewd

3dilettante

Kaotik

Drunk Member