AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

What would the drawbacks be since it wasn't done before?
 
This suggests the SIMDs are now 2-wide in Vega...?
Though anyone could have made that in Paint...
The alleged diagram probably means Vega has one 8-wide, two 4-wide and one 2-wide pipeline, which surprisingly pretty much follows the patent that was linked above. This gives a variable wavefront size from 8 wide to 32 wide, assuming the same 4-cycle cadence. But it also means each "NCU" would get only 18 lanes. So one might expect multiple of these NCUs to form a larger block that shares at least the LDS.
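
For what it's worth, here is that arithmetic as a quick Python sketch. It assumes the lane widths read off the alleged diagram (8, 4, 4, 2) and the current 4-cycle cadence; the names are made up for illustration and nothing here is confirmed hardware behaviour.

```python
# Lane widths as read off the alleged diagram (an assumption, not confirmed).
SIMD_WIDTHS = [8, 4, 4, 2]
CADENCE = 4  # cycles per wavefront instruction, assumed unchanged from GCN


def wavefront_size(width, cadence=CADENCE):
    """Work-items covered when a SIMD of `width` lanes issues over `cadence` cycles."""
    return width * cadence


total_lanes = sum(SIMD_WIDTHS)
sizes = sorted({wavefront_size(w) for w in SIMD_WIDTHS})

print(total_lanes)  # 18
print(sizes)        # [8, 16, 32]
```

So the 2-wide pipeline gives the 8-wide wavefront and the 8-wide pipeline gives the 32-wide one, with 18 physical lanes in total.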
 


All this shows is that the card is working and driver QC is probably close to shipping.
I wish they had used a more punishing game than Battlefront, but if this is at Ultra settings + TAA then not even the Pascal Titan X can achieve a solid 60 FPS.
Though this could be a special new build of Battlefront with DX12.

But I guess tomorrow we'll know almost everything there is to know about the new cards.


Someone has taken a screenshot of the settings:

http://videocardz.com/65343/amd-demos-star-wars-battlefront-on-ryzen-and-vega-at-ces2017

It's FXAA, and the FOV is very small at 55º (which is why they're showing it in 3rd-person, I guess), so I think it'll be very hard to find a comparable test on the web.
 
What would the drawbacks be since it wasn't done before?
Complexity is one drawback, in terms of steps in the execution loop, hardware dedicated to scheduling, register file allocation/access choices, and potentially peak throughput if following the patent diagram exactly (only 14 lanes in vector units, only one SIMD's worth in a CU).
Wavefronts can dynamically change width based on branch divergence, and the patent admits there is cost and uncertainty in deciding whether or not to change SIMD allocation to react to it.

I'm not clear if that diagram is legitimate. It's ambiguous in various aspects, and what it shows could be rather vanilla. However, more adventurous implementations (high-performance scalar, split files, shared access to files, dynamic detection of branch divergence, SIMD+issue unit sharing, weighing ALU versus memory-limited, turbo) can ramp complexity, the baseline power consumption, and the cost of mistakes.

AMD may also be looking into other changes, such as how storage is allocated for a wavefront versus the worst-case upfront allocation currently done. Whether that meshes with what's here is unknown.

The alleged diagram probably means Vega has one 8-wide, two 4-wide and one 2-wide pipeline, which surprisingly pretty much follows the patent that was linked above. This gives a variable wavefront size from 8 wide to 32 wide, assuming the same 4-cycle cadence. But it also means each "NCU" would get only 18 lanes. So one might expect multiple of these NCUs to form a larger block that shares at least the LDS.
The patent casts a decently wide net, with every parameter being physically or dynamically variable: number of scalar, high-performance scalar, and vector units, their actual widths versus partial gating, etc.
That rather simplistic diagram leaves off most of the interesting elements, unless they aren't there. I kind of hope it's not a marketing slide, and is just someone trying to explain part of the idea. The level of polish makes me hope it's not a marketing slide if only because it's a little too MS Paint, and that particular summary is one of the least interesting or differentiated of the implementations.

I don't follow the portion about not wasting SIMD space in the variable scenario. The visual language still seems to indicate 4 independent SIMDs, but unless SIMD lanes are migratory or AMD has discovered an 8-4-2-4 pattern to wavefront coverage, I don't see how it saves SIMD space. The 18-lane thing doesn't quite fit unless quads stopped being a thing.

Also, isn't Next Generation Compute Unit shortened to NGCU?
 
I kind of hope it's not a marketing slide, and is just someone trying to explain part of the idea.
Looks more like some kind of review guide or white paper if it is real. Slides are usually full of fanciness, aren't they?

I don't follow the portion about not wasting SIMD space in the variable scenario. The visual language still seems to indicate 4 independent SIMDs, but unless SIMD lanes are migratory or AMD has discovered an 8-4-2-4 pattern to wavefront coverage, I don't see how it saves SIMD space.
Some kind of migration or forwarding seems to be the case if it is real, as implied by "not wasting space" with variable-width SIMD. If it were just clock gating, it could simply say power saving in 16-lane SIMDs. This might also explain the smaller number of hardware lanes in an NCU (complexity in data path and instruction scheduling).

The 18-lane thing doesn't quite fit unless quads stopped being a thing.
Are quads still a concrete concept in the CU domain though? They are essentially four consecutive work-items.

Also, isn't Next Generation Compute Unit shortened to NGCU?
There is no obligation in forming abbreviations with all the first letters though. Next-generation Compute Unit stands for NCU as well.
 
Looks more like some kind of review guide or white paper if it is real. Slides are usually full of fanciness, aren't they?
Maybe a review guide, although then I still hope not. That might be down to a personal bias against automotive analogies, however.
AMD's whitepapers have been classier than that.

Are quads still a concrete concept in the CU domain though? They are essentially four consecutive work-items.
There are some elements to the design that show optimizations for data swizzling between quads, and it's a reasonable expectation in a graphics context that a lot of work will be coming in a granularity of 4. A physically two-wide SIMD drawn in a similar position as a formerly independent 16-wide is creating a scenario where there's over-subscription when a quad needs to fit, or under-subscription if well-packed graphics wavefronts have to ignore it.

There is no obligation in forming abbreviations with all the first letters though. Next-generation Compute Unit stands for NCU as well.
The marketing may have been served well if that hyphen were added. That's more of a nitpick where I think it adds an iffy impression, like the MS-Paint level of the graphic in general.
 
There are some elements to the design that show optimizations for data swizzling between quads, and it's a reasonable expectation in a graphics context that a lot of work will be coming in a granularity of 4. A physically two-wide SIMD drawn in a similar position as a formerly independent 16-wide is creating a scenario where there's over-subscription when a quad needs to fit, or under-subscription if well-packed graphics wavefronts have to ignore it.
But the alleged diagram doesn't imply the instruction pipelining though. If the four-cycle lockstep execution is here to stay, that means at minimum the SIMD would be running an 8-wide wavefront, which fits two quads.
 
The semi truck illustration is just something someone made on Reddit anyway. The patent shows an 8 + 4 + 2 + 1 + 1 = 16 example configuration.

It's FXAA, and the FOV is very small at 55º (which is why they're showing it in 3rd-person, I guess), so I think it'll be very hard to find a comparable test on the web.

I thought the same thing at first too, but apparently 55 is the default.
 
But the alleged diagram doesn't imply the instruction pipelining though. If the four-cycle lockstep execution is here to stay, that means at minimum the SIMD would be running an 8-wide wavefront, which fits two quads.
That would split a quad across clocks, which may not have been necessary before with operations that do work on a quad granularity like interpolation or the quad-swizzle DPP ops. Then there are some elements of the GPU's graphics hardware that work on quad granularity as well. They could be buffered, but that would seemingly add complexity just to be different.
I would be curious as to whether the other wider SIMDs do the same thing, or whether the CU runs a different execution loop just for one SIMD.
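
To illustrate the quad-splitting point, a small Python sketch. It assumes work-items are assigned to lanes in order, cycle by cycle, over a 4-cycle cadence; that mapping and the function names are assumptions for illustration, not anything the diagram or patent states.

```python
def lane_schedule(width, cadence=4):
    """Map work-item index -> (cycle, lane), filling lanes in order each cycle."""
    return {item: divmod(item, width) for item in range(width * cadence)}


def cycles_per_quad(width, cadence=4):
    """For each quad (4 consecutive work-items), the set of cycles it occupies."""
    quads = {}
    for item, (cycle, _lane) in lane_schedule(width, cadence).items():
        quads.setdefault(item // 4, set()).add(cycle)
    return quads


print(cycles_per_quad(2))  # {0: {0, 1}, 1: {2, 3}}
print(cycles_per_quad(4))  # {0: {0}, 1: {1}, 2: {2}, 3: {3}}
```

Under that assumed mapping, a 2-wide SIMD spreads every quad over two clocks, while a 4-wide (or wider) SIMD keeps each quad within a single clock.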

The semi truck illustration is just something someone made on Reddit anyway.
That's good in my opinion, because I hope it's inaccurate enough to keep Vega interesting--just not too inaccurate.

The patent shows an 8 + 4 + 2 + 1 + 1 = 16 example configuration.
Although the diagram doesn't give two of those ALUs a vector register file to draw from.
 
Someone has taken a screenshot of the settings:

http://videocardz.com/65343/amd-demos-star-wars-battlefront-on-ryzen-and-vega-at-ces2017

It's FXAA, and the FOV is very small at 55º (which is why they're showing it in 3rd-person, I guess), so I think it'll be very hard to find a comparable test on the web.

Do you people actually game? That is the standard vertical FOV for all DICE games since at least BF3, maybe even BF:BC2.
You also have to consider the map when looking at other benchmarks; Endor is one of the more taxing GPU maps.
 
I second that!
Vertical FOV 55 and horizontal FOV 85 is standard for SW Battlefront, and the Endor map is the most demanding SW map (all that vegetation costs some fps).
More than 60 fps will be very good, better than a GTX 1080 anyway...
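
Those two FOV numbers are consistent with each other at 16:9, by the way. A quick Python sketch using the usual perspective-projection relation hfov = 2 * atan(tan(vfov / 2) * aspect); the function name is only for illustration, and the 16:9 and triple-wide 48:9 aspect ratios are assumptions about the displays involved.

```python
import math


def horizontal_fov(vertical_fov_deg, aspect_ratio):
    """Horizontal FOV in degrees for a given vertical FOV and width/height ratio."""
    half_v = math.radians(vertical_fov_deg) / 2
    return math.degrees(2 * math.atan(math.tan(half_v) * aspect_ratio))


single = horizontal_fov(55, 16 / 9)  # one 16:9 screen, lands between 85 and 86 degrees
triple = horizontal_fov(55, 48 / 9)  # three 16:9 screens side by side (assumed setup)
print(round(single), round(triple))
```

With a fixed vertical FOV the horizontal FOV simply widens with the aspect ratio, which would be one reason to expose the vertical value in the settings.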
 
Why would a FOV slider be for vertical FOV? That makes no sense.
 
Complexity is one drawback, in terms of steps in the execution loop, hardware dedicated to scheduling, register file allocation/access choices, and potentially peak throughput if following the patent diagram exactly (only 14 lanes in vector units, only one SIMD's worth in a CU).
Wavefronts can dynamically change width based on branch divergence, and the patent admits there is cost and uncertainty in deciding whether or not to change SIMD allocation to react to it.
I would actually prefer variable wavefront sizes realized by a variable amount of looping with a narrower SIMD (like vec4). Okay, it stays a bit more granular (if one keeps latency = throughput = 4 cycles, one would get wavefront sizes of at least 16), but one could keep a lot of the other stuff intact. For the smaller wavefronts one needs relatively more scalar ALUs in the CU (optimally still one per 4 vALUs). But that should be a relatively small investment.
One needs to increase the scheduling capacity per SP though, as each small vALU needs its own instructions. But it could work out in terms of power consumption, as larger wavefronts should still dominate and one could gate the scheduling logic for 75% of the time for the old-fashioned 64-element wavefronts. In the case of smaller 16- or 32-element wavefronts, the increased throughput (potentially a factor of 4) justifies the increased consumption of the scheduler.
Being able to execute wavefronts of any size on any vALU (just over a variable number of cycles) may avoid most of the problem of wavefront migration between different vALUs and register files. And it reduces the complexity of scheduling the workload to a set of different vALUs. That would appear to be the more elegant solution to me.
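
A rough Python sketch of the looping idea, just to put numbers on it. It assumes a hypothetical vec4 vALU, a minimum of 4 loops (latency = throughput = 4 cycles), and a scheduler that only has to issue once per instruction; all names and the simple duty-cycle model are assumptions for illustration.

```python
import math

VALU_WIDTH = 4  # hypothetical narrow vector ALU (vec4)
MIN_LOOPS = 4   # latency = throughput = 4 cycles, as assumed in the post


def issue_cycles(wavefront_size, width=VALU_WIDTH):
    """Cycles one instruction occupies the vALU when looping over the wavefront."""
    return max(MIN_LOOPS, math.ceil(wavefront_size / width))


def scheduler_idle_fraction(wavefront_size, width=VALU_WIDTH):
    """Share of time the vALU's scheduler could be gated, relative to issuing
    a new instruction every MIN_LOOPS cycles (a simplified model)."""
    return 1 - MIN_LOOPS / issue_cycles(wavefront_size, width)


for size in (16, 32, 64):
    print(size, issue_cycles(size), scheduler_idle_fraction(size))
# 16 4 0.0
# 32 8 0.5
# 64 16 0.75
```

The smallest wavefront that fills the pipeline is 4 lanes x 4 loops = 16 work-items, and a 64-element wavefront leaves the scheduler idle for 12 of every 16 cycles, which is the 75% figure above.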
 
Do you people actually game? That is the standard vertical FOV for all DICE games since at least BF3, maybe even BF:BC2.
You also have to consider the map when looking at other benchmarks; Endor is one of the more taxing GPU maps.
I played the beta demo extensively but now I just got it for the PS4 (15€ IIRC) to get access to the X-Wing VR demo.
Didn't get the game at release because I thought it had ridiculously low value. I still do, even at 15€ with the VR demo, but I just had to try flying an X-Wing in VR.

Regardless, it never crossed my mind that the FoV option shown in those settings was for vertical FoV. I've never seen a game with that setting before.
What do they call horizontal FoV?
 
Why would a FOV slider be for vertical FOV? That makes no sense.
I don't have SW:BF installed, but here are BF4 and BF1:

[Image: bf4 FOV.png]

[Image: bf1 FOV.png]



edit:
Regardless, it never crossed my mind that the FoV option shown in those settings was for vertical FoV. I've never seen a game with that setting before.
What do they call horizontal FoV?

They don't. There is only one option to choose, called FOV, and you have to hover over it for the actual description. I think it's based on things like Eyefinity, because I can game in 1- or 3-screen modes and not have to touch anything settings-wise. If you set horizontal FOV instead, the game would be a mess each time I switched.
 
OK, so it's keeping the aspect ratio and scaling the FOV for both vert+ and hor+ when you change the setting? First time I've seen FOV scale on both axes, weird.

I think the point was though that 55 was the default setting and so irrelevant for the performance comparison as benchmarks would also be 55.
 
The truck graphic, if real, just seems to be a band-aid for a quite inefficient design to me. This would reduce neither peak nor idle power consumption, only the typical gaming power consumption, and even that would only be achieved by trading available processing power for lower power consumption. It makes sense for a design which is rarely able to use all SIMDs.
 