AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Alas, that's a rather ancient OpenGL test. Will see if I can run it tomorrow in the office. But IIRC the results were... strange for a couple of other cards a few years back, so I stopped using it on a regular basis. I still don't see, however, how I can correlate certain filtering modes to ALUs, unless the results between filtering modes differ wildly from those on Fiji/Polaris, which the ones tested with the modern B3D suite do not indicate.
Yeah, been a while since I've seen filtering performance broken down, but supposedly new architecture so could be interesting.

My thinking was there were INT8, INT16, and FP blending tests that may align to the 2x/4x ratios. So if INT8 was twice the rate of HDR it may be using the ALUs as opposed to TMUs with some "ideal" configuration for current workloads. I'm unsure if the TMU capabilities of Polaris matched what we assumed were deep learning capabilities with Vega. Dropping TMUs in favor of beefier ALUs makes sense going forward. That feature could be highly compiler dependent.
 
What if Vega doesn't have TMUs? With the 2xFP16(INT16?) and 4xINT8 they could be filtering with the shader cores. Then lower bandwidth and/or register pressure slowing things down. With everything seemingly programmable that makes sense. Could apply to ROPs as well. Still leaves the question of what's taking up all the space.

Wouldn't that immediately warrant an increase in ALU/NCU amount compared to Fiji, given the performance target?
 
Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...
 
Fragments not rasterized save ALU, TEX and ROP.
How would tiling help with a low level micro-benchmark that only tests texture bandwidth?

While Scott hinted at tiling in his initial Maxwell review, very little in the B3D results hinted at its presence. It's unlikely that the same micro-benchmark will have a major impact on Vega.

So we now have a case with a severe performance drop in Vega for games and a severe texture BW performance drop as well.

It's not 100% conclusive evidence that the two are related, of course, but it definitely smells.
 
So interesting, looking at all of the fillrate #s from PCGH: http://www.pcgameshardware.de/Vega-...elease-AMD-Radeon-Frontier-Edition-1232684/3/

In most of the tests @ 1050 clocks the Vega FE is exactly 25% slower than Fury X @ 1050. It is then 52% faster than itself going 1050 -> 1600 which is a 52% OC. That ends up resulting in only a 21% gain of Vega FE @ 1600 vs Fury X @ 1050. I wonder what caused the huge 25% regression in Vega FE over Fury X clock for clock?
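A quick sanity check of that arithmetic, as a rough sketch only. It takes the percentages quoted above at face value and assumes throughput scales linearly with core clock (the variable names are mine, not from PCGH). A 52% overclock that ends up 21% ahead implies Vega FE sits at roughly 0.80x Fury X clock for clock, i.e. about 20% slower, which is the same thing as Fury X being about 25% faster. A literal 25% deficit would only leave about a 14% lead at 1600.

```python
# Percentages as quoted in the post above; nothing here is measured independently.
clock_scaling = 1.52      # Vega FE 1050 -> 1600, quoted as a 52% OC
lead_at_1600 = 1.21       # Vega FE @ 1600 relative to Fury X @ 1050

# Assumption: fillrate scales linearly with core clock.
clock_for_clock = lead_at_1600 / clock_scaling   # Vega FE / Fury X at equal clocks
fury_advantage = 1 / clock_for_clock - 1         # Fury X lead over Vega FE, clock for clock

# What the lead would be if Vega FE really were 25% slower (0.75x) at equal clocks.
lead_if_25_slower = 0.75 * clock_scaling - 1

print(f"Vega FE / Fury X at equal clocks: {clock_for_clock:.3f}")
print(f"Fury X advantage at equal clocks: {fury_advantage:.1%}")
print(f"Lead at 1600 if Vega were 25% slower: {lead_if_25_slower:.1%}")
```

So the "25%" reads more naturally as Fury X being 25% faster than Vega FE at the same clock, not Vega FE being 25% slower.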
 

Attachment: b3d_fillrates.png
Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...

The thought crossed my mind given how insensitive the bandwidth is to core clock.
I have not quite figured out a clear set of overheads that can give the bandwidth numbers, however.

Vega's compressed versus random x1 and x8 disparity may point to a discrepancy in the resources available for a compressor pipeline on the same side of the fabric as the memory controller.
It's possible the x8 test is thrashing whatever storage there is on that side, leading to the drop in bandwidth despite being compressible.

That might be a disabled feature or some other difference unique to Vega. Otherwise, there may be a difference in Fiji having more stacks, and more controllers+compression hardware to absorb possible penalties related to having to load metadata.
Half the channels at twice the speed may not be able to absorb some kind of banking conflict as readily.

There may be some other factor in play, given the ratio, like how many cache partitions there are and if IF injects some other overhead.


The filtering throughput is acting like Vega's running at 4/5 for texturing rate or experiencing a bubble every fifth cycle. If it's filtering, a pure vector load might not have the same issue. Possibly something else was exposed for instruction issue, like some of the scheduling pipeline for the memory path?
 
Wouldn't that immediately warrant an increase in ALU/NCU amount compared to Fiji, given the performance target?
I would think so, hence all the unexplained space. The alternative might be the ability to issue multiple vector instructions simultaneously or the FLOPs are a bit misleading. I had a theory on dual FMAs per lane using SMT that would probably require L0 cache to make work. Doubles or possibly quadruples flops under ideal conditions.

Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...
I had the same thought a while back, but we're seeing benchmarks at 1050 and 1100MHz with 8-Hi stacks. So unless HBM2 is far faster than anyone knew, they should have been able to design around it. Ryzen memory clocks were low initially, Vega FE looks to be ahead of the curve.
 
What is this supposed to mean?
If it were merely a matter of the data fabric being tied to the speed of the memory controller, the bandwidth should be higher.
There are half as many stacks, while the DDR clock is slightly less than 2x the rate. The bandwidth utilization is worse than that discrepancy, and Vega experiences bandwidth loss in the more intensive tests that Fiji does not.
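For reference, a minimal sketch of the theoretical peak numbers behind that comparison. The stack counts and memory clocks are assumptions taken from the public specs (Fiji: 4 HBM1 stacks at 500 MHz; Vega FE: 2 HBM2 stacks at 945 MHz, 1024 bits per stack in both cases), and the helper name is made up for illustration.

```python
def peak_bandwidth_gbs(stacks, bus_bits_per_stack, memory_clock_mhz):
    """Theoretical peak bandwidth in GB/s for a DDR interface (two transfers per clock)."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2
    bytes_per_transfer = stacks * bus_bits_per_stack / 8
    return bytes_per_transfer * transfers_per_second / 1e9

# Assumed configurations from public spec sheets, not from the benchmark itself.
fiji = peak_bandwidth_gbs(stacks=4, bus_bits_per_stack=1024, memory_clock_mhz=500)
vega_fe = peak_bandwidth_gbs(stacks=2, bus_bits_per_stack=1024, memory_clock_mhz=945)

print(f"Fiji peak:    {fiji:.1f} GB/s")
print(f"Vega FE peak: {vega_fe:.1f} GB/s")
print(f"Vega FE / Fiji: {vega_fe / fiji:.3f}")
```

On paper Vega FE gives up only about 5-6% of Fiji's peak, so a measured gap much larger than that has to come from somewhere other than the raw interface.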

AMD fellow Maurice Steinman was reported as stating that Infinity Fabric is capable of delivering the full bandwidth of the attached DRAM, and the results are not proportional to that.
 
Well that's new.

Radeon RX Vega is on its way—but before that, we’re bringing it to you. We’re making a few stops to show it off in action: we’re packing up a few Radeon RX Vega GPUs and embarking on a mini community tour, and we’re hoping you join us!

At our stops, we’re setting up a Radeon RX Vega Experience area where you’ll be able to game on the upcoming graphics card and take in the experience, tradeshow-style.
 
Any chance some of the ROPs and TMUs are just disabled? Perhaps that's where the problem is coming from. Like I said, these could be older chips with disabled parts, while the Vega gaming edition will be fully enabled chips.

Anyway just 2 weeks or so till we find out
 
1 - Texel fillrate (per TMU per clock) is pretty terrible compared to Polaris and even Fiji (maybe connected to new ROPs as L2 cache clients?)
2 - Effective bandwidth is actually lower than Fiji
Do note that (while I haven't seen what the B3D test actually does):
- Effective texture bandwidth is likely just the (number of pixels written * pixel size + number of texels read * texel size) / time (P.S.: Well, it's more complicated than that, as the texture is a render target so you can actually get the black compressed, but that's the general idea).
- Texel fillrate would use a tiny texture, say 2x2 or 4x4 that would fit entirely into cache and thus be pretty much bandwidth independent (except for pixel output).
Meaning effective bandwidth is probably just fine (for example a compute kernel could confirm that) and that there is something weird going on with the TMUs. Numbers are too low for 256 of them and too high for 128 of them. Hmm...
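To make the two metrics above concrete, here is a rough sketch. It is my guess at the general shape of the calculation, not what the B3D test actually does, and every input value below is a hypothetical placeholder, not a measured result.

```python
def effective_bandwidth_gbs(pixels_written, bytes_per_pixel,
                            texels_read, bytes_per_texel, seconds):
    """(pixels written * pixel size + texels read * texel size) / time, in GB/s."""
    total_bytes = pixels_written * bytes_per_pixel + texels_read * bytes_per_texel
    return total_bytes / seconds / 1e9

def texels_per_tmu_per_clock(texel_rate_per_s, tmu_count, core_clock_hz):
    """Texel fillrate normalised per TMU per clock; 1.0 means one texel per TMU per cycle."""
    return texel_rate_per_s / (tmu_count * core_clock_hz)

# Hypothetical example: one 1920x1080 pass, one 4-byte texel read and one
# 4-byte pixel written per pixel, finishing in 0.1 ms.
pixels = 1920 * 1080
bw = effective_bandwidth_gbs(pixels, 4, pixels, 4, 1e-4)
print(f"effective bandwidth: {bw:.1f} GB/s")

# Hypothetical texel rate of 300 GTexels/s at 1600 MHz, checked against both TMU counts.
rate = 300e9
for tmus in (256, 128):
    print(tmus, "TMUs:", round(texels_per_tmu_per_clock(rate, tmus, 1.6e9), 3))
```

With a placeholder rate like that, 256 TMUs would each be running well under one texel per clock while 128 TMUs would need more than one per clock, which is the "too low for 256, too high for 128" situation described above.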
 