AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Digidi · Jul 11, 2017

CarstenS said:
Not that I know of. I have nothing omitted, except the mentioned redundant texture fillrate tests.

Thank you for the info.

Anarchist4000 · Jul 11, 2017

CarstenS said:
Alas, that's a rather ancient OpenGL Test. Will see if I can run it tomorrow in the office. But IIRC the results have been... strange for a couple of other cards a few years back, so I stopped using it on a regular basis. I still don't see, however, how I can correlate certain filtering modes to ALUs. Except the results between filtering modes differ wildly from the one in Fiji/Polaris - which the ones tested with the modern B3D suite do not indicate.

Yeah, been a while since I've seen filtering performance broken down, but supposedly new architecture so could be interesting.

My thinking was there were INT8, INT16, and FP blending tests that may align to the 2x/4x ratios. So if INT8 was twice the rate of HDR it may be using the ALUs as opposed to TMUs with some "ideal" configuration for current workloads. I'm unsure if the TMU capabilities of Polaris matched what we assumed were deep learning capabilities with Vega. Dropping TMUs in favor of beefier ALUs makes sense going forward. That feature could be highly compiler dependent.

Jawed · Jul 11, 2017

silent_guy said:
But that doesn't hold for the texture units. AFAIK, there are no tiler consequences there.

Fragments not rasterised save ALU, TEX and ROP.

Deleted member 13524 · Jul 11, 2017

Anarchist4000 said:
What if Vega doesn't have TMUs? With the 2xFP16(INT16?) and 4xINT8 they could be filtering with the shader cores. Then lower bandwidth and/or register pressure slowing things down. With everything seemingly programmable that makes sense. Could apply to ROPs as well. Still leaves the question of what's taking up all the space.

Wouldn't that immediately warrant an increase in ALU/NCU amount compared to Fiji, given the performance target?

Jawed · Jul 12, 2017

Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...

silent_guy · Jul 12, 2017

Jawed said:
Fragments not rasterized save ALU, TEX and ROP.

How would tiling help with a low level micro-benchmark that only tests texture bandwidth?

While Scott hinted at tiling in his initial Maxwell review, very little in the B3D results hinted at its presence. It's unlikely that the same micro-benchmark will have a major impact on Vega.

So we now have a case with a severe performance drop in Vega for games and a severe texture BW performance drop as well.

It's not 100% conclusive evidence that the two are related, of course, but it definitely smells.

BacBeyond · Jul 12, 2017

So interesting, looking at all of the fillrate #s from PCGH: http://www.pcgameshardware.de/Vega-...elease-AMD-Radeon-Frontier-Edition-1232684/3/

In most of the tests @ 1050 clocks the Vega FE is exactly 25% slower than Fury X @ 1050. It is then 52% faster than itself going 1050 -> 1600 which is a 52% OC. That ends up resulting in only a 21% gain of Vega FE @ 1600 vs Fury X @ 1050. I wonder what caused the huge 25% regression in Vega FE over Fury X clock for clock?
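The arithmetic in this post can be checked in a few lines. This is only a sketch built from the percentages quoted above; the function name is made up for illustration, and it assumes throughput scales linearly with core clock. The figures only line up with the quoted gain of roughly 21% if "25% slower" is read as Fury X being 25% faster clock for clock (Vega FE at 4/5 of Fury X's rate). A literal 25% deficit would give a gain of about 14% instead.

```python
def relative_throughput(clock_for_clock_ratio, base_clock_mhz, oc_clock_mhz):
    """Throughput of the overclocked card relative to the reference card (1.0 = parity).

    Assumes throughput scales linearly with core clock.
    """
    return clock_for_clock_ratio * (oc_clock_mhz / base_clock_mhz)


# Clock increase from 1050 to 1600 MHz (the "52% OC" in the post).
clock_increase = 1600 / 1050 - 1

# Reading 1: Fury X is 25% faster clock for clock, so Vega FE runs at 1 / 1.25 = 0.8.
fury_25_faster = relative_throughput(1 / 1.25, 1050, 1600)

# Reading 2: Vega FE is literally 25% slower, so it runs at 0.75.
vega_25_slower = relative_throughput(0.75, 1050, 1600)

print(f"clock increase:          {clock_increase:.1%}")
print(f"gain if Vega is at 0.80: {fury_25_faster - 1:.1%}")
print(f"gain if Vega is at 0.75: {vega_25_slower - 1:.1%}")
```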

3dilettante · Jul 12, 2017

Jawed said:
Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...

The thought crossed my mind given how insensitive the bandwidth is to core clock.
I have not quite figured out a clear set of overheads that can give the bandwidth numbers, however.

Vega's compressed versus random x1 and x8 disparity may point to a discrepancy in the resources available for a compressor pipeline on the same side of the fabric as the memory controller.
It's possible the x8 test is thrashing whatever storage there is on that side, leading to the drop in bandwidth despite being compressible.

That might be a disabled feature or some other difference unique to Vega. Otherwise, there may be a difference in Fiji having more stacks, and more controllers+compression hardware to absorb possible penalties related to having to load metadata.
Half the channels at twice the speed may not be able to absorb some kind of banking conflict as readily.

There may be some other factor in play, given the ratio, like how many cache partitions there are and if IF injects some other overhead.

The filtering throughput is acting like Vega's running at 4/5 for texturing rate or experiencing a bubble every fifth cycle. If it's filtering, a pure vector load might not have the same issue. Possibly something else was exposed for instruction issue, like some of the scheduling pipeline for the memory path?
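The "bubble every fifth cycle" reading can be illustrated with a toy model. This is a hypothetical sketch, not a description of how Vega's texture path actually works: it just counts useful cycles when every fifth cycle is a stall.

```python
def sustained_fraction(total_cycles, bubble_period):
    """Fraction of cycles doing useful work when every `bubble_period`-th cycle stalls."""
    busy = sum(1 for cycle in range(1, total_cycles + 1) if cycle % bubble_period != 0)
    return busy / total_cycles


# One stall in every five cycles leaves four useful cycles out of five.
fraction = sustained_fraction(total_cycles=1000, bubble_period=5)
print(f"sustained rate: {fraction:.0%} of peak")
```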

xEx · Jul 12, 2017

Jawed said:
Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...

I'd like to think thousands of engineers at AMD are not that stupid.

Anarchist4000 · Jul 12, 2017

ToTTenTranz said:
Wouldn't that immediately warrant an increase in ALU/NCU amount compared to Fiji, given the performance target?

I would think so, hence all the unexplained space. The alternative might be the ability to issue multiple vector instructions simultaneously or the FLOPs are a bit misleading. I had a theory on dual FMAs per lane using SMT that would probably require L0 cache to make work. Doubles or possibly quadruples flops under ideal conditions.

Jawed said:
Worrying thought: you know how Ryzen performance has some dependence on the memory clock, because infinity fabric clock and memory clock are linearly related? Well, what if infinity fabric in Vega is related to HBM clock and ...

I had the same thought a while back, but we're seeing benchmarks at 1050 and 1100 MHz with 8-Hi stacks. So unless HBM2 is far faster than anyone knew, they should have been able to design around it. Ryzen memory clocks were low initially; Vega FE looks to be ahead of the curve.

3dilettante · Jul 12, 2017

xEx said:
I'd like to think thousands of engineers at AMD are not that stupid.

If they were merely that stupid, Vega would have better bandwidth numbers for 3 of the 4 memory test cases.

Deleted member 13524 · Jul 12, 2017

3dilettante said:
If they were merely that stupid, Vega would have better bandwidth numbers for 3 of the 4 memory test cases.

What is this supposed to mean?

3dilettante · Jul 12, 2017

ToTTenTranz said:
What is this supposed to mean?

If it were merely a matter of the data fabric being tied to the speed of the memory controller, the bandwidth should be higher.
There are half as many stacks, while the DDR clock is slightly less than 2x the rate. The bandwidth utilization is worse than that discrepancy, and Vega experiences bandwidth loss in the more intensive tests that Fiji does not.

AMD fellow Maurice Steinman was reported as stating that Infinity Fabric is capable of delivering the full bandwidth of the attached DRAM, and the results are not proportional to that.
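To put a number on the stack and clock discrepancy, here is a rough sketch of the theoretical peaks. The memory clocks are assumptions taken from the publicly listed specs rather than from this thread (500 MHz HBM1 on Fury X, 945 MHz HBM2 on Vega FE, 1024 bits per stack), and the function name is made up for illustration.

```python
def peak_bandwidth_gbs(stacks, memory_clock_mhz, bus_bits_per_stack=1024):
    """Theoretical peak DRAM bandwidth in GB/s for a double-data-rate interface."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2   # DDR: two transfers per clock
    bytes_per_transfer = stacks * bus_bits_per_stack / 8
    return transfers_per_second * bytes_per_transfer / 1e9


fiji_peak = peak_bandwidth_gbs(stacks=4, memory_clock_mhz=500)  # assumed Fury X spec
vega_peak = peak_bandwidth_gbs(stacks=2, memory_clock_mhz=945)  # assumed Vega FE spec

print(f"Fiji peak: {fiji_peak:.1f} GB/s")
print(f"Vega peak: {vega_peak:.1f} GB/s")
print(f"Vega / Fiji: {vega_peak / fiji_peak:.3f}")
```

Under these assumptions the theoretical gap is only around 5%, so a measured shortfall much larger than that is not explained by stack count and memory clock alone.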

Geeforcer · Jul 12, 2017

Well that's new.

Radeon RX Vega is on its way—but before that, we’re bringing it to you. We’re making a few stops to show it off in action: we’re packing up a few Radeon RX Vega GPUs and embarking on a mini community tour, and we’re hoping you join us!

At our stops, we’re setting up a Radeon RX Vega Experience area where you’ll be able to game on the upcoming graphics card and take in the experience, tradeshow-style.

eastmen · Jul 12, 2017

Any chance some of the ROPs and TMUs are just disabled? Perhaps that's where the problem is coming from. Like I said, these could be older chips with disabled parts, while the Vega gaming edition will be fully enabled chips.

Anyway, just 2 weeks or so till we find out.

Clukos · Jul 12, 2017

https://twitter.com/x/status/884788921154105344

https://twitter.com/x/status/884816425369694208

silent_guy · Jul 12, 2017

Geeforcer said:
Well that's new.

Would those be pre-launch-post-pre-launches?

Geeforcer · Jul 12, 2017

silent_guy said:
Would those be pre-launch-post-pre-launches?

Considering how unusual this whole rollout has been, a literal traveling carnival seems par for the course. I feel we are about a week away from Chris Hook tweeting out that he will shotgun a 40oz and fight a coyote for every 1,000 preorders.

seahawk · Jul 12, 2017

Do we have to guess the driver and BIOS standard again for the Radeon mini World Tour?

MDolenc · Jul 12, 2017

ToTTenTranz said:
1 - Texel fillrate (per TMU per clock) is pretty terrible compared to Polaris and even Fiji (maybe connected to new ROPs as L2 cache clients?)
2 - Effective bandwidth is actually lower than Fiji

Do note that (while I haven't seen what the B3D test actually does):
- Effective texture bandwidth is likely just the (number of pixels written * pixel size + number of texels read * texel size) / time (P.S.: Well, it's more complicated than that, as the texture is a render target so you can actually get the black compressed, but that's the general idea).
- Texel fillrate would use a tiny texture, say 2x2 or 4x4 that would fit entirely into cache and thus be pretty much bandwidth independent (except for pixel output).
Meaning effective bandwidth is probably just fine (for example a compute kernel could confirm that) and that there is something weird going on with the TMUs. Numbers are too low for 256 of them and too high for 128 of them. Hmm...
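The two quantities described above can be sketched as follows. This is a guess at the general shape of the calculation, not what the B3D suite actually does; the function names and every input value are hypothetical and chosen only to exercise the formulas.

```python
def effective_texture_bandwidth(pixels_written, bytes_per_pixel,
                                texels_read, bytes_per_texel, seconds):
    """Bytes per second moved: (pixels written * size + texels read * size) / time."""
    return (pixels_written * bytes_per_pixel + texels_read * bytes_per_texel) / seconds


def implied_texels_per_clock(texel_rate_per_second, core_clock_hz):
    """Texels per clock that a measured fillrate corresponds to at a given core clock."""
    return texel_rate_per_second / core_clock_hz


# Hypothetical inputs, not measured data: one 1080p pass, 4-byte pixels and texels, 1 ms.
bw = effective_texture_bandwidth(
    pixels_written=1920 * 1080, bytes_per_pixel=4,
    texels_read=1920 * 1080, bytes_per_texel=4,
    seconds=0.001,
)

# Hypothetical fillrate of 320 GTexel/s at a 1.6 GHz core clock.
tpc = implied_texels_per_clock(texel_rate_per_second=320e9, core_clock_hz=1.6e9)

print(f"effective bandwidth: {bw / 1e9:.2f} GB/s")
print(f"implied texels per clock: {tpc:.0f}")
```

With the made-up fillrate above, the implied rate lands between 128 and 256 texels per clock, which is the kind of in-between result the post describes.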
