AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
Surprised compared to the size of the GPU, I mean. So obviously the transistor budget went somewhere else. Or density decreased.
Well, there's the new video codec, which is a whole new beast altogether. Real-time 4K 60 FPS H.265 shouldn't be cheap if they're still using somewhat programmable DSPs.

But of course, where I'm hoping they're spending a lot of their transistor budget is where they have most often fallen short, which is geometry performance.
 
The veracity of the various leaked benchmarks is unclear, but it doesn't look like the scaling factor for CU throughput has changed so far.
The 36 CUs are clocked high enough to almost totally negate the shortfall relative to Hawaii, and the Hawaii/Fury comparison breaks down because the CU count in this case is in the range of the Tahiti/Tonga vs Hawaii transition.
 
Surprised compared to the size of the GPU, I mean. So obviously the transistor budget went somewhere else. Or density decreased.
Or the CU count isn't indicative of the total processors. AMD traditionally hasn't counted the scalar units they added, and in theory those are far more robust than in the past. Going the flexible SIMD route would probably reduce the total a bit as well.
 
Do we know the exact clock speed the GPU held while running the mentioned benchmarks?

AMD's own slides imply at least 1.1 GHz to reach >5 TFLOPS.
A purported leaked internal slide gives at least 1.2 GHz in the "up to" scenario.
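A quick sanity check of the clock/TFLOPS relationship, as a sketch assuming the usual GCN convention that each of the RX 480's 2304 ALUs retires one fused multiply-add (counted as 2 FLOPs) per clock. The 1.266 GHz point is the clock quoted elsewhere in this thread; the helper names are just for illustration.

```python
# Peak FP32 throughput for a GCN-style GPU, assuming each ALU retires one
# fused multiply-add per clock, counted as 2 FLOPs.
RX480_ALUS = 2304  # 36 CUs x 64 ALUs per CU


def peak_tflops(alus: int, clock_ghz: float) -> float:
    """Peak single-precision TFLOPS = ALUs * 2 FLOPs * clock (GHz) / 1000."""
    return alus * 2 * clock_ghz / 1000.0


def min_clock_ghz(alus: int, target_tflops: float) -> float:
    """Lowest clock (GHz) at which peak throughput reaches target_tflops."""
    return target_tflops * 1000.0 / (alus * 2)


for clock in (1.1, 1.2, 1.266):
    print(f"{clock:.3f} GHz -> {peak_tflops(RX480_ALUS, clock):.2f} TFLOPS")
print(f"5 TFLOPS needs at least {min_clock_ghz(RX480_ALUS, 5.0):.3f} GHz")
```

Under that assumption, 1.1 GHz is the first round 100 MHz step that clears 5 TFLOPS, which is consistent with the slide.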

This is being rated against a 28nm architecture that only has "up to" clocks, so neither is strongly guaranteed.
If the AMD slides without an "up to" qualifier indicate an actual base clock, then it is probably hovering in the 1.1-1.2 GHz range, which would put it in the ballpark of the 390X's CU throughput.
That it seems to somewhat underperform the 390X in many of the admittedly unverified tests doesn't give much evidence of a supposed change in the effectiveness of AMD's non-CU resources, with the probable exception of bandwidth efficiency. On the bandwidth point, I wonder why the deficit is not being more fully overcome, as Tonga eventually managed with an older version of compression and a small upclock.

If non-CU hardware has been improved in amount or capability, or even kept constant with Hawaii, it has ~100-150 MHz more clock and a performance shortfall to show for it.
 
If non-CU hardware has been improved in amount or capability, or even kept constant with Hawaii, it has ~100-150 MHz more clock and a performance shortfall to show for it.
Radeon R9 390X was "up to 1050 MHz", but in fact all available boards were non-reference, equipped with huge coolers, so many of them were able to hold the clock at 1050 MHz. Radeon RX 480 is targeted as a power-efficient product, something like the Nano. The Nano didn't hold its 1000 MHz clock in most situations. It's possible that the Radeon RX 480, just like the Nano, won't always run at its top clock speed. In fact, it's possible that actual clocks won't be much higher than the R9 390X's. Even at 1266 MHz, the R9 390X has about 1% more arithmetic performance than the RX 480; at 1100 MHz, it has about 17% more. I think it's reasonable to expect that CU efficiency could be 1-17% higher.
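The 1-17% range can be reproduced by comparing ALU count times clock. This is a sketch assuming the R9 390X has 2816 ALUs (44 CUs x 64), a figure not stated in the post, and treating ALUs x MHz as a proxy for arithmetic throughput.

```python
# Compare raw arithmetic throughput (ALUs x MHz) of R9 390X vs RX 480.
# Assumption: R9 390X = 2816 ALUs (44 CUs x 64) at its "up to" 1050 MHz.
R9_390X_ALUS, R9_390X_MHZ = 2816, 1050
RX_480_ALUS = 2304  # 36 CUs x 64


def alu_mhz(alus: int, mhz: int) -> int:
    """Throughput proxy: number of ALUs times clock in MHz."""
    return alus * mhz


def advantage_390x(rx480_mhz: int) -> float:
    """Fractional extra throughput the 390X has over the RX 480 at rx480_mhz.

    This is also the per-CU efficiency gain the RX 480 would need to match it.
    """
    return alu_mhz(R9_390X_ALUS, R9_390X_MHZ) / alu_mhz(RX_480_ALUS, rx480_mhz) - 1.0


for mhz in (1266, 1100):
    print(f"RX 480 at {mhz} MHz: 390X ahead by {advantage_390x(mhz):.1%}")
```

Expressed the other way round, at 1100 MHz the RX 480 is about 14% below the 390X, which is the same gap as the 390X being about 17% ahead.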
 
If someone wants to do a die shot analysis of the chip, here it is, from the AMD Brazil page:
13417455_1091626950911542_4088077310669149989_n.jpg
 
Radeon R9 390X was "up to 1050 MHz", but in fact all available boards were non-reference, equipped with huge coolers, so many of them were able to hold the clock at 1050 MHz. Radeon RX 480 is targeted as a power-efficient product, something like the Nano. The Nano didn't hold its 1000 MHz clock in most situations.
If AMD's slide with no "up to" modifier is giving a baseline, the RX 480 defaults to being faster than Hawaii.


It's possible that the Radeon RX 480, just like the Nano, won't always run at its top clock speed. In fact, it's possible that actual clocks won't be much higher than the R9 390X's. Even at 1266 MHz, the R9 390X has about 1% more arithmetic performance than the RX 480; at 1100 MHz, it has about 17% more. I think it's reasonable to expect that CU efficiency could be 1-17% higher.
I was addressing the contention that Polaris represents a shift towards greater reliance on the non-CU resources to drive performance scaling, in order to avoid Fury's limited scaling over Hawaii despite its greater CU resources and bandwidth.
Tonga already showed that introducing compression can let a small upclock overcome a proportionately larger bandwidth deficit than the one Polaris is failing to overcome.

The benchmarks showing the RX 480 falling in the 390-390X range point to a correlation with CU throughput that doesn't seem to have changed. The ones showing it falling short of the 390X hint that the upper end of the CU-efficiency range is less likely. Being a few percent off is not a notable change in the situation.

I also have doubts that a small chip on FinFET should need to throttle as extensively as a ~600 mm² chip on 28nm, particularly when Fiji has two product tiers that go well beyond the Nano's power range, while Polaris does not.
I think it would be a worrisome sign of AMD's prospects at the start of a new node if it's already out of headroom.
 
Almost got me excited there, then I checked and it's RV770...
Does AMD only have one die shot they can use for marketing, or what?

I don't know if it's 100% of their representative GPU shots, but it is in a lot of them.
I think it is perhaps the most effective AMD GPU to use for the purpose, given the small set of GPUs that have gotten die shots and the fact that it isn't as cluttered in images scaled to very different sizes. More modern GPUs have so much in them that I don't know if they'd look like anything.

I feel that in terms of discrete AMD GPU shots, it aesthetically works the best and strikes a good balance of not looking muddy at a distance while still having detail.
There are some pretty APU shots out there with good detail. Some of the cat core APUs had much more visual love put into them than RV770 did, but those wouldn't work in a discrete slide and have a lot more non-GPU hardware in the picture.
 
Even at 1266 MHz, the R9 390X has about 1% more arithmetic performance than the RX 480; at 1100 MHz, it has about 17% more. I think it's reasonable to expect that CU efficiency could be 1-17% higher.
Shouldn't the higher clock benefit the front end and other parts of the chip too?
 
So, I'm going to estimate that RX 480 is ~50% faster than R9 380X.

If we compare clocks: 1266 versus 970, that's 31%. Bearing in mind that a frame is not 100% bottlenecked by ROPs, it seems likely to me that RX 480 has 32 ROPs, i.e. 31% more fillrate.

64 ROPs would produce 161% more fillrate. A thoroughly ridiculous fillrate for only ~50% more performance.

Also, since I'm talking theoreticals: 2304 ALUs at 1266 MHz versus 2048 ALUs at 970 MHz is 47% more.
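The three ratios above (fillrate with 32 or 64 ROPs, and ALU throughput, all versus the R9 380X) come from the same units-times-clock arithmetic. A sketch, assuming the R9 380X has 32 ROPs, which the post implies but does not state:

```python
# Theoretical RX 480 vs R9 380X ratios: units x clock, relative to the 380X.
# Assumption: R9 380X = 32 ROPs and 2048 ALUs at 970 MHz.
RX480_MHZ, R380X_MHZ = 1266, 970


def gain(a_units: int, a_mhz: int, b_units: int, b_mhz: int) -> float:
    """Fractional increase of (a_units * a_mhz) over (b_units * b_mhz)."""
    return (a_units * a_mhz) / (b_units * b_mhz) - 1.0


fill_32 = gain(32, RX480_MHZ, 32, R380X_MHZ)    # RX 480 with 32 ROPs
fill_64 = gain(64, RX480_MHZ, 32, R380X_MHZ)    # RX 480 with 64 ROPs
alu = gain(2304, RX480_MHZ, 2048, R380X_MHZ)    # shader throughput

print(f"fillrate, 32 ROPs: +{fill_32:.0%}")
print(f"fillrate, 64 ROPs: +{fill_64:.0%}")
print(f"ALU throughput:    +{alu:.0%}")
```

With 32 ROPs the fillrate gain tracks the clock ratio alone, while 64 ROPs would put it far above the ~50% overall performance estimate, which is the point of the argument.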
 