AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

http://www.overclock.net/t/1561804/vmod-amd-radeon-r9-fury-x-4gb-hbm-4096-bit-review if the above linked site is too hammered by traffic. Thanks for the heads-up! Googling the link brought up the hit.
Edit: Some numbers seem weird. In COD Advanced Warfare a roughly 10% overclock makes the GTX 980 around 40% faster. I guess unlocking power can make a difference, but not that much. Tired, maybe I'm reading things wrong. ;)
 
- A 30% OC on a 980Ti gives you nearly 60% more performance.
- 980Ti OC is almost as fast as 980Ti SLI.
- Titan X OC is faster than 980Ti SLI.

The Fury gets slower when overclocked in GTA 5 ... by about 6fps.
An overclocked Ti gains about 25fps in The Witcher over its non-overclocked counterpart.


I'm going to go with it's full of shit.
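
For what it's worth, the sanity check behind that verdict can be sketched in a few lines. This is only a rough illustration: it assumes performance should scale at most linearly with the overclock, `scaling_ratio` is a made-up helper name, and "nearly 60%" is rounded to 60.

```python
# Hypothetical helper: how many times larger the claimed performance gain
# is than the overclock that supposedly produced it.
def scaling_ratio(clock_gain_pct, perf_gain_pct):
    """Return performance gain divided by clock gain (both in percent)."""
    return perf_gain_pct / clock_gain_pct

# Figures quoted in the posts above, as (overclock %, performance gain %).
claims = {
    "GTX 980, COD Advanced Warfare": (10, 40),
    "980Ti": (30, 60),  # "nearly 60%" rounded to 60
}

for name, (clock_gain, perf_gain) in claims.items():
    ratio = scaling_ratio(clock_gain, perf_gain)
    # Assumption: a ratio above 1.0 means better-than-linear scaling,
    # which an overclock alone should not deliver.
    verdict = "suspicious" if ratio > 1.0 else "plausible"
    print(f"{name}: gain is {ratio:.1f}x the overclock -> {verdict}")
```

Both quoted results come out well above 1.0, which is why the numbers look off.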
 
Wouldn't you expect just the opposite? At higher definitions, there should be larger regions made up of relatively similar pixels. By "larger" I mean "containing more pixels".
My theory is that at higher resolutions there is too little tile re-use. Literally, pixels from more tiles are more likely to be in flight at any given time. So the compressor/decompressor becomes the bottleneck (more likely the former).
 
I especially like how framerates are accurately depicted to the second digit behind the comma... This clearly is the deciding factor when making a purchase decision.
 
Please keep reviews in the review specific thread and keep this as speculation.
 
The most striking piece of information provided by reviews is that performance didn't increase over Hawaii by nearly as much as theoretical figures suggested. You basically get 20~30%, depending on the display definition, where you might have expected about 45% based on raw ALU throughput or memory bandwidth. The picture is a little blurred by the fact that some 290Xs are poorly cooled (reference cooler in quiet mode) while others use custom coolers but are also overclocked, etc., but basically, Fiji doesn't scale very well.
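
A rough way to put a number on "doesn't scale very well" is sketched below. The 20~30% and 45% figures are the ones from the post; the CU counts (64 for Fiji, 44 for Hawaii) are an assumption added here to show where a figure of about 45% could come from, and `scaling_efficiency` is just an illustrative name.

```python
# Hypothetical helper: fraction of the theoretical gain that showed up.
def scaling_efficiency(observed_gain_pct, expected_gain_pct):
    return observed_gain_pct / expected_gain_pct

# Assumption: 64 CUs on Fiji vs 44 on Hawaii, compared clock for clock.
FIJI_CUS, HAWAII_CUS = 64, 44
cu_gain_pct = (FIJI_CUS / HAWAII_CUS - 1) * 100
print(f"Raw CU-count gain: {cu_gain_pct:.1f}%")

EXPECTED_GAIN_PCT = 45.0            # expected gain quoted in the post
for observed in (20.0, 30.0):       # observed range quoted in the post
    eff = scaling_efficiency(observed, EXPECTED_GAIN_PCT)
    print(f"{observed:.0f}% observed vs {EXPECTED_GAIN_PCT:.0f}% expected "
          f"-> {eff:.0%} of the theoretical gain")
```

Under those assumptions, somewhere between roughly half and two thirds of the theoretical gain is realized.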

Now, it's true that it has exactly the same fillrate as Hawaii clock for clock, so that could potentially be a severe limitation, but I'm not sure it actually is. The front-end, after all, is also unchanged.
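
The "same fillrate clock for clock" point can be illustrated with a minimal sketch. It assumes the usual peak figure of one pixel per ROP per clock, 64 ROPs on both chips, and clocks of 1050 MHz (Fury X) and 1000 MHz (290X); none of these numbers are stated in the post, and `peak_fill_gpix` is a hypothetical helper.

```python
# Hypothetical helper: theoretical peak pixel fillrate in Gpixels/s,
# assuming one pixel per ROP per clock.
def peak_fill_gpix(rops, clock_mhz):
    return rops * clock_mhz / 1000.0

# Assumed configurations (not taken from the post).
fiji = peak_fill_gpix(rops=64, clock_mhz=1050)
hawaii = peak_fill_gpix(rops=64, clock_mhz=1000)

print(f"Fiji:   {fiji:.1f} Gpix/s")
print(f"Hawaii: {hawaii:.1f} Gpix/s")
print(f"Fiji advantage: {(fiji / hawaii - 1) * 100:.0f}% (clock speed only)")
```

With identical ROP counts, any difference in peak fillrate comes from clock speed alone.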
 
The only thing that supports the argument that fill rate is more important than we think it is, is the fact that Nvidia doubled the amount of ROPs and thought it was worth spending the area. One way or the other, they must have been on to something, but I can't explain it.
 
Maybe it's not strictly fill rate, but that they need to get data in and out of L2 and have to go through ROPs?
 
I wonder if it would have been a better decision for AMD to re-balance Fiji's architecture: fit two more setup pipes and bump the ROP count to 96, at the expense of a few fewer multiprocessors.
Looking at Tonga's die, a single setup pipe takes roughly the same area as a CU. So four fewer CUs (64 -> 60) would balance quite well against two more setup pipes and eight additional ROP clusters. That would definitely utilize the HBM throughput more rationally and would be more "visible" for the purpose of high-resolution gaming/benchmarking.
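
The trade being proposed can be written out as simple arithmetic. This is a sketch under the post's own estimate that a setup pipe costs about one CU of area; the starting points of 64 ROPs and 4 setup pipes are assumptions, and the resulting area per ROP cluster is only what the proposal implies, not a measured figure.

```python
def pct_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100

# Proposal from the post: 64 -> 60 CUs and a ROP count of 96.
# Assumption: the shipping chip has 64 ROPs and 4 setup pipes.
cus_before, cus_after = 64, 60
rops_before, rops_after = 64, 96
pipes_before, pipes_after = 4, 6

print(f"CUs:         {pct_change(cus_before, cus_after):+.2f}%")
print(f"ROPs:        {pct_change(rops_before, rops_after):+.2f}%")
print(f"Setup pipes: {pct_change(pipes_before, pipes_after):+.2f}%")

# Area budget in units of "one CU", using the post's estimate that one
# setup pipe takes roughly the area of one CU.
freed_area = cus_before - cus_after                          # 4 CUs freed
setup_cost = (pipes_after - pipes_before) * 1.0              # 2 CUs of area
extra_clusters = 8                                           # from the post
area_per_cluster = (freed_area - setup_cost) / extra_clusters
print(f"Implied area per ROP cluster: {area_per_cluster:.2f} CU")
```

So the proposal gives up about 6% of the ALUs for 50% more ROPs and setup pipes, and only works out if a ROP cluster is about a quarter of a CU in area.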
 
Which block do you consider to be a setup pipe on that die shot?

I'm curious whether anyone has an idea of what operations could be making The Tech Report's fillrate benchmarks behave the way they do for Fiji.
It looks like there's a ceiling, and from all appearances that memory bus is eager to give way more bandwidth than those ROPs can use.
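
A back-of-the-envelope comparison of what the ROPs can consume versus what the bus can supply might look like the sketch below. Everything here is an assumption for illustration: 64 ROPs at 1.05 GHz, 4 bytes per pixel (32-bit color), a 4096-bit bus at 1 Gbps per pin, and no color compression or caching effects.

```python
# Hypothetical helpers, names invented for this sketch.
def bus_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8.0

def rop_demand_gbs(rops, clock_ghz, bytes_per_pixel, reads_per_write=0):
    """Peak ROP traffic in GB/s; reads_per_write=1 models blending."""
    return rops * clock_ghz * bytes_per_pixel * (1 + reads_per_write)

# Assumed Fiji-like configuration (not taken from the post).
hbm = bus_bandwidth_gbs(bus_width_bits=4096, gbps_per_pin=1.0)
write_only = rop_demand_gbs(rops=64, clock_ghz=1.05, bytes_per_pixel=4)
blended = rop_demand_gbs(rops=64, clock_ghz=1.05, bytes_per_pixel=4,
                         reads_per_write=1)

print(f"Bus supply:          {hbm:.1f} GB/s")
print(f"32-bit write only:   {write_only:.1f} GB/s ({write_only / hbm:.0%} of bus)")
print(f"32-bit blended (r+w): {blended:.1f} GB/s ({blended / hbm:.0%} of bus)")
```

Under these assumptions, plain 32-bit color writes can use only about half of the bus, which is consistent with a ROP ceiling; blending or wider formats change the picture, so which case applies depends on what the benchmark actually does.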
 
Where exactly do you think it would end up if there were no ceiling? We need to bring back that Vantage bench instead of the 'fancy beyond3d suite'. Maybe Anandtech will do it in their synthetics bench.

OK, they have updated their GPU benches, but there is still no review out. Fury does better at Vantage pixel fill than a 980Ti.

http://www.anandtech.com/bench/product/1496?vs=1513
 
[Attached image: sawhPeg.jpg]


Green outline -- setup block;
Cyan outline -- a single CU;
 
I was considering the possibility that the block of what appears to be SRAM in the upper third of the green block was an L2 section.
 
Don't forget, Nvidia also has slow ROPs for certain operations (like FP16 or, worse, FP32 blend), which isn't the case for AMD. So doubling the number of ROPs essentially fixed that problem without making the ROPs themselves more complex. If the ROPs are reasonably cheap, making sure you are unlikely to ever be limited by them (rather than by bandwidth) might make sense. Nvidia also has the rasterizer throughput to make good use of them (they didn't in previous chips): 4x16 pixels on GM204, and the same can be said for all other GM2xx chips, so 6x16 on GM200 with 96 ROPs. AMD probably would not: 4x32 pixels might not be all that helpful, and I don't get the impression that scaling up to 8x16 with the current GCN architecture would be particularly effective or feasible.
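
The rasterizer-versus-ROP balance described above can be tabulated with a small sketch. The GM204 and GM200 rasterizer figures and the 96 ROPs on GM200 come from the post; the 64 ROPs on GM204, the 4x16 layout for current GCN, and the 128-ROP target for the two hypothetical GCN layouts are assumptions added for illustration.

```python
# Hypothetical helper: pixels the rasterizers can emit per clock.
def raster_rate(rasterizers, pixels_each):
    return rasterizers * pixels_each

# name: (rasterizers, pixels per rasterizer per clock, ROPs)
configs = {
    "GM204": (4, 16, 64),                   # 64 ROPs is an assumption
    "GM200": (6, 16, 96),
    "GCN today (assumed 4x16)": (4, 16, 64),
    "GCN 4x32 (hypothetical)": (4, 32, 128),
    "GCN 8x16 (hypothetical)": (8, 16, 128),
}

for name, (units, pixels, rops) in configs.items():
    rate = raster_rate(units, pixels)
    # A ratio of 1.0 means the rasterizers can just keep every ROP fed.
    print(f"{name}: {units}x{pixels} = {rate} px/clk, "
          f"{rops} ROPs, ratio {rate / rops:.2f}")
```

The point of the table is that adding ROPs only pays off if rasterizer throughput grows with them, which is the step the post doubts GCN could take cheaply.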
 