AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

What? So a 50-60% performance boost while running only a 25% increase in transistors and maintaining the same TDP is somehow not "real" next gen? Wow, I guess next year AMD is just going to go all Conan the Barbarian on everyone...
Does it really matter whether or not Fiji is called next gen?

Fiji will likely have the GCN of Tonga, which was an incremental improvement over Hawaii, which was itself an incremental improvement over Tahiti. There are no earth-shattering jumps in efficiency between Tahiti, Hawaii, and Tonga, though the latter is definitely an improvement.

The headline feature of Fiji is going to be HBM. You're saying it will have a 25% increase in transistors compared to Hawaii. I think that's unlikely. I expect something more like 35% to 40%, in line with the increase in shader cores (45%).
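
As a back-of-the-envelope check (a rough sketch: Hawaii's ~6.2 billion transistors and 2816 shaders are the shipping spec, while the 4096-shader figure for Fiji is just the rumor this thread is chewing on):

```cpp
// Back-of-the-envelope sketch. Hawaii's ~6.2B transistors and 2816 shaders are
// the shipping spec; the 4096-shader figure for Fiji is the rumor, not fact.
#include <cstdio>

int main() {
    const double hawaii_transistors = 6.2e9;
    const double hawaii_shaders     = 2816.0;
    const double fiji_shaders_rumor = 4096.0;

    double shader_increase = fiji_shaders_rumor / hawaii_shaders - 1.0;   // ~0.45
    std::printf("Rumored shader increase: %.0f%%\n", shader_increase * 100.0);

    // The two transistor-budget scenarios from the discussion above:
    std::printf("+25%% transistors: %.2f billion\n", hawaii_transistors * 1.25 / 1e9);
    std::printf("+40%% transistors: %.2f billion\n", hawaii_transistors * 1.40 / 1e9);
    return 0;
}
```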

Or, due to the increased complexity of synthesis with TSMC/Samsung 14nm, they'll do much the same as Nvidia and re-synth their old cards with only ancillary upgrades.
I have no idea what you're suggesting here. I also don't know what you think will be so special about synthesizing for 14nm. IMO, the synthesis step will look a lot like typing 'make synthesis' on a command line.

And I suppose they decided to just rebadge absolutely everything for the "3xx" series again. A popular notion online, despite last year's Tonga card showing a 17% improvement in performance per transistor and a 25% improvement in performance per watt.
I see the efficiency increase of Tonga as orthogonal to whether or not they'll rebadge some other chips. While nice to have, the improvements of Tonga are not game changers the way they are when going from Kepler to Maxwell. There is no point in redoing Hawaii and some smaller chips for this kind of improvement. (One can't help but wonder why Tonga exists in the first place.) Add to this the fact that 28nm is really long in the tooth. The lifetime of new 28nm chips introduced in a month or two would be relatively short. Does that warrant the expense and effort to do it? I don't think so.

But I'm sure those improvements and any made since can wait till next year for a struggling company trying to stay afloat. Either that or the multitude of leaks are right and the 390X will be around equivalent to, or even a bit above, a Titan X at 4K, ...
You seem to be of the opinion that 390X performing above the Titan X will be some kind of miracle? Of course, it will be faster. Anything else would be an epic fail. The question is how much faster it will be. If the difference is only marginal, then the question for AMD to answer will be why it was worth going through the trouble of using HBM.
 
Tonga (well, something like it) should have been released at the same time as the 290X; I can imagine that was the plan at the time, or at least a chip similar to it (maybe not with the same feature level, e.g. without the color compression). I can also imagine the reason Tonga was finally released is simply its mobile version (the OEM 285M): having done that, they decided to release the desktop version too.

You seem to be of the opinion that 390X performing above the Titan X will be some kind of miracle? Of course, it will be faster. Anything else would be an epic fail. The question is how much faster it will be. If the difference is only marginal, then the question for AMD to answer will be why it was worth going through the trouble of using HBM.

When you have spent six years developing stacked memory in collaboration with Hynix, then once it is ready for the market, you use it. It was the same for GDDR4 with the X1950 XTX, and the same for GDDR5 with the 4870.
 
"World's first discrete GPU with full DirectX 12 implementation" implies that Fiji has some architectural features that Tonga must lack./
In Marketese, this probably means feature level 12_1, which differs from feature level 12_0 (supported by all GCN 1.1 parts and the Xbox One) only by Conservative Rasterization and Rasterizer Ordered Views - i.e. improvements to the fixed-function rasterizer, not to the shader compute units, which are the essence of GCN. Even if Fiji supports these extensions, they probably do not require any changes to the basic GCN architecture of the shading processors.
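
For what it's worth, all of this is queryable from the runtime once D3D12 drivers are out; a minimal sketch (Windows-only, using whatever adapter the runtime picks) that reads the caps in question, including the Resource Binding tier that comes up again below:

```cpp
// Minimal sketch: query the D3D12 caps discussed above. Windows-only; assumes
// a D3D12 runtime and uses whatever adapter D3D12CreateDevice picks.
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main() {
    Microsoft::WRL::ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device)))) {
        std::puts("No D3D12-capable adapter found.");
        return 1;
    }

    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    // The two feature-level 12_1 additions (fixed-function rasterizer caps)...
    std::printf("Conservative rasterization tier: %d\n",
                static_cast<int>(opts.ConservativeRasterizationTier));
    std::printf("Rasterizer ordered views:        %s\n",
                opts.ROVsSupported ? "yes" : "no");
    // ...and the resource binding tier that the "full DX12" slide seems to refer to.
    std::printf("Resource binding tier:           %d\n",
                static_cast<int>(opts.ResourceBindingTier));
    return 0;
}
```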

I would be really surprised if Fiji had the same GCN architecture as Tonga, given all the changes we know of so far.
I understand you want Fiji to have some new exciting features, but GCN 1.1/1.2 is still perfectly capable of running Direct3D 12 workloads.

In addition, this would imply that AMD has not made any other modifications or improvements to the SMs and ACEs. And damn, looking at how many existing GCN features are surfacing in DX12 (thanks to Mantle), I don't see why AMD would not have worked to improve its scaling on that front to keep its lead there (async shaders, command buffers, etc.).
Direct3D 12 will support SM 5.1 across all supporting hardware. As for ACEs, each AMD chip has a different number of ACE scheduling blocks - this is not really linked to the version of the GCN architecture in the shader processing units.
 
"World's first discrete GPU with full DirectX 12 implementation"
https://forum.beyond3d.com/posts/1831601/

BTW, further deciphering AMD's Marketese, the slide you cite implies that "full DirectX 12 implementation" means Resource Binding Tier 3, which is already supported by all current GCN chips, and not really feature level 12_1 with ROV and CR.

 
With Fiji, just adding HBM and its memory controller should drive a lot of change on its own, in the caches and the architecture. I don't see how it can be on the same feature revision as previous GCN. That should have driven some deep modifications, possibly requiring changes in other parts as well.
HBM is DRAM that looks like it has the same general behavior of the memory that preceded it. The interface is different, but the GPU's internals do not see how the bus twitches on the other side of the controller. HBM seems to imply more of what the GPU already has in abundance. There could still be changes, but I don't see a clear reason why a larger bandwidth number that is still within the same order of magnitude cannot be handled decently with the current overall architecture.
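
For a sense of scale, a back-of-the-envelope comparison (the 290X figures are the shipping spec; the 4-stack HBM1 configuration is the rumored Fiji setup, not a confirmed one):

```cpp
// Back-of-the-envelope bandwidth comparison. The 290X line is the shipping
// spec; the 4-stack HBM1 line is the rumored Fiji configuration.
#include <cstdio>

int main() {
    // bandwidth (GB/s) = bus width in bits / 8 * per-pin data rate in Gbps
    double gddr5_290x = 512.0 / 8.0 * 5.0;        // 512-bit @ 5 Gbps      = 320 GB/s
    double hbm1_rumor = 4.0 * 1024.0 / 8.0 * 1.0; // 4 x 1024-bit @ 1 Gbps = 512 GB/s

    std::printf("290X GDDR5:           %.0f GB/s\n", gddr5_290x);
    std::printf("4-stack HBM1 (rumor): %.0f GB/s (%.1fx)\n",
                hbm1_rumor, hbm1_rumor / gddr5_290x);
    return 0;
}
```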

I have no idea what you're suggesting here. I also don't know what you think will be so special about synthesizing for 14nm. IMO, the synthesis step will look a lot like typing 'make synthesis' on a command line.
I am curious if the new physical properties brought on with FinFETs, the long node gap, Nvidia's success with larger dies, and the change in product mix would make it prudent to re-evaluate the balance point for AMD's architectures.
It seems like GCN was painted into a corner with a somewhat smaller preferred die size, compute focus, obsession with leveraging the IP, and transplanting it across so many disparate processes alongside CPUs.
Not as much area could be used up to save power; the designs that hit the peak graphics performance bands share the same chip as compute; and Amdahl's law still seems to apply to some of the fixed-function portions and geometry handling that Nvidia has done better or parallelized more completely, which AMD has to compensate for with higher clocks that the big transistor investment in parallel CUs would not prefer.

At least going forward, it looks like the CPU side has been beaten back quite a bit in how much it is permitted to bother the GPU, given the homogenized process targets.

I've seen AMD's physical design expertise pointed out in the past, but it doesn't strike me as being in the lead at 28nm. Part of that might be that AMD has spent so much time with so many physical design targets with dwindling resources, but it seems like Nvidia has gotten parts with measurable perf/W advantages rather consistently of late without pushing uncomfortably close to the limits of the silicon and cooling.

AMD's power management and clocking does appear to be superior for now, and it seems like these measures are best utilized to push a less focused architecture and inconsistent implementations to stay within range of but consistently behind the competition.

You seem to be of the opinion that 390X performing above the Titan X will be some kind of miracle? Of course, it will be faster. Anything else would be an epic fail.
I'm generally confident it should be faster than the Titan X as we know it, probably. It's not like that level of confidence has been let down before, though.
 
I've seen AMD's physical design expertise pointed out in the past, but it doesn't strike me as being in the lead at 28nm. Part of that might be that AMD has spent so much time with so many physical design targets with dwindling resources, but it seems like Nvidia has gotten parts with measurable perf/W advantages rather consistently of late without pushing uncomfortably close to the limits of the silicon and cooling.
I think perf/W part has much more to do with architectural and RTL design than with physical design. Which, unfortunately for AMD, is what requires the most resources.

Another potentially perverse problem for AMD is that GCN is very good at compute. There may be aspects of the current GCN architecture that make it inherently harder to rearchitect for perf/W. Nvidia had a major regression in terms of compute when going from Fermi to Kepler, but was able to compensate for it in other areas that IMO were worth the trade-off. I'm not sure AMD has the luxury of doing the same. This is, of course, pure speculation...

AMD's power management and clocking does appear to be superior for now, and it seems like these measures are best utilized to push a less focused architecture and inconsistent implementations to stay within range of but consistently behind the competition.
I don't know how it appears superior, but that's probably because I haven't paid a lot of attention. ;-)
 
Another potentially perverse problem for AMD is that GCN is very good at compute. There may be aspects of the current GCN architecture that make it inherently harder to rearchitect for perf/W. Nvidia had a major regression in terms of compute when going from Fermi to Kepler

Hard to say if GCN is good for compute, because not many people use it for compute. Almost everyone using GPUs for compute is using Nvidia GPUs due to ecosystem issues. This means there's a serious lack of deployed applications to judge GCN's compute performance.

Typical tech site compute benchmarks (like at Anandtech) are not a very good sampling of GPU compute in the wild.
 
Another potentially perverse problem for AMD is that GCN is very good at compute. There may be aspects of the current GCN architecture that make it inherently harder to rearchitect for perf/W.
For graphics, this could be the case. Reduced specialization and the lower iteration rate for discrete seem to be contributing.
However, a hypothetical architectural inability to target perf/W these days is very close to an indictment of the architecture.

I don't know how it appears superior, but that's probably because I haven't paid a lot of attention. ;-)
AMD's DVFS has very short response times, and I view its ability to hold TDP and temp as close to the red line as the 290 can with that unimpressive cooler as a sign of the implementation's ability to head off or control thermal and electrical transients with much narrower guard banding. Nvidia has had some notable failures with this, with fatal driver updates, more constrained electrical specs for some of its custom boards, and the wider gap in temp and wattage from their respective ceilings. I do not know about the latest Maxwell implementation, although it has been asserted to have improved, but prior examples had driver-level DVFS and transitions were measured in terms of rendered frames rather than microseconds.

For GCN, it's more clock cycles thanks to better handling of the physical system. It happens that those are less effective clock cycles, but I do not think that was up to the DVFS designers.
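
Purely to illustrate the granularity gap being described (this is not either vendor's actual algorithm; the controller and every number in it are placeholders):

```cpp
// Illustrative only: the difference between a DVFS loop that re-evaluates every
// few microseconds (hardware/firmware-managed) and one that only gets a say once
// per rendered frame (driver-managed). Not real vendor code; all numbers are
// placeholders.
#include <cstdio>

// Hypothetical proportional controller: back off when over the power limit,
// creep back up otherwise.
double step_clock(double clock_mhz, double watts, double tdp_w) {
    return (watts > tdp_w) ? clock_mhz - 13.0 : clock_mhz + 13.0;
}

int main() {
    const double tdp_w = 290.0;
    double clock = 1000.0;
    double watts = 320.0;               // placeholder power spike above TDP
    int steps = 0;

    // Fine-grained loop: ~1,670 evaluation points inside one 16.7 ms frame
    // (10 us period), so the spike is trimmed long before the frame ends and
    // guard bands can sit close to the ceiling.
    for (int i = 0; i < 1670 && watts > tdp_w; ++i, ++steps) {
        clock = step_clock(clock, watts, tdp_w);
        watts -= 2.0;                   // pretend lower clocks shed power
    }
    std::printf("fine-grained: spike handled in %d of 1670 slots (clock now %.0f MHz)\n",
                steps, clock);

    // Per-frame loop: the same spike gets exactly one correction per frame, so
    // wider voltage/clock margins have to absorb the transient instead.
    std::printf("per-frame:    1 correction per 16.7 ms frame\n");
    return 0;
}
```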
 
Hard to say if GCN is good for compute, because not many people use it for compute. Almost everyone using GPUs for compute are using Nvidia GPUs due to ecosystem issues. This means there's a serious lack of deployed applications to judge GCN's compute performance.
Let's put it this way then: whichever benchmarks and applications out there allow us to compare show GCN to be very capable.
 
Let's put it this way then: whichever benchmarks and applications out there allow us to compare show GCN to be very capable.

Well, I can't speak for scientific compute, but I know many, many people who use AMD GPUs + OpenCL for accelerated rendering, physics computation, and image/video processing. There are tons of exporters and internal rendering engines using OpenCL for Blender, 3ds Max, Maya (well, Autodesk software), at every price point, from free (LuxRender / LuxCore) to costly ones (Chaos V-Ray, etc.), not to mention all the plugins and add-ons, and the video and image software.

Of course, for an easy setup in Blender you will use Cycles (CUDA), but there are better render engines out there than it (more physically accurate, to my mind); that has nothing to do with CUDA vs OpenCL, it is just that the rendering engines work differently.
 
I'm not so sure about that. Take Hawaii or even Tahiti (GCN 1.0): comparing the 780 Ti vs the 290X, increase the resolution and bandwidth demand and suddenly the 290X takes the lead, and lately, with some new games, we see the same scenario with the 980 (where the old 290X strangely closes on the 980 at high resolution).
I'm not really basing this on these higher-end parts, where the bandwidth limitations are more difficult to see, but rather on the parts which are very bandwidth limited. Take Cape Verde vs GK107 for instance: the GTX 650 can just about keep up with the HD 7750. But take the DDR3 versions of these cards (GT 640 vs. HD 7750 DDR3) and GK107 completely blows away Cape Verde.
And of course, with Maxwell it's even more pronounced; the performance of a GM108 with 64-bit DDR3 is quite amazing considering the bandwidth. AMD doesn't have a comparable part with framebuffer compression there, but needless to say it's quite understandable that no one wants to touch the 64-bit DDR3 Mars/Oland chips, despite AMD having two dozen names for such parts. They all perform exactly the same except in some synthetic benchmarks - or rather, the (low) performance you get is determined solely by the memory frequency and not the part number. That isn't true for the 128-bit versions, but for some reason those are even more rare.
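
To put rough numbers on how starved a 64-bit DDR3 part is: bandwidth is just bus width times per-pin data rate, and the configurations below are generic examples rather than the exact spec of any card named above:

```cpp
// Rough GB/s for a few generic low-end memory configurations:
// bandwidth = bus width in bits / 8 * per-pin data rate in Gbps.
// These are typical example configs, not the exact spec of any card above.
#include <cstdio>

double gb_per_s(double bus_bits, double data_rate_gbps) {
    return bus_bits / 8.0 * data_rate_gbps;
}

int main() {
    std::printf("64-bit DDR3   @ 1.8 Gbps: %5.1f GB/s\n", gb_per_s(64, 1.8));    // 14.4
    std::printf("128-bit DDR3  @ 1.8 Gbps: %5.1f GB/s\n", gb_per_s(128, 1.8));   // 28.8
    std::printf("128-bit GDDR5 @ 4.5 Gbps: %5.1f GB/s\n", gb_per_s(128, 4.5));   // 72.0
    return 0;
}
```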
 
I'm not really basing this on these higher-end parts, where the bandwidth limitations are more difficult to see, but rather on the parts which are very bandwidth limited. Take Cape Verde vs GK107 for instance: the GTX 650 can just about keep up with the HD 7750. But take the DDR3 versions of these cards (GT 640 vs. HD 7750 DDR3) and GK107 completely blows away Cape Verde.
And of course, with Maxwell it's even more pronounced; the performance of a GM108 with 64-bit DDR3 is quite amazing considering the bandwidth. AMD doesn't have a comparable part with framebuffer compression there, but needless to say it's quite understandable that no one wants to touch the 64-bit DDR3 Mars/Oland chips, despite AMD having two dozen names for such parts. They all perform exactly the same except in some synthetic benchmarks - or rather, the (low) performance you get is determined solely by the memory frequency and not the part number. That isn't true for the 128-bit versions, but for some reason those are even more rare.

Well, I have found this test on hardware.fr: http://www.hardware.fr/focus/76/amd-radeon-hd-7750-ddr3-test-cape-verde-etouffe.html ... I don't know if it is representative; at first I was just looking up specifications for those GPUs, but I find the bandwidth problem rather similar for both when going to low-bandwidth DDR3. Of course, with Maxwell I expect something else.

(Attached graph from the hardware.fr article; the last graph shows the average fps.)


Anyway, both drop from 50 to 32 fps on average; it's crazy how much the low memory bandwidth impacts those GPUs.
 
Well, I have found this test on hardware.fr: http://www.hardware.fr/focus/76/amd-radeon-hd-7750-ddr3-test-cape-verde-etouffe.html ... I don't know if it is representative; at first I was just looking up specifications for those GPUs, but I find the bandwidth problem rather similar for both when going to low-bandwidth DDR3. Of course, with Maxwell I expect something else.

Anyway, both drop from 50 to 32 fps on average; it's crazy how much the low memory bandwidth impacts those GPUs.
Oh, it seems you are right. I can't remember which review I used to come to this conclusion, but it must have been a different one...
Guess I'll have to revisit that statement then (for Kepler vs. pre-1.3 GCN). (The GT 640 also loses core clock vs. the GTX 650, which the HD 7750 does not; however, the absolute bandwidth of the GT 640 is 10% lower than the HD 7750 DDR3 in this case, which is likely at least as important.)
 
Typical tech site compute benchmarks (like at Anandtech) are not a very good sampling of GPU compute in the wild.
Even I won't debate that. Unfortunately it's very hard to get good cross-vendor compute benchmarks, especially when so much stuff is CUDA-only. But if you guys ever have any suggestions/requests, then please shoot me an email or PM. I'm always on the lookout for something newer/better.
 