AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

It may not even be R&D. Spinning up a new chip that can't justify replacing what's already in its niche means limited revenue against the expense of designing and bringing up a new ASIC, even if there isn't much additional research needed for the implementation. Recouping those costs may never happen if a new process cuts short the already extended life cycle of that part of the market, and even if break-even is reached there may not be much upside versus doing nothing.
 
Or who knows, maybe it's terrible, AMD will finally go bankrupt, Nvidia will have a monopoly on workstation and desktop graphics and suddenly triple all its card prices.

This nonsense claiming they'd greatly increase the cost of their cards is nothing but fear-mongering by the uninformed. I don't want AMD to go belly up, but if that ever happened, nVidia would have to take great care to avoid anti-trust action being brought against them. People seem to think a single-firm market equates to a monopoly that is illegal and must be broken apart, but that's far from reality, especially if the market became single-firm because one company made a better product and/or the competition made poor business decisions.
Anti-trust action can only be brought against companies proven to meet all three of the following:
1. Illegal anti-competitive conduct
2. A specific intent to monopolize (a specific intent to destroy competition or build a monopoly)
3. A dangerous probability of achieving monopoly power (judged by the relevant market and the defendant's ability to lessen or destroy competition in that market)

Since nVidia would already constitute a single-firm market, they'd be under strict government oversight requiring them to abide by fair market practices, most notably... "not raising the price of products by 3x". Such acts would result in nVidia being hit with huge fines and possibly being broken up by an anti-trust court.

For a more thorough explanation, read this: http://www.justice.gov/atr/public/reports/236681_chapter1.htm (sorry about the .gov address)
 
Good, I will have a lot better cash flow next year than this year lol
LOL!

I dunno, I keep telling myself that I'm going to buy a 980Ti when it arrives. But then I also realize that my current 3930k + 7970GHz combo isn't wanting for anything, other than the fact that 1440p resolutions with EVERYTHING on ultra isn't always feasible. Maybe I can convince myself to wait until next year, just because...
 
Just do 10 Hail HBM's and 20 Our 14nm's and it shouldn't be too hard to wait
 
If next year brings the real jumps for both AMD and Nvidia, then to me it's silly to buy a stop-gap GPU. Especially now that we have all the VR activity.
 
The mere prospect of what next year's GPUs can do with 1 TB/s+, compared to the roughly one third of that we have now with Titan X, is enough to make AMD's next chip seem unexciting. Not to mention that it seems likely to be 28nm. And unlikely to conquer games at 8MP with maximum graphics settings.

But, my 7970 is close to 3 years old and some of my code is causing it to make noises that make me think it won't last until the HBM2 equipped GPUs arrive...
 
Are current GCN cards particularly bandwidth bound? Or rather, should we expect the differences to show up only with MSAA modes? :s

Or... even just smoother moments of lol-transparency areas in games.
 
I mentioned Titan X specifically because it makes so much better use of bandwidth than GCN. Is Titan X bandwidth bound? It has essentially the same bandwidth as Hawaii yet is clearly always faster in games.

Is AMD going to tackle whatever it is that makes GCN so slow given the amount of bandwidth available? Is delta compression like that found in Tonga the only bump in usage efficiency that'll occur? Is that all of the difference between GCN and Maxwell in terms of bandwidth usage efficiency?

Maybe we should ignore Hawaii's memory configuration and simply forget about its bandwidth usage efficiency, because the "slow" 512 bit interface is supposedly more efficient than a "fast" 384 bit interface (power/area-wise). Never mind the spare bandwidth that provides.

To be fair, Titan X is only 35% faster than the 290X at 8MP overall in TechReport's testing (though BF4 is 66%), which is a smaller gap than often touted. Titan X's headline bandwidth consumers, a 54% fillrate advantage and a 124% advantage in 16-bit texture filtering, would appear to indicate bandwidth could be a limitation. (Serious question: is 16-bit texture usage a serious factor in game performance?)
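For a rough sense of scale, here's a minimal bandwidth-budget sketch. The per-pixel framing, the 60 fps target and the HBM line are my own assumptions for illustration, not TechReport's numbers:

```python
# Back-of-the-envelope: at 8 MP and 60 fps, how many bytes of DRAM traffic
# can each displayed pixel generate per frame before raw bandwidth is the wall?
# The HBM entry is a hypothetical 4-stack configuration, not a confirmed spec.

def traffic_budget_bytes_per_pixel(bandwidth_gbs, width, height, fps):
    """Available DRAM traffic per displayed pixel per frame, in bytes."""
    return bandwidth_gbs * 1e9 / (width * height * fps)

for name, bw in [("Titan X (384-bit @ 7 Gbps)", 336.5),
                 ("R9 290X (512-bit @ 5 Gbps)", 320.0),
                 ("Hypothetical 4-stack HBM1",  512.0)]:
    budget = traffic_budget_bytes_per_pixel(bw, 3840, 2160, 60)
    print(f"{name}: ~{budget:.0f} B per pixel per frame")
```

Blending, MSAA resolves, fat G-buffers and wide texture formats all spend out of that budget, which is how fillrate and 16-bit filtering advantages turn into bandwidth pressure, and why an HBM part with roughly 1000 B per pixel of headroom looks interesting.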

Maybe we'll see GCN with a better balance + delta compression to really use HBM1. Maybe the 50%+ bandwidth advantage that HBM1 brings over Titan X will be exercised properly. Or maybe we have to wait until a better node than 28nm to see HBM's legs being stretched.
 
Or maybe we have to wait until a better node than 28nm to see HBM's legs being stretched.
I think that's the main reason why Nvidia decided to forego HBM1 ...

Edit:
I'll be curious to see if using HBM1 brings "sledgehammer" performance or if its usage is more of a "marketing" ploy.
 
Hmm, well, if the next card is 50% faster than 290X at 8MP, versus 35% for Titan X, then maybe no-one will care whether it's HBM or not.
 
Kyle at HardOCP posted "End of June..." in the 390x thread.
Followed it up with "400 next year and it will be the REAL next gen."

Pretty sure we were all assuming the 300 is GCN 1.3 and 400 is GCN 2.0 on a new node anyway.

What? So a 50-60% performance boost while running only a 25% increase in transistors and maintaining the same TDP is somehow not "real" next gen? Wow, I guess next year AMD is just going to go all Conan the Barbarian on everyone...
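Run the quoted figures through the obvious ratios (and they are only rumors/assumptions at this point) and it looks very much like a generational jump:

```python
# Quick sanity check on the rumored next-gen-vs-current figures quoted above.
# All three inputs are rumors/assumptions, not confirmed specs.

perf_gain       = 1.55   # ~50-60% faster, take the midpoint
transistor_gain = 1.25   # ~25% more transistors
power_gain      = 1.00   # same TDP

print(f"Performance per transistor: {perf_gain / transistor_gain:.2f}x")  # ~1.24x
print(f"Performance per watt:       {perf_gain / power_gain:.2f}x")       # ~1.55x
```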

Or, due to the increased complexity of synthesis with TSMC/Samsung 14nm, they'll do much the same as Nvidia and re-synth their old cards with only ancillary upgrades. And I suppose they decided to just rebadge absolutely everything for the "3xx" series again. A popular notion online, despite last year's Tonga showing a 17% improvement in performance for its transistor count and a 25% improvement in performance for its TDP.

But I'm sure those improvements and any made since can wait till next year for a struggling company trying to stay afloat. Either that or the multitude of leaks are right and the 390X will be roughly equivalent to, or even a bit above, a Titan X at 4K, and the rest of the line on down will have similar improvements. And while I'm not privy to anything inside AMD, I know which outcome sounds more logical.
 
Then again... hmmm. Thinking about it, with all AMD's troubles their projected "GCN 2.0" may not have made it this year. The 3xx series could be an advancement of Tonga... projecting out to 4k of AMD's "stream processors" from Tonga's 1792 is about a 2.23x increase. Tonga's 3.3 teraflops scales to only about 7.3, and with a 10% clock increase over that you hit the rumored 8 teraflops of a 390X. It would still be a good advancement but not a huge "New architecture!" upgrade...
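Spelling that projection out (every input below is a rumor or a round number, not a confirmed spec; using the rumored 4096 SPs rather than a flat 4000 lands in the same ballpark):

```python
# Scaling projection from Tonga to a rumored 4096-SP part.
# Every input here is a rumor or rough ballpark, not a confirmed spec.

tonga_sps    = 1792
tonga_tflops = 3.3          # R9 285 ballpark

rumored_sps  = 4096         # the "4k" stream processors above
clock_uplift = 1.10         # assume ~10% higher clocks

scale = rumored_sps / tonga_sps
projected_tflops = tonga_tflops * scale * clock_uplift

print(f"Shader array scaling: {scale:.2f}x")                    # ~2.29x
print(f"Projected throughput: {projected_tflops:.1f} TFLOPS")   # ~8.3 TFLOPS
```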

I don't know, maybe that's right somehow. If I were AMD I'd certainly have tried to hit 2.0 this year. But if they decided that wasn't possible early enough, they might have tried to brute-force the synth down to 14nm. That complicates matters compared to synthing a known architecture onto a new node, but if they had no choice... Bleh.
 
Looking it up... yeah, Graphics Core Next is almost 4 years old now. If GCN 2.0 was originally planned for this year on 20nm, then cancelled early this year when the node didn't pan out as GPU friendly, that would leave a mostly complete architecture to be synthed onto 14nm over the next 18 months, while the 3xx series could just be yet another small update to GCN 1.0 a la Hawaii and Tonga. If Tonga and Hawaii are mostly recycled, except presumably the delta framebuffer compression, memory bus and new ISA for Hawaii... then each 256-SP block on 28nm is highly familiar and predictable for AMD at this point.

So... despite my earlier bullheadedness I could see, with 4+ years since GCN came out, the resources for GCN 2.0 coming out at the same time as 14nm FinFET for AMD. I do remember GCN 2.0 supposedly being scheduled for this year, so a year-plus delay to be re-synthed on the new node... at least for scheduling. It'll be interesting to see what date next year AMD hits with it. If all the extra trouble I've heard people have with FinFET pans out, the 4xx series might not hit till the holidays next year.
 
Is AMD going to tackle whatever it is that makes GCN so slow given the amount of bandwidth available? Is delta compression like that found in Tonga the only bump in usage efficiency that'll occur? Is that all of the difference between GCN and Maxwell in terms of bandwidth usage efficiency?
Even Kepler was already somewhat more bandwidth efficient.
I think the delta compression is the biggest reason why Maxwell is more bandwidth efficient, though I've always thought another reason is the fully unified cache nvidia has had since Fermi (the ROP caches are integrated into the L2), but I don't have any proof for that.
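To illustrate why delta compression saves DRAM traffic at all, here's a toy sketch of the general idea; the real Tonga/Maxwell hardware schemes are undisclosed, so every detail below is an illustrative assumption:

```python
# Toy model of delta color compression on an 8x8 tile of one 8-bit channel:
# store one anchor value plus per-pixel deltas, each delta sized to the
# largest delta in the tile. Not the actual hardware algorithm.

def compressed_tile_bits(tile, bits_per_channel=8):
    """Estimated storage for a tile: anchor + fixed-width signed deltas."""
    anchor = tile[0]
    deltas = [p - anchor for p in tile[1:]]
    max_delta = max(abs(d) for d in deltas)
    delta_bits = max(1, max_delta.bit_length() + 1)   # +1 bit for the sign
    return bits_per_channel + delta_bits * len(deltas)

flat_sky    = [200 + (i % 3) for i in range(64)]    # smooth content: tiny deltas
noisy_grass = [(i * 97) % 256 for i in range(64)]   # noisy content: large deltas

raw_bits = 64 * 8
for name, tile in [("flat sky", flat_sky), ("noisy grass", noisy_grass)]:
    bits = compressed_tile_bits(tile)
    print(f"{name}: {bits} bits vs {raw_bits} raw ({100 * bits / raw_bits:.0f}%)")
    # real hardware would simply store a tile that doesn't compress as raw data
```

Smooth render-target content compresses well while noisy content doesn't, which is one reason the measured bandwidth savings vary so much from game to game.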
 
we were all assuming the 300 is GCN 1.3 and 400 is GCN 2.0 on a new node anyway.
Nope, I'm assuming Fiji is GCN 1.2 like Tonga (GCN3 in AMD speak), and the rest of the 300 series will be 1.0/1.1.

The 400 series (FinFET 14 nm) will be GCN 1.3 (GCN4 in AMD speak) or maybe a new microarchitecture.


with all AMD's troubles their projected "GCN 2.0" may not have made it this year. The 3xx series could be an advancement of Tonga... projecting out to 4k of AMD's "stream processors" from Tonga's 1792
You can't just "update" parts of existing chips to a better spec - it's the same as designing a new chip from scratch, as you have to go through all the same validation and testing steps. This would be a waste of resources when you have a 14 nm 3D node coming in the near future - from an economic point of view it makes much more sense to switch to the next node and get better yields than to do another 28 nm redesign with only a few minor changes.

If I were AMD I'd certainly have tried to hit 2.0 this year. But if they decided that wasn't possible early enough they might have tried to brute force the Synth down to 14nm. Complicating matters over synthing a known architecture onto a new node, but if they had no choice
It doesn't work this way. Fabs have "libraries" - i.e. collections of working common blocks validated for a specific node and its process materials - and you must use these libraries for physical design if you want good yields (or any yields at all).
http://www.globalfoundries.com/news...-dual-core-cortex-a9-processor-implementation
http://news.synopsys.com/2014-06-02...-Design-Tools-and-IP-for-14-nm-FinFET-Process


Graphics Core Next is almost 4 years old now. If GCN 2.0 was originally planned for this year on 20nm, then cancelled early this year when the node didn't pan out for being GPU friendly, that would leave a mostly complete architecture to be synthed on 14nm over the next 18 months while the 3xx series could just be yet another small update to GCN 1.0 ala Hawaii and Tonga.
What do you really expect from GCN 2.0?

I guess AMD is pretty satisfied with the current GCN arrangement, since they assume it saturates the execution units much better than VLIW5 (Evergreen) did. Improving each individual processor with things like superscalar and out-of-order execution could be counter-productive - simply adding more processors would probably yield better performance gains overall.
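As a toy illustration of why the old arrangement left slots idle while GCN mostly doesn't (the ILP figure below is an assumption for illustration, not an AMD number): VLIW5 could only fill its five issue slots when the compiler found enough independent operations within a single work-item, whereas GCN issues one vector operation across a 64-wide wavefront, so keeping the units busy is mostly a question of occupancy rather than per-thread ILP:

```python
# Toy utilization comparison; the ILP figure is an illustrative assumption.

def vliw5_utilization(independent_ops_per_bundle):
    """Fraction of the 5 VLIW slots the compiler manages to fill."""
    return min(independent_ops_per_bundle, 5.0) / 5.0

def gcn_utilization(active_lanes, wavefront_size=64):
    """Fraction of SIMD lanes doing useful work (ignoring memory stalls)."""
    return active_lanes / wavefront_size

print(f"VLIW5, ~3.5 independent ops found: {vliw5_utilization(3.5):.0%}")  # 70%
print(f"GCN, full wavefront:               {gcn_utilization(64):.0%}")     # 100%
print(f"GCN, divergent/partial wavefront:  {gcn_utilization(40):.0%}")     # ~62%
```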

Now they could once again revise the arrangement of caches, schedulers and stream processor blocks, but then it wouldn't be called GCN anymore, rather some new funky name to use in advertising.
 
Even Kepler was already somewhat more bandwidth efficient.
I think the delta compression is the biggest reason why Maxwell is more bandwidth efficient, though I've always thought another reason is the fully unified cache nvidia has had since Fermi (the ROP caches are integrated into the L2), but I don't have any proof for that.

I'm not so sure about that. Take Hawaii or even Tahiti (GCN 1.0): comparing the 780 Ti vs the 290X, increase the resolution and bandwidth demand and suddenly the 290X takes the lead, and lately, with some new games, we see the same scenario with the 980 (where the old 290X strangely closes on the 980 at high resolution).

Maxwell benefits from a revised SM architecture that allows better efficiency from the SPs (basically a 1/3 performance increase on the shader side at the same unit count). The rest is Maxwell's high core speed to take into account: roughly 20-25% higher clock speed vs Kepler. And that's a lot, especially with an architecture that scales extremely well with core speed (OC tests are a good way to check it).

The core speed increase and the better shader processor efficiency alone drive something like a 40% gain.

Of course the memory compression algorithm is also responsible for part of it; it's effectively 100% certain that it helps Maxwell's bandwidth efficiency a lot. That said, the technique has its limitations, as shown with the 980.

In the case of the 290X or even the 7970 GHz, the same old problem is there, less so than on Cypress, but the shader engines need to be fed correctly and then the bandwidth follows. In their case, compared to Maxwell, I tend to believe the SP efficiency hits its limit well before the memory bandwidth does.

There would be a way to compare them on this aspect: downclock the Titan X to the same performance level as the old 290X, then raise the resolution and bandwidth requirements in different games to see which is limited first.

But let's be honest, it would be complicated, even down to the choice of games. Most games follow increases in hardware capability, especially if they have been developed with and for the current architectures. It is funny to see the sudden increase in VRAM requirements in some studios' titles now that some GPUs are more capable on this front than they used to be.
 
"World's first discrete GPU with full DirectX 12 implementation" implies that Fiji has some architectural features that Tonga must lack.
https://forum.beyond3d.com/posts/1831601/

Not only that, I will be really surprised if Fiji has the same GCN architecture as Tonga, given all the changes we know of so far (of course, the way AMD uses the "GCN" feature-level descriptions is a bit troublesome).

Tonga has always made me think of a small Hawaii-revision chip, something between the 7970's size and Hawaii's feature level, with some additions here and there (a somewhat improved color compression algorithm).

With Fiji, just adding HBM and its memory controller should drive a lot of change in the cache architecture alone; I don't see how it can be the same feature-level revision as previous GCN. That should have led to some deep modifications which possibly required changes in other parts. It would also imply that AMD has made no other modifications or improvements to the SMs and ACEs. And damn, looking at how many of GCN's old features are coming to DX12 (thanks to Mantle), I don't see why AMD wouldn't have worked to improve its scaling on this front to keep its lead there (async shaders, command buffers, etc.).
 