AMD: Speculation, Rumors, and Discussion (Archive)

I'm mostly interested to see what the PR spin will be when neither Nvidia nor AMD puts out anything significantly faster than last gen for their first round of FinFET cards. Koduri has recently been killing it for AMD in the same way Nvidia has for years, turning nonsense PR parties into big events that get a lot of press. So good for them, and based on the "Polaris" PR so far, my guess is the first round will be "hey, look at our 125-watt Fury X thing that's $200+ cheaper than last time!"

Which makes me wonder if they'll actually win the PR battle over Nvidia this first go-around. Nvidia has been teasing the "GeForce 1080", which sounds like it should be super fast but probably won't be anything more than 980 Ti-like performance with some nice perf/watt gains from moving to FinFET/GDDR5X.
 
I don't think AMD or Nvidia would need to spin Fury X/GTX 980 Ti performance at 125W and half the price. They'd be too busy shoving cards out the door. ;)

Both companies have easy targets to meet. They need to show a clear upgrade over the 970, for both AMD and Nvidia buyers, though only AMD seems to have the relevant GPU anytime soon. A cut-down Polaris 10 should easily put paid to the 970 in every way - probable 980 performance from a cut-down 232 mm² die at 100 W or so, and cheaper to boot! That's the definition of a no-brainer.

The other main target is the 980 Ti, which Nvidia must clear by a good margin in order to get the remaining 980 buyers who didn't upgrade to the Ti last time, and those Ti buyers who will upgrade so long as it makes any kind of sense. I wonder if there is fatigue among Nvidia buyers, though; they've been shelling out a lot for 30% gains over and over for the past 4 years.

I also wonder where Nvidia's small GPU is hiding.
 
Nvidia ... has been teasing the "GeForce 1080", which sounds like it should be super fast but probably won't be anything more than 980 Ti-like performance with some nice perf/watt gains from moving to FinFET/GDDR5X.
What teasing? IMO Nvidia has been 100% silent about what kind of consumer GPU we can expect this year. Or do you consider Zauba part of Nvidia's marketing department?
 
I don't think AMD or Nvidia would need to spin Fury X/GTX 980 Ti performance at 125W and half the price. They'd be too busy shoving cards out the door. ;)

Both companies have easy targets to meet. They need to show a clear upgrade over the 970, for both AMD and Nvidia buyers, though only AMD seems to have the relevant GPU anytime soon. A cut-down Polaris 10 should easily put paid to the 970 in every way - probable 980 performance from a cut-down 232 mm² die at 100 W or so, and cheaper to boot! That's the definition of a no-brainer.
Exactly. The upcoming generation should be one for the books - one of the easiest yet for convincing buyers to upgrade.

The other main target is the 980 Ti, which Nvidia must clear by a good margin in order to get the remaining 980 buyers who didn't upgrade to the Ti last time, and those Ti buyers who will upgrade so long as it makes any kind of sense. I wonder if there is fatigue among Nvidia buyers, though; they've been shelling out a lot for 30% gains over and over for the past 4 years.
Nvidia has a history of targeting buyers who skip a generation - think GTX 770 and GTX 780 Ti. I'm sure they have the numbers to prove that these are the ones most likely to buy.
 
AMD have stated that memory compression will be a focus for Polaris:
http://anandtech.com/show/9886/amd-reveals-polaris-gpu-architecture/2

GCN3 did alright as AMD's first implementation of memory compression, falling between Nvidia's 1st (Kepler) and 2nd (Maxwell) implementations.
http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4
I don't really see that from these numbers...
But anyway, I've come to suspect a bigger issue in practice might be the non-unified L2 cache of GCN 1.0-1.2. AMD calls it unified, but it doesn't include the ROP caches. Pre-GCN 1.2, if you actually wanted to do something like texture from a compressed-color MSAA surface or a compressed depth surface (MSAA or not), you needed a decompress blit first - and obviously that's going to eat bandwidth like crazy. Based on the open-source driver sources, I think this is still necessary with GCN 1.2 (though it should be able to read non-MSAA compressed color textures directly) - I could be wrong about these things, as I only glanced at the driver...

But that Polaris block diagram (also seen at that AnandTech link) seems to indicate the ROPs are now included in the L2 cache hierarchy too (something I predicted might happen with GCN 1.2, but I was wrong; Nvidia, of course, has done that since Fermi), which probably means the shader core can access the compressed bits directly. There would probably be lots of other benefits to a truly unified L2 cache as well (with the main disadvantage that L2 bandwidth needs to be higher and the cache subsystem gets more complex).
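To make the decompress-blit point concrete, here's a minimal sketch in C of the kind of decision the driver has to make before the shader core samples a render target. The enum values, struct fields, and exact conditions are illustrative assumptions based on the description above, not the actual Mesa/radeonsi code:

/* Hypothetical sketch, not real driver code: when does the driver have to
 * insert a "decompress blit" before the shader core can sample a surface? */
#include <stdbool.h>

enum gcn_gen { GCN_1_0, GCN_1_1, GCN_1_2, GCN_POLARIS };

struct surface {
    bool color_compressed;   /* fast-clear/DCC-style color metadata in use */
    bool depth_compressed;   /* HTILE-style depth metadata in use */
    bool msaa;               /* multisampled surface */
};

static bool needs_decompress_blit(enum gcn_gen gen, const struct surface *s)
{
    if (gen >= GCN_POLARIS) {
        /* Speculation from the block diagram: the ROPs now sit behind the
         * L2, so the texture units can read the compressed data directly. */
        return false;
    }
    if (gen == GCN_1_2) {
        /* Non-MSAA compressed color can be sampled directly; compressed
         * MSAA color and compressed depth still need the extra blit. */
        return (s->color_compressed && s->msaa) || s->depth_compressed;
    }
    /* GCN 1.0/1.1: any compressed color or depth surface has to be
     * decompressed first, which is what eats the extra bandwidth. */
    return s->color_compressed || s->depth_compressed;
}

The expensive case is the last one: the whole surface gets rewritten in decompressed form before it can be sampled, so the bandwidth saved by the compression is partly paid back.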
 
To be honest, memory management is changing with new APIs, new graphics engines, new findings... I'm not sure that in 6-8 months we can still apply current and old beliefs about memory usage, bandwidth...
 
It's just the biggest version of Polaris, but with HBM2 instead of what will probably end up being GDDR5/GDDR5X for the two smaller cards. Apparently yields at GloFo are as bad as those at TSMC, meaning both Nvidia and AMD have their big FinFET cards delayed. Looks like "Vega" is scheduled for "sometime" between the end of this year and the beginning of the next, no doubt waiting on production ramp-up and yields from the foundry.
And also delayed until HBM2 is "mainstream" as mentioned by Raja Koduri in the interview with Ryan Shrout.
Cheers
 
I don't really see that from these numbers...
Depends on what numbers you were looking at. What you can see, if you look at numbers across AMD products, is that the 256-bit versions of Tonga were actually the most bandwidth-efficient (from a perf/bandwidth perspective) solutions produced in recent times, by a pretty good margin.
 
Depends on what numbers you were looking at. What you can see, if you look at numbers across AMD products, is that the 256-bit versions of Tonga were actually the most bandwidth-efficient (from a perf/bandwidth perspective) solutions produced in recent times, by a pretty good margin.
I was just referring to the link from kalelovil - which only shows Fury.
But yes, you are right: Tonga is quite bandwidth efficient, at least in straight colorfill tests - beating even the full GK110, which has nearly twice the memory bandwidth, and soaring way past Tahiti (which has nearly as large a bandwidth advantage).
But compared to Maxwell, GCN 1.2 still looks much worse in practice. Though I'm mostly basing this on Topaz/Iceland compared to GM108, because the cheapie 64-bit DDR3 implementations there are really limited by memory bandwidth - and that is unfortunately a very lopsided comparison (that said, I've never seen a thorough review of Iceland compared to Mars, unfortunately, but that's entirely AMD's fault when you can't even figure out from the official part number what a card actually is...).
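For what it's worth, here's a minimal sketch in C of the perf/bandwidth metric used in this comparison. The bandwidth numbers are the public specs, but the colorfill figures are placeholders only (not measured B3D-suite results); plug in numbers from an actual review to reproduce the comparison:

#include <stdio.h>

struct gpu {
    const char *name;
    double fillrate_gpix;  /* measured colorfill rate, Gpixels/s (PLACEHOLDER values) */
    double bandwidth_gb;   /* theoretical memory bandwidth, GB/s (public specs) */
};

int main(void)
{
    struct gpu gpus[] = {
        { "Tonga (R9 285, 256-bit)", 30.0, 176.0 },
        { "Tahiti (R9 280X)",        28.0, 288.0 },
        { "GK110 (GTX 780 Ti)",      40.0, 336.0 },
    };

    for (int i = 0; i < 3; i++) {
        double eff = gpus[i].fillrate_gpix / gpus[i].bandwidth_gb;
        printf("%-26s %.3f Gpix/s per GB/s\n", gpus[i].name, eff);
    }
    return 0;
}

A higher number means more work extracted per unit of bandwidth, which is the sense in which a 256-bit Tonga can "beat" a 384-bit GK110 despite the raw bandwidth deficit.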
 
Anyone have a TR link handy that shows Tonga in the B3D suite? I seem unable to find one. :(
 
Vega 10 is probably the Greenland GPU. So now what we've been talking about in terms of Arctic Islands has been split between Polaris and Vega.
 
There's a recent shipment of a card 30% pricier than the Baffin XT (likely Polaris 10) samples.

[Image: AMD Polaris and Vega GPUs - C94, C98 and C99]
 
But that Polaris block diagram (also seen at that AnandTech link) seems to indicate the ROPs are now included in the L2 cache hierarchy too (something I predicted might happen with GCN 1.2, but I was wrong; Nvidia, of course, has done that since Fermi), which probably means the shader core can access the compressed bits directly. There would probably be lots of other benefits to a truly unified L2 cache as well (with the main disadvantage that L2 bandwidth needs to be higher and the cache subsystem gets more complex).
Actually, I'm wondering what's really new with Polaris now... According to the open-source bits, the shader core is virtually unchanged (it's effectively just another two VI chips), although other blocks (like UVD/VCE) are of course newer.
 
If that turns out to be true, it might explain why Polaris is positioned kind of oddly in the sequence of architectures on the roadmap: Polaris may be partly based on work scuttled by the 20nm skip.

That might explain why Tonga is apparently the reference point, and why Vega might have additional efficiency if it wasn't hit as hard by the skip. Shifting Polaris back a little in time makes for a straight line of improvement, had 20nm been used in the right year.

Polaris versus Vega and Navi might also raise questions as to whether, or when, some of the DX12 features GCN does not natively support will actually show up.
 
We still have no decent explanation for the vast number of extra transistors in Tonga compared with Tahiti.

It should be possible to compare Fiji/Tonga/Antigua metrics and counts versus other GCNs to figure out some of what's going on with those extra transistors. But I'm too lazy.
 