AMD: Sea Islands R1100 (8*** series) Speculation/Rumour Thread

It's apparent that Nvidia's next-generation Kepler family has surpassed AMD's latest tech in size and efficiency metrics. This is bad news for AMD/ATI engineers. The time has come to start scouting the bottlenecks within Pitcairn and Tahiti that need tweaking and to begin rethinking their design goals in order to regain engineering supremacy. More ROPs, clock domains, higher pixel fillrate, clock boosts, big dies, high speeds... whatever it may be.

Considering Nvidia's engineering skill has lagged behind AMD's for 6-8 years now, Kepler comes as a welcome change and quite a surprise. GK104 challenges DAAMIT's most efficient ASIC, Pitcairn, in performance/watt, transistor density, performance/mm^2, and perhaps temperature and voltage metrics. Nvidia will now enjoy a technical advantage in GPU compute that translates into a commanding lead in the HPC arena, which bodes well for them. Not only that, but their overall consumer product quality is superb in many other ways, including driver support, software quality, CUDA, 3D Vision, PhysX, customer support, game support, and community support - I recognize all this is due to differing company strategy.

I'll be honest: I don't know enough about MIMD/SIMD/warps etc. to understand exactly what architecture changes have given Kepler its 3x performance/watt advantage over previous Nvidia designs. However, 1536 is the same core count as Cayman, and very different from the previous 512/384, so it's fairly obvious where this idea originated. To the layperson's eyes, GTX680 successfully copies DAAMIT's best assets, such as IHS-less die contact, 4-way display output, Eyefinity, PowerTune and high-speed GDDR5, and adds new features like GPU Boost, adaptive VSync, and TXAA. Nvidia has years of engineering experience struggling to dissipate 300+ watts; subjectively, their cooling designs are quieter and more efficient.

Power delivery and PCB component quality are perhaps the last areas where the Radeon boards still have the upper hand. One more GTX590-type fiasco will guarantee Nvidia copying this as well. What does all this mean for the Radeon 8000 series? Well, for starters, it's going to have to pack a punch if they plan to enter GK110 territory. Should DAAMIT head back to the grindstone with Sea Islands?
 
Wait for complete reviews before going on assumptions.

Yes, with all these leaked GK104 results, I do wonder how many of them are fake. :LOL: In any case, if GK104 is capable of a 3DMark 11 Extreme score of 3400, then Nvidia is about to take the performance crown back immediately. That's the result of an overclocked 6990, according to Linkie.

I do wonder whether AMD can follow this strategy - introducing a very gaming-oriented die at less than 320 mm^2 and a compute-heavy die around 400-450 mm^2 or bigger. That's the only way they can fight for performance leadership; otherwise they will always be beaten. Like a true underdog. :mrgreen:

Considering Nvidia's engineering skill has lagged behind AMD's for 6-8 years now, Kepler comes as a welcome change and quite a surprise.

Somebody has already forgotten the 2900XT and how the 8800GT was beating it ever so often. :LOL:
 
I do wonder whether AMD can follow this strategy - introducing a very gaming-oriented die at less than 320 mm^2 and a compute-heavy die around 400-450 mm^2 or bigger.
The main "problem" AMD is facing right now is that GCN basically is their first major move towards a very compute heavy architecture - a step NVidia has taken a long time ago.

That being said, AMD basically already has a "bigger, compute heavy chip" in Tahiti and a "smaller, more gaming oriented chip" in Pitcairn.

Given that a 24-CU Pitcairn would probably have ended up very near RV770 in die size, you could actually argue they're just returning to a more aggressive small-die strategy.
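As a rough sanity check on that die-size claim, here is a back-of-envelope sketch. The published figures (Pitcairn ~212 mm^2 with 20 CUs, RV770 ~256 mm^2) are real; the assumption that the CU array takes roughly half the die is mine, purely for illustration:

```python
# Back-of-envelope estimate for a hypothetical 24-CU "big Pitcairn".
# The 50% CU-array share of the die is an assumption, not an AMD figure.
pitcairn_die_mm2 = 212   # HD7870 (Pitcairn) die size
pitcairn_cus = 20
rv770_die_mm2 = 256      # HD4870 (RV770) die size, for comparison

cu_array_share = 0.5     # assumed fraction of the die spent on CUs
area_per_cu = pitcairn_die_mm2 * cu_array_share / pitcairn_cus
extra_cus = 24 - pitcairn_cus

estimated_die = pitcairn_die_mm2 + extra_cus * area_per_cu
print(f"Estimated 24-CU die: ~{estimated_die:.0f} mm^2 (RV770 was {rv770_die_mm2} mm^2)")
```

With those assumptions you land around 233 mm^2, i.e. in the same ballpark as RV770, which is all the argument above needs.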

In a time where chips are made of billions of transistors, the future lies - on many levels - with horizontal scalability: It's not about how much "punch" you can pack into one really complex element, but about how many smaller (yet very intelligently designed) elements you can efficiently implement into a larger system (or how many "larger systems" you can efficiently implement into a "system-of-systems", and so on).
 
There's more to it (Nvidia's apparent strategy change), IMHO.

First, new processes have become more and more problematic in their early phases, plus there are more contenders for "beta runs" (and thus probably discounts on wafer/die prices). That means you really want to get real volume out of your first ASIC on a new process while at the same time being faster than the last-gen high end. The big dies make more money when they're put out later, once yields have stabilized.

Second, we've hit the 300-watt power wall hard with large monolithic dies. In the past (read: pre-Fermi; AMD had their experience with R600), you always had some headroom left in terms of power, and the real struggle was with die size (hot clock vs. area then, synchronous clock vs. power now).

Third, the (real) entry-level segment has all but disappeared, making scaling down to 75-ish square-millimeter chips not really important any more, since that performance level is already served by APUs.

There's probably more, but I can't think of it right now. Even so, this certainly warrants re-thinking your strategy as a company answering to shareholders, and not to tech geeks and ub0r enthusiasts in the first place.
 
How costly is the 1/4 DP rate of Tahiti compared to Pitcairn's 1/16? My thinking is that next-gen AMD could drop down to Pitcairn's rate and steer compute-oriented customers towards Tahiti-based products and, eventually, processors that contain both x86 and GCN cores.
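To put the two rates side by side, here is a quick sketch of the theoretical peaks they imply at stock clocks (this only shows the throughput gap, not what the 1/4 rate actually costs in die area or power, which is the harder question):

```python
# Theoretical peak throughput at stock clocks; FMA counted as 2 FLOPs per ALU per cycle.
def peak_gflops(alus, clock_ghz, dp_rate):
    sp = alus * 2 * clock_ghz      # single-precision GFLOPS
    return sp, sp * dp_rate        # (SP, DP)

tahiti_sp, tahiti_dp = peak_gflops(2048, 0.925, 1 / 4)       # HD7970 (Tahiti)
pitcairn_sp, pitcairn_dp = peak_gflops(1280, 1.000, 1 / 16)  # HD7870 (Pitcairn)

print(f"Tahiti:   {tahiti_sp:.0f} SP / {tahiti_dp:.0f} DP GFLOPS")
print(f"Pitcairn: {pitcairn_sp:.0f} SP / {pitcairn_dp:.0f} DP GFLOPS")
```

Roughly 947 vs. 160 DP GFLOPS, which is why the 1/4-rate part is the one you'd steer compute customers towards.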
 
I wouldn't be surprised if both companies went to a "small" die strategy for the consumer market in the future, with "large" die GPUs being primarily targeted at HPC/professional markets with a bit of bleed into a potential enthusiast "sky is the limit" pricing class.

So it's entirely possible that GF110 could be the last consumer oriented large die from Nvidia while Tahiti may end up as the last consumer oriented large die from AMD.

Not saying that's going to be the case going forwards, but it does make sense. The consumer space doesn't really need high-performance DP or ECC, for example.

And as GF104 shows, if you cut out a lot of stuff, you can make highly performant consumer chips with less die space and less power consumption.

And on AMD's side you can see that adding some of the things that enhance compute on Tahiti has likely made the chip larger and more power hungry than it might otherwise have been, judging by the comparison with Pitcairn.

Regards,
SB
 
Make better use of PowerTune?

What I'd be very interested in - given that Nvidia actually USES its TDP throttling technique in GTX680 - is what happens if you clock an HD7970 to 1.2GHz while keeping the PowerTune switch at default (or even -20%)?

How and when does the throttling actually kick in? Does average perf/W increase?

Most reviews obviously just set the PowerTune switch to +20% when overclocking (in order to AVOID throttling) - but given the way Nvidia very smartly applies what they call "GPU Boost" to push performance without compromising efficient power draw, it would be very interesting to see what AMD's PowerTune can do in that respect.

Anyone willing to do some tests?



That being said - I wonder whether AMD will react to Nvidia's GPU Boost by pushing more aggressive PowerTune limits on their upcoming cards.

For example:
Given that nearly all current HD7970s easily reach 1125MHz - why not make dual-Tahiti a stock 1125MHz card that is - for once - actually throttled by a (300W) PowerTune limit?

That way, the rumored (rather conservative) 850MHz core clock would end up as a kind of "worst case" (Nvidia-speak: "base clock") scenario - but the card could actually push clocks a lot higher when power draw allows it.
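To make the framing concrete, here is a toy control loop for that idea (every number in it is invented for illustration; it is not how PowerTune is actually implemented): a high stock clock that gets pulled down whenever estimated board power exceeds the cap, rather than a low base clock that gets boosted.

```python
# Toy model of a power-capped clock controller. All numbers are illustrative only.
STOCK_MHZ = 1125     # hypothetical "throttle down from here" stock clock
FLOOR_MHZ = 850      # hypothetical worst-case floor (the rumored base clock)
POWER_CAP_W = 300    # assumed PowerTune limit for a dual-GPU board
STEP_MHZ = 5         # clock adjustment per control interval

def next_clock(current_mhz, estimated_power_w):
    """Pull the clock down when over the cap, creep back up when under it."""
    if estimated_power_w > POWER_CAP_W:
        return max(FLOOR_MHZ, current_mhz - STEP_MHZ)
    return min(STOCK_MHZ, current_mhz + STEP_MHZ)

# A heavy scene pushes estimated power to 320W, a light one only to 260W.
clock = STOCK_MHZ
for power in [320, 320, 320, 260, 260, 260]:
    clock = next_clock(clock, power)
    print(f"power={power}W -> clock={clock}MHz")
```

The point is only the direction of the default: the card spends most of its time at the top clock and backs off only in the scenes that actually hit the cap.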
 

I don't think you understand how these two mechanisms work.

AMD's DVFS is much better.

David
 
Why don't they actually use it, then?

What's the point of a card that comes with a 250W PowerTune limit - yet typically draws much less power?

anandtech's HD7970 review said:
On that note, at this time the only way to read the core clockspeed of the 7970 is through AMD’s drivers, which don’t reflect the current status of PowerTune. As a result we cannot currently tell when PowerTune has started throttling. If you recall our 6970 results we did find a single game that managed to hit PowerTune’s limit: Metro 2033. So we have a great deal of interest in seeing if this holds true for the 7970 or not. Looking at frame rates this may be the case, as we picked up 1.5fps on Metro after raising the PowerTune limit by 20%. But at 2.7% this is on the edge of being typical benchmark variability so we’d need to be able to see the core clockspeed to confirm it.
Link. So PowerTune presumably hardly kicks in at all on current cards.

I'm just wondering why they don't put it to better use. My best guess is that it's mainly a question of marketing: no one wants his card to be throttled. That's why Nvidia sells their throttling technique the way they do - although the underlying concept is very similar to PowerTune.

I, for one, don't look at GTX680 as a 1GHz card that boosts clocks when possible. I look at it as a ~1.2GHz card that throttles clocks when needed (though not below 1GHz in the worst case). It has to work like that - as the max "boost clock" for each chip has to be validated/binned in some way.
 

First, GTX680 only runs up to 1.1GHz. Second, if you read the marketing materials, the AVERAGE boost is 5% (i.e. running at 1.05GHz).

How you can get a 1.2GHz number from that is beyond me.

Second, you're totally missing the point of DVFS.

The goal is to improve performance and power efficiency. One of the ways this happens is by reducing the guardbanding around the operating points.

DK
 
First, GTX680 only runs up to 1.1GHz. Second, if you read the marketing materials, the AVERAGE boost is 5% (i.e. running at 1.05GHz). How you can get a 1.2GHz number from that is beyond me.
I got the ~1.2GHz number from the HardOCP review:
GPU Boost is guaranteed to hit 1058MHz in most games. Typically, the GPU will be going much higher. We experienced clock speeds in demo sessions that would raise to 1.150GHz and even 1.2GHz in such games as Battlefield 3.
Given that their GTX680 also drew the least load power when running Battlefield 3 (as opposed to Deus Ex: HR, for instance), I assumed said 1.2GHz to be the max clock. Arguing about numbers is not my intention, though.

Second, you're totally missing the point of DVFS.

The goal is to improve performance and power efficiency. One of the ways this happens is by reducing the guardbanding around the operating points.
I don't miss the point concerning the reduction of guardbanding - I just question AMD's VERY conservative approach towards doing so.

Their PowerTune limits are basically set in a way that makes sure even the most power-hungry game (i.e. the "1%") won't be throttled. If I remember the early PowerTune articles correctly, AMD tested a lot of games over several HD6970s and found Alien vs. Predator (2010) and Metro 2033 to stress power draw the most; the final card was then adjusted so that the max power draw found in those games sat right at the stock PowerTune limit - maybe with a small safety margin. The procedure for HD7970 was probably about the same.

Adjusting the PowerTune limit vs. clock speed balance so that the most power-hungry games aren't throttled leaves a lot of untapped performance for the remaining "99%" of games, though. I'm obviously exaggerating a bit with that 1% vs. 99% metaphor - but you get my point.

As a matter of fact, one could argue that the only applications PowerTune is currently intentionally supposed to throttle are Furmark and the like.

So it basically "untaps" the performance formerly lost to sizing a card's TDP for Furmark levels - but it doesn't "untap" the performance lost to the fact that some games stress power draw a lot more than others (which is basically what Nvidia's "GPU Boost" is supposed to do, within a certain range).


All that being said, I didn't come here to have an argument. I just wondered whether it could be advisable for AMD to be a little more aggressive with their PowerTune balance (e.g. HardOCP found that HD7970 draws 30W less when running Skyrim than when running Battlefield 3 - so HD7970 could potentially run Skyrim at much higher clocks (on average) while staying within the same TDP range).
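As a crude illustration of how much clock that 30W could be worth: assume (optimistically) that power scales roughly linearly with clock at a fixed voltage. Only the 30W delta comes from HardOCP; the absolute power figures below are my own assumptions:

```python
# Crude headroom estimate. Only the 30W delta is from HardOCP's measurements;
# the absolute budget and the linear power-vs-clock scaling are assumptions.
power_limit_w = 230        # assumed board budget, set by the heavier game
light_game_power_w = 200   # assumed draw in the lighter game (30W less)
stock_clock_mhz = 925

potential_clock = stock_clock_mhz * power_limit_w / light_game_power_w
print(f"~{potential_clock:.0f} MHz before the lighter game hits the same budget")
# -> roughly 1064 MHz; in practice the voltage bump needed would eat into this.
```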

That's all.
 

Three things:

1. Reviewers measure power over 500-2000ms. Even Nvidia's relatively slow DVFS is operating at 100ms. I suspect AMD's is faster still, because the goal is managing dI/dt. I could measure 170W, but in reality there might be transients at 200-220W (see the toy sketch after this list).

2. My understanding from Nvidia's engineers is that they are actually targeting around 170W. So they leave a bit of margin, although my recollection is not perfect.

3. Realistically, what you should be interested in is measuring the frequency over time for a variety of benchmarks and comparing it against power consumption.
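To illustrate the first point, here is a toy sketch of why a slow meter misses what the DVFS controller has to deal with (the waveform is made up):

```python
# A meter averaging over ~1 second cannot see the 100ms-scale transients
# that the DVFS controller must react to. All numbers are made up.
power_w = [170, 170, 220, 170, 170, 200, 170, 170, 210, 170]  # one sample per 100ms

average = sum(power_w) / len(power_w)
print(f"1-second average the reviewer sees: {average:.0f} W")   # 182 W
print(f"Peak transient inside that window:  {max(power_w)} W")  # 220 W
```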

My overall take is that GCN's DVFS seems to get about a 10% boost in nearly all cases. Kepler reports a 5% average. I think it's obvious which one is more advanced. Incidentally, neither one is particularly impressive compared to Sandy Bridge or Ivy Bridge's GPU. IIRC, the DVFS range for SNB is around 600MHz and it can basically double the frequency.

So I don't disagree with what you are saying...but I think it's a goal for GPU vendors to aspire to, and neither one is there yet.

David
 
dkanter said:
Realistically, what you should be interested in is measuring the frequency over time for a variety of benchmarks and comparing it against power consumption.

Yeah, that would be very interesting indeed.

Of course, what I said with respect to different average power draw in different games could be applied to fluctuating power draw within one specific game, too. I haven't seen a power draw vs. frequency comparison so far, but just measuring power draw over a certain period of time is interesting enough. Look at this power draw graph done by Tom's Hardware:

[Tom's Hardware graph: power draw over time for GTX680, HD7970 and HD6990]


While GTX680 does a great job of taking full advantage of its max power budget (presumably by constantly adjusting clock speeds in order to use the full budget at any time and in any scene - but that's exactly the info this graph lacks), HD7970's power draw fluctuates a lot. (Interestingly, HD6990 is pretty solid in that respect - maybe PowerTune is already more active here?)

If we assume that the max power draw recorded in that measurement roughly corresponds to HD 7970's PowerTune limit, there's still a lot of untapped power budget - and hence performance - right there.

My point is that if the HD7970 ran a higher stock clock (say 1.1GHz) at the same PowerTune limit (i.e. had a more aggressive clock speed / PowerTune balance), it could make much better use of the currently untapped power I marked yellow - while reverting to a more modest 925MHz in those scenes that already stress power draw to the limit.

I'm no expert on the intricacies of the different techniques for adjusting frequency according to power draw - but I trust your judgment that AMD's technique is more sophisticated than Nvidia's. If that's the case, though, why don't they use it in the more aggressive way I suggested? Nvidia seems to be rather successful applying a - presumably - inferior technique.
 
One thing missing in this PowerTune vs. GPU Boost discussion is clock granularity.
From my experience with PowerTune on the HD6970, when it hits the power wall the card throttles using quite big jumps in clock. Nvidia's approach has each step increasing the clock by 13MHz, and I'm pretty certain AMD can't do that at this moment in time. At least not with Cayman ...
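For what it's worth, here is a minimal sketch of what 13MHz granularity means in practice - this is just the rounding idea, not Nvidia's actual boost logic, and the 1006MHz GTX680 base clock is only used as a convenient reference point:

```python
# Toy illustration of snapping a requested clock to 13MHz boost bins.
BASE_MHZ = 1006   # GTX680 base clock, used as the reference bin
STEP_MHZ = 13     # step size described for GPU Boost

def snap_to_bin(target_mhz):
    """Round a requested clock down to the nearest 13MHz bin above the base clock."""
    steps = max(0, (target_mhz - BASE_MHZ) // STEP_MHZ)
    return BASE_MHZ + steps * STEP_MHZ

for target in [1006, 1030, 1058, 1100, 1150]:
    print(target, "->", snap_to_bin(target))   # 1006, 1019, 1058, 1097, 1149
```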
 
I'm no expert on the intricacies of the different techniques for adjusting frequency according to power draw - but I trust your judgment that AMD's technique is more sophisticated than Nvidia's. If that's the case, though, why don't they use it in the more aggressive way I suggested? Nvidia seems to be rather successful applying a - presumably - inferior technique.
Dave (kind of - it would be interesting to know more about the reasoning behind the decision) explained in another thread:
[...] 7970 performs the same because it is programmed to be deterministic in performance across the full range of chips and user conditions. Unlike NV's implementation, it can be programmed to be either deterministic or non-deterministic, and it was a specific implementation choice to be deterministic.
 
From my experience with PowerTune on the HD6970, when it hits the power wall the card throttles using quite big jumps in clock.

The current implementation of PowerTune has per-MHz clock control. In higher-end chips the power is somewhat static-leakage dominated, though, so on high-power apps (Furmark) the reaction can be quite severe. If you are using an app that is close to the limits you can see more fine-grained clock variation.
 
Dave (kind of - it would be interesting to know more about the reasoning behind the decision) explained in another thread:
High end boards can have a large power delta from chip-to-chip variation - if you were to remove the voltage/leakage bins, a board like Cayman could have a variation of 90W or more from one chip to another. While we have reduced that with better binning, and will continue to do so, there can still be a large power variation - when you are doing performance schemes based on power or thermals, the default performance could vary quite significantly from one board to another and you can end up with a situation where one user gets quite different performance from another.
 