AMD: Sea Islands R1100 (8*** series) Speculation/ Rumour Thread

Dave.
So, correct me if I'm wrong: AMD chose to ensure all end users get the same performance, although theoretically they could very well tune it in a similar way to what Nvidia has done with current technology?
 
jacozz said:
Dave.
So, correct me if I'm wrong: AMD chose to ensure all end users get the same performance, although theoretically they could very well tune it in a similar way to what Nvidia has done with current technology?
I think they could.

The question is: do you stick to a set clock specification and let the customers play with overclocking, or do you overclock dynamically without the customer having to do anything himself (still with some kind of guard band that allows some additional, limited overclocking)? I think the latter is more appealing for most users, and for Nvidia for that matter, but that's just a matter of opinion.

That's really orthogonal to the implementation method. Dave, you say your method is deterministic. I assume that you have a shitload of idle/activity counters in your design that are fed into a weighted sum to estimate, with fairly decent accuracy, the instantaneous power for different sections? I can see how this has advantages over measuring power at the regulators. I'm not sure it makes a major difference for a user. As always, it's not what you have but what you do with it. Marketing it only as a fail-safe mechanism against burning up your GPU (that's always how I looked at it, at least) is not the kind of feature that makes my heart and games go faster, even if I appreciate the details behind it.
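For what it's worth, here's roughly how I picture such a counter-based estimator - a minimal sketch with hypothetical block names and weights, not AMD's actual formula:

```python
# Hypothetical sketch of counter-based power estimation (not AMD's real model).
# Each block reports normalised activity per control interval; per-block weights
# from characterisation convert activity into estimated dynamic power, and a
# leakage term (a function of voltage and temperature) is added on top.

ACTIVITY_WEIGHTS_W = {          # watts per unit of normalised activity (made up)
    "shader_array": 120.0,
    "memory_controller": 35.0,
    "texture_units": 25.0,
    "rops": 15.0,
}

def estimate_power(activity, leakage_w, base_w=20.0):
    """activity: dict of block name -> normalised activity in [0, 1]."""
    dynamic_w = sum(ACTIVITY_WEIGHTS_W[block] * activity.get(block, 0.0)
                    for block in ACTIVITY_WEIGHTS_W)
    return base_w + leakage_w + dynamic_w

# A Furmark-like load lights up almost everything at once.
furmark_like = {"shader_array": 0.95, "memory_controller": 0.6,
                "texture_units": 0.9, "rops": 0.8}
print(estimate_power(furmark_like, leakage_w=40.0))   # well above a typical game
```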
 
Dave said:
High end boards can have a large power delta from chip to chip variation
So... If I'm unlucky enough to get a leaky 680 chip, I'll get worse performance compared to the probably handpicked cards the review sites got?
 
Cayman could have a variation of 90W or more from one chip to another.
That much, eh? Thanks for sharing. Makes perfect sense from a consistency perspective (and enthusiasts will overclock/tweak their systems anyway), if not for maximum stock benchmarking performance on review cards. :)
 
Dave said:
So... If I'm unlucky enough to get a leaky 680 chip, I'll get worse performance compared to the probably handpicked cards the review sites got?

Could look like it. I'm sure some site will soon test it on a couple of random retail cards, and if they turn out to be slower - in the same system - than the review sample, it will be really interesting to see what the various big sites do.
I could imagine places like computerbase.de (who have run with non-standard cp settings before) would start testing without boost (but maybe some small fixed OC).
Or maybe both, like hardware.fr already managed to do.
 
Dave.
So, correct me if I'm wrong: AMD chose to ensure all end users get the same performance, although theoretically they could very well tune it in a similar way to what Nvidia has done with current technology?

PowerTune is all on-chip and is programmable - the inputs to PowerTune are things such as chip leakage, voltage and temps. We can enable these to be variables on a per-chip basis; however, we have chosen to input them as constants (per SKU) based on characterisation of the "worst case".
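To make the constants-versus-variables distinction concrete, a purely illustrative sketch (hypothetical names and numbers, not the actual firmware interface):

```python
# Hypothetical illustration of per-SKU constants vs per-chip variables as
# PowerTune inputs.  None of these names or values come from AMD.

from dataclasses import dataclass

@dataclass
class PowerTuneInputs:
    leakage_w: float      # chip leakage characterisation
    voltage_v: float      # operating voltage
    temp_c: float         # assumed (or measured) temperature

# Per-SKU: one worst-case characterisation shared by every board of the SKU.
SKU_WORST_CASE = PowerTuneInputs(leakage_w=45.0, voltage_v=1.175, temp_c=95.0)

def inputs_for_board(per_chip_fuse_data=None):
    """Return the inputs PowerTune would work from for a given board."""
    if per_chip_fuse_data is None:
        return SKU_WORST_CASE     # deterministic: identical for every chip
    return per_chip_fuse_data     # variable: tracks the individual chip
```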
 
I think they could.

The question is: do you stick to a set clock specification and let the customers play with overclocking, or do you overclock dynamically without the customer having to do anything himself (still with some kind of guard band that allows some additional, limited overclocking)? I think the latter is more appealing for most users, and for Nvidia for that matter, but that's just a matter of opinion.

That's really orthogonal to the implementation method. Dave, you say your method is deterministic. I assume that you have a shitload of idle/activity counters in your design that are fed into a weighted sum to estimate, with fairly decent accuracy, the instantaneous power for different sections? I can see how this has advantages over measuring power at the regulators. I'm not sure it makes a major difference for a user. As always, it's not what you have but what you do with it. Marketing it only as a fail-safe mechanism against burning up your GPU (that's always how I looked at it, at least) is not the kind of feature that makes my heart and games go faster, even if I appreciate the details behind it.

It does make games go faster, because AMD can now reach higher clocks than before, when they were limited by the power draw of pathological cases like Furmark. Now they can set their clocks in a way that ensures minimal throttling in games, and just let PowerTune handle Furmark. In practice I don't know if they're gaining 2, 5, 10 or 15%, but they're gaining something.

Of course, this has made software like OCCT and Furmark quite pointless.
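A toy calculation makes the point (illustrative numbers only, not measurements):

```python
# Illustrative only: clock headroom games gain when the TDP no longer has to cover
# a power virus running at full clock.  First-order model: dynamic power ~ clock.

TDP_W = 250.0                    # board power limit (made up)
GAME_POWER_AT_BASE_W = 225.0     # a typical game at the old Furmark-safe clock (made up)
FURMARK_POWER_AT_BASE_W = 250.0  # the power virus already sits at the limit (made up)

game_headroom = TDP_W / GAME_POWER_AT_BASE_W
print(f"games could clock ~{(game_headroom - 1) * 100:.0f}% higher before hitting the TDP")

furmark_boosted_w = FURMARK_POWER_AT_BASE_W * game_headroom
print(f"Furmark at that clock would want ~{furmark_boosted_w:.0f}W, so PowerTune throttles it")
```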
 
The question is: do you stick to a set clock specification and let the customers play with overclocking, or do you overclock dynamically without the customer having to do anything himself (still with some kind of guard band that allows some additional, limited overclocking)? I think the latter is more appealing for most users, and for Nvidia for that matter, but that's just a matter of opinion.
The issue is that, unless there is some way of figuring out ASIC-to-ASIC speed variation, this implementation of Boost isn't overclocking. "Overclocking" is allowing the end user to take the chip variation margin within a Bin/SKU and play with it; Boost still needs to guarantee that all the chips actually run at that maximum boost state performance.

That's really orthogonal to the implementation method. Dave, you say your method is deterministic. I assume that you have a shitload of idle/activity counters in your design that are fed into a weighted sum to estimate, with fairly decent accuracy, the instantaneous power for different sections? I can see how this has advantages over measuring power at the regulators. I'm not sure it makes a major difference for a user. As always, it's not what you have but what you do with it. Marketing it only as a fail-safe mechanism against burning up your GPU (that's always how I looked at it, at least) is not the kind of feature that makes my heart and games go faster, even if I appreciate the details behind it.

Yes, PowerTune is utilising activity counters all over the chip. Other than the customer benefit of enabling the capability across all the products without additional BOM costs, the primary end user benefits are that it can be programmed to be deterministic and that it tunes the performance of the board (at its designed TDP segmentation) for the best performance in the applications that actually matter, rather than setting clock / TDP targets based on the worst-case application. It has always been marketed (in the enthusiast segment) as something that is enabling faster application performance.

So... If I'm unlucky enough to get a leaky 680 chip, I'll get worse performance compared to the probably handpicked cards the review sites got?

With a deterministic implementation you can look at the performance in a review and you should be able to mirror that performance on a different board (in the same system) irrespective of the environmental conditions. In a non-deterministic scenario you may see performance variations coming from chip-to-chip variation (or even board-to-board, as even the components will have some efficiency differences) but also from environmental variation - hotter components use more power, so putting the board in a worse ventilated case, or running multi-GPU with boards heating each other up, will inevitably cause some level of performance variation.

It was an active decision on our part to take these variables out of the equation on the current high end products.
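Sketched with hypothetical numbers (not real characterisation data), the practical difference looks like this: feed the estimator per-SKU worst-case constants and every board lands on the same clock; feed it per-chip measured leakage and the clocks drift with silicon quality.

```python
# Hypothetical numbers: why per-SKU worst-case constants give deterministic clocks.

POWER_LIMIT_W = 200.0
WORST_CASE_LEAKAGE_W = 45.0          # per-SKU constant from characterisation

def allowed_clock(dynamic_w_per_mhz, leakage_w):
    """Highest clock whose estimated power still fits under the limit."""
    return (POWER_LIMIT_W - leakage_w) / dynamic_w_per_mhz

# Deterministic: every board uses the same worst-case constant -> identical clock.
print(allowed_clock(0.17, WORST_CASE_LEAKAGE_W))

# Non-deterministic alternative: use each chip's measured leakage -> clocks drift
# with silicon quality (good, average and leaky samples below).
for measured_leakage_w in (28.0, 38.0, 45.0):
    print(allowed_clock(0.17, measured_leakage_w))
```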
 
I remember that Intel's Montecito implemented an analog version of DVFS (Foxton), based on current and voltage measurements on-chip.

The feature was cut, to the detriment of the Itanium core, because disabling something so intertwined with the clocking scheme seemed to lop off some of the upper range of the chip's frequency bins.
The story as I've seen it told was that customers trying to qualify the chip objected to the variability and unpredictability. Later versions of the tech were digital and more deterministic.

There seem to be some large system providers that turn any kind of clock scaling off due to synchronization and time-keeping problems, although the decoupled nature of HPC GPU use over a bus probably rules out that level of precision already.


I wonder if the perf and event counters and the DVFS system could be used to provide more feedback when overclocking: not just wondering if or why a chip is throttled, but an application that can tell you in almost real time that the GPU throttled back because of temps or because "the VREG cried uncle".
I know CPU thermal events can be added to OS event logs, which is something in that vein.
 
The current implementation of PowerTune has per-MHz clock control. In higher end chips the power is somewhat static-leakage dominated, though, so on high power apps (Furmark) the reaction can be quite severe. If you are using an app that is close to the limits you can see more fine-grained clock variation.
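A crude sketch of that kind of per-MHz back-off, with hypothetical numbers rather than real ones:

```python
# Crude sketch of fine-grained (1MHz-step) throttling: drop the clock just far
# enough that the estimated power fits under the limit.  All values are hypothetical.

POWER_LIMIT_W = 200.0
TOP_CLOCK_MHZ = 925
FLOOR_CLOCK_MHZ = 500

def settle_clock(leakage_w, dynamic_w_per_mhz):
    clock = TOP_CLOCK_MHZ
    while clock > FLOOR_CLOCK_MHZ and leakage_w + dynamic_w_per_mhz * clock > POWER_LIMIT_W:
        clock -= 1                          # per-MHz granularity
    return clock

print(settle_clock(leakage_w=45.0, dynamic_w_per_mhz=0.15))  # moderate load: stays at 925
print(settle_clock(leakage_w=45.0, dynamic_w_per_mhz=0.30))  # Furmark-ish: drops to ~516
```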


I've tested my card just after launch with initial drivers using OCCT and 3DM06 Perlin Noise.
When playing with clocks I discovered that using the default PowerTune value and lowering the core clock to 8xxMHz yielded better results in PerlinNoise than the stock 880MHz setting. After investigating this behaviour using GPU-Z I noticed that at stock clocks PT would throttle the GPU from 880MHz down to 700MHz or even 500MHz, back and forth, during that test. With an 8xxMHz core PT didn't kick in at all and the average FPS was higher. BTW, PerlinNoise only needs +5% PT on my card to get full performance at stock clocks.

Anyway, I haven't had a chance to run this test with recent drivers and the updated BIOS on my HD6970. Will have to check it out :smile:
Thanks Dave for the wonderful insight you're giving us here!

Note: 8xxMHz is lower than 880MHz, but I can't remember the exact clock I used for my test last year. It should have been between 800MHz and 850MHz!
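That behaviour makes sense if you work out the average effective clock - the duty cycles below are made up, since I obviously don't know the real ones:

```python
# Made-up duty cycles: a steady, slightly lower clock can beat a higher clock
# that keeps getting throttled back and forth.

def average_clock(segments):
    """segments: list of (clock_mhz, fraction_of_time)."""
    return sum(clock * share for clock, share in segments)

throttled_880 = [(880, 0.55), (700, 0.30), (500, 0.15)]   # PT bouncing around
steady_830    = [(830, 1.00)]                             # PT never kicks in

print(average_clock(throttled_880))   # ~769 MHz effective
print(average_clock(steady_830))      # 830 MHz effective
```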
 
Bear in mind that you are also subject to the sampling frequency of the application you're using to measure clocks. I've no clue how or when GPU-z takes clock samples.

Edit: Also, Furmark is an interesting one to look at if you are watching the clock traces. Because the "furry doughnut" is rotating, you can see higher clocks when it is on its side and lower clocks when it is fully facing you, because it is rendering more fur then.
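To illustrate the sampling caveat with made-up numbers: if the clock dips for a few tens of milliseconds at a time, a tool polling once a second can miss the excursions entirely (or, depending on where its samples fall, catch nothing but dips).

```python
# Made-up trace: a tool polling once per second under-reports short throttle dips.
# Simulate 10 s of clock behaviour in 1 ms steps: 50 ms dips to 500 MHz every 500 ms.

trace = [500 if (ms % 500) < 50 else 880 for ms in range(10_000)]

true_average = sum(trace) / len(trace)
polled_1hz = [trace[ms] for ms in range(100, 10_000, 1000)]   # one sample per second
polled_average = sum(polled_1hz) / len(polled_1hz)

print(f"true average clock:  {true_average:.0f} MHz")    # ~842 MHz
print(f"1 Hz polled average: {polled_average:.0f} MHz")  # 880 MHz - the dips are invisible
```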
 
It was an active decision on our part to take these variables out of the equation on the current high end products.
I see what you did there :D

The most probable candidate for performance schemes based on more aggressive power thresholds would be the HD7750 - and look what I found (I should read way more midrange reviews):

A stock PowerTune setting that's actually limiting gaming performance:

[attached chart: HD7750 gaming performance at stock vs. +20% PowerTune]


And what's the result of PowerTune actually kicking in at stock settings?

Nearly 50% better perf/watt than GTX680
:oops:

As I said earlier: a detailed perf/power review of an OCed HD7970 at stock PowerTune limits would be really interesting. It should behave rather similarly to what Nvidia did on the GTX680 - just "self-made".
 
Bear in mind that you are also subject to the sampling frequency of the application you're using to measure clocks. I've no clue how or when GPU-z takes clock samples.

Edit: Also, Furmark is an interesting one to look at if you are watching the clock traces. Because the "furry doughnut" is rotating, you can see higher clocks when it is on its side and lower clocks when it is fully facing you, because it is rendering more fur then.

True - I just ran PerlinNoise and now the clocks adjust much more consistently compared to what I remember from my initial testing. Still, leaving PT at 0% and clocking the GPU up from 880MHz to 910MHz brings a performance decrease (a minuscule 5FPS out of over 800FPS, but consistent), and going down to 860MHz gives about the same results as 880MHz.

So thanks to your input and my few little tests, I'm now confident PT's granularity is much better than nVidia's GPU Boost.
 
Dave Baumann said:
The issue is that, unless there is some way of figuring out ASIC-to-ASIC speed variation, this implementation of Boost isn't overclocking.
Isn't that something that drivers already know anyway?
Speed- and current-related test structures have been on chips for as long as I can remember. GPUs already used such information to determine the optimal voltage at which a chip should run to ensure a particular frequency without wasting power. If not read directly from the test structure, the speed info may be stored in fuses. So it looks to me that all the info is already there, and that the real question is whether you want a conservative but deterministic product or one that pushes the envelope to the max depending on the sample.
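Roughly the pattern I have in mind (hypothetical table and values, just to illustrate):

```python
# Hypothetical illustration: a per-chip speed grade read from fuses selects the
# minimum voltage needed to hit a given frequency.  All values are made up.

# Voltage (V) needed at 900 MHz, indexed by fused speed grade (higher = faster silicon).
VOLTAGE_BY_SPEED_GRADE = {0: 1.200, 1: 1.175, 2: 1.150, 3: 1.125}

def operating_voltage(fused_speed_grade: int) -> float:
    """Pick the lowest safe voltage for this chip's speed grade."""
    return VOLTAGE_BY_SPEED_GRADE[fused_speed_grade]

print(operating_voltage(2))   # a mid-grade chip runs 900 MHz at 1.150 V
```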

It has always been marketed (in the enthusiast segment) as something that is enabling faster application performance.
Agreed, it does, since you don't have to size for the very worst power case.

It was an active decision on our part to take these variables out of the equation on the current high end products.
Understood.

I'm guessing that this decision will be under some scrutiny right now. ;)
 
Mianca
That's interesting. Did I understand this correctly (my German sucks): this is only with PowerTune set to +20%, with no overclock? Yes?

It seems to me that AMD has some reserve left in software, if they choose to compete on equal terms.
 
Yeah, they did this kind of "PowerTune limitation" test on pretty much every HD7*** series card - and the HD7750 was the only one that showed a significant performance boost just from putting the PowerTune switch to +20%.

Reading up on this, I found that HD7750's maximum power consumption almost exactly equals its average power consumption (41W average vs. 43W peak vs. 43W MAX @ techpowerup) - that's nothing short of breathtaking. :oops:

No other card comes even close to reaching this type of power-based throttling. It's no high-end card - so one certainly needs to take that kind of comparison with a huge heap of salt - but the corresponding numbers for GTX680 are: 166W average vs. 186W peak vs. 228W (!!) MAX.
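Just to put numbers on "uses every last bit of its budget", the ratio of worst-case to average draw from those figures:

```python
# Ratio of maximum to average gaming power draw, using the figures quoted above.
cards = {
    "HD7750": {"average_w": 41, "peak_w": 43, "max_w": 43},
    "GTX680": {"average_w": 166, "peak_w": 186, "max_w": 228},
}

for name, p in cards.items():
    print(f"{name}: max / average = {p['max_w'] / p['average_w']:.2f}")
# HD7750 averages within ~5% of its worst case; the GTX680's worst case is ~37% above its average.
```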

I guess I just understood what David meant when he said that AMD's PowerTune is way more sophisticated than Nvidia's GPU Boost stuff.

When gaming, HD7750 basically uses every last bit of power within its budget. All the time. In every game.

I'm seriously impressed.


================

EDIT:
Some very interesting numbers concerning clock speeds and voltages @ different load levels for a huge bunch of cards - some great review work right there.
Happy I came across that site.

"Load operation" is the Googlish translation for "idle / without load", btw.
 
Yup, I think it's better to have predictable performance for your cards when they reach end consumers' hands, rather than wildly fluctuating speeds depending on your particular chip's characteristics.

Even Intel's turbo is pretty much guaranteed on every CPU they release. Hence the high OC headroom on most cards, as I'm guessing that, like the 7970, they set the base and turbo clocks with regard to the worst-case scenario.

Predictable performance is more important to your average consumer, IMO. For those interested in overclocking beyond those limits, it's there for them if they wish to experiment. John Smith, buying a video card based on a review he read, can be pretty much guaranteed that his card will be the same speed as Bob Johnson's at the office, or anyone else's.

Nothing sucks more than buying the same product as someone else but, due to a random roll of the die (pun intended), paying the same amount of money for a crappier-performing card than your buddy's. And things like that just suck the proverbial donkey gonads.

It's one thing to experiment with overclocking and get a bad overclocking card. At least your stock performance is the same as everyone else's and every reviewer's. With GPU Boost? Put down your money and roll the die...

And considering the "average" according to Nvidia is 1058... that means there have to be quite a few worse than that if most of the reviewers are getting ones that boost fairly high. Somewhat interesting that none of those appear to have made it into reviewers' hands, however.

Regards,
SB
 
Considering the 7970 OCs to 1000-1350MHz, the Sea Islands top end should come in 4 flavors:

8970, 8975, 8980, 8985. If it's possible to bin PCB parts, then do that too. Add the extra VDDC phase on the 8980 and 8985 PCBs. If it clocks like Tahiti, the 70 SKU could be the 1050MHz chips, the 75 the 1150MHz, the 80 the 1250MHz and the 85 the 1350MHz.

Buyers pay the premium knowing ahead of time what they're getting. If it's not too costly and is good for AMD's partners, everyone benefits.
 