Perf/watt/IHV man hours/posts *split*

That 60% figure seems too good to be true, but that's roughly what AnandTech reported in a table of advertised benefits of various processes. The article is about GloFo's 7nm roadmap, but it compiles lots of other goodies.

http://www.anandtech.com/show/10704/globalfoundries-updates-roadmap-7-nm-in-2h-2018

And now I'm wondering where they found all of those, lol.
Those are from various presentations over the last year or so. I'd have to poke Anton to tell you where a given figure comes from, but it's usually an investor presentation. Either way, all of the numbers are direct from the manufacturer.
 
Maybe it is a perfect sample, but just the fact that it exists shows how much variation there is in the process. It might not even be a good sample, just better than most as it only takes one transistor to pull down the others. Judging by how much AMD was willing to pay GF, I'd imagine there is an issue.
It's an example that had to use a custom 'tuned' XFX BIOS (their words, not mine), along with their True Clock Technology (however that works and interfaces with the system environment).
You might as well say the same about the 1070/1080 sample hitting just under 2300MHz on air with a comparable custom BIOS for its model.
These are outliers from the actual norm, which is why AMD has defined 1266MHz at 1.15V as stable, and Nvidia's defined wall is 1.1V, which gives around 2050MHz, give or take a little.

Edit:
One thing Jayz2Cents does not mention is that the card's cooling is noisy, even relative to other 480s.
Cheers
 
For reference, regarding the Jayz2Cents review of the GTR (the non-Black model, so 1288MHz is the custom default):
Putting the extra OC aside, Jayz2Cents recorded 1288MHz at 1.056V, against the 'reference' 480s having 1266MHz at 1.12 to 1.15V.
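A rough way to see why that voltage gap matters: to first order, switching power scales as P ≈ C·V²·f. The sketch below assumes that model with the same effective capacitance C for both cards and ignores leakage entirely (which also drops with voltage, so the real difference could be larger). It is an estimate from the quoted clocks and voltages only, not a measurement.

```python
def dynamic_power_ratio(v_new, f_new, v_ref, f_ref):
    """Relative dynamic (switching) power, assuming P_dyn ~ C * V^2 * f
    with identical effective capacitance C. Leakage is ignored."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Jayz2Cents' GTR sample (1288MHz @ 1.056V) vs the quoted 'reference' 480 range
for v_ref in (1.12, 1.15):
    r = dynamic_power_ratio(1.056, 1288, v_ref, 1266)
    print(f"vs {v_ref:.2f}V reference: {r:.2f}x dynamic power")
```

On this simple model the GTR sample would need roughly 10-14% less switching power than a reference 480, despite the higher clock.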

To get a feel for what is happening, fingers crossed that some other measurement-focused review sites or well-known overclockers get their hands on this card.
Cheers
 
It's an example that had to use a custom 'tuned' XFX BIOS (their words, not mine), along with their True Clock Technology (however that works and interfaces with the system environment).
No idea what they're tuning; maybe it's that power control thing Polaris allegedly had, but that sounds more like marketing to me. "Tuned" probably just means they used their own custom clocks.

You might as well say the same about the 1070/1080 sample hitting just under 2300MHz on air with a comparable custom BIOS for its model.
While using less power? It's not the clocks I'm interested in as much as the possibility that the leakage issues went away.

These are outliers from the actual norm, which is why AMD has defined 1266MHz at 1.15V as stable, and Nvidia's defined wall is 1.1V, which gives around 2050MHz, give or take a little.
Outliers are still interesting, as they demonstrate potential limits of the actual architecture. The process will still determine where that norm lands. The Jayz sample is right around the original AMD marketing numbers. The process being really bad goes a long way to explain why the perf/watt on an improved process with roughly the same architecture is so bad, as well as why AMD is willing to pay GF so much to get out of the deal.
 
Those are from various presentations over the last year or so. I'd have to poke Anton to tell you where a given figure comes from, but it's usually an investor presentation. Either way, all of the numbers are direct from the manufacturer.

I figured as much.

I'm surprised that all fabs seem to use similar enough metrics to be compiled in a single table like that.

It's pretty cool. I wish there were more resources for comparing various metrics of processes through the years.
 
However, AMD felt that they needed to hit X performance level and thus sacrificed perf/watt by pushing the silicon beyond what would be considered a good voltage and frequency level. It's the same thing they did with Fury X.
I've pointed this out before, but this doesn't follow with the Power/Clock (PowerTune) mechanism in place. There isn't a "single voltage/clock" to push to; there are multiple, and the voltage chosen is the one that supports the particular clock for that particular power state. The only time there would be a benefit in the path you describe is in a scenario where you say "TDP and PowerTune go hang, I'm going to run at the peak state whenever there's activity!", and more or less no product since Hawaii has done that, to a lesser or greater extent. And, as pointed out, in a TDP-bound product, increasing the voltage for the peak clock(s) would have a detrimental effect rather than a positive one.
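A toy model helps illustrate the PowerTune point. The sketch below assumes a made-up table of clock/voltage states, a first-order P = K·V²·f power model and a 140W cap; none of these are AMD's real values, and real PowerTune also weighs temperature, current and activity. Only the direction of the effect should be read from it: in a TDP-bound part, raising the voltage of the states lowers the clock the governor can sustain.

```python
# Toy model of a TDP-capped DVFS governor. The state table, scale factor and
# TDP are hypothetical, NOT AMD's real Polaris PowerTune values.
STATES = [(300, 0.80), (900, 0.90), (1100, 1.00), (1200, 1.08), (1266, 1.15)]
K = 0.09  # hypothetical W per (V^2 * MHz), picked so the top state is ~150W


def state_power(clock_mhz, volts):
    """First-order dynamic power, P = K * V^2 * f (leakage ignored)."""
    return K * volts ** 2 * clock_mhz


def sustained_state(tdp_w, voltage_offset=0.0):
    """Return the fastest (clock, voltage) state whose modelled power fits the TDP.

    STATES is sorted by clock, so the last state that fits is the fastest one
    that fits. Falls back to the lowest state if nothing fits.
    """
    best = (STATES[0][0], STATES[0][1] + voltage_offset)
    for clock, volts in STATES:
        v = volts + voltage_offset
        if state_power(clock, v) <= tdp_w:
            best = (clock, v)
    return best


print(sustained_state(140))        # stock voltage table
print(sustained_state(140, 0.08))  # every state over-volted by 80mV
```

With these made-up numbers, the stock table sustains the 1200MHz state, while the over-volted table drops to 1100MHz under the same cap.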
 
No idea what they're tuning; maybe it's that power control thing Polaris allegedly had, but that sounds more like marketing to me. "Tuned" probably just means they used their own custom clocks.

While using less power? It's not the clocks I'm interested in as much as the possibility that the leakage issues went away.

Outliers are still interesting, as they demonstrate potential limits of the actual architecture. The process will still determine where that norm lands. The Jayz sample is right around the original AMD marketing numbers. The process being really bad goes a long way to explain why the perf/watt on an improved process with roughly the same architecture is so bad, as well as why AMD is willing to pay GF so much to get out of the deal.
Yeah outliers are interesting but they can skew expectations and are beyond the normal scope of a design.

The specific BIOS with their True Clock Technology could be doing a lot of things, including changing the behaviour of the dynamic power management/environment (including WattMan). Until we know more (including from more advanced measurements and from engineers and extreme overclockers), it is not really a useful indicator of the state of Polaris/Vega.

Unfortunately, one cannot use Jayz2Cents' power figures, for reasons I have explained in the past. Furthermore, BuildZoid, when breaking down the XFX GTR, pointed out that this is exacerbated because the system is not capturing power draw correctly from a software perspective anyway, as the auxiliary voltage controller cannot measure power consumption.
Go to 4m 5secs:

For the reasons above, the Jayz sample needs to be tested by those with more advanced measurement tools, or by extreme overclockers who understand the VRM/power stage implementation and can hook into it.
I am not sure where that 110W-120W 'TDP'/TBP figure originally came from; I can only find leak/rumour information about it.
So the outlier I mentioned for Pascal is just as relevant, since the power used is more than Jayz realises; but outliers should always be taken in context, as they are not part of the stable engineering spec/envelope used by either AMD or Nvidia.

Cheers
 
http://wccftech.com/amd-polaris-revisions-performance-per-watt/

Horrible source I know, but ~50% better perf/watt on a new Polaris metal spin. Corresponds to the embedded parts we've seen and seems likely they will be trickling into the mainstream. No reason to keep making the old version given an improved metal spin.

The specific BIOS with their True Clock Technology could be doing a lot of things, including changing the behaviour of the dynamic power management/environment (including WattMan). Until we know more (including from more advanced measurements and from engineers and extreme overclockers), it is not really a useful indicator of the state of Polaris/Vega.
WattMan is my guess as to what the True Clock Technology is adjusting. I don't see that accounting for the magnitude of the changes Jayz was reporting. It would help, sure, but the change was so significant it's difficult to imagine AMD and the other partners missed it until now.
 
http://wccftech.com/amd-polaris-revisions-performance-per-watt/

Horrible source I know, but ~50% better perf/watt on a new Polaris metal spin. Corresponds to the embedded parts we've seen and seems likely they will be trickling into the mainstream. No reason to keep making the old version given an improved metal spin.


WattMan is my guess as to what the True Clock Technology is adjusting. I don't see that accounting for the magnitude of the changes Jayz was reporting. It would help, sure, but the change was so significant it's difficult to imagine AMD and the other partners missed it until now.

Khalid is seriously jumping the gun IMO and again making assumptions like he did recently with Vega.
I was wondering where to post my thoughts and questions on that latest news linked but here will do.

The previous-generation E8950 MXM module was a Tonga XT based product with 2048 stream processors/32 ROPs that delivered 3 TFLOPs FP32 at under 95W, released Sept-Oct 2015.
That means it is based on either the 280X, 380X or M295X GPU, all of which draw substantially more than 95W, even allowing for their higher TFLOPs and the generation change.
280X was 3.4 TFLOPs with a 250W TDP.
380X was 3.9 TFLOPs with 190W.
M295X was 3.4 TFLOPs with an unknown TDP (not sure how it differs from the 280X in that respect).
M290 (with only 1,280 stream processors; yes, older generation) was 2.3 TFLOPs with 100W.

Now also bear in mind the Polaris RX470 has 3.7 TFLOPs FP32 at a 120W TDP; given how close that is to the old-generation Tonga E8950 MXM (with the additional disadvantage of being 28nm) at 3 TFLOPs and 95W, you can see there is a challenge in correlating discrete to embedded platforms.
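Dividing the quoted throughput by the quoted TDP makes the comparison easier to see. This sketch only reuses the figures from the post: the RX470 at the 3.7 TFLOPs given here (a later edit revises it to 3.8), the E8950 at 95W (an upper bound, since the spec is "under 95W"), and the M295X left out because its TDP is unknown. TDP is a rated envelope rather than measured draw, so this is a coarse comparison.

```python
def gflops_per_watt(tflops, tdp_w):
    """FP32 GFLOPs per watt of rated TDP."""
    return tflops * 1000 / tdp_w


# (part, FP32 TFLOPs, TDP in W) as quoted above
PARTS = [
    ("E8950 MXM (Tonga XT)", 3.0, 95),
    ("280X", 3.4, 250),
    ("380X", 3.9, 190),
    ("M290", 2.3, 100),
    ("RX470", 3.7, 120),
]

# Print the parts from most to least efficient
for name, tflops, tdp in sorted(PARTS, key=lambda p: gflops_per_watt(p[1], p[2]), reverse=True):
    print(f"{name:22s} {gflops_per_watt(tflops, tdp):5.1f} GFLOPs/W")
```

On these numbers the embedded Tonga module (about 31.6 GFLOPs/W) edges out the discrete RX470 (about 30.8), which is the anomaly being pointed at.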

So unfortunately it is not possible to correlate the embedded solutions/technology and conclude this is an overall efficiency-performance improvement for Polaris generally on all platforms.
Cheers
 
It is not possible to correlate the embedded solutions/technology and conclude this is an overall efficiency-performance improvement for Polaris generally on all platforms, otherwise the same could have been said historically when the E8950 MXM was released.
I'm assuming he has a source for the metal spin, but I'm still taking that with a huge grain of salt. It's always possible the new metal spin is at a different fab, but it would seemingly be landing sooner than I'd have expected.
 
I'm assuming he has a source for the metal spin, but I'm still taking that with a huge grain of salt. It's always possible the new metal spin is at a different fab, but it would seemingly be landing sooner than I'd have expected.
You know I spent a fair bit of time making sure I had the right info.
Before defending him, explain the logic of why embedded Tonga uses much less power than the discrete version, or why embedded Tonga is as good as discrete Polaris. Considering the former is also 28nm, either his article is baffling or Polaris was an utter technical disaster at launch (which it was not).
Edit:
The Polaris RX470 has 3.8 TFLOPs FP32 at a 120W TDP; given how close that is to the old-generation Tonga E8950 MXM (with the additional disadvantage of being 28nm) at 3 TFLOPs and under 95W TDP, you can see there is a problem with correlating discrete to embedded platforms.
Cheers
 
You know I spent a fair bit of time making sure I had the right info.
Before defending him, explain the logic of why embedded Tonga uses much less power than the discrete version, or why embedded Tonga is nearly as good as discrete Polaris.
Cheers
Apologies as I'm not defending him. I'm skeptical of his claims as there are no other sources, but claiming a new metal spin would seem difficult to make up. Embedded versions to my understanding were normally clocked lower, although I haven't looked into them very much.
 
Apologies as I'm not defending him. I'm skeptical of his claims as there are no other sources, but claiming a new metal spin would seem difficult to make up. Embedded versions to my understanding were normally clocked lower, although I haven't looked into them very much.
Ah thanks.
But again it breaks down with this simple example:
The Polaris RX470 has 3.8 TFLOPs FP32 at a 120W TDP; given how close that is to the old-generation Tonga E8950 MXM (with the additional disadvantage of being 28nm) at 3 TFLOPs FP32 and under 95W TDP, you can see there is a problem with correlating discrete to embedded platforms.
His article was nonsense and should be ignored IMO, otherwise I might as well say Polaris was crap as it had no improvements over 28nm embedded Tonga.
Cheers
 
The Polaris RX470 has 3.8 TFLOPs FP32 at a 120W TDP; given how close that is to the old-generation Tonga E8950 MXM (with the additional disadvantage of being 28nm) at 3 TFLOPs FP32 and under 95W TDP, you can see there is a problem with correlating discrete to embedded platforms.
Wasn't the 470 around 5TFLOPs at 120W? 480 being 5.8@150W. Correlation doesn't seem all that bad.
 
A new metal spin could explain why the mobile parts are still all missing in action. For something that was allegedly designed for mobile first (in case of Polaris 11) it sure is surprising there's still nothing to be seen.
I'm still sceptical, though...
 
Wasn't the 470 around 5TFLOPs at 120W? 480 being 5.8@150W. Correlation doesn't seem all that bad.
Yeah good point.
I think a bit depends on the card's spec (clocks/boost); what about those that were true 4GB cards and not the rebadged, higher-spec 8GB models?
Also worth remembering it was not 150W but around 165W when tested for the 480.

The 460 is 2.2 TFLOPs at a 75W TDP; scaling that to 3 TFLOPs correlates to 102W.
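The 102W figure is a straight linear scaling of the 460's TDP by FP32 throughput. A minimal sketch of that arithmetic, assuming power scales in proportion to TFLOPs; that only roughly holds if the extra throughput comes from more CUs at the same clock and voltage, whereas getting it from higher clocks would need more voltage and scale worse.

```python
def linear_tdp_estimate(ref_tflops, ref_tdp_w, target_tflops):
    """Naively scale a reference TDP in proportion to FP32 throughput."""
    return ref_tdp_w * target_tflops / ref_tflops


# RX460 (2.2 TFLOPs at 75W) scaled up to the E8950's 3 TFLOPs
print(round(linear_tdp_estimate(2.2, 75, 3.0)))  # -> 102
```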

Edit.
Maybe some sites are incorrectly reporting lower TFLOPs for the 470, or basing it on the base clock (such as videocards), which wrongly skews it *shrug*
The problem is that, with no 'reference' 470 design, it is tricky to do a reasonable comparison; one can only really compare a custom AIB 470 to an equivalent custom AIB 480 model to see how performance and consumption compare.
Anyway, the point that embedded uses substantially less power than discrete still stands.
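Peak FP32 figures are just stream processors × 2 FLOPs (one FMA) × clock, so which clock a site plugs in matters a lot. The SP counts and clocks below are the commonly listed reference specs (RX470: 2048 SPs, 926MHz base/1206MHz boost; RX480: 2304 SPs, 1266MHz boost), not figures from this thread, so treat them as assumptions.

```python
def fp32_tflops(stream_processors, clock_mhz):
    """Peak FP32 TFLOPs: one FMA (2 FLOPs) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6


# Assumed reference specs (not from the thread)
print(f"RX470 base : {fp32_tflops(2048, 926):.2f} TFLOPs")
print(f"RX470 boost: {fp32_tflops(2048, 1206):.2f} TFLOPs")
print(f"RX480 boost: {fp32_tflops(2304, 1266):.2f} TFLOPs")
```

On those assumptions the RX470 comes out at about 3.8 TFLOPs at base clock versus roughly 4.9 at boost, which would account for both numbers quoted above, and the RX480 lands at about 5.8.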

Cheers
 
Wasn't the 470 around 5TFLOPs at 120W? 480 being 5.8@150W. Correlation doesn't seem all that bad.

It being late/early (I was tired), it was wrong of me to muddy the discussion by bringing in the 460 and 470, as the only applicable GPUs with regard to embedded are the RX480/Polaris embedded MXM and the cards associated with the Tonga/previous-gen embedded MXM.
The most accurate and detailed measurements we have to date are from PCPer and Tom's Hardware, and with the reference design they were not hitting peak compute due to thermal/power constraints that require manual tweaking on the 'reference' card (just like with the Nvidia ones).
So both averaged around 1210MHz with 160W-165W.
Overclocking then averaged 1255MHz (when stressed for longer than 3 minutes; for the first 70 seconds it can hold around 1300MHz) at 190W.
TPU reviewed the MSI RX480 at its default custom 1303MHz clock and measured 196W, better than the 'reference' model at such peaks, which is not surprising since keeping temperatures lower helps.
Hardware.fr measured the Sapphire Nitro+ 8GB in its default custom silent mode, locked at 1266MHz, at 156W-189W depending on the game, and in normal mode, worst case 1335MHz with 214W (Witcher 3) and best case 1345MHz with 192W (BF4).
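One crude way to line these measurements up is peak FP32 throughput at the reported average clock divided by the measured board power. The sketch assumes every card is a full 2304 SP RX480 and uses peak FLOPs as a stand-in for real game performance, which it is not; it only shows how much the clock/power combinations move the ratio.

```python
SPS_RX480 = 2304  # assumed RX480 stream processor count (reference spec)


def fp32_tflops(stream_processors, clock_mhz):
    """Peak FP32 TFLOPs: one FMA (2 FLOPs) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6


# (description, average clock MHz, lowest W, highest W) from the figures above
MEASUREMENTS = [
    ("Reference, stock (PCPer/Tom's)", 1210, 160, 165),
    ("Reference, OC sustained", 1255, 190, 190),
    ("MSI RX480 (TPU)", 1303, 196, 196),
    ("Nitro+ silent mode (Hardware.fr)", 1266, 156, 189),
    ("Nitro+ normal, Witcher 3", 1335, 214, 214),
    ("Nitro+ normal, BF4", 1345, 192, 192),
]

for label, clock, w_lo, w_hi in MEASUREMENTS:
    tf = fp32_tflops(SPS_RX480, clock)
    worst = tf * 1000 / w_hi  # higher wattage gives the lower GFLOPs/W bound
    best = tf * 1000 / w_lo
    print(f"{label:34s} {tf:.2f} TFLOPs  {worst:.1f}-{best:.1f} GFLOPs/W")
```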

Just putting this out there because, like many others, I took the earlier real board measurements from these sites to be at the full 1266MHz clock when in fact they were not; ideally we will eventually get a larger base of results from such sites.
It also goes to show how variable the relationship is between the latest algorithms and dynamic power/thermal management technology and games/software/benchmarks, along with how long the test is run.
PCPer/Tom's Hardware/TPU were using a single game that is power demanding for either manufacturer for these measurements.
This post is separate to the power demand/TDP of the embedded MXM platform.
Cheers
 