AMD: Speculation, Rumors, and Discussion (Archive)

I disagree with your analysis of the curve beyond the sweetspot - can't think of anything else to say.

Additionally, the sweetspot is at a higher frequency for FinFET, which is precisely what everyone wants and expects.

AMD chose to end the FinFET curve where it did for some reason, which might be something to look out for later.
After the sweetspot the process is effectively in "runaway". The gradient on the FinFET curve at its top-most power is steeper (i.e. more power per unit of frequency) than the 28nm curve.
I think it's probably fair to say that AMD has been running its 28nm GPUs further along the curve than NVidia. Any clawback that AMD benefits from with the higher sweetspot on FinFET is one that benefits NVidia at least equally. And, arguably, NVidia gains even more - as NVidia is running at typically 20% higher clocks (at least in portions of the GPU, who knows if clocks across the entire GPU are 1:1). The gentler gradient after sweetspot might take NVidia to 25% higher clocks for example.
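As a toy illustration of the "runaway" being described here (entirely made-up coefficients, not AMD's actual curve data): dynamic power scales roughly as P ∝ f·V², and past the sweetspot the voltage needed for each extra step of frequency rises quickly, so power climbs much faster than frequency.

```python
# Toy model of a frequency/power curve, purely illustrative -- the
# coefficients below are made up and are not AMD's or anyone's real data.
# Dynamic power scales roughly as P ~ f * V^2; past the "sweetspot" the
# voltage needed to sustain a given frequency rises steeply, so power
# "runs away" while frequency barely improves.

def voltage_for_frequency(f_ghz, sweetspot_ghz=0.85, v_min=0.80):
    """Assumed V/f relationship: shallow below the sweetspot, steep above it."""
    if f_ghz <= sweetspot_ghz:
        return v_min + 0.10 * (f_ghz / sweetspot_ghz)
    # Above the sweetspot, each extra 100MHz needs disproportionately more voltage.
    return v_min + 0.10 + 0.45 * (f_ghz - sweetspot_ghz)

def relative_power(f_ghz):
    v = voltage_for_frequency(f_ghz)
    return f_ghz * v * v  # arbitrary units

if __name__ == "__main__":
    for f in (0.6, 0.8, 0.85, 0.9, 1.0, 1.1, 1.2):
        print(f"{f:.2f} GHz -> relative power {relative_power(f):.3f}")
    # The printout shows the gradient (extra power per extra frequency)
    # increasing noticeably beyond the sweetspot -- the "runaway" region.
```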

At least for CPUs, which can push things further, the margin of improvement does shrink somewhat as the curves flatten out.
CPUs can push further because practically none of the die is active at Fmax, especially in single-threaded work. Heavy multi-threaded AVX-512 usage leads to massive amounts of throttling in Intel CPUs.

Somewhere around 0.5-0.7, there's a point where AMD could spend 2x the transistors at 2/3 the overall Fmax, and given where 28nm reaches those speeds it might be a tie or a small win for FinFET in absolute power. That's about 4 units on the Y axis out of maybe a reasonable max of 6 for 28nm, and I think there is evidence from other GPUs that AMD has had trouble sustaining, or getting much benefit from, that portion.
2x the transistors costs real money (AMD's profit margin) in terms of dies per wafer. Relative power costs AMD's users money. If AMD can persuade them to buy...

AMD might be taking advantage of the frame cap, if it can drop the GPU into the portion of the FinFET curve below 28nm's minimum - an advantage that might be lost if the frame rate were allowed to vary more.
The frame cap should at least mean that both GPUs are running at their max frequency.
 
Now, if 850MHz is the "base clock", then indeed, base clock to base clock, the 950 has a 21% clockspeed advantage. But if we're going by fastest clock speeds, that's a 40% clockspeed advantage for the 950, using AMD's card as the base.
AMD's specified engine clock is its boost clock. AMD clocks only go in one direction, down, and are never guaranteed.

NVidia specifies a base clock (20% higher than AMD's boost clock in this case) with a boost clock 40% higher, as you say. In reality the 60fps vsync means that both cards are running at maximum clocks, so NVidia is likely running at 40% higher clocks.
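For what it's worth, those percentages line up with the reference GTX 950 clocks; a quick sanity check, assuming 1024MHz base / 1188MHz boost for the 950 against AMD's quoted 850MHz:

```python
# Quick sanity check of the clock-advantage percentages quoted above,
# using the reference GTX 950 clocks (1024MHz base, 1188MHz boost)
# against the 850MHz AMD quoted for the Polaris demo card.
amd_clock = 850
gtx950_base, gtx950_boost = 1024, 1188

print(f"base vs 850MHz:  {gtx950_base / amd_clock - 1:.1%}")   # ~20%
print(f"boost vs 850MHz: {gtx950_boost / amd_clock - 1:.1%}")  # ~40%
```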

What I cannot find anywhere on teh intarwebz are benchmarks for GTX950 in this scenario. And then there's the question of how strongly this game favours AMD.

Maybe someone out there can put together some 28nm metrics for Battlefront Medium settings on this training mission in terms of performance per mm² and performance per watt. And be careful to use a mixture of Maxwell and Maxwell v2 GPUs to compare with Tonga.

So right now we have no idea which GPU is closer to its performance limit in this test.

We also haven't a clue as to how many transistors are actually in the AMD card, as the only report so far is that the "card looks really small", and even if you could measure it with a tape measure, that wouldn't tell you the average transistor density used for this particular GPU. "Double the transistor density" is only a guideline, and one that applies only if you go for straight density with no regard to clockspeed/power draw.
I strongly suspect there's rather more than double the transistor density. Spending transistors on better performance through architecture is completely normal in GPUs. Spending transistors to compensate for low clocks (witness 28nm) and still losing on performance per watt led to AMD comprehensively losing at 28nm (no matter how many 7970s were sold for bitcoin mining).

One could argue that the Gameworks bullshit was explicitly designed to knock back AMD: making performance per watt and per mm² hurt even more. "Neutral code" would have shown AMD being much more competitive. But we still see games (e.g. Battlefield 4) that hammer AMD, despite "being written for AMD"...
 
Additionally, the sweetspot is at a higher frequency for FinFET, which is precisely what everyone wants and expects.
I have not disputed that it is higher. It's just that the improvement over the prior node at that power point is not of the same order of magnitude as earlier in the curve, and it starts to become less steep before the 1 unit mark. I consider this to be a power-limited scenario and so tend to make comparisons at the same power points, unless AMD decides to leave performance on the table.

After the sweetspot the process is effectively in "runaway".
I agree that that is after the sweetspot, but I wouldn't consider the point of a curve just short of runaway to be the ideal.

Any clawback that AMD benefits from with the higher sweetspot on FinFET is one that benefits NVidia at least equally. And, arguably, NVidia gains even more - as NVidia is running at typically 20% higher clocks (at least in portions of the GPU, who knows if clocks across the entire GPU are 1:1). The gentler gradient after sweetspot might take NVidia to 25% higher clocks for example.
Quite possibly; I have only been referencing AMD's claims for the prior and upcoming process.
Its curve, for whatever process or amalgam of processes it represents, shows planar-versus-FinFET behavior similar to the claims Intel made back when it made the transition.

CPUs can push further because practically none of the die is active at Fmax, especially in single-threaded work. Heavy multi-threaded AVX-512 usage leads to massive amounts of throttling in Intel CPUs.
And even they generally draw the line where the FinFET process would have become a significant regression relative to a planar process.

2x the transistors costs real money (AMD's profit margin) in terms of dies per wafer. Relative power costs AMD's users money. If AMD can persuade them to buy...
In a power-limited regime, power determines performance and which parts of the market the chip can address, which is a significant factor in how many customers AMD can entice, and at what price, relative to competitors and the now-discounted prior generations.

The frame cap should at least mean that both GPUs are running at their max frequency.
The 850MHz figure AMD gives for clocks sounds like a target that PowerTune is operating around.
 
Maybe someone out there can put together some 28nm metrics for Battlefront Medium settings on this training mission in terms of performance per mm² and performance per watt. And be careful to use a mixture of Maxwell and Maxwell v2 GPUs to compare with Tonga.
I already posted some numbers here - https://forum.beyond3d.com/posts/1889628/
In my experience, the X-wing training mission has a higher framerate than most of the landscape missions, so the average framerate for a reference 950 is most likely around 94 fps.
 
I'm talking about the implications for enthusiast discrete: utterly miserable. 30% more performance from the node change, before architectural improvements, after 5 years is just horrible.

That's why you don't go with the speed improvement, you go with the power savings.
With 14nm FinFET you get ~50-60% power savings by staying at roughly the same frequency, so you can double your transistor count, getting somewhere around twice the performance of the last generation in the same package, at roughly the same die size and power consumption.
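Taking those marketing numbers at face value, the back-of-envelope looks like this (a sketch of the trade-off, not a measurement):

```python
# Back-of-envelope for "double the transistors at the same power",
# taking the quoted ~50-60% power-per-transistor saving at face value.
# These are marketing numbers, not measurements.
old_power = 1.0                      # last-gen GPU power, normalised
for saving in (0.50, 0.60):
    per_transistor = 1.0 - saving    # new power per transistor vs old
    new_power = old_power * per_transistor * 2.0   # 2x the transistors
    print(f"{saving:.0%} saving -> 2x transistors at {new_power:.2f}x the power")
# Prints 1.00x and 0.80x: roughly the same power envelope (or a bit under
# it) for about twice the transistor count at the same frequency.
```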

Edit- To clarify, the "30% more performance" refers to an increase in transistor speed, not overall performance improvements for GPUs.

2nd Edit (because I think I see what happened)-

Ryan's quote is misleading.
Ryan Shrout said:
AMD’s Joe Macri stated, during our talks, that they expect this FinFET technology will bring a 50-60% power reduction at the same performance level OR a 25-30% performance increase at the same power. In theory then, if AMD decided to release a GPU with the same power consumption as the current Fury X, we might see a 25-30% performance advantage.

What he is forgetting to take into account is the die size. If AMD just shrank Fury X, theoretically, it would be <300mm² at <150W. If they wanted to make that <300mm² die use ~250-275W, they could just increase the frequency by 30% and call it a day.
But has either Nvidia or AMD ever done that in the past? No, they typically try to find the sweet spot in regard to both metrics. For example, if they find they can reduce power per transistor by ~40-45% while also increasing frequency by ~10-15%, that would still allow a near doubling of transistor count.

~16B transistors, ~500-550mm², ~1.1-1.2GHz, <275W
That should be good for close to 2x over Fury X performance.
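For reference, here is that back-of-envelope spelled out, using the public Fury X figures (8.9B transistors, ~596mm², 275W, 1050MHz) and the scaling assumptions above; purely speculative arithmetic, not a prediction:

```python
# Speculative back-of-envelope for a "Fury X successor" on 14/16nm FinFET,
# using public Fury X figures (8.9B transistors, ~596mm^2, 275W, 1050MHz)
# and the assumptions from the post above: ~2.3x density, ~40-45% lower
# power per transistor, ~10-15% higher clocks, ~1.8x the transistors.
fury_x = {"transistors_b": 8.9, "area_mm2": 596, "power_w": 275, "clock_mhz": 1050}

transistor_scale = 1.8                 # ~16B transistors
density_scale = 2.3                    # claimed 14nm FinFET vs 28HPM density
power_per_transistor = (0.55, 0.60)    # 40-45% reduction
clock_scale = (1.10, 1.15)             # 10-15% higher clocks

transistors = fury_x["transistors_b"] * transistor_scale
area = fury_x["area_mm2"] / density_scale * transistor_scale
power_lo = fury_x["power_w"] * power_per_transistor[0] * transistor_scale
power_hi = fury_x["power_w"] * power_per_transistor[1] * transistor_scale
clock_lo = fury_x["clock_mhz"] * clock_scale[0]
clock_hi = fury_x["clock_mhz"] * clock_scale[1]

print(f"~{transistors:.0f}B transistors, ~{area:.0f}mm^2")
print(f"~{clock_lo:.0f}-{clock_hi:.0f}MHz, ~{power_lo:.0f}-{power_hi:.0f}W")
# Roughly 16B transistors at ~466mm^2, ~1155-1208MHz and ~272-297W -- in
# the same ballpark as the ~500-550mm^2 / <275W guess, though only the
# optimistic end of the power saving lands under 275W (this assumes the
# per-transistor power reduction already absorbs the clock increase).
```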

I find it hard to believe Ryan didn't know Joe was specifically talking about pure 14nm FinFET transistor improvements...
 
I already posted some numbers here - https://forum.beyond3d.com/posts/1889628/
In my experience, the X-wing training mission has a higher framerate than most of the landscape missions, so the average framerate for a reference 950 is most likely around 94 fps.
Both systems were capped to 60 FPS with v-sync

There's something wrong with the system description. It says "Core i7 4790k" with 4x4 DDR4 2600. That's just not possible as Haswell and the Z97 boards don't support DDR4.
Furthermore, the numbers don't add up. A supposedly lower-power system with the GTX 950 was found to consume close to 160W in Dragon Age Inquisition (same FrostBite 3 engine).
The slide has an error in it regarding the memory; their video about it correctly says DDR3.

 
Well yeah, that's the kicker, lol. AMD isn't going to show performance comparisons against any currently available graphics card this early in production; that would give the competition too much opportunity to formulate a marketing/sales plan, and possibly even adjust production, though it might be too late for any major production changes.
 
Thanks, so at a first approximation, R9 270X is about the same performance. 1050MHz, 1280 ALU lanes, 180GB/s, 212mm². R9 285 is 359mm², has about the same bandwidth and is about 13% faster. Sadly, I think this benchmark is too easy for GPUs of this spec.
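A crude perf-per-area comparison from those two data points (taking the ~13% figure and the quoted die sizes at face value, and ignoring the frame cap):

```python
# Crude perf-per-mm^2 comparison from the figures above: R9 270X at 212mm^2
# as the baseline, R9 285 (Tonga) at 359mm^2 and ~13% faster.
r9_270x = {"area_mm2": 212, "relative_perf": 1.00}
r9_285 = {"area_mm2": 359, "relative_perf": 1.13}

ratio = (r9_285["relative_perf"] / r9_285["area_mm2"]) / \
        (r9_270x["relative_perf"] / r9_270x["area_mm2"])
print(f"R9 285 perf/mm^2 relative to R9 270X: {ratio:.2f}")  # ~0.67
```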
 
I know, but I wonder why some people here have already decided that the demonstrated Polaris GPU is a match for the GTX 950 in uncapped performance.

Because they explicitly said so in the video.
 
Battlefront appears to be around average for the performance difference between Nvidia and AMD, maybe slightly better for AMD than average but not by anything like a significant amount: http://www.tomshardware.com/news/star-wars-battlefront-pc-benchmarks,30608.html

Since the clock speed is so drastically low, the demo seems bent on showing off the most power savings it can; not surprising given the PR focus on "low power" and the mentions of targeting laptops. If we can assume a clockspeed 40-50% higher when targeting more performance (reasonable given AMD's charts seem to show 1.2-1.3GHz as the higher end of the curve, but there's no actual marking, so who knows), that would place it at the higher end of the range between a 960 and a 970. Not a bad place to be for a low-end GPU, given a 970 is not quite enough for 1440p 60fps gaming as it is (Battlefront has it at <60fps average on Ultra) but more than enough for 1080p.

Given that Nvidia seems to have just outed a Pascal GPU that targets roughly a 970 (4 teraflops per GPU, according to their X2 car thing, right around the 970 mark), it'll be interesting to see, assuming both assumptions are correct, how each is positioned against the other.
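The "right around the 970 mark" reading checks out against the reference GTX 970's theoretical FP32 rate (1664 CUDA cores at a 1178MHz boost clock, counting an FMA as two FLOPs):

```python
# Theoretical FP32 throughput of a reference GTX 970, for comparison with
# the ~4 TFLOPS-per-GPU figure mentioned above.
cuda_cores = 1664
boost_clock_ghz = 1.178
flops_per_core_per_clock = 2      # one FMA counts as two FLOPs

tflops = cuda_cores * flops_per_core_per_clock * boost_clock_ghz / 1000
print(f"GTX 970 peak FP32: ~{tflops:.2f} TFLOPS")   # ~3.92 TFLOPS
```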
 
What he is forgetting to take into account is the die size. If AMD just shrank Fury X, theoretically, it would be <300mm² at <150W. If they wanted to make that <300mm² die use ~250-275W, they could just increase the frequency by 30% and call it a day.
But has either Nvidia or AMD ever done that in the past? No, they typically try to find the sweet spot in regard to both metrics. For example, if they find they can reduce power per transistor by ~40-45% while also increasing frequency by ~10-15%, that would still allow a near doubling of transistor count.

~16B transistors, ~500-550mm², ~1.1-1.2GHz, <275W
That should be good for close to 2x over Fury X performance.
I think you're right and I've just over-reacted. What you've described is what has happened with node transitions historically.
 
What he is forgetting to take into account is the die size. If AMD just shrank Fury X, theoretically, it would be <300mm² at <150W. If they wanted to make that <300mm² die use ~250-275W, they could just increase the frequency by 30% and call it a day.
But has either Nvidia or AMD ever done that in the past? No, they typically try to find the sweet spot in regard to both metrics. For example, if they find they can reduce power per transistor by ~40-45% while also increasing frequency by ~10-15%, that would still allow a near doubling of transistor count.

~16B transistors, ~500-550mm², ~1.1-1.2GHz, <275W
That should be good for close to 2x over Fury X performance.

There are some problems here; specifically, with the new FinFET nodes you don't get both a 2x transistors-per-area shrink and an improvement in clockspeed per unit of power. You could, theoretically, shrink a Fury/Titan X down to 300mm², but you'd end up with roughly similar power usage as before. A tradeoff will need to be made, and only Nvidia and AMD know what tradeoffs they've chosen for the moment.

There's also the consideration that, for this upcoming first generation at least, yields from the new nodes appear to be relatively bad. That means a huge die, in other words anything >500mm² and possibly anything >450mm², won't be appearing until these two nodes turn out more valid chips than they do now. Fortunately you can still increase transistor density quite a bit while running at a higher frequency for less power, so even with these limitations FinFET will still provide ample opportunity for improvement. But there's not going to be anything like a straight doubling of performance from either Nvidia or AMD for the first run of FinFET, and possibly not the second.
 
There are some problems here; specifically, with the new FinFET nodes you don't get both a 2x transistors-per-area shrink and an improvement in clockspeed per unit of power. You could, theoretically, shrink a Fury/Titan X down to 300mm², but you'd end up with roughly similar power usage as before. A tradeoff will need to be made, and only Nvidia and AMD know what tradeoffs they've chosen for the moment.

That would require zero power scaling from going to the FinFET node. Poor (still non-zero) scaling in terms of power/transistor was why 20nm was so disappointing and why FinFET has been so eagerly awaited. They do get the density increase and improve power per transistor, due to the structural change of the devices.
 
That would require zero power scaling from going to the FinFET node. Poor (still non-zero) scaling in terms of power/transistor was why 20nm was so disappointing and why FinFET has been so eagerly awaited. They do get the density increase and improve power per transistor, due to the structural change of the devices.

Exactly. They get both the density improvement and the speed or power savings, obviously not to the full degree of the marketing numbers, but hopefully pretty close. That's why I said "theoretically" with my die size and power numbers; things don't scale down perfectly, but I was using a best-case scenario based on the marketing numbers.

I believe I heard/read that Samsung/GF 14nm FinFET is supposed to be ~2.3x the density of 28HPM.

Edit- And I'm aware that my 500-550mm² GPU on 16/14nm FinFET won't be seen anytime soon. It would likely be 1H '17 at the earliest. I was just speculating on what a true Fury successor could look like on these new nodes.
 