AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

You mean Fiji isn't still made on the old 28nm? What am I missing here?
 
Good thing Pascal isn't going to come out until next year then, and odds are it's just Maxwell on "14nm" with HBM.
At the risk of having to stuff my foot in my mouth (again) when it comes out: how close is Maxwell to being optimal in terms of perf/W (and to a lesser extent perf/mm²)? I don't think it would be such a terrible product if it's 'just' a shrink to 14nm. Chances are that, adjusted for process, it'd architecturally still beat GCN-Fiji in terms of efficiency.

The "14nm" process from Samsung/TSMC is too complex to change much from the second release of Maxwell otherwise, the lead times on design, tape out, and manufacturing are quite a bit higher than 28nm.
I don't understand. What do the lead times have to do with the complexity of changing the design? As long as they started the design of Pascal early enough, it shouldn't matter, right?

This has little to do with the lead and design times required for 14nm. You can't really just "throw more money at the problem" to switch to a new and far more complex patterning, masking, and manufacturing scheme while simultaneously redesigning a large portion of your GPU pipeline and expect it to "just work".
Why not?
Do you expect the physical design domain of a 14nm process to have significant repercussions on the symbolic world of ones and zeros? Now that would be a huge change indeed, and, if so, it would most certainly require major changes in how the digital logic is designed.
 
At the risk of having to stuff my foot in my mouth (again) when it comes out: how close is Maxwell to being optimal in terms of perf/W (and to a lesser extent perf/mm²)? I don't think it would be such a terrible product if it's 'just' a shrink to 14nm. Chances are that, adjusted for process, it'd architecturally still beat GCN-Fiji in terms of efficiency.

Performance per watt for Fiji is supposed to be pretty close to Maxwell, perhaps even the same or better, as Fiji is rumored to be about as performant as a Titan X while being physically smaller and having more 64-bit compute performance enabled. Or who knows, maybe it's terrible, AMD will finally go bankrupt, Nvidia will have a monopoly on workstation and desktop graphics and suddenly triple all its card prices.

Either way, this has nothing to do with a 14nm Maxwell being better. Of course it'll be better, and it'll probably have a fully compute-enabled Titan X class card out with it thanks to higher transistor density. It's just probably not going to be much of an architectural change.

I don't understand. What do the lead times have to do with the complexity of changing the design? As long as they started the design of Pascal early enough, it shouldn't matter, right?

Why not?
Do you expect the physical design domain of a 14nm process to have significant repercussions on the symbolic world of ones and zeros? Now that would be a huge change indeed, and, if so, it would most certainly require major changes in how the digital logic is designed.

Because you need to physically validate all your boards on the actual process, and 14nm samples haven't even been out that long. So simultaneously validating both a new design and a process change multiplies the amount of work you need to do. It's why Intel trades off architectural improvements and process improvements every other year, so they don't get the two fouled up and end up with long delays.
 
Because you need to physically validate all your boards on the actual process, and 14nm samples haven't even been out that long. So simultaneously validating both a new design and a process change multiplies the amount of work you need to do.
No, it doesn't multiply anything. Process validation is something that is done in parallel to design validation. There's rarely any overlap between the two except for analog blocks (IOs, PLLs, RAMs to a certain extent), but those need to be validated irrespective of whether it's a new architecture or not.

It's why Intel trades off architectural improvements and process improvements every other year, so they don't get the two fouled up and end up with long delays.
Intel has their own reasons, but they're about the only ones who do this. Just look at the history of GPUs and find me an example of where a new process was not accompanied by a new architecture. I can't think of any.
 
No, it doesn't multiply anything. Process validation is something that is done in parallel to design validation. There's rarely any overlap between the two except for analog blocks (IOs, PLLs, RAMs to a certain extent), but those need to be validated irrespective of whether it's a new architecture or not.


Intel has their own reasons, but they're about the only ones who do this. Just look at the history of GPUs and find me an example of where a new process was not accompanied by a new architecture. I can't think of any.

I think 55 nm was a shrink-only node for NVIDIA (GT200B & friends).
 
No, it doesn't multiply anything. Process validation is something that is done in parallel to design validation. There's rarely any overlap between the two except for analog blocks (IOs, PLLs, RAMs to a certain extent), but those need to be validated irrespective of whether it's a new architecture or not.


Intel has their own reasons, but they're about the only ones who do this. Just look at the history of GPUs and find me an example of where a new process was not accompanied by a new architecture. I can't think of any.

40nm? RV740
 
So you're either suggesting any board already validated on one process can be validated on a totally different process instantly and for free, or you're suggesting that simultaneously designing a new chip on a new process won't create extra work in disentangling whether the results were from the new design or the new process.

And once again you're ignoring that 20nm (as it should be named, given its feature size and backend) FinFET adds a lot of new steps that designers are unfamiliar with, vastly increasing complexity and cost over the already complex transition from one node to another. Yet somehow Nvidia will be able to, without doubling their engineering staff or more, completely redesign their GPU architecture while simultaneously learning all the extra complexity involved with multiple patterning and the other steps needed for the transition to FinFET. Oh, and they'll do this despite both previous architectures, Kepler and Fermi, taking up two series of cards apiece, while Maxwell only gets one series because why not? Combined with Nvidia's own hype around Pascal relying solely on mixed-precision compute, HBM/unified memory, and NVLink support, well, you've convinced me. Obviously there's no reason Nvidia would want to toot its own horn about any other architecture upgrades whatsoever.
 
Intel has their own reasons, but they're about the only ones who do this. Just look at the history of GPUs and find me an example of where a new process was not accompanied by a new architecture. I can't think of any.
Actually it is quite common for NVIDIA to trial a process on a known architecture before moving to a new architecture, but there generally isn't much fanfare about the first part because it is often a small part, which is out of the limelight. The prime example is the last transition to 28nm: while Kepler got all the headlines, their first 28nm chip was actually GF117, a Fermi architecture base without display, targeted at notebooks. Memory fails me, but the best I can see from quick googling suggests that NVIDIA's first 28nm chip was actually GT218 (again entry level/notebook focused) before Fermi took the press interest. Their 55nm transition was done a little higher up the stack, with G92B, but this was more or less a straight die shrink of G92, while the mainstay of their 55nm products was the Tesla line.

Assuming GT218 is correct, that's the last three processes accounted for with NVIDIA. I'm sure there are more examples further back, to the point that you could characterise it more as the norm for NVIDIA.
 
Actually it is quite common for NVIDIA to trial a process on a known architecture before moving to a new architecture, but there generally isn't much fanfare about the first part because it is often a small part, which is out of the limelight. The prime example is the last transition to 28nm: while Kepler got all the headlines, their first 28nm chip was actually GF117, a Fermi architecture base without display, targeted at notebooks. Memory fails me, but the best I can see from quick googling suggests that NVIDIA's first 28nm chip was actually GT218 (again entry level/notebook focused) before Fermi took the press interest. Their 55nm transition was done a little higher up the stack, with G92B, but this was more or less a straight die shrink of G92, while the mainstay of their 55nm products was the Tesla line.

Assuming GT218 is correct, that's the last three processes accounted for with NVIDIA. I'm sure there are more examples further back, to the point that you could characterise it more as the norm for NVIDIA.

And their first 20nm chip was a SoC with the smallest Maxwell we know of, Tegra X1.
 
So you're either suggesting any board already validated on one process can be validated on a totally different process instantly and for free,
Since when is 'in parallel' the same as 'for free'?

or you're suggesting that simultaneously designing a new chip on a new process won't create extra work in disentangling whether the results were from the new design or the new process.
Yes, that's exactly what I'm saying.

Because there really isn't much disentangling to do. The issues that are process-related are things like:
- a PLL doesn't perform as expected in all corners.
- IO behavior (rise/fall times, threshold voltage, ...) is not right.
- RAMs don't operate at the expected speed or at the expected low voltages.
- ESD structures aren't strong enough.

Even if they impact the digital world of ones and zeros in some way, it's usually trivial to tell that the problem is non-digital: things start to work when you increase or decrease the voltage or temperature.

The things where analog effects can really mess up digital behavior are crosstalk between wires and faulty timing constraints, but those are almost always due to human error and, in my experience, just as common with new designs on old processes as with old designs on new processes.

And once again you're ignoring that 20nm (as it should be named, given its feature size and backend) FinFET adds a lot of new steps that designers are unfamiliar with, vastly increasing complexity and cost over the already complex transition from one node to another.
You're lumping all designers together as if they all have to deal with it. It doesn't work that way. Or rather: it doesn't work that way for companies other than Intel, where, AFAIK, handcrafted custom design is still a thing.
The design rules of FinFET are a problem for the designer who lays out a RAM or standard cells. They don't impact the designer whose job it is to write 'a <= b * c;'.
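To make that concrete, here is a minimal, hypothetical RTL sketch (written in Verilog, on the assumption that 'a <= b * c;' is meant as that flavor of HDL). Nothing in it mentions the process node; the synthesis tool maps it onto whatever standard-cell and RAM libraries it is handed, and that is where the 28nm-versus-FinFET differences live.

// Hypothetical illustration: a registered multiplier at the RTL level.
// Nothing here is process-specific; the same source can be synthesized
// against a 28nm planar library or a 14/16nm FinFET library.
module mul_reg #(
    parameter WIDTH = 16
) (
    input  wire                clk,
    input  wire [WIDTH-1:0]    b,
    input  wire [WIDTH-1:0]    c,
    output reg  [2*WIDTH-1:0]  a
);
    always @(posedge clk) begin
        a <= b * c;   // the line from the post, unchanged by the node
    end
endmodule

The module name and widths are made up for illustration; the point is only that the process shows up in the libraries and the physical design flow, not in code like this.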

Yet somehow Nvidia will be able to, without doubling their engineering staff or more, completely redesign their GPU architecture while simultaneously learning all the extra complexity involved with multiple patterning and the other steps needed for the transition to FinFET.
Multiple patterning? You just described the job of TSMC, not Nvidia.

Oh, and they'll do this despite both previous architectures, Kepler and Fermi, taking up two series of cards apiece, while Maxwell only gets one series because why not?
I have no idea what point you're trying to make here. Are you suddenly mixing marketing and engineering?

Combined with Nvidia's own hype around Pascal relying solely on mixed-precision compute, HBM/unified memory, and NVLink support, well, you've convinced me. Obviously there's no reason Nvidia would want to toot its own horn about any other architecture upgrades whatsoever.
How is that relevant to the rest of the discussion? I thought we were talking about technical aspects, and suddenly you're descending dangerously close to fanboy-related topics.
 
Actually it is quite common for NVIDIA to trial a process on a known architecture before moving to a new architecture, but there generally isn't much fanfare about the first part because it is often a small part, which is out of the limelight. The prime example is the last transition to 28nm: while Kepler got all the headlines, their first 28nm chip was actually GF117, a Fermi architecture base without display, targeted at notebooks. ... I'm sure there are more examples further back, to the point that you could characterise it more as the norm for NVIDIA.
Let's leave aside GT218, where the timing just doesn't match for 28nm to be ready.

Though it may seem like it, I'm not trying to argue that a new process preferentially means a new architecture. My point is that it's wrong to assume causation between a new process and there being (or not being) a new architecture.

They are simply unrelated. Let's take the case of GF117, whatever that chip may be. The question you have to ask is: how long was it taped out before the first Kepler chip was taped out? Early 28nm chips took months in the fab, so the gap between GF117 coming back from the fab and Kepler taping out can't have been all that big.

If that's the case, what benefit can you get from having your smaller design early?
My answer is:
1) you get some schedule benefit: you can get a head start on some analog characterization work, which always takes a long time.
2) since the digital validation of the older architecture has already been completed, that part will finish early as well. So you get the benefit of a small chip with good yields, which is nice to have for low-margin products.
3) you get a potential early warning if there's something catastrophically wrong with the new process. You could hold off your tape-out. That's more of an insurance policy.

What you don't get is what Pony is arguing: that it somehow solves a complexity problem. The issues you run into due to a new process and those due to a new architecture play in different worlds that rarely interact.

Let me tell you about one case where it did happen: a single unreliable via due to a defect in a mask. It wasn't that there was no contact; it was just unreliable. The fact that it was not on a new process actually made things harder: on a new process you at least keep the possibility in the back of your mind that the problem is not digital; on an old process you do not. It's also much harder to convince the fab to look into it.
 
Why not?
Do you expect the physical design domain of a 14nm process to have significant repercussions on the symbolic world of ones and zeros? Now that would be a huge change indeed, and, if so, it would most certainly require major changes in how the digital logic is designed.
Maybe at 4nm...

Performance per watt for Fiji is supposed to be pretty close to Maxwell, perhaps even the same or better, as Fiji is rumored to be about as performant as a Titan X while being physically smaller and having more 64-bit compute performance enabled.
And the rumor mill is harping on it having a water cooler as standard, so there's that.

Or who knows, maybe it's terrible, AMD will finally go bankrupt, Nvidia will have a monopoly on workstation and desktop graphics and suddenly triple all its card prices.
I don't think that's fair to Maxwell or Fiji. It's unlikely Fiji is a big enough revenue driver, even if it's rather successful, to change the trajectory, if that's what the trajectory is.

Either way, this has nothing to do with a 14nm Maxwell being better. Of course it'll be better, and it'll probably have a fully compute-enabled Titan X class card out with it thanks to higher transistor density. It's just probably not going to be much of an architectural change.
The hybrid nodes promise, over 28nm, the effective doubling of density and power efficiency that a traditional node transition used to provide. That is a lot of leeway for GPU architectures, which scale up and down in resources like nobody's business.
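As a rough back-of-envelope sketch of what that doubling buys (the 2x factors are just the claimed node scaling above, and the ~8 billion transistors on ~600 mm² starting point is only meant as an illustrative big 28nm GPU, not a specific product):

\[ N_\mathrm{new} \approx 2\,N_\mathrm{28nm} \ \text{(same die area)}, \qquad P_\mathrm{new} \approx \tfrac{1}{2}\,P_\mathrm{28nm} \ \text{(same configuration)} \]

So roughly 8 billion transistors on ~600 mm² at 28nm would become ~16 billion transistors in the same area, or the same design at about half the power, or some mix of the two; and since GPUs scale mostly by replicating shader blocks, that budget can be spent without any architectural change at all.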
 
Let's leave aside GT218, where the timing just doesn't match for 28nm to be ready.

I think that's just a typo, and I believe GT218 was NVIDIA's first 40nm GPU, hence Dave's comment that the aforementioned chips account for the last three nodes.
 
I vividly recall ATI doing similar things with forward lithography processes - all new processes were used for the tiny, high-volume, high-profit-margin and yet simplistic chips. As the lithography process matured, the larger chips would show up on that process.
 
I think that's just a typo, and I believe GT218 was NVIDIA's first 40nm GPU, hence Dave's comment that the aforementioned chips account for the last three nodes.
Yeah, sorry, typo. "Tesla" was primarily on 55nm, with GT218 moving to 40nm before "Fermi" filled out the 40nm stack/node.
 
I vividly recall ATI doing similar things with forward lithography processes - all new processes were used for the tiny, high-volume, high-profit-margin and yet simplistic chips. As the lithography process matured, the larger chips would show up on that process.

Yes, sort of.

If memory serves, RV670 was AMD's first 55nm GPU, and it was around 192mm², I think. Not exactly low-end, but not big either, even though it was pretty much the fastest thing they had at the time.
Then RV740 was their first 40nm GPU and that was very much a mainstream chip (around 140mm² I think). But Tahiti was their first 28nm chip, and it was quite big.

It also had questionable I/O performance relative to its physical size when compared to Pitcairn, so maybe going big with the first chip on a new process isn't such a great idea after all.
 
That may be what I was remembering, but to be honest, I'd have to go searching to make my memory pop back into place. I thought there were several iterations of small stuff (GPU) on the small stuff (lithography) before moving up to large stuff (GPU) on the small stuff (lithography).

I'll defer to you for now :)
 
Yeah, sorry, typo. "Tesla" was primarily on 55nm, with GT218 moving to 40nm before "Fermi" filled out the 40nm stack/node.
That makes more sense.

A different point is: what do you call a new architecture anyway? Was GT200 a new architecture? It didn't add a whole lot compared to G80, except for FP64 support. And if it was, then what about GT218? Is FP64 a more complicated jump than DX10.1?
I think we can agree that Fermi and Kepler and big Maxwells were major jumps, but what about gm107?

If Fiji still has a largely unmodified GCN core, an evolution instead of a major overhaul, is it still the same architecture as Tahiti?

IMO it doesn't really matter. If smaller chips are created on a new process long before a new architecture is taped out on the same process, there could be some minor benefits. But I suspect that the prime reason for doing it in that order is yields on the new process: the smaller chip has a good chance of having profitable yields right from the start, while the big one doesn't.
 