Forbes: AMD Created Navi for Sony PS5, Vega Suffered [2018-06] spawn

dobwal · Jan 30, 2019

Do we need a long drawn out debate over semantics involving a couple words from an article that’s nothing more than rumor?

Designing a completely unique, exclusive arch for the PS5 would involve Sony swallowing hundreds of millions in R&D costs by itself. That is completely contrary to AMD's goal for its semi-custom business, which is to spread its core R&D costs across multiple businesses.

w0lfram · Jan 30, 2019

Silent_Buddha said:
Which...

And since there's no actual attributable quote there's no way to know if the "source" was just generalizing all console efforts into a pool and the article writer then assuming it means Sony/PS.

Considering no other industry source has corroborated what Forbes has said...

It's far more likely that the shift in resources was a response to ALL contracted console manufacturers wanting custom parts and not just Sony.

Especially if you consider that at the time this all supposedly happened, Microsoft was by far AMD's largest customer for custom console graphics. They were a proven partner with a proven track record and had just basically matched Sony console for console.

In other words, past history said that Microsoft was a better bet for profitability for AMD than Sony.

Either way, had ANY other reputable tech. sites corroborated with their sources what Forbes wrote, then it'd at least have a tiny bit of credibility that Sony was commanding that amount of RTG resources at the expense of other AMD partners.

But there isn't. So people are left grasping at straws...no wait...grasping at one straw to justify Sony being the prime beneficiary of that much of RTG's engineering resources. It's far more plausible that that amount of engineering resources was in fact reallocated to all custom console requests (Sony, MS, and whoever else wants custom SoCs).

No-one disputes that Sony have asked for and gotten customizations that are specific to their implementation. But so has Microsoft and presumably any other semi-custom partners that they have.

Regards,
SB

Your entire post goes against AMD's business philosophy!

AMD is moving its business model toward providing solutions to its customers using its own patented chiplet design (and heterogeneous design). (See China/SONY/MS/etc.) Dr. Su herself has talked about working closely with customers, at some length (integrated), to be able to provide them with a custom design suiting their exact needs/wants/goals.

3dilettante said:
{snip}
What is "re-taping"?
{snip}

*sigh*

Navi had an issue & had to be "re-taped" out. Yes that deep of an issue, but easily remedied as someone has been told. (Navi will be an Aug product, instead of April.)

The *thread spawn* comes from me asking, "What if... NAVI got additional updates learned from working with the likes of SONY/MS/etc., and snuck a few of those "updates" into the newly taped-out uArch?"

It (thread spawn) was a very simple question, yet people unfolded and went off on different tangents.

BRiT · Jan 31, 2019

w0lfram said:
Your entire post goes against AMD's business philosophy!

You entirely misunderstood that post and the concept of what AMD's customizations provide.

Think of it more like a Burger King. Yes, you can customize your order your way, but there is no way to have them come up with a Steak and Lobster dinner. They simply don't have those as ingredients.

3dilettante · Jan 31, 2019

w0lfram said:
*sigh*

Navi had an issue & had to be "re-taped" out. Yes that deep of an issue, but easily remedied as someone has been told. (Navi will be an Aug product, instead of April.)

Tapeout is a late step in bringing a design to manufacture, where the masks are being developed from a database that has gone all the way through the design process and pre-production validation. That's after the chip is ideally as final as the designers can manage. In a lucky scenario, that initial revision is good enough to go after the first chips come back and are validated.

If there are issues after tapeout, those are generally from the initial production samples. Tweaks and subsequent passage of test wafers are called respins.
The changes at that point are intended to be minimal, since a significant investment in resources and time went into verifying the chip in its taped-out form, and the further alterations go from the later metal layers down to the fundamental transistor layer, the more weeks or months can be needed to start mass production.

Adding design features like semi-custom architectural changes threatens to throw out substantial portions of the base design and to negate much of the engineering and verification, because the silicon would no longer correspond to what was engineered and tested for months or potentially years.

The *thread spawn* comes from me asking, "What if... NAVI got additional updates learned from working with the likes of SONY/MS/etc., and snuck a few of those "updates" into the newly taped-out uArch?"

In this scenario, perhaps it could be argued that the uArch is dead, and to be replaced by another one. Perhaps if this new uArch was already in progress and could be used to replace the rejected one, there would be a lesser delay. Deciding this at the point of tapeout means being a few months from ramping and then setting the clock back to an intermediate point in a 3-4 year process.
I'm not sure what would be compelling enough to go that far back on a design that didn't have a serious need to be reset further back than tapeout or just replaced for other reasons.

w0lfram · Jan 31, 2019

Yes, that is why I mentioned "respin".

And nothing you have suggested refutes what is now widely reported, or what I have said. The scenario I suggested didn't hint at Navi being dead/broken, etc., as you've suggested. It was delayed by 4 months until they can hand out samples from the respin (after the re-taping).

Given that^ tidbit/fact..
I had asked; What if.. AMD reworked other aspects of Navi's uArch (for this new re-spin), perhaps simple lessons learned from their closer collaboration with SONY (Microsoft/China/etc)...?

It was a simple question really...

Additionally, when asked about Navi's progress (delay?), Dr. Su mentioned that AMD was happy with where Navi was, with a few small issues along the way, but is choosing to do the "right thing" (unlike Polaris being rushed) so Navi is more complete for the consumer, or some such nonsense. She also mentioned in some fashion that AMD is excited about where Navi is technically, and that they were able to incorporate more into the design than had been slated. It somehow ties into their leapfrogging philosophy of teams.

I did stay at a Holiday Inn last night..

3dilettante · Jan 31, 2019

w0lfram said:
Yes, that is why I mentioned "respin".

As I said, a respin is intended to have as small an impact as can be managed. It's about fixing minor faults or functional errors in existing units, not adding them or changing their design. The design and verification of the units already in the chip is essentially final by that point, and taking features or elements from a different design will discard the time invested in the prior version.

And nothing you have suggested refutes what is now widely reported, or what I have said. The scenario I suggested didn't hint at Navi being dead/broken, etc., as you've suggested.

Then there's significant pressure to not do what you are saying. Chips are almost set in stone a long time before final production. If the design doesn't have a serious problem requiring a significant reset, it's expensive and high-risk to do it for some marginal gain.

It was delayed by 4 months until they can hand out samples from the respin (after the re-taping).

That's in-line with a new stepping with some bug fixes, since the fab pipeline has up to several months of turnaround time and then weeks of physical testing. That doesn't leave time for new functionality or design changes to be inserted, and there is significant risk with unverified alterations being put in at the time where errors take the most time and money to fix.

I had asked; What if.. AMD reworked other aspects of Navi's uArch (for this new re-spin), perhaps simple lessons learned from their closer collaboration with SONY (Microsoft/China/etc)...?

That would imply a new design already nearing completion, not restarting a number of steps earlier in the design process of an implementation. The time frame given doesn't give time to change the design or to go through the months of internal simulation and testing that led up to the first tape-out. Transplanting an element implemented in a Sony or Microsoft product into an architecture that didn't include it requires redoing a lot of work, and the 4 months for a respin happen after that work is done again.

The process from specification to design to silicon is multiple years. Changing the design rolls back something like the last third of the process.
An example of GCN-based hardware is the PS4, which its lead designer said took 2 years to spec, 2 years to create custom designs, and 2 years to build the platform around it.
Taking some years off because of the larger scope of a total SOC and platform, the work going into the chip could take around 4 years, and would have been substantially locked-in perhaps 2 years prior to release.
That can readily allow for there being many months to over a year of work expended after the design featureset had been decided on, so design changes can revert things quite far in time.

If there's something to be added, it's not clear what would be so pressing as to risk delaying a product for much longer than a respin rather than put out the current chip and have the next GPU include the features--assuming there are really game-changing features the console makers would not want exclusive to their chips.

https://www.digitaltrends.com/gaming/meet-the-guy-who-engineered-the-playstation-4/

We had six years to make the hardware, and it only takes about four years to do the actual engineering, so we had two years to figure out what we wanted to make the PlayStation 4.
...
The process of creating the hardware is about four years. Two years into that it’s locked enough that you can start talking about all this other stuff that’s going to surround it.

w0lfram · Feb 1, 2019

3dilettante said:
As I said, a respin is intended to have as small an impact as can be managed. It's about fixing minor faults or functional errors in existing units, not adding them or changing their design. The design and verification of the units already in the chip is essentially final by that point, and taking features or elements from a different design will discard the time invested in the prior version.

Then there's significant pressure to not do what you are saying. Chips are almost set in stone a long time before final production. If the design doesn't have a serious problem requiring a significant reset, it's expensive and high-risk to do it for some marginal gain.

That's in-line with a new stepping with some bug fixes, since the fab pipeline has up to several months of turnaround time and then weeks of physical testing. That doesn't leave time for new functionality or design changes to be inserted, and there is significant risk with unverified alterations being put in at the time where errors take the most time and money to fix.

That would imply a new design already nearing completion, not restarting a number of steps earlier in the design process of an implementation. The time frame given doesn't give time to change the design or to go through the months of internal simulation and testing that led up to the first tape-out. Transplanting an element implemented in a Sony or Microsoft product into an architecture that didn't include it requires redoing a lot of work, and the 4 months for a respin happen after that work is done again.

The process from specification to design to silicon is multiple years. Changing the design rolls back something like the last third of the process.
An example of GCN-based hardware is the PS4, which its lead designer said took 2 years to spec, 2 years to create custom designs, and 2 years to build the platform around it.
Taking some years off because of the larger scope of a total SOC and platform, the work going into the chip could take around 4 years, and would have been substantially locked-in perhaps 2 years prior to release.
That can readily allow for there being many months to over a year of work expended after the design featureset had been decided on, so design changes can revert things quite far in time.

If there's something to be added, it's not clear what would be so pressing as to risk delaying a product for much longer than a respin rather than put out the current chip and have the next GPU include the features--assuming there are really game-changing features the console makers would not want exclusive to their chips.

https://www.digitaltrends.com/gaming/meet-the-guy-who-engineered-the-playstation-4/

Again, I think everyone here understands that.

I had asked whether, if so, and they were going back to re-tape Navi (no matter how small or insignificant that would entail), with leapfrogging teams and updated revisions already stacking up behind each other, AMD would, when it went back to "re-tape" Navi (for some minor error), incorporate the most up-to-date masking (a revision that AMD had already been working on...) into the tape-out and re-spin.

As logic would imply, along with the fix and the new tape-out, Navi would also adopt a newer (more up-to-date) revision of the latest masking, or production process. Again, going back to Dr. Su saying that they do not want to make the same mistake they made with Polaris (at this stage in development) and rush it. Instead they are doing it right. If Navi was indeed going to be announced at CES and was delayed, then do you think that 4-month delay would include Navi's latest revisions (since the last tape out..?)

Whatever small changes can be made within a short revision will probably be made.

But it sounds (to me), that You are suggesting (if true) AMD will be using the exact same revision, but are just going to fix (whatever they found in error) and tape out & respin it... without taking any opportunity to sneak newer revisions into that new tape out..?

You touched on a newer revision briefly, then went on to explain what that process entails and why AMD would (at this stage) not change the core functions of Navi's architecture. (an argument nobody was making)

However slight that revision might be, the opportunity to tape out as a further revision of itself isn't all that bad of a prospect. I am just wondering what (if anything) they could do. In addition to perhaps better thermals & efficiencies due to a newer, refined process..?

3dilettante · Feb 1, 2019

w0lfram said:
Again, I think everyone here understands that.

I had asked whether, if so, and they were going back to re-tape Navi (no matter how small or insignificant that would entail), with leapfrogging teams and updated revisions already stacking up behind each other, AMD would, when it went back to "re-tape" Navi (for some minor error), incorporate the most up-to-date masking (a revision that AMD had already been working on...) into the tape-out and re-spin.

Masks are unique to the specific chip they are generated for, and the most up to date set is the chip that is currently taped-out. Designs earlier in their process are further from being in physical form, and so would not have much value to the chip that is taped out and can have physical samples for real testing.

The presented scenario is that there is a chip A being taped out with a chip B being further back in development. Then chip A is "re-taped" and takes elements from B.
This hybrid design is neither A nor B, so their respective validation and simulation results cannot provide guarantees in the areas where they are combined.

Instead they are doing it right. If Navi was indeed going to be announced at CES and was delayed, then do you think that 4-month delay would include Navi's latest revisions (since the last tape out..?)

Three or so months isn't out of line with respins discussed in the past, and 7nm without EUV is noted to have even longer turnaround time due to the large increase in steps from heavy multi-patterning.
EUV's reduction in multi-patterning is touted by the foundries as reducing cost and reducing lead times, if the EUV lithography tools can improve exposure times and power sources.
If the decision to do a respin is made after the first chips come back from the fab, there's not much more time besides time-constrained fixes to the show-stopper issues and focused validation on those changes.
Features that have too many issues could even be disabled in order to salvage the schedule.

But it sounds (to me), that You are suggesting (if true) AMD will be using the exact same revision, but are just going to fix (whatever they found in error) and tape out & respin it... without taking any opportunity to sneak newer revisions into that new tape out..?

For a respin, the revision of the chip that came back from the fab has minor corrections made, and a new mask stepping is made and sent off.
There are ways of reducing the manufacturing lead time and the risk of alterations, but those work by changing less and less of the chip.
It's a bad time to be changing things just to change them, and there's no clear upside. The risks of severely impacting the design are high, and no features we've seen from the consoles provided a massive benefit. Even if there were an upside to the features, the right thing to do is generally to tape-out the current chip and have the next product incorporate the new features.

You touched on a newer revision briefly, then went on to explain what that process entails and why AMD would (at this stage) not change the core functions of Navi's architecture. (an argument nobody was making)

No specific level of change was cited, but if this is somehow taking features from Sony or Microsoft, it's already sounding non-trivial. Since there's no sign of them being in production, they wouldn't have any physical feedback to give to a chip being taped-out--and even physical learning doesn't have a lot of carry-over between designs. That leaves feature changes with a physical, electrical, and logical footprint. Those effects would need to be validated.
Even minor changes can have an impact if new transistors or layouts affect how the lithographic patterns interfere with each other, or how the physical differences that result from the choice of patterns in a region can affect mechanical and chemical effects of the fabrication process.
One of the advantages AMD touts for its chiplet strategy is avoiding exactly this: changing one part of a monolithic SoC requires re-evaluating it for complex side effects.

w0lfram · Feb 5, 2019

Thank you. That was insightful.

But, as you have suggested, most of the cases presented were worst-case scenarios. And again, you didn't take into account AMD's leap-frogging teams, with validated designs of their own, etc. (just saying)

(ie: What if regardless of Navi's original release, AMD was planning a 2nd revision within 6 months...)

So basically:
TSMC will have refined their process node a tad more for better efficiencies (over the 4 month delay). And that AMD (in the mean time), who vowed "not to take the easy route" and to "do it the right way", will simply retape/respin the dies & correct only what needs to be done. (No updates, or revisions.)

Seems like wasted time.

BRiT · Feb 5, 2019

Why would they plan to spend millions on another revision within 6 months?

entity279 · Feb 5, 2019

w0lfram said:
Seems like wasted time.

Nope, that's just sane engineering (in any field). You'd need a working, stable baseline to improve upon first.

3dilettante · Feb 7, 2019

w0lfram said:
Thank you. That was insightful.

But, as you have suggested, most of the cases presented were worst-case scenarios. And again, you didn't take into account AMD's leap-frogging teams, with validated designs of their own, etc. (just saying)

(ie: What if regardless of Navi's original release, AMD was planning a 2nd revision within 6 months...)

The time frame given earlier was 4 months, which is shorter than the 6 month claim given now.

I would need to see the specific statements about leapfrogging teams.
The relationship between two teams where one team's product is re-designed based on another project's features (when the other project is not in the process of initial sampling) sounds more complicated than leap-frogging, especially if they are piling up at tape-out despite historically taking longer than the initially claimed four months for "re-taping".

Despite this, leap-frogging doesn't prevent injecting a separate project's design elements from disrupting the first team's late-stage development timeline.

So basically:
TSMC will have refined their process node a tad more for better efficiencies (over the 4 month delay). And that AMD (in the mean time), who vowed "not to take the easy route" and to "do it the right way", will simply retape/respin the dies & correct only what needs to be done. (No updates, or revisions.)

Unless the Sony and Microsoft chips are in production, the chip that would be able to generate any learning from the new refinements would be the chip being "re-taped", which is still a term I haven't seen defined clearly.
As a pipe-cleaner, Vega 20 may have provided some information for the design for manufacturing of future chips, since it taped out and went into production much earlier.

Seems like wasted time.

In the development of very complex architectures and chips, the general process is that each stage involves making design and engineering choices based on projections on what will happen later, and each later stage builds sequentially on what came before. Each stage reaches a point where it must commit, and the full verdict of those decisions may not be known for a long time. When choices were made six months or a year ago and all the work since has been based on them, changing the foundation of those lapsed months requires new investments of time and engineering. There's not much borrowing of another design's time.

For a rather ancient example of a chip revision rather than "re-taping", there's what AMD did with the Thoroughbred A and B cores in 2002.
AMD modified the layout, added a metal layer, and added some transistors and decoupling capacitors when going from A to B. This allowed for higher clock speeds, but AMD did not want to differentiate the two revisions beyond that.
https://www.anandtech.com/show/972/3

This was over 16 years ago, so the complexity and lead times for doing this were far different, and the x86 development and refinement process showed more intensity than AMD's GPU cadence.
Even so, Thoroughbred A was not "re-taped" or held up for B. Revision A was taped out, validated, and sold for months before Revision B started to come out. The "right thing" in that case was to finish the current chip and bring it to market, and let the next chip adopt what optimizations are available at its final stage of development.

Clukos · Feb 7, 2019

milk said:
Sony might not have a great track record in actually developing good graphics silicon, but the experience they have in AAA low-level graphics and game code is unmatched in the PC space, and arguably also on Xbox. Don't completely write off SONY's ability to provide AMD useful input.

milk said:
I can understand the adverse reaction to people who extrapolated from that Forbes comment that Sony is pretty much doing most of the designing for Navi, which is a simplistic and stupid extrapolation to make. But to then say Sony's know-how has absolutely no value for AMD is equally simplistic and stupid.

This, pretty much. It's fair to assume that Sony's input will be valuable to AMD in developing Navi (or whatever ends up in PS5/Nextbox), but to assume that they are basically developing Navi is a bit far-fetched. I would expect some custom features (potentially something akin to mesh/primitive shaders perhaps) that won't make it to desktop Navi, but not anything groundbreaking.

metacore · Feb 8, 2019

Are we arguing about who is better at playing Lego with transistors, or ... that the guys whose bread and butter is making the best code on limited resources for retail code, guys who coded in assembly on 30 MHz / 2 MB, then on 3 cores and five memory subsystems on the PS2, and then on the Cell, have absolutely nothing to contribute? Zero bad and good experience to share? On the other hand, there is a paved road of failed features culminating in Vega.

BTW PS2 > Turing <runs> (those comments are about meshlets)

https://twitter.com/i/web/status/1040858390233341952

https://twitter.com/i/web/status/1042319698443100160

vipa899 · Feb 8, 2019

Interesting, that about the PS2: future hardware it was. One of my favorite pieces of hardware of all time.
Do they mean it was too slow, but the architecture was more flexible?

If we still haven't "beaten" the PS2, then why was the 2001 Xbox faster at just about everything?

Rootax · Feb 8, 2019

I believe they talk about logical processing of those tasks and efficiency, flexibility, more than absolute raw power.

metacore · Feb 8, 2019

Yeah, it's about flexibility; there is a further comment by Sebastian

https://twitter.com/i/web/status/1040899318268538880

vipa899 · Feb 8, 2019

OK, it could have been flexible, but if it can't do most effects at reasonable performance, those effects weren't of much use. Things like bump mapping, pixel shading, detail textures, etc. weren't all that common on the PS2. Some games went ambitious (SotC), but at 15fps it wasn't all that smooth.
Can't really say the PS2 was better than (more) fixed-function variants like the OG Xbox, or perhaps the GC to some extent. I think the PS2 was a '90s design with software rendering in mind; it was common then, much more flexible but slower.

Wasn't the unreleased (or limited) Voodoo 5 6000 a multipass GPU, or at least a PS2-like design idea?

bgroovy · Feb 9, 2019

No one is suggesting that VUs would have stayed trapped at 300mhz had the concept been brought forward.

vipa899 · Feb 9, 2019

Wonder what a further-developed EE/GS future would have looked like. Some said it wasn't forward-thinking.
