PlayStation 4K - Codename Neo - Technical analysis

Discussion in 'Console Technology' started by Recall, Apr 20, 2016.

  1. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    56
    I didn't say it was "a RAM maximum for performance" I said "Maybe it's just that 5GB is all that's needed for games on a console with less than 2TFLOPS"

    Meaning that they figured that 5GB was enough for what they would be doing with the PS4 games & thought that it would be better to reserve the other memory for the new models in case they want to do other stuff with the OS.
     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,477
    Likes Received:
    15,924
    Location:
    Under my bridge
    You're relating the RAM amount to the available power - "2TF has a RAM requirement ceiling of 5GB as more than that is no use."

    If they had more RAM, they could do more. ;) Doing more is really about cost. One could argue that the cost of filling 5 GBs of RAM is enough and more RAM would mean more expense for devs. But it doesn't work that way either as with streaming, you can have a resource requirement in the stratosphere. Basically, you can't have too much RAM. Well, you could. 5 GB on a ZX Spectrum or C64 would be too much. So somewhere there is a sweetspot, I agree.
     
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    NO we don't.
    Lots of people expect more on every account for less, and all that extra power to be readily available to devs, disregarding the costs that dealing with many SKUs puts on publishers. I've not seen a single hint that Jaguar cores or better are ever to see the light on a better process than 28 nm. I got a little carried away when the wave of rumors hit, but now... I'm back to scepticism on the matter, as it is not something you hide from investors, and disclosing that information would have no effect on MSFT's, Sony's or Nintendo's affairs with AMD.
     
    milk likes this.
  4. Michellstar

    Regular Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    653
    Likes Received:
    363

    Yes we do

    http://venturebeat.com/2016/01/14/a...eo-lisa-su-says-2016-is-the-year-amd-is-back/

    "Question: One thing that’s happened in previous generations of the console business, there have been changes in the architecture of the chips mid-cycle in some consoles. Will that happen in this cycle?

    Su: Without talking about any particular console road map, because our console customers are very sensitive, there will be opportunities to cost reduce. As we know, the consoles tend to be very sensitive to price point. As the price comes down, the console volume goes up. There will be opportunities to do that in this cycle as well."

    From the horse's mouth: Lisa Su, AMD's CEO, back in January.
     
    jbq.junior01 likes this.
  5. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    550
    Likes Received:
    338
    Well, Puma does not feature Heterogeneous System Architecture or zero-copy, so we are not really talking Puma, but Jaguar!
    But if the process goes from 28 nm to 14 nm, why only 2.1 GHz and not more? Only 20% more than what Microsoft offers on 28 nm?
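    As an aside, the 20% figure checks out against the Xbox One's 1.75 GHz Jaguar clock; a quick sketch (the 2.1 GHz Neo clock is from the rumors discussed in this thread, not a confirmed spec):

```python
# Rumored Neo CPU clock vs. the Xbox One's 28 nm Jaguar clock.
neo_ghz = 2.1    # rumored Neo clock (unconfirmed)
xb1_ghz = 1.75   # Xbox One CPU clock on 28 nm

uplift = neo_ghz / xb1_ghz - 1
print(f"uplift: {uplift:.0%}")  # 20%
```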
     
  6. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,269
    Likes Received:
    2,598
    Location:
    Wrong thread
    Doesn't mean she's talking about the PS4 dropping to 14nm though. She could be, but that's suitably vague to mean it might not be.

    As some have pointed out here, the PS4 die as-is would have trouble shrinking due to the physical layer of the memory controller appearing to take up an entire edge of the chip. The chip might come in too small to fit its memory bus without substantial reworking.

    Once yields are good enough one route to cost reduction might be to remove redundant CUs, or perhaps newer layout libraries might allow for a denser chip, or ... something.
     
  7. Michellstar

    Regular Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    653
    Likes Received:
    363

    Come on guys, I would be extremely surprised if both Orbis and Durango don't get shrunk to 14 nm. Yes, they'll be reworked for 14 nm FinFET; they have to be anyway.

    Lisa is talking generalities, but I bet it's not about the Wii U.

    The whole interview is about 2016 and 14 nm FinFET as a turning point for the company.
     
    milk likes this.
  8. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,269
    Likes Received:
    2,598
    Location:
    Wrong thread
    X1 looks like a reasonable fit for shrinking down to 14nm. Cribbing off 3dilettante here, but the memory controllers (or PHYs if that's not the right term) are located in the corners and so the chip can shrink without the memory controllers / PHYs needing to. Plus, I guess, all the sram and the disabled CUs should make the chip relatively fault tolerant.

    If you look at the PS4 die though an entire edge is taken up by a memory controller that is unlikely to shrink well (anyone that knows feel free to correct me on the terms being used!). PS4 at half the area would be bigger than any AMD chip with a 256-bit GDDR5 bus. That's one of the reasons people have been wondering about PS4 slim and Neo possibly being made from the die (PS4 slim being a harvest of Neo) - it might get past the issue of die size for the bus.

    If you take a look at the die shots from Chipworks you can really see what the issues might be regarding bus / size.

    Edit: I guess it's possible that the existence of Neo is actually tied in to the need to create a larger chip ...
     
    milk likes this.
  9. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,269
    Likes Received:
    2,598
    Location:
    Wrong thread
    So, like, yeah. You want to go to 14nm, but your chip is too small for the bus and you'll end up paying for dead space on your die. Money down the drain.

    So you make a larger chip by doubling up on the CUs. Now you have a chip that can be used for your PS4 slim (with a ton of redundancy) and a more fully operational chip that you can put in a premium product (Neo).

    And when yields get good enough and when the Neo is no longer premium, you drop the 4 and just sell the Neo. Something like that. Two products from one chip, that's not much bigger than it would have been for just the shrunk PS4 with lots of dead space on the chip.
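    The harvesting logic above can be illustrated with a toy Poisson defect-yield model; every number below (defect density, die area, CU share) is an assumption for illustration, not real foundry data.

```python
import math

# Toy Poisson defect-yield model: P(zero defects in area A) = exp(-D * A).
# All figures below are illustrative assumptions, not real foundry data.
defect_density = 0.25   # defects per cm^2 (assumed)
die_area_cm2 = 2.2      # ~220 mm^2 doubled-up die (assumed)
cu_fraction = 0.4       # share of the die that is CU array (assumed)

# A fully-enabled premium SKU (Neo) needs a defect-free die:
premium_yield = math.exp(-defect_density * die_area_cm2)

# A harvested SKU (PS4 slim) with heavy CU redundancy can also use dies
# whose only defects fall inside the CU array (approximation: assumes any
# number of CU-array defects can be tolerated):
harvest_yield = math.exp(-defect_density * die_area_cm2 * (1 - cu_fraction))

print(f"dies usable as premium SKU:   {premium_yield:.1%}")
print(f"dies usable as harvested SKU: {harvest_yield:.1%}")
```

    The point is only directional: the harvested bin is always at least as large as the premium bin, which is what makes one larger die serve two products.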

    Edit: Bonus chip cut and paste, using elite level MSPaint.

    Assuming the sections labelled "memory controller" won't scale, and a 67% linear shrink (crudely ~45% of the area) for the rest of the ... stuff ... I wanted to see what would fit nicely with the minimum of re-layout. 3 x 13 CUs (with 3 x 12 active) goes in pretty well, with minimal redundancy. It fits into a package about 80% of the 28 nm PS4 die, and from that you get PS4 slim or Neo from the same chip. Whether or not that could work I have no idea (break BC?). Then again I generally have no idea, and my lunch is over.

    [image: MSPaint die-shot mock-up of the rearranged shrink]
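    The shrink arithmetic in the post can be sketched numerically; only the 0.67 linear / ~0.45 area relationship comes from the post, while the die and PHY areas below are rough placeholders.

```python
# Rough die-area estimate: a 0.67x linear shrink is ~0.45x in area
# (0.67^2 = 0.4489), applied only to the logic while the memory PHY
# stays fixed. Area figures are rough placeholders, not measurements.
ps4_die_mm2 = 348.0   # 28 nm PS4 APU, approximate published figure
phy_mm2 = 60.0        # assumed non-scaling GDDR5 PHY area
logic_mm2 = ps4_die_mm2 - phy_mm2

linear = 0.67
area_factor = linear ** 2   # ~0.45, as noted in the post

shrunk = phy_mm2 + logic_mm2 * area_factor
print(f"area factor: {area_factor:.3f}")
print(f"estimated shrunk die: {shrunk:.0f} mm^2 "
      f"({shrunk / ps4_die_mm2:.0%} of the 28 nm die)")
```

    With a fixed PHY the die shrinks far less than the naive 45%, which is the problem the post is describing; doubling the CUs spends that stranded perimeter on something useful.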
     
    #129 function, May 5, 2016
    Last edited: May 5, 2016
  10. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    Well, that is a pretty elusive answer; she could not speak in her customers' place and possibly alter their business. Either way, once their new IPs (Zen and Polaris) are rolling and they have the materials and know-how for the new process, they may allocate the necessary effort to porting the cat cores, though I don't expect that to be a secret move. Intel recently said (as I thought they would) that they are going to keep Atom cores around; it makes sense down the line to keep the cat cores hunting some more, though such a move won't be wrapped in secrecy.
     
    #130 liolio, May 5, 2016
    Last edited: May 7, 2016
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    I played around a bit with this concept earlier, and one item that probably will not shrink like in the picture is the display and system IO on the right side of the die.
    Also, it is possible to have a memory interface go around a corner, or at least it has happened in past GPUs like RV770.

    These two possibilities might interfere with each other. Going around corners might allow some extra slack in the perimeter, but if the right side has a lot of other IO, then there is less that can be done. The die shot isn't that great about that side of the chip, but a majority (3/4?) seems to have some kind of interface. The silicon that pairs with that IO doesn't necessarily shrink ideally either, since it needs to interface with and work with external electrical levels.

    There are other variables, such as doubling up interfaces, although which interfaces it would be practical to double on Orbis is unclear.
    We know one console's DDR3 interface does save perimeter area that way, and if you play silicon tetris with Durango you can see it has more slack to play with, which is one advantage to the memory implementation it has.

    If it were a straight shrink, Orbis at 14/16nm would have over a third of its area being GDDR5 PHY. Durango after a straight shrink gets slightly over what Orbis has now in area consumption.
     
    TheAlSpark and function like this.
  12. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,269
    Likes Received:
    2,598
    Location:
    Wrong thread
    Ah, I didn't consider that the memory interface might be able to wrap around corners so I was trying to stick to what seemed to be 64-bit "chunks" in straight lines. I knew I was shrinking some IO stuff I probably shouldn't, but wasn't sure what stuff around the IO should or shouldn't be shrunk so I gave up ...

    I notice on both Orbis and Durango that the memory interfaces are symmetrical along a line running between the two CPU clusters, and that both designs seem to arrange the memory interfaces as close to the CPUs as possible. I'm guessing this is to do with reducing and mirroring latency for the CPUs? When playing "silicon Tetris" I was trying to maintain this type of arrangement.

    I too noticed from playing around with the Durango die that the memory interface is significantly smaller. I suppose that's another thing to factor in to the whole "esram value" equation. My best go for Durango Tetris came in at around 200 mm^2, but I was shrinking all the IO stuff on that too, though if you were to wrap the memory interface around the die edge where the CPU is (again keeping it symmetrical) and reveal more die edge to the sides of the CUs there should be room for those and the stuff near them to stay much the same size as it is now.

    How big is the memory interface for DDR4 and LPDDR4 compared to DDR3? LPDDR4 in particular offers the possibility of 4266 MT/s with latencies supposedly around those of DDR3-2133. Do you think a switch to a 128-bit bus could ever be practical?
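    On the 128-bit question, a back-of-envelope peak-bandwidth comparison; the formula (bus width / 8 × transfer rate) is standard, but the pairing of parts is hypothetical:

```python
def peak_gb_s(bus_bits: int, rate_mts: int) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * rate_mts / 1000

# PS4's actual configuration vs. the hypothetical LPDDR4 swap:
gddr5 = peak_gb_s(256, 5500)    # 256-bit GDDR5 at 5.5 GT/s
lpddr4 = peak_gb_s(128, 4266)   # hypothetical 128-bit LPDDR4-4266

print(f"256-bit GDDR5-5500:  {gddr5:.0f} GB/s")    # 176 GB/s
print(f"128-bit LPDDR4-4266: {lpddr4:.1f} GB/s")   # ~68 GB/s
```

    A 128-bit LPDDR4-4266 bus would deliver well under half the PS4's 176 GB/s, so it would only seem plausible with a much wider bus or some other compensation.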
     
  13. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,577
    Likes Received:
    7,127
    Location:
    ಠ_ಠ
    You probably doubled up on the RBEs as well. :wink4:

    But nice effort in the pics.
     
  14. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,269
    Likes Received:
    2,598
    Location:
    Wrong thread
    Who doesn't want 48 ROPs eh?

    I over-shrank a little in those pics, down to around 45%. It started out as fun, but then I got lost in being a pro-grade chip engineer ... then I was out of time.:no:
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    It might be more about avoiding overly complicating things around the north bridge and graphics memory controller, which the CPUs are mirrored around and the GPU plugs into on one side.
    There are many examples of CPUs and APUs that can live with off-center interfaces. It seems to generally creep back to symmetry once enough interfaces are added on larger chips, but even then there are examples of surviving with small irregularities.

    There are some decently wide bus connections from the GPU into the interconnect, which might lead to trying to keep wiring congestion clear on the GPU's far side, and there's no strong reason to make different versions of the interface sections rather than mirroring one. Perhaps someone versed in that process would have feedback on that possibility.

    The non-memory IO interfaces have modest demands, and it seems like chips don't mind them being off-center.
    It seems like symmetry isn't a dealbreaker, but can be preferred depending on where die space savings are on the priority list.

    I haven't tried comparing the interfaces between generations for Intel's Broadwell to Skylake transition. I don't recall anything visually standing out.
    If DDR4 latencies are equivalent to DDR3 latencies, it might not be straightforward to take a double-speed DDR4 and cut it in half. DRAM and its controllers have various latencies that are not measured in clock cycles, but in actual time, and there are benefits to quantity of controllers/channels--which AMD commented on in terms of behavior under load for HBM.
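    The cycles-versus-nanoseconds point can be made concrete; the CL values below are illustrative examples, not specs of any console part:

```python
def cas_ns(cl_cycles: int, rate_mts: int) -> float:
    """CAS latency in ns; the command clock runs at half the transfer rate."""
    clock_mhz = rate_mts / 2
    return cl_cycles / clock_mhz * 1000

# Doubling the transfer rate while latency stays fixed in time doubles
# the latency measured in cycles:
print(f"2133 MT/s, CL 14: {cas_ns(14, 2133):.1f} ns")
print(f"4266 MT/s, CL 28: {cas_ns(28, 4266):.1f} ns")  # same ns, double the cycles
```

    This is why "half the channels at double the rate" is not automatically a wash: the absolute latencies stay put while concurrency (number of channels) goes down.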
     
  16. DieH@rd

    Legend Veteran

    Joined:
    Sep 20, 2006
    Messages:
    6,226
    Likes Received:
    2,177
    :lol2:
     
    Grall likes this.
  17. Michellstar

    Regular Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    653
    Likes Received:
    363

    Unfortunately it appears that the cat core team was disbanded, and AMD is Zen all the way.

    I mean, they are not going to improve or iterate the design, but I guess they could shrink it, just the basic Jaguar core, to 14 nm. This must have been signed from the start by both parties, as part of the Durango/Orbis projects and their future shrinks. Sony or MS took care of R&D; that's what semi-custom was all about, sharing costs.

    And it fits with the narrative of why a refresh could use an updated GCN core but keep the basic Jaguar.
     
  18. Michellstar

    Regular Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    653
    Likes Received:
    363
    Many thanks for the effort and for your Paint kung fu. I get all you are saying: if it's not an evenly distributed layout, they'll get a sub-optimal shrink.
    But do they have room to fiddle with the cores? If I recall, in Durango the CPU cores are rotated; or they could place the shared memory controller on another side.

    And yes, maybe that's the reason for the Neo.

    We'll see soon whether they shrink Orbis to 14 nm, phase out Orbis in favour of Neo, or keep using 28 nm.

    Anyway, thanks Function and 3dilettante as well.
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Puma is just a tweaked Jaguar core (same Family 16h microarchitecture). Puma improved the power-saving features and had manufacturing improvements (to reduce leakage). These improvements allowed turbo clocks. Puma should run all Jaguar code with identical performance (but with lower power required, assuming identical clocks and no turbo active). I don't see a reason why the die shrink could not upgrade the cores to Puma if needed. All existing software should work identically (but with lower power).
     
    jbq.junior01, BRiT and DSoup like this.
  20. Michellstar

    Regular Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    653
    Likes Received:
    363

    Sorry if I wasn't clearer.
    I was just trying to give liolio reasons why the supposed Neo uses just Jaguar.
    He argues that given AMD's scarce resources, they won't shrink Puma cores to 14 nm.

    I think the shrinks would have been signed (in blood if necessary) into the original contracts, with a lifespan of 5-7 years.
     