Predict: The Next Generation Console Tech

Sure, no one knows anything, MS are better than Apple at keeping secrets, and all this stuff that's leaking is a sneaky ploy by them to get Sony to underestimate their system.

Now, where did I leave my tinfoil hat...?

Well, what do you know? Oh, that's right, absolutely nothing. You just proved my point.
 
Well, what do you know? Oh, that's right, absolutely nothing. You just proved my point.

I know that a lot of independent, reliable sources are corroborating specs that are completely believable.

What do you know?

Though it's good that you've recovered from your "Security guards at MS mean a 1.2 TF console is BS!" fanboy panic a few pages back.
You seem much calmer now.
 
The reason it needs to be on the interposer is that that's how the connection is made to the logic dies (APU, GPU). The whole point is to keep everything really close together, which is what makes Wide IO and HBM possible.

It's about future costs too. Being stuck on a 256-bit controller with GDDR5 will add up to more in the long run than going with straight DDR4 from the start. It also doesn't need to draw as much power, so far fewer watts and less heat. Less TDP from one part of the system means more headroom elsewhere.
I'm not sure it's even possible to put a DDR4 chip on an interposer, and anyway they'd need 16 of them (16-bit max each), so it's very unlikely they'd go that route. There's nothing in the official specs that shows a lower-power interface for 2.5D; it's just a PCB interface with standard ball-grid packaging. 2.5D doesn't seem to be part of the spec at all (unless there's another doc about it?). It's a good idea to save money and use slower main memory if the real-world performance is compensated by the eDRAM, but the interposer really doesn't make sense for DDR4.
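
To make that chip-count arithmetic explicit, here's a quick Python sketch (assuming one channel per device and the per-device widths mentioned above, purely as an illustration):

Code:
# Number of DRAM devices needed to fill a given bus width,
# assuming one fixed-width channel per device.
def devices_needed(bus_width_bits, bits_per_device):
    return bus_width_bits // bits_per_device

print(devices_needed(256, 16))  # DDR4 at x16 per chip -> 16 devices
print(devices_needed(256, 32))  # GDDR5 at x32 per chip -> 8 devices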

If they cared about future shrinks, the first release wouldn't have an interposer at all (it isn't needed), and they'd add one later when a shrink requires it, but that has nothing to do with DDR4. I believe the best reasons to use DDR4 are that it costs less and it allows more memory at the same bus width, at the expense of being roughly three times slower. If the reasoning is that they want more than 4GB of memory at 256 bits wide, they'd be much better off with GDDR5M instead of DDR4.

It's sad nobody ever talked about GDDR5M in the leaks and rumors. It's exotic, so it would make a lot of click-money for gaming journalists. It would also be a good memory to feed the (now infamous) Durango Blitter, at least according to my insider, who told me about the "Fat Lady" Blitter a long time ago. My old dev kit doc says the Fat Lady part is only 21,000 transistors, a very efficient circuit, but it's made on a much larger CMOS process.
 
I'm thinking the same thing here, something customised like: A10 APU (5800 -> 384 SIMD / 6 CU / 800MHz / 246mm^2 = 614.4 GFLOPS) + HD 8770 Bonaire XT GPU (768 SIMD / 12 CU / 800MHz / 160mm^2 -> 1228.8 GFLOPS) + 4 GB GDDR5 at 192GB/sec.

(APU + GPU without "Special sauce" -> =~ 406mm^2)

And the A10 is on 32nm, so if they do go with Kaveri we might see a noticeable shrink there.

They could comfortably go over 2-2.5TF aggregate for APU + GPU with a modest chip size while fitting under a 200W TDP. Will they though...
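
For what it's worth, those GFLOPS figures fall straight out of the usual peak-throughput formula (stream processors x 2 ops per clock for FMA x clock speed); a quick Python check:

Code:
# Peak single-precision GFLOPS = stream processors * 2 (FMA) * clock in GHz.
def peak_gflops(stream_processors, clock_ghz):
    return stream_processors * 2 * clock_ghz

apu_gpu = peak_gflops(384, 0.8)   # 614.4 GFLOPS (A10-class graphics)
dgpu    = peak_gflops(768, 0.8)   # 1228.8 GFLOPS (Bonaire-class GPU)
print(apu_gpu, dgpu, apu_gpu + dgpu)  # aggregate ~1843 GFLOPS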
 
And the A10 is on 32nm, so if they do go with Kaveri we might see a noticeable shrink there.

They could comfortably go over 2-2.5TF aggregate for APU + GPU with a modest chip size while fitting under a 200W TDP. Will they though...

If it is an APU + GPU, I think the 1.8TF is the total number. That's about 70-80W of GPU TDP.
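
That wattage is just a perf-per-watt guess; assuming roughly 22-26 GFLOPS per watt at the chip level for a 28nm GCN part (my assumption, not from any leak), the numbers line up:

Code:
# Rough check of "1.8 TF ~ 70-80 W of GPU TDP".
total_gflops = 1800
for gflops_per_watt in (22, 26):   # assumed 28nm GCN chip-level efficiency
    print(round(total_gflops / gflops_per_watt), "W")
# prints ~82 W and ~69 W, i.e. roughly the 70-80 W range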
 
I'm not sure it's even possible to put a DDR4 chip on an interposer, and anyway they'd need 16 of them (16-bit max each), so it's very unlikely they'd go that route.

It's already been covered in this thread, but it's definitely doable and part of the JEDEC spec, and stacks can go up to 8 dies high.

The specs are out there, but basically DDR4 will draw much less power than DDR3. And the whole nature of stacking chips in 2.5D/3D lets you run more efficiently because of greatly reduced interconnect length.

I believe the best reasons to use DDR4 are that it costs less and it allows more memory at the same bus width, at the expense of being roughly three times slower. If the reasoning is that they want more than 4GB of memory at 256 bits wide, they'd be much better off with GDDR5M instead of DDR4.

I think the entire reason for going with DDR4 is that it's just the right call economically and technically. The volume will be there over time. And stacking them will be the only realistic way to get that rumoured high-bandwidth 512-bit bus. Of course they could still stack DDR3, though, and the only reason they might would be if they needed better densities to get the bump up to 4+ GB.
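
Just to put numbers on that: peak bandwidth is bus width (in bytes) times transfer rate, so a stacked wide bus gets you GDDR5-class numbers out of slower DRAM. The transfer rates below are illustrative assumptions, not leaked figures:

Code:
# Peak bandwidth in GB/s = (bus width in bits / 8) * transfer rate in MT/s / 1000.
def bandwidth_gbs(bus_width_bits, transfer_rate_mtps):
    return bus_width_bits / 8 * transfer_rate_mtps / 1000

print(bandwidth_gbs(512, 2133))  # 512-bit DDR3/DDR4-2133 -> ~136.5 GB/s
print(bandwidth_gbs(256, 5500))  # 256-bit GDDR5 at 5.5 Gbps -> ~176 GB/s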
 
If it is an APU + GPU, I think we're getting the total number with 1.8TF. That's about 70-80W of GPU TDP.

Depends on the APU. If they did switch to Jaguar, that leaves room for more GPU. And Kaveri is supposed to be at 100W... but after console tweaking it'd probably be under that.
 
Depends on the APU. If they did switch to Jaguar, that leaves room for more GPU. And Kaveri is supposed to be at 100W... but after console tweaking it'd probably be under that.

In my eyes an APU + GPU solution only makes sense when you use a big Bulldozer derivative, for example Piledriver or Steamroller. Jaguar cores are tiny compared to these CPU cores; you'll have no problem fitting them alongside a heavy-hitting GPU in one SoC. But Steamroller and a heavy-hitting GPU in a SoC? Could be tricky.

Instead of running after pipe dreams that some anonymous users create, I just rely on these sources:

Yole Développement forecasts 2.5D stacking for next gen PlayStation
Sony might be building a 1bn Dollar SiP for next gen PlayStation

These are insiders talking, namely Yole and Tsuruta. Both are talking about 2.5D stacking for the next PlayStation.

What do we see when we look at the PlayStation Vita? Sony's new philosophy of the post-Kutaragi era: conservatively designed, slightly tweaked processor tech, "off the shelf" if you will, refined with state-of-the-art packaging technology for logic and memory. A console from developers for developers, or as Sony calls it: let the hardware do the hard work.

This very philosophy, slightly tweaked and unexotic off-the-shelf hardware combined with state-of-the-art packaging, is exactly what a 2.5D system-in-package with AMD HSA tech can deliver. I think we agree that using AMD processors only makes sense if you use the last ace up their sleeve: HSA. And I also think we agree that both next-gen systems will be using a heterogeneous processor architecture. Someone brought up the Cell BE a few pages ago, which was famously Sony's first heterogeneous approach. A Trinity APU like the AMD A10-5800K delivers around 750 GFLOPS. Compared with the Cell that is another world of performance, not least because of the four big Piledriver cores versus the Cell's single PPE.
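
As a rough back-of-the-envelope check on that 750 GFLOPS figure (the clocks and per-cycle throughput are my own assumptions for an A10-5800K-class part, so take them as illustrative):

Code:
# A10-5800K-class APU, peak single precision:
gpu = 384 * 2 * 0.8    # 384 VLIW4 SPs * 2 ops (FMA) * 0.8 GHz = ~614 GFLOPS
cpu = 2 * 16 * 3.8     # 2 Piledriver modules * 16 FLOPs/clk * 3.8 GHz = ~122 GFLOPS
print(gpu + cpu)       # ~736 GFLOPS, i.e. "around 750"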

Yole is not only forecasting a 2.5D interposer for the new PlayStation, they're also showing a picture of a GlobalFoundries approach to this tech. Even if this is heavy speculation, I guess a lot of users here know what GloFo implies: Sony is probably not going for Jaguar. But a Trinity doesn't work either. The next-gen systems need 2013 HSA tech to erase the copy overhead, which is basically the biggest enemy of GPGPU algorithms, and HSA is all about GPGPU algorithms. So we can expect either a Kabini, a Richland or a Kaveri as the base for the new PlayStation. I don't know whether Richland has the same HSA feature set as Kabini and Kaveri, so maybe we can only expect one of the last two, but Richland, or more generally a 32nm APU/processor, would be easy to integrate since we're talking about 2.5D stacking.

Take a look at the 32nm AMD A10-5700 Trinity APU: four cores at 3.4GHz and 384 VLIW4 stream processors with a TDP of 65W. An optimised APU on a 28nm node, with four cores at 3.2GHz and 384 GCN stream processors, would easily come in under a 50W TDP. An APU with Jaguar cores would undercut that anyway. As I mentioned earlier, I think whether we see a big APU or an APU + GPU solution depends on the CPU cores, given that we have insiders talking about 2.5D stacking in the new PlayStation. Sony pushing both fabrication (big SoC) and packaging (2.5D stacking) to the limit within one system is very unlikely if you ask me.

Why am I not talking about the new Xbox? Because until now I haven't stumbled upon real information like the Yole forecast or the Tsuruta interview. All we know about the new Xbox is contradictory speculation based on statements from (in almost every case) anonymous "leakers". I don't want to wade into that kind of speculation, which has already started to go round in circles. No offense, it's entertaining though. If you want my opinion, I like the idea of a big HSA SoC in combination with dedicated Kinect hardware for the Xbox.
 
If Sony's investing money on the PS4 to use stacking and interposers, it would be prudent because there are multiple platforms besides the game console that would benefit from the engineering effort. It's a much more straightforward and physical benefit than the hoped-for mass synergies of Cell.
It would also be prudent because Sony may not be able to count on the PS4 to make back all that investment and drive all that volume on its own.

One argument against off-the-shelf at this point in time is that right now is actually a very awkward time for off-the-shelf. We know we are this close to things being significantly better, and current products have glaring deficiencies. In either Sony's or Microsoft's case, it might not hurt to try to reach a bit beyond the big turn that is coming in physical implementation and system design in the next few years. Anything learned in the process is going to apply to a lot of other things.

If there is an argument against using a current APU, or Richland--which is barely any different from a current APU--it is that the programming model and system architecture are just a cycle or two away from being generally acceptable.
Kaveri might get much closer with the shared memory space, although there are future changes for quality of service, programmability, and overall performance robustness that won't be present if it's using a current GCN SIMD architecture.
 
Yeah, you're right on that. But I said "slightly tweaked" (just like the Vita SoC), so maybe there is a little room for that kind of thing. But I still think Sony has a different emphasis this time.


What I forgot to mention:

You're probably wondering: "A $1bn SiP with off-the-shelf processor tech?" This is where the fun part for the engineers begins: if I were Sony, I would aim for a SiP with maximum bandwidth and minimum latency at the same time, which is pretty much perfect for gaming. But that is also very expensive, and we'll probably see some avant-garde approach from Sony on that one.

I can imagine that slightly modified, unexotic off-the-shelf HSA tech with the x86 ISA, combined with state-of-the-art packaging and an ultra-high-bandwidth/ultra-low-latency memory solution, is exactly the kind of stuff the coders at Sony Worldwide Studios are dreaming of. Design-wise it would just be a stationary Vita.
 
If Sony's investing money on the PS4 to use stacking and interposers, it would be prudent because there are multiple platforms besides the game console that would benefit from the engineering effort. It's a much more straightforward and physical benefit than the hoped-for mass synergies of Cell.
It would also be prudent because Sony may not be able to count on the PS4 to make back all that investment and drive all that volume on its own.

Yes, I think it all depends on whether their R&D into 3D ICs is more of a trial run for future projects in general, or whether they view it as a way to really make a leap in hardware performance for Orbis. They've already shown themselves to be pretty experienced in related techniques with the Vita and stacked CMOS sensors, plus whatever Toshiba is cooking up.

Hecatoncheires said:
In my eyes an APU + GPU solution only makes sense when you use a big Bulldozer derivative, for example Piledriver or Steamroller. Jaguar cores are tiny compared to these CPU cores; you'll have no problem fitting them alongside a heavy-hitting GPU in one SoC. But Steamroller and a heavy-hitting GPU in a SoC? Could be tricky.

With Steamroller I doubt a SoC would be possible. But in a stack I think it's feasible, though still with heat issues. And there are a couple of different ways to stack those Lego pieces to mitigate that.

Hecatoncheires said:
You're probably wondering: "A $1bn SiP with off-the-shelf processor tech?" This is where the fun part for the engineers begins: if I were Sony, I would aim for a SiP with maximum bandwidth and minimum latency at the same time, which is pretty much perfect for gaming. But that is also very expensive, and we'll probably see some avant-garde approach from Sony on that one.

Right, I hope they really made some headway on 3D & TSVs so we can see something special. A Jaguar APU plus a discrete GPU which may or may not exist doesn't really warrant that kind of effort.
 
FWIW, the last time I spoke to anyone at Epic, I believe he told me that although the demos used SVOs, they'd since changed direction.

I read they had some difficult-to-solve problem with textures due to using SVOs. It would be a pity to go back to a more traditional lighting/rendering approach.
 
Meh, I don't think what was in effect photon mapping (just not called that for various reasons) was ever going to fly in general. It would be more of a pity to have more boxes connected by corridors just to make the lighting model tractable ...
 
Meh, I don't think what was in effect photon mapping (just not called that for various reasons) was ever going to fly in general. It would be more of a pity to have more boxes connected by corridors just to make the lighting model tractable ...

Anyway, they made their lighting solution the main selling point of UE4... if they finally ditch the SVOGI algorithm they will have a hard time re-marketing it. Maybe it's not only technical problems but also number-crunching problems, and they are making it more next-Xbox/PS4 friendly now that they've realised those machines weren't going to be powerful enough to make their vision feasible.
 
So apparently, the ESRAM and DDR can be accessed simultaneously, for a total of 170 GB/s.

Even if thuway is wrong on that, if it were true, what would it mean in practical terms?

I'm assuming the ESRAM this time is more of a general purpose scratch pad, and unlike Xenos can be accessed by the CPU as well (and perhaps you aren't forced to write the framebuffer to it and can render out from main memory)
 
So apparently, the ESRAM and DDR can be accessed simultaneously, for a total of 170 GB/s.

Even if thuway is wrong on that, if it were true, what would it mean in practical terms?

I'm assuming the ESRAM this time is more of a general purpose scratch pad, and unlike Xenos can be accessed by the CPU as well (and perhaps you aren't forced to write the framebuffer to it and can render out from main memory)

Why wouldn't you be able to access them simultaneously? What he said was simply that you can't just add up those bandwidth numbers. It doesn't work that way. You have to load your data from main memory into your SRAM before you can work on it from your SRAM. That means that your maximum usable bandwidth is roughly the same as your max SRAM bandwidth. At least as far as I understand things. Of course, they could have a completely different approach, but I don't see any way it works such that you can just add up those bandwidths.

Another thing, why are we putting any trust in that 102 GB/s rumor? Does that guy have any credibility that I apparently don't know of?
 
Why wouldn't you be able to access them simultaneously? What he said was simply that you can't just add up those bandwidth numbers. It doesn't work that way. You have to load your data from main memory into your SRAM before you can work on it from your SRAM. That means that your maximum usable bandwidth is roughly the same as your max SRAM bandwidth. At least as far as I understand things.
If the GPU can only texture etc. from ESRAM/eDRAM (not SRAM), then you'd be right. But if the GPU can read simultaneously from RAM and eDRAM, the BW is aggregate.
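
For what it's worth, the 170 GB/s figure is consistent with just summing the rumoured pools, assuming the GPU really can read from both at once (the DDR3 figure below is my assumption of a 256-bit DDR3-2133 setup, not a leaked spec):

Code:
esram_gbs = 102                  # rumoured ESRAM bandwidth
ddr3_gbs  = 256 / 8 * 2.133      # assumed 256-bit DDR3-2133 -> ~68.3 GB/s
print(esram_gbs + ddr3_gbs)      # ~170 GB/s aggregate, matching the rumour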
 
If the GPU can only texture etc. from ESRAM/eDRAM (not SRAM), then you'd be right. But if the GPU can read simultaneously from RAM and eDRAM, the BW is aggregate.

Well, then I predict a more flexible approach, one where the CPU and GPU can make use of the available eSRAM however they see fit.

If I'm making predictions anyway, then I'll predict that there will be 64 MB of eDRAM/1T-SRAM :p. (That's roughly 50 mm² of die space.)
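
That 50 mm² figure roughly checks out if you assume a cell size of about 0.07 µm²/bit (ballpark for the eDRAM processes of that era, my assumption) plus ~30% array overhead for sense amps, decoders and redundancy:

Code:
bits = 64 * 1024 * 1024 * 8          # 64 MB
cell_um2 = 0.07                      # assumed eDRAM cell size, um^2 per bit
raw_mm2 = bits * cell_um2 / 1e6      # ~37.6 mm^2 of raw cell area
print(raw_mm2, raw_mm2 * 1.3)        # ~49 mm^2 with overhead -> roughly 50 mm^2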
Now, to be honest, I'm not sure if there will be any eDRAM on die, given the trouble MS had with it on the Xbox 360. Or maybe the reason MS keeps a daughter die with eDRAM is simply cost concerns and not a technological limitation. Actually, that seems much more likely, as IBM doesn't seem to have any trouble implementing eDRAM on their huge-die CPUs.

Does anyone have any idea how much extra work and cost is involved in implementing eDRAM on a chip? All I know is that eDRAM takes some extra process steps compared to SRAM, which makes it harder and more expensive to implement. I just don't know the specifics.
 