Predict: The Next Generation Console Tech

Do what first party devs do: MLAA and Tile-Based Deferred Lighting!
Biggest problem would be that this 15% of the chip would be only a tiny part of the bigger picture (and hopefully of a far more powerful CPU), and would likely use a different instruction set. So if your subtask doesn't fit inside its computational limits, you are out of luck.
 
They need to define what PS4 is first.

The SPUs are very handy for real-time machine tracking/recognition/learning, computationally intensive compression/decompression (an OnLive-like service?), fast security, etc. while the GPU is running. The architecture is also great for managing the memory wall and bandwidth without eDRAM. If they can come up with a good *system* design to address all these challenges, then it doesn't really matter whether Sony uses SPUs or not. The only question is backward compatibility.

Personally, I'm more interested to see if Sony will add Thunderbolt to PS3 and PS4. Upgrade the firmware to support a Playstation grid for various personal needs (Make the box slim and flat/stackable !). ^__^

The original concept of Cell computing is to preserve your existing investments by "growing" a personal network of intelligent (heterogeneous) nodes. While cloud computing addresses a good part of the needs, there are other very interesting applications we can do with a local cloud, especially when combined with the capabilities above. It would be useful even if PSN is down. :devilish:

Specifically, programming model-wise, if they continue to encourage developers to optimize for data locality, they should be able to distribute the work to various nodes more easily.
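As a rough illustration of that point, here is a minimal sketch (the Job struct and function names are made up, not any real SDK's API) of what a locality-friendly work unit looks like: everything it reads or writes travels with it, so the same code can run on a local SPU, a CPU core, or in principle on another node entirely.

Code:
#include <cstddef>
#include <vector>

// Hypothetical self-contained job: all data it touches travels with it,
// so a scheduler can run it locally or ship it to another node unchanged.
struct Job {
    std::vector<float> input;   // streamed in, never referenced via global pointers
    std::vector<float> output;  // results come back the same way
};

// The kernel only sees the job's own buffers.
void run_kernel(Job& job) {
    job.output.resize(job.input.size());
    for (std::size_t i = 0; i < job.input.size(); ++i)
        job.output[i] = job.input[i] * 2.0f;  // stand-in for real work
}

// A dispatcher (SPU-style or networked) only has to move the buffers;
// it never chases pointers into the sender's address space.
void dispatch(Job& job) {
    // send(job.input) ... run remotely ... receive(job.output)
    run_kernel(job);  // here we just run it in-process
}

int main() {
    Job job;
    job.input.assign(1024, 1.0f);
    dispatch(job);
    return 0;
}

No pointer chasing into someone else's memory, just buffers in and buffers out: that property is what makes the work easy to distribute.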
 
You read that just fine. Take a look at one of DICE's recent slides with one stat on culling, I believe. There are others. That wasn't an "out of the blue" statement. There are tasks that Cell is about twice as fast at than modern 4-core Intel offerings.

Yes there are, and there are many, many more where it's far slower. That doesn't make it faster overall; it just makes it a completely different architecture. If you want to prop up a weak GPU in graphics-related tasks, Cell probably is a bit better, but if you want a decent CPU, there's no comparison. Or put another way, which do you think the 360 would have benefited more from instead of Xenon? An i7 or Cell?
 
I can't see a non-Cell chip emulating Cell, so Sony will have to think hard about whether they want to lose everything done on the PS3 to future generations. Given how widely they have been pimping PS1 classics, it's not like they have no appetite for selling older generation games if they can manage it.

Wikipedia says that Cell was built on a $400 million budget, which is considerably less than has been mentioned in this thread. If Sony were able to get something like a 4 PPE, 12-16 SPU processor for PS4, I would think that would stand them in good stead for the next gen, while preserving the software investment they made developing for PS3.

As far as the question about Xenon vs. i7 vs. Cell, I don't know the answer to that. How much of what Xenon does in current games is vector processing? How much main-processor power would actually be necessary when you've got GPGPU processing on a 10,000-thread GPU?

How difficult is GPGPU programming vs. SPU programming, anyway?
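Not a full answer, but here's a rough feel for the difference in programming model (plain C++ standing in for both; the function names are mine, this isn't CUDA, CAL, or the actual SPU intrinsics): with GPU compute you mostly write the body for one element and let the hardware feed it, while SPU code has you stage data through the 256KB local store yourself.

Code:
#include <cstddef>
#include <cstring>
#include <vector>

// GPGPU flavour: you write the body for ONE element; the runtime launches
// thousands of these in parallel and the memory system feeds them.
void scale_kernel(const float* in, float* out, std::size_t i) {
    out[i] = in[i] * 2.0f;
}

// SPU flavour: you explicitly stream chunks into a small local store,
// work on them there, then stream the results back out.
void scale_spu_style(const float* in, float* out, std::size_t n) {
    const std::size_t CHUNK = 4096;          // sized to fit the local store
    float local_in[CHUNK], local_out[CHUNK]; // stand-ins for local store buffers
    for (std::size_t base = 0; base < n; base += CHUNK) {
        std::size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;
        std::memcpy(local_in, in + base, len * sizeof(float));   // "DMA in"
        for (std::size_t i = 0; i < len; ++i)
            local_out[i] = local_in[i] * 2.0f;
        std::memcpy(out + base, local_out, len * sizeof(float)); // "DMA out"
    }
}

int main() {
    const std::size_t N = 1 << 20;
    std::vector<float> in(N, 1.0f), out(N);
    for (std::size_t i = 0; i < N; ++i) scale_kernel(in.data(), out.data(), i); // "kernel launch"
    scale_spu_style(in.data(), out.data(), N);
    return 0;
}

Real SPU code would also double-buffer those transfers so the DMA overlaps with the compute, which is where a lot of the extra effort (and the performance) comes from.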
 
Yes... if PS4 is also based on Cell, and can tap on my PS3's Cell over gigabit (or better !), then it would reward old timers like us too. :p

But I don't want to put pressure on Sony unnecessarily [Trying very hard not to fidget here]
 
I'm not a computer architect, but I did stay at a Holiday Inn Express last night, and I can tell you that Sony will not be doing Cell multiprocessing over GigE. :LOL:
 
Another kind of variant Sony and IBM could do to Cell would be to keep the same number of SPUs but increase their local storage. Cell was always designed to be able to scale along multiple dimensions, as I understand it.
 
B/C aside, is there any reason that Sony couldn't use something derived from the Power 7 series, complete with OoOE and all the other bells and whistles?
 
The technical specs will be dictated by economics.

A BOM of $500 seems fair for a $400-$450 launch price; I don't believe Sony or MS will abandon the loss-leading model. This gives us a cost breakdown very similar to last gen:
Around 450 mm² die area in total for CPU and GPU.
4GB RAM
Optical drive
$30-$50 worth of permanent storage

Both MS and Sony had reliability problems related to heat/cooling. Microsoft had tons of heat-related RRODs and Sony had failures related to their neat low-noise centrifugal cooling system. Since quality cooling solutions cost money, I expect both to aim at a slightly lower power consumption of 120-150W.

Speed will be dictated primarily by power consumption and secondly by yield. Power scales roughly with the cube of operating frequency (when voltage is scaled down along with it), so lowering the clock by ~20% roughly halves power consumption. This gives a lot of room for hitting the right power envelope.
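The back-of-the-envelope version of that cube law (the 3.2GHz starting point is just an example figure):

Code:
#include <cmath>
#include <cstdio>

int main() {
    // P ~ C * V^2 * f, and V roughly tracks f, so P ~ f^3.
    double clock_ratio = std::cbrt(0.5);   // clock ratio that halves power, ~0.794
    std::printf("cut the clock by ~%.0f%% to halve power\n", (1.0 - clock_ratio) * 100.0);
    std::printf("e.g. 3.2 GHz -> %.2f GHz\n", 3.2 * clock_ratio);  // ~2.54 GHz
    return 0;
}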

Cheers
I would like that to come true, as I've managed to convince myself that a single, pretty big APU is the way to go.
I'm no longer sure about the 4GB of RAM; Laa-Yosh's last post somehow convinced me that 2GB could be enough. It just sounds "wise".

I assume you're thinking of a two-chip design (one CPU and one GPU). How would you split the budget?
I assume (a lot) that 1/3 for the CPU and 2/3 for the GPU is a reasonable bet. That's ~150mm² for the CPU and ~300mm² for the GPU.
Assuming 32nm and 28nm processes are used, one could fit 4-6 high-performance CPU cores (a gross estimate based on Llano's supposed size) and something bigger and more powerful than Cayman.
Without taking into account power consumption and, more importantly, thermal dissipation, that means really high single-thread performance and maybe ~3 TFLOPS from the GPU.
I checked the HD 6970/6950 characteristics in this regard again (idle / under load):
HD 6970: 22 / 210 W //// 40°C / 92°C //// 39 / 49 dB
HD 6950: 20 / 160 W //// 47°C / 90°C //// 39 / 47 dB
That's pretty ugly; manufacturers may have to (as you say) significantly cut the clock speed to meet the requirements of a pretty tiny closed box, especially as they would want to use every valid chip on the wafer no matter its power/thermal characteristics.
Cutting the clock speed accordingly, the chip should deliver ~2 TFLOPS (gross guesstimate). That's still beefy: nearly a tenfold increase in raw power vs., say, Xenos (more once you take the gains in efficiency into account).
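Rough numbers behind that guesstimate, using public specs for Cayman and Xenos (the ~650MHz "console clock" is just my illustrative pick):

Code:
#include <cstdio>

// Peak FLOPS = ALU lanes * 2 ops/clock (multiply-add) * clock in GHz.
double peak_gflops(int alu_lanes, double ghz) { return alu_lanes * 2.0 * ghz; }

int main() {
    std::printf("HD 6970 (stock)  : %.0f GFLOPS\n", peak_gflops(1536, 0.88));   // ~2700
    std::printf("Cayman @ ~650MHz : %.0f GFLOPS\n", peak_gflops(1536, 0.65));   // ~2000
    std::printf("Xenos            : %.0f GFLOPS\n", peak_gflops(48 * 5, 0.50)); // 240
    // ~2 TFLOPS is a bit over 8x Xenos' raw peak, before counting efficiency gains.
    return 0;
}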
I assume the manufacturers would go with a UMA memory model, with the GPU integrating the north bridge (as in the 360). That means CPU performance will suffer a bit, from the latency of GDDR5 to begin with and from the extra latency of going through an external memory controller.

I can't help but think that with that silicon budget and those power and thermal constraints, one could try to push out something "Larrabee-like". I've been trying to closely follow the conversation about the odds of Larrabee @22nm, and I wonder if it would be possible now in a closed box. The context is not the same: the aforementioned thread is about the relevance of CPU rendering vs. using an IGP, and there are clearly different opinions, possibly not even considering the same time frame.
Nick has some neat ideas to alleviate some of the shortcomings of today's CPUs (shortcomings that were actually shared by the Larrabee core we know). He gave links to an Intel paper showing how scatter/gather could be implemented on a CPU. He also explained how making the registers wider than the actual SIMD/ALU width could be beneficial: if I understand the benefit, it hides X times the latency (X depending on the ratio between the actual SIMD width and the register width), and it also reduces the number of instructions you need in flight to keep the core busy.
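A quick back-of-the-envelope on that register-width trick, using the 512-bit SIMD / 2048-bit registers from the spec list below (the 20-cycle latency is just an example figure):

Code:
#include <cstdio>

int main() {
    const int alu_bits = 512;    // 16-wide execution unit, as in Larrabee
    const int reg_bits = 2048;   // 64-wide logical registers, ATI-style
    const int ratio    = reg_bits / alu_bits;   // each op occupies the unit for 4 cycles

    const int latency  = 20;     // example: cycles before a dependent op can issue
    int ops_wide_regs   = (latency + ratio - 1) / ratio;  // 5 independent ops in flight
    int ops_narrow_regs = latency;                        // 20 if one op issues per cycle
    std::printf("independent ops needed to hide %d cycles: %d vs %d\n",
                latency, ops_wide_regs, ops_narrow_regs);
    return 0;
}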
We're not that interested in IGP-level performance, at least not today's IGP level of performance, so texture units are needed in the design.

While trying to be conservative, I wondered what such a chip could do today.
I considered 8 cores @ 2.4GHz.
The chip would mostly be a mix of the one described in the Intel paper and the original Larrabee, blending in Nick's insights and features that are now common in today's CPUs (and more importantly in IBM's) and in Xenon/the PPU.
64KB of L1 (32+32); I$ 2-way, D$ 4-way.
8x 256KB of L2 (8-way).
Lower latency than in Xenon/the PPU.
16-wide SIMD (512 bits, as in Larrabee).
64-wide registers (2048 bits, as I guess in Xenos and lots of ATI GPUs).
The scalar pipeline would be an improved (not drastically) version of the Xenon one.
4-way multithreading (either barrel or round-robin).
Some MB of L3.
Up-to-date texture units.
Cores and texture units tied to a ring bus.
Integrated memory controller (DDR3).
Fast chip-to-chip interconnect.

The CPU would have pretty complete power-management measures: turbo mode and clock gating. For example, if the SIMD unit is not used it would be power-gated, and the scalar pipeline would be allowed to run at a much higher frequency (at least the frequency of Xenon); depending on the temperature of the chip as a whole, frequency would adapt further (say, when plenty of cores are idle).

Such a chip would deliver way higher single-thread performance than, for example, the same number of Larrabee cores (not hard), and with the SIMD units power-gated the goal would be to do a bit better per cycle than a PPU/Xenon core. It should deliver 8 x 16 x 2 x 2.4 => ~614 GFLOPS; two of them, ~1.2 TFLOPS.

Say the system consists of two such chips and 2GB of DDR3.
Versus the first system (CPU + GPU, 2GB of GDDR5) it would fall short in raw power, that's for sure: there's less to begin with, and the first system doesn't have to spend resources on operations such as rasterization.
On the other hand, this system would be cheaper: R&D for only one chip, economies of scale (you produce twice as many), cheaper RAM. Actually the chip could end up reasonably tiny, and two of them might fit on the same package from the start. A 32-core Larrabee was big @45nm, but a good quarter of that @32nm, even on a non-Intel process, could be pretty reasonable: Larrabee was 500+ mm²; taking 1/3 of that and going by a 0.7 scaling factor while moving to 32nm gives ~120mm², which leaves room for the L3, for optimizing the design for power rather than density, and why not a video encoding/decoding unit.
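Spelling that die-size arithmetic out (the 500+ mm², the 1/3 cut and the 0.7 scaling factor are all my own rough assumptions, not measured figures):

Code:
#include <cstdio>

int main() {
    double larrabee_45nm = 500.0;               // assumed "+500mm2" for the 32-core part at 45nm
    double one_third     = larrabee_45nm / 3.0; // ~167 mm2 for the smaller core count
    double at_32nm       = one_third * 0.7;     // ~117 mm2 with an assumed 0.7 area scaling factor
    std::printf("~%.0f mm2 at 45nm -> ~%.0f mm2 at 32nm\n", one_third, at_32nm);
    // Comfortably inside a ~150mm2 per-chip budget, leaving room for L3 or a video block.
    return 0;
}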
So it could end up with either more RAM, more Flash on board, or simply a price advantage.
Overall I think it would be an interesting match, with a performance advantage for the first system in standard situations (something like one rendering @1080p and the other at 1440x1080), but we might see stuff on the second system that would be impossible or tough to pull off on the first.
 
Or put another way, which do you think the 360 would have benefited more from instead of Xenon? An i7 or Cell?

Xenon has ~165M transistors. A dual-core i7 has ~500M. For the budget of one Xenon you could get a bit more than half an i7 core. Cell had ~241M.
B/C aside, is there any reason that Sony couldn't use something derived from the Power 7 series, complete with OoOE and all the other bells and whistles?
From what I understand, Power7 has a metric ton of extra hardware for stuff consoles will most likely never need, like hardware BCD and all sorts of encryption things.
 
B/C aside, is there any reason that Sony couldn't use something derived from the Power 7 series, complete with OoOE and all the other bells and whistles?

The Power7 core itself (including 256kB L2) is pretty tiny. It's all the bells and whistles that make it huge (SMP linkage and eDRAM). Just look at the die photo. :p
 
Xenon has ~165M transistors. A dual-core i7 has ~500M. For the budget of one Xenon you could get a bit more than half an i7 core. Cell had ~241M.
From what I understand, Power7 has a metric ton of extra hardware for stuff consoles will most likely never need, like hardware BCD and all sorts of encryption things.

A dualcore i7??

What are you talking about?
 
Part two :)

I see plenty of use for such a system.
Thinking of what Nintendo is planning to do with the Wii 2, I see a world of opportunity. For example, say they set limits on the resources arcade games can use (number of cores, amount of RAM); if they use the same kind of controllers, I can imagine different people playing different games at the same time.
Depending on the OS there are a lot of possibilities: one person watches a show while browsing the internet, another does the same, one plays, etc. It could be done with virtualization of the resources when the system is not running "full blown games".
To some extent that means MS competing with themselves, but as PCs are slowing and Windows 8 on ARM is quite a bet, why not? MS could try to push the extra functionality through Live: OK, your system may replace your PC, but to access those features you'll have to have a Gold account.
 
I'm hoping Sony & Nvidia choose to use Nvidia's Maxwell architecture instead of Kepler, because Kepler will almost certainly be a DX11 part whereas Maxwell will probably be DX12. I know Sony won't use DirectX, they use OpenGL; I'm just saying that because I'm not familiar with the kind of OpenGL Sony uses with PS3.

Anyway, I'd like to see a 4 PPE / 32 SPE Cell CPU, a Maxwell variant for the GPU, and Rambus Terabyte Initiative RAM.
 
A dualcore i7??

What are you talking about?
My point was that you simply wouldn't have enough transistors to even create one i7 core, so you couldn't have anywhere near the performance of even a single i7 core in a console if you had used that technology in 2005.
I know Sony won't use DirectX, they use OpenGL; I'm just saying that because I'm not familiar with the kind of OpenGL Sony uses with PS3.
Technically there is a variant of OpenGL ES on PS3, but pretty much everyone uses the native API instead, which has some resemblance to OpenGL but is quite different from it.
 