Predict: The Next Generation Console Tech

Yes. This has the additional advantage that you don't have to worry about your software architecture as much and you will reach better average performance earlier in the console lifecycle.

Creating a software architecture that scales well over multiple threads is pretty hard, and most developers don't have the skills for that. Considering that next-gen games will most probably have even larger development teams and will use even more middleware, balancing and optimizing a software architecture for 4 cores is much easier than optimizing for 16 cores.

Wouldn't the addition of APUs (or even just a few SPUs, just for the heavy lifting) be able to do that too, i.e. being easier to develop for while getting nice average performance...

The only question is whether you would sacrifice too much flexibility, but if the extra performance is only needed for a few tasks, then maybe the "relatively" hardwired HW is worth it, IMO.
 
One thing you shouldn't assume is that you need to educate me on the subject matter, especially when I said nothing to suggest you need to.
Actually I was just kidding about the education and I wasn't referring to you at all. So if I've offended you I'm sorry. I didn't mean to sound condescending. :oops:
If I offer that memory will be faster and you say it won't be fast enough, there is little to discuss on the matter, is there? The only thing we can do is see if the memory is delivered as promised. By the by, Rambus is not the only player in the memory market; Rambus was merely a practical example. If it is any consolation, I am as skeptical of Rambus as anyone, because even if they do make good on their claims, Rambus is likely to price themselves right out of the market in typical Rambus fashion.
That TB/s figure you mentioned has been thrown around so often that I simply refuse to believe the hype until someone at least shows a working pre-production sample with a fixed release date at an affordable price. A memory technology that delivers TB/s throughput while maintaining acceptable latency would be a revolution. How many revolutions have we had in the last 20 years? IMO computer technology is all about evolution. Sometimes the gains are larger, sometimes smaller, but next to never are the gains in the oh-my-god-I-simply-can't-believe-this range. If I'm wrong, good for us! Until then, let's just extrapolate from the past.
You are correct when you say most tasks in a game are not fully independent, but there are still quite a few things that are done asynchronously.
Yes, there are some independent tasks. And there are some tasks that can be parallelized quite well. But these tasks need to be scheduled and the results need to be synchronized. The question is: are there enough tasks to keep 8 or 16 cores busy? Which leads us to the real question for next-gen systems:

What is the biggest bang for the buck? 4 or maybe 6 really fast "fat" cores (with OOoE, larger caches, etc.) or 8-16 smaller, simpler cores?
Rather than having a protracted debate on the value of multi-core, I'll go back to the original question, which is: how likely is it that we will see more than 8 cores on console CPUs? I think it's likely, given Cell's roadmap, that we will see more than 8 cores from Sony's side. I don't think it's likely we'll see CPUs with cores as capable as the i7's in a multiplicity of 8 or more because of thermal/power/cost constraints. There is a chance MS may make a few wet dreams come true by using a couple of Larrabees to do "it all" in their next console, and if anything LRB is a multi-core design with far more than 8 cores.
When you say more than 8 cores in Cell, you mean SPEs, right? There is a chance we may see this, since Cell is a special case. Personally I think that Sony is the hardest to predict, since they are, well, technology geeks. As for MS and Larrabee: I strongly doubt it. Microsoft is all about protecting the investment of their clients and leveraging the existing code base and know-how. Which is why I think they may rather go fat before adding more cores.
This is what I was thinking of when I first responded to you, but when I looked at what you initially offered in support of your argument, I didn't think it actually made that point. The article you linked mainly spoke to issues with the memory wall, as I previously noted, and while I fully recognize Amdahl's law, I feel there is much to be had with concurrent execution on multiple cores.
I don't know if you have ever done any multi-threaded software development yourself, but partitioning software into independent tasks is not an easy feat. There are some low-hanging fruits, but once you've picked them, it gets increasingly harder. Plus, the larger the team and the larger your code base, the more you want to hide the overall architectural details, let developers concentrate on the tasks at hand, and keep them from introducing hard-to-find concurrency bugs, deadlocks, livelocks, etc. You're bound to lose performance here. Basically: the more cores you have, the more fine-grained your tasks need to be in order to utilize those cores to their fullest potential, the more scheduling and synchronizing you will need, and the harder Amdahl kicks you in the nuts.
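To put rough numbers on that Amdahl point, here's a quick illustrative sketch (the serial fractions are made-up values for the sake of the example, not measurements from any real engine):

Code:
#include <cstdio>

// Amdahl's law: speedup = 1 / (serial + (1 - serial) / cores)
double amdahl_speedup(double serial_fraction, int cores) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main() {
    const double serial_fractions[] = {0.05, 0.10, 0.20};  // assumed values, for illustration only
    const int core_counts[] = {4, 8, 16};
    for (double s : serial_fractions)
        for (int n : core_counts)
            std::printf("serial %.0f%%, %2d cores -> speedup %.2fx\n",
                        s * 100.0, n, amdahl_speedup(s, n));
    return 0;
}

At an assumed 10% serial fraction, 8 cores give about a 4.7x speedup and 16 cores only about 6.4x, so doubling the core count buys you roughly a third more performance, which is exactly where the scheduling and synchronization overhead starts to hurt.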
 
Wouldn't the addition of APUs (or even just a few SPUs, just for the heavy lifting) be able to do that too, i.e. being easier to develop for while getting nice average performance...
Yes. I think we will see some calculation-intensive tasks (like physics) offloaded to the (GP)GPU next-gen, which further reduces the need for more cores. Which leads to the question: does the Cell concept still make sense?
 
Yes. I think we will see some calculation-intensive tasks (like physics) offloaded to the (GP)GPU next-gen, which further reduces the need for more cores. Which leads to the question: does the Cell concept still make sense?
Yes ;)

It really depends on who you ask: some say the CPU & GPU will be one homogeneous chip that does everything, some say we'll have all general-purpose vector-processing-intensive tasks in addition to graphics on the GPU and the branchy stuff on the CPU, and some say we'll adopt a Cell-like model.

Personally I think the Cell-type model makes a bit more sense right now. Graphics are so demanding that you'll be hard pressed to find a situation where there is lots of free processing power available for non-graphics operations. CPUs are plenty fast at general-purpose operations, but with calls for so much more flexibility in hardware (i.e. more processing power in reserve for the GPU if needed), and more complex music, physics, decompression and such, the CPU could really do with becoming faster at vector processing than in previous generations; and even though the current Cell has its disadvantages, it made one of the biggest moves in terms of dedicating much, much more of the die to SIMD processing.

That's my opinion!
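To make the "more die for SIMD" point concrete, here's a tiny generic sketch with SSE intrinsics (illustrative only, and obviously not SPU code; the SPU ISA differs, but the four-floats-per-instruction idea is the same):

Code:
#include <xmmintrin.h>  // SSE intrinsics

// Scalar version: one float per iteration.
void add_scalar(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SIMD version: 4 floats per iteration (n assumed to be a multiple of 4 here).
void add_simd(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}

Same result, a quarter of the loop iterations; that's the kind of throughput win SIMD-heavy silicon like the SPEs is chasing.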
 
Actually I was just kidding about the education and I wasn't referring to you at all. So if I've offended you I'm sorry. I didn't mean to sound condescending. :oops:

Thanks. Much appreciated.

I'll reiterate that I am skeptical of Rambus, so we in fact don't disagree there. I am less skeptical of GDDR5 or its successor making things more amenable in the future, not to mention that going with more on-chip memory isn't a bad idea at all. 100s of GB/s of bandwidth doesn't sound bad to me, although latency could still be a rather prickly issue.

nOOb said:
Yes, there are some independent tasks. And there are some tasks that can be parallelized quite well. But these tasks need to be scheduled and the results need to be synchronized. The question is: are there enough tasks to keep 8 or 16 cores busy? Which leads us to the real question for next-gen systems:

What is the biggest bang for the buck? 4 or maybe 6 really fast "fat" cores (with OOoE, larger caches, etc.) or 8-16 smaller, simpler cores?

That's a rather wide-sweeping discussion that has been had on B3D before. I could come up with useful ways to keep all the cores busy, and I could also come up with ways to get the same results using fewer cores. I'd have to take on OOoE vs. in-order execution, deep pipelines vs. shallow pipes, caches vs. LS, etc. etc. I'd also have to define what "fast" even means... fast at what? Relative to what?

In the end I would craft a grandiose defense of my opinion, but it would be my opinion all the same, just like yours. I'm not sure it's worth it, given that opinions already abound here on B3D.

In the context of this discussion I've always looked at it from the perspective of what is most likely to happen and practically attainable, not from the perspective of what I want or what I purely feel is the "best." If I were to go that route I'd say damn multi-core altogether and give me a 1000-petahertz single-core CPU I can program purely by voice command.

nOOb said:
When you say more than 8 cores in Cell, you mean SPEs, right? There is a chance we may see this, since Cell is a special case. Personally I think that Sony is the hardest to predict, since they are, well, technology geeks. As for MS and Larrabee: I strongly doubt it. Microsoft is all about protecting the investment of their clients and leveraging the existing code base and know-how.

Yes, I was referring to SPUs. Console hardware typically is a collection of special cases. Anyhow, I also think Larrabee has an uphill battle to appear in MS's next console, but nonetheless the possibility exists, because MS may be able to do exactly as you suggest. LRB understands DirectX (or your own software rendering pipeline, should you want to make one) and is flexible enough to run bog-standard x86 code, which could make it quite appealing to MS. I have expressed my skepticism about LRB here on B3D but left room for the fact that my concerns could be addressed by MS.

nOOb said:
Which is why I think they may rather go fat before adding more cores. I don't know if you have ever done any multi-threaded software development yourself

I wrote my first multi-process/multi-threaded app nearly a decade ago using fork() and POSIX threads. I've written production code at a lot of different places, ranging from single CPUs/cores to multi-threaded grid/cluster/cloud applications which must synchronize not only in a local environment but across disjoint grid/cluster/cloud networks spanning the globe.

I assure you that I understand you.

nOOb said:
, but partitioning software into independent tasks is not an easy feat.

I don't disagree. However, if you've truly written scalable code, it should be mostly agnostic to the core count or easily tweaked to account for it.
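As a minimal sketch of what "agnostic to the core count" might look like (illustrative C++ only, not anyone's actual engine code; parallel_for is a made-up helper name), the worker count is queried at runtime instead of being baked in:

Code:
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Split [0, n) into roughly equal chunks, one per hardware thread.
void parallel_for(std::size_t n, const std::function<void(std::size_t)>& body) {
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());  // runtime core count
    std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([=, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);  // each worker handles its own slice
        });
    }
    for (auto& t : pool) t.join();
}

The same parallel_for call behaves itself on a 4-core devkit or a 16-core one; only the chunk sizes change.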

nOOb said:
There are some low-hanging fruits, but once you've picked them, it gets increasingly harder. Plus, the larger the team and the larger your code base, the more you want to hide the overall architectural details, let developers concentrate on the tasks at hand, and keep them from introducing hard-to-find concurrency bugs, deadlocks, livelocks, etc. You're bound to lose performance here. Basically: the more cores you have, the more fine-grained your tasks need to be in order to utilize those cores to their fullest potential, the more scheduling and synchronizing you will need, and the harder Amdahl kicks you in the nuts.

Which is exactly why the majority of your team shouldn't be allowed to use threads directly, but should instead write methods that plug into a scalable framework that accomplishes what you want it to. Like it or not, the burden is actually on proper design, management and execution a great deal more than it is on chip design. This is something I spoke to in the now-dead thread discussing multi-platform development concerns.
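A rough sketch of the kind of interface that implies (hypothetical C++ names, assuming some sort of in-house job system; not any real engine's API): the team writes plain jobs, and the framework owns the threads, queues and synchronization.

Code:
#include <functional>
#include <utility>
#include <vector>

// Hypothetical job-system facade: the only surface most of the team sees.
// Threads, queues and locks live behind it.
class JobSystem {
public:
    using Job = std::function<void()>;

    // Queue a job; the framework decides which worker runs it and when.
    void submit(Job job) { pending_.push_back(std::move(job)); }

    // Block until all submitted jobs have finished (e.g. at a frame sync point).
    // A real framework would have dispatched these to worker threads already;
    // this sketch runs them serially just to stay self-contained.
    void wait_all() {
        for (auto& job : pending_) job();
        pending_.clear();
    }

private:
    std::vector<Job> pending_;  // a real implementation would use a mutex-guarded or lock-free queue
};

// Gameplay code written against the facade: no threads, no locks.
void update_animation_jobs(JobSystem& jobs, int character_count) {
    for (int i = 0; i < character_count; ++i)
        jobs.submit([i] { /* advance character i's skeleton; purely local data */ });
}

The point is the boundary: code like update_animation_jobs never touches a thread or a lock, so the concurrency hazards are concentrated in one well-tested framework instead of scattered across the whole code base.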

I understand Amdahl's law, but that is not synonymous with development practice or process, if you will. I don't want to sound flippant, so I do want to say I understand that the issues with multi-core development greatly affect the development process.
 
How is the current recession going to affect a console company shopping for technology? Are they likely to get better deals, or deals that were otherwise not possible except under the shadow of the current recession?

For example, if a console manufacturer were to approach AMD, would a partnership be possible where one provides the expertise and the other provides the money, and at the end of the day a product is developed which can be produced at the discretion of both? Could a console manufacturer approach AMD, slap a cool half billion into their hand and say "Make me a CPU/GPU combination similar to the Cell but capable of performing the functions of both simultaneously and efficiently"?
 
How is the current recession going to affect a console company shopping for technology?
Personally I'm still not sure whether the shopping habits (of the console company) will be more affected by the fluctuating cash flows and tech prices of semiconductor companies, or by the fear that consumers won't adopt hardware too far out of their price range.
scificube said:
I don't disagree. However, if you've truly written scalable code, it should be mostly agnostic to the core count or easily tweaked to account for it.
Good point. I think many lessons learnt from this generation will transfer well to the next. Even if initial software that just tries to account for the increase in processors doesn't take full advantage of the 8-16 cores, refinements in multi-threading over time could offer a lot of headroom.
 
For example, if a console manufacturer were to approach AMD, would a partnership be possible where one provides the expertise and the other provides the money, and at the end of the day a product is developed which can be produced at the discretion of both? Could a console manufacturer approach AMD, slap a cool half billion into their hand and say "Make me a CPU/GPU combination similar to the Cell but capable of performing the functions of both simultaneously and efficiently"?

Although Cell is not a GPU, and it doesn't try to be, AMD is already doing one with Fusion, and if Intel is to integrate Larrabee into the CPU (that's the plan, right?) they are doing it too.

Probably both companies would be happy to have their chips in a console, but AMD would be easier, I think. For one thing, they need it more (the money, plus the leap their CPUs/GPUs could get in gaming(+?) benchmarks, plus better knowledge of the CPUs, and on top of that a wider spread of their new design). Also, it should be easier to customize the chip (the M-Space thing) and to get a new contract, as they should be happy with their ATI contracts.
 
Since the Power Processor Element in Xenon and Cell was derived from POWER4, I wouldn't be surprised if the main cores of the next-gen Xbox and PS4 CPUs are derived from the current POWER6 rather than the upcoming POWER7.

The PPE was derived from an older research processor called GuTS. It has nothing to do with POWER4 (other than being compatible with it).

The POWER7 is meant to be a "hybrid core". I suspect this means being able to dynamically switch the number of threads it's running.

There were some leaked Cell slides suggesting that the next-gen PPE is based on POWER7 technology.
 
Although Cell is not a GPU, and it doesn't try to be, AMD is already doing one with Fusion, and if Intel is to integrate Larrabee into the CPU (that's the plan, right?) they are doing it too.

Probably both companies would be happy to have their chips in a console, but AMD would be easier, I think. For one thing, they need it more (the money, plus the leap their CPUs/GPUs could get in gaming(+?) benchmarks, plus better knowledge of the CPUs, and on top of that a wider spread of their new design). Also, it should be easier to customize the chip (the M-Space thing) and to get a new contract, as they should be happy with their ATI contracts.

I guess it's one thing to slap a GPU and CPU together, and a completely separate "thing" to design the processor so the main threads of a program could run on the GPU shaders. I would suggest the latter, rather than the former, would be the ultimate design goal for a next-generation console. Off-chip bandwidth is expensive, and you'd save a lot of money and cut the packaging/cooling requirements significantly, I guess, if it was all done on the one die. There's also the aspect of duplicating roles: if the GPU architecture does some things better, I guess it may pay to have more of it.
 
Disclaimer: not sure I properly understood the last two posts.

It's likely that at some point MS will manage to merge Xenon and Xenos onto the same chip. I don't think it will be a problem if next time around they choose to do it from scratch, no matter whether they use a PowerPC + ATI part instead of a complete ATI/AMD solution.
For backward compatibility's sake I see MS choosing the former solution.

A little question: as systems seem more and more likely to launch around 2012, I think the GPUs (no matter the manufacturer) are really likely to be Larrabee-like, i.e. a bunch of multi-threaded (at least 4 threads) simple CPUs (no matter the chosen ISA) stuck to wide SIMD units.
What do you, dear members, think the GPU could look like by then?
 
The PPE was derived from an older research processor called GuTS. It has nothing to do with POWER4 (other than being compatible with it).

The POWER7 is meant to be a "hybrid core". I suspect this means being able to dynamically switch the number of threads it's running.

There were some leaked Cell slides suggesting that the next-gen PPE is based on POWER7 technology.

Can you give more details and a source, please?

I guess it's one thing to slap a GPU and CPU together, and a completely separate "thing" to design the processor so the main threads of a program could run on the GPU shaders. I would suggest the latter, rather than the former, would be the ultimate design goal for a next-generation console. Off-chip bandwidth is expensive, and you'd save a lot of money and cut the packaging/cooling requirements significantly, I guess, if it was all done on the one die. There's also the aspect of duplicating roles: if the GPU architecture does some things better, I guess it may pay to have more of it.

Well, I got the idea (maybe it's wrong) that AMD doesn't plan to just make a dual-chip processor with a CPU and a GPU; they really plan to make Fusion. I believe they said that the GPU part could process other things (GPGPU) much more easily this way too. IIRC the only differences between processors would be 1) the number of cores/execution units and 2) the kind/number(?) of APUs.

As for Intel, they could do more than a dual-chip processor (although, given their past, I doubt it in a first gen); actually, if it weren't for the memory architecture (one big pool on Larrabee, maybe later shared with the CPU?) and some hardwired GPU stuff, it would be quite like Cell (or a second generation of it).
 
Disclaimer: not sure I properly understood the last two posts.

It's likely that at some point MS will manage to merge Xenon and Xenos onto the same chip. I don't think it will be a problem if next time around they choose to do it from scratch, no matter whether they use a PowerPC + ATI part instead of a complete ATI/AMD solution.
For backward compatibility's sake I see MS choosing the former solution.

A little question: as systems seem more and more likely to launch around 2012, I think the GPUs (no matter the manufacturer) are really likely to be Larrabee-like, i.e. a bunch of multi-threaded (at least 4 threads) simple CPUs (no matter the chosen ISA) stuck to wide SIMD units.
What do you, dear members, think the GPU could look like by then?
From my point of view, manufacturing the chips on the same physical die (CPU + GPU) will almost certainly happen: simpler manufacturing process, potentially lower cost, and greater bandwidth between the main parts.

As for the actual architecture of the GPUs, at this point I'm with the idea that shader units will become completely general purpose, but I don't think every aspect of graphics needs to be done in software when some can remain fixed-function, in my opinion.
 
I really want to see if/how the Cell architecture could counter Larrabee in the field of general-purpose mass arrays. The 45/40nm process would certainly give some engineering edge for the next wave of new consoles. Sony wanted, even prior to the PS3 release, for Cell to play a bigger role in GFX-related loads. A 4+ GHz clock, or even async PPE/SPE clock domains, would allow shifting some priorities away from GPU implementation costs.
 
There are many guesstimated PS4 spec ideas around. Edopost says:

PS4 Motherboard
The PS4 motherboard looks like the later PS3 motherboards (no PS2 compatibility chips).

The architecture of the PS4 motherboard divides the main pieces of the system into 1024MB DDR4 memory, Cell2, RSX2, and 1024MB GDDR5. The HDMI2 display is connected to the 1024MB GDDR5 (the video memory). The communication path also lines up in that order. Therefore, communication with the 1024MB DDR4 memory must go through Cell2, and communication with the GDDR5 must go through the RSX2. Below is more info on each of the components.

The processor is the ultimate idea. Wow.
The Cell2 CPU has three 4.2GHz PPEs (Power Processor Elements) with two threads each and eighteen 4.2GHz SPEs (Synergistic Processing Elements). The three PPEs are general-purpose CPUs, while the eighteen SPEs are geared towards processing data in parallel. Two SPEs are disabled to increase yield, so the PS4 can have at most 22 threads running at the same time (6 from the PPEs and 16 from the SPEs). Note that one SPE is reserved for the hypervisor, so PS4 programs can take advantage of 21 threads. The Cell2 was introduced at 45nm, and later PS4 model numbers starting with CECKG use the 32nm version.
3 PPE (Power Processor Element)
4.2Ghz
2 threads (can run at same time)
L1 cache: 32kB data + 32kB instruction
L2 cache: 512kB
Memory bus width: 64bit (serial)
VMX (Altivec) instruction set support
Full IEEE-754 compliant
18 SPE (Synergistic Processing Element)
4.2Ghz
2 SPE disabled to improve chip yield
1 SPE dedicated for hypervisor security
256kB local store per SPE
128 registers per SPE
Dual Issue (Each SPE can execute 2 instructions per clock)
IEEE-754 compliant in double precision (single precision round-towards-zero instead of round-towards-even)
45nm technology (initial models)
32nm technology (later PS3 models)

RSX2 Ultimate?
The RSX2 is a graphical processor unit (GPU) based off of the nVidia GeForce GTX 480 graphics processor, and is a G500/G600 hybrid with some modifications. The RSX2 has unified vertex and pixel shader pipelines. The following are relevant information about the RSX2...
48 vertex shader texture units
48 pixel shader texture units
48 geometry shader texture units
32 ARB texture units
48 Raster Operations Pipeline units (ROPs)
Includes 1024MB GDDR5 graphics memory
GDDR5 Memory interface bus width: 512bit
55nm and 45nm technology
More features are revealed in the following chart delineating the differences between the RSX2 and the nVidia GeForce GTX 480



Aha... What is your opinion about these specs?

Now some sources in Japan already have prototype machines from SCEI.
Any guesses about the specifications of the next PS family console?

Here are 2 models of prototype machines.

Model 1 (likely an upgraded PS3, maybe marketed around Q1 2010 as PSX2)
CPU Cell Broadband Engine (1 PPE / 8 SPE) @ 4.5GHz
2GB DDR3 SDRAM Dual Channel
GPU Nvidia GTX280 with PCIe 2.0
1GB DDR3 SDRAM
SATA300 HDD 7200RPM 16MB Buffer
HDMI 1.3A

The system is targeted as the center of multimedia player functions. It can do image processing, media processing, web browsing (maybe a new version of Google Chrome) and many future applications that work together with digital cameras, digital video cameras and flat panel TVs.
Of course, it can play PS3, PS2 and PS1 games.

Model 2 (unlikely to be a Cell BE platform)
CPU Sony Toshiba custom CPU @ 4.5GHz (unknown architecture, but not the same as the PS3's)
GPU Sony Toshiba and Nvidia custom GPU @ 1.2GHz with 64MB eDRAM
RAM 2GB Rambus XDR2 @ 6.4GHz
HDD Samsung SSD 250GB SATA300
HDMI 1.3A

Many devs in Japan expect this as the next PS family product; however, many parts of this system are still secret. Maybe this is a PS2/PS3 hybrid machine, and they've gone back to using eDRAM again for the first time since the PS2's GS GPU debut.
 
The specs seem very weird to me and certainly not conscious of the market. Firstly, wasn't the first PSX a total flop in Japan? Why make another one? And why would you put a GTX280 into it when nobody's going to make games for it?
 
Simply looking at the RSX2 specs makes it obvious that they were made up by a kid masturbating while multiplying the actual specs, not knowing that it's old technology abandoned by Nvidia.
 
The RSX2 is a graphical processor unit (GPU) based off of the nVidia GeForce GTX 480 graphics processor, and is a G500/G600 hybrid with some modifications. The RSX2 has unified vertex and pixel shader pipelines. The following are relevant information about the RSX2...
48 vertex shader texture units
48 pixel shader texture units
48 geometry shader texture units

32 ARB texture units
48 Raster Operations Pipeline units (ROPs)
Includes 1024MB GDDR5 graphics memory
GDDR5 Memory interface bus width: 512bit
55nm and 45nm technology
More features are revealed in the following chart delineating the differences between the RSX2 and the nVidia GeForce GTX 480

Oh the irony. Seems someone copied and pasted some stuff and forgot to edit it all.
 