Predict: The Next Generation Console Tech

...Does anyone see 15 in-order cores sharing 5MB of cache as being a "Good Thing (TM)" in 2011-2017? MS may go with a more "traditional" multicore design, but there is going to be a need for a lot of progress in the design beyond slapping the same system together with more cores.
To get efficient use, that's true. However, they do have the option of not thrashing the hardware, going cheap and 'good enough'. The law of diminishing returns will be even more applicable next gen. A quarter of PS4's performance (assuming for illustration purposes that PS4 goes the whole-hog performance-king route) will still look 'good enough' in all areas, and be a lot cheaper and potentially less prone to errors. As Nintendo have demonstrated (more with their long-term handheld performance than with the Wii, where we don't yet know how things will pan out over the coming years), you don't need the performance crown to be a very successful business. I think a mediocre platform that launches early on the back of previous success and manages a large leap in quality is a viable strategy, at which point ease of development might be well worth pursuing over performance optimizations. If a dozen standard cores slapped onto the same die are easy to work with, it might happen even if they aren't performance monsters. I don't know what development would be like on such a CPU though.
 
If MS sticks with the same tech, just more cores for the CPU, would it be possible for them to utilize a 256-bit bus for the entire system for more bandwidth? It would seem like that was a failing on MS's part this generation. Something that could be fixed, assuming such a wide bus still isn't super expensive.

That and maybe add more cache?


A 256-bit bus is no longer considered super wide; it's the standard for midrange GPUs now.

512-bit is the widest conventional external memory bus there is, not counting internal ring buses or eDRAM buses.

I'd like to think that Microsoft will go with a 512-bit external memory bus for the next-gen Xbox (~4-5 years away), providing at least 500-750 GB/sec for the CPU and GPU (with GDDR6?), combined with even higher bandwidth eDRAM. That won't even be the highest bandwidth possible, given that Rambus will have 1000 GB/sec (1 TB/sec) external memory by 2010.

If external memory can do 1 TB/sec by 2010 with Rambus memory, which Xbox3 probably won't have, I wonder what kind of bandwidth eDRAM could have, 10 TB/sec?
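Just to put rough numbers on that, a quick back-of-the-envelope in C++ (the per-pin data rates for the future part are pure guesses, not anything announced):

Code:
#include <cstdio>

// Peak memory bandwidth in GB/s for a given bus width and per-pin data rate:
// bandwidth (GB/s) = bus_width_bits * data_rate_gbps_per_pin / 8
double peak_bandwidth_gbs(int bus_width_bits, double gbps_per_pin) {
    return bus_width_bits * gbps_per_pin / 8.0;
}

int main() {
    // 512-bit bus, assuming (speculatively) 8-12 Gbit/s per pin for a future GDDR part
    std::printf("512-bit @  8 Gbps/pin: %.0f GB/s\n", peak_bandwidth_gbs(512,  8.0)); // 512 GB/s
    std::printf("512-bit @ 12 Gbps/pin: %.0f GB/s\n", peak_bandwidth_gbs(512, 12.0)); // 768 GB/s
    // For comparison, Xbox 360's 128-bit GDDR3 at 1.4 Gbit/s per pin
    std::printf("128-bit @ 1.4 Gbps/pin: %.1f GB/s\n", peak_bandwidth_gbs(128, 1.4)); // 22.4 GB/s
    return 0;
}

So the 500-750 GB/sec figure basically assumes a per-pin rate several times what GDDR3 manages today.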
 
I don't know what development would be like on such a CPU though.
Microsoft is very interested in many-core chips and is investing huge money in developing the software side of the process; it's even more relevant to their cash cow.
http://www.nytimes.com/2007/12/17/technology/17chip.html?pagewanted=1&_r=2
Faster Chips Are Leaving Programmers in Their Dust

By JOHN MARKOFF
Published: December 17, 2007

...

The potential speed of chips is still climbing, but now the software they run is having trouble keeping up. Newer chips with multiple processors require dauntingly complex software that breaks up computing chores into chunks that can be processed at the same time.

The challenges have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.

...

Microsoft sees this as the company’s principal opportunity, and industry executives have said that the arrival of manycore microprocessors is likely to be timed to the arrival of “Windows 7.” That is the name the company has given to the follow-on operating system to Windows Vista.
Hmm, you may be familiar with this analogy ;)
Microsoft executives argue that such an advance would herald the advent of a class of consumer and office-oriented programs that could end the keyboard-and-mouse computing era by allowing even hand-held devices to see, listen, speak and make complex real-world decisions — in the process, transforming computers from tools into companions.

...

The opportunity for the company is striking, Mr. Mundie said, because manycore chips will offer the kind of leap in processing power that makes it possible to take computing in fundamentally new directions.

He envisions modern chips that will increasingly resemble musical orchestras. Rather than having tiled arrays of identical processors, the microprocessor of the future will include many different computing cores, each built to solve a specific type of problem. A.M.D. has already announced its intent to blend both graphics and traditional processing units onto a single piece of silicon.

In the future, Mr. Mundie said, parallel software will take on tasks that make the computer increasingly act as an intelligent personal assistant.
 
Microsoft is very interested in many-core chips and is investing huge money in developing the software side of the process; it's even more relevant to their cash cow.
I meant specifically the 'dumb' many-core design: just a shovelful of cores slapped onto a big cache and left to get on with it, with no regard for inter-core operability or efficiencies. Are the MS tools in development going to be happy with that, or are they designing for advanced, properly ground-up many-core processors, such that their tools won't be very functional on what basically amounts to a dozen PPCs in a line on the mobo?
 
It's safe to assume that Nintendo would HAVE to make a significant upgrade and not another GC > Wii level upgrade, simply due to manufacturing costs, right? I mean, in 5 years or so it will be more cost efficient for Nintendo to spend money on more powerful CPUs, since the much older (Xbox-level) parts wouldn't stay as cheap once manufacturing moves on to more recent designs, right?
 
I think MS will go with more homogeneous cores with more (shared) cache. In the 2010 time frame I expect they'll be able to fit about 8 cores on a die, with up to 4 hyperthreads each. I believe they'll stick to in-order execution.
Hyperthreads should be the preferred way to hide latency (as opposed to OOO), except for one little problem: they force you to manually synchronize access to shared memory. I wish they had a HW solution that does super cheap atomic operations if the contention is between hyperthreads, or just disables switching to a hyperthread whenever you're in a critical section. Right now you pay the same (expensive) price for atomic ops or mutex locks regardless of whether you have contention or not, and regardless of whether the contention is between real threads or hyperthreads.
Of course a programmer has no way of knowing where the contention is coming from, unless one micromanages where the threads are assigned, which might be quite difficult if you have 32 logical cores, not to mention the potential perils if one were to mistakenly assign the threads to the wrong logical cores.
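To make that cost concrete, here's a minimal C++ sketch (illustration only, not console code): the atomic read-modify-write pays the full synchronization price whether the "other" thread is on a different core, on a sibling hyperthread, or doesn't exist at all.

Code:
#include <atomic>
#include <thread>
#include <vector>

std::atomic<long> shared_counter{0};

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        // This atomic read-modify-write costs roughly the same whether the
        // contending thread runs on another core, on a sibling hyperthread of
        // the same core, or doesn't exist at all; the hardware gives us no
        // cheaper path for the hyperthread-only case.
        shared_counter.fetch_add(1, std::memory_order_relaxed);
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)          // pretend these are 4 hyperthreads of one core
        threads.emplace_back(worker, 1000000);
    for (auto& t : threads) t.join();
    return 0;
}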

In general, the beauty of GPU architecture/shaders is that it hides the access latency AND deals with concurrency issues for you.

We still need to figure out the correct mix of SW and HW to achieve this kind of abstraction on a general purpose core.
Thanks for your response!
Do you (not only you, but devs and other knowledgeable people in general) really need that many hardware threads? Interestingly, you seem to back up Intel's choice with Larrabee, which supposedly will support 4 threads per core.
For which kind of workload would this be interesting?

Gubbi said:
I wholeheartedly agree. We need CPU cores with good solid single-thread performance, but at a low power usage point, so that we can stick a whole bunch of them on the same die. Which is why I still consider PA Semi's PPC core the best building block for the next-gen massively multicore MPU SOC.
Cheers
Here are some links for those interested in what PA Semi's PWR cores are ;)
http://www.pasemi.com/processors/downloads.html
http://www.realworldtech.com/page.cfm?ArticleID=RWT102405055354&p=1

Barbarian, what would you think of 8 cores of this kind?

Gubbi, what do you consider massively multi-core?
I remember you saying that the sweet spot was 4 cores, Barbarian seems to consider 8 cores, and Valve said they were working on an engine that scales almost linearly up to 4 cores and still does very well up to 8.

Moreover, these cores are tiny, so MS would be left with quite a few transistors for other stuff.

EDIT: I want to add that each core can achieve a peak performance of 16 GFLOPS, but in fact the building blocks are made of two cores (32 GFLOPS), so eight cores would be worth 128 GFLOPS.
From a marketing point of view that's really not a good thing, as they sold Xenon at 88 GFLOPS; they would have headroom to clock these cores at higher speeds, but it's still not a huge improvement "PR-wise".
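Spelling out the arithmetic behind those figures (the 8 FLOPs/cycle and 2 GHz here are my assumptions for a VMX-style 4-wide multiply-add unit at PA Semi's quoted clock, not official numbers):

Code:
#include <cstdio>

// Peak GFLOPS = cores * FLOPs per cycle per core * clock (GHz)
double peak_gflops(int cores, int flops_per_cycle, double clock_ghz) {
    return cores * flops_per_cycle * clock_ghz;
}

int main() {
    // Assuming a 4-wide FMA vector unit (8 FLOPs/cycle) at 2 GHz.
    std::printf("1 core : %.0f GFLOPS\n", peak_gflops(1, 8, 2.0));  // 16
    std::printf("2 cores: %.0f GFLOPS\n", peak_gflops(2, 8, 2.0));  // 32
    std::printf("8 cores: %.0f GFLOPS\n", peak_gflops(8, 8, 2.0));  // 128
    return 0;
}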
 
Microsoft executives argue that such an advance would herald the advent of a class of consumer and office-oriented programs that could end the keyboard-and-mouse computing era by allowing even hand-held devices to see, listen, speak and make complex real-world decisions — in the process, transforming computers from tools into companions.

...

The opportunity for the company is striking, Mr. Mundie said, because manycore chips will offer the kind of leap in processing power that makes it possible to take computing in fundamentally new directions.

He envisions modern chips that will increasingly resemble musical orchestras. Rather than having tiled arrays of identical processors, the microprocessor of the future will include many different computing cores, each built to solve a specific type of problem. A.M.D. has already announced its intent to blend both graphics and traditional processing units onto a single piece of silicon.

In the future, Mr. Mundie said, parallel software will take on tasks that make the computer increasingly act as an intelligent personal assistant.


So not only manycore, but also many different types of cores.
 
So not only manycore, but also many different types of cores.
I wonder if this is important for the next generation of consoles; I'm not sure that a system-on-a-chip would provide enough performance for the next generation.

But MS will have to deal with that anyway for the low-end part of the PC market.
 
It's not system on a chip, but special logic units. It's no different from the existing 'single core' CPUs having floating point, arithmetic, and memory management units. Going back to the 68000, these functions were provided as separate processors. Because every CPU is required to handle these processing tasks, it makes sense to integrate the logic onto one die. Likewise it makes sense to integrate future specialised processing circuits onto one die. So following that same pattern, we should add a SIMD processing unit, and a graphics processing unit (specifically some SIMD processor designed to handle 'texture' lookups), and so on.

Looking at it another way, you'd want units to handle branchy code, units for vector processing with structured memory access that can work with low latency, units for the same but on high-latency accesses, and all the different workloads, such that for each function the execution is ideal, rather than a generalised processing system that fumbles around a bit with non-optimal code. Clearly that was never an option when we just had enough room for a few cores at best, but going forwards with the option of a dozen or more processors, it becomes an option to integrate the functions, just like the ALU, MMU and FPU came together.

Quite how one would have to code for such a monster though, I don't know! Ideally it'd sort out the workloads itself, but I've no idea what the state of play for that is, and I imagine it's very poor. Compilers can't do it automatically, so how could a processor?!
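As a very rough sketch of what manually "sorting out the workloads" on such a heterogeneous chip could look like (all of the core types, queues, and routing below are invented purely for illustration, not any real API):

Code:
#include <cstdio>
#include <functional>
#include <queue>

// Hypothetical core types on a heterogeneous die.
enum class CoreType { Branchy, Vector, TextureLike };

struct Task {
    CoreType preferred;              // the programmer (or a tool) tags the workload
    std::function<void()> work;
};

// One queue per core type; a real scheduler would be far more involved.
std::queue<Task> queues[3];

void submit(Task t) { queues[static_cast<int>(t.preferred)].push(std::move(t)); }

void run_all() {
    for (auto& q : queues)
        while (!q.empty()) { q.front().work(); q.pop(); }
}

int main() {
    submit({CoreType::Branchy,     []{ std::puts("AI / game logic"); }});
    submit({CoreType::Vector,      []{ std::puts("physics / skinning"); }});
    submit({CoreType::TextureLike, []{ std::puts("texture-heavy sampling"); }});
    run_all();
    return 0;
}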
 
I did say system on a chip, but I could have said a "Fusion-like" processor. I hope that AMD didn't buy ClearSpeed for nothing ;)
 
Some might disagree with me, and I could be wrong, so eventually I might even disagree with myself, but I was thinking that next-gen consoles should have Super-CPUs that do everything except the final few stages in the graphics rendering pipeline: all the stuff currently handled by the GPU's vertex shaders, geometry shaders, tessellator, texture units, and pixel shaders.

So basically the graphics chip becomes more like the Graphics Synthesizer in PS2, or the eDRAM chip in Xbox 360: containing the ROPs, anti-aliasing unit, units for other back-end functions, filtering, etc., and eDRAM if any.

So 1/2 to 2/3 of the graphics pipeline gets done by certain cores within the CPU. Since CPUs can be clocked several times higher, it could benefit; even with regional clocking (?), the more that gets put into the CPU the better. That would also let the graphics chip have more room for pure rasterization. Yes, there would still be a need for a graphics chip; not everything should be put on the CPU, otherwise you'd start to compromise.
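For what it's worth, the front half of that split already has a simple software form. A minimal sketch, in plain scalar C++ where a real version would sit on the CPU's SIMD units, of doing the vertex-transform stage on the CPU before handing results to a rasterizer-only chip:

Code:
#include <cstddef>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major

// Transform one vertex by a 4x4 matrix; this is the core of a vertex "shader".
Vec4 transform(const Mat4& mvp, const Vec4& v) {
    Vec4 r;
    r.x = mvp.m[0][0]*v.x + mvp.m[0][1]*v.y + mvp.m[0][2]*v.z + mvp.m[0][3]*v.w;
    r.y = mvp.m[1][0]*v.x + mvp.m[1][1]*v.y + mvp.m[1][2]*v.z + mvp.m[1][3]*v.w;
    r.z = mvp.m[2][0]*v.x + mvp.m[2][1]*v.y + mvp.m[2][2]*v.z + mvp.m[2][3]*v.w;
    r.w = mvp.m[3][0]*v.x + mvp.m[3][1]*v.y + mvp.m[3][2]*v.z + mvp.m[3][3]*v.w;
    return r;
}

// CPU cores do the vertex work; the transformed batch is then handed off to a
// hypothetical rasterizer/ROP chip (GS- or Xenos-daughter-die-style back end).
void process_vertices(const Mat4& mvp, const Vec4* in, Vec4* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = transform(mvp, in[i]);
}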


I also agree that Nintendo will need a major hardware upgrade. It won't be enough to make a GameCube 4x; they'll need something with somewhat more power than the Xbox 360, which would still be old and cheap technology by 2011.

Wii 2 should be dual-core, with each core much more robust than the PPE in Cell or the PPEs in Xenon.
Not a complex CPU, just 2 cores, but each one should be much, much newer than the G3 derivatives in Gekko and Broadway. Also, even a low-end AMD GPU in 2011 should be more than either Xenos or RSX.
 
Disclaimer: I do not follow the graphics industry as religiously as most of you guys do, so take this with a grain of salt. :oops:

Given the connected nature of games these days and the pervasiveness of Digital Rights Management, I think the processing potential of consoles should improve only marginally, while the prowess of online servers should increase geometrically. Games are getting more complicated, and it doesn't make sense for all consoles on a network to have to mull over the same problems (and potentially come up with different results). I think it would be better to have server farms handle most of the calculation/production issues and send the results to consoles, like broadcasters and TVs.

Timesharing on a vast collection of supercomputers could mean more elaborate scenery, more meticulous physics, more 'human' AI, and data that is neither stored nor executed on the user's machine. More importantly, the environment will tolerate sloppy code without ill effect; and this should encourage a quicker time to market.

I guess what I envision is a completely different manufacturing and networking model.
 
Timesharing on a vast collection of supercomputers could mean more elaborate scenery, more meticulous physics, more 'human' AI, and data that is neither stored nor executed on the user's machine. More importantly, the environment will tolerate sloppy code without ill effect; and this should encourage a quicker time to market.

Your theory is all nice, but a system like this has two problems: network bandwidth and latency. I can't imagine a fast FPS having most of its processing done far away from the place where the results get shown to the player.
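To put the latency half of that in rough numbers (the round-trip times below are typical broadband guesses, not measurements):

Code:
#include <cstdio>

int main() {
    const double frame_ms = 1000.0 / 60.0;          // ~16.7 ms per frame at 60 fps
    const double rtt_ms[] = {30.0, 60.0, 120.0};    // assumed round-trip times to a server

    for (double rtt : rtt_ms) {
        // Every input-to-result round trip through the server adds whole frames of
        // lag, before any server-side processing time is even counted.
        std::printf("RTT %5.1f ms  =  %.1f frames of added latency at 60 fps\n",
                    rtt, rtt / frame_ms);
    }
    return 0;
}

Even a fairly good 30 ms round trip is already almost two frames of extra lag at 60 fps, and that's before the server does any actual work.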
 
Some might disagree with me, and I could be wrong, so eventually I might even disagree with myself, but I was thinking that next-gen consoles should have Super-CPUs that do everything except the final few stages in the graphics rendering pipeline: all the stuff currently handled by the GPU's vertex shaders, geometry shaders, tessellator, texture units, and pixel shaders.
So basically the graphics chip becomes more like the Graphics Synthesizer in PS2, or the eDRAM chip in Xbox 360: containing the ROPs, anti-aliasing unit, units for other back-end functions, filtering, etc., and eDRAM if any.
If you're thinking of something like a Cell with lots of SPEs, this would be a serious change in the way modern graphics are done! I'm sure a lot of developers would be pissed off: no OpenGL or DirectX, start again almost from scratch.
So 1/2 to 2/3 of the graphics pipeline gets done by certain cores within the CPU. Since CPUs can be clocked several times higher, it could benefit; even with regional clocking (?), the more that gets put into the CPU the better. That would also let the graphics chip have more room for pure rasterization. Yes, there would still be a need for a graphics chip; not everything should be put on the CPU, otherwise you'd start to compromise.
Look at the GF8xxx: shader cores are already running at 1GHz+, and they will improve in the future.

We have to see how GPUs evolve; they are already impressive for some non-graphical tasks, and they will get better. I don't think it's a good idea to bypass the shader cores.
They are really promising; I would prefer more silicon invested in the GPU than in SIMD units included in the CPU.
For collisions, particles, etc., GPUs will be impressive!
 
Your theory is all nice, but a system like this has two problems: network bandwidth and latency. I can't imagine a fast FPS having most of its processing done far away from the place where the results get shown to the player.
A week or two ago there was an article on Gamasutra that talked about physics and how it can cause problems for fast-paced games, i.e. synchronisation between players.
 
A 256-bit bus is no longer considered super wide; it's the standard for midrange GPUs now.

512-bit is the widest conventional external memory bus there is, not counting internal ring buses or eDRAM buses.

I'd like to think that Microsoft will go with a 512-bit external memory bus for the next-gen Xbox (~4-5 years away), providing at least 500-750 GB/sec for the CPU and GPU (with GDDR6?), combined with even higher bandwidth eDRAM. That won't even be the highest bandwidth possible, given that Rambus will have 1000 GB/sec (1 TB/sec) external memory by 2010.

If external memory can do 1 TB/sec by 2010 with Rambus memory, which Xbox3 probably won't have, I wonder what kind of bandwidth eDRAM could have, 10 TB/sec?

I would hope they would go with a 512-bit bus, but I could have sworn we all thought they were going to go with a 256-bit bus in the 360. Aren't they using a 128-bit bus now?
 
It will be interesting to see not only how much bandwidth is available to the RAM pools, but also how much is available between the different chips.
The needs of the frame buffer won't grow that much (I secretly hope they will aim for 720p, or even better for 1366x768, as that kind of display is way more common than real 720p devices...).

I think that the bandwidth for an 8-core CPU plus texture operations won't skyrocket and will be really easy to provide.
What is really needed is a fast and clever connection between the CPU and GPU.
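A rough sanity check on the framebuffer part of that budget (bytes per pixel, overdraw, and frame rate are all assumed figures, purely for illustration):

Code:
#include <cstdio>

// Rough per-frame framebuffer traffic: colour + Z, read + write, with some overdraw.
double framebuffer_gb_per_sec(int w, int h, int bytes_per_pixel,
                              double overdraw, int fps) {
    const double bytes_per_frame =
        double(w) * h * bytes_per_pixel * 2 /* read + write */ * overdraw;
    return bytes_per_frame * fps / 1e9;
}

int main() {
    // Assuming 4 bytes colour + 4 bytes Z = 8 bytes/pixel, 3x overdraw, 60 fps.
    std::printf("1280x720 : %.1f GB/s\n", framebuffer_gb_per_sec(1280, 720, 8, 3.0, 60));
    std::printf("1366x768 : %.1f GB/s\n", framebuffer_gb_per_sec(1366, 768, 8, 3.0, 60));
    std::printf("1920x1080: %.1f GB/s\n", framebuffer_gb_per_sec(1920, 1080, 8, 3.0, 60));
    return 0;
}

Under those assumptions, going from 720p to 1366x768 only adds about 15% to the framebuffer traffic, which is why that part of the budget grows so slowly compared to everything else.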
 
As has oft been mentioned before, one of the reasons for not going to 256-bit for PS3 or X360 is that it makes cost reduction via die-size reduction a lot more difficult. There are simply too many wires. The other reason would be the availability and cost of high-end RAM chips at the time of each console's launch date.


From the Xenos article:

The primary issue here is, again, one of cost - the lifetimes of a console will be much greater than that of PC graphics and process shrinks are used to reduce the costs of the internal components; 256-bit busses may actually prevent process shrinks beyond a certain level as with the number of pins required to support busses this width could quickly become pad limited as the die size is reduced. 128-bit busses result in far fewer pins than 256-bit busses, thus allowing the chip to shrink to smaller die sizes before becoming pad limited - by this point it is also likely that Xenos's daughter die will have been integrated into the shader core, further reducing the number of pins that are required.
 
As has oft been mentioned before, one of the reasons for not going to 256-bit for PS3 or X360 is that it makes cost reduction via die-size reduction a lot more difficult. There are simply too many wires. The other reason would be the availability and cost of high-end RAM chips at the time of each console's launch date.


From the Xenos article:


If this is the case, then are they ever going to use a wider bus? I mean, the same excuse could be used for any bus size other than 128-bit.
 
If this is the case, then are they ever going to use a wider bus? I mean, the same excuse could be used for any bus size other than 128-bit.

Well sure. All this means is that if you want to have a wide bus you've got to design your whole system with that in mind, including any potential die shrinks & mobo revisions. This is probably why Sony uses Rambus with each of their consoles. Spend more money on DRAM to get the same bandwidth, but be able to shrink components as process nodes shrink.
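A concrete instance of that trade-off, using this generation's numbers as I recall them:

Code:
#include <cstdio>

double bandwidth_gbs(int bus_width_bits, double gbps_per_pin) {
    return bus_width_bits * gbps_per_pin / 8.0;
}

int main() {
    // PS3: XDR on a narrow 64-bit bus, but at 3.2 Gbit/s per pin.
    std::printf("PS3  XDR    64-bit @ 3.2 Gbps/pin: %.1f GB/s\n", bandwidth_gbs(64, 3.2));
    // Xbox 360: GDDR3 on a 128-bit bus at 1.4 Gbit/s per pin.
    std::printf("360  GDDR3 128-bit @ 1.4 Gbps/pin: %.1f GB/s\n", bandwidth_gbs(128, 1.4));
    // Similar bandwidth from roughly half the data pins, which is what keeps
    // later die shrinks from becoming pad-limited as early.
    return 0;
}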
 