A new x86 architecture with Cell-like tech?

xbdestroya said:
SPM, I think you're underestimating Linux in the server space to begin with; it's big. Also note that Sun is a big Opteron supporter and AMD is pretty much the poster child for x86 Solaris. I'd be surprised if the Clearspeed inclusion had much to do with anything other than targeting an increased range of customers with Opteron, maybe trying to torpedo the ever-listing Itanium.
Exactly. I think that this announcement should be considered alongside what SGI is doing with its Project Ultraviolet (FPGA-on-the-NUMAlink technology), rather than in terms of the desktop PC space. It's the sort of technology that the Three Letter Agencies are interested in, for example.
 
ERP said:
OK I'm going to repeat myself here....

I have yet to see a compiler, including ones targeted at in-order designs, do an even vaguely decent job of scheduling to hide memory and instruction latencies.

Even if one of these compilers did exist, an OOO core could and in general would ALWAYS do a better job.

The reason that OOO cores are prevalent in the desktop space is simple: it's been the best way to improve performance on existing applications. OOO designs will display considerably higher sustainable IPCs in real applications than in-order designs, simply because they can mask latency where in-order designs can't.

In Cell and X360, IBM is trading off the execution benefits of OOO designs to get back transistors to throw at FP performance; they obviously believe that this is the primary performance bottleneck in games. It's a very radical tradeoff. Trust me, you'd be surprised how poorly Cell or the X360 CPU would run apps not tailored towards their architectures.

I should start a poll on roughly how many instructions per clock a dev ought to expect out of these in-order architectures in a real application. I'll give you a clue: it's a LOT lower than the peaks people throw around on BBSs.

I agree with you about OOO cores producing better performance. They do, but they also require more silicon, and therefore you get fewer cores on a multi-core chip. The question is which produces more performance per mm² of die: an OOO core, or an in-order core with compiler optimisation. If you are running Windows, the answer is clear: OOO wins hands down. On the other hand, if you are running Linux, the reverse may be the case. Itanium, which is an in-order core, does produce good performance with Linux, although it is rubbish with Windows.
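To put the performance-per-mm² question in concrete terms, here is a rough back-of-the-envelope sketch. Every number in it (die area, core sizes, sustained IPC figures) is made up purely for illustration, not measured from any real chip, and the function name is my own. It also assumes the workload spreads perfectly across all cores, which real applications rarely manage.

```python
def chip_throughput(die_area_mm2, core_area_mm2, sustained_ipc, clock_ghz):
    """Aggregate instructions per nanosecond for a die filled with identical cores.

    Assumes perfect scaling across cores (an optimistic simplification).
    """
    cores = int(die_area_mm2 // core_area_mm2)  # whole cores that fit on the die
    return cores * sustained_ipc * clock_ghz

# Hypothetical figures only: a big OOO core vs a small in-order core
# on the same 200 mm^2 of silicon at the same clock.
ooo = chip_throughput(200, 50, 1.5, 3.0)             # 4 cores, high sustained IPC
in_order_naive = chip_throughput(200, 20, 0.5, 3.0)  # 10 cores, poorly scheduled code
in_order_tuned = chip_throughput(200, 20, 0.8, 3.0)  # 10 cores, well scheduled code

print(ooo, in_order_naive, in_order_tuned)
```

With these invented numbers the OOO chip beats the in-order chip on poorly scheduled code, and loses once the compiler gets sustained IPC up a bit. The point is only that the answer swings on the sustained IPC you actually get, which is exactly what ERP is disputing.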

The other issue is what exactly you want the performance for. If you have a conventional server, which tends to be I/O bound, you want to manipulate lots of big pointers and stacks, and a processor like Cell's PPE, which is cut down to reduce cache and remove OOO logic, will be undesirable.

Just to underline the different requirements, IBM's big iron z-series mainframe servers, capable of running hundreds of virtual Linux and Unix instances, have pretty pathetic CPU performance when compared with a PC-based load sharing server cluster. However, they absolutely kill a PC server cluster when running database and web serving applications. This is because hits come in peaks, and the mainframe's total CPU capacity can all be allocated to the virtual server instance that takes the hit as and when it is required, so the much lower total CPU MIPS isn't a handicap. On top of that, the mainframe's massive I/O bus throughput really puts it in a class of its own, allowing each virtual server instance to handle I/O orders of magnitude higher than any server in a PC load sharing cluster is able to. The IBM z-series mainframe represents the exact opposite philosophy to the Cell. Each has its uses.
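The peak-load argument can be sketched in a few lines. This is a toy model with invented capacity and demand units, not a benchmark of any real system: the cluster gives each instance a fixed slice of capacity, while the mainframe pools a smaller total and hands it to whichever instance needs it.

```python
def served_partitioned(demands, per_server_capacity):
    """Work served when each instance is capped at its own server's capacity."""
    return sum(min(d, per_server_capacity) for d in demands)

def served_pooled(demands, total_capacity):
    """Work served when all instances draw from one shared pool."""
    return min(sum(demands), total_capacity)

# Hypothetical load: one instance takes the hit, seven are nearly idle.
demands = [90, 2, 2, 2, 2, 2, 2, 2]

cluster = served_partitioned(demands, 20)  # 8 servers x 20 units = 160 units in total
mainframe = served_pooled(demands, 110)    # 110 units in total, but shared

print(cluster, mainframe)
```

Under these made-up numbers the cluster has more total capacity but serves far less of the spike, because most of its capacity sits idle on the wrong boxes.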

On a desktop PC, speed and responsiveness depend on the most speed-critical aspects: graphics, video and sound. Does the PPE really need to go as fast as the latest AMD or Intel CPUs on a desktop PC? Doesn't a spreadsheet or word processor already run fast enough on a 3.2GHz PPE? Won't the SPEs make things go faster overall on a desktop PC? What I am saying is that Cell may have a better balance of processing power for a desktop PC than the current Intel and AMD processors.

Of course, Cell will also be perfect for supercomputing clusters, where floating point performance rather than I/O performance is important. By the way, IBM is producing a DP Cell variant for exactly this purpose, which will have a similar boost in DP floating point performance over the current AMD and Intel chips to the one Sony's version of Cell has in SP floating point.
 