Larrabee, console tech edition; analysis and competing architectures

I think this is a really good point. Microsoft basically owns the Xenon design. They paid IBM to design it and then IBM turned it over to them. It really was "contracted out" rather than being some sort of big collaboration. IBM isn't even fabricating all the Xenons; Microsoft has a second-source supplier. So, this is one of the reasons that IBM isn't hyping up the Xenon chip: they have nothing to gain from it! If Microsoft sells more Xenons, only Microsoft makes more money. I think IBM gets a cut on each Cell chip sold...

I think more than anything AP, some of your statements here betray a lack of familiarity with the Cell project... and even to an extent with the XeCPU, despite your fondness for the latter. :) Indeed MS owns the IP for the processor, in the sense that it has been licensed to them, but that certainly does not preclude IBM from using a near-identical design for their own purposes. Keep in mind that both the Cell's PPC core and the Xenon's trio shared a common origin/design based on earlier IBM prototypes. If IBM wanted to put the XeCPU into their product lineup, by one means or another they most certainly could. But the question remains... why would they? If there is an advantage you see in this chip vs Cell itself or other alternative architectures in the scientific space, you're not doing a good job of clarifying what you feel it is.

So, this goes back to my point about the Xenon being a reasonable part. It is three cores on 90nm (165 million transistors total). For the next-generation XBox it would be on at least 45nm, but more likely 32nm. It seems like IBM could just put several Xenon cores on a chip (8 or more) and make a pretty reasonable processor for the next XBox.

Yes they could... but will they? Even if IBM designs the next XBox chip, which I suspect they will as well, I doubt it would go without changes of some significance. In a situation where Cell and Xenon stay static and simply scale with the process nodes, Cell's advantages grow geometrically on the back of its inherent flops/transistor advantage, and the trends favor it: five years of developer familiarity breaking down some of the approachability barriers, and the entire industry going parallel. So, whatever happens with Sony on their CPU, I would expect some changes for XeCPU - though while maintaining direct lineage, of course, for legacy 360 code.

I would still assume a companion GPU chip (Xenon is not a GPU).

See, now you're just insulting people. :p

In contrast, Larrabee (like it or hate it) is targeting the GPU space.

Well, but that's the crux of it all, isn't it? Again, I'm a fan of what it's looking to achieve, but whether Larrabee will be successful in the GPU market is still an open question. And either way, in truth it must be approached from an angle that sees convergence across what until now have been distinct architectural paradigms (CPU vs GPU).
 
What's wrong with Larrabee being the GPU for the next XBox, even if the CPU is a Xenon with more cores? Also, couldn't IBM design their own Larrabee-like GPU with PowerPC cores for the next XBox?

Yes, this is a very good point. Certainly Larrabee could replace ATI as the GPU supplier. Seems unlikely, but it could happen. Alternatively, maybe AMD/ATI's Fusion might woo Microsoft (again, unlikely but interesting to consider).

One more thing about XBox. I think Microsoft wants to make the next XBox a "7.5th Generation" console that comes out long before the next PlayStation. In such a quick turnaround environment, it seems keeping the next XBox hardware similar to the current XBox would have some advantages in terms of compatibility and such.

It should be interesting.
 
I respect that you disagree. Let me try to convince you anyway... :) I also see I've now got myself in heated debate in two threads, which means I'm unlikely to keep up for long...

Well, for what it's worth I'm sure I can speak with authority when I say so long as your energy keeps up, your participation is appreciated. :)

I was thinking of the non-GPU computations of gaming, actually, as that is what Xenon was solely designed for.

Fair enough.

This is mostly due to contracts and economic incentive. Xenon was bought and paid for by Microsoft. IBM designed the chip and just turned the design over to Microsoft. It wouldn't surprise me if IBM couldn't even sell Xenon chips if it wanted to.

Well, this was addressed in another post so I won't repeat, but MS has 'rights' to the chip, rather than owning the IP inherent in the design. IBM can do what they want with the underlying architecture.

In contrast, Wikipedia estimates $400 million was spent on the R&D for Cell. I think IBM gets a cut on each Cell chip sold (as they still own IP on the design).

Of course, though keep in mind it wasn't IBM's $400 million for the most part, but Sony's. IBM, beyond getting a royalty slice out of each Cell, clearly makes money when it sells a Cell blade, chip, or support plan. But they could do the same with the Xenon design. It's simply that I can't see where, in any of the markets to which they presently market Cell, the XeCPU would make a viable (let alone desirable) alternative.

One more reason I'm not impressed with Cell: when Cell started out, it was going to be the *GPU* and *CPU* for PS3. It was more like AMD Fusion (or something) in that regard. In the end, Sony realized it was going to really suck as a GPU, so they quickly (in a huge panic) talked to NVIDIA to get themselves out of a tight spot.

For the record although I think you're keyed in to a common misconception here based on the original Cell BE patent, the Cell was not ultimately meant to be the original GPU. That was going to be a Toshiba design and we'll leave it at that for now. But yes, they switched to NVidia, IMO mainly due to ISA/approachability concerns. Just wanted to clarify though that although the patent had Cell as rasterizing, the actual design work centered around a separate GPU (though very exotic in its own right).

I apologize for going off topic. My main point wasn't to trash Cell (or, not just to trash Cell), but to say that if Microsoft was happy with Xenon, why wouldn't they just go back to IBM again?

Well and I agree with the crux of this, save I want to add that it's no longer off-topic so worry not; this is now *the* topic, so post away! :)
 
I think more than anything AP, some of your statements here betray a lack of familiarity with the Cell project... and even to an extent with the XeCPU, despite your fondness for the latter. :)

I'm certainly not an expert on the history of Cell and/or Xenon, and it has been a while since I talked to any IBMers about it (I can't be everywhere at once). So, please do correct me (as I'm sure you will) if you disagree with any statement I make.

I was under the impression that when IBM agreed to build the chips for all three game consoles (Wii, PS3, and XBox 360), Sony and Toshiba got sort of worried. When I visited IBM in Austin back in 2003, you needed separate "STI" credentials to get into the buildings in which Cell was being designed. That is, a regular IBMer not involved with the project didn't have access. From what I understand, this was part of the internal firewall between the Xenon designers and the Cell designers. For good reasons, Sony just didn't want all their R&D going over to the XBox. So, we get things like Xenon and Cell both using 128-register, 128-bit SIMD, but they aren't binary compatible (nor quite the same instructions, from what I can tell). The SPEs aren't really PowerPC as much as a new ISA inspired by PowerPC, but I digress.

Keep in mind that both the Cell's PPC core and the Xenon's trio shared a common origin/design based on earlier IBM prototypes.

I can certainly believe that Xenon's cores and Cell's bigger dual-threaded core could have a common ancestor, or perhaps that was the one part of Cell they could share. I suspect some of the work on Power6 (which is also in-order with two threads) could have impacted both, but I really don't know the relative timeframes of the three projects or how they interrelate.

Indeed MS owns the IP for the processor, in the sense that it has been licensed to them, but that certainly does not preclude IBM from using a near-identical design for their own purposes.

I'm pretty sure Microsoft has patents on some of the new instructions in Xenon. I'm not sure how the cross-licensing of those patents works. It does seem likely that IBM could re-use most of the Xenon core for other purposes, but I'm not sure they could use it directly.

If there is an advantage you see in this chip vs Cell itself or other alternative architectures in the scientific space, you're not doing a good job of clarifying what you feel it is.

As I expressed somewhat on the other thread, I'm really a fan of the cache-coherent shared-memory model of today's multi-core CPUs, Larrabee, and Xenon. I'm not so much a fan of the Cell and GPU style of memory management. It is just a bias I have, I guess. When I look at Cell's message passing and full/empty-bits stuff, it just reminds me of all the supercomputing companies throughout the decades that failed.

Yes they could... but will they? Even if IBM designs the next XBox chip, which I suspect they will as well, I doubt it would go without changes of some significance. ... So, whatever happens with Sony on their CPU, I would expect some changes for XeCPU - though while maintaining direct lineage, of course, for legacy 360 code.

Sure, that sounds reasonable. I'm not saying they won't change it a lot. I think it depends on the timeframe they are targeting (as to how much change they will have time to make). I'm sure that both Cell and whatever CPU IBM might make for the XBox (if they are selected) could be quite different.


Well, but that's the crux of it all, isn't it? Again, I'm a fan of what it's looking to achieve, but whether Larrabee will be successful in the GPU market is still an open question.

Yes, yes it is still an open question. It wouldn't be nearly as much fun to debate things that are well-settled questions.

As somebody once said: "It is dangerous to make predictions, especially about the future". ;)

And either way in truth it must be approached from an angle that sees convergence across what until now have been distinct architectural paradigms (CPU vs GPU).

The really interesting thing about Larrabee is that it is the CPUs overtaking GPUs rather than just a meeting-in-the-middle type convergence (just as GPGPU is really about GPUs taking over the CPU's role). Perhaps ATI/AMD will find a comfortable middle path that is true convergence, but it seems that Intel or NVIDIA are more looking at a head-on battle to take over each other's turf (in terms of GPCPU vs GPGPU).
 
For the record although I think you're keyed in to a common misconception here based on the original Cell BE patent, the Cell was not ultimately meant to be the original GPU. That was going to be a Toshiba design and we'll leave it at that for now. But yes, they switched to NVidia, IMO mainly due to ISA/approachability concerns. Just wanted to clarify though that although the patent had Cell as rasterizing, the actual design work centered around a separate GPU (though very exotic in its own right).

Yeah, the Toshiba story rings a bell. Thanks for the correction. It is actually pretty reassuring that Cell wasn't designed to be the GPU (because it is an even worse match for that than, say, Larrabee...) :)
 
They are licensed to use the PPC design, not buying it. So IBM is making money too.

Ok, it does seem likely that IBM is making some per-unit money on Xenon.

Yet, clearly Cell is IBM's favorite son, and Xenon is the red-headed stepchild. I guess Cell gets all the press and attention from IBM (and consequently the media), while a quite reasonable chip such as Xenon gets somewhat ignored. If I were one of the engineers on Xenon, I'd be annoyed. :)
 
I was under the impression that when IBM agreed to build the chips for all three game consoles (Wii, PS3, and XBox 360), Sony and Toshiba got sort of worried. When I visited IBM in Austin back in 2003, you needed separate "STI" credentials to get into the buildings in which Cell was being designed. That is, a regular IBMer not involved with the project didn't have access. From what I understand, this was part of the internal firewall between the Xenon designers and the Cell designers. For good reasons, Sony just didn't want all their R&D going over to the XBox. So, we get things like Xenon and Cell both using 128-register, 128-bit SIMD, but they aren't binary compatible (nor quite the same instructions, from what I can tell). The SPEs aren't really PowerPC as much as a new ISA inspired by PowerPC, but I digress.

Specifically as it relates to the ISA, this post/translation from a retrospective on the subject is a nice window into the past:

http://forum.beyond3d.com/showpost.php?p=517078&postcount=28

And certainly as to the decisions that went into Cell as a whole, it's a good thread in general.

I can certainly believe that Xenon's cores and Cell's bigger dual-threaded core could have a common ancestor, or perhaps that was the one part of Cell they could share. I suspect some of the work on Power6 (which is also in-order with two threads) could have impacted both, but I really don't know the relative timeframes of the three projects or how they interrelate.

I forget the name/code of the specific core, but they were based on an IBM prototype circa ~2000 that was a testbed for high-clock, in-order design. How much they both (Cell PPE vs Xenon core) shared in common development from that point, and how much of their similarities are the result of parallel development, I've certainly wondered myself as well...

I'm pretty sure Microsoft has patents on some of the new instructions in Xenon.

You're right about that, there are indeed some architectural/instruction considerations made for which MS owns the IP.

As I expressed somewhat on the other thread, I'm really a fan of the cache-coherent shared-memory model of today's multi-core CPUs, Larrabee, and Xenon.

Totally fair; I think most would agree with you, especially when it comes to choosing a situation in which they'd wish to place themselves. As an aside though, it's worth mentioning the XeCPU's cache is prone to thrashing... or such is the word on the street.

I'm not so much a fan of the Cell and GPU style of memory management. It is just a bias I have, I guess. When I look at Cell's message passing and full/empty-bits stuff, it just reminds me of all the supercomputing companies throughout the decades that failed.

Well, it is a supercomputer on a chip after all! :p

The really interesting thing about Larrabee is that it is the CPUs overtaking GPUs rather than just a meeting-in-the-middle type convergence (just as GPGPU is really about GPUs taking over the CPU's role).

Well right, and I agree. Which is why it's almost impossible to talk about Larrabee without framing it in the context of a CPU... even though its target ostensibly is the GPU.


Yet, clearly Cell is IBM's favorite son, and Xenon is the red-headed stepchild. I guess Cell gets all the press and attention from IBM (and consequently the media), while a quite reasonable chip such as Xenon gets somewhat ignored. If I were one of the engineers on Xenon, I'd be annoyed. :)

Well again though, it goes back to markets. The XeCPU's not going to go up against IBM's Power6 in the server arena, and I don't see where it competes with Cell in the HPC space. If they threw in some OOE (out-of-order execution) and made it a desktop chip, I feel it would do well in a hypothetical Apple/Power scenario - more so perhaps than even the Cell, which was actually pitched! - but without that general utilitarian segment in which to compete, a decent general core (granted OOE in this scenario) doesn't have much of a role to play in IBM's offerings, as specialized as they've become.
 
Yet, clearly Cell is IBM's favorite son, and Xenon is the red-headed stepchild. I guess Cell gets all the press and attention from IBM (and consequently the media), while a quite reasonable chip such as Xenon gets somewhat ignored. If I were one of the engineers on Xenon, I'd be annoyed. :)

I think you put too much faith in the Xenon design. Even though I like to criticize Cell, you won't find a single post where I praise Xenon. Xenon is a mediocre design at best, one that just barely gets the job done and, given the timing, was MS's best bet.
Now, I like to bash Cell mostly because I think it had so much more potential, but ultimately someone dropped the ball. It is no coincidence that no one besides the original three (STI) ever used the chip for anything else. IBM/Sony failed to attract big players like Apple etc.
IBM is touting Cell, but they never really put much marketing weight behind it.
You'd better believe that Intel is not going to make the same mistake if they decide to go to full production with Larrabee. Intel is going to push hard and get everyone on board - get full support from MS (DX, Vista, Visual Studio and what not), push the chip on OEMs, integrate it with chipsets, etc.
 
Xenon is a mediocre design at best, one that just barely gets the job done and, given the timing, was MS's best bet.

I think factors like MS having an off-the-shelf HAL for PPC and, at the same time, IBM trying to enter the merchant semiconductor business also played a large role.
 
Specifically as it relates to the ISA, this post/translation from a retrospective on the subject is a nice window into the past:

http://forum.beyond3d.com/showpost.php?p=517078&postcount=28

And certainly as to the decisions that went into Cell as a whole, it's a good thread in general.

An interesting read. Certainly my view of Cell has always been IBM-centric, and I had forgotten how involved the other two companies were.

I forget the name/code of the specific core, but they were based on an IBM prototype circa ~2000 that was a testbed for high-clock, in-order design. How much they both (Cell PPE vs Xenon core) shared in common development from that point, and how much of their similarities are the result of parallel development, I've certainly wondered myself as well...

Was it the guTS project from IBM's Austin Research Lab?

Well again though, it goes back to markets. XeCPU's not going to go up against IBM's Power 6 in the server arena, and I don't see where it competes with Cell in the HPC space...

All true. Xenon was really designed to be the sweet spot for the XBox, and it doesn't make much sense in other domains.

One thing that makes Cell interesting is the huge amount of memory bandwidth provided by its XDR memory. GPUs and Cell have much more bandwidth than CPUs today, and Cell is likely more natural than GPGPU for many HPC applications (at least the two are competitors there). If you have a very high-flop, high-bandwidth application, it will fly on Cell like on no other CPU today.

Of course, XDR isn't cheap (nor are all the pins to drive it), yet Cell is likely much cheaper per flop or per GB/second of bandwidth than just about any other non-GPU right now.

I still would have made some decisions differently than Cell did, but that's true in lots of cases... :) From the link you posted above, it does seem that the idea of local stores and such really did come from the belief that games had hard real-time requirements. It seems like the set-locking of Xenon's cache is "good enough" in that regard (and such real-time behavior certainly isn't needed for HPC).
 
When you take a closer look at Xenon it's pretty clear that it was developed in a rush.

Some of the things like store-queue gotchas should never have been in the final product.

And then there are indications that it was supposed to run a lot faster than it does: two-cycle basic ALU result-forwarding latency, six (!!!) cycle load-to-use latency in the D$, while only hitting 3.2GHz on a state-of-the-art process.

It's also pretty big for a dual-issue in-order core (larger than an Athlon 64 core!!); it was probably laid out using automated tools.

There is a lot of room for improvement for the next XBox.

Cheers
 
When you take a closer look at Xenon it's pretty clear that it was developed in a rush.

Yes, certainly. The turnaround time was remarkably short. But I don't blame the engineers for being on a tight schedule.

Some of the things like store-queue gotchas should never have been in the final product.

Can you say more about this?


And then there are indications that it was supposed to run a lot faster than it does: two-cycle basic ALU result-forwarding latency, six (!!!) cycle load-to-use latency in the D$, while only hitting 3.2GHz on a state-of-the-art process.

A 6 cycle load-to-use latency does seem pretty grim. Any idea what it is for the Cell's PPC or SPUs?

It's also pretty big for a dual-issue in-order core (larger than an Athlon 64 core!!); it was probably laid out using automated tools.

Yeah, I think the heavy use of automated synthesis probably hurt the area quite a bit. Frankly, I was surprised how much larger the cores were compared to the Cell SPEs (on the same process). For the transistor budget of Cell's 8+1 cores, you only get about 4 Xenon cores. Perhaps Microsoft didn't care; they knew it would shrink over time (as they moved to 65nm) and become a pretty cheap chip. I dunno. It certainly hasn't stopped the XBox 360 from being a success.

There is a lot of room for improvement for the next XBox.

No disagreement there.
 
Can somebody remind us of the size of an SPE core (without LS) and of a Xenon core (without cache)?
And why not a Larrabee core?

Could the Xenon cores be bigger than supposed due to a lot of built-in redundancy?
How tiny would a better-implemented Xenon core be?
 
Can somebody remind us of the size of an SPE core (without LS)

The LS is a huge chunk of the SPE ;)

From an IBM paper:

The SPE design has roughly 20.9 million transistors, and the chip area including the SMF is 14.8 mm2 (2.54 mm x 5.81 mm) fabricated with a 90-nm silicon-on-insulator (SOI) technology. The 65-nm version of the design is 10.5 mm2.
Xenon 90nm, also from an IBM paper:
[die shot: Xenon-90nmSOI.gif]
 
And Larrabee, for...oh, no particular reason...

And why not a larrabee core?

Guys, I don't think this is known (or even finalized) information right now. :)

As for the SPEs and local store, the LS makes up a large part of the SPE transistor count, but like most cache/memory structures it is denser than logic circuits, so it comes in at a minority of the die area:

[die plot: cell-1.gif]
 
Ok, you got me there, I surrender! :cry:

Anyway to get a good sense of the size of the Cell PPE, XeCPU core, and an SPE in relation to one another, we'd need length/width for the XeCPU (and from there we could guesstimate from the die shot)... which I'm not sure we've ever received an official number for.
 