Larrabee, console tech edition: analysis and competing architectures

You should read this thread:
http://forum.beyond3d.com/showthread.php?t=46393
I think there are some interesting links, and the discussion is excellent (I fail to understand most of the posts properly, but it gives a general idea of the pros and cons of the design, minus what we already know about it so far).

Well, I went through it, but nothing is concrete and there isn't a lot of detail for discussing its philosophy against Cell.

What I gather is that each Larrabee core is VLIW SIMD-16. It has L2 cache; on one slide it's 256 KB per core, on another it's 4 MB of shared L2. It uses a ring bus for communication between cores. On one slide it has texture sampler/processor units, on another it has fixed-function units. As for clock speed, one source said around 2 GHz, another 4 GHz.

I think they are still experimenting with what will work. I think their goal is to bring out something that can handle really complex scenes (something with lots of tiny triangles) where hardware-rasterizing GPUs will have trouble and where ray tracing will flex its muscles.

It will most likely be paired with some beefy CPUs to handle rebuilding the acceleration structure every frame for dynamic scenes.

If they can't do that, they might just put in raster hardware, and all those cores will be for shaders. To me it's still up in the air what they might do. We will see from the result whether it is Cell done right or not, I guess.
 
I would be very surprised if the next Xbox CPU is not a beefier PowerPC with more cores, fully backwards compatible with the Xbox 360.

I totally agree with this statement.

From what I've heard, Microsoft is pretty happy with IBM's Xenon chip. IBM put it together in something incredible like <2 years from start to product. The chip is a pragmatic compromise in terms of cost, performance, power, and programmability. Funny that IBM spent all this time (years and years, hundreds of engineers working on it per year) on Cell, and then the little design team (mostly in frigid Rochester, MN) put together the XBox chip in record time. Pretty ironic.

I've heard that Microsoft is already having discussions with IBM on the next PowerPC-based Xbox chip (although I'm not sure it is a done deal). I suspect they will just take a Xenon-like design, maybe double the cores and the clock frequency, and Microsoft would be quite happy. Of course, the XBox has a separate GPU, and I wouldn't expect that to change for the next generation.
 
Funny that IBM spent all this time (years and years, hundreds of engineers working on it per year) on Cell, and then the little design team (mostly in frigid Rochester, MN) put together the XBox chip in record time. Pretty ironic.
:???: One was a completely new architecture designed from the ground up with a committee of contributors all wanting different things, and the other was three modified existing cores stuck together with only one customer calling the shots.
 
:???: One was a completely new architecture designed from the ground up with a committee of contributors all wanting different things, and the other was three modified existing cores stuck together with only one customer calling the shots.

I agree that Cell was hampered by a committee of contributors. I think that really hurt it.

However, my point is that, in the end, the Xenon looks much saner compared to the Cell processor. In fact, put just a few more cores on the Xenon and you'd have a chip that would rival the Cell in just about every way.

Perhaps you're underestimating the beauty in the Xenon's pragmatic choices. Sure, you can dismiss it as just sticking together existing cores, but I think it is much more than that. It has new vector instructions for handling D3D textures, it has 128 vector registers (more than the standard PowerPC, and more like a Cell SPE), and it has two-way multithreading. Most importantly, it has on-chip cache coherence (something Cell lacks), which makes it much simpler to program.

Good engineering is knowing when you need a "new architecture designed from the ground up" and when you don't. In that way, Cell reminds me of Intel's Itanium.
 
I agree that Cell was hampered by a committee of contributors. I think that really hurt it.

However, my point is that, in the end, the Xenon looks much saner compared to the Cell processor. In fact, put just a few more cores on the Xenon and you'd have a chip that would rival the Cell in just about every way.

AP, I can think of no better way to start this response off than to say plainly that I disagree. How many cores until it rivals Cell's theoretical performance? How many more until it matches its real-world performance? Keep in mind we're talking outside of gaming right now, but I imagine you're OK with that, since your interests lie in the architectural comparisons anyway. And within all that, you are faced with very real thermal, yield, and die-size concerns.

You discussed the 'design-by-committee' aspect as a negative, and in some ways it no doubt was, but at the same time the design the committee came up with is one that outperforms many an architecture, often by a factor of 10:1 depending on the task. Certainly we are not seeing the Xenon design proliferate outside of its specific use in the XBox; do you consider that a mistake on IBM's part, versus the institutional support Cell has gained in a number of academic/HPC sectors?

Perhaps you're underestimating the beauty in the Xenon's pragmatic choices. Sure, you can dismiss it as just sticking together existing cores, but I think it is much more than that. It has new vector instructions for handling D3D textures, it has 128 vector registers (more than the standard PowerPC, and more like a Cell SPE), and it has two-way multithreading. Most importantly, it has on-chip cache coherence (something Cell lacks), which makes it much simpler to program.

Good engineering is knowing when you need a "new architecture designed from the ground up" and when you don't. In that way, Cell reminds me of Intel's Itanium.

It's strange to me that given your views in the quote above, you're not in the other Larrabee thread extolling an octal-core Nehalem or something vs Larrabee itself, since it would seem more in line with the position you're taking here.

You have to understand that your statements in this thread are off-topic, and re-open many an old debate from the past. I'm ok with that as long as the quality is high, but we may need to spin it off into the CellPerformance section or something depending on where it goes.
 
I think you've missed the point of Cell's design decisions. A Xenon with more cores would be totally memory-hampered and incapable of sustaining the throughput of its vector units, which is where the ground-up design was useful. That's the problem Cell has nicely addressed, and the committee IMO produced a more balanced product than just a few big cores (IBM's choice) or a few simple cores dependent on a secondary processor (Toshiba's choice). Though this is going OT.
 
I think you've missed the point of Cell's design decisions.

CELL is a vast grid of square holes. Fine if you only have square pegs....

Though this is going OT.

Alright, to bring this back on topic. The feature of CELL that stands out more than anything else is the non-coherent memory (local stores). Intel didn't opt to use x86 in Larrabee because it's the best ISA evar, but because making chips that can run x86 binaries is what has made Intel the biggest chip manufacturer in the world. Using the ISA without using the memory model would make very little sense. So IMO, it's fairly clear that Larrabee will be nothing like CELL.

Cheers
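To make the local-store point concrete, here's a toy Python sketch of my own (not real SPE code; actual Cell programs stage data with DMA intrinsics such as mfc_get, and the function names here are purely illustrative) contrasting the two memory models:

```python
# Coherent shared memory (Xenon/Larrabee style): any core just reads
# the data; the hardware keeps caches consistent behind the scenes.
def coherent_sum(shared_memory):
    return sum(shared_memory)

# Local-store model (Cell SPE style): the programmer explicitly stages
# chunks of main memory into a small private buffer before computing.
def local_store_sum(main_memory, chunk_size=4):
    total = 0
    for i in range(0, len(main_memory), chunk_size):
        # Explicit "DMA" copy-in to the small local store
        local_store = list(main_memory[i:i + chunk_size])
        total += sum(local_store)
    return total
```

Both produce the same answer; the difference is who manages data movement, which is exactly why porting ordinary multithreaded code to local stores is painful.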
 
Alright, to bring this back on topic. The feature of CELL that stands out more than anything else is the non-coherent memory (local stores). Intel didn't opt to use x86 in Larrabee because it's the best ISA evar, but because making chips that can run x86 binaries is what has made Intel the biggest chip manufacturer in the world. Using the ISA without using the memory model would make very little sense. So IMO, it's fairly clear that Larrabee will be nothing like CELL.

Cheers

Wait, you call that bringing it back on topic? ;)

The topic is whether Larrabee would make sense for the next XBox, not whether it is similar or dissimilar to Cell... or how. Now, granted you're going off-topic in a way that's dominated the past page, and personally I think this aspect of the conversation is more interesting.

But for the reasons I stated in my response to Joshua, I don't agree with your thinking or reasoning behind saying Larrabee will be nothing like Cell. ISA and cache-coherence are important aspects of its lineage, but Larrabee is very much competing against Cell in the new massively parallel/vectorized realm rather than extending the OOE/monolithic paradigm.
 
ISA and cache-coherence are important aspects of its lineage, but Larrabee is very much competing against Cell in the new massively parallel/vectorized realm rather than extending the OOE/monolithic paradigm.

I absolutely agree. It seems almost stubborn for Intel to stick to an x86 design until one sees the beauty of how well it fits. If you want lots of FLOPS you need many cores, but they have to be small, so no OOE; but how do you hide latency? Well, super-fast L1 cache and hyperthreads; but that means small state, so only a few registers, so branch prediction and a memory-operand ISA, and there you go: an x86 Larrabee core.
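The latency-hiding step in that chain of reasoning can be sketched with a back-of-the-envelope formula (my own illustration; the cycle counts below are made-up round numbers, not Intel figures):

```python
import math

def threads_to_hide_latency(miss_latency_cycles, work_cycles_per_thread):
    """Roughly how many hardware threads a core needs so that, while
    one thread waits on a memory miss, the others keep the ALUs busy."""
    return math.ceil(miss_latency_cycles / work_cycles_per_thread)

# e.g. a ~200-cycle miss hidden by threads that each have ~50 cycles
# of independent work queued up:
threads_to_hide_latency(200, 50)  # -> 4
```

A small handful of threads per core is enough under these assumptions, which is the kind of trade-off that lets the cores stay small.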
 
But for the reasons I stated in my response to Joshua, I don't agree with your thinking or reasoning behind saying Larrabee will be nothing like Cell. ISA and cache-coherence are important aspects of its lineage, but Larrabee is very much competing against Cell in the new massively parallel/vectorized realm rather than extending the OOE/monolithic paradigm.

But it does so with a programming model that most other architectures use. That means your multithreaded code on your monolithic CPU will run just fine (faster) on Larrabee. CELL will have to find a way to get on-die data-sharing going, or it will die as an architecture (something like a big fat cache before the memory interface).

ON TOPIC: Larrabee won't be in the next XBox, purely for business reasons; Microsoft won't make the same mistake they did with the first XBox of relying on a single external supplier for silicon. They will want to own the IP.

Cheers
 
Ok, Squilliam I'm sorry but I had to exercise B3D eminent domain on this thread in order to foster the architecture-centric aspects of discussion... thus the thread name change.
 
But it does so with a programming model that most other architectures use. That means your multithreaded code on your monolithic CPU will run just fine (faster) on Larrabee.

I don't know about faster, but I think it'll run. I think Larrabee is more similar to Xenon than to Cell. IBM might create a PowerPC version of Larrabee for the next Xbox.

Now, does anyone know if Larrabee, for the GPU part, is going to be a hardware rasterizer or a ray tracer?
 
But it does so with a programming model that most other architectures use. That means your multithreaded code on your monolithic CPU will run just fine (faster) on Larrabee.

I agree, definitely. And targeted as a GPU or into the HPC market, the code will certainly be multithreaded. At the desktop level, though, I think it will lag in running legacy code at the speed of its architectural Intel contemporaries. Now, granted, this all assumes a homogeneous core structure; I think most of us presume heterogeneous cores to be in its future. (Which is honestly another mark for 'Cell-like', IMO.)

BUT, keep in mind I'm a big fan of the legacy x86 support as well, and yes the fact that it runs the legacy code... at whatever speed... I'm excited about.

CELL will have to find a way to get on-die data-sharing going or it will die as an architecture (something like a big fat cache before the memory interface)

It'll be interesting to see what direction Cell 2 goes in. Clearly a large part of it will be Sony-dependent; at least, hopefully it will be a part of the PS4. Otherwise I imagine there is low impetus for dramatic changes at IBM solely for the HPC market, given the dramatic shift in volumes.
 
Now, does anyone know if Larrabee, for the GPU part, is going to be a hardware rasterizer or a ray tracer?

Larrabee as it's being discussed at present is a rasterizer... but obviously it's actually more of a sure thing to assume it'll be "strong" at ray tracing, just as with the other typical high-FLOP/parallelism/scientific strengths. It doesn't presuppose a rendering shift to ray tracing though... at least today. Of course there seems to be a new Larrabee rumor every month.
 
Now, granted this all assumes a homogeneous core structure; I think most of us presume heterogeneous to be in its future. (Which is honestly another mark for 'Cell-like' IMO)

Well, heterogeneous as in different cores with different single-thread performance, not heterogeneous as in completely different ISA and memory model (CELL).

From a programmer's point of view, a Core 2 (with SLE and wide vector support) will look no different from a Larrabee core, except for speed. That makes it homogeneous from a software POV (which is what matters, IMO).

Cheers
 
Well, heterogeneous as in different cores with different single-thread performance, not heterogeneous as in completely different ISA and memory model (CELL).

That's a fair point, definitely. Granted, I still stick to my previously stated comparisons between the two, but yeah, it's a huge material benefit for the ISA to be the same between the two.

That does more for it on the CPU side than the GPU side from the outset, however, where it seems Intel's new ISA extensions (and their approachability) will play a key role.
 
ON TOPIC: Larrabee won't be in the next XBox, purely for business reasons; Microsoft won't make the same mistake they did with the first XBox of relying on a single external supplier for silicon. They will want to own the IP.

I think this is a really good point. Microsoft basically owns the Xenon design. They paid IBM to design it and then IBM turned it over to them. It really was "contracted out" rather than being some sort of big collaboration. IBM isn't even fabricating all the Xenons; Microsoft has a second-source supplier. So, this is one of the reasons that IBM isn't hyping up the Xenon chip: they have nothing to gain from it! If Microsoft sells more Xenons, only Microsoft makes more money. I think IBM gets a cut on each Cell chip sold...

So, this goes back to my point about the Xenon being a reasonable part. It is three cores on 90nm (165 million transistors total). For the next-generation XBox it would be on at least 45nm, but more likely even 32nm. It seems like IBM could just put several Xenon cores on a chip (8 or more) and make a pretty reasonable processor for the next XBox. I would still assume a companion GPU chip (Xenon is not a GPU).
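The core-count estimate above follows from idealized process scaling, which can be sketched like this (my own illustration; real shrinks deliver less than the ideal square-law density gain, and power/yield limits bite first):

```python
def cores_after_shrink(base_cores, base_nm, target_nm):
    """Idealized transistor-density scaling: area per core shrinks with
    the square of the feature-size ratio (an optimistic upper bound)."""
    density_gain = (base_nm / target_nm) ** 2
    return int(base_cores * density_gain)

cores_after_shrink(3, 90, 45)  # -> 12 cores in the same die area
cores_after_shrink(3, 90, 32)  # -> 23
```

Even with pessimistic derating, "8 or more" Xenon cores at 45nm or 32nm looks comfortably within reach on this arithmetic.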

In contrast, Larrabee (like it or hate it) is targeting the GPU space.
 
AP, I can think of no better way to start this response off than to say plainly that I disagree.

I respect that you disagree. Let me try to convince you anyway... :) I also see I've now got myself into heated debates in two threads, which means I'm unlikely to keep up for long...

How many cores until it rivals Cell's theoretical performance?

The three processors in Xenon run at 3.2 GHz with two-wide 128-bit SIMD units. That is basically the same as a Cell SPE (in fact, the Xenon core is actually more flexible with the dual-issue, so Xenon has a higher peak rate than a Cell SPE). So, it seems like if Xenon had 8 or so cores, it would have similar raw FLOPS to Cell. Xenon has 165 million transistors to Cell's 234 million transistors (on the same 90nm IBM SOI process). So, at the same transistor budget, Xenon could have four cores (instead of just three).
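The raw-FLOPS comparison can be made explicit with the usual back-of-envelope assumptions (4 single-precision SIMD lanes with a fused multiply-add, i.e. 2 FLOPs per lane per cycle; these are the commonly quoted figures, not official specs, and dual-issue would change the per-core factor):

```python
def peak_gflops(cores, clock_ghz, simd_lanes, flops_per_lane_per_cycle):
    """Theoretical peak throughput: cores x clock x lanes x FLOPs/lane."""
    return cores * clock_ghz * simd_lanes * flops_per_lane_per_cycle

peak_gflops(1, 3.2, 4, 2)  # one SPE or Xenon core: 25.6 GFLOPS
peak_gflops(3, 3.2, 4, 2)  # three-core Xenon: 76.8 GFLOPS
peak_gflops(8, 3.2, 4, 2)  # eight-SPE Cell (SPEs only): 204.8 GFLOPS
```

On these assumptions, an eight-core Xenon would indeed land in the same ballpark as Cell's SPE array.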

How many more until it matches its real-world performance?

I dunno. Xenon has two threads per core (unlike the Cell SPEs), so having 6 threads can help tolerate some memory latency and keep the system fed. This doesn't mean it will hit its peak, but it should help the CPU utilization.

Keep in mind we're talking outside of gaming right now...

I was thinking of the non-GPU computations of gaming, actually, as that is what Xenon was solely designed for.

And within all that, you are faced with very real thermal, yield, and die-size concerns.

One thing that is masterful about the Cell is its physical implementation. The datapaths have lots of full-custom layout, and they really banged on it to make it fast and low power (each of the SPEs is only a few watts!) The actual low-level circuit and layout implementation of Cell seems pretty solid.

In contrast, the Xenon chip had a much smaller design team, and thus much less custom layout (and more standard cell synthesis).

Certainly we are not seeing the Xenon design proliferate outside of its specific use in the XBox; do you consider that a mistake on IBM's part, versus the institutional support Cell has gained in a number of academic/HPC sectors?

This is mostly due to contracts and economic incentives. Xenon was bought and paid for by Microsoft. IBM designed the chip and just turned the design over to Microsoft. It wouldn't surprise me if IBM couldn't even sell Xenon chips if it wanted to.

In contrast, Wikipedia estimates $400 million was spent on R&D for Cell. I think IBM gets a cut on each Cell chip sold (as they still own IP on the design). You bet IBM is trying to hype up and milk Cell for all it is worth. And for some applications it is by far the cheapest FLOPS around (perhaps ignoring GPGPU stuff).

One more reason I'm not impressed with Cell: when Cell started out, it was going to be both the *GPU* and *CPU* for the PS3. It was more like AMD Fusion (or something) in that regard. In the end, Sony realized it was going to really suck as a GPU, so they quickly (in a huge panic) talked to NVIDIA to get them out of a tight spot. Between Cell and the NVIDIA RSX chip, the end result was one of the most expensive mainstream gaming consoles in history.

It's strange to me that given your views in the quote above, you're not in the other Larrabee thread extolling an octal-core Nehalem or something vs Larrabee itself, since it would seem more in line with the position you're taking here.

I guess I don't see the inconsistency. But I reserve the right to be inconsistent, I guess.

Obviously an eight-core Nehalem isn't going to sip power. Yet, if PC games can take advantage of the multiple CPUs, it might be a perfect companion to a high-end GPU (be that a chip by AMD/ATI, NVIDIA, or Larrabee).

One thing that Cell did get right (IMHO) is the big-core/small-core thing. I think heterogeneous chips (same ISA or different ISA) are the way to go in the future. AMD Fusion is along this direction, and I think we'll see more chips like that going forward (but not Larrabee's first incarnation).

You have to understand that your statements in this thread are off-topic, and re-open many an old debate from the past. I'm ok with that as long as the quality is high, but we may need to spin it off into the CellPerformance section or something depending on where it goes.

I apologize for going off topic. My main point wasn't to trash Cell (or, not just to trash Cell), but to say that since Microsoft was happy with Xenon, why wouldn't they just go back to IBM again?
 