ARS Technica: Introducing the Cell

j^aws · Feb 8, 2005

http://arstechnica.com/articles/paedia/cpu/cell-1.ars

Good article! 8)

Titanio · Feb 8, 2005

Yes, interesting indeed.

Although I'm confused slightly - if it's dual issue, does that not allow for out-of-order execution, even in a simple form?

London Geezer · Feb 8, 2005

I was gonna reply with a loud "NOOOOOOOOOO ANOTHER THREAD ON CELL!!!!!!!"... But that was quite interesting.

Deepak · Feb 8, 2005

Damn, Jaws. I was about to post same article.

Megadrive1988 · Feb 8, 2005

I actually DID post a link to that article, twice. But then I had the thread one erased. All that remains is the link in one of the other Cell threads (forgot which one), probably the ISSCC thread.

edit: yup, here: http://www.beyond3d.com/forum/viewtopic.php?p=458721&highlight=21+million#458721

8)

But glad you made a thread out of this one, Jaws. After all, one can NEVER have too many Cell-threads. Whoa, I made a pun.

SiBoy · Feb 8, 2005

Titanio said:
Yes, interesting indeed.

Although I'm confused slightly - if it's dual issue, does that not allow for out-of-order execution, even in a simple form?

In this case, no. They opted for simplicity.
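To illustrate SiBoy's point that dual issue and out-of-order execution are separate features, here is a toy Python model of an in-order, dual-issue front end. It is a simplified sketch, not the PPE's actual pipeline: it assumes every result is ready one cycle later and ignores real issue-slot rules. Up to two instructions issue per cycle, strictly in program order, and a dependent instruction holds up everything behind it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction: a label, the register it writes, and the registers it reads."""
    name: str
    dest: str
    srcs: tuple


def issue_in_order(program, width=2):
    """Group a program into issue cycles for a toy in-order, dual-issue core.

    Up to `width` instructions issue per cycle, strictly in program order.
    A group closes as soon as the next instruction reads or writes a register
    written earlier in the same group; that instruction, and everything behind
    it, waits for the next cycle. Nothing is ever moved ahead (no reordering).
    Assumption: every result is available one cycle later.
    """
    cycles = []
    i = 0
    while i < len(program):
        group, written = [], set()
        while i < len(program) and len(group) < width:
            ins = program[i]
            if written & (set(ins.srcs) | {ins.dest}):
                break  # hazard inside the group: stall instead of reordering
            group.append(ins.name)
            written.add(ins.dest)
            i += 1
        cycles.append(group)
    return cycles


program = [
    Instr("load r1", "r1", ("r0",)),
    Instr("add r2", "r2", ("r1",)),  # needs r1 from the load
    Instr("mul r3", "r3", ("r4",)),  # independent, but stuck behind add r2
]
print(issue_in_order(program))  # [['load r1'], ['add r2', 'mul r3']]
```

Here `mul r3` does not depend on the load, yet it still waits for cycle 2 because it sits behind the stalled `add r2`. An out-of-order core could have issued it alongside the load; that extra scheduling hardware is exactly what an in-order design leaves out.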

j^aws · Feb 8, 2005

Hannibal said:
Originally posted by Hannibal:
I'm going to do another post on this a little later, but I wanted to make a few clarifications about things raised in this thread.

The PPE is not, as I thought before the session, a POWER5 derivative. It's a dual-issue in-order machine with VMX capabilities. It's actually a derivative of a different project from a few years ago that apparently didn't go anywhere. I don't know the details, but I overheard someone talking about it.

The places where I said "128 bytes" are indeed typos, and should be 128 bits. The editor is fixing those.

I recommend downloading the .doc from SCEE linked up above. If you read it, you'll know about 85% of what I know at this point. I have a really nice paper abstract that I can draw more information from.

Also, I have more info on the SPEs which I didn't include. In particular, I have pipeline diagrams and instruction latency tables for all the units. I can post that stuff tomorrow for those interested.

Finally, that Blanchford guy whose article I critiqued a while back has a pretty good "Clarifications" page that collects up much of the available info. Check it out, here. And especially be sure to read it before emailing me asking if I'm going to apologize for nitpicking about the "cache" language, like some wanker has already done.

If I had it to do over again, I would definitely have dropped the "monitor" analogy, but I do stand by the substance of my criticism of that aspect (and others) of the article, and in fact the recent revelations have vindicated them.

On a related note, it's important to understand why they didn't go with VMX on the SPUs. The SPU execution hardware is just too stripped-down and barebones to support a feature-rich ISA extension like VMX. So there was no point in it. They just cooked up a custom, simple, SIMD ISA with load-store capabilities for reading/writing the LS and the channel interface.

Finally, IBM won't release performance benchmarks, but they do claim a 10X speedup over a PC in the same power envelope. Take this claim with a large grain of salt, however, because there's no context to it (i.e. on what type of application, vs. what kind of PC, etc. etc.).

Some clarifications on the above article from its author, Hannibal. His article is only 'Part 1', and more should follow as details emerge...
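Hannibal's clarifications (128 bits rather than 128 bytes, and a simple custom SIMD ISA whose loads and stores go to the local store) can be pictured with a small Python model. Everything here is a hypothetical illustration, not the SPU instruction set or any real SDK: `LocalStore`, `dma_get`, `dma_put`, `load_quad`, `store_quad` and `simd_madd` are invented names. The 256 KB size is used as a model parameter matching what was reported for the SPE local store, and big-endian packing matches Cell. Data moves between main memory and the local store only through explicit transfers, and the arithmetic works on 128-bit registers holding four 32-bit floats.

```python
import struct

LANES = 4                 # a 128-bit register holds four 32-bit floats
QUAD = 16                 # 128 bits = 16 bytes
LS_SIZE = 256 * 1024      # local store size in bytes (model parameter)


class LocalStore:
    """Toy SPE local store: a private, flat byte array (not a cache).

    Hypothetical API for illustration only; the real SPU ISA and SDK differ.
    """

    def __init__(self, size=LS_SIZE):
        self.mem = bytearray(size)

    def _check(self, ls_addr, main_mem, ea, nbytes):
        if ls_addr < 0 or ls_addr + nbytes > len(self.mem):
            raise IndexError("transfer falls outside the local store")
        if ea < 0 or ea + nbytes > len(main_mem):
            raise IndexError("transfer falls outside main memory")

    def dma_get(self, ls_addr, main_mem, ea, nbytes):
        """Explicitly copy nbytes from main memory (address ea) into the local store."""
        self._check(ls_addr, main_mem, ea, nbytes)
        self.mem[ls_addr:ls_addr + nbytes] = main_mem[ea:ea + nbytes]

    def dma_put(self, ls_addr, main_mem, ea, nbytes):
        """Explicitly copy nbytes from the local store back out to main memory."""
        self._check(ls_addr, main_mem, ea, nbytes)
        main_mem[ea:ea + nbytes] = self.mem[ls_addr:ls_addr + nbytes]

    def load_quad(self, addr):
        """Load one 16-byte-aligned quadword as four big-endian 32-bit floats."""
        assert addr % QUAD == 0, "quadword accesses are 16-byte aligned in this model"
        return struct.unpack_from(">4f", self.mem, addr)

    def store_quad(self, addr, vec):
        """Store four floats as one 16-byte-aligned quadword."""
        assert addr % QUAD == 0, "quadword accesses are 16-byte aligned in this model"
        struct.pack_into(">4f", self.mem, addr, *vec)


def simd_madd(a, b, c):
    """Lane-wise a * b + c over the four lanes of a 128-bit register."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


main_memory = bytearray(64)
struct.pack_into(">8f", main_memory, 0, 1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0)

ls = LocalStore()
ls.dma_get(0, main_memory, 0, 2 * QUAD)            # bring two quadwords in
result = simd_madd(ls.load_quad(0), ls.load_quad(QUAD), (0.5,) * LANES)
ls.store_quad(2 * QUAD, result)
ls.dma_put(2 * QUAD, main_memory, 2 * QUAD, QUAD)  # write the result back out
print(struct.unpack_from(">4f", main_memory, 2 * QUAD))  # (2.5, 4.5, 6.5, 8.5)
```

The design choice the model tries to capture is that nothing reaches the SIMD unit without first being copied into the local store on purpose; there is no cache quietly fetching lines behind the program's back.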

j^aws · Feb 9, 2005

Part II: The Cell Architecture

More will follow from Hannibal...

j^aws · Feb 9, 2005

Above is the Xenon/Xbox 2 patent. Focus on the L2 cache... a bit of imagination is needed, but the L2<>GPU sharing is similar to CELL's L2 sharing below...

...the CELL's PPE sharing its L2 cache with 8 SPEs looks similar... and there was also an IBM patent describing the above.

i.e.

CPU<=>L2 Cache<=>GPU

where,

1 Xe CPU core >>> PPE,
1 Xe GPU >>> 8 SPEs (R500 shading ALUs equivalent)

The above is just an analogy, but it's strikingly similar, as some IBM patents suggest!

marconelly! · Feb 9, 2005

Can someone give a link to that SCEE PDF that Hannibal talks about?

one · Feb 9, 2005

marconelly! said:
Can someone give a link to that SCEE PDF that Hannibal talks about?

http://www.scei.co.jp/corporate/release/index_e.html

j^aws · Feb 9, 2005

Hannibal said:
Originally posted by Hannibal:
Regarding this 4GHz number, I have a question/comment that I'd like to throw out. Given IBM's track record on 90nm, their history of releasing optimistic clockspeed targets (at least if we take Jobs's 3GHz(?) claims to reflect IBM's assurances), and the CELL's die size, can we really expect this chip to debut in quantity at 4GHz? It seems to me that this number will likely be subject to downward revision in the next two years.

Also, I've seen Moab here and elsewhere talking up the idea that the SPEs aren't for use in rendering. In this he is most assuredly wrong. As Scott Wasson at TR has pointed out more than once, the SPEs are essentially pixel shaders and they will be used for the rendering pipeline. Furthermore, IBM themselves stated in the presentation that they consider the CELL to be a combination of a CPU and GPU. The IBM rep also answered a question about using these for rendering and discussed the fact that SPE peer-to-peer communication over the EIB, in combination with local storage, means that you can flexibly assign different SPEs to different parts of the rendering pipeline.

Moab's comments do bring out one important fact, though. It pays to remember that the CELL is a ways off, and that the PC will be that much more powerful when this new design finally hits the market. Furthermore, there's going to be a learning curve as developers figure out how to take advantage of this substantially different hardware. This learning curve won't be as steep as that for the PS2, but it will be enough to give the PC (which is already quite mature) even more time to increase in power before CELL reaches its full potential.

There are no miracles or magic bullets in microprocessor design. Expect the CELL to be impressive, but don't expect it to just lay waste to all the competition right out of the gate. In fact, expecting some kind of miracle architecture misunderstands the fundamental premise of this new design. The idea isn't so much to bring about an all-at-once radical leap forward in performance as it is to provide a forward-looking, scalable platform that will serve as the architectural basis for future performance increases that aren't tied to GHz numbers and that aren't as constrained by the Von Neumann bottleneck.

SPE's are essentially pixel shaders :?:

What does this mean for the NV5x GPU :?:
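The rendering arrangement Hannibal describes, with different SPEs assigned to different parts of the pipeline and each handing its output straight to the next over the EIB, is essentially a streaming pipeline. A minimal Python sketch, assuming threads stand in for SPEs and FIFO queues stand in for the peer-to-peer transfers; the `transform` and `shade` stages are hypothetical placeholders, not real rendering stages or Cell code:

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the pipeline


def start_stage(work, inbox, outbox):
    """Run one pipeline stage on its own thread (standing in for one SPE)."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # tell the next stage to finish too
                return
            outbox.put(work(item))  # hand the result directly to the next stage
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(stages, items):
    """Chain the stages with queues and stream items through them in order."""
    links = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [start_stage(work, links[i], links[i + 1]) for i, work in enumerate(stages)]
    for item in items:
        links[0].put(item)
    links[0].put(DONE)
    results = []
    while (item := links[-1].get()) is not DONE:
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stages: a "transform" that scales a vertex and a "shade" that offsets it.
def transform(v):
    return tuple(2 * c for c in v)


def shade(v):
    return tuple(c + 1 for c in v)


print(run_pipeline([transform, shade], [(1, 2, 3), (4, 5, 6)]))
# [(3, 5, 7), (9, 11, 13)]
```

Each stage can work on the next item while its neighbour handles the previous one, and results never detour through a shared memory pool between stages. (Python threads won't actually run these stages in parallel; the sketch only shows the data flow.)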

version · Feb 9, 2005

Pixel shaders at 4 GHz, that's fine.

nAo · Feb 9, 2005

Jaws said:
SPE's are essentially pixel shaders :?: What does this mean for the NV5x GPU :?:

IMHO, it means Hannibal is wrong

j^aws · Feb 9, 2005

version said:
Pixel shaders at 4 GHz, that's fine.

Was it not expected that the SPEs in CELL be vertex shaders and the NV5x GPU would be the pixel shaders (and maybe vertex shaders also)?

j^aws · Feb 9, 2005

nAo said:
Jaws said:

SPE's are essentially pixel shaders :?: What does this mean for the NV5x GPU :?:


IMHO, it means Hannibal is wrong

You post there...go and tell him he's wrong!

...It's in the 'Part 1' thread...

version · Feb 9, 2005

Jaws said:
version said:

Pixel shaders at 4 GHz, that's fine.


Was it not expected that the SPEs in CELL be vertex shaders and the NV5x GPU would be the pixel shaders (and maybe vertex shaders also)?

Possibilities:

1. SPE in GPU
2. deferred rendering
3. peer-to-peer

Panajev2001a · Feb 9, 2005

Jaws said:
Hannibal said:

Originally posted by Hannibal:
Regarding this 4GHz number, I have a question/comment that I'd like to throw out. Given IBM's track record on 90nm, their history of releasing optimistic clockspeed targets (at least if we take Jobs's 3GHz(?) claims to reflect IBM's assurances), and the CELL's die size, can we really expect this chip to debut in quantity at 4GHz? It seems to me that this number will likely be subject to downward revision in the next two years.

Also, I've seen Moab here and elsewhere talking up the idea that the SPEs aren't for use in rendering. In this he is most assuredly wrong. As Scott Wasson at TR has pointed out more than once, the SPEs are essentially pixel shaders and they will be used for the rendering pipeline. Furthermore, IBM themselves stated in the presentation that they consider the CELL to be a combination of a CPU and GPU. The IBM rep also answered a question about using these for rendering and discussed the fact that SPE peer-to-peer communication over the EIB, in combination with local storage, means that you can flexibly assign different SPEs to different parts of the rendering pipeline.

Moab's comments do bring out one important fact, though. It pays to remember that the CELL is a ways off, and that the PC will be that much more powerful when this new design finally hits the market. Furthermore, there's going to be a learning curve as developers figure out how to take advantage of this substantially different hardware. This learning curve won't be as steep as that for the PS2, but it will be enough to give the PC (which is already quite mature) even more time to increase in power before CELL reaches its full potential.

There are no miracles or magic bullets in microprocessor design. Expect the CELL to be impressive, but don't expect it to just lay waste to all the competition right out of the gate. In fact, expecting some kind of miracle architecture misunderstands the fundamental premise of this new design. The idea isn't so much to bring about an all-at-once radical leap forward in performance as it is to provide a forward-looking, scalable platform that will serve as the architectural basis for future performance increases that aren't tied to GHz numbers and that aren't as constrained by the Von Neumann bottleneck.


SPE's are essentially pixel shaders :?: What does this mean for the NV5x GPU :?:

With all due respect, I think Hannibal is off when he says, as Wasson does, that SPEs are essentially pixel shaders.

When people say that SPEs are not going to be used much for rendering (although I can see a system doing software rendering, just not reaching the speed a CELL CPU + nVIDIA GPU/ATI GPU can), they mean a specific portion of the rendering pipeline.

You can use them as pixel shaders, of course, and you can get them to sample and filter textures; they just won't be as fast at it as dedicated multi-threaded ALUs with close-by hardware Texture Management Units can be.
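Panajev's point about sampling and filtering textures in software can be made concrete with plain bilinear filtering. The Python sketch below is the textbook technique, assuming a single-channel float texture, normalized coordinates and clamp-to-edge addressing (all choices made for illustration, not anyone's actual SPE code). Every filtered sample needs address math, four texel fetches and three linear interpolations. A hardware texture unit does this in fixed-function logic next to its own texture cache; an SPE would spend instructions on all of it, plus the work of getting the texels into its local store first.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between a and b by weight t in [0, 1]."""
    return a + (b - a) * t


def sample_bilinear(tex, u, v):
    """Bilinearly filter a single-channel texture at normalized coords (u, v).

    tex is a list of rows; texel centers sit at half-integer positions and
    out-of-range texels are clamped to the nearest edge.
    """
    h, w = len(tex), len(tex[0])
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(i, j):
        # Clamp-to-edge addressing: one of the jobs a texture unit does for free.
        i = min(max(i, 0), w - 1)
        j = min(max(j, 0), h - 1)
        return tex[j][i]

    # Four fetches, three lerps per filtered sample.
    top = lerp(texel(x0, y0), texel(x0 + 1, y0), fx)
    bottom = lerp(texel(x0, y0 + 1), texel(x0 + 1, y0 + 1), fx)
    return lerp(top, bottom, fy)


texture = [[0.0, 1.0],
           [2.0, 3.0]]
print(sample_bilinear(texture, 0.5, 0.5))  # 1.5
```

Sampling at the exact center of this 2x2 texture blends all four texels equally, giving their average of 1.5.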

akira888 · Feb 9, 2005

EDIT3: Pana said it better above.

j^aws · Feb 9, 2005

Okay... all your explanations sound plausible...but something smells fishy!
