Ars Technica: Introducing the Cell

Yes, interesting indeed.

Although I'm confused slightly - if it's dual issue, does that not allow for out-of-order execution, even in a simple form?
 
Titanio said:
Yes, interesting indeed.

Although I'm confused slightly - if it's dual issue, does that not allow for out-of-order execution, even in a simple form?

In this case, no. They opted for simplicity.
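To make the distinction concrete, here is a toy model of a dual-issue in-order front end. It is only a sketch under my own assumptions: the `Instr` fields, register names and latencies are hypothetical and are not taken from the PPE documentation. Up to two instructions issue per cycle, but strictly in program order, so a stalled instruction blocks everything behind it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple = ()
    latency: int = 1  # hypothetical: cycles until the result is usable (>= 1)


def simulate_in_order(program, width=2):
    """Issue up to `width` instructions per cycle, strictly in program order."""
    ready_at = {}   # register -> first cycle in which its value can be read
    schedule = []   # (cycle, [names issued that cycle])
    pc, cycle = 0, 0
    while pc < len(program):
        issued = []
        while pc < len(program) and len(issued) < width:
            ins = program[pc]
            # In-order rule: if the oldest waiting instruction is not ready,
            # stop issuing. Younger instructions may not pass it.
            if any(ready_at.get(r, 0) > cycle for r in ins.srcs):
                break
            ready_at[ins.dest] = cycle + ins.latency
            issued.append(ins.name)
            pc += 1
        if issued:
            schedule.append((cycle, issued))
        cycle += 1
    return schedule


program = [
    Instr("load", "r1", (), latency=3),
    Instr("add", "r2", ("r1",)),   # must wait for the load
    Instr("mul", "r3", ("r4",)),   # independent, but stuck behind the add
]
print(simulate_in_order(program))
```

In this model the `mul` could have gone out alongside the `load` on an out-of-order machine, but here it waits behind the stalled `add` and the two only pair up once the load result arrives. Dual issue widens the pipe; it does not reorder anything.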
 
Hannibal said:
Originally posted by Hannibal:
I'm going to do another post on this a little later, but I wanted to make a few clarifications about things raised in this thread.

The PPE is not, as I thought before the session, a POWER5 derivative. It's a dual-issue in-order machine with VMX capabilities. It's actually a derivative of a different project from a few years ago that apparently didn't go anywhere. I don't know details, but I overheard someone talking about it.

The places where I said "128 bytes" are indeed typos, and should be 128 bits. The editor is fixing those.

I recommend downloading the .doc from SCEE linked up above. If you read it, you'll know about 85% of what I know at this point. I have a really nice paper abstract that I can draw more information from.

Also, I have more info on the SPEs which I didn't include. In particular, I have pipeline diagrams and instruction latency tables for all the units. I can post that stuff tomorrow for those interested.

Finally, that Blachford guy whose article I critiqued a while back has a pretty good "Clarifications" page that collects up much of the available info. Check it out, here. And especially be sure to read it before emailing me asking if I'm going to apologize for nitpicking about the "cache" language, like some wanker has already done.

If I had it to do over again, I would definitely have dropped the "monitor" analogy, but I do stand by the substance of my criticism of that aspect (and others) of the article, and in fact the recent revelations have vindicated them.

On a related note, it's important to understand why they didn't go with VMX on the SPUs. The SPU execution hardware is just too stripped-down and barebones to support a feature-rich ISA extension like VMX. So there was no point in it. They just cooked up a custom, simple, SIMD ISA with load-store capabilities for reading/writing the LS and the channel interface.
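As a rough illustration of what "simple SIMD with load-store to the LS" means in practice, here is a sketch that treats a 128-bit register as four 32-bit floats and moves it to and from a flat local store. Everything in it is my own assumption for illustration: the helper names, the 256 KB store size, the big-endian packing and the alignment check are not taken from the SPU ISA documents.

```python
import struct

LS_SIZE = 256 * 1024   # assumed local-store size in bytes
QUADWORD = 16          # 128 bits = 16 bytes = four 32-bit floats


def _check(addr):
    # Simplification for this sketch: reject unaligned or out-of-range addresses.
    if addr % QUADWORD or not 0 <= addr <= LS_SIZE - QUADWORD:
        raise ValueError(f"bad quadword address: {addr}")


def load_qword(ls, addr):
    """Read one 128-bit register (4 x float32) from the local store."""
    _check(addr)
    return struct.unpack_from(">4f", ls, addr)


def store_qword(ls, addr, vec):
    """Write one 128-bit register (4 x float32) to the local store."""
    _check(addr)
    struct.pack_into(">4f", ls, addr, *vec)


def simd_add(a, b):
    """One SIMD add: the same operation applied to all four lanes."""
    return tuple(x + y for x, y in zip(a, b))


ls = bytearray(LS_SIZE)
store_qword(ls, 0, (1.0, 2.0, 3.0, 4.0))
store_qword(ls, 16, (0.5, 0.5, 0.5, 0.5))
store_qword(ls, 32, simd_add(load_qword(ls, 0), load_qword(ls, 16)))
print(load_qword(ls, 32))
```

The point of the sketch is the shape of the programming model: data sits in the local store, gets pulled into wide registers with explicit loads, is operated on four lanes at a time, and is written back with explicit stores. There is no cache in the picture.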

Finally, IBM won't release performance benchmarks, but they do claim a 10X speedup over a PC in the same power envelope. Take this claim with a large grain of salt, however, because there's no context to it (i.e. on what type of application, vs. what kind of PC, etc. etc.).

Some clarifications from the above article by the author Hannibal. His article is only 'Part 1' and more should follow as details emerge...
 
figure7.png


Part II: The Cell Architecture

More will follow from Hannibal...
 
xbox2patent_01.gif


Above is the Xenon/Xbox2 patent. Focus on the L2 cache...a bit of imagination needed, but the L2<>GPU sharing is similar to CELL below with its L2 sharing...

figure7.png


...the CELL's PPE sharing its L2 cache with 8 SPEs looks similar...and there was an IBM patent describing the above also.


i.e.

CPU<=>L2 Cache<=>GPU

where,

1 Xe CPU core >>> PPE,
1 Xe GPU >>> 8 SPEs (R500 shading ALUs equivalent)

Above is just an analogy but it's strikingly similar as suggested by some IBM patents! :D ;)
 
Hannibal said:
Originally posted by Hannibal:
Regarding this 4GHz number, I have a question/comment that I'd like to throw out. Given IBM's track record on 90nm, their history of releasing optimistic clockspeed targets (at least if we take Jobs's 3GHz(?) claims to reflect IBM's assurances), and the CELL's die size, can we really expect this chip to debut in quantity at 4GHz? It seems to me that this number will likely be subject to downward revision in the next two years.

Also, I've seen Moab here and elsewhere talking up the idea that the SPEs aren't for use in rendering. In this he is most assuredly wrong. As Scott Wasson at TR has pointed out more than once, the SPEs are essentially pixel shaders and they will be used for the rendering pipeline. Furthermore, IBM themselves stated in the presentation that they consider the CELL to be a combination of a CPU and GPU. The IBM rep also answered a question about using these for rendering and discussed the fact that SPE peer-to-peer communication over the EIB, in combination with local storage, means that you can flexibly assign different SPEs to different parts of the rendering pipeline.

Moab's comments do bring out one important fact, though. It pays to remember that the CELL is a ways off, and that the PC will be that much more powerful when this new design finally hits the market. Furthermore, there's going to be a learning curve as developers figure out how to take advantage of this substantially different hardware. This learning curve won't be as steep as that for the PS2, but it will be enough to give the PC (which is already quite mature) even more time to increase in power before CELL reaches its full potential.

There are no miracles or magic bullets in microprocessor design. Expect the CELL to be impressive, but don't expect it to just lay waste to all the competition right out of the gate. In fact, expecting some kind of miracle architecture misunderstands the fundamental premise of this new design. The idea isn't so much to bring about an all-at-once radical leap forward in performance as it is to provide a forward-looking, scalable platform that will serve as the architectural basis for future performance increases that aren't tied to GHz numbers and that aren't as constrained by the Von Neumann bottleneck.

SPEs are essentially pixel shaders? What does this mean for the NV5x GPU?
 
version said:
Pixel shaders at 4 GHz, that's fine.

Was it not expected that the SPEs in CELL would be vertex shaders and the NV5x GPU would be the pixel shaders (and maybe vertex shaders also)?
 
Jaws said:
version said:
Pixel shaders at 4 GHz, that's fine.

Was it not expected that the SPEs in CELL would be vertex shaders and the NV5x GPU would be the pixel shaders (and maybe vertex shaders also)?

Possibilities:

1. SPE in GPU
2. Deferred rendering
3. Peer to peer
 
Jaws said:
Hannibal said:
Originally posted by Hannibal:
Regarding this 4GHz number, I have a question/comment that I'd like to throw out. Given IBM's track record on 90nm, their history of releasing optimistic clockspeed targets (at least if we take Jobs's 3GHz(?) claims to reflect IBM's assurances), and the CELL's die size, can we really expect this chip to debut in quantity at 4GHz? It seems to me that this number will likely be subject to downward revision in the next two years.

Also, I've seen Moab here and elsewhere talking up the idea that the SPEs aren't for use in rendering. In this he is most assuredly wrong. As Scott Wasson at TR has pointed out more than once, the SPEs are essentially pixel shaders and they will be used for the rendering pipeline. Furthermore, IBM themselves stated in the presentation that they consider the CELL to be a combination of a CPU and GPU. The IBM rep also answered a question about using these for rendering and discussed the fact that SPE peer-to-peer communication over the EIB, in combination with local storage, means that you can flexibly assign different SPEs to different parts of the rendering pipeline.

Moab's comments do bring out one important fact, though. It pays to remember that the CELL is a ways off, and that the PC will be that much more powerful when this new design finally hits the market. Furthermore, there's going to be a learning curve as developers figure out how to take advantage of this substantially different hardware. This learning curve won't be as steep as that for the PS2, but it will be enough to give the PC (which is already quite mature) even more time to increase in power before CELL reaches its full potential.

There are no miracles or magic bullets in microprocessor design. Expect the CELL to be impressive, but don't expect it to just lay waste to all the competition right out of the gate. In fact, expecting some kind of miracle architecture misunderstands the fundamental premise of this new design. The idea isn't so much to bring about an all-at-once radical leap forward in performance as it is to provide a forward-looking, scalable platform that will serve as the architectural basis for future performance increases that aren't tied to GHz numbers and that aren't as constrained by the Von Neumann bottleneck.

SPEs are essentially pixel shaders? What does this mean for the NV5x GPU?

I think Hannibal, with all due respect, is off when he says that SPE's are essentially Pixel Shaders like Wasson states.

When people say that SPE's are not going to be used too much for rendering (although I can see a system doing software rendering, just not obtaining the speed a CELL CPU + nVIDIA GPU/ATI GPU can reach), they mean some specific portion of the rendering pipeline.

You can use them as pixel shaders of course, you can get them to sample and filter textures: they just won't be as fast doing that as a dedicated multi-threaded ALU + close-by hardware Texture Management Units can be.
 