Nicholas Blachford's article ripped already? (Cell info)

Josh378

Newcomer
Bringing the link from the PS2GB at IGN.com:

http://arstechnica.com/news.ars/post/20050124-4551.html

Cell "analysis" a mixed bag

1/24/2005 11:33:48 PM, by Hannibal

Last week, OS News published an analysis of IBM's Cell-related patents. This article presents some of the information in the patents in an easily digestible format, but it has some serious flaws, as well. And I'm not talking about Cell-specific flaws, though there are those, but what appear to be problems with the author's understanding of basic computer architecture.

For instance, the author, Nicholas Blachford, starts off with a fantastic and completely made-up benchmark estimate for how fast Cell will complete a SETI@Home work unit (i.e. 5 mins). In the footnotes, we find that this number is extrapolated from the SETI numbers for a 1.33GHz G4. The extrapolation is done using a combination of real (for the G4) and hypothetical (for the Cell) FLOPS ratings, which are not only fairly meaningless as a cross-platform performance metric but also take no account of the kinds of platform-specific optimizations that are all-important for SETI performance. So this is pretty much hogwash.
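To make the objection concrete, here's a minimal C sketch of that kind of naive FLOPS scaling; every number in it is a placeholder made up for illustration, not a real measurement:

```c
#include <stdio.h>

/* The kind of naive cross-platform extrapolation being criticized.
 * All numbers below are illustrative placeholders, not measurements. */
int main(void)
{
    double g4_gflops   = 10.0;   /* hypothetical peak GFLOPS, 1.33GHz G4 */
    double cell_gflops = 250.0;  /* hypothetical peak GFLOPS, Cell       */
    double g4_minutes  = 360.0;  /* hypothetical G4 time per work unit   */

    /* Naive scaling assumes runtime shrinks linearly with peak FLOPS,
     * ignoring memory bandwidth, cache behavior, and the
     * platform-specific optimizations that dominate SETI performance. */
    double cell_minutes = g4_minutes * (g4_gflops / cell_gflops);

    printf("\"Predicted\" Cell time: %.1f minutes\n", cell_minutes);
    return 0;
}
```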

In another part of the article, Blachford claims that the cell processing units have no "cache." Instead, they each have a "local memory" that fetches data from main memory in 1024-bit blocks. Well, that's sort of like saying that an iMac doesn't have a "monitor," but it does have a surface on which visual output is displayed. In other words, the Cell "local memories," which are roughly analogous to the vector units' "scratchpad RAM" on the PS2's Emotion Engine, function as caches for the PUs. What has thrown the author for a loop is that they're small, and that being tied to each cellular processing unit means they don't function in the memory hierarchy in exactly the same way that an L1 does in a traditional processor design. They do, however, cache things. But maybe I'm being nitpicky with this.
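To see why a software-managed local store still counts as caching, here's a minimal C sketch of a one-block software cache; the function name and the use of memcpy are illustration only, and just the 128-byte (1024-bit) block size comes from the article:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 128  /* one 1024-bit block, per the claimed fetch size */

static uint8_t   local_store[BLOCK_BYTES];
static uintptr_t resident_block = (uintptr_t)-1;  /* tag: which block we hold */

/* Before computing on a block, check whether it is already resident in
 * local memory and fetch it from main memory only on a "miss". That is
 * caching, even though software rather than hardware manages it. */
const uint8_t *get_block(const uint8_t *main_mem, uintptr_t block_addr)
{
    if (resident_block != block_addr) {               /* miss: fetch */
        memcpy(local_store, main_mem + block_addr, BLOCK_BYTES);
        resident_block = block_addr;
    }
    return local_store;                               /* hit: reuse copy */
}
```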

Blachford also declares that the longstanding problems inherent in code parallelism and multithreaded programming are now solved, because the Cell will just miraculously do all this stuff for you via fancy compiler and process scheduling tricks. Unfortunately, parallelization is a fundamental application design problem that's rooted in the inherently serial nature of many of the kinds of tasks that we ask computers to perform. There are good parallelizing compilers out there, but they can only extract parallelism that's already latent in the input code and in the algorithm that the code implements; they can't magically parallelize an inherently serial sequence of steps.
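A minimal C illustration of the distinction, using generic code rather than anything Cell-specific: the first loop's iterations are independent and easy to farm out, while the second carries a dependency from each iteration to the next:

```c
/* Independent iterations: a parallelizing compiler (or a programmer)
 * can split this loop across processing units, because no iteration
 * reads another iteration's result. */
void scale(float *out, const float *in, float k, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Loop-carried dependency: each iteration consumes the previous one's
 * output. A compiler cannot make these steps concurrent as written;
 * extracting parallelism requires restructuring the algorithm itself
 * (e.g., into a parallel prefix scan), which is an application design
 * decision, not a compiler trick. */
void running_sum(float *acc, const float *in, int n)
{
    for (int i = 0; i < n; i++)
        acc[i] = (i ? acc[i - 1] : 0.0f) + in[i];
}
```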

These are just three of the many basic flaws in this article. Furthermore, the article is chock full of wild-eyed and completely unsubstantiated claims about exactly how much butt, precisely measured in kilograms and centimeters squared, that the Cell will kick, and how hard, measured in decibels, that the Cell will rock. I'm as excited about the Cell as the next geek, but there's no need to go way over the top like this about hardware that won't even see the light of day for a year. And it's especially ill-advised to compare it to existing hardware and declare that we have a hands-down winner.

Finally, to address something more specific to the Cell architecture itself, on page 1 we find this claim:

It has been speculated that the vector units are the same as the AltiVec units found in the PowerPC G4 and G5 processors. I consider this highly unlikely as there are several differences. Firstly the number of registers is 128 instead of AltiVec's 32, secondly the APUs use a local memory whereas AltiVec does not, thirdly Altivec is an add-on to the existing PowerPC instruction set and operates as part of a PowerPC processor, the APUs are completely independent processors.

The author appears to be confusing an instruction set with an implementation. The 128-register detail is a problem, because, as the author correctly points out, conventional Altivec has only 32 vector registers. So obviously it's a given that Cell won't be using straight-up Altivec. But it's entirely possible that it'll use some kind of 128-register derivative of the Altivec instruction set. The fact that the individual processing units have a local cache has little to do with whether or not the PUs themselves implement some hypothetical Altivec derivative. Finally, the statement, "Altivec is an add-on to the existing PowerPC instruction set," is correct, but the rest of that sentence--"and operates as part of a PowerPC processor"--doesn't make a whole lot of sense to me in this context. Altivec is an ISA extension that is implemented in different ways on different PowerPC processors. The Cell processor's PUs could very well implement a hypothetical 128-register Altivec2 ISA extension, or they could implement some other SIMD ISA extension. The fact that SIMD code, written to whatever ISA, is farmed out to individual PUs has nothing to do with it. (If what I just said confuses you, you might check out this article.)
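As a concrete illustration of the ISA-versus-implementation point, the same AltiVec source compiles unchanged for any PowerPC chip that implements the extension, whatever the hardware behind it looks like; whether Cell's PUs will accept anything like it is, of course, pure speculation:

```c
#include <altivec.h>

/* This function targets the AltiVec ISA extension, not any particular
 * chip: a G4 and a G5 implement the same instructions very differently,
 * and a hypothetical 128-register derivative could run similar code.
 * The ISA says nothing about pipelines, units, or local memories. */
vector float fma4(vector float a, vector float x, vector float y)
{
    /* vec_madd: fused multiply-add across four packed floats (a*x + y) */
    return vec_madd(a, x, y);
}
```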

Anyway, I could go on, but I'll stop here. You get the idea. Caveat lector and such.

I should note that the author has published some "Clarifications" on his website, and he does back off some of the wackier claims. For instance, in response to the criticisms of his claims about magical code parallelization, he says, "This is not true. You still have to break up problems into software Cells." Um, yeah. Precisely.

At any rate, if you have some intermediate level of computer science knowledge and you read the article with a critical eye, throwing out things that are obviously bogus and/or overblown, then you can actually pick up some information on the architecture. Mind you, there are no new revelations in the article (except for the stuff that's made up (e.g. SETI) and/or wrong (e.g. "check it out! no cache!")), but Blachford did manage to pull together a lot of what's already known into one place.


Hmmm, interesting.....

-Josh378
 
While upon reading the article I felt it to be a bit on the over-enthusiastic side, I have to strongly disagree with Hannibal on one point. "Cache" is a local, usually on-chip, fast memory that is accessed in the same address space as external memory; the transfer of data between external memory and the cache is also transparent to the application and is handled in hardware. "Scratchpad," on the other hand, is also (usually) an on-chip fast local memory, but it's handled in its own address space, with the transfer process to and from external memory under the explicit control of the application.
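A minimal C sketch of that distinction; dma_get() is a made-up stand-in for an explicit transfer primitive, implemented here with memcpy just so the example runs:

```c
#include <string.h>

/* Hypothetical explicit-transfer primitive; real scratchpad hardware
 * would use a DMA command, memcpy is only a stand-in here. */
static void dma_get(void *local_dst, const void *main_src, size_t bytes)
{
    memcpy(local_dst, main_src, bytes);
}

float cached_read(const float *main_mem, int i)
{
    /* Cache: the application dereferences a main-memory address and the
     * hardware fills, keeps, and evicts the on-chip copy transparently,
     * all within the same address space. */
    return main_mem[i];
}

float scratchpad_read(const float *main_mem, int i)
{
    /* Scratchpad: a separate local address space. The application must
     * explicitly move a block in before it can touch the data. */
    static float local[256];
    dma_get(local, &main_mem[i & ~255], sizeof(local));
    return local[i & 255];
}
```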

But it has been a while since I took undergrad CompArch (Spring 99) and grad CompArch (Fall 02) or taught CompArch undergrad lab (Fall 03) so I could be totally off the reservation here.
 
akira888 said:
While upon reading the article I felt it to be a bit on the over-enthusiastic

Yes, the original text was very enthusiastic, but I don't think it was ever portrayed as anything other than an extrapolation of the public information contained in the patent applications.

However, the arstechnica article sounds juvenile and petty. I think Hannibal is under the mistaken assumption that he is an authority figure on console hardware.
 
PC-Engine said:
I think Hannibal is under the mistaken assumption that he is an authority figure on console hardware.

Consoles are still based on computer architectures...

right. and still, the most Nicholas can be accused of, IMHO, is too much enthusiasm, which is hardly a deadly sin. i really don't see what Hannibal got so upset about. does he have a grudge against Nicholas?
 