Titanio said: Yes, interesting indeed.
Although I'm confused slightly - if it's dual issue, does that not allow for out-of-order execution, even in a simple form?
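On the dual-issue question: issue width and issue order are separate properties. A core can issue two instructions per cycle and still refuse to let a younger instruction pass a stalled older one. A toy Python model of that idea is sketched below. The `Instr` record, the single-cycle latency, and the register names are all assumptions made up for illustration; none of it reflects the actual PPE pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str          # label for the instruction (hypothetical)
    dest: str          # register written
    srcs: tuple = ()   # registers read


def schedule_in_order(program, width=2):
    """Return a list of cycles; each cycle lists the instruction names issued.

    Toy assumptions: every result is usable the cycle after issue, and issue
    is strictly in program order (no younger instruction may pass a stalled one).
    """
    cycles = []
    i = 0
    while i < len(program):
        issued = []
        written = set()  # registers written by instructions issued this cycle
        while i < len(program) and len(issued) < width:
            ins = program[i]
            # Stop filling this cycle on a dependency against the current group.
            if written & set(ins.srcs) or ins.dest in written:
                break
            issued.append(ins.name)
            written.add(ins.dest)
            i += 1
        cycles.append(issued)
    return cycles


program = [
    Instr("a", "r1", ("r0",)),
    Instr("b", "r2", ("r1",)),  # depends on a
    Instr("c", "r3", ("r0",)),  # independent
    Instr("d", "r4", ("r0",)),  # independent
]

schedule = schedule_in_order(program)

if __name__ == "__main__":
    for n, group in enumerate(schedule, start=1):
        print(f"cycle {n}: {group}")
```

Here `b` waits on `a`, so the first cycle issues only one instruction even though `c` and `d` are ready. An out-of-order core could have paired `a` with `c` instead; the in-order one cannot, which is the distinction being asked about.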
Hannibal said: Originally posted by Hannibal:
I'm going to do another post on this a little later, but I wanted to make a few clarifications about things raised in this thread.
The PPE is not, as I thought before the session, a POWER5 derivative. It's a dual-issue in-order machine with VMX capabilities. It's actually a derivative of a different project that apparently didn't go anywhere from a few years ago. I don't know details, but I overheard someone talking about it.
The places where I said "128 bytes" are indeed typos, and should be 128 bits. The editor is fixing those.
I recommend downloading the .doc from SCEE linked up above. If you read it, you'll know about 85% of what I know at this point. I have a really nice paper abstract that I can draw more information from.
Also, I have more info on the SPEs which I didn't include. In particular, I have pipeline diagrams and instruction latency tables for all the units. I can post that stuff tomorrow for those interested.
Finally, that Blanchford guy whose article I critiqued a while back has a pretty good "Clarifications" page that collects up much of the available info. Check it out, here. And especially be sure to read it before emailing me asking if I'm going to apologize for nitpicking about the "cache" language, like some wanker has already done.
If I had it to do over again, I would definitely have dropped the "monitor" analogy, but I do stand by the substance of my criticism of that aspect (and others) of the article, and in fact the recent revelations have vindicated them.
On a related note, it's important to understand why they didn't go with VMX on the SPUs. The SPU execution hardware is just too stripped-down and barebones to support a feature-rich ISA extension like VMX. So there was no point in it. They just cooked up a custom, simple, SIMD ISA with load-store capabilities for reading/writing the LS and the channel interface.
Finally, IBM won't release performance benchmarks, but they do claim a 10X speedup over a PC in the same power envelope. Take this claim with a large grain of salt, however, because there's no context to it (i.e. on what type of application, vs. what kind of PC, etc. etc.).
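To make the "128 bits, not 128 bytes" correction and the load-store point concrete, here is a rough Python sketch of a SIMD add over 128-bit quadwords read from and written back to a local store. The 64-byte buffer, the four 32-bit float lanes, the big-endian packing, and the function names are all assumptions chosen for illustration; this is not the SPU ISA.

```python
import struct

QUADWORD_BYTES = 128 // 8  # a 128-bit register holds 16 bytes

# Tiny stand-in for an SPE local store; the size is arbitrary.
local_store = bytearray(64)


def load_quad(addr):
    """Load one 128-bit quadword as four 32-bit floats (assumed lane layout)."""
    return struct.unpack_from(">4f", local_store, addr)


def store_quad(addr, lanes):
    """Store four 32-bit floats as one 128-bit quadword."""
    struct.pack_into(">4f", local_store, addr, *lanes)


def simd_add(a, b):
    """One 'instruction' operating on all four lanes at once."""
    return tuple(x + y for x, y in zip(a, b))


store_quad(0 * QUADWORD_BYTES, (1.0, 2.0, 3.0, 4.0))
store_quad(1 * QUADWORD_BYTES, (10.0, 20.0, 30.0, 40.0))
store_quad(2 * QUADWORD_BYTES, simd_add(load_quad(0), load_quad(QUADWORD_BYTES)))

result = load_quad(2 * QUADWORD_BYTES)

if __name__ == "__main__":
    print(result)
```

The point of the sketch is only the shape: data moves between the local store and registers through explicit loads and stores, and each arithmetic operation covers a whole 16-byte register.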
marconelly! said: Can someone give a link to that SCEE PDF that Hannibal talks about?
Hannibal said: Originally posted by Hannibal:
Regarding this 4GHz number, I have a question/comment that I'd like to throw out. Given IBM's track record on 90nm, their history of releasing optimistic clockspeed targets (at least if we take Jobs's 3GHz(?) claims to reflect IBM's assurances), and the CELL's die size, can we really expect this chip to debut in quantity at 4GHz? It seems to me that this number will likely be subject to downward revision in the next two years.
Also, I've seen Moab here and elsewhere talking up the idea that the SPEs aren't for use in rendering. In this he is most assuredly wrong. As Scott Wasson at TR has pointed out more than once, the SPEs are essentially pixel shaders and they will be used for the rendering pipeline. Furthermore, IBM themselves stated in the presentation that they consider the CELL to be a combination of a CPU and GPU. The IBM rep also answered a question about using these for rendering and discussed the fact that SPE peer-to-peer communication over the EIB, in combination with local storage, means that you can flexibly assign different SPEs to different parts of the rendering pipeline.
Moab's comments do bring out one important fact, though. It pays to remember that the CELL is a ways off, and that the PC will be that much more powerful when this new design finally hits the market. Furthermore, there's going to be a learning curve as developers figure out how to take advantage of this substantially different hardware. This learning curve won't be as steep as that for the PS2, but it will be enough to give the PC (which is already quite mature) even more time to increase in power before CELL reaches its full potential.
There are no miracles or magic bullets in microprocessor design. Expect the CELL to be impressive, but don't expect it to just lay waste to all the competition right out of the gate. In fact, expecting some kind of miracle architecture misunderstands the fundamental premise of this new design. The idea isn't so much to bring about an all-at-once radical leap forward in performance as it is to provide a forward-looking, scalable platform that will serve as the architectural basis for future performance increases that aren't tied to GHz numbers and that aren't as constrained by the von Neumann bottleneck.
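As a loose illustration of the "assign different SPEs to different parts of the rendering pipeline" idea from the post above, here is a minimal Python sketch in which each stage stands in for one SPE and hands its output straight to the next. The stage functions, the `SPE0`/`SPE1` labels, and the data are hypothetical; nothing here models the EIB or real shader work.

```python
def run_pipeline(stages, items):
    """Feed items through the stages in order; each stage stands in for one SPE."""
    stream = iter(items)
    for stage in stages:
        # Output of one stage becomes the input of the next (peer-to-peer handoff).
        stream = map(stage, stream)
    return list(stream)


def transform(vertex):
    """Hypothetical geometry stage: scale each coordinate by 2."""
    return tuple(2 * c for c in vertex)


def shade(vertex):
    """Hypothetical shading stage: collapse a vertex to a single value."""
    return sum(vertex)


# The stage-to-SPE mapping is just data, so it can be rearranged freely.
assignment = {"SPE0": transform, "SPE1": shade}

output = run_pipeline(assignment.values(), [(1, 2), (3, 4)])

if __name__ == "__main__":
    print(output)
```

Changing which function sits under which key, or adding more entries, reassigns the stages without touching the stages themselves, which is roughly the flexibility the IBM rep was describing.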
version said: a pixel shader at 4 GHz, it's fine
nAo said, quoting Jaws ("SPE's are essentially pixel shaders. What does this mean for the NV5x GPU?"): IMHO, it means Hannibal is wrong.
Jaws said, quoting version ("a pixel shader at 4 GHz, it's fine"):
Was it not expected that the SPEs in CELL would be the vertex shaders and the NV5x GPU would be the pixel shaders (and maybe vertex shaders also)?
Jaws said, quoting Hannibal's post above:
SPE's are essentially pixel shaders. What does this mean for the NV5x GPU?