To all the PS2 programmers that responded to my rant:
Yeah, I intentionally chose inflammatory wording in my description. It was a rant after all... It can be great fun unravelling the puzzle of the PS2. But while I enjoy the puzzles, I enjoy getting results a lot more.
Case in point:
Think back to the very first time you went through the process I described. Remember reading through the docs, samples and newsgroups to figure out what to do? How long did it take from when you started reading until you had correctly pushed a polygon from the EE through the DMA, the VIF, the VU, the GIF and finally the GS? Recently a friend of mine needed to draw some simple polys on the Xbox. Up until that point he had only done graphics on the PS2, the PS1 and earlier consoles. No DirectX and no OpenGL experience. He was handed an Xbox app that was already initialized and clearing the screen, plus the SDK help file. Working alone, in half an hour he had textured triangles on the screen using a VertexBuffer and DrawPrimitive. It wasn't a 100% efficient, ship-quality implementation, but it was pretty much the right way to do it and close to as fast as it could possibly be on the first try. With that done, he was able to get back to the real work of making better tools to make better games.
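For anyone who hasn't touched that path, here is roughly what it looks like. This is a desktop Direct3D 8-style sketch written from the PC docs; the Xbox flavor is close but not identical, and this obviously isn't his actual code:

    // Minimal sketch, desktop Direct3D 8 style (Xbox details may differ).
    #include <d3d8.h>
    #include <string.h>

    struct TexturedVertex
    {
        float x, y, z, rhw;   // pre-transformed screen-space position
        DWORD color;          // diffuse color
        float u, v;           // texture coordinates
    };
    #define TEXTURED_FVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE | D3DFVF_TEX1)

    void DrawTexturedTriangle(IDirect3DDevice8* device, IDirect3DTexture8* texture)
    {
        TexturedVertex tri[3] =
        {
            { 100.0f, 400.0f, 0.5f, 1.0f, 0xFFFFFFFF, 0.0f, 1.0f },
            { 320.0f, 100.0f, 0.5f, 1.0f, 0xFFFFFFFF, 0.5f, 0.0f },
            { 540.0f, 400.0f, 0.5f, 1.0f, 0xFFFFFFFF, 1.0f, 1.0f },
        };

        // Create a vertex buffer and copy the vertices into it.
        IDirect3DVertexBuffer8* vb = NULL;
        device->CreateVertexBuffer(sizeof(tri), D3DUSAGE_WRITEONLY, TEXTURED_FVF,
                                   D3DPOOL_DEFAULT, &vb);
        BYTE* data = NULL;
        vb->Lock(0, sizeof(tri), &data, 0);
        memcpy(data, tri, sizeof(tri));
        vb->Unlock();

        // Point the device at the buffer and draw one triangle.
        device->SetTexture(0, texture);
        device->SetStreamSource(0, vb, sizeof(TexturedVertex));
        device->SetVertexShader(TEXTURED_FVF);    // fixed-function FVF path
        device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, 1);

        vb->Release();
    }

That's the whole mental model: fill a buffer, point the device at it, draw. Compare that with the number of moving parts you have to understand before the first triangle comes out of the GS.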
To passerby:
I expect that the PS3 will be a lot more developer-friendly than the PS2, but the help won't come from Sony; it will come from IBM's compilers and Nvidia's toolset. I honestly believe that Sony's English-speaking PS2 developer support group is doing the best they can given an extremely limited budget. They know that there are lots of problems because they spend most of their time helping developers through them, but they don't have the authority or manpower to do anything to permanently solve those problems.
Case in point:
Sony of Japan provides a USB keyboard driver to developers. We have been using it and it seemed to suck hard. It seemed to be a big performance drain, it seemed to miss or drop keypresses frequently, and it seemed to be totally unusable in a shipping game. The guy who wrote our keyboard implementation on top of it insisted that the driver was poor and there was nothing he could do about it. That guy recently left and I rewrote the keyboard implementation. After reading the docs and trying out many dead-end implementations, I had one that I thought was the best I could do. Then I realized that I had almost exactly recreated the other guy's code! At that point I decided to ignore the docs and started black-box testing the driver. Very quickly it became apparent that the driver could be used effectively, but the right way to use it was very awkward and poorly explained in the docs. The only way it makes sense to me is that I can see how it would have been much easier for the driver's author to set it up that way.
So what does this have to do with Sony of Europe/America's dev support? Well, given that they receive complaints about the driver on a regular basis, they decided to help by creating an alternate driver and providing the source to developers. Their alternative almost exactly reproduces the interface and features of the original. The point is to give you something to reference so that you can write your own driver from scratch that hopefully won't suck. They would love to make a not-sucking driver themselves, but they honestly don't have the time. [Disclaimer]Everything in this message is wild speculation. It does not reflect the official position of Sony, my company or anyone else, including myself (just to be safe).[/Disclaimer]
<More for passerby> Regarding automatic load-balancing:
Except for the "embarrassingly parallel" class of problems (vertex transformation, simple particle systems) I doubt that the compilers will be able to do much to automatically subdivide tasks across APUs. Although 128K per APU is a hell of a lot better than 16K in the VU1, it will still be the determining factor in how streams are pipelined. To keep things moving smoothly, we are going to have to at least triple-buffer that local memory: one buffer for the incoming-DMA packet, one for the packet currently being processed and one for the outgoing-DMA packet. If you size your packets at 32K each, that leaves 32K for the persistent data used by the whole stream. Subdividing a program into parallel sub-tasks that fit in 32K will be based around analyzing the data dependencies within that program. Automatically predetermining the runtime data requirements for a given program is only going to be feasible in extremely specialized situations. Reacting to it and adjusting on the fly is not likely to be feasible in any situation.
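To give an idea of what I mean, here's a rough sketch of how that 32K-times-three-plus-32K split would get used. All of the DMA calls are made-up placeholders, since nobody outside the Cell partners knows what the real interface will look like yet:

    // Hypothetical DMA/processing interface -- placeholder names only.
    extern void dma_start_in (void* dst, int packet_index);
    extern void dma_start_out(void* src, int packet_index);
    extern void dma_wait_in  (int buffer_index);
    extern void dma_wait_out (int buffer_index);
    extern void process(char* packet, char* persistent_data);

    #define PACKET_SIZE (32 * 1024)

    static char buffers[3][PACKET_SIZE];   // three rotating 32K packet buffers
    static char persistent[PACKET_SIZE];   // the remaining 32K of local memory

    void stream_packets(int packet_count)
    {
        if (packet_count > 0)
            dma_start_in(buffers[0], 0);   // prime the pipeline with packet 0

        for (int i = 0; i < packet_count; i++)
        {
            int work = i % 3;              // holds packet i (fetched last iteration)
            int next = (i + 1) % 3;        // will hold packet i + 1

            if (i + 1 < packet_count)
            {
                dma_wait_out(next);                   // old results must drain first
                dma_start_in(buffers[next], i + 1);   // fetch the next packet now
            }

            dma_wait_in(work);                    // make sure packet i has arrived
            process(buffers[work], persistent);   // the actual work on this APU

            dma_start_out(buffers[work], i);      // kick the results back out
        }
        // (A real version would also wait here for all outstanding output DMAs.)
    }

The inner loop is the easy part; deciding what "process" is, and what fits in that 32K of persistent data, is where the human comes in.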
Case in point:
Raytracing is widely regarded as the poster child of "embarrassingly parallel" problems, but that claim assumes that each thread has access to the whole scene's data. Unless your entire scene description fits in 32K, you are not going to get very far trying to upload it to a bunch of APUs and then streaming rays through them in parallel. Perhaps you could reverse the problem and upload 32K sets of rays to each APU and then stream your entire scene through each of them in parallel. Unfortunately, that means every ray must be checked against every triangle, which eliminates the primary advantage of raytracing (logarithmic scaling with scene size thanks to hierarchical scene traversal) over rasterization (linear scaling). The real solution is a lot more complicated than that, and it will require a lot of careful design and setup from the human. The compiler might help you with your inner loop, but the preparation for that inner loop is the hard part.
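To make that concrete, here is roughly what the "park a batch of rays, stream the scene past them" kernel ends up looking like. The Ray/Triangle types and intersect() are made up; the point is the shape of the loops, which is rays times triangles rather than the rays times log(triangles) a hierarchy gives you:

    #define RAY_BATCH 512   // however many rays fit in the 32K of persistent data

    struct Ray      { float origin[3], dir[3]; float nearest_t; int hit; };
    struct Triangle { float v0[3], v1[3], v2[3]; };

    // Hypothetical ray/triangle test; returns nonzero and writes *t on a hit.
    extern int intersect(const Ray* ray, const Triangle* tri, float* t);

    // Called once for each 32K packet of triangles streamed through the APU.
    void trace_packet(Ray rays[RAY_BATCH], const Triangle* tris, int tri_count)
    {
        for (int r = 0; r < RAY_BATCH; r++)
            for (int t = 0; t < tri_count; t++)
            {
                float hit_t;
                if (intersect(&rays[r], &tris[t], &hit_t) && hit_t < rays[r].nearest_t)
                {
                    rays[r].nearest_t = hit_t;   // remember the closest hit so far
                    rays[r].hit = 1;
                }
            }
    }

No compiler is going to turn that brute-force loop back into a hierarchical traversal for you; the partitioning and the data layout are the actual design problem.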
When I first read the speculation about the Cell, it scared the bejeezus out of me. But since then I have had a lot of time to practice multithreaded programming and stream-based processing. Now I am looking forward to it! I'd like to thank everyone who contributes to this forum for giving me the forewarning to adequately prepare.
Also: Wow, that took a long time to write! I don't know how you guys hold such long conversations on a continuous basis, but thank you all for doing it. It's a lot of fun to read.