Speculation on PS3 dev kits

On the topic of compilers and such...

APUs - Assuming these are, and function as, next-gen VPUs, they may not have a traditional compiler. But a VCL2 ready at launch would be nice.

PUs - Speculation points to an unannounced PPC variant. Assume that a PS3 development license comes with IBM's compiler.

GPU - A Cg/OpenGL API? Or just the hardware interface?



On the topic of programming art...

Assuming the PS3 1:8 configuration is true, it's interesting that Xenon actually ends up having the more 'traditional parallel approach' - multi-threading and OpenMP, most likely. The Cell papers describe a 'pipeline approach for APUs'. Think about it a little - this can have very high throughput, but load-balancing is a problem. Can the system automatically load-balance? Even if it can, can it do it fast or well enough? I suspect devs will end up doing the load-balancing by hand.
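For contrast, the 'traditional' side of that is something like the sketch below - a plain OpenMP loop where the runtime hands out iterations to threads for you (the particle struct and numbers are just made up for illustration):

[code]
// Minimal OpenMP-style sketch of the "traditional" thread-level approach.
// The runtime splits the loop across hardware threads and the chosen
// schedule does the load-balancing; on Cell you'd be hand-rolling the
// equivalent of that schedule per APU.
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

void integrate(std::vector<Particle>& p, float dt)
{
    #pragma omp parallel for schedule(dynamic, 256)
    for (int i = 0; i < (int)p.size(); ++i)
    {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}
[/code]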
 
To all the PS2 programmers that responded to my rant:
Yeah, I intentionally chose inflammatory wording in my description. It was a rant after all... It can be great fun unravelling the puzzle of the PS2. But while I enjoy the puzzles, I enjoy getting results a lot more.

Case in point:
Think back to the very first time you went through the process I described. Remember reading through the docs, samples and newsgroups to figure out what to do? How long did it take from when you started reading until you had correctly pushed a polygon from the EE through the DMA, the VIF, the VU, the GIF and finally the GS? Recently a friend of mine needed to draw some simple polys on the Xbox. Up until that point he had only done graphics on the PS2, the PS1 and earlier consoles. No DirectX and no OpenGL experience. He was handed an Xbox app that was already initialized and clearing the screen, plus the SDK help file. In half an hour of working alone he had textured triangles on the screen using a VertexBuffer and DrawPrimitive. It wasn't a 100% efficient, ship-quality implementation, but it was pretty much the right way to do it and pretty close to as fast as it could possibly be on the first try. With this done, he was able to get back to the real work of making better tools to make better games.
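For comparison, everything he had to learn amounts to roughly this (a sketch against the PC-style D3D8 headers - the Xbox SDK equivalents differ slightly - with the device, texture and error handling assumed to exist already, and the vertex layout just an example):

[code]
// Rough D3D8-style sketch of the path described above: fill a vertex
// buffer, set it as the stream source, and draw. In real code the buffer
// is created once, not per draw, and errors are checked.
#include <d3d8.h>
#include <string.h>

struct Vertex { float x, y, z; DWORD color; float u, v; };
#define FVF_VERTEX (D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX1)

void drawTexturedTriangle(IDirect3DDevice8* device, IDirect3DBaseTexture8* texture)
{
    const Vertex tri[3] = {
        { -1.0f, -1.0f, 0.0f, 0xffffffff, 0.0f, 1.0f },
        {  0.0f,  1.0f, 0.0f, 0xffffffff, 0.5f, 0.0f },
        {  1.0f, -1.0f, 0.0f, 0xffffffff, 1.0f, 1.0f },
    };

    IDirect3DVertexBuffer8* vb = NULL;
    device->CreateVertexBuffer(sizeof(tri), D3DUSAGE_WRITEONLY, FVF_VERTEX,
                               D3DPOOL_MANAGED, &vb);

    BYTE* dst = NULL;
    vb->Lock(0, 0, &dst, 0);            // lock the whole buffer
    memcpy(dst, tri, sizeof(tri));      // copy the three vertices in
    vb->Unlock();

    device->SetTexture(0, texture);
    device->SetStreamSource(0, vb, sizeof(Vertex));
    device->SetVertexShader(FVF_VERTEX);                 // FVF, no shader
    device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, 1);     // one triangle

    vb->Release();
}
[/code]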

To passerby:
I expect that the PS3 will be a lot more developer-friendly than the PS2, but the help won't come from Sony. It will be IBM's compilers and Nvidia's toolset. I honestly believe that Sony's English-speaking PS2 developer support group is doing the best they can given an extremely limited budget. They know that there are lots of problems because they spend most of their time helping developers through them, but they don't have the authority or manpower to do anything to permanently solve those problems.

Case in point:
Sony of Japan provides a USB keyboard driver to developers. We have been using it and it seemed to suck hard. It seemed to be a big performance drain, it seemed to miss or drop keypresses frequently, and it seemed to be totally unusable in a shipping game. The guy who wrote our keyboard implementation using it insisted that the driver was poor and there was nothing he could do about it. That guy recently left and I re-wrote the keyboard implementation. After reading the docs and trying out many dead-end implementations I had one that I thought was the best I could do. Then I realized that I had almost exactly recreated the other guy's code! At that point I decided to ignore the docs and started black-box testing the driver. Very quickly it became apparent that the driver could be used effectively, but the right way to use it was very awkward and poorly explained in the docs. The only way it makes sense to me is that it would have been much easier for the driver writer to set it up that way.

So what does this have to do with Sony of Europe/America's dev support? Well, given that they receive complaints about the driver on a regular basis they decided to help by creating an alternate driver and providing the source to developers. Their alternative almost exactly reproduces the interface and features of the original. The point is to give you something to reference so that you can write your own driver from scratch that hopefully won't suck. They would love to make a not-sucking driver themselves, but they honestly don't have the time.

[Disclaimer]Everything in this message is wild speculation. It does not reflect the official position of Sony, my company or anyone else including myself (just to be safe)[/Disclaimer]

<More for passerby> Regarding automatic load-balancing:
Except for the "embarrassingly parallel" class of problems (vertex transformation, simple particle systems) I doubt that the compilers will be able to do much to automatically subdivide tasks across APUs. Although 128K per APU is a hell of a lot better than 16K in the VU1, it will still be the determining factor in how streams are pipelined. To keep things moving smoothly, we are going to have to at least triple-buffer that local memory: 1 buffer for the incoming-DMA packet, 1 for the packet currently being processed and 1 for the outgoing-DMA packet. If you size your packets at 32K each, that leaves 32K for the persistent data to be used by the whole stream. Subdividing a program into parallel sub-tasks that fit in 32K will be based around analyzing the data dependencies within that program. Automatically predetermining the runtime data requirements for a given program is only going to be feasible in extremely specialized situations. Reacting to it and adjusting on the fly is not likely to be feasible in any situation.
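A minimal sketch of what that triple-buffering amounts to (the dma_* and transform functions are hypothetical stand-ins for whatever the real APU primitives turn out to be, not a real API):

[code]
// Hypothetical sketch of triple-buffered stream processing in 128K of
// local memory: one 32K buffer being filled by incoming DMA, one being
// processed, one being drained by outgoing DMA, plus 32K of persistent
// data for the whole stream.
const int kBufSize = 32 * 1024;

void dma_start_in(unsigned char* dst, int packet);   // begin fetching a packet
void dma_start_out(unsigned char* src, int packet);  // begin writing results back
void dma_wait_in(unsigned char* buf);                // block until the fetch completes
void dma_wait_out(unsigned char* buf);               // block until the write-back completes (no-op if idle)
void transform(const unsigned char* in, unsigned char* out,
               unsigned char* persistent);           // the per-packet work

static unsigned char buffers[3][kBufSize];    // three rotating 32K packet buffers
static unsigned char persistent[kBufSize];    // data shared by the whole stream

void process_stream(int packetCount)
{
    unsigned char* in   = buffers[0];  // being filled by incoming DMA
    unsigned char* work = buffers[1];  // being processed right now
    unsigned char* out  = buffers[2];  // being drained by outgoing DMA

    dma_start_in(in, 0);                      // prime the pipeline with packet 0
    for (int i = 0; i < packetCount; ++i)
    {
        dma_wait_in(in);                      // packet i has arrived
        dma_wait_out(out);                    // packet i-1's results have left

        // rotate roles: the fresh input becomes the work buffer, the
        // drained output buffer becomes the next input target
        unsigned char* t = work; work = in; in = out; out = t;

        if (i + 1 < packetCount)
            dma_start_in(in, i + 1);          // overlap the fetch of packet i+1

        transform(work, out, persistent);     // process packet i
        dma_start_out(out, i);                // overlap the write-back of packet i
    }
    dma_wait_out(out);                        // flush the final packet
}
[/code]

Picking those packet sizes and deciding what lives in the persistent 32K is exactly the part I don't expect a compiler to do for you.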

Case in point:
Raytracing is largely regarded as the poster child of "embarrassingly parallel" problems, but that claim assumes that each thread has access to the whole scene's data. Unless your entire scene description fits in 32K, you are not going to get very far trying to upload it to a bunch of APUs and then streaming rays through them in parallel. Perhaps you could reverse the problem and upload 32K sets of rays to each APU and then stream your entire scene through each of them in parallel. Unfortunately, that means every ray must be checked against every triangle, which eliminates the primary advantage of raytracing (a logarithmic performance-to-problem-size ratio due to hierarchical scene traversal) over rasterization (linear scaling). The real solution is a lot more complicated than that, and it will require a lot of careful design and setup from the human. The compiler might help you with your inner loop, but it's the preparation for that inner loop that is the hard part.
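To make that concrete, the "reverse" approach boils down to something like this (hypothetical sketch; Ray, Triangle and intersect() are placeholders, not anything real):

[code]
// Hypothetical sketch of the "keep 32K of rays resident, stream every
// triangle past them" approach. With no scene hierarchy in local memory,
// every ray is tested against every triangle: O(rays * triangles),
// exactly the linear scaling raytracing is supposed to avoid.
struct Ray      { float ox, oy, oz, dx, dy, dz, tNearest; int hit; };
struct Triangle { float v0[3], v1[3], v2[3]; };

// placeholder for a standard ray/triangle test: returns the hit distance,
// or a negative value on a miss
float intersect(const Ray& r, const Triangle& t);

void trace_chunk(Ray* rays, int rayCount,
                 const Triangle* stream, int triCount, int firstTriIndex)
{
    for (int t = 0; t < triCount; ++t)        // triangles streamed through local memory
    {
        for (int r = 0; r < rayCount; ++r)    // the resident ray set
        {
            float d = intersect(rays[r], stream[t]);
            if (d >= 0.0f && d < rays[r].tNearest)
            {
                rays[r].tNearest = d;         // keep the closest hit so far
                rays[r].hit = firstTriIndex + t;
            }
        }
    }
}
[/code]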


When I first read the speculation about the Cell, it scared the bejeezus out of me. But since then I have had a lot of time to practice multithreaded programming and stream-based processing. Now I am looking forward to it! I'd like to thank everyone who contributes to this forum for giving me the forewarning to adequately prepare. :)

Also: Wow, that took a long time to write! I don't know how you guys hold such long conversations on a continuous basis, but thank you all for doing it. It's a lot of fun to read. :D
 
corysama said:
Case in point:
Think back to the very first time you went through the process I described. Remember reading through the docs, samples and newsgroups to figure out what to do? How long did it take from when you started reading until you had correctly pushed a polygon from the EE through the DMA, the VIF, the VU, the GIF and finally the GS? Recently a friend of mine needed to draw some simple polys on the Xbox. Up until that point he had only done graphics on the PS2, the PS1 and earlier consoles. No DirectX and no OpenGL experience. He was handed an Xbox app that was already initialized and clearing the screen, plus the SDK help file. In half an hour of working alone he had textured triangles on the screen using a VertexBuffer and DrawPrimitive. It wasn't a 100% efficient, ship-quality implementation, but it was pretty much the right way to do it and pretty close to as fast as it could possibly be on the first try. With this done, he was able to get back to the real work of making better tools to make better games.

And had you handed him similar example code for the PS2 he might have had similar results. Such code and libraries do exist.

The process you described doesn't really match what you really do, so it's a bit hard to think back to it. In reality:

You write some code, in C, to build a DMA list. This is made of simple data-unpacking instructions which upload code and bits of geometry to the VU. You write the VU code in a macro assembler and link it into your project. The VU code reads data, transforms it, and writes it out again before kicking it and starting over again in another buffer. For textures you could put them in the same list, but for best efficiency you write a second DMA list that just uploads the textures, and you write a bit of code to handle synchronisation (or use automatic stuff, but that ends up being overly complex, potentially unstable, and IMO less efficient and so unnecessary). That's it - at its most complex, that's how you deal with it. There's some fluff involved to make it double buffered correctly, but to suggest there's a mountain of "processors you have to program without any language at all" is just not true.
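In code-shape terms the whole thing is roughly this (every name here - DmaChain, Mesh, the vif_*/dma_* helpers - is a made-up placeholder, not a real SDK API; real code builds actual DMA tags and VIFcodes into a qword-aligned buffer):

[code]
// Very rough sketch of the per-frame flow described above.
struct Mesh
{
    const void* packedVerts;  // geometry pre-swizzled for a VIF UNPACK
    int         qwordCount;   // size of the packet in quadwords
};

struct DmaChain
{
    void addMicroProgram(const void* mcode)  { /* append an MPG upload of the VU1 code */ }
    void addUnpack(const void* src, int qwc) { /* append a REF tag + UNPACK VIFcode */ }
    void addMscal()                          { /* append an MSCAL VIFcode to kick VU1 */ }
    void end()                               { /* append the terminating END tag */ }
};

void dma_send_vif1(DmaChain& chain) { /* point the VIF1 DMA channel at the chain */ }

void build_frame(DmaChain& chain, const Mesh* meshes, int meshCount,
                 const void* vu1TransformCode)
{
    chain.addMicroProgram(vu1TransformCode);   // upload the VU1 renderer once

    for (int i = 0; i < meshCount; ++i)
    {
        // unpack one packet of geometry into whichever VU1 buffer is free
        chain.addUnpack(meshes[i].packedVerts, meshes[i].qwordCount);
        // kick the program: it transforms, builds a GIF packet and XGKICKs
        // it to the GS while the VIF is already unpacking the next packet
        chain.addMscal();
    }

    chain.end();            // terminate the chain
    dma_send_vif1(chain);   // hand the finished list to the DMA controller
}
[/code]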

Worse than just a few simple OpenGL calls? Absolutely. Spawn of all evil? Hardly.

Besides which, that describes the most efficient process possible. To get started you can skip most of that and just play with one of the supplied samples. Or you can poke stuff straight into the FIFO registers instead of writing DMA lists at all and see the results immediately.

My first experience was not so painful. I was lucky enough to have very early access to PS2 hardware, before there was much of an SDK and only early translations of the manuals, and inside a day I had some bump-mapping* going using EE+GS and a colleague had written a VU renderer. Ok, we had graphics experience, but it really wasn't as hard as people make out.

I've seen junior programmers with little prior experience achieve great things on the PS2 relatively quickly. I've also seen supposedly senior people flounder around.

Note that I'm not in any way claiming it couldn't be improved - it could be, and by a significant margin. I also think Sony realise this and you may be surprised (pleasantly, I would hope) when their next-gen SDK contains a lot more in the way of tools and higher-level support. There are clues already, such as the Collada project, which seems like a lot of effort to go to on their part if they aren't getting something out of it. It seems to me like the only reason for that project would be to back up any internal tools projects with broad compatibility with 3rd party tools.

The biggest change in this last generation must be the addition of Microsoft. They significantly raised the bar on development environments by bringing to bear years and years of development in the PC realm. They have what is arguably the de facto standard in IDEs, not to mention a host of other tools and documentation. Combine that with a platform that was very familiar to a lot of coders, and you have a development environment that no other manufacturer could possibly compete with.

This time around they still have a lot of those advantages (though the rumoured architecture change will break a few). However, to suggest that Sony is blind to this is insane. They won't haemorrhage money (which they'd have to do) in order to compete, but you can bet they'll try to improve on their past record. There have been developer surveys with very leading questions... The job adverts (occasionally posted here, and visible on their websites) point at them recruiting heavily in support and R&D... it all points at significant effort being made to make developers happier this time around.

I think they're taking the threat from Microsoft very seriously indeed.


*when I say bump-mapping, it was just some 2-pass emboss mapping stuff. I was researching multi-pass shading, as that seemed appropriate to the architecture at the time.
 
MrWibble said:
And had you handed him similar example code for the PS2 he might have had similar results. Such code and libraries do exist.

The process you described doesn't really match what you really do, so it's a bit hard to think back to it. In reality:

You write some code, in C, to build a DMA list. This is made of simple data-unpacking instructions which upload code and bits of geometry to the VU. You write the VU code in a macro assembler and link it into your project. The VU code reads data, transforms it, and writes it out again before kicking it and starting over again in another buffer. For textures you could put them in the same list, but for best efficiency you write a second DMA list that just uploads the textures, and you write a bit of code to handle synchronisation (or use automatic stuff, but that ends up being overly complex, potentially unstable, and IMO less efficient and so unnecessary). That's it - at its most complex, that's how you deal with it. There's some fluff involved to make it double buffered correctly, but to suggest there's a mountain of "processors you have to program without any language at all" is just not true.

MrWibble, you make it sound so simple hehe.

Seriously though, after having done all you wrote above (I know this because in my little project we have gone through it: a DMA handling class with stitching to jump from 4 KB page to 4 KB page [sps2 on ps2linux only allocates memory that is physically contiguous in 4 KB pages], quad-buffered VU1 code [double buffering with the BASE and OFFSET registers and XTOP, as well as doing double buffering in the VU code by dividing the given buffer in two parts], etc.), I have to force myself to remember that it all seemed sooo hard to understand... it is not anymore... but it felt like that at the beginning.

On one side you want to do things the PS2 way, like you hear the pros doing, and you dive into the docs and the samples: I got a headache for quite a bit, as at the beginning there seems to be just too much stuff to keep track of (and to understand how it all connects together) at the same time.

With time it becomes a bit of a pointer nightmare sometimes, but that can happen on any hardware if you do not use automatic memory handling.

Still, some parts can be confusing... think about VU1 double buffering... step by step... data coming in through the VIF and going into one buffer, and how the XTOP register is updated... it can be confusing if someone does not help you through it.
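If it helps, here is how I picture the toggle, written as plain C-style pseudocode rather than real VIF/VU code (so treat it as my mental model of the behaviour, not the manual):

[code]
// Mental-model sketch (not real VIF/VU code) of VU1 double buffering:
// UNPACKs with the double-buffer flag land at TOPS-relative addresses,
// and each MSCAL latches TOPS into TOP (what XTOP returns inside the
// microprogram) and flips TOPS to the other buffer.
struct Vif1State
{
    int base;    // BASE VIFcode: start of buffer 0, in VU data-memory quadwords
    int offset;  // OFFSET VIFcode: distance from buffer 0 to buffer 1
    int tops;    // where double-buffered UNPACKs are landing right now
    int top;     // what XTOP returns in the currently running microprogram
};

void on_mscal(Vif1State& v)   // the VIF kicks the VU1 microprogram
{
    v.top  = v.tops;                                   // the program reads this via XTOP
    v.tops = (v.tops == v.base) ? v.base + v.offset    // flip to the other buffer so the VIF
                                : v.base;              // can unpack while the VU runs
}
[/code]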
 