MrWibble said:
You don't need massive amounts of bit-manipulation functions on the VU precisely *because* you have the VIF to take nice, ordinary, easy to output data, and pack it up for sending to the GS.
No, no, no... the VIF is no excuse. I like the VUs and all, but the instruction set would benefit from an update (the VU was good for 1999-2000, not for a 2005-2006 streaming processor like the SPUs/APUs). The code itself that you write for the VUs would benefit from more logic operations and more integer and FP instructions (wait... if I divide I can only save the result into the Q register ? Ok...
). Not to mention the VU instructions that are not even commented on in the official docs, or other quirks (like how far from the XGKICK you have to put the [e] bit). Would you like to manipulate the GIF tag in VU data memory easily ? It would be nice and easy if you could access all the bits you need using the VU's integer instruction set.
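For the sake of argument, here is roughly what building a GIFtag looks like in C. The field positions follow the GS documentation (NLOOP [14:0], EOP [15], PRE [46], PRIM [57:47], FLG [59:58], NREG [63:60] in the low quadword, with the REGS descriptors packed 4 bits each into the high quadword); the struct and helper names are mine, just a sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of GIFtag construction in C, to show the kind of bit
 * manipulation being discussed.  Layout per the GS docs:
 * NLOOP [14:0], EOP [15], PRE [46], PRIM [57:47], FLG [59:58],
 * NREG [63:60] in the low 64 bits; the high 64 bits hold the
 * REGS descriptors, 4 bits each. */
typedef struct { uint64_t lo, hi; } giftag_t;

static giftag_t make_giftag(unsigned nloop, int eop, int pre,
                            unsigned prim, unsigned flg,
                            const unsigned *regs, unsigned nreg)
{
    giftag_t t;
    t.lo  = (uint64_t)(nloop & 0x7fff);
    t.lo |= (uint64_t)(eop  & 1)     << 15;
    t.lo |= (uint64_t)(pre  & 1)     << 46;
    t.lo |= (uint64_t)(prim & 0x7ff) << 47;
    t.lo |= (uint64_t)(flg  & 3)     << 58;
    t.lo |= (uint64_t)(nreg & 0xf)   << 60;
    t.hi = 0;
    for (unsigned i = 0; i < nreg; ++i)
        t.hi |= (uint64_t)(regs[i] & 0xf) << (4 * i);
    return t;
}
```

A handful of shifts and ORs in C. Patching the same 128-bit quadword in VU data memory with the VU's 16-bit integer unit is far more awkward, which is exactly the complaint.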
VCL is a BIG help, but even one of the best things the author of VCL himself dreamt about (a macro to write GIF tags) still cannot be realized with it...
Still, I like the VUs, and getting around the issues is a bit masochistic, but it can also be fun and rewarding once you get past them.
This is no different to any other architecture, other than the fact that other architectures hide some of this behind a "driver". But you still generally compile lists of vertices, bits of shader code, and kick them off to external units.
If you can efficiently hide it behind a driver... you know... do it. I am glad they have given us lots of flexibility, but in some cases they exaggerated. People do it, but if the GS could do full 3D clipping, I would hear no weeping among the development community (unless this came at the cost of ultra-crippled GS performance, but it does not have to be this way).
The VU is certainly not a VLIW processor by any serious definition of the phrase. It has relatively high-level instructions which manage a multi-stage pipeline. If it were truly VLIW you would be setting separate instructions for each stage of that pipeline, each cycle.
What ? For each stage of the pipeline ?!? That is not the requirement for a VLIW processor: at least not one I have seriously seen applied. What are these real VLIW processors ?
You are taking the idea behind VLIW a bit too far: VLIW moves the scheduling of instructions back to the compiler (as much as possible), along with a lot of the conditional branch handling (if-conversion in IA-64). It issues bundles of instructions, the Very Long Instruction Words, which contain an instruction for each functional unit plus information that helps the CPU schedule the instructions within a bundle, and across different bundles, with as little hardware logic as possible.
What modern VLIW processor makes you write code for the fetch, decode, issue, execution and retire stages ?
http://www.hotchips.org/archive/hc11/hc11pres_pdf/hc99.t2.s1.IA64tut.pdf
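To make the point concrete, a rough sketch of what an IA-64 bundle actually contains: a 128-bit bundle carries a 5-bit template in bits [4:0] (telling the hardware which functional-unit types the three slots target and where the stops are) and three 41-bit instruction slots. The struct and accessor names below are just illustrative; note that nothing in the format addresses fetch/decode/retire stages, which remain the hardware's job:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative decode of an IA-64 bundle, stored as two 64-bit
 * halves.  Template in bits [4:0], slot0 in [45:5], slot1 in
 * [86:46] (straddling the halves), slot2 in [127:87]. */
typedef struct { uint64_t lo, hi; } bundle_t;

#define SLOT_MASK 0x1ffffffffffULL   /* 41 bits */

static unsigned template_of(bundle_t b) { return (unsigned)(b.lo & 0x1f); }
static uint64_t slot0(bundle_t b) { return (b.lo >> 5) & SLOT_MASK; }
static uint64_t slot1(bundle_t b) { return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK; }
static uint64_t slot2(bundle_t b) { return (b.hi >> 23) & SLOT_MASK; }
```

The compiler statically fills the slots and picks the template; there is no per-pipeline-stage programming anywhere in it.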
It's not like the bulk of your game has to be written in hex and entered on punch-cards or something. The vast majority will be written in C/C++ in a modern development environment (ok, the "official" SDK uses GCC, but it's relatively easy to plug that into whatever environment you normally use).
Only a few core programmers on any team will have to mess around at the really low-level, and there's a wealth of sample code and helper libraries in the SDK and on the official developer website.
Those few core programmers HATE that job: lots of them would gladly renounce PS 3.xx if they could have good general-purpose performance that allowed the use of C/C++ without having to re-write a good amount of code into CPU-core-friendly ASM.
GCC for the R5900i is hardly a work of art, but even with a better compiler the R5900i's performance would hardly be optimal, as compilers generally ignore/cannot handle the SPRAM, and if you rely on GCC you have to live with a nice 8 KB data cache.
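For illustration, a minimal sketch of the kind of manual placement this implies. It assumes the EE's 16 KB scratchpad at virtual address 0x70000000 and the `_EE` macro that the PS2 toolchain defines; the host-side stand-in buffer and the helper name are mine:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: GCC won't allocate data into the EE's 16 KB scratchpad
 * (SPRAM) for you, so in practice you carve it up by hand.  On real
 * hardware the scratchpad sits at virtual address 0x70000000 (an
 * assumption per the EE docs); off-target we substitute a static
 * buffer so this sketch compiles anywhere. */
#ifdef _EE
#define SPRAM_BASE ((uint8_t *)0x70000000)
#else
static _Alignas(16) uint8_t fake_spram[16 * 1024];  /* host stand-in */
#define SPRAM_BASE fake_spram
#endif

/* Hypothetical hot vertex staging area, manually placed in SPRAM. */
static float *spram_vertex_buffer(void)
{
    return (float *)(void *)SPRAM_BASE;
}
```

The point being: this bookkeeping falls on the programmer, because the compiler only ever sees the cached main-memory address space.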