Squeak said:
As I understand it, using VU0 as a coprocessor isn’t a very good use of it. SCEE even went so far as to call it evil in one of their presentations.
That's their job.
The problem is that nobody (including them) can find a really good, cost-effective use for it in separate mode. We have all had ideas for some crazy things, but schedules intrude.
In the constant arms race of games development it will get used more and more, but it basically doesn't lend itself to being used as a separate processor.
Squeak said:
Don’t vertex shaders, like those in the GeForce GPUs, have even less memory?
But vertex shaders work on streams, so their RAM doesn't have to store the vertices you're currently working on or the results of the calculations. In a VU, you have to upload the data into that space as well (along with constants etc.), and you may also have to double buffer the output.
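To put rough numbers on that, here is a minimal sketch in C of the budgeting involved. The memory sizes are the real ones (16 KB of data memory on VU1, 4 KB on VU0), but the helper name, the constant block size and the per-vertex sizes are made up for illustration, and real code would also have to respect quadword alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Data memory sizes: VU1 has 16 KB, VU0 has 4 KB. */
#define VU1_DATA_MEM_BYTES (16u * 1024u)
#define VU0_DATA_MEM_BYTES (4u * 1024u)

/*
 * Hypothetical helper: how many vertices fit in one batch when the
 * constants, the input vertices and the output vertices all have to
 * live in the same VU data memory.
 *   in_buffers / out_buffers: 1 = single buffered, 2 = double buffered
 */
size_t vu_batch_vertices(size_t mem_bytes, size_t const_bytes,
                         size_t in_bytes_per_vertex, size_t in_buffers,
                         size_t out_bytes_per_vertex, size_t out_buffers)
{
    assert(in_buffers >= 1 && out_buffers >= 1);
    assert(in_bytes_per_vertex + out_bytes_per_vertex > 0);

    if (const_bytes >= mem_bytes)
        return 0;  /* constants alone already fill the memory */

    /* Bytes of data memory that each vertex in a batch ties up. */
    size_t per_vertex = in_bytes_per_vertex * in_buffers
                      + out_bytes_per_vertex * out_buffers;
    return (mem_bytes - const_bytes) / per_vertex;
}
```

With 1 KB of constants and an assumed 48 bytes per vertex in and out, `vu_batch_vertices(VU1_DATA_MEM_BYTES, 1024, 48, 1, 48, 2)` gives 106 vertices per batch. The same call with `VU0_DATA_MEM_BYTES` gives 21, which is part of why VU0 is so much more awkward to feed.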
VU1 is doing an easier job: it takes a bunch of objects and renders them, and except for some matrix stuff and bounding calculations it is largely isolated from the game code.
Squeak said:
If it really is a third of the total floating point power of the system, then why not use it to its fullest?
Because you get a fair whack of the performance in coprocessor mode, and by the time you've sorted out all the issues of using it as a separate processor, you've used a lot of dev time and will gain a lot less than 30% extra FPU power.
Squeak said:
Scratchpad requires explicit control of the memory, so it's not as easy to use as cache.
Maybe not as straightforward, but still more cost effective
Only in some situations with a fairly ordered access pattern. For things like AI and general game code, a good large cache helps A LOT.
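For what "explicit control" with an ordered access pattern looks like, here is a rough sketch in C. The 16 KB figure is the real scratchpad size, but everything else is an assumption for illustration: a plain static array stands in for the scratchpad, `memcpy` stands in for the DMA transfer real code would use, and `sum_staged` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The EE scratchpad is 16 KB; a plain static array stands in for it here. */
#define SPR_BYTES  (16u * 1024u)
#define SPR_FLOATS (SPR_BYTES / sizeof(float))

static float spr[SPR_FLOATS];

/*
 * Hypothetical helper: sum n floats by staging them through the local
 * buffer in chunks of at most chunk_floats. The caller decides what is
 * resident and when, which is the work a cache would do automatically.
 */
float sum_staged(const float *src, size_t n, size_t chunk_floats)
{
    assert(chunk_floats > 0 && chunk_floats <= SPR_FLOATS);

    float total = 0.0f;
    for (size_t done = 0; done < n; done += chunk_floats) {
        size_t left  = n - done;
        size_t count = left < chunk_floats ? left : chunk_floats;

        /* Explicit "upload" of the next chunk (DMA on real hardware). */
        memcpy(spr, src + done, count * sizeof(float));

        /* Work only on the local copy. */
        for (size_t i = 0; i < count; ++i)
            total += spr[i];
    }
    return total;
}
```

This works because the loop knows in advance which data it needs next. AI or general game code that chases pointers all over memory has no such schedule to plan the copies around, which is where a cache wins.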
Squeak said:
80 KB of RAM? Not sure where you get that from, unless you're adding VU0 and VU1 RAM into the figure, but they shouldn't count, as they're effectively separate processors and need their own RAM.
Well, isn’t that almost a question of semantics? A question of how you define a separate processor? Then how about an FPU or an AltiVec unit, are they also separate processors?
The VPUs couldn’t "live on their own" so to speak. For advanced applications they have to be hooked to some sort of CPU to work, or am I mistaken?
I define (and I think most people define) a processor as a unit that executes a series of instructions independently of anything else. So FPUs or AltiVec aren't separate; they're coprocessors, as they 'just' add extra instructions for a CPU. Whereas the VUs are completely separate processors (except VU0 in coprocessor mode); they run totally separately from the CPU. Indeed, VU1 is hardly ever touched by the R5900 itself. The only connection is that the R5900 builds a display list that VU1 consumes, but that's only because we want interaction; it's possible for VU1 to work separately.
Physically they're all on one chip (AFAIK; actual physical hardware isn't something I've ever looked at), but logically they (can) all operate separately. Indeed, people have (for fun) run entire little games on VU1, with the only outside access being setting flags for controller input.
Squeak said:
As far as I've been able to find out, it still has more memory-to-die bandwidth than other CPUs of its time (for example, xGPU has 1 GB/s).
The need to supply VU1 with data hogs the memory bus A LOT. It's fairly hard to get a bus cycle for the CPU, which because of its small cache needs one more often than most other CPUs.
Lots of devs use VU0 as a powerful programmable coprocessor (you can upload mini programs to it and call them macro-like from the main CPU). It's very good when used like that.
BTW, your Xbox figure is way out; it's got a total of 6.4 GB/s memory bandwidth. But without understanding the difference in memory architectures it's unfair to compare them (Xbox shares its bus with framebuffer ops but has more CPU cache etc.).