There just seems to be so much more information in these forums relative to Cell.
NO, because the argument was that OOOE, because of its transistor count, limits clock speed; obviously it does not!
It's STILL a red herring because what you just wrote there has nothing to do with OOOE/transistor count limiting clock speed.
I believe clock speed is much more limited by the 'critical path' of a chip but whatever. You're still following what is essentially a red herring argument. If you're arguing that power consumption and heat output are the limit on clock speed - then DO SO. You have to pick one or the other.
OOOE is not a limit to clock speed any more than any other feature of a microchip - properly implemented anyway. This is an entirely different issue compared to power draw.
SPUs don't have VMX units, even if the SPU ISA borrows from the VMX ISA here and there.
dot product instructions are evil, real men don't use them, they use SOA + madds
The dot product will make a difference as will any other specialty instructions.
nAo said: dot product instructions are evil
10x more evil when they come with a latency that makes their advantage moot in cases where it's supposed to be most important (non loopy code).
There's nothing more evil than hw features that have more PR than practical value.
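For anyone wondering what "SOA + madds" looks like in practice, here is a minimal sketch in plain C, not actual VMX128 or SPU intrinsics. The names `Vec3SoA`, `N` and `dot_soa` are made up for illustration. The idea is to store all the x components together, all the y together and all the z together, so that four dot products fall out of one multiply and two multiply-adds, with no horizontal add across a register.

```c
#include <stddef.h>

/* Hypothetical SoA (structure of arrays) layout: one array per component,
   so the x values of N vectors sit side by side, as they would in one
   4-wide SIMD register. */
enum { N = 4 };

typedef struct {
    float x[N];
    float y[N];
    float z[N];
} Vec3SoA;

/* Computes N dot products at once. Each statement in the loop body is one
   operation per lane; on a 4-wide vector unit the whole loop would map to
   one multiply followed by two multiply-adds. */
static void dot_soa(const Vec3SoA *a, const Vec3SoA *b, float out[N])
{
    for (size_t i = 0; i < N; ++i) {
        float acc = a->x[i] * b->x[i];   /* mul  */
        acc = a->y[i] * b->y[i] + acc;   /* madd */
        acc = a->z[i] * b->z[i] + acc;   /* madd */
        out[i] = acc;
    }
}
```

Calling `dot_soa(&a, &b, out)` fills `out` with one dot product per lane. Whether this beats a dedicated dot product instruction on real hardware depends on latencies and on how much of the data is already in SoA form, which this sketch does not try to measure.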
Would the differences in the VMX128 units, plus having 3 of them compared to 1 VMX for the Cell, make it more difficult to port games from X360 to PS3?
Since each VMX128 is tied to a core of Xenon, are they able to work independently of their cores à la SPUs?
I haven't read that this has really been an issue or that devs are really taking advantage of it. Could MS have gotten away with 1 instead of 3?
My understanding is that the extra registers allow better performance per cycle.
Yes. That is because IBM put the whole documentation of the Cell in the public domain. That certainly helps!
Actually, I was looking for more information on the SPUs just now, and found that they have an additional 128 x 128-bit registers, called special purpose registers. Here's IBM's full documentation of the SPUs specifically.
http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/76CA6C7304210F3987257060006F2C44/$file/SPU_ISA_v1.2_27Jan2007_pub.pdf
Better everything, in theory! All computing is shifting data around in the form of numbers and doing sums on it. The more numbers you can crunch, the more stuff you can work out. There's a complication in the movement of data too though, and if your algorithm uses lots of data that can't be crunched efficiently, the ability to do lots of maths is no good. However, as understanding improves, more and more functions are being mapped onto fast vector processors, such that eventually pretty much all areas should benefit.
"For vector based computations the PS3 outdoes the 360 by an order of magnitude"
What would be the primary benefit of this? Better physics, particle effects, etc?
The 3x VMX128 units exist to provide some level of vector processing for Xenon's cores. They are add-on units & must share existing resources with each core: the 32KB L1 & the 1MB L2 cache shared between the cores. Each core contains 2x VMX128 register sets to support both threads on each core. What isn't widely advertised is that each core contains only 1 execution unit & both threads therefore have to share this 1 execution unit.
As for output, the theoretical peak performance of an Intel 3GHz P4 using SSE instructions is 6 GFLOPS.
This provides a ballpark figure for Xenon's VMX units, considering I was unable to uncover exact figures.
Each SPE on CELL is a dedicated high speed vector processor; they are not add-on units & they share no resources. They each have 256KB of LS available, bringing their combined total to 1.792MB (7x 256KB).
Each SPE achieves around 25 GFLOPS; consider the fact that there are 6x SPEs & it's no surprise MS ignore the SPEs when discussing the vector processing abilities of their 3x add-on VMX units. For vector based computations the PS3 outdoes the 360 by an order of magnitude.
Courtesy of Cell Architecture Explained & Ebony’s breakdown of PS3 architecture.
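As a rough check on where the "around 25 GFLOPS" per SPE figure comes from, here is a small sketch of the usual peak arithmetic. The post does not give the inputs, so these are assumptions: a 3.2 GHz clock, 4 single precision lanes per vector, and one multiply-add per lane per cycle counted as 2 flops. The function names are made up for illustration.

```c
/* Theoretical peak in GFLOPS = lanes x flops per lane per cycle x clock in GHz.
   This is a paper figure only; it says nothing about sustained throughput. */
static double peak_gflops(int lanes, int flops_per_lane_per_cycle, double clock_ghz)
{
    return lanes * flops_per_lane_per_cycle * clock_ghz;
}

/* Assumed SPE parameters (not stated in the post): 4 lanes, a multiply-add
   counted as 2 flops, 3.2 GHz clock. */
static double spe_peak_gflops(void)
{
    return peak_gflops(4, 2, 3.2);
}

/* Aggregate peak for a given number of SPEs. */
static double spes_peak_gflops(int spe_count)
{
    return spe_count * spe_peak_gflops();
}
```

Under those assumptions `spe_peak_gflops()` works out to 25.6 and `spes_peak_gflops(6)` to 153.6. These are peak numbers on paper; real code is limited by how fast data can be moved in and out of the local store.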
OK, dumb question then and thanks BTW for breaking it down to something I can easily process.
What is the possibility of something like one PPU and 6 VMX128s designed to function independently à la SPEs?
VMXs are good, although not as good as SPEs, and are according to some articles great for 3D graphics acceleration and physics. What is to prevent IBM (or others) from making a quad core Power chip with additional VMXs, say 8, 10, etc?
I'm going to slow you down for a minute BadTB, and ask you rather, why the high interest in these VMX units? I think you might be perceiving them to be something more than they are.
The primary reason being there's nothing to discuss! There's no info out there. Devs aren't talking about the hardware, in contrast to PS3 devs who give us things to chew on.
I am also very interested in other parts of the X360 architecture such as Memexport, the EDRAM implementation and Xenos, which have had comparatively less discussion on these boards.
Why not? There have been many interesting topics about how Cell's SPUs are being used for many things that people initially thought weren't practical. It would be nice to hear if the XCPU's VMX units are being (or can be) used for similar types of things, and to what degree they can use similar code/algorithms as the SPUs. At least, this is how I interpreted BadTB25's question.
There really hasn't been a lot of useful talk about the XCPU, so I commend BadTB25 for trying to initiate some. Especially since MS/IBM obviously felt that a butt load of floating point power was necessary for these consoles. How is it (or can it be) used?