DaveBaumann said:
Panajev2001a said:
Yes, and the ideology of the Cell and the associated funding has to take into account that its sole application is not PS3.
Does this really make a big difference though?
CELL was designed to be modular, scalable and fast at multimedia workloads and network ready.
These are all areas in which the PlayStation 3 R&D team is interested: modularity and scalability are important because, as Semiconductor technology advances, they can afford to pack in more processing constructs ( more APUs, etc... ).
Well, when designing a stand alone console, what is the importance of having a CPU whose fundamental structure is network ready? Is multimedia functionality as important as its 3D processing abilities?
This point here is basically going back to the more dedicated vs generalised units discussion.
I see your point Dave and that is an interesting one.
I would not think of CELL as a traditional general purpose approach: if you ran SPEC on it, it would do decently, but it would likely show low single thread efficiency, and the processor would not be the best thing per cycle at running Office suites, etc...
Well, of course if the benchmark was to run tons of instances of the same legacy application... well, things would be different: you always play to the strengths of parallel architectures when you move into the multi-tasking realm.
When I mention multi-media as being the focus, I do include 3D processing in it ( games applications and what not ).
3D graphics on an architecture like CELL would fly, as what is provided to you is geared towards the same needs that 3D graphics chip designers face.
Its multi-media functionality could be said to be a by-product of the strong 3D Processing capabilities the architecture shows ( in the patent's implementations ), or we could make the inverse case: the good 3D Processing capabilities are a by-product of the strong multi-media functionality the architecture shows.
I see that the flexibility comes at a cost, in the sense that I suspect the ATI VPUs will come with more of their silicon budget spent on HW tricks and will be very optimized for the task at hand.
The EE's VUs peak at less than what NV2A's Vertex Shaders do, but still some developers do appreciate the flexibility the EE's VUs provide.
It is a trade-off: the designers of CELL and of all the other parallel architectures understood that focusing on the needs of multi-media/very vectorizable applications was the best way to obtain more bang for your transistor buck and to get, in those applications ( which are the ones that really need power ), better performance than the traditional General Purpose CPU approach.
In the same way VPUs focus on an even smaller sub-set of applications ( basically implementing the classical SGI pipeline faster and faster and faster yet ) and might very well peak at a higher performance in them.
Like I said... if they all release at approximately the same time, they will be comparable, with certain architectures doing one thing better and others doing another better.
I am sure that the extra flexibility ( the APUs in themselves are still more like generalized DSPs than General Purpose CPUs ) of CELL will be put to good use... physics, A.I., different Rendering algorithms ( not only the SGI approach ), etc...
Like people are seeing on the PC front, the applications that require performance are mostly games and other multi-media applications ( video encoding and decoding, music encoding and decoding, Image Processing, etc... ), and if you had to think about an architecture that would be best at running them you would go towards parallel processors, which is what GPUs are also becoming.
Call it CELL, call it IPF, call it POWER5, call it NV3X, call it R3XX, call it R4XX or NV4X... the idea is similar.
High bandwidth, vast Single Precision Floating Point processing power, very high exposed parallelism... the same concepts are being used now in VPUs as they are in these highly parallel micro-processor architectures.
The efficiency of these parallel architectures at what we can call legacy applications ( low ILP, low TLP [single threaded applications mostly]... full of conditional branches... applications that demand strong scalar performance ) is LOW... some CPU designers might call it embarrassingly low, but the point is that these days even that LOW is good enough, especially with the tendency of users to run several applications at the same time if they can.
These are flexible Vector Processors with strong multi-media related sources of inspiration, not general purpose processors as they have been understood so far.
They are more like DSPs that can also be used to do the tasks that common CPUs do, but they are not oriented towards optimizing for the same workload.
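To make the "vectorizable" vs "legacy" distinction a bit more concrete, here is a toy Python sketch. It is my own illustration, not anything from the CELL patent, and the function names ( saxpy, saxpy_chunked, collatz_steps ) and the n_units parameter are made up for the example. The first kernel has fully independent iterations, so it can be cut into chunks and handed to as many execution units as you have; the second is a serial chain with a data-dependent branch on every step, which is the kind of code where only scalar performance helps.

```python
def saxpy(a, xs, ys):
    """Data-parallel kernel: each output depends only on its own inputs."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_chunked(a, xs, ys, n_units):
    """Same kernel, split into up to n_units contiguous chunks.

    Each chunk could in principle go to a separate execution unit
    (hypothetical here; this sketch just runs them one after another).
    The result does not depend on how the work is split.
    """
    size = max(1, (len(xs) + n_units - 1) // n_units)
    out = []
    for start in range(0, len(xs), size):
        out.extend(saxpy(a, xs[start:start + size], ys[start:start + size]))
    return out


def collatz_steps(n):
    """'Legacy'-style code: every iteration needs the previous result
    and takes a branch that depends on the data, so extra parallel
    units do not help a single run."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps
```

Running many independent collatz_steps calls at once is of course parallel again, which is the "tons of instances of the same legacy application" case from earlier in the post.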