I think it has been pointed out by people cleverer than me that the STI alliance kind of drove themselves into a dead end when it comes to BC if the Broadband Engine were to be modified. A one-legged driver wants to slit his wrists at the thought of driving manual, while the two-legged driver is asking what the problem is. (just kidding)
If the cancelled Cell2 from IBM was supposed to have direct memory instructions in addition to the local store DMA, would it have solved everyone's problems, or is there more to it?
Take this with a grain of salt (or search the existing threads for first-hand information with matching analysis and well-constructed POVs), but I tend to remember that the code is kind of bound to the latency of LS accesses; if you change that, it's not good.
So if you move away from the LS+DMA approach, you pretty much jeopardize existing code. Keeping existing code functioning could require quite some legacy hardware and a pretty constrained design.
On top of all the other architectural shortcomings that have been pointed out since it was released, I think this is quite a bothersome one: there is no real latency-hiding mechanism, it's more that the code is built around the low-latency LS, and if you change that...
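To illustrate what I mean by the code being built around the LS (a rough sketch from memory of the Cell SDK, so take the intrinsic names with the usual grain of salt; process() is just a made-up placeholder kernel): the canonical double-buffering pattern streams data in via asynchronous DMA, while the compute part does plain loads/stores into the LS and is hand-scheduled around its fixed, low latency.

```c
/* Rough sketch of the usual SPU double-buffering idiom (from memory of the
 * Cell SDK's spu_mfcio.h; process() is a hypothetical placeholder kernel).
 * The DMA is asynchronous, but the kernel itself does plain loads/stores
 * into the LS and is scheduled assuming its fixed, low latency.
 */
#include <spu_mfcio.h>

#define CHUNK 4096  /* bytes per transfer; everything must fit in the 256 KB LS */

static char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(void *ls_ptr, unsigned size);  /* hypothetical compute kernel */

void stream(unsigned long long ea, int nchunks)
{
    int cur = 0;

    /* kick off the first transfer, tagged with the buffer index */
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);

    for (int i = 0; i < nchunks; i++) {
        int next = cur ^ 1;

        /* prefetch the next chunk while we work on the current one */
        if (i + 1 < nchunks)
            mfc_get(buf[next], ea + (unsigned long long)(i + 1) * CHUNK,
                    CHUNK, next, 0, 0);

        /* block only on the current buffer's DMA tag */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();

        process(buf[cur], CHUNK);  /* works directly out of the LS */
        cur = next;
    }
}
```

The latency hiding here is entirely manual (the programmer overlaps DMA with compute); nothing in the hardware hides a slow LS for you. Turn the LS into something with variable, cache-like latency and that scheduling assumption goes out the window.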
I would also point to nAo's posts about why STI went with only 4-wide SPUs. He thought it was weird that they went with something that narrow (you can't get much narrower for vector processing).
Looking at Intel's architecture and the jump from Westmere to Sandy Bridge (128-bit SSE to 256-bit AVX), I wonder myself whether wider units would have made sense (not that I'd put myself on the same level as nAo, or the other POVs you may find if you search the forum's numerous threads on the matter, sometimes discussed in relation to Larrabee, FYI).
That would have granted a neat gain in throughput.
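Just to make the width point concrete (a hedged illustration only, using x86 intrinsics as a stand-in for "same loop, twice the SIMD width"; this is obviously not SPU code):

```c
/* Illustration only: the same saxpy-style loop at 4-wide (SSE) and 8-wide (AVX).
 * Assumes n is a multiple of 8 and the pointers are valid.
 */
#include <immintrin.h>

void saxpy4(float *y, const float *x, float a, int n)
{
    __m128 va = _mm_set1_ps(a);
    for (int i = 0; i < n; i += 4)   /* n/4 iterations */
        _mm_storeu_ps(y + i,
            _mm_add_ps(_mm_mul_ps(va, _mm_loadu_ps(x + i)),
                       _mm_loadu_ps(y + i)));
}

void saxpy8(float *y, const float *x, float a, int n)
{
    __m256 va = _mm256_set1_ps(a);
    for (int i = 0; i < n; i += 8)   /* n/8 iterations: same work, half the instructions */
        _mm256_storeu_ps(y + i,
            _mm256_add_ps(_mm256_mul_ps(va, _mm256_loadu_ps(x + i)),
                          _mm256_loadu_ps(y + i)));
}
```

Same work, roughly half the instructions issued per element, which is the throughput argument for going wider than 4 (assuming the execution units scale with the width, of course).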
Overall I believe that design by committee killed the chip; it is not clear to me that the chip had a single clear purpose, and at the same time it came with a pretty constraining memory model.
I wish they had pulled off a "wannabe Larrabee", but that was not an option: too many different views about what the chip would/should be or be used for (wrt the tech), IBM was adamant about the PowerPC being used somehow, the others had their views too, and so on.
Edit
More geek ranting... I would also think that Sony spread itself a tad thin by working on both the Broadband Engine and the cancelled GPU.
That's more a geeky sweet dream than anything else, but I wish they had committed clearly to a software-based approach and worked on a matching solution, most likely using two of the resulting chips head to head as in the Cell blades.
I don't know what was achievable, I can only dream about it. Clearly the (silicon) budget for something like Larrabee was not there, but I wonder (putting aside all the constraints due to the alliance of different parties with different POVs) if they could have designed what I called earlier a "wannabe Larrabee".
As an example of the worthless questions that plague my mind: could the Cell have been based on multi-threaded (4-way) VPUs with a width matching AVX units, L1 instruction and data caches, no standard CPU at all, and a shared pool of scratchpad memory? IBM did not use eDRAM in its CPUs at that time (on their 90nm lithography), but could they have? (so more scratchpad memory).
The thing would have operated at a pretty high clock speed. For some reason my mind insists on telling me that within the Cell silicon budget (or close) there could have been 6 such VPUs and an unknown amount of scratchpad memory (when the internal broken guesstimator fails to deliver, it is really a bad sign... and imagination has gone out of control lol).
I'm pretty proud of that one, it is really worthlessness at its peak... with enough "ifs" one could put Paris into a bottle...
Sorry, but the name of the thread called for that kind of crap (good one Alstrong).