Hmm... I don't think the design goal was for Cell to become/replace a GPU. That wouldn't make sense. Kutaragi said so himself. It was meant to address the memory wall and cover a wider range of tasks, thus helping the GPU in areas where performance may fall short (as requirements increase), or implementing entirely new graphics concepts (extremely tight integration between graphics and other OS functions).
I've seen no concrete evidence, so it's kind of confusing.
Early rumors of the PS3 indicated a dedicated chip for graphics, which may or may not have been a modified Cell.
However, glancing over my notes shows that the earliest reports of Cell that contained specs indicated 72 cores on Cell; if that were the case, there wouldn't have needed to be a GPU.
Seems to me that Cell was designed to do everything.
That topic isn't just about vector instructions, but overall design philosophy.
My earlier comment focused on vectors because there are assumptions made in their functionality that do not apply universally, so they are not as generic as scalar ops can be.
Other parts of the argument don't necessarily concern themselves with vector versus scalar. The discussion on texturing includes discussion on the layout of the cache, with regards to how it can hinder full scatter/gather throughput in the form that Larrabee most likely implemented it.
This actually has something to do with trying to shoehorn vector capability onto a memory pipeline that has a scalar design as its basis. I'll expound on this more in a bit.
This is actually a criticism leveled at SIMD extensions in general, and x86 in particular (because it tended to be the worst offender).
The short vectors, the inflexible memory operations, the clunky permute capability are "vector" extensions for a design that emphasizes low-latency scalar performance.
Scatter/gather is not simple to perform at speed on the very same memory ports that the scalar ops use, and it is not simple to make it a first-class operation when there are some pretty hefty requirements imposed by the rules of operating on the scalar side (coherence, consistency, atomicity, etc.). Very frequently, the vector operations tend to be more lax, but this also means they are not as generic as the scalar side.
Intel has had ~15 years to do it, but x86 is not a vector ISA. Its extensions were SIMD on the cheap, and they were roundly criticized for their lack of flexibility and restrictions in their use.
Each iteration has improved certain aspects, as transistor budgets expanded, but there are some strong constraints imposed by the scalar side.
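To make the "SIMD on the cheap" point concrete, here is a minimal C sketch (the function names are just illustrative, assuming SSE-class hardware): an indexed "gather" has to be synthesized out of scalar loads feeding the vector register, and even a simple horizontal reduction has to be spelled out with shuffles.

#include <xmmintrin.h>  /* SSE intrinsics */

/* "Gather" four floats through an index table: classic SSE has no gather
 * instruction, so this lowers to four scalar loads plus inserts -- the
 * vector unit is fed through the same scalar memory ports. */
static __m128 gather4(const float *base, const int idx[4])
{
    return _mm_set_ps(base[idx[3]], base[idx[2]],
                      base[idx[1]], base[idx[0]]);
}

/* Horizontal sum of a 4-wide vector: no direct reduction op, so it is
 * built from a pair of shuffles and adds. */
static float hsum4(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(v, shuf);
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}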
Agreed too, hence why I believe they missed a possibility. With Larrabee they acknowledged that "data parallel" processing is becoming relevant to their business, but they took the wrong route again. They were too obsessed with x86; maybe they were getting cold feet after the Itanium experience?
This is also where there is a difference of opinion.
There is the position that there can be a core that can do everything as well as a throughput-oriented design while still being focused on latency-sensitive scalar performance, all while not blowing up the transistor and power budget.
This goes beyond vector versus scalar and concerns the overarching question of generality and specialization.
Agreed again. But to be clear, my POV is that Intel needed to push a Cell rather than a Larrabee, not going for graphics but for data-parallel processing / number crunching.
The market below is far more power-conscious. It would take even longer to compete there. There are integrated designs that don't have unified shaders nor full programmability because the power usage was not acceptable.
When I say tackling the market from the low end, I mean low-end laptops/desktops (not the mobile market), where 3D performance is largely irrelevant; I'm not sure some vector processor cores would be at that much of a disadvantage versus tiny GPUs, ceteris paribus. On the other hand, Intel could have been shipping "one chip" systems, which would have been a clear competitive advantage. Again, I'm speaking of Intel doing its own Cell, but a more convenient Cell.
Even for the high end, the chip was massive, and no real numbers showed up to indicate it would be competitive at the time of release.
Part of the problem is that Intel needed a compelling advantage and a unified message. The design was too delayed to be compelling, and Intel's message was never coherent (and there were signs that various divisions were not trying too hard to help it).
Well, that's about Larrabee, not really the point I tried to make: Intel could have made better choices.
In the low-end market, proper vector/throughput-oriented cores (or, as I put it, narrow vector processors) could have been a huge competitive advantage, and I don't think Intel would have failed to deliver; by not aiming at "plain 3D rendering", software would have been less of a problem. What is accelerated now by GPUs using CUDA/OpenCL/DX Compute would have been written in a standard language like C. Clearly I don't believe Intel cares much for the graphics market; they want to prevent GPUs from biting into their CPU revenue, and going straight at GPUs proved the wrong strategy. Intel had a huge opportunity: for some years now your average cheap PC/laptop could have shipped without a proper GPU, as a one-chip system. Moreover, most consumers don't need 4 cores; they would have had a better return from 2 cores plus some throughput-oriented cores. Now it's a bit late, but "GPUs" are still completely irrelevant to most of the PC market; i.e. I'm not sure people who buy an i3-something would care if there were no proper GPU in the computer, as long as it fit their needs (which don't imply 3D games).
Hmm... I don't think the design goal was for Cell to become/replace a GPU. That wouldn't make sense. Kutaragi said so himself. It was meant to address the memory wall and cover a wider range of tasks, thus helping the GPU in areas where performance may fall short (as requirements increase), or implementing entirely new graphics concepts (extremely tight integration between graphics and other OS functions).
Not really my point. Intel is huge, and people (most of the market, actually) don't care for 3D performance. A Cell-like model may have made a lot more sense for the needs of their customers and could have been a better defense against the (relative) trend of GPGPU. More importantly, I don't think Intel would have failed to reach such goals. Developers (for games or not) would not have passed on using the cores.
To compare modern CPUs and GPUs with Cell, it would also be interesting to see what Cell elements IBM wants in their next CPU.
Also, software and system architecture are key for a CPU. With an established software base that manages bandwidth carefully/explicitly, it would be interesting to see how an optical interconnect can benefit a network of Cells/cores approach. It is not so interesting to try to shoehorn Cell into a PC-like CPU-GPU setup, with a traditional CPU-GPU software library.
That's not what I meant either by fighting Nvidia/ATI with Flash games. Social gaming (and related activities) is rising; there is a huge market there. Those people don't invest in potent GPUs, but publishers may be willing to push stuff that looks a bit better. Keeping in mind Intel's impressive market share, they could have started to use these generic vector/throughput-oriented cores to push more sophisticated games, keeping in mind too that my wife (just an example, I could have said her dad playing FarmVille) doesn't have the same requirements as me/you. Such cores would have advantages too: you would code in a pretty standard language plus some optimized libraries, the ISA is "almost" set in stone, and investments made in coding are "perennial".
Keep in mind too that I'm starting to diverge from the (fair) remark made to Nick about why "such chips" don't exist. Intel's best way to fight GPUs could have been: you don't need a GPU, as "core" gamers are such a tiny part of the spectrum of their customers.
Agreed again. But to be clear, my POV is that Intel needed to push a Cell rather than a Larrabee, not going for graphics but for data-parallel processing / number crunching.
There are aspects of Larrabee that are in the same vein as Cell. The most obvious graphics-specific parts of Larrabee were isolated in the TMU blocks. Aside from some number of graphics-related instructions for the x86 cores themselves, Larrabee was a data-parallel and throughput-oriented design.
It had a more straightforward instruction cache and relied on the traditional coherent memory space model.
Cell has been relegated to a very narrow niche, so I'm not certain copying it more than was absolutely necessary would have been a good idea.
When I say tackling the market from the low end, I mean low-end laptops/desktops (not the mobile market), where 3D performance is largely irrelevant; I'm not sure some vector processor cores would be at that much of a disadvantage versus tiny GPUs, ceteris paribus.
Larrabee was a huge chip, cost-wise it would have been a non-starter. Outside of graphics, Larrabee would have had problems establishing itself. It would have been unable to run most software made since the introduction of the Pentium, and even if it could it would have run all existing software terribly.
3D graphics was one of the few areas where people would have paid money for a non-standard architecture, since they already do so for GPUs.
Intel did go through a phase where it was going to market Larrabee for the mainstream graphics market. Then again, it went from max performance to mainstream to max to power efficient to cancelled through its years of delays.
There are aspects of Larrabee that are in the same vein as Cell. The most obvious graphics-specific parts of Larrabee were isolated in the TMU blocks. Aside from some number of graphics-related instructions for the x86 cores themselves, Larrabee was a data-parallel and throughput-oriented design.
It had a more straightforward instruction cache and relied on the traditional coherent memory space model.
Cell, or the SPUs, took pretty extreme design choices. Intel could have pulled off something more convenient to use (while sacrificing some perf per mm²). From a non-technical POV, Intel and IBM don't have the same markets. Intel CPUs are used for plenty of things, and I believe quite a few of these markets would have benefited from the extra juice provided by some vector processors (not Larrabee 1 cores). There is room for balancing too; Intel need not have chosen a 1/8 ratio between CPUs and VPUs. But from my POV the main thing is still volume: if Intel had announced, say along with the Nehalem release, that every one of their processors was to include some VPU cores, the impact would have been on another scale than IBM launching Cell. Software developers would not have passed on it.
Larrabee was a huge chip, cost-wise it would have been a non-starter. Outside of graphics, Larrabee would have had problems establishing itself. It would have been unable to run most software made since the introduction of the Pentium, and even if it could it would have run all existing software terribly.
I don't agree, as I believe Intel had a shot at including some VPUs in its CPUs; they could have passed on "fusion" chips and been on the market earlier. My belief is that it makes more sense for most consumers to buy dual cores backed by some vector processors than quad cores; even for gamers, Intel's market share would have ensured that devs made use of these new resources.
Intel did go through a phase where it was going to market Larrabee for the mainstream graphics market. Then again, it went from max performance to mainstream to max to power efficient to cancelled through its years of delays.
I agree with that; that's why it could have been an option to design vector processors from scratch.
Cell, or the SPUs, took pretty extreme design choices. Intel could have pulled off something more convenient to use (while sacrificing some perf per mm²).
Well, I would not credit Cell with the paternity of the ring bus.
Anyway, it's a bit late now. Looking at the x86 market in isolation they could still make the move, but now the mobile market is big, languages like OpenCL are available on many systems, and it makes sense for developers to consider them.
A better CPU would have helped, that's for sure. Actually, given the usage made of the AltiVec unit (you gave info on that recently), I would go for a pretty narrow OoO PPC core with SMT and without SIMD.
Not saying this is all they need to improve Cell. In its current form, it's difficult to go far.
Actually, I'm not sure the whole concept of the SPUs is valid. SPUs do nothing to hide latency; they rely on low-latency LS access and double buffering, i.e. coders taking care to properly feed them. They offer impressive performance per watt and per mm², but if adoption of the tech is any clue, it's too much of a bother.
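For reference, the double-buffering idiom being criticized here looks roughly like this: a minimal sketch against the SPU MFC intrinsics from spu_mfcio.h, where the chunk size and process_chunk() are made-up placeholders. The point is that the DMA for the next block is overlapped with computation on the current one, and nothing hides the latency unless the coder writes it this way.

#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 4096  /* bytes per DMA transfer (assumed size) */

/* Two local-store buffers, aligned for DMA. */
static volatile uint8_t buf[2][CHUNK] __attribute__((aligned(128)));

extern void process_chunk(const uint8_t *data, uint32_t size); /* user-supplied placeholder */

/* Stream 'total' bytes from effective address 'ea_in', working on one
 * chunk while the DMA for the next chunk is in flight. */
void stream_process(uint64_t ea_in, uint32_t total)
{
    uint32_t n = total / CHUNK;
    uint32_t cur = 0;

    /* Kick off the first transfer on tag 0. */
    mfc_get(buf[cur], ea_in, CHUNK, cur, 0, 0);

    for (uint32_t i = 0; i < n; i++) {
        uint32_t next = cur ^ 1;

        /* Start fetching the next chunk into the other buffer/tag. */
        if (i + 1 < n)
            mfc_get(buf[next], ea_in + (uint64_t)(i + 1) * CHUNK,
                    CHUNK, next, 0, 0);

        /* Wait only for the current buffer's DMA tag, then process it. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process_chunk((const uint8_t *)buf[cur], CHUNK);

        cur = next;
    }
}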
To bring it further, it would need a complete shift in design philosophy.
If I were trying to picture what Intel could have pushed instead of Larrabee (and not for the same use, by the way), I think of narrow vector processors (4-wide like the SPUs, handling DP, as the "Cell v2") supporting a cheap form of multithreading (barrel or round-robin), maybe supporting some cheap form of OoO, with good prefetching capability, offering a coherent memory space, and a weird cache hierarchy: an L0 data cache, L1 I$ and D$, no "L2", and a dedicated part of the L3 (like in Intel SnB). Something dedicated to short vectors and data manipulation, able on its own to hide quite some latency.
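Just to contrast that with the SPU model above: on a coherent, cached memory space, latency hiding can be a hint rather than explicit buffer management. A trivial sketch (x86 prefetch intrinsic, with an assumed prefetch distance) of the kind of streaming loop such cores would run:

#include <xmmintrin.h>  /* _mm_prefetch */
#include <stddef.h>

#define PREFETCH_DIST 16  /* elements ahead; a tuning value, assumed */

/* Streaming scale: a prefetch hint (or a hardware prefetcher) pulls data
 * in ahead of use; correctness never depends on the hint, unlike explicit
 * DMA double buffering. */
void scale_array(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            _mm_prefetch((const char *)&src[i + PREFETCH_DIST], _MM_HINT_T0);
        dst[i] = src[i] * k;
    }
}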