Return of Cell for RayTracing? *split*

That's true, but that doesn't prove it'll never be the best move in future.

True, but the difficulty hasn't been in designing or producing these in-between architectures. It's been in finding enough workloads that map poorly enough to the existing processing models to justify doing something in-between instead of just building out from one of those models.
 
Yep. Raytracing may be one such workload. ;) PowerVR's comments on such matters said GPU SIMD was a poor fit for raytracing. It's certainly the case that rays can't be trusted to stay spatially coherent enough to fit into nice little quads of pixels to shade, and the like.
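
To put the coherence problem in code terms, here's a minimal CUDA-flavoured sketch of what divergent traversal looks like; the Ray/Node structs and the intersect() test are placeholders of my own, not any real raytracing API. The only part that matters is the while loop: once the rays in a warp wander off to different nodes, the SIMT hardware has to serialise the divergent paths, which is exactly what coherent pixel quads avoid.

[code]
// Toy sketch of incoherent BVH traversal on SIMT hardware (placeholder types).
#include <cuda_runtime.h>

struct Ray  { float ox, oy, oz, dx, dy, dz; };
struct Node { int left, right, primitive; /* bounding box omitted */ };

__device__ bool intersect(const Node& n, const Ray& r) {
    // Dummy ray/box test; the details don't matter for the divergence point.
    return ((n.primitive ^ __float_as_int(r.dx)) & 1) != 0;
}

__global__ void traverse(const Node* nodes, const Ray* rays, int* hits, int numRays) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRays) return;

    Ray r = rays[i];
    int stack[64];
    int sp = 0;
    stack[sp++] = 0;          // start at the root
    int hit = -1;

    // Each thread pops a different node depending on where its ray went, so the
    // 32 threads of a warp stop sharing a code path and the hardware serialises
    // the divergent branches. Primary rays in screen-space quads stay coherent;
    // secondary rays generally don't.
    while (sp > 0) {
        Node n = nodes[stack[--sp]];
        if (!intersect(n, r)) continue;
        if (n.primitive >= 0) { hit = n.primitive; break; }
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    hits[i] = hit;
}
[/code]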
 
Yes. The concept was small cores, lots of 'em, and heterogeneous. The reason useful stuff was stripped out for Cell 1 was to make them small enough to get more on a die. At smaller lithographies, more can be invested per core to make it better based on how people actually need to use it, while still providing a huge CPU core count of 60+ on a die.
IBM offered a more mainstream SMP multicore solution when working with Sony and Toshiba. The SPE and its incompatible ISA were an architecture made by Toshiba, the designer that did not have the experience or desire to implement those things. Further, the memory addressing model and the ISA's branch hints staked out a strong position on the LS and conditional speculation.

Cell's final architecture was a compromise made by Sony for the two partners. Possibly, one reason for the compromise was to keep IBM on for the circuit and physical implementation, given the high clocks and leading-edge process work.

Intel can't because 80x86 is heavily laden with legacy requirements. Nobody researching quantum computer applications is working with hardware anywhere near as slow [in terms of clock frequencies] as anything Intel sell commercially. :nope: Cooling remains a challenge, but not impossibly so. :nope:

I may need a reference to what speeds are being discussed. As Intel even had "Terahertz" transistors back in those days, there's nothing special about VLSI transistors reaching the hundreds of GHz in isolation or in simple circuits. The very small number of devices in the largest quantum computer would be one way to have high speeds, although the examples I'm aware of have lifespans of microseconds before the processing state is lost, and the setup period before the next run is no shorter.

Ah, the talk of high clock speeds reminds me of the good old days (1992+) of Digital Equipment Corporation's (DEC) RISC Alpha CPU (https://www.extremetech.com/computing/73096-64bit-cpus-alpha-sparc-mips-and-power and https://en.wikipedia.org/wiki/DEC_Alpha). The king of high clock speeds, which outside of special cases failed to deliver performance comparable to much slower Intel-architected CISC CPUs. I.e., it was really good at very specific things but rubbish as a general-purpose CPU.
Aside from some thorny areas such as the weakest memory model of any of the major architectures and an early lack of byte-manipulation instructions, I've not seen complaints about Alpha's general-purpose performance. It was a rather straightforward architecture that logged good results in SPEC integer and floating point. If the software was compiled for it, the CPU had the memory subsystem and OoO engine to work through it.
If running native x86 binaries through the FX!32 translation software, it might be half as fast at the start.
Where it stumbled was the economic realities of having a low-volume product that needed a lot of physical optimization, plus corporate shenanigans and an unceremonious killing after being acquired.
 
Can anyone name me IBM's moniker for their Netburst-like high clock processor program that started around 2000 and sorta led to the Cell/Xenon PPE? Some acronym like GITS or GETS? GIgahertz.....[something][something]?
 
I personally view Cell as somewhat of a precursor to modern GPUs. GPUs nowadays are a lot more flexible than people give them credit for. Xeon Phi was mentioned as a multicore CPU, but it's pretty much identical to GPUs from AMD or Nvidia, except that it supports x86. Of course, "supports x86" is pretty misleading, since actually using x86 on Phi is quite a slow path, somewhere around an order of magnitude slower than the "normal" codepath. It's basically equivalent to the pathological worst case for SIMD divergence.
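
For anyone who hasn't run into it, the pathological worst case for SIMD divergence looks roughly like the toy CUDA kernel below; this is just my own illustration, not anything Intel or Nvidia ship. A switch on the lane index sends all 32 lanes of a warp down different branches, so the hardware runs the branches back to back and the branchy section drops to roughly 1/32 of peak.

[code]
// Toy worst-case divergence: every lane of a warp takes a different branch.
#include <cuda_runtime.h>

__global__ void divergent(float* out) {
    int lane = threadIdx.x % 32;
    float v = out[threadIdx.x];
    // 32 distinct branches -> full serialisation within the warp.
    switch (lane) {
        case 0:  v = v * 1.0f + 0.5f; break;
        case 1:  v = v * 2.0f + 0.5f; break;
        // ... cases 2 to 30 omitted for brevity; they fall to default here ...
        default: v = v * 32.0f + 0.5f; break;
    }
    out[threadIdx.x] = v;
}

int main() {
    float* d = nullptr;
    cudaMalloc(&d, 256 * sizeof(float));
    cudaMemset(d, 0, 256 * sizeof(float));   // contents don't matter, just defined
    divergent<<<1, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
[/code]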

Anyway, modern discrete GPUs can directly access CPU-side memory (this goes through the page-fault mechanism), perform memory allocations and deallocations, and schedule kernels and draw calls to be run, all from the shader/cuda/opencl/API-of-the-month cores. In theory, you could execute an entire OS entirely from within the GPU, using the CPU as a simple bridge to I/O, though I don't think there's much interest in actually doing this. It would be a lot of work rewriting pretty much everything for a massively parallel environment, there's no public documentation for the low-level inner workings, and why would you want the OS running on the GPU anyway?
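
Here's a compile-level sketch of those capabilities using CUDA, since that's the toolchain I know the details of: managed memory shared between CPU and GPU and migrated on page faults, device-side malloc/free, and a kernel launching another kernel via dynamic parallelism. The kernel names are placeholders, and it assumes a GPU and toolkit that support dynamic parallelism, built with nvcc -rdc=true -lcudadevrt.

[code]
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

__global__ void parent(int* data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        // Device-side heap allocation (sized via cudaLimitMallocHeapSize).
        int* scratch = static_cast<int*>(malloc(sizeof(int)));
        if (scratch) { *scratch = n; free(scratch); }

        // Device-side kernel launch: the GPU schedules more work for itself.
        child<<<(n + 255) / 256, 256>>>(data, n);
    }
}

int main() {
    const int n = 1024;
    int* data = nullptr;
    // Managed memory is visible to both CPU and GPU; on recent GPUs accesses
    // from either side are resolved through page faults and migration.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    parent<<<1, 32>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[0] = %d\n", data[0]);   // expect 1: the child kernel ran
    cudaFree(data);
    return 0;
}
[/code]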

Currently, APIs (with the exception of Cuda on Nvidia and the OpenCL/HIP/No-Windows-Port on AMD) are severely lacking. Vulkan and DirectX 12 are steps in the right direction, but they're both suffering from a bad case of lowest common denominator. They're also suffering from their HLSL/GLSL legacy, which is missing a lot of important features of C++ (as seen in Cuda on Nvidia or HCC on AMD). It's now possible to write your own compiler targeting the Vulkan and DirectX 12 intermediate representations (DXIL is a dialect of LLVM IR, and SPIR-V has an LLVM translator), and there are several projects doing just this (though Cuda is the big target, with Vulkan being a "once the Cuda backend is mature" target), but until a major industry player gets behind this, we're probably stuck with the old shading languages. Or Cuda.
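
To make the "missing C++ features" point concrete, here are a few lines of CUDA leaning on templates and operator overloading inside device code; the Complex type and the square kernel are just my own toy example. Classic HLSL/GLSL have no real equivalent of this, which is a big part of why people keep reaching for Cuda (or HCC/HIP) once kernels get complicated.

[code]
#include <cuda_runtime.h>

// A small value type usable on both host and device, with operator overloading.
template <typename T>
struct Complex {
    T re, im;
    __host__ __device__ Complex operator*(const Complex& o) const {
        return { re * o.re - im * o.im, re * o.im + im * o.re };
    }
};

// A templated kernel: one definition serves float, double, etc.
template <typename T>
__global__ void square(Complex<T>* v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = v[i] * v[i];   // operator* used inside device code
}

int main() {
    Complex<float>* v = nullptr;
    cudaMallocManaged(&v, 4 * sizeof(Complex<float>));
    for (int i = 0; i < 4; ++i) { v[i].re = float(i); v[i].im = 1.0f; }

    square<<<1, 4>>>(v, 4);      // template argument deduced from the pointer type
    cudaDeviceSynchronize();

    cudaFree(v);
    return 0;
}
[/code]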

I do a lot of GPU compute programming, so I know the architecture about as well as is possible from public documents.
 