Without a software model that supports running out of a cache, similar to what G80 can do internally, that's rather unlikely.

So with a good cache hierarchy there's no great need for a large register set.
With a standard software model, the claim that there's no need for a large register set is dubious. A register set on such a core has a 0-cycle use penalty.
Going by x86 caches, the best that can be hoped for at present is 3 or 4 cycles before an operand becomes available. With fine-grained threading this can be somewhat hidden, but the penalty will never be zero.
Unless a cache can magically match that, there will always be a need for a register set that doesn't spill over all the time.
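As a rough illustration of why threading only hides the load-to-use penalty rather than removing it, here's a toy model; the 4-cycle figure and the strict barrel scheduling are assumptions for the sketch, not a description of any real core.

```c
#include <stdio.h>

/* Toy model: a barrel scheduler rotates through T hardware threads,
 * issuing one instruction per thread per turn. A value read from L1
 * becomes available L cycles after the load issues (assume L = 4,
 * roughly a current x86 load-to-use latency). A register read costs 0. */
static int stall_cycles(int threads, int load_to_use)
{
    /* A thread's dependent instruction issues at its next turn,
     * 'threads' cycles later; any remaining latency is a stall. */
    int remaining = load_to_use - threads;
    return remaining > 0 ? remaining : 0;
}

int main(void)
{
    for (int t = 1; t <= 6; t++)
        printf("%d threads: %d stall cycles per dependent load\n",
               t, stall_cycles(t, 4));
    /* With 4+ threads the 4-cycle penalty is fully hidden per thread,
     * but the latency never went away: single-thread progress is now
     * at best 1 instruction every 4 cycles. */
    return 0;
}
```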
From the point of view of optimizing compilers, the small architectural register pool severely limits optimization options, and in-order cores desperately need software help to reach full utilization.
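To make the register-pressure point concrete, here's a small made-up example. Eight accumulators plus loop state exceeds what 32-bit x86's eight GPRs can hold; on a core with 32 or 128 architectural registers the same loop keeps everything enregistered.

```c
/* Eight running sums plus the loop counter and pointer is more live
 * state than 32-bit x86's eight GPRs (one of which is the stack
 * pointer) can hold, so a compiler targeting it will typically spill
 * some accumulators to the stack every iteration -- each reload paying
 * the 3-4 cycle load-to-use latency that a register read avoids. */
unsigned sum8(const unsigned *a, int n)
{
    unsigned s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    unsigned s4 = 0, s5 = 0, s6 = 0, s7 = 0;
    for (int i = 0; i + 8 <= n; i += 8) {
        s0 += a[i];     s1 += a[i + 1];
        s2 += a[i + 2]; s3 += a[i + 3];
        s4 += a[i + 4]; s5 += a[i + 5];
        s6 += a[i + 6]; s7 += a[i + 7];
    }
    return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
}
```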
That would allow applications to transparently walk into a performance minefield. Performance will magically improve and degrade depending on where the thread is routed. Any program that existed prior to these mini-cores will stutter or zoom by, depending on whether there is a stronger IPC core alongside the mini-cores.

It doesn't have to set the world on fire in the sense you're thinking. We already have GPUs for the ultra-parallel workloads. There's no need to have GPU-like or SPE-like cores in a typical server or desktop. What we do need is CPUs that can work on many different general-purpose tasks simultaneously. For a server that means running different processes; for a game engine, running every component on one or several threads while keeping compatibility; and for something like raytracing, all threads could together be processing rays.
If there is a wide gap in performance between cores that look identical to software, it is also likely that a fair amount of legacy systems code and a good number of other applications will break.
There's no way for a system to really know if that's the case, not with software that isn't made to be aware. Such a chip would be potentially unsafe for any multithreaded program made before the chip's release, and single-threaded programs would be highly vulnerable to performance upsets.
A conservative OS might just decide not to route anything to the minicores.
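For software that is made aware, an escape hatch does exist: the sketch below pins the calling thread to two CPUs with the Linux affinity API. The core numbering is a made-up layout for illustration, and of course no pre-existing binary contains code like this, which is the whole problem.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical layout for illustration: CPUs 0-1 are the full cores,
 * CPUs 2-7 are mini-cores. Legacy software never does this; it runs
 * wherever the scheduler happens to drop it. */
int pin_to_big_cores(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    CPU_SET(1, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) { /* 0 = self */
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```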
If the chip is only made of minicores, it is likely that the x86 variant would do worse than a CELL made only of SPEs. Unlike x86, the SPE's instruction set hasn't made the job of getting usable performance so difficult.
I'm unconvinced that this is true for x86. By not being differentiated from other cores, they ensnare old programs into situations where they falter.

You can still run multiple processes, which is very important for the server market, and as I mentioned before that's an important driver for the CPU market. For other legacy software it's no worse than mini-cores with a new ISA.
And because of the minicores' complete inferiority to any other architecture, you're going to need a huge number of them.

Yes, but because they are smaller you can have more. Hence the density of execution units is still higher.
I wouldn't be surprised if a core with as many ISA limitations as x86 would deliver half or less of the per-clock performance of a core like an SPE or SPARC, something that would seriously impact the amount of usable throughput.
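Putting illustrative numbers on that exchange (all of them assumptions, not measurements): even granting that smaller cores raise execution-unit density, the per-clock deficit eats into the advantage.

```c
#include <stdio.h>

/* Toy throughput-per-area model; every figure here is an assumed,
 * illustrative number, not a measurement of any real chip. */
int main(void)
{
    double big_core_area  = 4.0;  /* assume a mini-core is 1/4 the area */
    double mini_core_area = 1.0;
    double big_core_perf  = 1.0;  /* per-clock performance, normalized  */
    double x86_mini_perf  = 0.4;  /* hampered by the ISA (assumed)      */
    double spe_like_perf  = 0.8;  /* cleaner ISA, same area (assumed)   */

    double n = big_core_area / mini_core_area;  /* minis per big core  */
    printf("x86 minis in one big core's area: %.1fx throughput\n",
           n * x86_mini_perf / big_core_perf);  /* 1.6x */
    printf("SPE-like minis, same area:        %.1fx throughput\n",
           n * spe_like_perf / big_core_perf);  /* 3.2x */
    return 0;
}
```

Under these assumed numbers the density argument still wins over one big core, but the cleaner ISA doubles the payoff, which is the usable-throughput point above.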
If you look at where throughput computing chips are targeted, you would see that the speed-up would be wildly inconsistent. Given the clunky nature of the ISA, the gains would be noticeably less than they could be.

True, but that's a fair price for x86 compatibility. Specialized hardware would do great for one specific workload, poorly for another, and is not even an option for a lot of others. If I look at GPGPU applications, even when run on the latest hardware, I see some with fantastic speedups and others where the GPU is clearly not efficient at all. That's not what we need for the CPU market. We need a fairly consistent speedup along the whole line, with minimal software effort, at consumer prices. I believe x86 mini-cores can offer that. Applications that still run faster on a GPU can keep running on a GPU.
The mini-cores on a new ISA would at least be usable within the set of programs made for them. Most software will need to be completely redeveloped for such a shift in paradigm, so the pain of shifting to another ISA isn't as great.

That's not an argument. Mini-cores with a new ISA wouldn't be compatible with anything at all. x86 mini-cores would run legacy code (with or without SSE), and it would take little effort to make recent and future software make good use of them. With a new ISA you're starting from scratch. Besides, every five years there's a new ISA that would be more optimal, but it's really no option to rewrite all software every five years.
x86 is so clunky that it isn't just sub-optimal, it's performance poison.
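One concrete piece of that clunkiness is instruction fetch: on a fixed-width ISA the decoder can find instruction boundaries at pc + 4*i in parallel, while on x86 each instruction's length must be worked out before the next one can even be located. The toy length decoder below covers only a tiny subset of the ISA, purely to show the serial dependence.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy x86 length decoder handling only a tiny subset (NOP, RET,
 * MOV r32, imm32); the real rules also involve ModRM, SIB,
 * displacements and more prefixes. For illustration only. */
static size_t insn_length(const uint8_t *p)
{
    size_t len = 0;
    while (*p == 0x66 || *p == 0x67 || *p == 0xF0) {  /* prefixes */
        p++;
        len++;
    }
    uint8_t op = *p;
    len++;
    if (op == 0x90 || op == 0xC3)        /* NOP, RET           */
        return len;
    if (op >= 0xB8 && op <= 0xBF)        /* MOV r32, imm32     */
        return len + 4;
    return 0;                            /* outside the subset */
}

/* Finding instruction i+1 requires the full length of instruction i:
 * a serial chain that a wide front end must break with extra hardware. */
static const uint8_t *next_insn(const uint8_t *p)
{
    return p + insn_length(p);
}
```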
Since the minicores are going to need special treatment anyway, why not go for a bigger change? It's better than getting stuck with x86 for another 20 years.
What do you think the addition of OoO, aggressive speculation, and superscalar issue was for? Apple didn't go Intel because of the Pentium classic.

So it's better to stick to x86 and hide its imperfections. That's working fine so far; Intel is doing such a great job that even Apple goes x86.
Fast enough on a design that extracts more ILP than any desktop processor before it, one that isn't even multithreaded.

The only reason this ISA switch works is because Apple offers most software itself and because the emulation is fast enough.
Mini-cores with a new ISA would have to be able to run x86 threads efficiently, on all operating systems, before they are widely accepted. But then it's more interesting to just make them x86 from the start and work around the limitations.
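To put a number on "efficiently": a plain interpreter, sketched below for a made-up three-instruction guest ISA, spends a fetch, an unpredictable dispatch branch, and an execute on the host for every guest instruction. Dynamic translation does better, but it's exactly the kind of branchy, single-threaded work that favors a big out-of-order core over a mini-core.

```c
#include <stdint.h>

/* A made-up three-instruction guest ISA, interpreted the naive way.
 * Every guest instruction costs a fetch, a data-dependent dispatch
 * branch, and an execute on the host -- easily ten or more host
 * instructions each, before any memory or flag emulation. */
enum { OP_HALT, OP_LOADI, OP_ADD };

uint32_t run(const uint8_t *pc, uint32_t reg[8])
{
    for (;;) {
        switch (*pc++) {              /* dispatch: hard-to-predict branch */
        case OP_LOADI:                /* loadi rd, imm8 */
            reg[pc[0]] = pc[1];
            pc += 2;
            break;
        case OP_ADD:                  /* add rd, rs */
            reg[pc[0]] += reg[pc[1]];
            pc += 2;
            break;
        case OP_HALT:
        default:
            return reg[0];
        }
    }
}

/* {OP_LOADI,0,5, OP_LOADI,1,7, OP_ADD,0,1, OP_HALT} returns 12. */
```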
The ways of working around the limitations are either hardware or software. You've thrown out most of the hardware options, and x86 rules out a vast number of the software ones.
I think a chip with a bunch of cores using a highly extended or new ISA, or with slightly more complex (but still enhanced) x86 cores, would be better than a bunch of mini-cores that are almost actively trying to sabotage the code running on them.