Asymmetric multicore CPUs

A fast response to the opening post before I read the rest (which will probably happen tomorrow or so):

It's extremely expensive to develop a new processor, while it's relatively easy to put more existing ones on a single die and interconnect them. And you would really have to develop a number of different processors to fill all the needed performance areas.

x86 processors have evolved by cramming many specialized sub-processors into that single, monolithic core as execution units. From an engineering point of view, it's much easier and cheaper (even if it wastes transistors and die area) to add more monolithic cores, of which the specialized parts can be used as required, and it's also a lot easier to develop when you only have one instruction set architecture to take into account.

From that perspective, it makes more sense to offer hardware support for running virtual machines on the otherwise unused cores and have them all run the same programs, than making specialized units.

So, expect a hybrid in the medium future: something like 4 x86 cores, with some vector (stream) and other specialized units, an I/O, DMA and scheduling processor, and a GPU.
 
Asymmetric cores aren't useful in a "PC" scenario because what you do on a PC involves multitasking. There is no fixed workload mix, so no application you eventually fire up will be designed to be aware of the other stuff that's happening. Every application wants "the processor", and if you have just one of those, all your asymmetric little helpers won't do you any good.

To become generally useful they must be able to execute the same code from the same memory space. The OS must be able to schedule a process on any core. IOW they should be symmetric.
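Just to make the symmetry point concrete: on a symmetric machine the OS is free to place (or migrate) any process onto any core, which you can see from a process's default affinity mask. A minimal, Linux-only Python sketch (os.sched_getaffinity is not available on other platforms):

import os

allowed = os.sched_getaffinity(0)   # cores this process is allowed to run on
print(f"{os.cpu_count()} cores in the machine, "
      f"{len(allowed)} usable by this process: {sorted(allowed)}")

On a symmetric system the two numbers match by default; an asymmetric helper core wouldn't even appear in that mask unless software was written for it.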

The only thing that can break out of this is new software specifically written to take advantage of extra hardware that doesn't yet exist.
 
Once you get to two active applications, how many more are really productive? Sure, people might have several applications open, but isn't that more of a memory resources issue? How often do a web browser, Excel, Word, and Photoshop actually need the CPU at the same time? I completely understand two symmetric cores, especially when the OS can make use of one, to satisfy the cases where you have something running in the background and are actively using another application. But how often does the desktop user have multiple actively computing applications in the background while they are actively using yet another? If open applications cause that scenario, then that just seems like bad code/OS management.

I always assumed that past two cores, the problem with users seriously multitasking in the typical way (i.e., a lot of stuff "open") was a cache issue and, more generally, a memory issue. It seems like multi-application symmetric caches could/should solve that without needing full-blown multiple symmetric cores.

For the guy that actually wants to capture video, run batch Photoshop filters, and play a game at the same time, there are multi-chip solutions that can address that need. I don't see why the typical desktop chip should head in that direction.
 
Well, since one application can actively use two or more cores, there's definitely some benefit to multi-core CPUs for normal users. And there are definitely more uses for a multi-core CPU than running games and video encoding programs at the same time. For example, virus scanning, file indexing, etc. can run alongside more demanding programs in a multi-core environment.

Another important issue is that making single-thread performance better is getting harder and harder, especially considering power usage. The magical scaling of CMOS is almost gone. Generally, if you have twice the power budget, you can at best make single-thread performance 25% better. However, doubling the core count can generally bring a 50% or better performance gain with well multi-threaded applications. It's not hard to see why everyone is going multi-core.
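To put rough numbers on that trade-off, here is a small Python sketch using Amdahl's law. The 25% and 50% figures above are ballpark numbers, not measurements, and the parallel fractions below are just examples:

def amdahl_speedup(parallel_fraction, cores):
    # runtime on 1 core = 1; serial part stays, parallel part divides by cores
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for p in (0.4, 2.0 / 3.0, 0.9):
    print(f"parallel fraction {p:.2f}: "
          f"2 cores -> {amdahl_speedup(p, 2):.2f}x, "
          f"25% faster single core -> 1.25x")

Two cores already beat the hypothetical 25% single-thread gain once the parallel fraction passes 0.4, and they reach the 50% figure at a parallel fraction of 2/3.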

It can be seen in supercomputers too. A few decades ago there were still a few "single CPU" supercomputers. However, there's no such thing now. No one is trying to make a super-fast single-CPU supercomputer, because it's impossible. It's much easier to pile thousands of the same cheap CPU together to make a powerful supercomputer. If desktop applications can be multi-threaded the way supercomputing programs are (it's hard but not impossible), the same trend could happen quickly in the desktop world.
 
I think there would have to be some balance to the threading approach. A somewhat multithreaded desktop app would in most cases perform better than an app that is spread out over dozens or hundreds of threads, even with a system designed to run that many threads.

Unless the problem set is equally huge, the number of threads should stay within reason.
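One way to read "within reason": size the worker pool to the machine rather than to the number of work items. The workload and chunking in this Python sketch are invented purely for illustration (a process pool is used so the work actually spreads across cores):

import os
from concurrent.futures import ProcessPoolExecutor

def process(chunk):
    # stand-in for the real per-chunk work (filtering, encoding, ...)
    return sum(x * x for x in chunk)

def main():
    data = list(range(1_000_000))
    workers = os.cpu_count() or 2        # a handful of workers, not hundreds
    step = -(-len(data) // workers)      # ceiling division
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        print(sum(pool.map(process, chunks)))

if __name__ == "__main__":
    main()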
 
Well, since one application can actively use two or more cores, there's definitely some benefit to multi-core CPUs for normal users. And there are definitely more uses for a multi-core CPU than running games and video encoding programs at the same time. For example, virus scanning, file indexing, etc. can run alongside more demanding programs in a multi-core environment.
Seems to apply to my generalized "two cores" description. After two, there will be significantly fewer cases of apps needing CPU power at the same time.

Another important issue is that making single-thread performance better is getting harder and harder, especially considering power usage. The magical scaling of CMOS is almost gone. Generally, if you have twice the power budget, you can at best make single-thread performance 25% better. However, doubling the core count can generally bring a 50% or better performance gain with well multi-threaded applications. It's not hard to see why everyone is going multi-core.
Seems to be an argument for multi-core... not specifically symmetric multi-core. I would immediately agree that symmetric is the quickest, easiest thing to do from an engineering and coding viewpoint, but my questions are really: is it the most efficient use of transistor space, and more importantly, is it the best long-term direction for processor evolution? Are the multi-core CPUs we're seeing on the desktop now a "quick and easy" short-term crutch, with something more like what I described on the horizon a few years out and extending much further down the road? Or will we see multi-core chips go from 4 to 8, 32, 128 or whatever for the next decade before anyone takes a serious look at a different approach?
 
Well, if you want the best performance and don't mind not being able to run legacy apps, many specialized cores are the way to go.

Then again, that's generally not what people want. Look at IBM: their current top-of-the-line stuff is still mostly used to run... 360 apps. It's probably idle more than 99.999999% of the time.

Then again, your average PC is idle about 99.9% of the time as well.

There are only a few specific fields where computing power is at such a premium that users are willing to write their own optimized programs for it. In all other cases, it's all about market penetration: the percentage of the market that is able to buy and run your app.

Sure, it's a waste of transistors. But they sell.

CPUs have become big by being able to do everything average.
 
Asymmetric multi-core is attractive from a technical point of view, but there are some practical problems.
The foremost is probably a mindset thing. The PC is viewed as a general-purpose machine. It can do everything average (stealing from DiGuru :) ). A specially designed core in an asymmetric multi-core means you need to have a special purpose in mind. For example, CELL BE is designed with multimedia and maybe scientific usage (say, bioinformatics) in mind. Since most PCs are still sold to office users, it's probably not the most cost-effective approach for a PC CPU vendor.

I think the current "CPU+GPU" trend is not a bad one. Office users don't need high multimedia performance, so they don't buy a fast GPU and use on-board graphics instead. A gamer or multimedia application user will buy a fast GPU to accelerate the applications s/he runs. Furthermore, GPUs tend to have big bandwidth thanks to the on-board memory. If you use an asymmetric multi-core, feeding enough bandwidth to the CPU can be a serious problem.

In the long run, x86 CPUs may go the asymmetric route (one can't really predict what will happen in the computer world). If there is serious progress in automatic parallelization compiler technology, it may be beneficial to build a CPU with a few complex cores and a lot of simple cores. You can run legacy applications on the complex cores, and get more performance by running optimized applications on the simple cores.
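As a back-of-the-envelope sketch of that split: run the serial, legacy part on one complex core and spread the parallel part over the simple cores. All the speeds, counts, and the 90% parallel fraction in this Python model are invented for illustration, not taken from any real design:

def hetero_runtime(parallel_fraction, simple_cores, simple_speed=0.5):
    # serial (legacy) part runs on one complex core at relative speed 1.0;
    # the parallel part is spread over the simple cores
    serial_time = (1.0 - parallel_fraction) / 1.0
    parallel_time = parallel_fraction / (simple_cores * simple_speed)
    return serial_time + parallel_time

for n in (4, 16, 64):
    print(f"{n} simple cores: {1.0 / hetero_runtime(0.9, n):.2f}x "
          f"over a single complex core")

The serial fraction quickly becomes the limit, which is why the complex cores stay in the picture for legacy code.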
 
It seems AMD wants to follow the asymmetric route.

Not in a Cell fashion; they want to design real dedicated cores for a given task (DSP-like, if I understand properly) that can be mixed together depending on the customer's needs.

I think more and more that MS will shift toward AMD/ATI for its next system.
From what I've read in this forum, 3DNow! was a better SIMD implementation than SSEx, MS wanted a version of Windows to run on the 360, and MS already works with ATI.

I shouldn't be surprised to hear rumors about a deal around the end of 2007.
 
It seems AMD wants to follow the asymmetric route.

Not in a Cell fashion; they want to design real dedicated cores for a given task (DSP-like, if I understand properly) that can be mixed together depending on the customer's needs.
AMD would probably go asymmetric before Intel (not that Intel won't do it eventually) because of their process disadvantage. Intel can afford to bloat up on x86 cores because they've always had at least a 6-month lead in process transitions.
Even on the same node, AMD's cache density has been measurably worse than Intel's, something symmetric x86 multicore would be more hurt by.

AMD needs to get more out of its more limited transistor budgets, and hope for two things:
1) Intel thinks the simpler process of adding more symmetric cores is a better bet
2) it turns out it isn't

From what I've read in this forum, 3DNow! was a better SIMD implementation than SSEx, MS wanted a version of Windows to run on the 360, and MS already works with ATI.
From most indications in the last ~4 years, 3DNow! is irrelevant. SSE is the replacement for x87 floating point in 64-bit Windows, and I don't think 3DNow! has really been touched by AMD since the K7. Most of the other work AMD has done architecturally has been devoted to SSE.
 
Asymmetric is a killer strategy in the mobile market, where fixed-function acceleration blocks offer unbeatable performance per watt. It may work in the server market for some selected applications, like cryptography or signal analysis. Many companies have made a business out of selling FPGAs coupled to commodity x86 hardware for specialized functions.

But the desktop, I dunno. For 3D and video, yes, asymmetry makes sense. But for other aspects, like cryptography, parity computation, audio, etc., it's questionable. While it seems like a great idea to burn transistors to offload SSL, or CRC/parity, or firewall processing, the reality is that today's x86 CPUs are so friggin' fast that these functions consume a tiny percentage of CPU in consumer applications. Most CPU power in PCs goes wasted, and no consumer desktop has ever been "SSL-bound", for example, with the computer seriously slowed down by encrypted web traffic.
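If you want to sanity-check that claim on your own box, a quick (and admittedly crude) Python measurement of CRC throughput looks something like this; the buffer size is arbitrary:

import time
import zlib

data = bytes(64 * 1024 * 1024)              # 64 MiB test buffer (all zeros)
start = time.perf_counter()
checksum = zlib.crc32(data)
elapsed = time.perf_counter() - start
print(f"CRC32 over {len(data) // 2**20} MiB: {elapsed * 1000:.1f} ms, "
      f"about {len(data) / 2**20 / elapsed:.0f} MiB/s")

Checksumming tens of megabytes takes a small fraction of a second on any modern CPU, which is the point: for consumer traffic volumes there's little left to offload.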

Physics and graphics might be drivers for the consumer desktop, but most of the things I can imagine asymmetric cores for, beyond mobility and gaming 3D/video decoding/encoding, are server-specific apps.

And to me, it doesn't make sense to burn transistors on a fixed design there unless it's an HPC project with tens of millions in revenue. Even in the SSL market, SSL accelerators have basically moved from custom silicon to commodity x86+Linux. In bioinformatics, people have even tired of FPGAs because x86 performance caught up and blew them away on price/performance. Once you max out your FPGA farm for reconfigurable computing, it's far more expensive to replace them with better-performing arrays than to just go out and buy a bunch of quad-core commodity Linux boxes.
 
3dilletante, thanks for your enlightening response ;)

I agree with you, democoder: asymmetric seems to be a good strategy for mobile, where performance per watt has to be high.

On the desktop side, I guess we have to take the devs into account; only Intel seems to have been able to push a proprietary acceleration format (SSE).
If AMD includes some DSP-like units in its CPU, it has to be transparent to the devs, at least in the desktop market. Would the CPU have to dispatch the work to the DSP on its own?
Is there a risk of splitting the market?
 