*Hiroshige Goto's Weekly Overseas News*
The new CPU trend: "heterogeneous multicore"
--------------------------------------------------------------------------------
- Heterogeneous multicore becomes a trend
Figure: CPU technology trends in the multicore era
The Cell processor is a multicore CPU with nine CPU cores. The most important of these are the eight that handle computation, a group of processors called "SPE (Synergistic Processor Element)". Each SPE is a "SIMD (Single Instruction, Multiple Data)" processor that can process multiple data items with one instruction, and it delivers strong performance on stream-type processing.
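The SIMD idea can be sketched with a toy Python model. This is only an illustration: the lane count of four (four 32-bit values in a 128-bit register) is an assumption not stated in the text, and `simd_add` and the two counting helpers are hypothetical names, not SPE instructions.

```python
LANES = 4  # assumption: four 32-bit values per SIMD register


def simd_add(a, b):
    """Model one SIMD instruction: add all lanes of two registers at once."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each register must hold exactly LANES values")
    return tuple(x + y for x, y in zip(a, b))


def scalar_instruction_count(n_elements):
    """A scalar core needs one add instruction per element."""
    return n_elements


def simd_instruction_count(n_elements):
    """A SIMD core needs one add per group of LANES elements (rounded up)."""
    return -(-n_elements // LANES)
```

In this model, adding 1024 pairs of values takes 1024 scalar instructions but only 256 SIMD instructions, which is why long, regular data streams suit this kind of core.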
Each SPE has 256KB of memory called the Local Store. This is not an L2 cache but what is generally known as a scratchpad: memory that the programmer can use explicitly and freely. The Local Store's memory space is a local memory space private to the SPE. The SPE's load/store instructions execute in this local address space, and when an SPE accesses system memory, it does so via DMA.
To run a program on an SPE, the data and the program are transferred into the Local Store by DMA. The original idea behind Cell was to handle programs as objects called software Cells, program fragments packaged together with their data. For that reason, it appears that programs were assumed from the start to fit within 256KB.
Unlike a cache, the Local Store does not maintain coherency. As a result, Cell does not have to maintain cache coherency across the whole chip at all times. This is an important factor in reducing multicore overhead.
In Cell, the elements are connected by a wide-bandwidth ring bus called the Element Interconnect Bus (EIB). An SPE accesses system memory and other resources through the EIB using DMA transfers. Up to 16 DMA requests can be issued and in flight at once.
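The Local Store and DMA model described above can be sketched as a toy Python simulation. The 256KB size and the limit of 16 outstanding requests come from the text; everything else, including the `ToySPE` class and its method names, is hypothetical and is not a real Cell programming interface. The sketch completes transfers only in `dma_wait`, whereas real hardware would overlap them with computation.

```python
LOCAL_STORE_BYTES = 256 * 1024  # from the text: 256KB of Local Store per SPE
MAX_IN_FLIGHT = 16              # from the text: up to 16 DMA requests in flight


class ToySPE:
    """Hypothetical toy model of one SPE; not a real Cell API."""

    def __init__(self, system_memory):
        self.system_memory = system_memory               # shared bytearray
        self.local_store = bytearray(LOCAL_STORE_BYTES)  # private scratchpad
        self.pending = []                                # queued DMA requests

    def _enqueue(self, kind, ls_addr, mem_addr, size):
        if len(self.pending) >= MAX_IN_FLIGHT:
            raise RuntimeError("DMA queue full: wait for earlier transfers")
        if ls_addr < 0 or ls_addr + size > LOCAL_STORE_BYTES:
            raise ValueError("transfer does not fit in the Local Store")
        if mem_addr < 0 or mem_addr + size > len(self.system_memory):
            raise ValueError("transfer falls outside system memory")
        self.pending.append((kind, ls_addr, mem_addr, size))

    def dma_get(self, ls_addr, mem_addr, size):
        """Queue a copy from system memory into the Local Store."""
        self._enqueue("get", ls_addr, mem_addr, size)

    def dma_put(self, ls_addr, mem_addr, size):
        """Queue a copy from the Local Store back to system memory."""
        self._enqueue("put", ls_addr, mem_addr, size)

    def dma_wait(self):
        """Complete every queued transfer in order."""
        for kind, ls, mem, size in self.pending:
            if kind == "get":
                self.local_store[ls:ls + size] = self.system_memory[mem:mem + size]
            else:
                self.system_memory[mem:mem + size] = self.local_store[ls:ls + size]
        self.pending.clear()


def double_bytes(system_memory):
    """Double every byte (mod 256) while computing only on the Local Store."""
    spe = ToySPE(system_memory)
    size = len(system_memory)
    spe.dma_get(0, 0, size)   # system memory -> Local Store
    spe.dma_wait()
    for i in range(size):     # loads and stores see only the local address space
        spe.local_store[i] = (spe.local_store[i] * 2) % 256
    spe.dma_put(0, 0, size)   # Local Store -> system memory
    spe.dma_wait()
```

The point of the design is visible in `double_bytes`: the compute loop never touches system memory directly, so no hardware coherency is needed between the scratchpad and the rest of the chip.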
- Equipped with a special mode that protects content
The significance of the Cell processor is not merely that it is a new CPU from the Sony group, IBM, and Toshiba that will also go into the PlayStation 3. What matters architecturally is that Cell anticipates a new CPU technology trend: the "heterogeneous (mixed-type) multicore".
A heterogeneous multicore CPU is a multicore CPU that carries multiple CPU cores of different types. This differs greatly from today's mainstream multicore CPUs, which carry multiple CPU cores of the same architecture. At present, Cell is the only heterogeneous multicore CPU among general-purpose CPUs aimed at the mass computer market.
That said, the name "heterogeneous multicore" has not actually become established. There is still no settled name for asymmetric multicore CPUs like Cell. However, because the recently emerging term "heterogeneous multicore" captures the essence well, this article uses it. Incidentally, if Cell is a heterogeneous multicore, then CPUs that arrange cores of the same type symmetrically, such as Intel's "Pentium 4 8xx (Smithfield)" and IBM's Power4/5, are "homogeneous multicore" CPUs.
Heterogeneous multicore CPUs have suddenly begun to attract attention. It is not only that Cell has appeared: various researchers and developers have suddenly started talking about heterogeneous multicore CPUs. A typical example is the "many-core (Many-core) CPU" that Intel is expected to realize within the next several generations. Intel's Many-core is expected to be a typical heterogeneous multicore design that combines large, conventional scalar CPU cores with a group of simple, small CPU cores (probably specialized for vector-type processing). The time when Intel began talking about Many-core coincides with the time when parts of the Cell papers began to become known in the industry. In terms of how things look, Intel is chasing Cell.
- Why heterogeneous multicore?
Why is heterogeneous multicore now becoming a CPU trend? Two reasons, and two directions, can be identified. (1) A heterogeneous multicore can greatly raise multithreaded performance while maintaining single-thread performance. (2) By optimizing each CPU core for its use, it can achieve high performance that a homogeneous multicore cannot. The former is the route that Intel and others are considering; the latter is the approach that Cell takes.
Cell carries one "PPE (POWER Processor Element)", a CPU core specialized for control-type processing and intended to run the OS, and eight "SPE (Synergistic Processor Element)" cores optimized mainly for stream-type data processing.
"Cell has one control point processor and multiple data point processors. This combination can run programs very efficiently, because the SPE engines, which are the data point processors, are optimized as far as possible. We did not design the data point processors to run an OS; instead, we specialized them in running applications powerfully. As a result, we were able to make the data point processors smaller and more efficient without sacrificing performance. By specializing in application execution, we produced a very efficient processor.
The control point processor, on the other hand, was designed to run the OS and to coordinate program execution. By coordinating resources and assigning work to the data point processors, it lets the whole machine run efficiently. The combination of these two types of processor works very well," says Jim Kahle (IBM Fellow) of IBM, who is in charge of Cell development.
The PPE is a multithreaded CPU along the lines of Hyper-Threading; it switches threads quickly and is designed for environments with frequent thread switches, such as an OS. The SPE, by contrast, is optimized for SIMD (Single Instruction, Multiple Data) operations, assumes stream data, and carries no superfluous cache hierarchy. Thread switching on an SPE is heavy, however; the SPE is assumed to keep processing a small number of threads and is not intended to run an OS.
In Cell's case, rather than making every CPU core handle every use, the design splits the cores into two types, each optimized for its role, which made it possible to simplify the CPU cores thoroughly while maintaining performance. As a result, nine cores could be integrated on a die (the semiconductor chip itself) of around 200 square mm on a 90nm process, achieving an extraordinary floating-point performance of 256 GFLOPS at 4GHz. On the same 90nm process and a die of the same class, Intel and AMD can fit only two CPU cores.
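The 256 GFLOPS figure can be reproduced with simple arithmetic. The text gives only the total and the 4GHz clock, so the breakdown below is an assumption: only the eight SPEs are counted, each is taken to operate on four single-precision lanes per cycle, and a fused multiply-add is counted as two floating-point operations. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores, simd_lanes, flops_per_lane, clock_ghz):
    """Theoretical peak: every lane of every core busy on every cycle."""
    return cores * simd_lanes * flops_per_lane * clock_ghz


# Assumed breakdown: 8 SPEs x 4 lanes x 2 flops (multiply-add) x 4 GHz
cell_peak = peak_gflops(cores=8, simd_lanes=4, flops_per_lane=2, clock_ghz=4.0)
```

Under these assumptions `cell_peak` comes to 256.0. This is a peak value; sustained performance depends on keeping the Local Stores fed by DMA.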
Cell does not strain to build a single CPU core that chases two rabbits; it provides two types of CPU core, each chasing just one. In that way, Cell may well catch both rabbits without letting either escape.
- CPU performance gains hit their limit, making multicore necessary
Behind the emergence of heterogeneous multicore lies the fact that CPU performance improvement has hit a limit.
Until now, CPUs have raised single-core performance by focusing on (1) higher operating frequency and (2) higher IPC (instructions per cycle: the number of instructions that can be executed in one cycle). Pipelines were subdivided to raise frequency. To raise IPC, designers introduced out-of-order execution, which dynamically increases instruction-level parallelism (ILP), along with the various acceleration techniques that accompany it.
As explained previously in this column, performance gains from such techniques came at the cost of CPU complexity. In today's leading-edge CPUs, the scheduling control logic for raising ILP and similar circuitry occupies an enormous area. As a result, performance improvement has become inefficient: when die size (the area of the semiconductor itself) doubles, performance rises only by the square root of the increase (about 1.4 times). In other words, CPUs do not obtain a performance improvement that keeps pace with "Moore's Law", the doubling with every process generation. Consequently, the performance per watt and performance per die area of the latest CPUs have deteriorated, and they have become inefficient. Intel calls this rule of thumb "Pollack's Rule".
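The square-root relationship can be written down directly. This is a minimal sketch of the rule of thumb as stated in the text; the function names are hypothetical, and real designs only roughly follow the curve.

```python
import math


def pollack_performance(area_ratio):
    """Relative single-core performance when core area scales by area_ratio."""
    return math.sqrt(area_ratio)


def performance_per_area(area_ratio):
    """Relative performance per unit of die area at the new core size."""
    return pollack_performance(area_ratio) / area_ratio
```

Doubling the area gives `pollack_performance(2.0)` of about 1.41, while `performance_per_area(2.0)` falls to about 0.71, which is the loss of area efficiency the paragraph describes.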
The reason CPU performance improvement became inefficient in this way is that CPUs were bound by the need to raise single-thread scalar performance. An instruction set like x86 carries a large base of software assets. To raise the performance of those existing applications, it is mainly single-thread scalar performance that has to increase.
Over the last several years, however, CPU clock gains have slowed, and rising power consumption has made more efficient ways of improving performance necessary. The CPU industry therefore turned to improving multithreaded performance. Raising "thread-level parallelism (TLP)" by going multicore makes it possible to increase CPU performance more efficiently than before.
But there was a problem here too. To maintain single-thread performance as well, Intel and AMD built their multicore CPUs by reusing the cores of their earlier single-core CPUs. As a result, performance per unit of power consumption and per die area is still not very good. On the current 90nm process, going beyond two cores is also difficult.
This problem has a simple solution, however: make the CPU cores simple. If complex control mechanisms are removed, a fair level of performance can be maintained with a considerably smaller CPU core. Turning Pollack's Rule around, cutting the die area of a CPU core to 1/4 reduces its performance only to 1/2. Because many simple CPU cores can then be placed on a chip, multithreaded performance rises considerably. It becomes straightforward to build a multicore CPU with good die-area and power efficiency.
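The trade made by shrinking cores can be sketched numerically. The example assumes the square-root rule of thumb holds exactly and that throughput adds up perfectly across cores (an ideally parallel workload); both are simplifications, and `split_core` is a hypothetical name.

```python
import math


def split_core(n_small):
    """Split one large core's area into n_small equal simple cores.

    Returns (single_thread_performance, total_throughput), both relative to
    the original large core, assuming performance ~ sqrt(core area).
    """
    per_core = math.sqrt(1.0 / n_small)
    return per_core, n_small * per_core
```

For four cores in the area of one, `split_core(4)` gives `(0.5, 2.0)`: each core runs a single thread at half speed, but the chip as a whole delivers twice the throughput from the same silicon.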
Figure: Cell's approach to multithreading
- Raising CPU performance with heterogeneous multicore
But this approach also has trade-offs. (1) Making the cores simple inevitably lowers single-thread scalar performance. (2) It is difficult to build a simple core that can process both control-heavy tasks, such as those in an OS, and stream-type processing, such as multimedia, at high speed.
The idea that emerged in response was the heterogeneous multicore.
For example, combining large CPU cores that pursue single-thread performance with small, simply structured CPU cores that pursue efficiency makes it possible to achieve highly parallel multithreading while maintaining single-thread performance. Intel's Many-core appears to take this approach, combining two to four large CPU cores with several dozen to 100 simple CPU cores. Programs that need single-thread performance run on the full-featured large CPU cores, which extend the existing architecture, so relatively high performance can be maintained with high ILP as before. New applications written for multithreading, meanwhile, are distributed across the group of simple CPU cores and can enjoy high multithreaded performance.
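The benefit of mixing one large core with many simple ones can be illustrated with a simplified Amdahl-style model. This model is not from the article: it assumes core performance scales with the square root of core area, that the serial part of a program runs on the large core alone, and that the parallel part uses every core at once. The function name and the 16-unit area budget are hypothetical.

```python
import math


def speedup(parallel_fraction, big_area, small_cores):
    """Speedup over a single unit-area core.

    big_area: area of the one large core (in units of a small core).
    small_cores: number of additional unit-area simple cores.
    """
    serial_perf = math.sqrt(big_area)            # large core runs serial code
    parallel_perf = serial_perf + small_cores    # all cores share parallel code
    serial_time = (1.0 - parallel_fraction) / serial_perf
    parallel_time = parallel_fraction / parallel_perf
    return 1.0 / (serial_time + parallel_time)


# Same 16-unit area budget, 90% of the work parallelizable (assumed values)
homogeneous = speedup(0.9, big_area=1, small_cores=15)    # 16 identical cores
heterogeneous = speedup(0.9, big_area=4, small_cores=12)  # 1 large + 12 small
```

Under these assumptions the heterogeneous layout comes out ahead (about 8.75 against 6.4), because the large core shortens the serial portion that the simple cores cannot help with. Different workloads and area splits shift the balance.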
On the other hand, there is also the approach, as in Cell, of dividing the CPU cores into control-type cores and data-type cores. As already explained, Cell achieves high performance while staying small by specializing each CPU core.
Incidentally, because the PPE is described as compatible with the POWER instruction set, it may give the impression of a rich core that pursues single-thread performance, but that is not actually the case.
An Intel source says: "We had known the outline of Cell since before the ISSCC announcement and had studied it internally. We do not think Cell's PowerPC core is a CPU core with particularly high single-thread performance. It is fundamentally different from the Many-core we have in mind."
The PPE is not as simple as an SPE, but it is a relatively simple CPU core. It is an in-order execution CPU and does not use techniques such as register renaming. Compared with leading-edge CPUs such as the PowerPC 970, its control logic is simplified. In Cell, the PPE is responsible for running the OS and for control tasks such as assigning work to the SPEs. In other words, its processing power is also consumed by controlling the SPEs. Unlike Intel's design, this core is not meant to run existing applications that demand single-thread performance.
This difference presumably stems from the different positions of Intel, which is bound to carry forward existing software assets, and Cell, which starts from zero. Cell could adopt radical changes because it presupposes rebuilding the software ecosystem from zero as well. This looks set to become a major strength of Cell. That said, even Cell might eventually adopt a configuration with a richer control CPU core that raises single-thread performance. Intel, for its part, will need considerable support on the software side to extract performance from a heterogeneous multicore.