* Hiroshige Goto's Weekly Overseas News *
The true nature of "Cell", the heart of SCEI's PlayStation 3
- Processing performance of 1 TFLOPS
- The overwhelmingly powerful Cell processor
The "Cell" CPU of the PlayStation 3, expected in 2005, is presumed to carry probably 32 sub-processors on one chip and to be able to operate on up to 1,024 32-bit data items in parallel using 256 arithmetic units. Its peak floating-point performance will probably reach 1 TFLOPS (Floating Operations per Second), on par with a supercomputer of a short while ago. In arithmetic performance, it is thought to far surpass x86 CPUs of the same period, such as Intel's "Nehalem". However, the chip's transistor count will also probably rise sharply, from several hundred million to the 1 billion level.
The PlayStation 3's graphics chip is probably also based on Cell technology. The Cell's external memory is expected to be Rambus's "Yellowstone" DRAM, and the interconnect between chips is expected to use Rambus's "Redwood" technology. In addition, the system is presumed to carry a compatibility chip that integrates the PlayStation 2's core chipset, the "Emotion Engine" and the "Graphics Synthesizer", into one chip, along with an I/O chip (excluding the network).
Sony Computer Entertainment (SCEI) has still revealed nothing about the technical outline of the next-generation PlayStation. However, several of the patents that SCEI has filed in Japan and the United States relating to the Cell have begun to be published, and from their contents the outline of PlayStation 3, and of the Cell network that surrounds it, is becoming clear.
According to the patent documents, the Cell is an extremely scalable architecture. As has been said before, it can cover a wide range beyond PlayStation 3, from PDAs to servers. It is also an architecture that allows Cells of different configurations to be built for each use.
For that reason, it is difficult to pin down the configuration of the Cell for PlayStation 3 completely from the patent documents. However, if we suppose that the configuration the patents describe as ideal, in other words the preferred embodiment, is the one for PlayStation 3, many parts fit together consistently. So here we will make that assumption and try to infer the structure of PlayStation 3.
- The Cell architecture, like a matryoshka doll
The hardware hallmark of the Cell processor architecture is 'nesting'. Like a matryoshka doll, processors are housed inside one another in a layered structure.
The basis of the Cell is a CPU core called the "Processor Element (PE)". This is the smallest unit that can operate on its own. In the patent documents' preferred example of a Cell processor, four PEs are included on one chip. It is highly likely that the PlayStation 3's Cell has this structure too. In ordinary CPU terms, you can think of it as a multi-core configuration carrying four CPU cores. Of course, a Cell processor with a single PE is also conceivable, for portable devices and the like. The PEs inside the Cell are connected by a bus called the "Broadband Engine (BE) Bus".
Where the Cell differs from an ordinary CPU is that each PE additionally carries multiple subordinate processor units called "Attached Processing Units (APUs)". Besides these, the PE includes a "Processing Unit (PU)", which controls the APUs, and a "Direct Memory Access Controller (DMAC)", which handles memory access. The PU invokes and controls the APUs using an internal command called the "APU remote procedure call (ARPC)". In PC terms, the PE can be seen as a thread-parallel CPU in which each APU processes an individual thread, with the PU acting as the thread scheduling unit.
The number of APUs built into one PE is not fixed. However, the patent documents' example states that it is desirable to build eight APUs into one PE. It is highly likely that the PlayStation 3's Cell has this structure too. Configurations with four or six APUs are also conceivable, depending on the device. The PU, the DMAC, and the APUs inside the PE are connected by a bus called the "Local PE bus".
Each APU contains multiple arithmetic units. In the patent documents' example, it carries four floating-point units and four integer units. Both are preferably SIMD (Single Instruction, Multiple Data) processing units. In addition, each APU carries 128 128-bit registers (shared between floating point and integer?) and 128 KB of local memory.
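The nesting described above can be sketched as a small data model. This is only an illustration: the class and function names (APU, PE, Cell, build_cell) are hypothetical, and the default counts are the preferred-example figures quoted from the patent documents, not a confirmed PlayStation 3 configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class APU:
    """Attached Processing Unit, with the preferred-example figures."""
    float_units: int = 4        # floating-point arithmetic units
    integer_units: int = 4      # integer arithmetic units
    registers: int = 128        # number of 128-bit registers
    local_memory_kb: int = 128  # local memory in KB


@dataclass
class PE:
    """Processor Element: one PU and one DMAC plus several APUs."""
    apus: List[APU] = field(default_factory=list)
    has_pu: bool = True
    has_dmac: bool = True


@dataclass
class Cell:
    """A Cell chip: several PEs joined by the BE bus."""
    pes: List[PE] = field(default_factory=list)

    def apu_count(self) -> int:
        return sum(len(pe.apus) for pe in self.pes)


def build_cell(pe_count: int = 4, apus_per_pe: int = 8) -> Cell:
    """Build a Cell with the given (assumed) PE and APU counts."""
    return Cell(pes=[PE(apus=[APU() for _ in range(apus_per_pe)])
                     for _ in range(pe_count)])


preferred = build_cell()        # 4 PEs x 8 APUs
portable = build_cell(1, 4)     # a smaller variant, purely illustrative
print(preferred.apu_count(), portable.apu_count())
```

Changing the two arguments of build_cell is enough to model the smaller variants the text mentions, such as a single-PE Cell or a PE with four or six APUs.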
Incidentally, the bus between the arithmetic unit group and the registers is 384 bits wide from the registers to the arithmetic units and 128 bits wide from the arithmetic units to the registers. In other words, the four arithmetic units can perform three reads plus one write in parallel against the 128-bit registers. This is the same for the floating-point unit group and the integer unit group.
Also, from the fact that both the bus width and the register width are 128 bits, we can see that the APU assumes 128-bit-wide SIMD for both floating point and integer. Typically that means 32 bits × 4; in other words, for floating point it is presumed to process four single-precision values with SIMD. Put simply, it is the same type of unit as the SSE2 unit of an x86 CPU or the programmable shader of a GPU.
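As a rough illustration of what "32 bits × 4" in a 128-bit register means, the sketch below packs four single-precision values into 16 bytes and applies one operation to all four lanes. The multiply-add operation is an assumption chosen only because it needs three source operands and one result, matching the three-read, one-write bus widths; the patent text quoted here does not name the instruction, and all function names are hypothetical.

```python
import struct

REGISTER_BITS = 128
LANE_BITS = 32
LANES = REGISTER_BITS // LANE_BITS      # 4 single-precision lanes

READ_BUS_BITS = 3 * REGISTER_BITS       # three source registers -> 384 bits
WRITE_BUS_BITS = 1 * REGISTER_BITS      # one result register   -> 128 bits


def pack_register(values):
    """Pack four floats into one 16-byte (128-bit) register image."""
    if len(values) != LANES:
        raise ValueError("a 128-bit register holds exactly four 32-bit lanes")
    return struct.pack("<4f", *values)


def unpack_register(raw):
    """Unpack a 16-byte register image into four floats."""
    return list(struct.unpack("<4f", raw))


def simd_multiply_add(a, b, c):
    """Assumed three-operand SIMD op: a * b + c, lane by lane."""
    va, vb, vc = unpack_register(a), unpack_register(b), unpack_register(c)
    return pack_register([x * y + z for x, y, z in zip(va, vb, vc)])


a = pack_register([1.0, 2.0, 3.0, 4.0])
b = pack_register([2.0, 2.0, 2.0, 2.0])
c = pack_register([0.5, 0.5, 0.5, 0.5])
print(unpack_register(simd_multiply_add(a, b, c)))  # [2.5, 4.5, 6.5, 8.5]
```

One instruction reads three registers and writes one, and each register carries four values, which is where the 384-bit and 128-bit bus widths come from.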
[Figure: Processor Element (PE)]
[Figure: PlayStation 3 Block Diagram]
[Figure: Attached Processing Unit (APU)]
[Figure: PlayStation 3 Main Chip]
[Figure: PlayStation 3 Main Chipset]
- The Cell performs 512 operations in parallel
So, if we suppose that the PlayStation 3's Cell has the same configuration as the patent documents' preferred example, what will its processing performance be? First, let us look at the degree of arithmetic parallelism.
Assume that each Cell consists of four PEs, that each PE has eight APUs, and that each APU has four floating-point units and four integer units, each of which can normally process four data items with SIMD. The result is as follows.
PEs per Cell                                             4
APUs per Cell                                           32
Floating-point units per Cell                          128
Integer units per Cell                                 128
32-bit floating-point operations in parallel per Cell  512
32-bit integer operations in parallel per Cell         512
The calculation is 4 data items × 4 arithmetic units × 8 APUs × 4 PEs = 512. The Cell can therefore perform at most 512 32-bit floating-point operations and 512 32-bit integer operations simultaneously per clock.
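The same count can be reproduced with a few lines of arithmetic. The function name and parameter names below are hypothetical; the default values are the preferred-example figures assumed in the text, and the result applies to the floating-point units and the integer units separately.

```python
def cell_parallelism(pe_count=4, apus_per_pe=8, units_per_apu=4, lanes_per_unit=4):
    """Count APUs, arithmetic units of one type, and 32-bit operations per clock."""
    apus = pe_count * apus_per_pe
    units = apus * units_per_apu      # floating-point OR integer units
    lanes = units * lanes_per_unit    # 32-bit operations per clock
    return {"apus": apus, "units": units, "lanes": lanes}


print(cell_parallelism())
# {'apus': 32, 'units': 128, 'lanes': 512}
```

Counting floating point and integer together gives 256 arithmetic units and 1,024 32-bit data items per clock.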
So what operating frequency is the Cell targeting? According to the patent documents, a performance of 32 GFLOPS is considered desirable for the Cell's floating-point units. The Japanese patent document can be read as though this were the performance of a single floating-point unit, but the US document uses the plural, which makes it clear that 32 GFLOPS is the total for the four units. Working backward from that (32 GFLOPS across four units, each processing four values per clock), the Cell's expected operating frequency comes out to 2 GHz.
So what performance results from processing 512 data items in parallel at 2 GHz? At 512 × 2 GHz, peak floating-point performance comes to 1 TFLOPS. Integer arithmetic is the same. Incidentally, SCEI's earlier announcement materials stated that "a single Cell achieves TFLOPS-class arithmetic performance". In other words, that figure matches the performance of a Cell with the patent documents' configuration exactly. This too suggests that the PlayStation 3's Cell is similar to the patent documents' preferred configuration.
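The back-calculation of the clock and the resulting peak figure can be checked as follows. The sketch assumes one floating-point operation per SIMD lane per clock, which is how the article counts; the function names are hypothetical.

```python
def implied_clock_hz(flops_per_apu=32e9, units_per_apu=4, lanes_per_unit=4):
    """Clock implied by a per-APU FLOPS target, at one operation per lane per clock."""
    return flops_per_apu / (units_per_apu * lanes_per_unit)


def peak_flops(parallel_ops=512, clock_hz=2e9):
    """Peak throughput: operations per clock times clocks per second."""
    return parallel_ops * clock_hz


clock = implied_clock_hz()
print(clock / 1e9, "GHz")                                # 2.0 GHz
print(peak_flops(clock_hz=clock) / 1e12, "TFLOPS")       # 1.024 TFLOPS
```

The exact product is 1.024 TFLOPS, which the article rounds to 1 TFLOPS.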
Beyond this, a surprising number of other things about the Cell can be inferred from the patent documents. The Cell is packed with further interesting architectural features, such as its software object structure, an on-chip network interface, and DRAM control that matches memory banks to APUs. It is close to a parade of unexpected architecture.
However, there is still much that cannot be learned from the patents. For example, the patent documents alone do not tell us whether the Cell's main memory is embedded DRAM or external DRAM. Still, there are several good reasons to believe that it is external and that Rambus's Yellowstone DRAM is used. We intend to keep reporting on these other aspects of the PlayStation 3 architecture in future installments.
(May 29, 2003)
[Reported by Hiroshige Goto]
This is the translation of the article...
Courtesy of Babelfish...