PS3's CPU, GPU, RAM and eDRAM configuration?

So which option would be good for PS3, A, B, C or D?

  • B
  • C
  • D
  • Other

  Total voters: 71
Jaws said:
Shinjisan said:
Just a question for you tech experts.

Each Cell has two XDR channels, which support up to 4 DRAM chips.
Is it possible to have just two DRAM chips, one per channel?

Sorry, I don't really understand the question...

XIO memory controller = 2 XDR interfaces

XDR interface = 2 Channels

1 Channel = 16 bit

Therefore XIO controller = 4 channels * 16 bit = 4 * XDR RAM chips @ 64 bit bus

I suppose they can vary these...but this is what Rambus states, IIRC...

Thank you, it's all clear now.
I didn't know each interface has two channels, so I got a bit confused.
So each Cell must be connected to 4 XDR chips for a total bandwidth of 25.6 GB/s.
So, for example, we could have a 65nm 2-Cell CPU with 512MB (256MB per Cell using 512Mbit chips) and a total bandwidth of 51.2 GB/s.
Right?
And if they're really using 512Mbit chips, this automatically excludes a 4-Cell configuration, since it would require 1 GB of RAM.
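A quick sanity check of that arithmetic (a Python sketch; the 4-chips-per-Cell and 25.6 GB/s figures are the ones quoted in this thread, not confirmed specs):

```python
# Per-Cell figures quoted above: 4 XDR chips per XIO controller,
# 512 Mbit per chip, 25.6 GB/s per 64-bit XDR bus.
CHIPS_PER_CELL = 4
CHIP_MBIT = 512
BW_PER_CELL_GB_S = 25.6

def config(cells):
    """Total RAM (MB) and peak bandwidth (GB/s) for a given Cell count."""
    ram_mb = cells * CHIPS_PER_CELL * CHIP_MBIT // 8  # Mbit -> MByte
    return ram_mb, cells * BW_PER_CELL_GB_S

print(config(1))  # 1 Cell:  256 MB, 25.6 GB/s
print(config(2))  # 2 Cells: 512 MB, 51.2 GB/s
print(config(4))  # 4 Cells would already need 1 GB of RAM
```

So the two-Cell total is 51.2 GB/s rather than a round 50, and a four-Cell part on 512Mbit chips would indeed require 1 GB.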
 
Any chance of getting only eDRAM (e.g. 256MB) in any console? Prices go down fast, it would give bandwidth, and there'd be no need to buy external memory; the biggest problem would be production, right?

In the beginning it would be expensive, but later...
 
I'm just hoping Xbox 2 and PS3 both ship with 512, but from what I'm hearing, both are looking like 256. XDR is pricier than GDDR, and you don't see any 512MB video cards yet on the PC.

Cell chip+BR drive+Nvidia GPU+512 Rambus= Loan Application. (Unless it's a late 2006/early 2007 launch, in which case Xenon would have sold a lot of units.)

I'm guessing Sony is gonna try to launch ASAP, and maybe a little further ahead hardware-wise than they want everyone to believe. Look for an early 2006 launch with 256. But that's a guess. I think Sony's plan is 6 months of hype machine to scare off Xenon buyers, then a quick launch. Similar to the Dreamcast strategy, and it may work unless MS has some great software hidden away. I don't think Perfect Dark Zero and a bunch of higher-res EA ports are gonna do it alone.
 
I mean the PS3's CPU has 256 MB of RAM.

The GPU will be similar to Cell: possibly 1 PPE (control, API), some (4) SPEs, and 4-8 render processors, each with 4-16MB of eDRAM, plus 256 MB of RAM.
In the Sony patent, APU = SPE or RPE (render processor).

This is my speculation; what do you think?
 
version said:
I mean the PS3's CPU has 256 MB of RAM.

The GPU will be similar to Cell: possibly 1 PPE (control, API), some (4) SPEs, and 4-8 render processors, each with 4-16MB of eDRAM, plus 256 MB of RAM.
In the Sony patent, APU = SPE or RPE (render processor).

This is my speculation; what do you think?

I think the only similarity the GPU will have with Cell is the multicore approach.
Considering what they have done with SLI, I expect Nvidia's next generation to feature multicore support.
So we could have a two-core GPU, each core with, let's say, 24 pipelines.
The GPU will only have shaders handling lighting, since all the vertex transform calculations will be performed by the Cell, which is really efficient at this kind of thing. The vertex shaders' place will be taken by embedded RAM (I hope 64MB).
I also think and hope that Sony will go for a 65nm 2-Cell CPU with 512MB of main RAM.
Honestly, 256MB and 256 Gigaflops is not that impressive; isn't the ATI R5XX supposed to have more than 200 Gigaflops of power? And that's only the GPU for T&L.
If Sony wants to amaze, surely it won't do it with a single Cell processor and 256MB of main RAM.
 
Sure, the GPU has a FlexIO to communicate with the CPU, and an XIO to communicate with XDR.
What bus inside the GPU connects the FlexIO and XIO? 100GB/s? EIB, maybe.

Logically correct?
 
version said:
Sure, the GPU has a FlexIO to communicate with the CPU, and an XIO to communicate with XDR.
What bus inside the GPU connects the FlexIO and XIO? 100GB/s? EIB, maybe.

Logically correct?

The CPU and GPU will communicate over FlexIO, which is a 76.8 GB/s interface (though I don't know if that doubles if they use, for example, 2 Cells in the CPU and 2 cores in the GPU). It remains to be seen how the GPU will interface with XDR RAM: whether it can directly access main RAM, or has its own XDR pool.
 
Acert93 said:
Total guess...

F.

CPU => 256MB
GPU => ~40MB eDRAM

1 CELL (1 : 8) and 1 GPU (~600MHz, 16x1 or 16x2; I would be shocked at something over 32 unless there is no eDRAM). CELL SPEs do vertex shading (although I would think the GPU would be more efficient at this per transistor) and the GPU will be able to read from the CPU cache. The GPU will focus mainly on pixel-related functions. Possibly a slower secondary pool of RAM (similar to the GCN).

Again just a guess (not what I want... that would be 512MB of shared XDR, 2 CELLS or even 4 1:4 CELLS, and a massive GPU... a BR cherry on top).
...

Sorry, I missed this earlier, please could you clarify how this is a new 'F.' and not the existing 'C.'?
 
Sorry, I missed this earlier, please could you clarify how this is a new 'F.' and not the existing 'C.'?

C. has 512MB, not 256MB. ;)

After reading a link someone posted today, it seems there is a chance the GPU will have an XDR memory controller. If this is the case my guess is wrong on the GPU. Without knowing the cost of XDR, I would take a stab and guess it would have 128MB (4 x 256Mbit chips). This amount would not be a great issue due to a fast link to the CELL/main memory. But it is a guess... it really depends on what Sony/nVidia decide is a better use of transistors. Do they dedicate more transistors to logic, or to memory with eDRAM?
 
Shinjisan said:
Jaws said:
Shinjisan said:
Just a question for you tech experts.

Each Cell has two XDR channels, which support up to 4 DRAM chips.
Is it possible to have just two DRAM chips, one per channel?

Sorry, I don't really understand the question...

XIO memory controller = 2 XDR interfaces

XDR interface = 2 Channels

1 Channel = 16 bit

Therefore XIO controller = 4 channels * 16 bit = 4 * XDR RAM chips @ 64 bit bus

I suppose they can vary these...but this is what Rambus states, IIRC...

Thank you, it's all clear now.
I didn't know each interface has two channels, so I got a bit confused.
So each Cell must be connected to 4 XDR chips for a total bandwidth of 25.6 GB/s.
So, for example, we could have a 65nm 2-Cell CPU with 512MB (256MB per Cell using 512Mbit chips) and a total bandwidth of 51.2 GB/s.
Right?
And if they're really using 512Mbit chips, this automatically excludes a 4-Cell configuration, since it would require 1 GB of RAM.

No problem; in fact I also need further clarification for my own benefit here (please correct me if wrong). Referring to the RWT article,

cell-10.gif


http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318&p=11

The RWT diagram shows 4 XDR DRAM chips in a 2-channel, 2*2 config....(with 1 XDR interface)

What I showed above, in my calc, was a 4 XDR DRAM in a 4-channel, 1*4 config...


You should be able to scale with a simple rule of thumb,

Code:
'X' chips in an 'N'-channel, 'M*N' config....

where integers,

X = M*N

and

memory bus width B,

B = X * 16 bit

Obviously there will be absolute limits on X and N...

So a CELL processor isn't limited to 4 XDR chips, and therefore not limited to 256 MByte on a 64-bit bus at ~25.6 GByte/s (4 * 512Mbit chips @ 3.2 GHz). You can increase bandwidth to a CELL processor by:

1. Increasing X, the number of XDR RAM chips
2. Increasing the frequency (currently 3.2 GHz, but it can be up to 6.4 GHz)
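The rule of thumb above can be written out (a sketch; it assumes 16 bits per chip and treats the XDR data rate as GHz-equivalent per pin, as in the 25.6 GB/s figure quoted earlier):

```python
# Peak XDR bandwidth from chip count and per-pin data rate.
# Each XDR DRAM chip contributes a 16-bit slice of the bus (B = X * 16 bit).
def xdr_bandwidth_gb_s(chips, data_rate_ghz=3.2):
    bus_bits = chips * 16                 # total bus width in bits
    return bus_bits / 8 * data_rate_ghz   # bytes per transfer * GHz = GB/s

print(xdr_bandwidth_gb_s(4))        # 4 chips @ 3.2 GHz: 25.6 GB/s
print(xdr_bandwidth_gb_s(8))        # option 1: more chips
print(xdr_bandwidth_gb_s(4, 6.4))   # option 2: higher frequency
```

Either scaling option from the list doubles the 25.6 GB/s baseline to 51.2 GB/s.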
 
Acert93 said:
Sorry, I missed this earlier, please could you clarify how this is a new 'F.' and not the existing 'C.'?

C. has 512MB, not 256MB. ;)
...

Read my assumptions again from first page. ;)

Jaws said:
2. I've chosen a total 512MB RAM here but you can choose what you feel is expected...

So how is it still not C?

Acert93 said:
...
After reading a link someone posted today it seems there is a chance the GPU will have an XDR memory controller.
...

Do you have that link please?
 
DeanoC said:
http://www.beyond3d.com/forum/viewtopic.php?p=462030#462030

Also pipelines in the way most people talk about them don't exist...

I do agree with the above. The thing is that both CELL and future GPUs are trying to meet in the 'middle', so to speak, of general and specialised processing.

So which side of the fence is CELL on? It will always be on the general side of the fence, IMO, and the GPU always on the specialised side. And no one can actually sit on that proverbial fence! :p

A question...from what we know of CELL so far, would it be better at pixel or vertex processing, in your opinion?
 
Jaws said:
A question...from what we know of CELL so far, would it be better at pixel or vertex processing, in your opinion?

As a vertex shader, SPUs will be fairly good. As I've mentioned before, except for the limited vertex texture capabilities, they have most of what's required.

If I had to write a standard vertex pipe today (knowing very little low-level detail about the SPU), I'd suggest an architecture similar to either the current VU1 or an SSE-type pipe. The VU1-type pipe either ignores vertex caching or keeps a small in-place cache; every N triangles they are kicked to the card. An SSE-type pipe block-transforms a bank of vertices, then either de-indexes manually or relies on the video card to do it.

As pixel shaders (with perhaps an exception for totally procedural shaders), SPUs just aren't up to the job. Basic texture capability is the major problem, but also the lack of things like free LERPs, swizzles and a separate scalar unit.
 
DeanoC said:
Jaws said:
A question...from what we know of CELL so far, would it be better at pixel or vertex processing, in your opinion?

As a vertex shader, SPUs will be fairly good. As I've mentioned before, except for the limited vertex texture capabilities, they have most of what's required.

If I had to write a standard vertex pipe today (knowing very little low-level detail about the SPU), I'd suggest an architecture similar to either the current VU1 or an SSE-type pipe. The VU1-type pipe either ignores vertex caching or keeps a small in-place cache; every N triangles they are kicked to the card. An SSE-type pipe block-transforms a bank of vertices, then either de-indexes manually or relies on the video card to do it.

As pixel shaders (with perhaps an exception for totally procedural shaders), SPUs just aren't up to the job. Basic texture capability is the major problem, but also the lack of things like free LERPs, swizzles and a separate scalar unit.

Thanks Deano... would the introduction of this EIB ring bus allow you to approach either pipeline differently now, perhaps?

Also on a side note, a theory:

GS = 48 GB/s bandwidth

Therefore, in order to emulate the PS2, XDR RAM bandwidth > 48 GB/s, true?
 
Jaws said:
Acert93 said:
Sorry, I missed this earlier, please could you clarify how this is a new 'F.' and not the existing 'C.'?

C. has 512MB, not 256MB. ;)
...

Read my assumptions again from first page. ;)

Jaws said:
2. I've chosen a total 512MB RAM here but you can choose what you feel is expected...

So how is it still not C?

Well, my idea was that there may be an external memory pool, which does not follow C and is "other" ;)

The other reason is that I did see your note about assumptions... The thread title is on PS3 configurations; your poll is extremely limiting and makes (too) many assumptions, even if you take into consideration the escape clause. E.g. stating there is 512MB of RAM (even if it is a proportional deal) indicates 2 CELLs, not 1, which I outlined in my thoughts. I did not feel right voting in a poll at face value for a configuration that I do not think will appear, even with an escape clause ;) I thought it best just to list what I think will appear. Secondly, I think there is a chance of a backup pool of memory, so C did not work anyhow. While I did read your exception, I think it would have been best to do something like this if you wanted certain assumptions to remain neutral (and not imply things other than more assumptions!) and focus on the general configuration throughput:

[CPU(s)] <===> [GPU<>eDRAM] -----> Output
|
[MEMORY]

By putting specific memory amounts in your diagrams, certain assumptions were intrinsically implied on top of the other assumptions: the number of CELLs (memory amounts could indicate minimum numbers, e.g. 512MB requires at least 2 CELLs), memory amounts (who is to say it's not 2 CELLs with 256MB of memory?), proportions of memory (e.g. in B the GPU and CPU have the same proportions of memory), and no alternative for alternative memory pools.

Just way too many assumptions! :) But we can all assume this: I have very little clue what the final specs will be. Not trying to give you a hard time or mess up your poll. Just assume I fall under 'other' because of the memory pool, and all will be fine ;)

Acert93 said:
...
After reading a link someone posted today it seems there is a chance the GPU will have an XDR memory controller.
...

Do you have that link please?

http://www.xbitlabs.com/news/multimedia/display/20041228125957.html
 
Acert93 said:
Jaws said:
Acert93 said:
Sorry, I missed this earlier, please could you clarify how this is a new 'F.' and not the existing 'C.'?

C. has 512MB, not 256MB. ;)
...

Read my assumptions again from first page. ;)

Jaws said:
2. I've chosen a total 512MB RAM here but you can choose what you feel is expected...

So how is it still not C?

Well, my idea was that there may be an external memory pool, which does not follow C and is "other" ;)

Thanks for explaining 'other' and why it was chosen. :)

Now I want to clarify a few things, for the benefit of the poll...

Would you have chosen 'C' if it allowed for multiple 'CELLs', irrespective of the memory amount of those CELLs?

If true, the poll already allows for this because Phil raised this point on the first page,

http://www.beyond3d.com/forum/viewtopic.php?p=461235#461235

[CPU] != 1 CELL

But I can see why you thought this as will be explained below...

---------

Okay, I want to answer some of your points, as they are valid. I've deliberately offered A, B, C and D as presented because I'm relying on people to tell me otherwise in a poll, as you've done yourself. And of course, if you don't agree with the options then you should always choose 'other', with an explanation if possible. Now, if the options presented were way off or not clear, then hopefully I would've got lots of posters saying so, or the 'other' option would be a high percentage. In this case, 'D' seems to be a clear majority! :)

Also you mention assumptions but I want to clear some of those up...

Acert93 said:
...
The other reason is that I did see your note about assumptions... The thread title is on PS3 configurations; your poll is extremely limiting and makes (too) many assumptions, even if you take into consideration the escape clause. E.g. stating there is 512MB of RAM (even if it is a proportional deal) indicates 2 CELLs, not 1, which I outlined in my thoughts. I did not feel right voting in a poll at face value for a configuration that I do not think will appear, even with an escape clause ;) I thought it best just to list what I think will appear.
....

As asked by Phil and linked above, multiple CELLs are allowed in the poll.

A 512MB memory pool does not imply 2 CELLs. That's YOUR assumption not MINE. I did not assume this but it seems it's a common assumption which I wasn't trying to impose.

[256MB] != 1 CELL

Here's an explanation why,

http://www.beyond3d.com/forum/viewtopic.php?p=462167#462167

You can make this assumption if you want but I'm not forcing you.

Acert93 said:
...
Secondly, I think there is a chance of a backup pool of memory, so C did not work anyhow.
...

When you say 'backup pool', do you mean XDR DRAM coming off the GPU?

Acert93 said:
...
While I did read your exception, I think it would have been best to do something like this if you wanted certain assumptions to remain neutral (and not imply things other than more assumptions!) and focus on the general configuration throughput:

[CPU(s)] <===> [GPU<>eDRAM] -----> Output
|
[MEMORY]
...

Multiple CELLs are allowed, as explained earlier. I deliberately avoided this method because it's already an option as C, and focusing only on the above would exclude NUMA configs as in B and D, or would be difficult to present in the poll.

Acert93 said:
...
By putting specific memory amounts in your diagrams, certain assumptions were intrinsically implied on top of the other assumptions: the number of CELLs (memory amounts could indicate minimum numbers, e.g. 512MB requires at least 2 CELLs), memory amounts (who is to say it's not 2 CELLs with 256MB of memory?), proportions of memory (e.g. in B the GPU and CPU have the same proportions of memory), and no alternative for alternative memory pools.
...

Two CELLs != 512 MB as mentioned above.

I've explicitly chosen memory configs and the 'equal ratio' assumption, but allowed you to choose your own.

E.g. for A and C, 512MB, but you can choose your own amount. For B and D, 256MB for the CPU and 256MB for the GPU as an equal ratio, but you can choose your amount. If you don't agree with this equal ratio, then the 'other' option is valid.


Acert93 said:
...
Just way too many assumptions! :)
...

Nope...IMHO, you're the one making more assumptions as stated above... ;)

Acert93 said:
...
But we can all assume this: I have very little clue what the final specs will be. Not trying to give you a hard time or mess up your poll. Just assume I fall under other because of the memory pool and all will be fine ;)
...

I know, and that's the reason for the poll: to 'see' what B3D thinks! ;)
Also, I'm not trying to get specific specs, but a likely architecture, i.e. NUMA or UMA, eDRAM or no eDRAM, etc.

Acert93 said:
...
After reading a link someone posted today it seems there is a chance the GPU will have an XDR memory controller.
...

Do you have that link please?

http://www.xbitlabs.com/news/multimedia/display/20041228125957.html

Thanks...
 
The PS3 is not shipping with 512MB of XDR RAM! Along with a Blu-ray player to boot? Wishful thinking, gents; as if cost to Sony is somehow a negligible issue now, when they're already operating in debt.

Cell chip+BR drive+Nvidia GPU+512 Rambus= Loan Application.

Exactly, someone's confused Sony for MS equivalent type expenditures.

If Sony wants to amaze, surely it won't do it with a single Cell processor and 256MB of main RAM.

Why is that, exactly? They've already got the Xbox 360 trumped, and the PS3 is not their only concern, with the implementation of Cell still having to be introduced to much of their product line.

A)

Code:
[CPU]<==>[GPU]---> Output
|
[256MB]

With whatever accessible e-DRAM pool strictly for the GPU obviously.
 
*Hiroshige Goto's Weekly Overseas News*
The new CPU trend: heterogeneous multicore




--------------------------------------------------------------------------------

- Heterogeneous multicore as the trend

CPU technology trends in the multicore age

The Cell processor is a multicore CPU carrying nine CPU cores. The most important of these are the eight "SPEs (Synergistic Processor Elements)", the processor group in charge of computation. An SPE is a SIMD (Single Instruction, Multiple Data) processor, able to process multiple data elements with a single instruction, and it shows powerful performance on stream-type processing.

Each SPE contains 256KB of memory called the Local Store. This is not an L2 cache but what is generally known as a scratchpad: memory the programmer can manage explicitly and freely. The Local Store forms the SPE's own local memory space; SPE load/store instructions execute within this local address space, and when an SPE accesses system memory, it does so via DMA.

Program execution on an SPE takes the form of DMA-transferring the data and program into the Local Store. Cell's basic design concept is the "software cell": a program fragment packaged in object form together with its data. For that reason, programs are apparently expected to fit within the 256KB from the outset.

Unlike a cache, the Local Store is not kept coherent. Cell therefore does not always need to maintain cache coherency across the whole chip, which is an important factor in reducing multicore overhead.

Within Cell, the elements are connected by a wide-band ring bus called the Element Interconnect Bus (EIB). SPEs access system memory and other resources over the EIB via DMA transfers, and up to 16 DMA requests can be in flight at once.




- Equipped with a special mode that protects content


The importance of the Cell processor is not simply that it is a new CPU from the Sony Group + IBM + Toshiba that will also be loaded into PlayStation 3. Architecturally, what matters most is that Cell anticipates a new CPU technology trend: the "heterogeneous" (mixed-type) multicore.

A heterogeneous multicore CPU carries multiple CPU cores of differing types. This differs greatly from today's mainstream multicore CPUs, which carry multiple CPU cores of the same architecture. And among general-purpose CPUs aimed at the mass computing market, Cell is at present the only heterogeneous multicore CPU.

That said, "heterogeneous multicore" is not yet a settled name; there is still no established term for asymmetric multicore CPUs like Cell. But because the recently emerging term "heterogeneous multicore" captures the essence well, we will use it here. By contrast, CPUs like Intel's "Pentium 4 8xx (Smithfield)" and IBM's POWER4/5, which arrange CPU cores of the same type symmetrically, are "homogeneous" multicores.

Heterogeneous multicore CPUs have suddenly begun to attract attention. It is not just that Cell appeared: various researchers and developers have suddenly begun talking about heterogeneous multicore CPUs. Typical is the "many-core" CPU that Intel is assumed to realize several generations in the future. Intel's many-core is seen as combining large conventional scalar CPU cores with a group of small, simple (perhaps vector-focused) CPU cores, making it a typical heterogeneous multicore. The time when Intel started talking about many-core coincides with the time when parts of the Cell paper started becoming known in the industry. In effect, Intel is chasing Cell.


- Why heterogeneous multicore?

Why, then, is heterogeneous multicore becoming the CPU trend? Two reasons and directions can be identified. (1) A heterogeneous multicore can raise multithreaded performance substantially while maintaining single-thread performance. (2) Because each CPU core is optimized for its use, efficiency can be achieved that a homogeneous multicore cannot. The former is the route that Intel and others are taking; the latter is the technique Cell adopts.

Cell carries one "PPE (POWER Processor Element)", a CPU core focused on control-type processing that is meant to run the OS, and eight "SPEs (Synergistic Processor Elements)" optimized mainly for stream-type data processing.

"Cell has one control-plane processor and multiple data-plane processors. This combination can run programs very efficiently, because the SPE engine group, the data-plane processors, is optimized as far as possible. We did not design the data-plane processors to run an OS; rather, we specialized them to run applications powerfully. As a result, the data-plane processors could be made small and efficient without sacrificing performance. By specializing in application execution, a very efficient processor was created.

The control-plane processor, on the other hand, was designed to run the OS and coordinate program execution. By coordinating resources and allotting work to the data-plane processors, it can run the whole machine efficiently. The combination of these two processor types works very well," says Jim Kahle, the IBM Fellow in charge of Cell development.

The PPE supports multithreaded operation like Hyper-Threading: thread switching is fast, and it is designed for environments with frequent thread switches, such as an OS. The SPE, for its part, is optimized for SIMD (Single Instruction, Multiple Data) operation, assumes stream data, and has no elaborate cache hierarchy. SPE thread switching is heavyweight: SPEs are meant to keep processing a small number of threads, and are not meant to run an OS.

In Cell's case, rather than making every CPU core serve every use, the design differentiates into two types of optimized cores, keeping each core simple while maintaining performance. As a result, on a 90nm process, a die of about 200 square millimeters can accumulate nine cores and achieve an unprecedented 256 GFLOPS of floating-point performance at 4GHz. On the same 90nm process and the same die, Intel and AMD can load no more than two of their CPU cores.
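The 256 GFLOPS figure can be reproduced with a back-of-envelope calculation (a sketch; it assumes each SPE issues one 4-wide single-precision fused multiply-add per cycle, i.e. 8 FLOPs/cycle, and ignores the PPE's contribution):

```python
# Peak single-precision throughput of the 8 SPEs at 4 GHz.
SPES = 8
FLOPS_PER_CYCLE = 4 * 2   # 4 SIMD lanes x (multiply + add)
CLOCK_GHZ = 4.0

peak_gflops = SPES * FLOPS_PER_CYCLE * CLOCK_GHZ
print(peak_gflops)  # 256.0
```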

Rather than one CPU core that unreasonably chases two rabbits, Cell prepares two types of cores, each chasing just one rabbit. With that, Cell can perhaps catch both rabbits securely, without letting either escape.


- CPU performance hits its limits; multicore becomes necessary

In the background of heterogeneous multicore's emergence is the circumstance that single-core CPU performance improvement has hit its limits.

Until now, CPUs raised single-core performance by focusing on (1) raising the clock frequency and (2) raising IPC (instructions per cycle: the number of instructions that can be executed each cycle). Pipelines were subdivided to improve frequency; to improve IPC, out-of-order execution, which dynamically extracts instruction-level parallelism (ILP), was introduced, along with the various acceleration techniques attached to it.

As this column has explained before, performance improvement by such techniques came at the cost of CPU complexity. In today's leading-edge CPUs, the scheduling control logic for ILP improvement occupies an enormous area. Performance improvement has therefore become inefficient: when die size (the area of the semiconductor itself) increases by a factor of two, performance rises only by about the square root of the increase (roughly 1.4x). In other words, CPUs no longer obtain efficiency improvements that keep pace with "Moore's law" from one process generation to the next. As a result, the performance per watt and per die area of up-to-date CPUs has deteriorated. Intel calls this rule of thumb "Pollack's rule".
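Pollack's rule can be sketched numerically (illustrative only):

```python
# Pollack's rule of thumb: single-thread performance grows only with
# the square root of the die area spent on the core.
import math

def pollack_speedup(area_ratio):
    return math.sqrt(area_ratio)

print(pollack_speedup(2.0))   # 2x the area buys only ~1.41x performance
print(pollack_speedup(0.25))  # read in reverse: 1/4 the area keeps 1/2
```

The reverse reading is what makes many small cores attractive: shrinking a core to a quarter of the area halves its speed but quadruples how many fit.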

The reason CPU performance improvement had to continue down this inefficient path of raising single-thread scalar performance is a kind of spell: with an instruction set like x86, a huge body of software assets is held, and to raise the performance of existing applications, it is mainly single-thread scalar performance that must be increased.

But over the past several years CPU clock improvement has stalled, and with power consumption rising, more efficient performance improvement has come to be required. The CPU industry therefore turned to improving multithreaded performance. By exploiting "thread-level parallelism (TLP: Thread-Level Parallelism)" through multicore designs, CPU performance can be raised more efficiently than before.

But there was a problem here too. In order to maintain single-thread performance, Intel and AMD built their multicores by reusing the cores of their former single-core CPUs. Consequently, performance per watt and per die area is still not that good; on the current 90nm process, multicore beyond 2-way is difficult.

The solution to this problem, however, is simple: make the CPU core simple. If complicated control mechanisms are excluded, a much smaller CPU core can roughly maintain its performance. Reading Pollack's rule in reverse, shrinking a core's die area to 1/4 drops its performance only to 1/2. And because many simple cores can be loaded, multithreaded performance rises directly; a multicore CPU with good die and power efficiency can be built.


How to think about Cell's multithreading



- Raising CPU efficiency with heterogeneous multicore

But there are trade-offs in this technique too. (1) Making the core simple inevitably lowers single-thread scalar performance. (2) It is difficult to make a simple core that can process at high speed both the control-type central tasks of an OS and the like, and the stream-type processing of multimedia systems.

The idea that then emerges is the heterogeneous-type multicore.

For example, if a large CPU core that pursues single-thread performance is combined with small, simple CPU cores that pursue efficiency, highly parallel multithreaded operation can be achieved while maintaining single-thread performance. Intel's many-core appears to take this method, combining 2-4 large CPU cores with several dozen to 100 simple CPU cores. Programs that need single-thread performance can run with full function on the large core, which is an extension of existing architecture, so existing software can maintain relatively high performance through high ILP. Meanwhile, new applications converted to multithreaded operation are allotted to the group of simple CPU cores and can enjoy high multithreaded performance.

On the other hand, like Cell, there is also the approach of dividing the CPU cores into control-type cores and data-type cores. As already explained, Cell achieves high performance by making each CPU core specialize while staying small.

Incidentally, from the words "POWER instruction set compatibility", Cell's PPE may give the impression of a rich core that pursues single-thread performance, but that is not really so.

A certain Intel official, who had grasped the outline of Cell before the ISSCC announcement, says: "We had researched it inside the company as well. We do not think the PowerPC core of Cell is a CPU core with that much single-thread performance. It is essentially different from the many-core we envision."

The PPE is not on the level of an SPE, but it is a relatively simple CPU core. It is an in-order execution CPU and does not use techniques like register renaming. Compared with a leading-edge CPU such as the PowerPC 970, its control logic is simplified. In Cell, the PPE takes charge of running the OS and of control over the SPEs, such as allotting tasks; in other words, processor power is also spent on SPE control. Unlike Intel's approach, it is not assumed that existing applications requiring single-thread performance will run on this core.

This difference presumably comes from the difference in standpoint between Intel, which is bound to continuity with its existing software assets, and Cell, which started from zero. Cell could take as a prerequisite the radical reform of rebuilding even the software ecosystem from zero, and this looks like becoming a big strength of Cell. Though, perhaps even Cell could raise single-thread performance by making the control core richer. Conversely, for Intel to extract performance from a heterogeneous multicore, considerable support on the software side will become necessary.
 