So there's no way the PS3 CPU is getting more than one PE per Cell?

iapetus said:
Assuming the multiple units would be on a single chip, the price would more than double. Should still be within an order of magnitude for the chip itself, though, let alone the full system.


Well, I'm not sure why going from 1 to 2 PEs on the same chip should cost more than double. I could be wrong, but if anything, given economies of scale it should cost marginally less, depending on how many units they fab. As I said, it's just what I was thinking; I could be very wrong :D
 
Because not only are you doubling the cost of the raw resource, but you're also increasing the likelihood of a given chip containing a defect. That's what I'm led to believe, anyway. I may be totally wrong, though.

Either way, no orders of magnitude are involved. :)
 
PC-Engine said:
Whether the BE contains 1 PE or 2 PEs, PS3 would still be losing money. With 2 PEs it would be losing orders of magnitude more than a single PE.

You say more is often better, but later your explanation doesn't support that hypothesis.

Performance-wise, my friend, performance-wise...

Besides, at 65nm the chip is going to be smaller, as has been stated by others, and they've got bigger wafers (which, IIRC, allow for either more chips of a given size or the same number of larger chips).

I mean, GPUs have about that number of transistors on a 130nm process, and they're expected to transition to smaller processes relatively well (while NVIDIA and ATI don't even benefit from having their own fabs), that is, increasing performance (which some say would be even higher than 256 GFLOPS at 65nm and below). A highly parallel and scalable architecture like Cell being unable to keep up with them is kinda strange, especially since it's expected to be performing a similar task, at least in PS3 (VS).
 
Oh my, GI.Biz is still holding on to 'teh dream' as well

Indeed, it's widely expected that the PlayStation 3 could boast as many as four Cell chips, which would give a theoretical CPU performance of over 1000 gigaflops, or one teraflop - a very theoretical measure, admittedly, but still enough to earn the PS3 a place on the supercomputer list.
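For what it's worth, here's a back-of-the-envelope check of where that teraflop figure comes from. The per-chip number is an assumption taken from the ~256 GFLOPS figure mentioned elsewhere in this thread (8 SPUs at the 4GHz ISSCC demo clock):

```python
# Rough sanity check of the "four Cells = one teraflop" claim.
# Assumes the ~256 GFLOPS-per-Cell figure from the ISSCC coverage:
# 8 SPUs x 4-wide single precision x 2 flops per madd x 4 GHz.
spus_per_cell = 8
flops_per_spu_per_cycle = 4 * 2      # 4-wide SIMD, fused multiply-add counts as 2 flops
clock_ghz = 4.0                      # ISSCC demo clock; a retail part could differ

gflops_per_cell = spus_per_cell * flops_per_spu_per_cycle * clock_ghz
print(gflops_per_cell)               # 256.0 GFLOPS per Cell

print(4 * gflops_per_cell)           # 1024.0 GFLOPS -- the "over 1000 gigaflops" figure
```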
 
zidane1strife said:
A highly parallel and scalable architecture like Cell being unable to keep up with them is kinda strange, especially since it's expected to be performing a similar task, at least in PS3 (VS).
As I pointed out in another post, Cell isn't highly parallel or scalable compared to a GPU.

An advanced (say next-gen(ish)) GPU might have 16 ALUs (each capable of the same (or more) maths per cycle as an SPU) connected to a single instruction decoder. It might have 3-4 of these. So it has the math capability (per cycle) of say 6-8 PEs, but with massive restrictions on branching (i.e. the entire array of ALUs follows the same path) etc.

SPUs are SIMD with regard to single numbers (4-way float, for example); GPUs are SIMD with regard to larger data elements (entire vertices or pixels). A GPU is essentially a SIMD version of an SPU (of course the actual ALUs in a GPU are also SIMD, just to confuse matters).
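A toy sketch of that distinction, with made-up widths (4-wide for the SPU-style case, a 16-element batch for the array-processor case), just to show the lockstep/branching restriction:

```python
# Toy model of the two flavours of SIMD described above (hypothetical widths).

# SPU-style: one instruction operates on a single 4-wide float vector.
def spu_add(a, b):
    return [a[i] + b[i] for i in range(4)]   # one 4-way op per instruction

# GPU-style array processor: one decoded instruction is applied to a whole
# batch of elements (say 16 pixels) in lockstep. Branches can't diverge per
# element; both sides run and a mask selects the result.
def gpu_conditional(batch, mask, then_fn, else_fn):
    then_results = [then_fn(x) for x in batch]   # every lane executes "then"
    else_results = [else_fn(x) for x in batch]   # ...and "else", wastefully
    return [t if m else e for t, e, m in zip(then_results, else_results, mask)]

print(spu_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))

pixels = list(range(16))
mask = [p % 2 == 0 for p in pixels]
print(gpu_conditional(pixels, mask, lambda p: p * 2, lambda p: p + 100))
```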

The reason an SPU would make a fairly good vertex shader is because:
a) branching is quite handy in a vertex shader, which kills much of the advantage of an array processor architecture. Also, things like texture access (latency hiding) are less used in a vertex shader.
b) we don't need that many vertices compared to pixels, so a more flexible but slower architecture might be a good idea.
c) the SPUs have a massive clock advantage (~8 times higher) that makes up for a fair bit of the lack of parallelism compared to a GPU.

Just to put things in comparison, current top-end GPUs have 24 ALUs compared to the 9 in Cell... If ATI/NVIDIA ever start to clock their chips at CPU speeds then the speed advantage of an array processor will be obvious. (Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)
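Very rough numbers on that trade-off, treating "ALU" loosely and assuming ~500MHz for a current GPU and the ~4GHz ISSCC clock for Cell:

```python
# Back-of-the-envelope "ALU-GHz" comparison, with very loose definitions:
# treat every unit as one ALU and ignore what each can do per cycle.
gpu_alus, gpu_clock_ghz = 24, 0.5      # current top-end GPU, ~500 MHz (assumed)
cell_alus, cell_clock_ghz = 9, 4.0     # 8 SPUs + the PPE, ISSCC demo clock (assumed)

print(gpu_alus * gpu_clock_ghz)        # 12.0 "ALU-GHz"
print(cell_alus * cell_clock_ghz)      # 36.0 "ALU-GHz"

# If GPU ALUs ever clocked at CPU speeds, the width advantage would dominate:
print(gpu_alus * cell_clock_ghz)       # 96.0 "ALU-GHz"
```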
 
DeanoC said:
(Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)

Maybe that explains those Tensilica and Fast14 licenses?
 
(Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)

The GPU makers don't have the technology or the time. And doing it the way they're doing it is a lot cheaper and gets the results they've needed so far. When it stops giving the results they want, I expect them to start thinking about clock speed too.
 
DeanoC said:
Just to put things in comparison, current top-end GPUs have 24 ALUs compared to the 9 in Cell...

Either 22 Vector and 6 Scalar or 38 Vector and 6 Scalar actually (dependent on how you view them)! :p
 
DaveBaumann said:
DeanoC said:
Just to put things in comparison, current top-end GPUs have 24 ALUs compared to the 9 in Cell...

Either 22 Vector and 6 Scalar or 38 Vector and 6 Scalar actually (dependent on how you view them)! :p

I meant 22 (6+16) (god, my maths can be bad at times...). I basically decided to ignore the whole question of what an ALU is on a GPU (hence the quick "(or more)" in the maths per cycle).

As it happens my hypothetical "next-gen" GPU would have a 4D vector and 1D scalar as a single ALU...
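One way to reconcile those counts, purely my reading of the two posts rather than anything official:

```python
# Reconciling the ALU counts above (assumed breakdown, inferred from the
# two posts, not from any official spec).
vertex_alus = 6            # the "6" in DeanoC's 6+16
pixel_pipes = 16

print(vertex_alus + pixel_pipes)        # 22 -- one vector ALU counted per pixel pipe
print(vertex_alus + pixel_pipes * 2)    # 38 -- counting two vector units per pixel pipe
# Either way, 6 scalar units sit alongside, hence "22 Vector and 6 Scalar"
# versus "38 Vector and 6 Scalar".
```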
 
Well, but aren't GPUs expected to become ever more flexible? In terms of hardware designed specifically for 3D gaming, wouldn't it be more apt to use a slightly beefy CPU and stick in a giant GPU?

edited2:

To clarify, given that the CPU can't easily scale further, could they not instead ask NVIDIA to scale up the GPU side of the system (assuming they use 65nm)?

On another note,


http://www.extremetech.com/article2/0,1558,1761407,00.asp
"The FlexIO technology will be used to connect the various chips on a Cell-based motherboard, according to Rich Warmke, marketing director of the memory interface division at Rambus. A multicore Cell processor, by contrast, will use its own internal bus to connect multiple cores."

So a multi-PE chip is planned, at least. -McFly

Fredi,
What would be the purpose of that (assuming it's not a misunderstanding)? At least for workstations, isn't it cheaper to simply add multiple single-PE chips? (How exactly can they get 16 TFLOPS in future workstations? Is it the "far off" future then?)
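On the 16 TFLOPS workstation question, the raw chip count alone is telling if you assume the ~256 GFLOPS per-chip figure from earlier in the thread:

```python
# How many ~256 GFLOPS Cell chips would a 16 TFLOPS workstation need?
# (Per-chip figure is the one quoted earlier in the thread, taken as an assumption.)
target_tflops = 16
gflops_per_chip = 256

print(target_tflops * 1000 / gflops_per_chip)   # 62.5 -> roughly 64 chips,
# or correspondingly fewer packages if each one carries multiple PEs.
```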
 
DeanoC said:
Intel could make an x86 processor with as many FLOPS as transistors allow; they CHOOSE not to. For them, general processing performance is far more important. For the high end they have a monster Itanium2; so far they haven't bothered with a FLOPS monster, but if they really needed to they could design one.

If you want to be depressed, consider that the ISSCC Cell is only as complex as a high-end GPU. Array processors (GPUs) will be the first single chips to hit a programmable TFLOP. The first 200+ GFLOPS GPUs came off the production line recently, and we are still slightly ahead of Moore's Law in GPU land, so mid-2006 should hit 500 GFLOPS, with a TFLOP in 2007.
It looks like Cell is going to be a benchmark monster, but in the real world how often and how much will scalar operations constrict performance? In what scenarios would all that floating-point power be utilized where a more traditional GPU could not do likewise?


(Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)
Inquiring minds want to know. :)
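As a quick check of the roadmap quoted above (200 GFLOPS now, 500 in mid-2006, a TFLOP in 2007), the implied doubling period works out to under a year, which is indeed ahead of the 18-24 months usually attached to Moore's Law:

```python
# Implied doubling period from the quoted GPU FLOPS roadmap:
# ~200 GFLOPS in early 2005 -> ~1000 GFLOPS in 2007 (dates as quoted, so approximate).
from math import log2

years = 2.0                     # early 2005 to early 2007, roughly
growth = 1000 / 200             # 5x
print(years / log2(growth))     # ~0.86 years per doubling, ahead of the
                                # 18-24 months usually cited for Moore's Law
```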
 
(Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)

This one's relatively easy to answer...
It's cheaper to go wider than it is to increase clock. And, as you mentioned, graphics is so embarrassingly parallel that it basically amounts to the same performance gain.

To some extent it's the same reason that X2 and PS3 aren't putting in a single processor clocked at 10+GHz.
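A toy way to see why width and clock are interchangeable for embarrassingly parallel work (made-up numbers, nothing vendor-specific):

```python
# For embarrassingly parallel work (every pixel independent), throughput is
# just units x clock, so doubling width or doubling clock is equivalent.
def pixels_per_second(parallel_units, clock_hz, cycles_per_pixel=1):
    return parallel_units * clock_hz / cycles_per_pixel

print(pixels_per_second(16, 500e6))   # 16 units at 500 MHz -> 8.0e9 pixels/s
print(pixels_per_second(1, 8e9))      # 1 unit at 8 GHz     -> 8.0e9 pixels/s
# The wide version is far cheaper to build, hence GPUs go wide rather than fast.
```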
 
ERP said:
This one's relatively easy to answer...
It's cheaper to go wider than it is to increase clock. And, as you mentioned, graphics is so embarrassingly parallel that it basically amounts to the same performance gain.

To some extent it's the same reason that X2 and PS3 aren't putting in a single processor clocked at 10+GHz.

Given that massive parallelism is relatively new in this PC/console space, it's easy to forget it's actually easier for the hardware to just duplicate units. Until now the hardware guys have worked hard making software easy to program; I guess now it's payback time ;-)
 
ERP said:
(Serious question: why don't GPU ALUs clock much higher? I really don't know why we won't see 2GHz ALUs in near-future GPUs.)

This one's relatively easy to answer...
It's cheaper to go wider than it is to increase clock. And, as you mentioned, graphics is so embarrassingly parallel that it basically amounts to the same performance gain.

To some extent it's the same reason that X2 and PS3 aren't putting in a single processor clocked at 10+GHz.
Then why would it not be better to have an R300 @ 2GHz instead of an R420 @ 520MHz?
 
I should also clarify that I'm still hopeful for this

Sony and Toshiba signed the joint development agreement in Tokyo and it calls for completion of the project by late 2005, with the ultimate goal of being first to market with 45nm know-how.
(old I know...)
http://www.ciol.com/content/search/showarticle.asp?artid=54371
Even if they achieved this, are you saying it would still be impossible to include more than one PE, let alone a BE?
 
nelg said:
Then why would it not be better to have an R300 @ 2GHz instead of an R420 @ 520MHz?
Maybe I don't understand your question, but I think ERP already answered that:
ERP said:
This one's relatively easy to answer...
It's cheaper to go wider than it is to increase clock. And, as you mentioned, graphics is so embarrassingly parallel that it basically amounts to the same performance gain.
 
Vysez said:
nelg said:
Then why would it not be better to have an R300 @ 2GHz instead of an R420 @ 520MHz?
Maybe I don't understand your question, but I think ERP already answered that:
ERP said:
This one's relatively easy to answer...
It's cheaper to go wider than it is to increase clock. And, as you mentioned, graphics is so embarrassingly parallel that it basically amounts to the same performance gain.
Can Intel and AMD clock so high due to better wafers or process tech? Intel seems to be able to produce >1GHz chips cost-effectively. So what is it specifically about a 130nm GPU that prevents it from being clocked the same as a 130nm CPU?
 