ISSCC 2005

nAo said:
If we render a triangle strip we have a 4 GPoly/s figure! are you happy now?
No, I want MORE :p

*getting silly now*
If people's assumptions about DIV being replaced by one of those reciprocal approximates are right, you may very well have faster throughput for divisions than one every 8 cycles. How about 5 cycles - so we can just stash through one vertex in 5 cycles :p
*getting sillier*
But wait, why even bother with perspective divide when the GPU can do that(hey, NVidia can do it, right, Right?) for us, we just give it the H-Space coordinates. Whoa, 8GPoly/s :LOL:
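
For anyone following along, here is a rough sketch of the arithmetic behind these joke figures. It assumes 8 SPUs at 4 GHz, each retiring one vertex every N cycles, and one new triangle per vertex in a long strip. The 4-cycle case for the "no divide" figure is only inferred from the 8 GPoly/s number, and the function name is made up for illustration.

```python
def peak_vertex_rate(clock_hz, num_units, cycles_per_vertex):
    """Peak vertices/s if every unit retires one vertex every cycles_per_vertex cycles."""
    return clock_hz * num_units / cycles_per_vertex

CLOCK_HZ = 4e9   # assumption: 4 GHz clock
NUM_SPUS = 8     # assumption: 8 SPUs doing nothing but vertex work

# 8 cycles: one divide every 8 cycles; 5 cycles: faster reciprocal;
# 4 cycles: assumed cost when the perspective divide is left to the GPU.
for cycles in (8, 5, 4):
    rate = peak_vertex_rate(CLOCK_HZ, NUM_SPUS, cycles)
    print(f"{cycles} cycles/vertex -> {rate / 1e9:.1f} G vertices/s")
```

These are peak numbers only, with no bandwidth or setup limits taken into account.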
 
Fafalada said:
Megadrive said:
the VMX unit attached to the PU would be like the equivalent of the FPU attached to the MIPS core in Emotion Engine, would it not? yeah it would add to the total flops performance, but not significantly increase it from ~256 Gflops.
Actually, the announced EE GFlops numbers counted all the FMAC+FDIV units spread over the system (when you consider all the FPU units are basically the same, this makes a lot of sense - the FMAC&FDIV making up the R5900 FPU are equivalent to those found in VUs).
So you have:
10 FMACs + 4 FDIVs = (10*2 Flop/cycle + 4*0.16 Flop/cycle) * 300 MHz ~ 6.2 GFlops.
R5900 core FPU thus contributed ~10% of the overall FPU rating, which may not be a lot, but it's definitely something.

In case of this Cell, VMX would potentially increase the rating another 32GFlops - which is also just over 10% of the cumulative rating, amusingly enough :p
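
The sums above can be checked with a small sketch. It assumes the R5900 core FPU is 1 FMAC + 1 FDIV and that the VMX figure is added on top of the ~256 GFlops SPU total. The per-unit flop/cycle values are the ones quoted in the post, and the helper name is hypothetical.

```python
def peak_gflops(fmacs, fdivs, clock_mhz,
                fmac_flops_per_cycle=2.0, fdiv_flops_per_cycle=0.16):
    """Peak GFlops from counting FMAC and FDIV units at a given clock."""
    flops_per_cycle = fmacs * fmac_flops_per_cycle + fdivs * fdiv_flops_per_cycle
    return flops_per_cycle * clock_mhz / 1000.0

ee_total = peak_gflops(10, 4, 300)   # whole Emotion Engine
r5900_fpu = peak_gflops(1, 1, 300)   # assumption: core FPU = 1 FMAC + 1 FDIV
print(f"EE total: {ee_total:.2f} GFlops, core FPU share: {r5900_fpu / ee_total:.1%}")

# Cell case: 32 GFlops of VMX on top of ~256 GFlops from the SPUs.
vmx_share = 32.0 / (256.0 + 32.0)
print(f"VMX share of cumulative Cell rating: {vmx_share:.1%}")
```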

Faf, I am shocked that none of your calculations had the number 294 in them! :devilish:
 
Fafalada said:
But wait, why even bother with perspective divide when the GPU can do that(hey, NVidia can do it, right, Right?) for us, we just give it the H-Space coordinates. Whoa, 8GPoly/s :LOL:
emh..that's too good to be true..unfortunately nvidia 'attaches' an rcp instruction at the end of the shader..emh.. :LOL:
 
Fafalada said:
nAo said:
If we render a triangle strip we have a 4 GPoly/s figure! are you happy now?
No, I want MORE :p

*getting silly now*
If people's assumptions about DIV being replaced by one of those reciprocal approximates are right, you may very well have faster throughput for divisions than one every 8 cycles. How about 5 cycles - so we can just stash through one vertex in 5 cycles :p
*getting sillier*
But wait, why even bother with perspective divide when the GPU can do that(hey, NVidia can do it, right, Right?) for us, we just give it the H-Space coordinates. Whoa, 8GPoly/s :LOL:


did you compute how much polygon data can be sent to the GPU at 100 GB/s?
i mean, max 6 gigapolys/s
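
A minimal sketch of that bandwidth limit, assuming each vertex is sent as four 32-bit floats (16 bytes of H-space x, y, z, w) and one vertex per triangle in a long strip. The vertex size is an assumption, not something stated in the thread.

```python
def bandwidth_bound_rate(bandwidth_bytes_per_s, bytes_per_vertex):
    """Upper bound on vertices/s that fit through a link of the given bandwidth."""
    return bandwidth_bytes_per_s / bytes_per_vertex

LINK_BYTES_PER_S = 100e9   # the 100 GB/s figure from the post
BYTES_PER_VERTEX = 16      # assumption: 4 floats * 4 bytes, position only

rate = bandwidth_bound_rate(LINK_BYTES_PER_S, BYTES_PER_VERTEX)
print(f"~{rate / 1e9:.2f} G vertices/s at most")
```

Any extra per-vertex data (normals, UVs, colours) would push the bound lower.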
 
Jaws, you're quite right, let me revise (anyone that likes binary numbers will like this).

Since PS3 Cell won't be exactly the same, it will yield a nominal frequency of 4096 MHz! :p At this, each VMX/SPU will yield 32.768 GFlops, for a grand total of 294.912 GFlops.

Am I good or what? :LOL:
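
The binary-friendly numbers work out as in the sketch below, assuming 8 flops per cycle per unit (a 4-wide multiply-add) and 9 units (8 SPUs plus the VMX). Both values are assumptions inferred from the totals in the post.

```python
CLOCK_MHZ = 4096       # the "nominal" frequency from the post
FLOPS_PER_CYCLE = 8    # assumption: 4-wide single-precision multiply-add
UNITS = 9              # assumption: 8 SPUs + 1 VMX

per_unit_gflops = CLOCK_MHZ * FLOPS_PER_CYCLE / 1000
total_gflops = per_unit_gflops * UNITS
print(f"{per_unit_gflops:.3f} GFlops per unit, {total_gflops:.3f} GFlops total")
```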

nAo said:
emh..that's too good to be true..unfortunately nvidia 'attaches' an rcp instruction at the end of the shader..emh..
I just kinda like the idea of sending H-Space data to GPU, makes it feel more - unified :p Don't really care who takes the cost of that divide much.
 
Fafalada said:
Jaws, you're quite right, let me revise (anyone that likes binary numbers will like this).

Since PS3 Cell won't be exactly the same, it will yield a nominal frequency of 4096 MHz! :p At this, each VMX/SPU will yield 32.768 GFlops, for a grand total of 294.912 GFlops.

Am I good or what? :LOL:

Good but I'll reserve that cigar! ;)
 
hmmmm ~97 Gflop GSCube (16 * Emotion Engine) produces 1.2 giga polys peak.

256 Gflop Cell Processor, which is no doubt MUCH tighter than 16 EEs, should do roughly 3x as much, maybe more because of the tightness. ~4 Gpolys ?
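
As a sanity check on that guess, here is a sketch of what strictly linear scaling by peak flops would give. Linear scaling is itself an assumption, and the function name is only for illustration.

```python
def scale_linearly(known_rate, known_gflops, target_gflops):
    """Scale a polygon rate in proportion to peak GFlops (a crude assumption)."""
    return known_rate * target_gflops / known_gflops

GSCUBE_GPOLYS = 1.2    # peak figure quoted for GSCube
GSCUBE_GFLOPS = 97.0
CELL_GFLOPS = 256.0

cell_gpolys = scale_linearly(GSCUBE_GPOLYS, GSCUBE_GFLOPS, CELL_GFLOPS)
print(f"{CELL_GFLOPS / GSCUBE_GFLOPS:.2f}x the flops -> ~{cell_gpolys:.2f} Gpolys/s")
```

Pure flop scaling lands nearer 3.2 Gpolys than 4, so the ~4 figure leans on the 'tightness' bonus.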
 
Megadrive1988 said:
hmmmm ~97 Gflop GSCube (16 * Emotion Engine) produces 1.2 giga polys peak.

256 Gflop Cell Processor, which is no doubt MUCH tighter than 16 EEs, should do roughly 3x as much, maybe more because of the tightness. ~4 Gpolys ?

But can we really correlate pure GFLOP rating with polygon performance?
And what u mean by... errr... tightness...? :oops: ;)
 
Fafalada said:
Megadrive said:
the VMX unit attached to the PU would be like the equivalent of the FPU attached to the MIPS core in Emotion Engine, would it not? yeah it would add to the total flops performance, but not significantly increase it from ~256 Gflops.
Actually, the announced EE GFlops numbers counted all the FMAC+FDIV units spread over the system (when you consider all the FPU units are basically the same, this makes sense - the FMAC&FDIV making up the R5900 FPU are equivalent to those found in VUs).
So you have:
10 FMACs + 4 FDIVs = (10*2 Flop/cycle + 4*0.16 Flop/cycle) * 300 MHz ~ 6.2 GFlops.
R5900 core FPU thus contributed ~10% of the overall FPU rating, which may not be a lot, but it's definitely something.

In case of this Cell, VMX would potentially increase the rating another 32GFlops - which is also just over 10% of the cumulative rating, amusingly enough :p


hey, I stand corrected. slightly. so the FPU of the MIPS core in EE provided ~10% of the total flops rating in EE.

and then a ~32 Gflops VMX is nothing to sneeze at. actually isn't that like an extra APU/SPU/SPE's worth of flops ?
 
london-boy said:
Megadrive1988 said:
hmmmm ~97 Gflop GSCube (16 * Emotion Engine) produces 1.2 giga polys peak.

256 Gflop Cell Processor, which is no doubt MUCH tighter than 16 EEs, should do roughly 3x as much, maybe more because of the tightness. ~4 Gpolys ?

But can we really correlate pure GFLOP rating with polygon performance?
And what u mean by... errr... tightness...? :oops: ;)


what I meant by tightness is this: the 16 Emotion Engines in GSCube are all separate CPUs, spread across the system. so I'll bet its ~97 Gflops are much harder to obtain or come close to, compared to a single Cell PE's ~256 Gflops, since everything (all of its computing resources) is located on one chip, unlike 16 separate EEs. it would probably be more difficult to extract ~97 Gflops from GSCube than it would be to extract ~256 Gflops from a Cell PE. not that either system would reach its peak theoretical performance, but on a 'tighter' (now you see?) single chip Cell PE it should be easier to come closer than on a huge complex system like GSCube with dozens of VUs and FPUs spread across 16 chips.
 
Yes and that VMX unit is right there attached to the PU.

Even a program that does not touch the APU's at all should run faster on such a CELL chip than on PlayStation 2 and by quite a nice margin too :D.
 
Megadrive1988 said:
what I meant by tightness is this: the 16 Emotion Engines in GSCube are all separate CPUs, spread across the system. so I'll bet its ~97 Gflops are much harder to obtain or come close to, compared to a single Cell PE's ~256 Gflops since everything (All the computing resources) is located on one chip.

What i thought... Still, the first question was, can we really make a direct guess/comparison of polygon performance from the FLOP rating of a chip? Genuine question.
I mean, i'm sure we can guestimate, but i'm also quite sure there's more to it than a simple FLOP->Polys calculation.
 
london-boy said:
Megadrive1988 said:
what I meant by tightness is this: the 16 Emotion Engines in GSCube are all separate CPUs, spread across the system. so I'll bet its ~97 Gflops are much harder to obtain or come close to, compared to a single Cell PE's ~256 Gflops since everything (All the computing resources) is located on one chip.

What i thought... Still, the first question was, can we really make a direct guess/comparison of polygon performance from the FLOP rating of a chip? Genuine question.
I mean, i'm sure we can guestimate, but i'm also quite sure there's more to it than a simple FLOP->Polys calculation.


no doubt you are correct. we probably cannot make a direct flops-to-polygons calculation. or can we? no, i think we can't. only a ballpark guesstimate, at best.

techies, what say you?
 
london-boy said:
Megadrive1988 said:
what I meant by tightness is this: the 16 Emotion Engines in GSCube are all separate CPUs, spread across the system. so I'll bet its ~97 Gflops are much harder to obtain or come close to, compared to a single Cell PE's ~256 Gflops since everything (All the computing resources) is located on one chip.

What i thought... Still, the first question was, can we really make a direct guess/comparison of polygon performance from the FLOP rating of a chip? Genuine question.
I mean, i'm sure we can guestimate, but i'm also quite sure there's more to it than a simple FLOP->Polys calculation.

That calculation can be done, but it assumes infinite bandwidth and no communication overhead (as well as no resource contention, other sources of stalls, etc...) if the only thing you look at is theoretical peak Floating-Point performance.
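
To make that concrete, here is a sketch of the peak-only estimate next to one with a single extra constraint (link bandwidth). The flops-per-vertex and bytes-per-vertex values are purely hypothetical placeholders. Only the ~256 GFlops and 100 GB/s figures come from the thread.

```python
def attainable_vertex_rate(peak_flops, flops_per_vertex,
                           bandwidth_bytes_per_s, bytes_per_vertex):
    """Vertices/s limited by whichever is lower: compute peak or link bandwidth."""
    compute_bound = peak_flops / flops_per_vertex
    bandwidth_bound = bandwidth_bytes_per_s / bytes_per_vertex
    return min(compute_bound, bandwidth_bound)

PEAK_FLOPS = 256e9        # ~256 GFlops Cell figure from the thread
FLOPS_PER_VERTEX = 64     # hypothetical cost of transforming one vertex
BYTES_PER_VERTEX = 16     # hypothetical: 4 floats per vertex

# With 100 GB/s the compute peak is the limit; with 25 GB/s the link is.
fast_link = attainable_vertex_rate(PEAK_FLOPS, FLOPS_PER_VERTEX, 100e9, BYTES_PER_VERTEX)
slow_link = attainable_vertex_rate(PEAK_FLOPS, FLOPS_PER_VERTEX, 25e9, BYTES_PER_VERTEX)
print(f"100 GB/s: {fast_link / 1e9:.2f} G vertices/s, 25 GB/s: {slow_link / 1e9:.2f} G vertices/s")
```

Contention and stalls would lower both cases further. This only shows why the pure flops number is an upper bound.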
 
Cell Q&A (totally 'meh' though)

http://business.timesonline.co.uk/article/0,,9075-1475581,00.html

Q&A: the superchip
Holden Frith answers some of the questions raised by the launch of the Chip

How soon will it be before I can buy a product that uses the Cell?

Its first scheduled appearance will be in the Sony PlayStation 3 games console, which should go on sale in early 2006. A prototype is expected to feature at the E3 computer fair in Los Angeles this May.

What difference will it make to the equipment?

The Cell chip is faster than existing microchips because it can work on several tasks at once. Computers with Cell chips can also share processing power so that if one computer is not working at full speed, another connected to it by a network or the internet can make use of its spare computing capacity.


What difference will Cell users notice?

In games consoles, faster microchips will allow game designers to employ higher quality sound and smoother, more realistic graphics. There are already suggestions that greater use of graphics from Hollywood movies may be possible.

In PCs, the main use will be multimedia applications as Cell chips will be better able to process the vast amount of information delivered by ever-faster broadband internet connections.

According to IBM, today's microchips were created with word processors and spreadsheets in mind and therefore struggle to cope with tasks such as downloading music and displaying video. The Cell chip has been designed to address this shortcoming.

What difference might the Cell make to the price of PCs and games consoles?

It will make them more expensive, at least in the early stages of production. Financial analysts have suggested that the high cost of the PlayStation 3, which they predict will have to be priced at between $500 and $750 when it is launched in the United States, will deter potential buyers. As always with computers, though, the price is likely to fall quite rapidly with time.

What is meant by "clock speed" and "flash memory"?

The clock speed of a microchip is a measure of how quickly it can perform calculations and therefore how powerful it is. The 4 Gigahertz Cell chip will be able to perform four billion calculations per second.

Flash memory refers to the ability of a chip to store information so that it doesn't have to send it to another part of the computer for safe keeping. The more information it can store, the faster it will complete its work. Flash memory is familiar to many people who own digital cameras, which use Compact Flash cards to store photographs.

Will the Cell make my existing home PC or games console obsolete, or can I upgrade them?

The PlayStation 3 is likely to replace the PS2 just as that replaced the initial PlayStation in 2000, but there will probably be a crossover period when new game releases will continue for the older model.

Upgrading existing PCs is unlikely to be possible as the new processor requires completely different software. However, since many people do not push their PCs to the limits, the mass desertion of traditional microchips is unlikely in the immediate future.
 
figure2.gif

figure3.gif


IBM's PowerPC 970FX voltage/clock scaling chart. PowerPC 970FX is fabbed in the same fab as CELL, and CELL uses the 970 core as its CPU. This chart is IBM's admission that PowerPC 970FX burns 100 watts at 2.5 GHz.

I hit the bull's eye on CELL's clockspeed and FLOPS rating, 64 GFLOPS @ 1 GHz. To be honest, even I was shocked I got it years before.

Right now, CELL is designed for 1 GHz operation and has been tested successfully up to 1.4 GHz according to SCEI's released material. The final clockspeed depends on how much loss per unit Kutaragi Ken is willing to take on the hardware.
 