Cell details (Nikkei Electronics)

version said:
if you read a vertex stream (16-bit x,y,z, normal x,y,z, uv1, uv2 = 128 bits, 16 bytes)
and rotate and project the vertices, that's about 10 cycles on an SPE
then 1 SPE loads 6.4 GB/s of data from memory, and 4 SPEs kill the XDR bandwidth
what are the other 4 SPEs and the PPE doing?
Decent vertex shaders will take much more than 10 cycles. Moreover, vertices are not unique: most of them are going to be reused.
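
As a rough check on the numbers in the quoted post, here is a minimal sketch of the arithmetic. The 4 GHz clock is an assumption, back-solved from the post's 6.4 GB/s figure (16 bytes every 10 cycles), and 25.6 GB/s is assumed as the XDR peak so that four such SPEs saturate it. The 100-cycle shader cost is a made-up value, only there to illustrate the reply's point that heavier shaders pull far less data per SPE.

```python
# Back-of-envelope streaming rate for one SPE. All constants are assumptions
# except the 16-byte vertex and the 10-cycle cost, which come from the post.
BYTES_PER_VERTEX = 8 * 2        # 8 components x 16 bits = 16 bytes
CLOCK_HZ = 4.0e9                # assumed: back-solved from the 6.4 GB/s figure
XDR_PEAK = 25.6e9               # assumed peak main-memory bandwidth, bytes/s


def spe_stream_rate(cycles_per_vertex):
    """Bytes per second one SPE consumes if every vertex is fetched once."""
    vertices_per_second = CLOCK_HZ / cycles_per_vertex
    return vertices_per_second * BYTES_PER_VERTEX


def spes_to_saturate(cycles_per_vertex):
    """How many SPEs running flat out would use the whole assumed peak."""
    return XDR_PEAK / spe_stream_rate(cycles_per_vertex)


if __name__ == "__main__":
    for cycles in (10, 100):    # 100 is a hypothetical "decent shader" cost
        rate_gb = spe_stream_rate(cycles) / 1e9
        print(f"{cycles} cycles/vertex: {rate_gb:.2f} GB/s per SPE, "
              f"{spes_to_saturate(cycles):.0f} SPEs to saturate")
```

Vertex reuse is not modeled here; it would lower the main-memory traffic further, since a reused vertex does not have to be fetched again.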
 
nAo said:
version said:
if you read a vertex stream (16-bit x,y,z, normal x,y,z, uv1, uv2 = 128 bits, 16 bytes)
and rotate and project the vertices, that's about 10 cycles on an SPE
then 1 SPE loads 6.4 GB/s of data from memory, and 4 SPEs kill the XDR bandwidth
what are the other 4 SPEs and the PPE doing?
Decent vertex shaders will take much more than 10 cycles. Moreover, vertices are not unique: most of them are going to be reused.

25 GB/s peak bandwidth will be 15-20 GB/s real bandwidth (bottlenecks, latency, etc.)
if 8 SPEs have 25 GB/s memory bandwidth it totally suxx, xbox2 will be 10 times faster :)
 
version said:
if 8 SPEs have 25 GB/s memory bandwidth it totally suxx, xbox2 will be 10 times faster :)
Xbox will have about the same bandwidth, shared with the GPU (+ eDRAM bw), IIRC. (And maybe even the CELL CPU will share that bw with the GPU.)
Even if it would be nice to have more bandwidth, you have to understand that with just 256/512 MB of main RAM you would not have the space to store enough unique content to be transferred in a single frame from main memory! We are going to reuse a lot of stuff (instancing) or generate a lot of content (procedural geometry generation...). That's the way we'll use all the FP power. Consoles never had the memory to store huge datasets per frame, and never will.
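
To put rough numbers on that argument, here is a small sketch comparing how much data the bus could move in one frame with how much RAM there is to hold unique content. The 25 GB/s peak and the 256/512 MB sizes come from the thread; the 30 and 60 fps frame rates are assumptions, and real sustained bandwidth would be lower than peak.

```python
PEAK_BANDWIDTH = 25e9            # bytes/s, the "25GB/s" peak quoted in the thread
RAM_SIZES = (256e6, 512e6)       # bytes, the 256/512 MB figures from the post
FRAME_RATES = (30, 60)           # assumed target frame rates


def bytes_per_frame(bandwidth, fps):
    """Upper bound on bytes transferable from main memory in one frame."""
    return bandwidth / fps


def ram_passes_per_frame(bandwidth, fps, ram_bytes):
    """How many times all of RAM could be streamed within a single frame."""
    return bytes_per_frame(bandwidth, fps) / ram_bytes


if __name__ == "__main__":
    for fps in FRAME_RATES:
        for ram in RAM_SIZES:
            passes = ram_passes_per_frame(PEAK_BANDWIDTH, fps, ram)
            print(f"{fps} fps, {ram / 1e6:.0f} MB RAM: "
                  f"{bytes_per_frame(PEAK_BANDWIDTH, fps) / 1e6:.0f} MB/frame, "
                  f"{passes:.2f}x RAM per frame")
```

Whenever the ratio is above 1, the bus can move more data per frame than memory can hold as unique content, which is the case the post describes: the extra bandwidth can only be spent on reused or generated data.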
 
Noooooooo...think Onions! :p

Each SPE is a small onion (small pipeline) with concentric rings. Eight small onions (8 SPEs) work independently...and...CELL is one big onion with concentric rings (big pipeline encompassing 8 SPEs)...these bigger concentric rings (large pipeline) link the small onions together (small pipelines)...

...I know...I've gone mad.... :p
 
nAo said:
version said:
if 8 SPEs have 25 GB/s memory bandwidth it totally suxx, xbox2 will be 10 times faster :)
Xbox will have about the same bandwidth, shared with the GPU (+ eDRAM bw), IIRC. (And maybe even the CELL CPU will share that bw with the GPU.)
Even if it would be nice to have more bandwidth, you have to understand that with just 256/512 MB of main RAM you would not have the space to store enough unique content to be transferred in a single frame from main memory! We are going to reuse a lot of stuff (instancing) or generate a lot of content (procedural geometry generation...). That's the way we'll use all the FP power. Consoles never had the memory to store huge datasets per frame, and never will.

x2 uses UMA with 75 GB/s bandwidth and 756 MB RAM, I mean
 
Jaws said:
Noooooooo...think Onions! :p

Each SPE is a small onion (small pipeline) with concentric rings. Eight small onions (8 SPEs) work independently...and...CELL is one big onion with concentric rings (big pipeline encompassing 8 SPEs)...these bigger concentric rings (large pipeline) link the small onions together (small pipelines)...

...I know...I've gone mad.... :p

Onions... Concentric rings... Damn it, now I have the desire to order a pizza, and at least a 1:8 configuration.
 
Jaws said:
Noooooooo...think Onions! :p

Each SPE is a small onion (small pipeline) with concentric rings. Eight small onions (8 SPEs) work independently...and...CELL is one big onion with concentric rings (big pipeline encompassing 8 SPEs)...these bigger concentric rings (large pipeline) link the small onions together (small pipelines)...

...I know...I've gone mad.... :p
Ogres are like onions, too.[/Shrek]

Anyway, stream processing is a great solution if you can break your problem into 8*N consecutive stages of similar processing time. That is, if you're wanting to tap all your SPEs at the same time. You could run multiple streams if not. I don't think this is going to happen for most anything outside of Naughty Dog's and Polyphony's games, though.
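
Why the stages need similar processing times can be shown with a toy model; this is only a sketch, not anything from the CELL toolchain. It assumes one stage per SPE, no transfer overhead, and made-up stage times. Once the pipeline is full, throughput is set by the slowest stage and every faster stage sits idle part of the time.

```python
def pipeline_throughput(stage_times):
    """Items completed per unit time once the pipeline is full.

    The slowest stage is the bottleneck, so throughput is 1 / max stage time.
    """
    return 1.0 / max(stage_times)


def pipeline_utilization(stage_times):
    """Fraction of total stage time spent doing useful work.

    Each stage is busy for its own time out of every bottleneck-length slot.
    """
    bottleneck = max(stage_times)
    return sum(stage_times) / (len(stage_times) * bottleneck)


if __name__ == "__main__":
    balanced = [1.0] * 8              # hypothetical: 8 equal stages
    unbalanced = [1.0] * 7 + [4.0]    # hypothetical: one stage 4x slower
    for name, stages in (("balanced", balanced), ("unbalanced", unbalanced)):
        print(f"{name}: throughput {pipeline_throughput(stages):.2f} items/unit, "
              f"utilization {pipeline_utilization(stages):.0%}")
```

In this model a single slow stage drags the whole chain down, which is why running several shorter streams side by side can be the better fit when the work does not split evenly.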
 
I hope the PS3 is like a parfait. Everybody loves parfait! A parfait may be the most delicious console in the world.
 
Npl said:
Jaws said:
Noooooooo...think Onions! :p

Each SPE is a small onion (small pipeline) with concentric rings. Eight small onions (8 SPEs) work independently...and...CELL is one big onion with concentric rings (big pipeline encompassing 8 SPEs)...these bigger concentric rings (large pipeline) link the small onions together (small pipelines)...

...I know...I've gone mad.... :p

Onions... Concentric rings... Damn it, now I have the desire to order a pizza, and at least a 1:8 configuration.

Are you sure it wasn't because of nAo's Italian sausage? :p :oops:
 
Inane_Dork said:
Jaws said:
Noooooooo...think Onions! :p

Each SPE is a small onion (small pipeline) with concentric rings. Eight small onions (8 SPEs) work independently...and...CELL is one big onion with concentric rings (big pipeline encompassing 8 SPEs)...these bigger concentric rings (large pipeline) link the small onions together (small pipelines)...

...I know...I've gone mad.... :p
Ogres are like onions, too.[/Shrek]

Anyway, stream processing is a great solution if you can break your problem into 8*N consecutive stages of similar processing time. That is, if you're wanting to tap all your SPEs at the same time. You could run multiple streams if not. I don't think this is going to happen for most anything outside of Naughty Dog's and Polyphony's games, though.

Well, the ability to create multiple dynamic streaming pipelines is one of CELL's core/key abilities, so I'm hoping STI have had the foresight to create appropriate tools for it, so that more than just Naughty Dog, Polyphony et al. can take advantage of it and make it sing.
 
More info from Nikkei Electronics:

FlexIO has a function to enable NUMA (non-uniform memory access).
An outside chip (another Cell or GPU?) can access main memory
(which is connected directly to Cell) via FlexIO.
 
PCEngine said:
Are you sure it wasn't because of nAo's Italian sausage?
Are you sure it's safe to mention sausages in the same thread London Boy is posting in? :oops:

version said:
if you read a vertex stream (16-bit x,y,z, normal x,y,z, uv1, uv2 = 128 bits, 16 bytes)
How about I read a 7:1 compressed multiresolution mesh on one SPU, and transmit 7x data to all other SPUs on their internal bus, only using 1GB/s of main memory bus. :p
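
A quick sketch of what that trade would buy, using only the figures in the posts (7:1 compression, 1 GB/s read from main memory, 16-byte vertices, 25 GB/s peak). It assumes the decompressing SPU can keep up and that the internal bus has room for the expanded data, neither of which the thread establishes.

```python
COMPRESSION_RATIO = 7            # the 7:1 figure from the post
MAIN_MEMORY_READ = 1e9           # bytes/s read from main memory, from the post
BYTES_PER_VERTEX = 16            # vertex size from the quoted post
PEAK_BANDWIDTH = 25e9            # bytes/s, the peak quoted earlier in the thread


def expanded_rate(read_rate, ratio):
    """Bytes/s of decompressed data produced from a compressed read stream."""
    return read_rate * ratio


def vertices_per_second(data_rate, bytes_per_vertex=BYTES_PER_VERTEX):
    """Vertices/s delivered to the other SPUs at the given data rate."""
    return data_rate / bytes_per_vertex


if __name__ == "__main__":
    out = expanded_rate(MAIN_MEMORY_READ, COMPRESSION_RATIO)
    print(f"decompressed stream: {out / 1e9:.1f} GB/s")
    print(f"vertices delivered: {vertices_per_second(out) / 1e6:.1f} M/s")
    print(f"share of peak main-memory bandwidth used: "
          f"{MAIN_MEMORY_READ / PEAK_BANDWIDTH:.0%}")
```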
 