Details trickle out on CELL processor...

one said:
The current 1st-gen 90nm Cell processor runs at 4.60 GHz @ 1.3V / 85°C !!!!!!

Japanese article about ISSCC 2005 highlights

nice 8)
 
A stream processor has a memory hierarchy that is optimized for streaming. If CELL is a stream processor, 6.4 GB/s would be 3 times that of Stanford's Imagine stream processor.

Memory System

As described above, all Imagine memory references are made using stream load and store instructions that transfer an entire stream between memory and the SRF. This stream load/store architecture is similar in concept to the scalar load/store architecture of contemporary RISC processors. It simplifies programming and allows the memory system to be optimized for stream throughput, rather than the throughput of individual, independent accesses. The memory system provides 2.1GB/s of bandwidth to off-chip SDRAM storage via four independent 32-bit wide SDRAM banks operating at 143MHz. The system can perform two simultaneous stream memory transfers. To support these simultaneous transfers, four streams (two index streams and two data streams) connect the memory system to the SRF. Imagine addressing modes support sequential, constant stride, indexed (scatter/gather), and bit-reversed accesses on a record-by-record basis.
 
Brimstone said:
A stream processor has a memory hierarchy that is optimized for streaming. If CELL is a stream processor, 6.4 GB/s would be 3 times that of Stanford's Imagine stream processor.

Memory System

As described above, all Imagine memory references are made using stream load and store instructions that transfer an entire stream between memory and the SRF. This stream load/store architecture is similar in concept to the scalar load/store architecture of contemporary RISC processors. It simplifies programming and allows the memory system to be optimized for stream throughput, rather than the throughput of individual, independent accesses. The memory system provides 2.1GB/s of bandwidth to off-chip SDRAM storage via four independent 32-bit wide SDRAM banks operating at 143MHz. The system can perform two simultaneous stream memory transfers. To support these simultaneous transfers, four streams (two index streams and two data streams) connect the memory system to the SRF. Imagine addressing modes support sequential, constant stride, indexed (scatter/gather), and bit-reversed accesses on a record-by-record basis.

Sure, but what's the clock speed of the Imagine? How many FLOPs can it do per clock?
 
Gubbi said:
It's probably 6.4Gbit/s per pin, even though that does sound a bit high.

Yellowstone/XDR with an 800MHz base clock should yield what I believe is 6.4Gbit per pin. I could be wrong, but I feel it's right.
 
400MHz at the time that was written.


The trick to stream processing is the stream register file (memory hierarchy) and how it allows for very high bandwidth. A stream processor is different from a general purpose CPU.
 
Brimstone said:
400MHz at the time that was written.


The trick to stream processing is the stream register file (memory hierarchy) and how it allows for very high bandwidth. A stream processor is different from a general purpose CPU.

In other words, at 400MHz it needs 2.1GB/s.

At 4.6GHz it would need what?

Edit: OK, at 4.6GHz it would need about 24GB/s, which is doable since the XDR will provide at least 25GB/s. Now the question is whether or not the CELL chip is doing a lot more FLOPs/clock than the Imagine chip. Also, how much SRAM does the Imagine contain?
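The scaling argument above is just linear arithmetic; a quick sketch of it (assuming, as the post does, that the bandwidth requirement scales linearly with clock speed, i.e. equal work per clock):

```python
# Back-of-the-envelope check: Imagine needs 2.1 GB/s at 400 MHz.
# Assuming the bandwidth requirement scales linearly with clock,
# what would a 4.6 GHz chip doing similar work per clock need?

imagine_bw_gbs = 2.1      # Imagine's off-chip bandwidth (GB/s)
imagine_clock_mhz = 400   # Imagine's clock at the time
cell_clock_mhz = 4600     # reported Cell clock

scaled_bw_gbs = imagine_bw_gbs * (cell_clock_mhz / imagine_clock_mhz)
print(f"{scaled_bw_gbs:.1f} GB/s")  # ~24 GB/s, within XDR's ~25 GB/s
```

This matches the ~24GB/s figure in the edit; it says nothing about FLOPs per clock, which is the open question.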
 
FatherJohn said:
Yeah, I don't think PS3 will have a full GPU. There's certainly no reason to have any vertex processors. And depending on how the second cell's APUs are configured they may be able to use them as some sort of renderer. (The Stanford Imagine group tried an experiment where they configured their stream processor as a Reyes-style renderer. It worked, but it was abysmally slow -- 20 times slower than a contemporary Z-buffer-based renderer, but using 3x the transistors. The dirty secret of stream-based processors is that they are very hard to get useful work out of.)

Why are we jumping on them for not having a Pixel Shading Rasterizer... shouldn't we wait for more details :devilish: ?
 
Panajev2001a said:
FatherJohn said:
Yeah, I don't think PS3 will have a full GPU. There's certainly no reason to have any vertex processors. And depending on how the second cell's APUs are configured they may be able to use them as some sort of renderer. (The Stanford Imagine group tried an experiment where they configured their stream processor as a Reyes-style renderer. It worked, but it was abysmally slow -- 20 times slower than a contemporary Z-buffer-based renderer, but using 3x the transistors. The dirty secret of stream-based processors is that they are very hard to get useful work out of.)

Why are we jumping on them for not having a Pixel Shading Rasterizer... shouldn't we wait for more details :devilish: ?


realtime raytracing is the key
no more fucking pixel shaders
 
It seems unlikely (silly even) that the overhead is referring to "per-pin" bandwidth.

And I would like to hear something more solid than conjecture about "stream processors", as right now they are sounding like they are right up there with Superman and Batman ;)
 
version said:
Panajev2001a said:
FatherJohn said:
Yeah, I don't think PS3 will have a full GPU. There's certainly no reason to have any vertex processors. And depending on how the second cell's APUs are configured they may be able to use them as some sort of renderer. (The Stanford Imagine group tried an experiment where they configured their stream processor as a Reyes-style renderer. It worked, but it was abysmally slow -- 20 times slower than a contemporary Z-buffer-based renderer, but using 3x the transistors. The dirty secret of stream-based processors is that they are very hard to get useful work out of.)

Why are we jumping on them for not having a Pixel Shading Rasterizer... shouldn't we wait for more details :devilish: ?


realtime raytracing is the key
no more fucking pixel shaders

AMEN to THAT! :oops: :devilish:
 
Bohdy said:
It seems unlikely (silly even) that the overhead is referring to "per-pin" bandwidth.

Um, well, correct me if I'm wrong, but how likely is it that Cell's off-die communication (6.4Gbit/sec) is less than the infamous EE -> GS bus in the PlayStation2?

For a 4.6GHz processor, I'm going to guess that the fact that Yellowstone/XDR just happens to be 6.4Gbit/pin when the base clock is 800MHz is the more likely scenario.
 
Vince said:
Bohdy said:
It seems unlikely (silly even) that the overhead is referring to "per-pin" bandwidth.

Um, well, correct me if I'm wrong, but how likely is it that Cell's off-die communication (6.4Gbit/sec) is less than the infamous EE -> GS bus in the PlayStation2?

For a 4.6GHz processor, I'm going to guess that the fact that Yellowstone/XDR just happens to be 6.4Gbit/pin when the base clock is 800MHz is the more likely scenario.

EE to GS is less than 6.4.
Can't remember but it's either 1.2 or 3.2.
 
A 6.4 GHz effective data signalling rate can be achieved with an external 800 MHz XDR clock using a 4x PLL multiplier, or with an external 400 MHz XDR clock using an 8x PLL multiplier (the PLL is "programmable").

Such signalling rate would push 6.4 Gbps with 2 pins per bit (differential signalling).
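The two clocking configurations in that post can be sketched as simple arithmetic. This assumes (as the post implies) that data is driven on both clock edges on top of the PLL-multiplied clock; the function name is just for illustration:

```python
# Sketch of the two XDR clocking configurations described above.
# Effective signalling rate = external clock x PLL multiplier x 2
# (dual-edge signalling). All values in Hz.

def xdr_signalling_rate(ext_clock_hz: float, pll_mult: int) -> float:
    return ext_clock_hz * pll_mult * 2  # x2 for dual-edge transfer

assert xdr_signalling_rate(800e6, 4) == 6.4e9  # 800 MHz, 4x PLL
assert xdr_signalling_rate(400e6, 8) == 6.4e9  # 400 MHz, 8x PLL
```

Both configurations land on the same 6.4 Gbps per-bit signalling rate, which is why the programmable PLL makes the external clock choice flexible.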
 
Vince said:
Bohdy said:
It seems unlikely (silly even) that the overhead is referring to "per-pin" bandwidth.

Um, well, correct me if I'm wrong, but how likely is it that Cell's off-die communication (6.4Gbit/sec) is less than the infamous EE -> GS bus in the PlayStation2?

For a 4.6GHz processor, I'm going to guess that the fact that Yellowstone/XDR just happens to be 6.4Gbit/pin when the base clock is 800MHz is the more likely scenario.


between Cell and the GPU there will be a 128-pin interface:
6.4 Gbit/s per pin × 128 pins = 819.2 Gbit/s = 102.4 GB/s
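That figure checks out as a unit conversion (gigabits per pin aggregated over the interface width, then divided by 8 to get bytes):

```python
# Checking the quoted Cell<->GPU figure: 6.4 Gbit/s per pin
# across a 128-pin interface, converted from gigabits to gigabytes.

per_pin_gbit = 6.4   # per-pin signalling rate (Gbit/s)
pins = 128           # claimed interface width

total_gbit = per_pin_gbit * pins   # aggregate rate in Gbit/s (819.2)
total_gbyte = total_gbit / 8       # 8 bits per byte -> ~102.4 GB/s
```

Whether the interface is actually 128 pins wide is the speculative part; the arithmetic itself is straightforward.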
 