ISSCC 2005

Hmmm, on second thought, are there different upper and lower instruction sets?
I am pretty sure that's what it is - as mentioned, this is quite like the approach taken on VUs in PS2, and what you said about execution units being split across pipelines pretty much confirms it.
 
Re: hahahahaha

nAo said:
The second pipeline is probably devoted to load/store, dma queues, branching, etc..
If we factor in even those operations we can inflate the 256 GFlops/s figure ;)
Those instructions can't really be said to be flops now can they... They sound more like ints to me, unless there's a way that I missed learning of to branch to a fractional address or somesuch of course! ;)
 
Re: hahahahaha

Guden Oden said:
Those instructions can't really be said to be flops now can they... They sound more like ints to me, unless there's a way that I missed learning of to branch to a fractional address or somesuch of course! ;)
I left out div or other complex fp instructions (thanks Faf!) :)
 
Re: hahahahaha

AutomatedMech said:
Something is not right. Each CELL APU burns only 1 watt @ 0.9 V at 2 GHz???? 11 watts at 5 GHz??? If IBM had such technology, it could forget about making chips for a living, license that tech to Intel and make billions/year.

Read. Learn. Post. I suggest you repeat the first two:

P ~= CFVV

F ~= V

P ~= CF^3

5/2 = 2.5

2.5 * 2.5 * 2.5 ~= 15.

Most likely they are reaching their minimum functional voltage before they reach 2 GHz, which shifts the results somewhat.

Aaron Spink
speaking for myself inc.
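
The scaling argument in the post above can be checked numerically. This is only a sketch of that arithmetic: it assumes dynamic power P ~ C*F*V^2 with V rising in proportion to F, and the function names and the `f_floor` parameter (a hypothetical frequency below which the voltage cannot be lowered any further) are illustrative assumptions, not figures from the ISSCC material.

```python
def unnormalized_power(f_ghz, f_floor_ghz=0.0):
    """Dynamic power up to a constant factor: P ~ F * V^2.

    Assumption: V is proportional to F, except that it cannot drop
    below the voltage needed at f_floor_ghz (hypothetical parameter).
    """
    v = max(f_ghz, f_floor_ghz)  # voltage in arbitrary units, proportional to F
    return f_ghz * v * v


def power_ratio(f_new_ghz, f_old_ghz, f_floor_ghz=0.0):
    """How many times more power f_new needs compared with f_old."""
    return (unnormalized_power(f_new_ghz, f_floor_ghz)
            / unnormalized_power(f_old_ghz, f_floor_ghz))


# Pure cubic scaling, as in the post: (5/2)^3
print(power_ratio(5.0, 2.0))  # 15.625

# With a voltage floor above 2 GHz the ratio comes out smaller,
# because the 2 GHz point is already "overpaying" in voltage.
print(power_ratio(5.0, 2.0, f_floor_ghz=2.5))  # 10.0
```

The 2.5 GHz floor is picked only to show the direction of the effect; the thread does not say where the real minimum-voltage point lies.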
 
Why is local storage divided into four banks? Can each be individually addressed during a 128-bit load/store, and what does "permute" offer (beyond bit/byte permutations) for its large estate requirements...?
 
PiNkY said:
Why is local storage divided into four banks? Can each be individually addressed during a 128-bit load/store, and what does "permute" offer (beyond bit/byte permutations) for its large estate requirements...?

So that you can DMA to/from local storage while running code that loads/stores from/to local memory?

Cheers
Gubbi
 
Hmm, this might sound totally stupid (knowledge-wise, I'm really walking on thin ice here...), but wouldn't you simply need a second access port on the memory (along with an arbiter) for simultaneous/interleaved DMA transfers?

P.S.: Shouldn't the 128 GPRs give you some flexibility in manual prefetching /caching anyways...
 
First CELL presentation should start in a few minutes.
I want the paper..I want the paper..I want...or the slides at least! ;)

ciao,
Marco
 
I thought DP was short for double precision?

Anyway, so if the banks are for that purpose, memory is single-ported, I assume? But yeah, even on the VUs, with single-ported access, they just arbitrate all DMA requests to wait for the VU.
 
Fafalada said:
Anyway so if the banks are for that purpose memory is single ported I assume.

Pure speculation on my part: I assume it's pseudo dual-ported, like AMD's K7/K8 (8-way interleaved) level 1 dcache.

Another possibility is that IBM's SRAM macro is 64KB and they just made 4 instances of it.

Cheers
Gubbi
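
To make the "pseudo dual-ported via banking" idea concrete, here is a minimal sketch of how interleaved banks let two accesses proceed in the same cycle unless they hit the same bank. Everything specific in it is an assumption for illustration: four banks, 16-byte (128-bit) interleaving on the low address bits, and the helper names. The thread does not say how the local store actually maps addresses to banks.

```python
NUM_BANKS = 4     # assumed: the four banks mentioned in the thread
LINE_BYTES = 16   # assumed: interleave at 128-bit granularity


def bank_of(addr):
    """Bank selected by an address under simple low-order interleaving."""
    return (addr // LINE_BYTES) % NUM_BANKS


def conflicts(addr_a, addr_b):
    """True if two same-cycle accesses would need the same single-ported bank."""
    return bank_of(addr_a) == bank_of(addr_b)


# A load at 0x000 and a DMA write at 0x010 land in different banks,
# so both can be serviced in one cycle.
print(conflicts(0x000, 0x010))  # False

# A load at 0x000 and a DMA write at 0x040 map to the same bank,
# so an arbiter would have to stall one of them.
print(conflicts(0x000, 0x040))  # True
```

Under this kind of scheme each bank stays single-ported, and the arbiter only matters when both requesters pick the same bank.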
 
No nitpicking intended, but I think the K7's as well as the K8's L1 data caches are only 2-way set-associative, though both are pseudo dual-ported...
 
Athlon had 16-way L1 caches, at least initially, as I recall. That may have changed in later revisions though. A change to 2-way seems much too drastic to be realistic, however...
 