ISSCC 2005

version · Feb 7, 2005

http://www.scee.presscentre.com/imagelibrary/detail.asp?MediaDetailsID=25555

ChryZ · Feb 7, 2005

version said:
http://www.scee.presscentre.com/imagelibrary/detail.asp?MediaDetailsID=25555

I've already posted that a page ago, but thanks for the effort.

Fafalada · Feb 7, 2005

Hmmm, on second thought, are there different upper and lower instruction sets?

I am pretty sure that's what it is - as mentioned, this is quite like the approach taken on VUs in PS2, and what you said about execution units being split across pipelines pretty much confirms it.

Guden Oden · Feb 7, 2005

Re: hahahahaha

nAo said:
The second pipeline is probably devoted to load/store, dma queues, branching, etc..
If we factor in even those operations we can inflate the 256 GFlops/s figure

Those instructions can't really be said to be flops now can they... They sound more like ints to me, unless there's a way that I missed learning of to branch to a fractional address or somesuch of course!

nAo · Feb 7, 2005

Re: hahahahaha

Guden Oden said:
Those instructions can't really be said to be flops now can they... They sound more like ints to me, unless there's a way that I missed learning of to branch to a fractional address or somesuch of course!

I left out div or other complex fp instructions (thanks Faf!)

aaronspink · Feb 7, 2005

Re: hahahahaha

AutomatedMech said:
Something is not right. Each CELL APU burns only 1 watt @ 0.9 V at 2 GHz???? 11 watts at 5 GHz??? If IBM had such technology, it could forget about making chips for a living, license that tech to Intel and make billions/year.

Read. Learn. Post. I suggest you repeat the first two:

P ~= C * F * V^2

F ~= V

P ~= C * F^3

5/2 = 2.5

2.5 * 2.5 * 2.5 ~= 15.6

Most likely they are reaching their minimum functional voltage before they reach 2 GHz, which shifts the results somewhat.

Aaron Spink
speaking for myself inc.
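
The scaling argument in the post above can be checked numerically. This is only a sketch under the post's own assumptions (P ~ C * F * V^2 with V roughly proportional to F, so P ~ F^3); the function name `scaled_power` is made up for illustration, and the 1 W @ 2 GHz and 11 W @ 5 GHz figures are the ones quoted in the thread.

```python
def scaled_power(p_ref_watts, f_ref_ghz, f_target_ghz):
    """Scale power between two clock speeds assuming P ~ F^3.

    Assumption (from the post): P ~ C*F*V^2 and V tracks F linearly.
    This ignores leakage and any minimum-voltage floor.
    """
    ratio = f_target_ghz / f_ref_ghz
    return p_ref_watts * ratio ** 3


# Scaling the quoted 1 W @ 2 GHz figure up to 5 GHz.
up = scaled_power(1.0, 2.0, 5.0)

# Scaling the quoted 11 W @ 5 GHz figure down to 2 GHz.
down = scaled_power(11.0, 5.0, 2.0)

print(f"1 W @ 2 GHz scaled to 5 GHz: {up:.2f} W")
print(f"11 W @ 5 GHz scaled to 2 GHz: {down:.2f} W")
```

Pure cubic scaling overshoots the quoted 11 W when going up and undershoots the quoted 1 W when going down, which fits the remark that the voltage probably stops dropping before the clock gets down to 2 GHz.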

version · Feb 7, 2005

what is it? not readable

archie4oz · Feb 7, 2005

channel

version · Feb 7, 2005

archie4oz said:
channel

thx, i am blind

where is DIVIDE ?

nAo · Feb 7, 2005

version said:
where is DIVIDE ?

DP? Who knows..

Gubbi · Feb 7, 2005

version said:
archie4oz said:

channel

thx, i am blind
where is DIVIDE ?

Likely nowhere. Probably has a 1/x estimate instead. Use Newton Raphson to reach desired precision.

Cheers
Gubbi
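
A rough sketch of what the post above describes: start from a coarse 1/x estimate and refine it with Newton-Raphson. Nothing here is taken from the CELL documentation. `reciprocal_estimate` is a hypothetical stand-in for a hardware estimate instruction (a simple linear fit on the mantissa), and the iteration count is just a parameter to play with.

```python
import math


def reciprocal_estimate(d):
    """Hypothetical coarse 1/d estimate (stand-in for a hardware instruction).

    Writes |d| = m * 2**e with 0.5 <= m < 1 and uses a linear fit to 1/m,
    which is good to a relative error of about 1/17.
    """
    m, e = math.frexp(abs(d))
    est = 48.0 / 17.0 - (32.0 / 17.0) * m
    return math.copysign(math.ldexp(est, -e), d)


def reciprocal(d, iterations=4):
    """Refine the estimate with Newton-Raphson on f(x) = 1/x - d."""
    x = reciprocal_estimate(d)
    for _ in range(iterations):
        # Each step roughly squares the relative error.
        x = x * (2.0 - d * x)
    return x


def divide(a, b, iterations=4):
    """Division built from multiply and reciprocal only (b must be nonzero)."""
    return a * reciprocal(b, iterations)


print(divide(1.0, 3.0))
print(reciprocal(3.0, iterations=1), reciprocal(3.0, iterations=4))
```

Because the error roughly squares on every step, the number of iterations needed depends only on how good the initial estimate is and on the precision wanted; fewer steps would do for single precision than for double.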

PiNkY · Feb 7, 2005

Why is local storage divided into four banks? Can each be individually addressed during a 128-bit load/store, and what does "permute" offer (beyond bit/byte permutations) for its large estate requirements...?

Gubbi · Feb 7, 2005

PiNkY said:
Why is local storage divided into four banks? Can each be individually addressed during a 128-bit load/store, and what does "permute" offer (beyond bit/byte permutations) for its large estate requirements...?

So that you can DMA to/from local storage, all while running code which loads/stores from/to local memory?

Cheers
Gubbi

PiNkY · Feb 7, 2005

Hmm, that might sound totally stupid (knowledge-wise, this really is walking on thin ice...) but wouldn't you simply need a second access port on the memory (along with an arbiter) for simultaneous/interleaved DMA transfers?

P.S.: Shouldn't the 128 GPRs give you some flexibility in manual prefetching/caching anyway...

version · Feb 7, 2005

"DP" = double precision?

Is it possible that DIV is in the GPU?

nAo · Feb 7, 2005

First CELL presentation should start in a few minutes.
I want the paper..I want the paper..I want...or the slides at least!

ciao,
Marco

Fafalada · Feb 7, 2005

I thought DP was short for double precision?

Anyway, so if the banks are for that purpose, memory is single ported, I assume? But yeah, even on the VU, with single-ported access, they just arbitrate all DMA requests to wait for the VU.

Gubbi · Feb 7, 2005

Fafalada said:
Anyway so if the banks are for that purpose memory is single ported I assume.

Pure speculation on my part: I assume it's pseudo dual ported, like AMD's K7/8 (8 way interleaved) level 1 dcache.

Another possibility is that IBM's SRAM macro is 64KB and they just made 4 instances of it.

Cheers
Gubbi
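
To make the "pseudo dual ported" idea above concrete, here is a toy model of interleaved banks: two accesses can go through in the same cycle only if they land in different banks. Everything except the bank count is an assumption for illustration. The thread only says there are four banks; the 16-byte interleave granularity and the helper names `bank_of` and `can_issue_together` are made up, and the real local store may map addresses to banks quite differently.

```python
NUM_BANKS = 4      # from the thread: local storage is split into four banks
LINE_BYTES = 16    # assumption: banks interleave on 128-bit (16-byte) lines


def bank_of(addr):
    """Hypothetical mapping: consecutive 16-byte lines rotate through the banks."""
    return (addr // LINE_BYTES) % NUM_BANKS


def can_issue_together(addr_a, addr_b):
    """Two single-ported banks can serve two accesses at once if they differ."""
    return bank_of(addr_a) != bank_of(addr_b)


# A DMA write and a load in the same cycle (addresses chosen arbitrarily).
print(can_issue_together(0x0000, 0x0010))  # neighbouring lines
print(can_issue_together(0x0000, 0x0040))  # four lines apart
```

Under this model an arbiter is only needed when both accesses hit the same bank, which is the difference from a single-ported memory where every DMA request has to wait.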

PiNkY · Feb 7, 2005

No nitpicking intended, but I think K7's as well as K8's L1 data caches are only 2-way set-associative, though both are pseudo dual-ported...

Guden Oden · Feb 7, 2005

Athlon had 16-way L1 caches, at least initially, as I recall. That may have changed in later revisions though. 2-way, though, seems much too drastic a change to be realistic...
