ISSCC 2005

Acert93 said:
100GB/s? Wow. Now we can hope for 512MB ;) But note: "The Rambus XDR memory interface, capable of data rates of 3.2GHz to 8.0GHz." The question is how much bandwidth the PS3 CELL memory will have.

From: http://www.extremetech.com/article2/0,1558,1761407,00.asp
The FlexIO interface runs at 6.4-GHz, while the XDR interface runs at half of that speed, or 3.2-GHz

Apparently FlexIO is the interface between chips, and XDR is the interface to RAM.
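A quick sketch of how per-pin data rates could add up to the ~100GB/s aggregate mentioned in this thread, treating the 3.2 and 6.4 figures above as per-pin data rates in Gbit/s. The bus widths (a 64-bit XDR data path and twelve byte-wide FlexIO lanes) are assumptions, not figures stated in the thread; different widths would give a different XDR/FlexIO split.

```python
def bandwidth_gb_per_s(data_rate_gbps: float, data_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbit/s) x data bits / 8."""
    return data_rate_gbps * data_bits / 8

# Assumed widths, NOT taken from the thread:
XDR_DATA_BITS = 64          # hypothetical: two 32-bit XDR channels
FLEXIO_DATA_BITS = 12 * 8   # hypothetical: 12 byte-wide FlexIO lanes (in + out)

xdr = bandwidth_gb_per_s(3.2, XDR_DATA_BITS)        # ~25.6 GB/s
flexio = bandwidth_gb_per_s(6.4, FLEXIO_DATA_BITS)  # ~76.8 GB/s
print(f"XDR {xdr:.1f} GB/s + FlexIO {flexio:.1f} GB/s = {xdr + flexio:.1f} GB/s")
```

Under these assumed widths the total lands near the "approximately 100 gigabytes-per-second" in the Rambus quote, with most of it on the chip-to-chip side.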
 
The memory and processor bus interfaces designed by Rambus account for 90% of the Cell processor signal pins, providing an unprecedented aggregate processor I/O bandwidth of approximately 100 gigabytes-per-second.

100GB/s? Wow. Now we can hope for 512MB ;) But note: "The Rambus XDR memory interface, capable of data rates of 3.2GHz to 8.0GHz." The question is how much bandwidth the PS3 CELL memory will have.

PS3 is really shaping up to be a performance monster.

Small note: the 100GB/s is for I/O, as noted above; the memory BW is ~50GB/s.

EDIT: NPL beat me. Thanks for the correction!
 
V3 said:
Bad news: my brother just found out they are available for download only AFTER the exhibition. His colleagues will return next week with the book containing all the details, and he will copy the relevant pages for me. Or maybe I can let him download it... but then again, the news isn't that HOT anymore.

That's still alright :)

But the papers have already been handed out, so can somebody scan them and give them to us?
I will ask for scans, but he expects his colleagues back on Monday...
 
I will ask for scans, but he expects his colleagues back on Monday...

:D That'll be great :) I wouldn't complain even if it were taken with a cell phone camera :LOL: As long as it's readable ;)
 
ISSCC FACT SHEET FINAL 7 Feb 05.doc, courtesy of SCEE's press centre:
http://www.scee.presscentre.com/imagelibrary/detail.asp?MediaDetailsID=25555

ISSCC FACT SHEET FINAL 7 Feb 05.doc said:
CELL...bringing supercomputer power to everyday life with latest technology optimized for compute-intensive and broadband rich media applications

SUMMARY:

·Cell is a breakthrough architectural design -- featuring 8 Synergistic Processing Units (SPUs) with a Power-based core, with top clock speeds exceeding 4 GHz (as measured during initial laboratory testing).

·Cell is OS neutral - supporting multiple operating systems simultaneously

·Cell is a multicore chip comprising 8 SPUs and a 64-bit Power processor core capable of massive floating point processing

·Special circuit techniques, rules for modularity and reuse, customized clocking structures, and unique power and thermal management concepts were applied to optimize the design

CELL is a Multi-Core Architecture

·Contains 8 SPUs, each containing a 128-entry 128-bit register file and 256KB Local Store

·Contains 64-bit Power Architecture™ with VMX that is a dual-thread SMT design – views system memory as a 10-way coherent threaded machine

·2.5MB of on-chip memory (512KB L2 and 8 × 256KB)

·234 million transistors

·Prototype die size of 221mm²

·Fabricated with 90-nanometer (nm) SOI process technology

·Cell is a modular architecture and floating point calculation capabilities can be adjusted by increasing or reducing the number of SPUs

CELL is a Broadband Architecture

·Compatible with 64b Power Architecture™

·SPU is a RISC architecture with SIMD organization and Local Store

·128+ concurrent transactions to memory per processor

·High speed internal element interconnect bus performing at 96B/cycle

CELL is a Real-Time Architecture

·Resource allocation (for Bandwidth Management)

·Locking caches (via Replacement Management Tables)

·Virtualization support with real time response characteristics across multiple operating systems running simultaneously

CELL is a Security-Enabled Architecture

·SPUs dynamically configurable as secure processors for flexible security programming

CELL is a Confluence of New Technologies

·Virtualization techniques to support conventional and real time applications

·Autonomic power management features

·Resource management for real time human interaction

·Smart memory flow controllers (DMA) to sustain bandwidth
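The sizes in the fact sheet can be cross-checked with a little arithmetic: 512KB of L2 plus eight 256KB Local Stores gives the quoted 2.5MB, and 128 registers of 128 bits is 2KB per SPU. The EIB figure is given per cycle, so its bandwidth depends on the clock; the 4 GHz used below is only the lab-tested figure from the summary, assumed here for illustration, not a confirmed shipping clock.

```python
KB = 1024

L2_BYTES = 512 * KB
SPU_COUNT = 8
LOCAL_STORE_BYTES = 256 * KB
REGS_PER_SPU = 128
REG_BITS = 128
EIB_BYTES_PER_CYCLE = 96

def on_chip_sram_bytes() -> int:
    """L2 plus all SPU Local Stores (register files not included)."""
    return L2_BYTES + SPU_COUNT * LOCAL_STORE_BYTES

def register_file_bytes() -> int:
    """Size of one SPU register file in bytes."""
    return REGS_PER_SPU * REG_BITS // 8

def eib_gb_per_s(clock_ghz: float) -> float:
    """EIB peak bandwidth in GB/s at an assumed clock."""
    return EIB_BYTES_PER_CYCLE * clock_ghz

print(on_chip_sram_bytes() / (1024 * KB), "MB")          # 2.5 MB
print(register_file_bytes(), "bytes per register file")  # 2048
print(eib_gb_per_s(4.0), "GB/s at an assumed 4 GHz")     # 384.0
```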
 
Gubbi said:
SiBoy said:
Gubbi said:
So it uses an IOMMU to translate memory mappings for the SPUs' DMA transfers. Is there any info on how many of these translations the IOMMU can serve per cycle (i.e., how many DMA transfers can be started per cycle)?

Only one started per cycle, 16 can be outstanding at any one time.

Sorry, I meant for the entire chip (all SPUs). Can it start eight per cycle, or just one?

If it's eight there's a fair bit of logic going into the DMA/switch engine.

Cheers
Gubbi

1 per cycle per SPU, i.e. 8 per cycle total (8 SPUs), with 128 total pending (per chip).

So yes, the interconnect network is big.
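A toy model of the limits given above: each SPU can start at most one DMA per cycle and hold at most 16 outstanding, so eight SPUs start up to 8 per cycle chip-wide with at most 8 × 16 = 128 pending. The class and function names are hypothetical, and the model assumes no transfer ever completes; it only illustrates the issue limits, not the IOMMU or the interconnect.

```python
from dataclasses import dataclass

MAX_OUTSTANDING_PER_SPU = 16  # per-SPU limit quoted in the thread
SPU_COUNT = 8

@dataclass
class SpuDmaQueue:
    """Hypothetical per-SPU DMA issue model."""
    pending: int = 0

    def try_start(self) -> bool:
        """Start one DMA this cycle if the SPU still has a free slot."""
        if self.pending < MAX_OUTSTANDING_PER_SPU:
            self.pending += 1
            return True
        return False

def run_cycles(cycles: int) -> tuple[int, int]:
    """Every SPU tries to start one DMA per cycle; nothing completes.
    Returns (DMAs started in the last cycle, total pending on the chip)."""
    spus = [SpuDmaQueue() for _ in range(SPU_COUNT)]
    started_last = 0
    for _ in range(cycles):
        started_last = sum(spu.try_start() for spu in spus)
    return started_last, sum(spu.pending for spu in spus)

print(run_cycles(1))   # (8, 8): eight starts in one cycle
print(run_cycles(20))  # (0, 128): every queue is full
```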
 
McFly said:
PiNkY said:
Intel released additional details on its Montecito chip: 1.72 billion transistors at 90 nm. Small-scale availability is 4Q05.

A one-PE Cell has 234M transistors, so a 6-PE chip is more than possible. ;)

Fredi

Montecito is 600mm² in 90nm; a 1PE/8SPU Cell is > 200mm² in 90nm.

The 26.5MByte of cache in Montecito kind of throws your transistor comparison off :)
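A back-of-the-envelope check of that point, assuming a classic 6-transistor SRAM cell per bit and ignoring tags, ECC, redundancy and peripheral circuitry. These are rough estimates under that assumption, not published transistor breakdowns for either chip.

```python
MB = 1024 * 1024
TRANSISTORS_PER_BIT = 6  # assumption: 6T SRAM cell, bit cells only

def sram_transistors(megabytes: float) -> int:
    """Rough transistor count for the SRAM bit cells alone."""
    return round(megabytes * MB * 8 * TRANSISTORS_PER_BIT)

montecito_cache = sram_transistors(26.5)  # 26.5MB of cache
cell_sram = sram_transistors(2.5)         # 512KB L2 + 8 x 256KB Local Store

print(f"Montecito: ~{montecito_cache / 1e9:.2f}B of 1.72B transistors in cache cells")
print(f"Cell:      ~{cell_sram / 1e6:.0f}M of 234M transistors in SRAM cells")
```

Even with these simplifications, cache bit cells would make up most of Montecito's budget, so raw transistor counts say little about how much logic (or how many PEs) a similar budget would buy.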
 
hahahahaha

Something is not right. Each CELL APU burns only 1 watt at 0.9 V at 2 GHz? 11 watts at 5 GHz? If IBM had such technology, it could forget about making chips for a living, license that tech to Intel, and make billions a year.

I will wait for the full set of slides to be posted before analyzing CELL, because this smells very fishy indeed.
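For reference, first-order CMOS dynamic power scales roughly as P ≈ C·V²·f, so the quoted figures can be sanity-checked with a ratio. The sketch below ignores leakage and holds capacitance constant; the 1.3 V used for the 5 GHz point is borrowed from the SRAM test condition in the Electronics Weekly quote below and is only an assumption about the SPE operating voltage.

```python
def scale_dynamic_power(p_ref_w: float, v_ref: float, f_ref_ghz: float,
                        v_new: float, f_new_ghz: float) -> float:
    """Scale dynamic power with P ~ C * V^2 * f, holding C constant.
    Leakage is ignored, so a real chip would draw more than this."""
    return p_ref_w * (v_new / v_ref) ** 2 * (f_new_ghz / f_ref_ghz)

# Reference point from the post: 1 W at 0.9 V and 2 GHz.
# 1.3 V at 5 GHz is an ASSUMED operating point, not a quoted one.
estimate = scale_dynamic_power(1.0, 0.9, 2.0, 1.3, 5.0)
print(f"~{estimate:.1f} W dynamic at 1.3 V / 5 GHz")
```

Any gap between such an estimate and a quoted figure would have to come from leakage, a different voltage, or other effects this first-order model leaves out.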

http://www.electronicsweekly.co.uk/articles/article.asp?liArticleID=38754&liArticleTypeID=1&liCategoryID=1&liChannelID=114&liFlavourID=1&sSearch=&nPage=1
"The busses connect to the SPEs through local memory, 256kbyte for each SPE. The developers have tested the memories to 5.4GHz at 1.3V and 52°C."

The 4-5+ GHz figure was the SRAM speed, not the ALU speed.

Now it makes perfect sense: CELL ALUs run at 1/4 the clock of the XDR input signal. In other words, a 4 GHz input = 1 GHz internal operating clock.

This is funny as hell. The whole processor industry has used upclocking (internal clock X times higher than the input clock) since the 486; SCEI is the first to use downclocking in recent history.

In other words, 4 GHz XDR clock = 1 GHz CELL ALU operating clock.

I have seen no evidence that CELL really runs at 5 GHz. In fact, SCEI's refusal to claim 256 GFLOPS in its press release suggests it does not. Kutaragi Ken is the kind of person who would make such a shameless claim if it were possible on paper, but even he has not.

All the transistor and thermal information on CELL's CPU core suggests it is indeed a sub-1.4 GHz design. You will have to wait until the slides are posted on Japanese sites sometime tomorrow to make it official. IBM's own CPUs fail to clock past 2.5 GHz, so why should I believe that a 5 GHz processor exists?
 
Re: hahahahaha

AutomatedMech said:
long useless Post

Each 2.5mm × 5.81mm SPE can issue two instructions per cycle to seven execution units using two pipelines. There is no out-of-order execution.

2 Instructions * 4 Floats * 4GHz = 32 GFlops.
8 of them makes 256... sounds familiar?
 
Re: hahahahaha

Npl said:
2 Instructions * 4 Floats * 4GHz = 32 GFlops.
8 of them makes 256... sounds familiar?
Your final number is right, but the calculation is, imho, incorrect.
Even if an SPU can issue 2 instructions per clock, the 8 ops per clock figure comes from FMADD-like instructions (4 muls + 4 adds).
The second pipeline is probably devoted to load/store, dma queues, branching, etc..
If we factor in those operations too, we can inflate the 256 GFLOPS figure ;)

ciao,
Marco
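nAo's accounting can be written out as: peak FLOPS = SPUs × SIMD lanes × 2 (an FMADD counts as one multiply plus one add) × clock. The sketch assumes one 4-wide single-precision FMADD per SPU per cycle and leaves the second pipeline out; the clocks are the 4 GHz lab figure and the ~1 GHz figure argued for earlier in the thread, used only for comparison.

```python
def peak_gflops(spus: int, simd_lanes: int, clock_ghz: float,
                flops_per_lane: int = 2) -> float:
    """Peak single-precision GFLOPS; flops_per_lane=2 counts an FMADD as mul + add."""
    return spus * simd_lanes * flops_per_lane * clock_ghz

print(peak_gflops(1, 4, 4.0))  # 32.0 per SPU at 4 GHz
print(peak_gflops(8, 4, 4.0))  # 256.0 for 8 SPUs at 4 GHz
print(peak_gflops(8, 4, 1.0))  # 64.0 if the SPUs really ran at ~1 GHz
```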
 
Yeah, I second what nAo said: the second pipe probably doesn't do much beyond housekeeping... maybe the DIVs are issued there too.

The 8 FLOPS/cycle definitely comes from FMADD-type instructions.
 
Re: hahahahaha

nAo said:
If we factor in those operations too, we can inflate the 256 GFLOPS figure ;)

If Nvidia made this chip, you can bet they would do that. ;)

Fredi
 
How am I wrong? Kutaragi Ken and his goons are trying to pull the wool over your eyes all over again. CELL has come in well below what it should be, but they must hurry to get it out, so the 1-1.3 GHz clock is going to have to be used for the ALUs.
 
@nAo: It just seemed so convenient, I stand corrected :)

Hmmm, on second thought: are there separate upper and lower instruction sets, or can't 2 FMADDs be issued at once? Maybe FMADDs take 2 effective cycles? I mean, we are talking about PR FLOPS anyway.
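One way to picture the question: if the SPU's two pipelines accept different instruction classes, with arithmetic such as FMADD on one and loads, stores and branches on the other (as nAo suggests above), then two FMADDs cannot pair in the same cycle even though the SPU is dual-issue. This is a hypothetical in-order issue model with made-up pipe assignments, not the documented SPU pairing rules.

```python
# Hypothetical instruction classes; which pipe each uses is an ASSUMPTION.
PIPE_OF = {
    "fmadd": "arith",
    "fadd": "arith",
    "load": "other",
    "store": "other",
    "branch": "other",
}

def count_cycles(program: list[str]) -> int:
    """In-order dual issue: the next two instructions issue together only if
    they target different pipes; otherwise the first one issues alone."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and PIPE_OF[program[i]] != PIPE_OF[program[i + 1]]:
            i += 2  # dual issue
        else:
            i += 1  # single issue
        cycles += 1
    return cycles

print(count_cycles(["fmadd", "load"] * 4))  # 4: each FMADD pairs with a load
print(count_cycles(["fmadd"] * 8))          # 8: FMADDs never pair with each other
```

Under these assumptions the 8 FLOPS/cycle peak would come from the arithmetic pipe alone, with no need for FMADDs to take two cycles.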
 