CELL configuration revisited....

...

cthellis42

If they are trying to reach a late-2005, early-2006 launch, Power5 could not remotely appear on a consumer device.
Remove the L2 cache and L3 look-up table and a 100 mm² die size is possible on a 90 nm process. Having 4-way SMT means it will not suffer much from cache miss latency. I am not sure if IBM will include AltiVec or not, but I am not expecting them to. (Power5 already has excellent FPUs and will not gain much from AltiVec; PPC970 included it for backward compatibility reasons.)

To Paul

Two words: Embedded DRAM.
1. I don't think there is any room for eDRAM on a 4 PE version.
2. The inefficiency of a message passing architecture like CELL stems from the coding difficulty. Having eDRAM won't solve the problem.


In fact, some of the world's smallest. God knows what they have today.
Still much larger than dedicated DRAM cells. And DRAM chips aren't that small.
 
1. I don't think there is any room for eDRAM on a 4 PE version.
2. The inefficiency of a message passing architecture like CELL stems from the coding difficulty. Having eDRAM won't solve the problem.

SCEI fully wants DRAM on Cell.

You can see this from SCE Cell patent and this right here from June.

The cell will be an essential technology for the company's system-on-chip designs and will make it possible to integrate larger DRAM cells with the Cell processor, a joint development project of IBM, Sony and Toshiba targeting teraflops performance.

Granted, yes, e-DRAM will not solve any issues regarding efficiency if coding Cell is a nightmare. But I don't think it will be the doomsday you seem to think it will be.
 
I still find your transistor estimates far too conservative, Deadmeat.

Given what has been achieved to date, your numbers seem far lower than what they should be.
 
MfA said:
Don't expect miracles, look at the Opteron.

Look who had to help finish the design of Opteron - which is really overrated anyway... AMD isn't what I'd consider a powerhouse by any means, they should be thankful for IBM.

IBM has done studies on SOI which have tried to minimize the inherent differential between an SOI and a bulk design, and the results were quite impressive. It was on 180nm IIRC; I can look for the report later if you can't find it.
 
DMGA said:
The best indicator of clockspeed/pipeline length is XScale, which manages 1 GHz with a 7-stage pipeline. And no one has a better fab capability than Intel does.

IBM's 750GX manages 1.1GHz on a 0.13um, 4-stage design...

DMGA said:
Nope. You just can't rev very fast with a short pipe design.

Actually it's getting easier and easier these days. Process naturally makes faster switching speeds easier and with several of the more exotic technologies that are working their way more into MPUs from mixed signals, building complex, fast, short stage designs is getting easier. The main problem these days is leakage and wire delay...
 
nondescript said:
PC-Engine said:
UIUC has demonstrated a 509 GHz transistor

...and Intel and AMD have demonstrated THz transistors; however, put hundreds of millions of them on a die and they won't be running at that speed ;)

Um, the UIUC transistor is the current record holder. No, Intel and AMD do not have THz transistors. Intel is trying, as you can see here. They have designs and plans, but they have not demonstrated it. Your statement is false.

More info on UIUC: http://www.news.uiuc.edu/scitips/03/1106feng.html

I never said that they could make a chip out of these transistors, I was using this to show that we are far from the constraints of the laws of physics.

Were they running Linpack to get that percentage?

Probably.

AFAIK the ES's efficiency drops to 65% with real-world apps. That 86% figure is from the Linpack benchmark. I'm guessing that the G5 supercomputer's 58% is from Linpack also; therefore running real apps would further drop that down to 20-30% efficiency.

Link please? Or did you make this up too?

Yes I made this stuff up. I even wrote two articles about it at EETimes :LOL:

http://www.eetimes.com/story/OEG20011206S0025

...AMD revealed it has developed a CMOS-based, 15nm gate length transistor handling switching speeds of 3.33THz. Intel recently claimed to have broken the record for the world's smallest and fastest transistor, with a 15nm device that could cope with 2.63 trillion switches per second.


The system is measured at 87.5 percent efficiency, providing 35 teraflops on the Linpack benchmark, but it actually delivers from 14 to 26 teraflops at 38 to 66 percent efficiency in real-world applications, Watanabe reported. That's still far beyond efficiency ratings as low as 15 percent for many of today's supercomputers, he said.

http://www.eetimes.com/issue/mn/OEG20030825S0020

Hmm...38-66% efficiency for the clean sheet designed ES. So what efficiency are we looking at now for the off the shelf G5 supercomputer?
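
For anyone who wants to check the arithmetic behind those EETimes figures, here is a quick sketch. It assumes "efficiency" means sustained throughput divided by theoretical peak, which the article does not spell out; the function names are made up for illustration.

```python
# Back-of-the-envelope check of the Earth Simulator figures quoted above.
# Assumption: efficiency = sustained TFLOPS / theoretical peak TFLOPS.

def implied_peak(sustained_tflops: float, efficiency: float) -> float:
    """Peak throughput implied by a sustained figure and an efficiency ratio."""
    return sustained_tflops / efficiency


def sustained(peak_tflops: float, efficiency: float) -> float:
    """Sustained throughput at a given efficiency ratio."""
    return peak_tflops * efficiency


peak = implied_peak(35.0, 0.875)   # Linpack: 35 TFLOPS at 87.5 percent
low = sustained(peak, 0.38)        # quoted real-world lower bound
high = sustained(peak, 0.66)       # quoted real-world upper bound

print(f"implied peak: {peak:.1f} TFLOPS")
print(f"real-world range at 38-66%: {low:.1f} to {high:.1f} TFLOPS")
```

Under that assumption the implied peak is about 40 TFLOPS, and 38 to 66 percent of it lands close to the quoted 14 to 26 teraflops. The small gap at the low end is probably rounding in the article, but that is a guess.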
 
...

IBM's 750GX manages 1.1GHz on a 0.13um, 4-stage design...

1. It is a 5-stage design, not 4.
2. The G3 design has been optimized and tuned for almost 8 years now (since the PPC603 days). In other words, it is an architecture at the end of its life cycle and 1.1 GHz is the top rating IBM could extract; the majority yields below 1 GHz.
3. Compare the 750GX to the 12-stage 970; the 970 already hits 2 GHz on the same fab process, and it's still early in its life cycle.

Unlike PC and workstation processors, PSX3 cannot afford to have varying clockspeed rating for CELL; it is fixed from the beginning and will stay fixed until the end. Sony has to go with a clockspeed rating that will give them an acceptable yield from the first batch of wafers, or they do not obtain enough chips to make the launch. This forces SCEI to settle for a lower clockspeed.
 
Re: ...

DeadmeatGA said:
IBM's 750GX manages 1.1GHz on a 0.13um, 4-stage design...

1. It is a 5-stage design, not 4.
2. The G3 design has been optimized and tuned for almost 8 years now (since the PPC603 days). In other words, it is an architecture at the end of its life cycle and 1.1 GHz is the top rating IBM could extract; the majority yields below 1 GHz.
3. Compare the 750GX to the 12-stage 970; the 970 already hits 2 GHz on the same fab process, and it's still early in its life cycle.

Unlike PC and workstation processors, PSX3 cannot afford to have varying clockspeed rating for CELL; it is fixed from the beginning and will stay fixed until the end. Sony has to go with a clockspeed rating that will give them an acceptable yield from the first batch of wafers, or they do not obtain enough chips to make the launch. This forces SCEI to settle for a lower clockspeed.

You did say it was not possible though and the real world disproved you. You compared to XScale (a low power design by default) as well.

These are just examples of what is possible in the real world.. ya know the physical real world rather than the one made of your dreams..I could give ya directions if ya wants.. ;)

It is PS3 not PSX3. ;)
 
Vince said:
Look who had to help finish the design of Opteron - which is really overrated anyway... AMD isn't what I'd consider a powerhouse by any means, they should be thankful for IBM.

Doesn't really matter much, it is about the ratio.

Compared to not using SOI at 65 nm it might provide larger gains because of static power, but the existing trend of increasing power consumption per mm² was driven by dynamic power, where it would give what ... ~35% savings? A huge amount, but not miraculous. For the existing problem it's just a small kink in the trend.
 
DMGA said:
1. It is a 5-stage design, not 4.

4. The 750s have all had the same execution pipeline: Fetch, Dispatch, Execute, Write back. 3 stages for the FPU.

DMGA said:
2. The G3 design has been optimized and tuned for almost 8 years now (since the PPC603 days). In other words, it is an architecture at the end of its life cycle and 1.1 GHz is the top rating IBM could extract; the majority yields below 1 GHz.

6 years.. And the 603 is not a G3/750 core... And IBM is still developing it (the next core possibly achieving anywhere from 1.2-1.5GHz)

DMGA said:
3. Compare the 750GX to the 12-stage 970; the 970 already hits 2 GHz on the same fab process, and it's still early in its life cycle.

The GigaProcessor core in the 970 didn't debut on the 970; it debuted on Power4 (which is 4 years old) and was extended by 2 stages to accommodate dispatch groups with AltiVec...
 
PC-Engine said:
Yes I made this stuff up. I even wrote two articles about it at EETimes :LOL:

http://www.eetimes.com/story/OEG20011206S0025

...AMD revealed it has developed a CMOS-based, 15nm gate length transistor handling switching speeds of 3.33THz. Intel recently claimed to have broken the record for the world's smallest and fastest transistor, with a 15nm device that could cope with 2.63 trillion switches per second.


The system is measured at 87.5 percent efficiency, providing 35 teraflops on the Linpack benchmark, but it actually delivers from 14 to 26 teraflops at 38 to 66 percent efficiency in real-world applications, Watanabe reported. That's still far beyond efficiency ratings as low as 15 percent for many of today's supercomputers, he said.

http://www.eetimes.com/issue/mn/OEG20030825S0020

Hmm...38-66% efficiency for the clean sheet designed ES. So what efficiency are we looking at now for the off the shelf G5 supercomputer?

Close. But read the original press release:

http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543_4493~13001,00.html

The 15-nm transistor, devised in AMD's Submicron Development Center, is a CMOS-based, 0.8-Volt device, designed to handle switching speeds of 0.3-ps, or 3.33 trillion switches per second. The development of the 15-nm transistor is a powerful indicator that transistor scaling will continue unabated for many years to come.

Designed, not demonstrated. They've developed a 15-nm gate length transistor, and they expect that it can reach 3.3THz, but it hasn't been actually tested. Until they do (and succeed), UIUC is the record holder. I'm pretty sure if AMD came out with a 3.3THz transistor, I would have read it in Applied Physics Letters. Maybe I'm splitting hairs here, but anyways, the point was to show that physics is not the barrier here. Kudos to AMD if they actually get a 3.3THz transistor.
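
The 0.3 ps and 3.33 trillion figures in the press release are just reciprocals of each other, which this little sketch checks. It assumes one switch per switching time, a simplification the press release itself appears to use; the function names are hypothetical.

```python
# Convert between a transistor's switching time and its switching rate.
# Assumption: rate = 1 / switching time (one switch per switching interval).

PICOSECOND = 1e-12  # seconds


def switches_per_second(switching_time_ps: float) -> float:
    """Switching rate in Hz for a switching time given in picoseconds."""
    return 1.0 / (switching_time_ps * PICOSECOND)


def switching_time_ps(rate_hz: float) -> float:
    """Switching time in picoseconds for a switching rate given in Hz."""
    return 1.0 / rate_hz / PICOSECOND


amd_rate = switches_per_second(0.3)     # AMD's design target: 0.3 ps
intel_time = switching_time_ps(2.63e12)  # Intel's claim: 2.63 trillion/s

print(f"AMD:   {amd_rate / 1e12:.2f} trillion switches per second")
print(f"Intel: {intel_time:.2f} ps per switch")
```

Note this only says the two numbers in each claim are consistent with each other, not that either device has been measured at that speed.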

Thanks for the Linpack link tho.
 
Re: ...

DeadmeatGA said:
Remove the L2 cache and L3 look-up table and a 100 mm² die size is possible on a 90 nm process. Having 4-way SMT means it will not suffer much from cache miss latency. I am not sure if IBM will include AltiVec or not, but I am not expecting them to. (Power5 already has excellent FPUs and will not gain much from AltiVec; PPC970 included it for backward compatibility reasons.)

Exactly. Downscaled versions of modern tech are what I expect. (R500 won't be precisely old either, after all.) "Power5+ dual-core" comments make me boggle. ;) Since IBM is planning this chip for a long run as well, I trust them enough to make the trade-offs they need to keep things peppy without busting a nut on price. Will be cool to see eventually what they end up with. It's rather looking like MS plans to take a big hit on hardware costs again, though. Hehe...
 
Re: ...

DeadmeatGA said:
Unlike PC and workstation processors, PSX3 cannot afford to have varying clockspeed rating for CELL; it is fixed from the beginning and will stay fixed until the end. Sony has to go with a clockspeed rating that will give them an acceptable yield from the first batch of wafers, or they do not obtain enough chips to make the launch. This forces SCEI to settle for a lower clockspeed.

Depends on what counts as an acceptable yield, one that leads to enough chips for the launch; this also depends on the launch date.

Toshiba's Oita #2 fab is going to be mass-producing in mid-to-late 2004 and Nagasaki #2 fab is going to be mass-producing in mid 2005.

For a Japanese (maybe Japan + North America) launch in Q4 2005, this means there is some time to ramp up the yields and produce enough chips for the console's launch.

What if 2 GHz gives them a high enough yield to be able to launch by Q4 2005?

You cannot rule out that it would, as you do not know the yield figures for 2 GHz or 1 GHz.
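
To make the point concrete, here is a toy model of how speed yield depends on the target clock. Everything in it is hypothetical: it assumes each die's maximum stable clock follows a normal distribution, and the mean and spread are invented numbers, not data from Sony, Toshiba, or IBM. It also ignores defect yield entirely.

```python
import math

# Toy speed-binning model. Assumption: each die's maximum stable clock is
# normally distributed; a die "passes" if that maximum is at least the target.


def speed_yield(target_ghz: float, mean_ghz: float, sigma_ghz: float) -> float:
    """Fraction of dies whose maximum stable clock is >= target_ghz."""
    z = (target_ghz - mean_ghz) / (sigma_ghz * math.sqrt(2.0))
    return 0.5 * math.erfc(z)  # upper tail of the normal distribution


MEAN_GHZ = 2.0    # hypothetical mean of the per-die maximum clock
SIGMA_GHZ = 0.3   # hypothetical spread of the per-die maximum clock

for clock in (1.0, 1.5, 2.0, 2.5):
    share = speed_yield(clock, MEAN_GHZ, SIGMA_GHZ)
    print(f"{clock:.1f} GHz target -> {share:.1%} of dies pass")
```

Shift the invented mean or spread and the acceptable clock moves with it, which is the whole argument: without the real distribution, neither 1 GHz nor 2 GHz can be ruled in or out.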

I expect them to have e-DRAM, as a 45 nm die-shrink would allow them to cut costs massively (using a kind of capacitor-less e-DRAM cell, much smaller than any bulk-CMOS e-DRAM cell at 45 nm) if the 65 nm chip was using e-DRAM.

Either that, or they can use 128-bit XDR at 6.4 GHz (signalling rate, 800 MHz external clock), which achieves 100+ GB/s; that does not sound bad.
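
The 100+ GB/s figure follows directly from the bus width and signalling rate, as this sketch shows. The factor of 8 transfers per external clock is inferred from the 800 MHz and 6.4 GHz figures above and is treated as an assumption here; the function name is made up.

```python
# Peak bandwidth of a 128-bit memory interface at a 6.4 GHz signalling rate.
# Assumption: 8 transfers per external clock, inferred from 6.4 GHz / 800 MHz.

def peak_bandwidth_gb_s(bus_width_bits: int, signalling_rate_ghz: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer x transfers per ns / 8 bits per byte."""
    return bus_width_bits * signalling_rate_ghz / 8


EXTERNAL_CLOCK_GHZ = 0.8    # 800 MHz external clock, from the post
TRANSFERS_PER_CLOCK = 8     # assumed, see note above

signalling_rate_ghz = EXTERNAL_CLOCK_GHZ * TRANSFERS_PER_CLOCK
bandwidth = peak_bandwidth_gb_s(128, signalling_rate_ghz)

print(f"signalling rate: {signalling_rate_ghz:.1f} GHz")
print(f"peak bandwidth:  {bandwidth:.1f} GB/s")
```

This is a theoretical peak; sustained bandwidth in a real system would be lower.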
 