On the Feasibility of the Broadband Engine

And how do you know this, Deadmeat? Are you making assumptions, or do you know it for a fact after having gotten information on CELL before everyone else? If you are making an assumption, please express it as an opinion. I think if you did this from now on, a lot of discussions would go a lot smoother.
 
32 (superscalar!?) FPUs? What am I missing?
I thought a BE includes 128 FMACs...(4 FMACs x 32 APUs)
 
I think if they decide to go for a 2006 launch, depending on how easy their move from 65nm to 45nm really is, they might be able to use the latter process. With multi-gate, SON, high-k dielectrics, etc., power issues should be sufficiently dealt with, no? Even at 65nm some of these technologies will be present.

Toshiba has discovered a new dielectric material called nitrided hafnium silicate. This material, the company claims, allows only 1/1000th the leakage of silicon dioxide [to be used in 65nm].
If true, that is an order of magnitude better than Intel's secret high-k material, which achieves a 100-fold leakage reduction relative to silicon dioxide.
 
32 (superscalar!?) FPUs? What am I missing?
I thought a BE includes 128 FMACs...(4 FMACs x 32 APUs)

The patent said 4 Floating point units and 4 Integer units. It can turn out to be FMACs I suppose, but I'll treat it as FPUs for the time being.

Yes, but at times you might wish to leave the transistor's width a little bit bigger, and sometimes you might not be able to scale wires as well as you would want. It depends on the design IMHO.

That may be, but we don't know. If you want to beat Moore's Law, you need to at least keep it as your assumption.
 
V3 said:
The patent said 4 Floating point units and 4 Integer units. It can turn out to be FMACs I suppose, but I'll treat it as FPUs for the time being.
FMACs or FPUs doesn't matter (at this time..). Why 32 instead of 128?
An APU operates on a 4-vector AFAIK.

ciao,
Marco
 
FMACs or FPUs doesn't matter (at this time..). Why 32 instead of 128?
An APU operates on a 4-vector AFAIK.

Hmm, you're right.

So, next we need to factor in the APUs, of which there’s a plurality (32, 8 per Power440) and which contain 4 FPUs, 4 FXUs, Registers, and all these assorted things we’ll get to in a bit. First let’s start with the FPUs.

There are 32 FPUs, of which the above type should suffice as they’ll yield the necessary performance and are a good rough indicator of IBM’s microarchitecture. So, 32 FPUs will yield 42.75mm2 in necessary area – bringing the grand total up to 67.65mm2 utilized, with 220.35mm2 of the area left.

I think Vince made a mistake, but I don't know; he needs to answer that, I guess.

If that's a mistake, then I think he meant 32 FPUs per Power440. Then you need to multiply that area by 4, and the same goes for the area of the integer units. And it won't be a sub-300mm2 chip anymore.
 
Could it be a good idea to use VMX/AltiVec, with some modifications, for the APU?

I say this because a unit with a 3-stage pipeline doing FMAC operations could be a good idea, and VMX is only 4 million transistors with 4 Vint/op and 4 Vfp/op.

PS: Sorry for my English
 
Whoa, everybody missed that. Good catch nAo! Thanks.

V3 said:
If that's a mistake, then I think he meant 32 FPUs per Power440. Then you need to multiply that area by 4, and the same goes for the area of the integer units. And it won't be a sub-300mm2 chip anymore.

Actually, realistically, it will still be around ~300mm2 with it. Because just thinking about it off hand, almost ~200mm2 of the IC is logic - logic which I scaled very conservatively. That further is composed of ~100mm2 of FPU/FXU logic. Scale it all correctly and add in 3X the FPU/FXUs you'll be ~50-60mm2 over.

And before you eat into my slush-buffer, find out the size of an actual FXU. I question if it's going to be 150% the size of a FPU and if not then you might actually be ahead slightly... relatively speaking.


Play around with it and see what you get. I would, but it's crunch time around here and I shouldn't even be posting as it is. I'll be back tonight when I'm off. Truth be told, this is a much better fit; it would be close, and that's how it should be.

EDIT: Yeah, I'm a dumbass. Yell at me later. Sorry about that... :)
 
I'll redo the calculation quickly.

Four PowerPC 440 with cache and FPU: 24.9mm2

128 FPUs: 128 * 1.33mm2 = 170.24mm2

128 FXUs: 150% * 170.24mm2 = 255.36mm2

SRAM + Registers: 20.448mm2

32MB eDRAM: 29.528mm2

Before even considering anything else, the total comes to around 500mm2.

If this is true, they'll probably go the MCM route. With each PE around 140 mm2. Sounds more plausible :?:
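
For anyone who wants to play with the numbers, here is a rough Python sketch of the tally above. It only uses the figures quoted in this post; the 150% FXU ratio is the working assumption from the thread, not a measured size, and the even four-way split per PE is purely illustrative.

```python
# Area tally using only the figures quoted in the post (all in mm^2).
FPU_AREA = 1.33    # one FPU, as scaled earlier in the thread
FXU_RATIO = 1.5    # thread's working assumption: an FXU is 150% of an FPU
UNITS = 128        # 4 units per APU x 32 APUs

areas = {
    "4x PowerPC 440 (cache + FPU)": 24.9,
    "128 FPUs": UNITS * FPU_AREA,
    "128 FXUs": UNITS * FPU_AREA * FXU_RATIO,
    "SRAM + registers": 20.448,
    "32MB eDRAM": 29.528,
}

total = sum(areas.values())
per_pe = total / 4  # naive even split across four PEs (illustrative assumption)

for name, mm2 in areas.items():
    print(f"{name:32s} {mm2:8.3f}")
print(f"{'total':32s} {total:8.3f}")
print(f"{'per PE (total / 4)':32s} {per_pe:8.3f}")
```

The even split lands at roughly 125mm2 per PE, so the ~140mm2 guess presumably leaves some room for whatever has not been counted yet. Changing `FXU_RATIO` shows how sensitive the total is to the FXU size.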
 
V3, scale the numbers theoretically for the logic - like how Entropy and you mentioned earlier. That's why I built in the conservative numbers - because I know I can be a retard. You'll cut the die size down really well and be hovering around ~300mm2 if my mind isn't playing with me again. And if someone can find an FXU for an example, that would be tits.

Watch, some ass is going to report me being on here and UChicago will find out that they pay me to sit in a lab, talk on B3D and play Go... hehe.
 
If the APUs follow the same route as the PS2 VUs, a single APU's FPU would be smaller (no superscalar execution, no full IEEE compliance, no other fancy stuff...) than the FPU selected by Vince.
 
nAo said:
If the APUs follow the same route as the PS2 VUs, a single APU's FPU would be smaller (no superscalar execution, no full IEEE compliance, no other fancy stuff...) than the FPU selected by Vince.

Sounds reasonable.
 
Whoa, everybody missed that. Good catch nAo! Thanks.

At first, I read it as 1.33 mm^2 for 4 FPUs, since your scaling from 180nm to 65nm looks rather weird if that is only one FPU. That's why I only questioned your Power440 scaling.

So everything is fine and dandy; I just breezed through the rest. I didn't realise what you did until nAo pointed it out.

Actually, realistically, it will still be around ~300mm2 with it. Because just thinking about it off hand, almost ~200mm2 of the IC is logic - logic which I scaled very conservatively. That further is composed of ~100mm2 of FPU/FXU logic. Scale it all correctly and add in 3X the FPU/FXUs you'll be ~50-60mm2 over.

So you're being too conservative now?

And before you eat into my slush-buffer, find out the size of an actual FXU. I question if it's going to be 150% the size of a FPU and if not then you might actually be ahead slightly... relatively speaking.

It's probably larger, but that sounds reasonable.

Just do your scaling properly and take into account the space in between those units, and you should get a sub-400mm2 chip.
 
nAo said:
If the APUs follow the same route as the PS2 VUs, a single APU's FPU would be smaller (no superscalar execution, no full IEEE compliance, no other fancy stuff...) than the FPU selected by Vince.

APU has 4 FP Units and 4 FX Units: as nAo is saying, both the FX and FP units would be smaller than the PowerPC 440 FPU and FXU Vince provided.

The PowerPC 440 FPU is an out-of-order design and the FXUs are handled by out-of-order logic with register renaming and all other neat thingies.

The complexity of the FP and FX units in the APUs is going to be considerably lower ( probably no branch prediction, static scheduling, etc... ).

Likely you could have 4 FMACs, 1 FDIV and 4 basic 32-bit iALUs, or we could have 4 units that can do either FP or FX calculations to save even more area.

Taking the FPU Vince showed from the PowerPC 440 and the iALU that was calculated and multiplying both by 32 and adding the results up is quite incorrect.
 
A note about feature size scaling. It takes place in three dimensions and they do not all scale at the same rate -- I believe Russ can confirm this.
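
To put a rough number on that, here is a minimal Python sketch, assuming an ideal shrink scales each in-plane dimension by the ratio of feature sizes. The `x_eff` and `y_eff` knobs and the 10mm2 example block are made up for illustration, and the vertical dimension is left out since it does not change die area.

```python
def scaled_area(area_mm2, old_nm, new_nm, x_eff=1.0, y_eff=1.0):
    """Scale a block's area from one process node to another.

    x_eff / y_eff are hypothetical knobs: the fraction of the ideal linear
    shrink actually achieved along each in-plane axis (1.0 = ideal, 0.0 = none).
    """
    ideal = new_nm / old_nm
    # Per-axis shrink factor, interpolated between no shrink and ideal shrink.
    sx = 1.0 - x_eff * (1.0 - ideal)
    sy = 1.0 - y_eff * (1.0 - ideal)
    return area_mm2 * sx * sy


block = 10.0  # hypothetical 10 mm^2 block at 180nm
print(scaled_area(block, 180, 65))              # ideal shrink in both axes
print(scaled_area(block, 180, 65, y_eff=0.8))   # one axis shrinks less well
```

An ideal 180nm to 65nm shrink multiplies area by (65/180)^2, about 0.13. Any axis that shrinks less than ideally pushes the result up from there, which is why the theoretical figure is better read as a lower bound.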
 
Panajev2001a said:
Likely you could have 4 FMACs, 1 FDIV and 4 basic 32-bit iALUs, or we could have 4 units that can do either FP or FX calculations to save even more area.

I'd think that they'll do completely without FDIV in the APUs. Instead they will have a reciprocal estimate instruction (doing 4 estimates in parallel) which you then can refine to the desired precision with Newton-Raphson. This has the added bonus that it can be pipelined.
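
A minimal Python sketch of the idea, not anything from the patent: `recip_estimate` is a software stand-in for a hardware estimate instruction (here a crude linear seed, an assumption for illustration), and `recip_refine` applies the Newton-Raphson step x = x * (2 - a * x), which needs only multiplies and adds.

```python
import math


def recip_estimate(a):
    """Crude 1/a seed for nonzero a (stand-in for a hardware estimate)."""
    m, e = math.frexp(abs(a))                # abs(a) = m * 2**e, m in [0.5, 1)
    seed = 48.0 / 17.0 - (32.0 / 17.0) * m   # linear approximation of 1/m
    return math.copysign(math.ldexp(seed, -e), a)


def recip_refine(a, x, steps=3):
    """Newton-Raphson refinement; each step roughly squares the relative error."""
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def recip4(vec, steps=3):
    """Four reciprocals 'in parallel', modelled here as a plain loop."""
    return [recip_refine(a, recip_estimate(a), steps) for a in vec]


print(recip4([3.0, 0.25, -7.0, 1e6]))
```

The seed is only good to within about 6%, but since every step is a multiply and a multiply-subtract, the refinement maps straight onto FMAC hardware, and the number of steps sets the precision.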

Also I think they'll build the Floating point hardware so that they can just push integers through as denormalized floating point values - saves transistors on execution units.
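
A small Python sketch of that trick, under stated assumptions: it uses 64-bit doubles because that is what Python floats are (the hardware would more likely be single precision), the helper names are made up, and it relies on the host not flushing denormals to zero. An integer placed in the mantissa field with a zero exponent is a denormal, and a floating-point add of two such values produces the bit pattern of the integer sum.

```python
import struct


def int_as_denormal(i):
    """Reinterpret a small non-negative integer's bits as a float64."""
    return struct.unpack("<d", struct.pack("<q", i))[0]


def denormal_as_int(f):
    """Reinterpret a float64's bits as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", f))[0]


def add_via_fpu(a, b):
    """Add two integers (each below 2**52) using a floating-point add."""
    return denormal_as_int(int_as_denormal(a) + int_as_denormal(b))


print(add_via_fpu(1234, 5678))
```

Only add and subtract carry over this directly; a floating-point multiply of two denormals underflows, so integer multiplies would need separate handling.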

Cheers
Gubbi
 
Gubbi said:
Panajev2001a said:
Likely you could have 4 FMACs, 1 FDIV and 4 basic 32-bit iALUs, or we could have 4 units that can do either FP or FX calculations to save even more area.

I'd think that they'll do completely without FDIV in the APUs. Instead they will have a reciprocal estimate instruction (doing 4 estimates in parallel) which you then can refine to the desired precision with Newton-Raphson. This has the added bonus that it can be pipelined.

Also I think they'll build the Floating point hardware so that they can just push integers through as denormalized floating point values - saves transistors on execution units.

Cheers
Gubbi

or we could have 4 units that can do either FP or FX calculations to save even more area.

;)

Nice idea about the parallel estimate operation with Newton-Raphson :).
 