Broadway specs

Nice find! The date and the paired single unit indicate that it's definitely Broadway.

*keeps reading*
 
Is this really Broadway?

That die is tiny! ~16mm^2. The voltage seems high for some reason: 1.15V on a pea-sized 90nm die? I figured it'd be sub-1V... It doesn't seem like it would matter much, but it jumped out at me. Power consumption is kind of mind-boggling when you compare it to PS3/X360 (~5 watts for Broadway). Wii looks like it could feasibly be a handheld... especially if it got shrunk to 65nm!

I don't imagine performance is going to be all that sexy when you have a chip the size of a pea. I don't know much about the 750CXe/CX though, so I can't really answer your question.
 
It's from Matt C's CL article, which he had to take down for various reasons. From what I know, it uses 5 watts during gameplay at 729MHz. You can easily see how much less power it would use at GC's clock speed. The handheld bit is something I can only wink at for now, but I'll say it's a definite possibility. If you look at my post history here, you can find me discussing Broadway and its changes compared to a stock CL in more detail, and things are looking up as time goes on. Hopefully this month or next we can get usable Hollywood details from someone.
 
If this is indeed the chip, then it looks to be a straight die shrink. Note the cache sizes are identical: 64 KB Harvard L1, 256 KB unified L2. uarch looks identical.
 
You've stated a lot of things, but none of them are reflected in this datasheet.

The big thing this datasheet reflects is what I've been stating all along: it's based on the 750CL, not a CXe, FX, or GX, even though there was no real reason for me to pick the CL, considering you couldn't even find its specs until this week.

Die shrink

If that was the case, how do you explain this quote from another forum member:

"To clear things up: the top-of-the-line PPC 750, the 750GX, is the fastest 7xxx PowerPC. It's much faster than Apple's G4 (PPC 744x) at the same frequency. To call it an overclocked G3 would be quite an understatement. IBM's current 750-series PPCs even outperform G5s at the same clock speed, but don't feature 64bit integer processing. The 7xx series still is the mainline PowerPC family, the 8xxx series are special, embedded 7xx chips, and the 970 chips are nothing more than cheap POWER4 spinoffs, initially created for Apple (IBM prefers POWER4 and POWER5 over PPC970 for their own systems). And no, Gekko was no G3, it's only part of the same CPU family. But Gekko, aka PPC 750CXe, had SIMD units similar to Altivec/VMX, the G3 only had a regular FPU."

Going from the lower tier of the 750 class to the top is not a basic die shrink.
 
Cool, so how does this compare with Gekko (on a per-clock basis), and what more could they put in there to boost it to the 25mm^2 we saw in the other thread (just a jump to 512KB of L2?)


If this is indeed the chip, then it looks to be a straight die shrink. Note the cache sizes are identical: 64 KB Harvard L1, 256 KB unified L2. uarch looks identical.

A straight die shrink would put it at around 11mm^2, and this chip is bigger than that. Since the caches are the same size, the extra area should be going to additional logic within the chip, so the big question is: what is that logic doing there?
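For reference, an ideal ("straight") shrink scales die area with the square of the process-node ratio. The sketch below assumes Gekko was roughly 43mm^2 on a 180nm process; neither figure is stated in this thread, so treat both as assumptions. Under those assumptions the ideal 90nm area lands right around the 11mm^2 figure above.

```python
def ideal_shrink_area(area_mm2: float, old_node_nm: float, new_node_nm: float) -> float:
    """Die area after an ideal optical shrink: area scales with (new/old)^2."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


# ASSUMPTION: Gekko at roughly 43 mm^2 on a 180 nm process (not stated in this thread).
GEKKO_AREA_MM2 = 43.0
GEKKO_NODE_NM = 180
BROADWAY_NODE_NM = 90

shrunk = ideal_shrink_area(GEKKO_AREA_MM2, GEKKO_NODE_NM, BROADWAY_NODE_NM)
print(f"ideal {BROADWAY_NODE_NM} nm shrink: {shrunk:.2f} mm^2")  # ideal 90 nm shrink: 10.75 mm^2
```

Real shrinks seldom reach the ideal figure, so treat this as a lower bound rather than a prediction.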
 
This chip's numbers fit well with what IBM has told us about Broadway. They said 20% power savings over Gekko. The 750CXe would draw about 7.3W at 485 MHz, and this chip draws 5.5W at 700 MHz. Bump that to 729 MHz and you'll be pretty close to 5.84W, which is 80% of 7.3W.
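The arithmetic above can be checked with a quick sketch. It assumes power scales linearly with clock at a fixed voltage, which is only a first-order approximation (dynamic power also depends on voltage and switching activity). The helper name `scale_power` is made up for illustration.

```python
def scale_power(watts: float, from_mhz: float, to_mhz: float) -> float:
    """First-order estimate: power proportional to clock at a fixed voltage."""
    return watts * to_mhz / from_mhz


CXE_W = 7.3                 # 750CXe at 485 MHz, as quoted above
CL_W, CL_MHZ = 5.5, 700     # 750CL datasheet figure quoted above
BROADWAY_MHZ = 729

estimate = scale_power(CL_W, CL_MHZ, BROADWAY_MHZ)
target = 0.8 * CXE_W        # "20% power savings" applied to the 750CXe figure
print(f"estimate {estimate:.2f} W vs 80% target {target:.2f} W")
# estimate 5.73 W vs 80% target 5.84 W
```

Linear scaling gives about 5.73W, roughly 2% under the 5.84W target, so the 20% figure is at least in the right ballpark.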

The only differences I see in the block diagrams are 1) DMA is added to the 60x bus, and 2) a "paired single unit" has been attached to the FPU. Sounds like SIMD to me. Within the text, they've added "quantization" to the Load/Store unit for converting to and from the internal floating-point format. The FPU has gained "paired single floating point arithmetic" and "data quantization support." Now, I know that quantization is used for analyzing waves; could this be specialized hardware for reading the Wiimote's analog output? (I doubt it, as it would make more sense for the Wiimote to do this, then transmit it digitally.) I don't know what a BAT is, but the CL has twice as many as the CXe (instruction and data). In one datasheet they claim that 2^52 is 4 exabytes, in the other, 4 petabytes.
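On the last point, a quick check with binary prefixes shows the petabyte figure is the right one: 2^52 bytes is exactly 4 PiB (about 4.5 x 10^15 bytes in decimal terms), nowhere near an exabyte.

```python
PIB = 2 ** 50  # pebibyte (binary "petabyte")
EIB = 2 ** 60  # exbibyte (binary "exabyte")

size = 2 ** 52  # bytes
print(size / PIB)           # 4.0
print(size / EIB)           # 0.00390625
print(f"{size:.3e} bytes")  # 4.504e+15 bytes
```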

I just went back and watched the IBM video. The logic used to find the die size appears to be sound (though I believe it is an underestimation). Each 300mm wafer is divided into 8 sections across, and each section is 7 cores wide. So Broadway is about 5mm on a side, yielding a 25mm^2 die size. So if this chip is only 16mm^2 then what occupies that additional space? Even with 512KB of L2 you're probably looking at 22mm^2.
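The die-size estimate can be reproduced as below, assuming the 8 sections of 7 dies span the full 300mm diameter. Taken literally, that gives a pitch of about 5.36mm; since the pitch includes scribe lines and the counted row may not reach the wafer edge, it is an upper bound on the die edge, and rounding it to about 5mm gives the 25mm^2 figure.

```python
WAFER_DIAMETER_MM = 300
SECTIONS_ACROSS = 8
DIES_PER_SECTION = 7

# Assumes the counted row spans the full diameter; pitch includes scribe lines.
dies_across = SECTIONS_ACROSS * DIES_PER_SECTION   # 56
pitch_mm = WAFER_DIAMETER_MM / dies_across          # die edge plus scribe line
print(f"{dies_across} dies across, pitch {pitch_mm:.2f} mm, area <= {pitch_mm ** 2:.1f} mm^2")
# 56 dies across, pitch 5.36 mm, area <= 28.7 mm^2
```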

I really believe that Nintendo intends to go portable with this design, rather soon. Why else stress such a low-power design?

"To clear things up: the top-of-the-line PPC 750, the 750GX, is the fastest 7xxx PowerPC. It's much faster than Apple's G4 (PPC 744x) at the same frequency. To call it an overclocked G3 would be quite an understatement. IBM's current 750-series PPCs even outperform G5s at the same clock speed, but don't feature 64bit integer processing. The 7xx series still is the mainline PowerPC family, the 8xxx series are special, embedded 7xx chips, and the 970 chips are nothing more than cheap POWER4 spinoffs, initially created for Apple (IBM prefers POWER4 and POWER5 over PPC970 for their own systems). And no, Gekko was no G3, it's only part of the same CPU family. But Gekko, aka PPC 750CXe, had SIMD units similar to Altivec/VMX, the G3 only had a regular FPU."

I'd call it a bit of a crock...

There are very few code mixes in which a 750GX will beat a 7448 at the same clock. For starters, if the 7448 is sitting on an MPX bus (most likely) instead of a 60x bus, it already has the advantages of no idle cycles between tenures, support for more outstanding transactions, out-of-order transactions, better streaming support, split transactions, enveloped transactions, and support for a shared state for cache coherency in MP setups. The 7448 has more scalar integer execution resources, deeper instruction and completion queues, more rename buffers, and a more efficient FPU (no additional latency for double-precision madd, and it can source operands from rename buffers). That's not even bringing AltiVec into the equation.

A 970 even at the same clock can also handily stomp a 750GX in many instances (particularly floating point), unless the code mix is very latency sensitive or fairly branchy, but yeah in the 1GHz ballpark it's not too hard for a 750GX or 7448 to put a bit of smack down on the 970 in various integer code mixes. However the 970 *is* designed for higher clock domains than either processor so it's a bit of give or take. And IBM may prefer Power5s nowadays, but that hasn't stopped them from marketing 970 blade systems either (e.g. JS20).

Now, Gekko did have some nice provisions for streaming data to its L1 via DMA and for bursting writes. It also had support for some minimal SIMD operations; however, it did not have any SIMD execution units and in no way should be equated with AltiVec. IBM simply did what many in the embedded space do and extended the FPU to support packed operands in a scalar execution unit (in Gekko's case, packed single-precision floats). This support isn't even in the same class as AltiVec, which is not only far more flexible and applicable in more use cases, but also has its own execution resources instead of overlaying the FPU.
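To make "packed operands in a scalar execution unit" concrete, here is a minimal Python model of a paired-single multiply-add: one register holds two single-precision values, and one operation updates both halves. `PairedSingle` and `paired_madd` are hypothetical names, not Gekko's actual instruction set, and the model says nothing about how the hardware shares the FPU datapath.

```python
import struct
from dataclasses import dataclass


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


@dataclass(frozen=True)
class PairedSingle:
    """Hypothetical model of one FPR holding two packed single-precision floats."""
    ps0: float
    ps1: float


def paired_madd(a: PairedSingle, c: PairedSingle, b: PairedSingle) -> PairedSingle:
    """d = a * c + b applied to both halves by a single operation.

    A plain scalar FPU would need two separate multiply-adds for the same work.
    """
    return PairedSingle(
        to_f32(a.ps0 * c.ps0 + b.ps0),
        to_f32(a.ps1 * c.ps1 + b.ps1),
    )


a = PairedSingle(1.5, 2.0)
c = PairedSingle(2.0, 4.0)
b = PairedSingle(0.5, -1.0)
print(paired_madd(a, c, b))  # PairedSingle(ps0=3.5, ps1=7.0)
```

Two lanes of single precision in one register is about as far as it goes, which is the point above: useful, but nowhere near a separate vector unit like AltiVec.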

Now, I know that quantization is used for analyzing waves; could this be specialized hardware for reading the Wiimote's analog output? (I doubt it, as it would make more sense for the Wiimote to do this, then transmit it digitally.)

No, it's for packing and unpacking data. Basically quantizing floats down into smaller fixed point representations to improve data throughput. Very similar to what the VIFs were for in the EE...
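A minimal sketch of that packing and unpacking, assuming a signed 16-bit target format with a power-of-two scale (the number of fractional bits). `quantize_s16`, `dequantize_s16` and the chosen scale are illustrative assumptions; the actual formats and scale encoding the hardware supports aren't covered in this thread.

```python
import struct


def quantize_s16(x: float, scale: int) -> int:
    """Float -> signed 16-bit fixed point with `scale` fractional bits (saturating)."""
    q = round(x * (1 << scale))  # Python's round() is round-half-to-even
    return max(-32768, min(32767, q))


def dequantize_s16(q: int, scale: int) -> float:
    """Signed 16-bit fixed point -> float."""
    return q / (1 << scale)


SCALE = 8  # 8 fractional bits: steps of 1/256, range about -128..+128
values = [1.25, -3.5, 100.0]

packed = struct.pack("<3h", *(quantize_s16(v, SCALE) for v in values))
as_f32 = struct.pack("<3f", *values)
print(len(packed), "bytes vs", len(as_f32), "bytes as float32")  # 6 bytes vs 12 bytes as float32

restored = [dequantize_s16(q, SCALE) for q in struct.unpack("<3h", packed)]
print(restored)  # [1.25, -3.5, 100.0]
```

Three floats drop from 12 bytes to 6, at the cost of range (here roughly plus or minus 128) and precision (steps of 1/256).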
 
i generally agree with archie here, i.e. without being a seasoned ppc coder and going largely from the CL specs (thanks, theafu), i can't see a 75x beating a motorola G4 per clock, definitely not in the average case. *but* with that said, we should not forget that a G4 (say, my favourite 7447) has considerably higher power dissipation than broadway: at a comparable 867MHz it's 8.3W typical, 11.5W max, vs. ~6W max for the 750CL at ~750MHz. actually i'm well impressed by the latter!
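Normalizing the quoted figures by clock gives a rough power-per-MHz comparison. This only divides watts by MHz; it ignores how much work each chip does per clock (where the G4 has the edge, as discussed above), and it mixes typical and max ratings exactly as quoted. `mw_per_mhz` is a made-up helper.

```python
def mw_per_mhz(watts: float, mhz: float) -> float:
    """Milliwatts of dissipation per MHz of clock."""
    return 1000.0 * watts / mhz


# Figures as quoted above (the 750CL numbers are approximate).
chips = {
    "7447 @ 867MHz, typical": (8.3, 867),
    "7447 @ 867MHz, max": (11.5, 867),
    "750CL @ ~750MHz, max": (6.0, 750),
}

for name, (watts, mhz) in chips.items():
    print(f"{name}: {mw_per_mhz(watts, mhz):.1f} mW/MHz")
```

On the max ratings that works out to roughly 8 mW/MHz for the CL against about 13 for the 7447.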

No, it's for packing and unpacking data. Basically quantizing floats down into smaller fixed point representations to improve data throughput. Very similar to what the VIFs were for in the EE...

yes. the nice thing here is that those packed formats should be directly usable by flipper, hence by hollywood, and furthermore be nicely DMA-deliverable directly from L1 cache down to flipper's reach. and it's not difficult to see where xenon/xenos' creators got some of their good ideas from. i mean, if that was a form of flattery, the little cube should be blushing all over right now ;)
 
It looks as if we finally got something solid about Wii. :p

So the current suspect is a 750CL + more L2 cache + some alterations?
 