Wii U hardware discussion and investigation *rename

The 750GX and FX curb stomp the CLE (Broadway). It is a much, MUCH larger improvement than Bobcat to Jaguar, even on the same process size, and at the same clock as the 750CX/CLE.

Could you name some of these big improvements? According to the technical summary (https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/BECF98824B9B663287256BCA00587B22/$file/750FX_Technical_Summary_DD2.X_V1.0_prel28May02.pdf) the improvements over the original 750 and its shrinks are:

1) More L2 cache
2) External bus pipelining
3) An extra outstanding L1 miss
4) An extra FPU reservation station, and faster reciprocal estimations.

This is at best comparable to the changes made with Jaguar, which off the top of my head include a widened L2 cache interface, more/shared L2 cache, a deeper OoO window, a better divider, deeper load/store queues, 128-bit SIMD (the integer side widened too, not just float) with a lot of improved timings (latency, not just throughput), and support for a bunch of new instructions.

GX then adds exactly two performance improvements (https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/5A61BEB893287FF987256D650066CFD5/$file/PPC750GX_diff_v1.0_080604.pdf): more outstanding L2 misses (from 1 to 3-4) and more, higher-associativity L2 cache.

These are not exactly earth shattering changes.

And Broadway is not just a shrunk vanilla 750 itself. It actually has two reservation stations in front of its FPU vs. one on FX and GX, and it has paired singles. So FX and GX would have even worse FPU performance, hardly what Nintendo should have gone for.
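
To make concrete what paired singles mean: each 64-bit FPR is treated as two 32-bit floats, and a single instruction operates on both halves. Here's a minimal sketch, assuming a Gekko-aware toolchain (e.g. devkitPPC) and GQR0 left at zero (raw, unscaled float pairs); the ps_*/psq_* opcodes are the Gekko/Broadway extension in question and don't exist on stock FX/GX:

/* Multiply two floats by two scale factors with one ps_mul.
   Illustrative only; GCC picks the FPRs via the "f" constraints. */
static void ps_scale2(float out[2], const float in[2], const float s[2])
{
    double a, b;                       /* each FPR holds a pair of singles */
    __asm__ volatile(
        "psq_l   %0, 0(%2), 0, 0\n\t"  /* load a pair of singles            */
        "psq_l   %1, 0(%3), 0, 0\n\t"  /* load the pair of scale factors    */
        "ps_mul  %0, %0, %1\n\t"       /* two multiplies in one instruction */
        "psq_st  %0, 0(%4), 0, 0"      /* store both results                */
        : "=&f"(a), "=&f"(b)
        : "b"(in), "b"(s), "b"(out)
        : "memory");
}

On a stock FX or GX the same work takes two ordinary single-precision multiplies.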

I'm skeptical about all this hype about huge perf/MHz gains in the 750 family, especially when the people posting it aren't actually referencing the changes. There's a reason these processors are called 750xx instead of being made part of a new processor family.

Espresso does not have 3 CLE (Broadway) cores. Those are 750GXs with Nintendo's custom extensions. Big difference. Huge difference, and no, it's not just cache size.

Do you have a single piece of credible evidence for that claim? Spending the money to update a different old processor like the GX to fit Broadway's specifications instead of going for a substantially better processor doesn't make sense. The entire reason Nintendo would be so interested in using processors that are functionally as close as possible is likely a paranoid fear of compatibility problems.

But it's a big enough difference that it's just as silly to think it's just a tri-core Arthur 750 (the twelve-year-old processor you were talking about; you might as well call Jaguar, an iCore, or any x86 platform a 1978 processor since it's based off the 8086. It's a silly practice.) Or even Broadway.

That comparison is outrageous. The original 750 was released in 1997, the FX in 2002 (5 years later) and the GX in 2004 (7 years later). That's absolutely nothing like the difference in time between the 8086 and Jaguar (35 years!). And just because FX and GX are substantially newer doesn't mean that they're dramatically better. You can't assume that because two processors on one architecture ended up very different over some time span, any two processors on a different architecture over a similar span will be just as different. There are embedded cores that have stayed almost the same for decades. Alongside FX and GX there were much more heavily modified PPC cores; this is what market segmentation/product differentiation is about.
 
There's also the mysterious Mojave 750VX that was reportedly finished at the design level but never produced. Of course, Mojave reportedly had VMX, which the Wii U doesn't seem to have, so it couldn't have directly served as the base for Espresso.
 
There are about five years between the Pentium Pro and Tualatin (the last Pentium III), yet I have no trouble saying they're the same basic design.
Or take buying a PC 20 years ago: for good bang for the buck you could buy a 386 DX40, even though the first 386 was made many years before.
 
At one point my pair of Voodoo2s seemed to care that it was 2002; games would get pissy. (I STILL am not happy about Microsoft dropping support for 3D-only GPUs in DirectX 8/9 games.)
 

I bought a Radeon DX7 card without hardware T&L in 2001 from a friend. I didn't know it was T&L-less till I got home. Performance was so bad I had to fall back on my 1998 Voodoo2 to play games (it was a lot faster!).

After that I never made a snap decision about PC hardware ever again.
 

1. Actually that covers a lot of it. There's a bit more, like added physical registers, but yeah, that's pretty much the gist of it. You convinced me you read the document. If you have that information I take it you saw the 35% performance improvement listed by IBM. That's all at 90nm. Espresso also went down to 45nm and clock speeds nearly doubled; Jaguar's went up, what did it say, 10%? And its increased cache is shared by all 4 cores.

I'm still pretty confident in my assertion that it's the 750 series that saw the notably bigger increase in performance over its predecessor.

2. Given that we've already been talking about paired singles extensively, trying to bring up this point in this manner is... in poor taste. It's implied so strongly it doesn't need to be brought up. A 750GX for Nintendo systems will have Nintendo's custom extensions; the stock base model won't.

3. There's a reason a Jaguar isn't called a Piledriver. The point still stands: an Arthur is not comparable to a GX, certainly not the customized one Nintendo is using. Neither is Broadway to an Espresso core.

4. Yes, the parody assertion is of course more ridiculous than the actual situation. I was expecting that going back to the '70s would make that overly obvious. I exaggerated.

5. I have enough evidence to personally believe it's not a CLE/Broadway. fail0verflow have found additional registers which don't exist on the CX, CLE, or Broadway, but do on the GX. The CXe, FX and GX are the only 750s IBM has ever claimed could even support SMP (though not well at all; Espresso is obviously Nintendo-customized here). The 750FX and GX are pin compatible; nothing else in the line is with them, because the changes, most notably the large increase in cache, changed the footprint and package. How much cache do Espresso's cores have? The FX and GX are also highly code compatible with previous 750s; you don't even need to recompile code except in specific circumstances.

You are going to need to provide more than 'nuh uh' for me, or we can just agree to disagree.

Are you all right? You seem... Angry? Vested?

I don't really care if you think the Wii U's a piece of crap. It's okay. It's kind of the only reason I'm interested in it on this level. Look at the size of it; it's not a powerful machine. I just want to know what this particular piece of crap can do.
 
1. Actually that covers a lot of it. There's a bit more, like added physical registers, but yeah, that's pretty much the gist of it. You convinced me you read the document. If you have that information I take it you saw the 35% performance improvement listed by IBM. That's all at 90nm. Espresso also went down to 45nm and clock speeds nearly doubled; Jaguar's went up, what did it say, 10%? And its increased cache is shared by all 4 cores.

I didn't see that 35% number, but I take it that was with an increased clock speed; I'm very skeptical that it could get such a high typical IPC improvement with only these changes. I think the highest confirmed clock speed increase for Jaguar over Bobcat is 1.8GHz to 2GHz, or 11.1%.

This discussion is very much uarch or perf/MHz oriented, since we already know the MHz but you're saying that the expectations of perf/MHz are too low because we have the uarch wrong.

Why are we talking about Bobcat vs Jaguar again? Because I said that darkblu's tests were on Bobcat but would be a lot faster on Jaguar? His test was nothing but float matrix multiplications. For this particular test Jaguar would make a huge difference thanks to double the SIMD width and from newer instructions like broadcasts. The more typical performance differences aren't pertinent.
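
For reference, the rough shape of that workload (darkblu's actual code isn't posted in the thread, so this is just a generic sketch of a 4x4 float matrix multiply): everything is 128-bit packed ops plus scalar broadcasts. Jaguar executes each 128-bit op in a single pass where Bobcat's 64-bit-wide FPU splits it in two, and with AVX the _mm_set1_ps splats can become single vbroadcastss loads:

#include <xmmintrin.h>  /* SSE intrinsics; both Bobcat and Jaguar run this */

/* 4x4 single-precision matrix multiply, column-major: C = A * B.
   C's column j is the sum over k of A's column k times scalar B(k,j). */
static void mat4_mul(float c[16], const float a[16], const float b[16])
{
    for (int j = 0; j < 4; ++j) {
        __m128 acc = _mm_mul_ps(_mm_loadu_ps(&a[0]),
                                _mm_set1_ps(b[4 * j + 0]));
        for (int k = 1; k < 4; ++k)
            acc = _mm_add_ps(acc, _mm_mul_ps(
                      _mm_loadu_ps(&a[4 * k]),      /* column k of A     */
                      _mm_set1_ps(b[4 * j + k])));  /* broadcast B(k, j) */
        _mm_storeu_ps(&c[4 * j], acc);
    }
}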

Im still pretty confident in my assertion its the 750 series that saw the notably bigger increase in performance over its predecessor.

Sure, if you look at clock scaling, because back then process improvements opened up a lot more headroom. Bobcat started out a lot later in the process game.

But in terms of uarch improvements they weren't larger.

2. Being that weve already been talking about paired singles extensively trying to bring up this point in this manner is.... in poor taste. This is implied so strongly it doesnt need to be brought up. A 750gx for nintendo systems will have nintendos custom extensions the stock base model wont.

But the differences between an original 750 and a 750GX are barely larger than the differences between an original 750 and Broadway. Some new CPU combining the best of both would be some totally different thing. Like I said it's not just paired singles, it has more FPU reordering.

3. Theres a reason a jaguar isnt called a piledriver. The point still stands, an arthur is not comparable to a gx certainly not the customized one nintendo is using. Neither is broadway to an espresso core.

I don't understand the point you're trying to make with that Piledriver comment. I think that IBM/Motorola/Freescale/whoever keeping the 750 name on the new processors did actually imply something about the level of changes present.

5. I have enough evidence to personally believe its not a cle/broadway. Fail overflow have found additional registers which dont exist on cx, cle, broadway, but do on Gx. The CXe fx and gx are the only 750's ibm has ever claimed could even support smp (though not well at all, espresso is obviously nintendo customized here). 750Fx and Gx are pin compatable. Nothing else in the line is with them, because the changes, most notably the large increase in cache changed the footprint and package. How much cache does espresso cores have? Fx and Gx are also highly code compatable with previous 750's, you dont even need to recompile code except in specific circumstances.

Please provide some kind of link for this. I don't see how modifying Broadway for some/better SMP is necessarily more invasive than modifying a 750GX to be compatible with Broadway's execution resources. Espresso's cores have different amounts of cache; I don't think that really has anything to do with GX vs Broadway, and neither would be compatible with Espresso's L2 cache as-is.

BTW, Broadway supports cache coherency via a sub-par protocol (three-state MEI, which lacks the shared state of MESI), so it was designed for multiple devices to share the same memory, which is the main component of SMP. As far as I can tell this is the same as FX and GX's coherency support. Hopefully Wii U doesn't use that coherency protocol.
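
For anyone unfamiliar, here's a toy model of why a three-state protocol is sub-par (my sketch, not anything out of IBM's manuals): with no shared state, two caches can never hold the same line at once, so even clean read-read sharing between cores forces invalidations and the line ping-pongs.

/* One cache line snooped by two masters under an MEI-style protocol. */
enum state { INVALID, EXCLUSIVE, MODIFIED };    /* MESI minus SHARED */

static enum state line_state[2];

static void snooped_read(int core)
{
    int other = 1 - core;
    /* MESI would let both copies coexist as SHARED; MEI makes the other
       cache give the line up (after writing it back if MODIFIED, which
       is omitted here for brevity). */
    if (line_state[other] != INVALID)
        line_state[other] = INVALID;
    line_state[core] = EXCLUSIVE;
}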

Sure, the newer 750 processors are code compatible, as I'm sure some other PPC CPUs are. What I was really referring to is timing compatibility. Not that I think Nintendo has that much to worry about, but they've been pretty paranoid about this in the past. Technically it's possible, albeit rare, for even Gamecube/Wii-era code to rely on timings (most likely accidentally) and fail noticeably if the timings change a lot. On the other hand, I doubt Espresso is really that close to timing compatible with Broadway given the different L2 cache type.
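
A hypothetical example of the kind of accidental timing dependence I mean (not taken from any real game): a delay calibrated in loop iterations instead of against the time base register. Improve the IPC or the cache timings and the "5ms" silently shrinks:

/* Tuned once on the original CPU; nothing ties it to wall-clock time. */
#define ITERS_FOR_5MS 243000u   /* hypothetical count */

static void busy_wait_5ms(void)
{
    /* volatile keeps the compiler from deleting the loop, but a faster
       core still finishes it early and downstream hardware pokes race */
    for (volatile unsigned i = 0; i < ITERS_FOR_5MS; ++i)
        ;
}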

EDIT: I've been googling around and I can't find anything from fail0verflow that talks about Wii U using 750FX or 750GX. He does say that the coherency initialization registers were present on Broadway:

http://fail0verflow.com/blog/2013/espresso.html

And perhaps more interestingly, he has no problem himself equating Wii U's performance with 3 1.2GHz Gekko or even Arthur cores; this is from the comments (which I seem to be incapable of linking to directly, so I'll just paste it):

fail0verflow said:
Yes, a 1.6GHz quad-core Cortex-A9 with NEON from ~2010 beats a 1.2GHz tri-core PowerPC 750 with paired singles from ~1997 or 2001 (depending on whether you count the PS or not). The PPC750 is a nice core and has lasted long (and beats crap like the Cell PPU and the 360's cores clock-per-clock on integer workloads), but sorry, contemporary mobile architectures have caught up, and the lack of modern SIMD is significant. Performance varies by workload, but I'm willing to bet that they're similar at integer workloads and the Cortex-A9 definitely has more SIMD oomph thanks to NEON.

Also this:

fail0verflow said:
Some, like the PPC750CL (and the Broadway) only have one PLL and its configuration can only be changed at hard reset time (externally). Initially, we thought/hoped that the Espresso would borrow the 750FX's dual PLLs, but that turned out not to be the case. The CPU die shot also does not show two symmetrical PLLs. The HID1 bits that control the PLLs in the 750FX are not present in the Espresso.

EDIT2:

This one's especially bad for your case...

fail0verflow said:
There is no clock switching. It's not a triple-core 750FX, it's a triple-core 750CL with a new L2/bus/coherency subsystem and a wider bus. The cores are laid out from scratch, but the core features up to the L1 caches are basically identical to the 750CL.

Are you all right? You seem... Angry? Vested?

I don't really care if you think the Wii U's a piece of crap. It's okay. Look at the size of it; it's not a powerful machine. I just want to know what this particular piece of crap can do.

No, I'm not even remotely angry or vested; honestly you sound pretty defensive on this yourself. I don't really care if Wii U is a piece of crap or not, and I'm not one to judge what people should or shouldn't value (Wii was super lazy hardware-wise but beloved, and there's nothing wrong with that). I'm just interested in the technical details here, and am a little snarky in response to what I thought were exceptionally silly analogies.
 
I didn't see that 35% number, but I take it that was with an increased clock speed; I'm very skeptical that it could get such a high typical IPC improvement with only these changes. I think the highest confirmed clock speed increase for Jaguar over Bobcat is 1.8GHz to 2GHz, or 11.1%.

It was about the same difference with the max of the FX/GX. I think it was exactly 200MHz. Though the difference between Espresso and Broadway is much larger.

This discussion is very much uarch or perf/MHz oriented, since we already know the MHz but you're saying that the expectations of perf/MHz are too low because we have the uarch wrong.

Yes. I believe that if people are pointing out the difference between Bobcat and Jaguar, they shouldn't group Espresso with Broadway, for the same reasons.

Why are we talking about Bobcat vs Jaguar again? Because I said that darkblu's tests were on Bobcat but would be a lot faster on Jaguar? His test was nothing but float matrix multiplications. For this particular test Jaguar would make a huge difference thanks to double the SIMD width and from newer instructions like broadcasts. The more typical performance differences aren't pertinent.

No, I'm staying away from FP/SIMD; it's not exactly something I believe is a particular strength of Espresso. It really was just what I mentioned above.


Sure, if you look at clock scaling, because back then process improvements opened up a lot more headroom. Bobcat started out a lot later in the process game.

That's pretty much my basis. Again, if people are bringing up Jaguar's gains over Bobcat, I don't think it's practical to ignore the CLE base vs. GX/FX base for Wii U.

But the differences between an original 750 and a 750GX are barely larger than the differences between an original 750 and Broadway. Some new CPU combining the best of both would be some totally different thing. Like I said it's not just paired singles, it has more FPU reordering.

Looked pretty decent to me. When I say GX, I mean as a base. I assume Nintendo made all its customizations for Espresso.

I have no idea what point you're trying to make with that Piledriver comment. I think that IBM/Motorola/Freescale/whoever keeping the 750 name on the new processors did actually imply something about the level of changes present.

Yeah, that wasn't very clear. Bobcat and Jaguar are both cats. I don't feel keeping the 750 in the name is any more indicative of anything except that they didn't want to think of another Knights of the Round themed name.


Please provide some kind of link for this. I don't see how modifying Broadway to be compatible with SMP is necessarily more invasive than modifying a 750GX to be compatible with Broadway's execution resources. Espresso cores have different amounts of cache, I don't think that really has anything to do with GX vs Broadway; neither would be compatible with Espresso's L2 cache as is.

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/291C8D0EF3EAEC1687256B72005C745C

7. I am upgrading from a PPC750 to a PPC750CXe - do I need to recompile my code?

In general, software that utilizes only user model resources (GPRs, FPRs, CR, FPSCR, etc) will not require changes, and this is true for most compiler-generated code. Some software changes are required when migrating to a 750L, 750CXe, 750FX or 750GX, but these changes are limited to the PowerPC Supervisor Model (configuration registers, cache control and memory management registers, processor version registers, etc).

This seems to say to me it's not that big of a deal. Of course, I realize none of these have Nintendo's customizations to Gekko or Broadway, but I assume Espresso would have Nintendo's 750-specific customizations, yeah? I don't see a few BC tweaks being a deal breaker.
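
To make the FAQ's split concrete, here's a rough sketch (my example, GCC syntax; HID0 is SPR 1008 on the 750 family). The first function touches only user-model resources and ports untouched; the second reads an implementation-specific configuration register and is the kind of code the FAQ says may need changes between variants:

/* User model: GPR arithmetic only; runs unchanged across 750 variants. */
static int checksum(const int *p, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

/* Supervisor model: HID0's bit layout differs between 750 variants,
   so code like this is what actually needs auditing on a migration. */
static unsigned long read_hid0(void)
{
    unsigned long v;
    __asm__ volatile("mfspr %0, 1008" : "=r"(v));   /* SPR 1008 = HID0 */
    return v;
}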

BTW, Broadway supports cache coherency via a sub-par protocol (three-state MEI, which lacks the shared state of MESI), so it was designed for multiple devices to share the same memory, which is the main component of SMP. As far as I can tell this is the same as FX and GX's coherency support. Hopefully Wii U doesn't use that coherency protocol.

Yeah, it's really bad. I think it's even touched on in the link I provided, although I've only seen it brought up for the CXe, FX and GXe. I never really read anything on it in regards to Broadway or the CL/CLE.


No, I'm not even remotely angry or vested. Are you...? Where did I call Wii U a piece of crap? I'm just interested in the technical details here, and am a little snarky in response to what I thought were exceptionally silly analogies.

ah ha ha. Yes, all my analogies will be silly.

And my puns will all be bad.
 
When I brought up the differences between Bobcat and Jaguar it was only in the context of that particular test, which was being used to compare Espresso's performance with the upcoming consoles while normalized for clock. Even if you take into consideration benefits of using a better uarch than Broadway (where all signs point to no), even up to a 30% IPC improvement, which I find very dubious, you still wouldn't match the difference from Bobcat to Jaguar for this particular test. It'd be way more than 30%.

I meant please provide a link to where fail0verflow says something about Espresso only having registers on FX or GX. But it's kind of moot now, I edited my post several times to show fail0verflow thinks Espresso is not FX or GX based, which your entire position kind of hinges on (I also edited it with a bunch of other stuff, damn you and your ninja responses :()
 
When I brought up the differences between Bobcat and Jaguar it was only in the context of that particular test, which was being used to compare Espresso's performance with the upcoming consoles while normalized for clock. Even if you take into consideration benefits of using a better uarch than Broadway (where all signs point to no), even up to a 30% IPC improvement, which I find very dubious, you still wouldn't match the difference from Bobcat to Jaguar for this particular test. It'd be way more than 30%.

Oh, it's not just in response to you. People have been making a deal out of Jaguar > Bobcat for a while now.

I meant please provide a link to where fail0verflow says something about Espresso only having registers on FX or GX. But it's kind of moot now, I edited my post several times to show fail0verflow thinks Espresso is not FX or GX based, which your entire position kind of hinges on (I also edited it with a bunch of other stuff, damn you and your ninja responses :()

Sorry dude, I don't feel like digging through Marcan's Twitter. He explicitly states he believes it's likely an FX/GX, though that doesn't seem to stop him from considering it as being 'basically three Broadway cores with more cache'. From his end it makes sense; he wouldn't see many of the changes beyond cache. fail0verflow should be one of their last pages though. Probably one of the ones you are talking about, as they rant about basically 3 Broadways, Ouya being more powerful, etc. And they mention they found new registers they don't recognize from Broadway: 0x700201r0 or something close, and maybe three or four others. The same ones I'm pretty sure I've seen in the documentation I read on changes between the L/CX/FX/GX.

And no, that's just further supporting evidence. My position hinges on the fact that the FX/GX are no longer pin compatible with the rest of the 750 series, because the large change in eDRAM irrevocably changed the footprint of the chip.

If you are going to be making a 750 with cores with 512KB and 1MB caches, you can't use Broadway, or any other 750 package, as a base. That leaves you with the GX and FX as a perfectly good starting point, or completely redesigning Broadway to an FX/GX-sized package/footprint.

I'm going for the simpler answer.
 
Sorry dude, I don't feel like digging through Marcan's Twitter. He explicitly states he believes it's likely an FX/GX, though that doesn't seem to stop him from considering it as being 'basically three Broadway cores with more cache'. From his end it makes sense; he wouldn't see many of the changes beyond cache. fail0verflow should be one of their last pages though. Probably one of the ones you are talking about, as they rant about basically 3 Broadways, Ouya being more powerful, etc.

Now you're just putting your fingers in your ears and ignoring reality. I'm sorry if that sounds mean, but that's what you're doing. Marcan (fail0verflow) unequivocally, undeniably believes that Espresso is 750CL based and not FX or GX; the comments I posted make that clear. I didn't give links because the system didn't give them, but it would take you only a few minutes of clicking through the replies to find that I didn't make the posts up. So it doesn't matter if he ever posted that he thinks it's FX or GX, because he clearly doesn't think so now; I don't even care if you can or can't provide links.

Maybe you didn't read my edited post like I asked you to. If so, please do it.

And they mention they found new registers they don't recognize from Broadway: 0x700201r0 or something close, and maybe three or four others. The same ones I'm pretty sure I've seen in the documentation I read on changes between the L/CX/FX/GX.

So you think you saw something like this but can't give anything concrete.

And no, that's just further supporting evidence. My position hinges on the fact that the FX/GX are no longer pin compatible with the rest of the 750 series, because the large change in eDRAM irrevocably changed the footprint of the chip.

That doesn't make any sense.

If you are going to be making a 750 with cores with 512KB and 1MB caches, you can't use Broadway, or any other 750 package, as a base.

That leaves you with the GX and FX as a perfectly good starting point, or completely redesigning Broadway to an FX/GX-sized package/footprint.

That also doesn't make any sense. The cache is on-die; the footprint of old commercial packages and their pinouts don't have anything to do with what core design they used. Since we're talking about three cores and L2 cache (and some other stuff) on the same package, along with some interfaces that are specific to Espresso, the footprint really has nothing to do with any commercial products at all.

Neither Broadway nor FX/GX have suitable coherency support or an interface for L2 eDRAM, so either would have to be changed. FX/GX would also need significant extra changes to make it compatible with the instructions Gekko/Broadway support. If this whole thing were so trivial, why weren't they added to the GX in the first place?

I'm going for the simpler answer.

No, you're going for the answer that's most appealing to you because it lets you claim that the cores are 30% faster than everyone thinks they are (although that claim would be dubious).
 
And video sharing on Mario Kart 8 (aka Mario Kart TV)!

http://uk.ign.com/articles/2013/06/11/e3-2013-nintendo-reveals-mario-kart-tv?+main+twitter

Could this be done on the CPU, or would it (like the PS4) probably use dedicated hardware in those big GPU parts that no one knows what are being used for?

We already know the Wii U has a dedicated video encode unit (an H.264 encoder with at least baseline profile support that's used for Off-TV Play, in conjunction with the decode unit in the GamePad). While the Wii U OS (what's its short name?) doesn't have that functionality at a system level as the XBO and PS4 do, at a hardware level it would appear to have everything needed to support it to some extent, without impacting game resources.
 

Thanks, I completely forgot about that.
 
All of the 60FPS games that look on par with current-gen 30FPS games are starting to confuse me... Maybe the GPU really is 20-30% more powerful than current-gen after all? Or are they just using a lot of tricks to make up for it being 20-30% weaker than current-gen? Also, if SSB4 really is 1080p/60FPS (1080p was hinted at; all SSB games thus far have been 60FPS), what does that say?
 