Looks made up to me. Why have different L2 sizes? And huge ones at that (and therefore slow). My guess is that it will have:
3 x souped up 476FP CPUs; 32nm, clocked at 3GHz, 2-way SP FP similar to Wii/Gamecube
Each CPU with 32KB D$, 32KB I$, 256KB private L2 cache
2MB shared L3 (victim cache)
That adds up to 3008 KB of cache in total, i.e. just under 3MB. PR megabytes mind you, the effective amount is lower.
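For anyone who wants to check the tally, here is a quick sketch. It assumes the L2 figure is 256KB per core and that 1MB = 1024KB; the function name and parameters are made up for illustration.

```python
KB_PER_MB = 1024

def total_cache_kb(cores=3, l1d_kb=32, l1i_kb=32, l2_kb=256, l3_kb=2 * KB_PER_MB):
    """Sum every cache level: per-core L1D + L1I + private L2, plus the shared L3."""
    per_core_kb = l1d_kb + l1i_kb + l2_kb
    return cores * per_core_kb + l3_kb

total = total_cache_kb()
# Raw sum only; how much of it is "effective" depends on how the levels overlap.
print(total, "KB =", round(total / KB_PER_MB, 2), "MB")
```

With the guessed configuration this gives 3008 KB, or about 2.94MB.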
Some developers mentioned that porting existing PS360 games to Wii U wasn't straightforward. If it had POWER7 cores it would stomp all over PS360.
Cheers
Hi Gubbi
I read, or rather tried to read and understand, the following presentations (there are many more here).
I'm not sure I get it properly but I see many things that are troublesome to me.
The docs state that the supported L2 sizes are 256KB, 512KB and 1MB.
The so-called PLB6 interface seems to be a pretty critical part of the design, as much as the core itself.
Actually, looking at the 470s, whereas the core is synthesizable, I've seen nothing that suggests that part of the design (the bus) is meant to be modified.
Another thing I don't get: although the L2s are said to be private, I feel they are meant to be implemented as a "block". It looks like the PLB6 bus / L2 interface, as well as the L2 itself, is set to run at half the speed of the core (or slower).
I would say they are part of the "uncore", as Intel would say. To me the L2 as described seems pretty much "L3ish". If I compare to Nehalem or POWER7, I would say that there the L1 and L2 are part of the core and the L3 is part of the uncore (even though slices of the L3 are tightly connected to a given core).
I don't know how to make myself clearer, so I use pictures. If I were to compare those PPC 47x to the two aforementioned CPUs, I would say that they are more "L2-less" than "L3-less".
The L2 seems (to me, I may misunderstand) to be designed like the last level of cache in those CPUs.
Are you really sure that having an L3 is a good option with respect to how the CPU works (from my POV, obviously)? I feel it may be less of a headache to support a bigger slice of L2 (from 1MB to 2MB) than to rework the cache hierarchy / L2 interface.
Then there is the clock speed: it has a really short pipeline. Do you think it would really reach 3GHz? For example, Bobcat cores are real dogs when it comes to overclocking (almost no headroom).
Either way, even if what you say is doable, my belief is that clock speed is not the issue: Nintendo could even use off-the-shelf PPC 476FP cores (which would have saved them money and time) at the speed advertised by IBM, i.e. 1.6GHz. The issue is the number of cores. Those cores are damned tiny, just above 4mm^2 @45nm. The L2 interface / PLB6 supports up to 8 nodes/cores.
I see nothing in the docs that prevents Nintendo (or anyone else for that matter) from using eDRAM for the L2. As the L2 is clocked really slowly (800MHz), there may not be much of a difference in performance versus SRAM cells. In other presentations (Power A2), IBM says that the eDRAM gains are -50% in size and -80% in power. So 2, 3 or 4MB of cache should be really cheap in both silicon and power.
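To put rough numbers on that, here is a back-of-the-envelope sketch. It takes the two IBM figures quoted above at face value and assumes, as a simplification of mine, that area and power scale linearly with capacity (tags, periphery and refresh are ignored). Results are expressed relative to 1MB of SRAM; the names are made up for illustration.

```python
EDRAM_AREA_FACTOR = 0.5   # "-50% in size" versus SRAM, as quoted in the post
EDRAM_POWER_FACTOR = 0.2  # "-80% in power" versus SRAM, as quoted in the post

def edram_cost(cache_mb):
    """Return (area, power) of an eDRAM cache, in units of '1MB of SRAM'.

    Assumes linear scaling with capacity, which is a simplification.
    """
    return cache_mb * EDRAM_AREA_FACTOR, cache_mb * EDRAM_POWER_FACTOR

for mb in (2, 3, 4):
    area, power = edram_cost(mb)
    print(f"{mb}MB eDRAM: area of ~{area:.1f}MB SRAM, power of ~{power:.1f}MB SRAM")
```

Under those assumptions, 4MB of eDRAM would take about the area of 2MB of SRAM and less power than 1MB of SRAM.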
So the real question is: with those really tiny, low-power CPU cores and this high-density, low-power last level of cache, why in hell did Nintendo settle for only three cores?
If Nintendo really uses this kind of CPU (which seems likely, no matter the late PR statements from IBM), it's really beyond my understanding, even if cost was a strong concern to them. They cut the CPU core count but go all out on 32MB of eDRAM.
One has to wonder if Nintendo has any idea what they are doing... I don't mean the engineers, but rather the decision process within the company.