Wii U hardware discussion and investigation *rename

You're claiming 1GB is reserved???!!
We have a few options here based on the chips Nintendo are using.

1GB total system RAM, no OS. Seems unlikely.
1.5 GB, with 512 MB reserved. That's a stupid amount of RAM for OS use.
2 GB, with 1 GB reserved. 1 GB for the system is even more ludicrous.

But, if Nintendo are using really cheap RAM without much regard for speed, perhaps because eDRAM is doing all the BW work, then they don't really lose a great deal by chucking in 2GBs. Quite why they wouldn't then allow lots more for games is a quandary. Maybe, just maybe, the OS incorporates a significant disk cache?
 
I would like to dig deeper into the following statements:

The same IBM CPU technology in the Watson supercomputer powers the Wii U

and we have,

WiiU has 45nm custom chip ... a SOI design & packs same processor tech found in #IBMWatson

Watson uses Power7, but these statements don't say the WiiU CPU is a Power7. Maybe because the WiiU CPU supposedly has three cores, it can't be called a Power7? But they certainly must mean that the processor technology in Power7 is in the WiiU.

So I would like to know, what technology are they talking about? And why do they only refer to Watson, and no other IBM system such as Blue Gene, etc? We know about the eDRAM, we know about the SOI, we know about it being 45nm. So we can discount those. So what else about the Power7 technology is key to what is being used for the WiiU CPU?

The only thing I could find is the following:

The POWER7 symmetric multiprocessor architecture was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores and simultaneous multithreading (SMT).[6] The POWER6 architecture was built from the ground up for frequency, at the cost of power efficiency, and achieved a remarkable 5 GHz. [...]

So, what makes Power7 different from Power6 is the fact that it's a symmetric multiprocessor.

A Symmetric Multiprocessor System is a multiprocessor system with centralized shared memory, called Main Memory (MM), operating under a single operating system with two or more homogeneous processors, i.e., it is not a heterogeneous computing system.

More precisely, an SMP is a tightly coupled multiprocessor system with a pool of homogeneous processors running independently, each processor executing different programs and working on different data, with the capability of sharing common resources (memory, I/O devices, interrupt system, etc.), connected using a system bus or a crossbar.

Is that a significant step for the gaming industry? Would it cause porting issues between current-gen CPU designs and the WiiU?

But as important as frequency is to computer performance, it is not and has not been the end all, whether for system capacity or for single-threaded performance. IBM has rather proven that with its POWER7 design. In a complex computer system, there is a lot of processing that goes on which executes at rates independent of core frequency and are also critical to performance.

Even though the frequencies of POWER7 are lower than the POWER6 processors, the core-to-core capacity of corresponding POWER7 systems exceeds that of POWER6 by a considerable amount.

Even many applications where there is a single task executing on a core find that POWER7 produces superior results over that of POWER6. But let's be up front here; frequency does matter. There are classes of single-threaded applications in isolation where the advantages of POWER7's design remain insufficient to make up for the difference in frequency; for some uses POWER7 is slower.

So in some tasks and applications the architecture of a Power7 trumps higher-frequency CPUs, but in other cases it's slower. Does that sound like the WiiU CPU?

And here is something interesting

..one thing that immediately jumps out is that POWER7 can run in a POWER6 compatibility mode, making this the first generation of POWER hardware that supports previous technology levels.

Is that why it's easy to port current-gen games to the WiiU?

Now why would the WiiU CPU have eDRAM and need SOI?

IBM's embedded dynamic random access memory will help deliver a thrilling new game experience to Nintendo fans. The new memory technology, a key element of the new Power microprocessor that IBM is building for the Nintendo Wii U console, can triple the amount of memory contained on a single chip, making for extreme game play.


Here is what IBM's SMP architect has to say:

First, some context. Power7 is a single-chip, eight-core cluster of processors. This is quite a leap from previous generations of the Power architecture, in which the largest number of CPU cores on a die was two. The chip is designed to fit into an extended coherent array of processors, scalable up to 32 sockets, or 256 CPUs in a single hardware-coherency fabric. This contrasts with PC or workstation CPUs intended only for much smaller coherency networks. This fundamental difference shows up in all aspects of the chip design.

The first architectural point is the growing importance of memory hierarchy. Many architects today are saying that managing memory traffic has become a more critical problem than CPU microarchitecture in today's multicore systems. The Power7 certainly reflects this growing concern, integrating either three or four levels of local memory—depending on how you count them—onto the die.

The problem, according to IBM chief storage hierarchy and SMP architect William Starke, is a dilemma at the systems level. "Our experience with building up to 64-processor computing systems in previous generations has convinced us that symmetric multiprocessing [SMP] is the best way to deliver performance to a computing cloud," Starke said. But he went on to say that in order to keep even latency-tolerant multithreaded CPUs from standing idle, you really need 2 to 4 MBytes of tightly coupled cache on each processor and a shared "memory tank," as he called it, of more than 30 MBytes.

That was not so much a problem when there were two cores on a chip, each with its own local cache. You could construct a fast parallel interface from each chip to a shared pool of external fast memory. But with eight cores on a die, you simply run out of pins to connect all those local caches to the tank.

IBM's solution was to turn to an embedded DRAM process, apparently a variant of the deep-trench DRAM developed with the now-defunct Qimonda. The company has implemented deep-trench embedded DRAM in its 45-nm SOI process, and used it to put a 32-MByte L3 cache down the center of the Power7 die. Thus, each of the private L2 caches directly abuts the L3.

So if the WiiU's CPU is an SMP, it would need eDRAM. From what is stated, at minimum 30 MB for its main memory tank, and 2 MB for each core. That means the CPU itself should have at minimum 38 MB reserved for it?

We know that Nintendo likes out-of-order CPUs, and the Power7 really seems to shine in that regard. It's also a CPU that Nintendo can scale from for future consoles.

So the question is: is the WiiU's CPU an SMP, and is its architecture then similar to the Power7?



 
We have a few options here based on the chips Nintendo are using.

1GB total system RAM, no OS. Seems unlikely.
1.5 GB, with 512 MB reserved. That's a stupid amount of RAM for OS use.
2 GB, with 1 GB reserved. 1 GB for the system is even more ludicrous.

But, if Nintendo are using really cheap RAM without much regard for speed, perhaps because eDRAM is doing all the BW work, then they don't really lose a great deal by chucking in 2GBs. Quite why they wouldn't then allow lots more for games is a quandary. Maybe, just maybe, the OS incorporates a significant disk cache?
A disk cache just short of 1 GB for a tiny 8GB flash drive?

What about Nintendo not using a UMA setup? The CPU gets 1GB of main memory, the GPU backs its eDRAM with some other 1GB of memory, and everyone somehow confuses this? :devilish:
 
Yeah, the "insiders" which have always been correct on all the leaks on various GPU, CPU, Console, You-name-it generations, right? ;)

Yep, this is exactly the spreadsheet given to devs by Nintendo.

VGLeaks is legit http://www.vgleaks.com/wii-u-first-devkits-cat-dev-v1-v2-classic-controller-adapter/
http://www.vgleaks.com/wii-u-first-devkits-cat-dev-v1-v2/

Grall said:
What I meant is, those specs are so imprecise as not really warranting calling them "specs". Ok, so it has a CPU, whoopee de do dah, we knew that already. A true specs sheet would state WHAT CPU it has.

That is all Nintendo gives away to developers
 
They were confirmed as not target specs but actual specs months ago by insiders


What was posted in that link is not specs, whether they are targets or bits of real information. It is a list of features the hardware has, but it is by no means exhaustive or even all that informative. Good to know one core has 2 MB L2 and the other two have 512 KB L2. Is each core identical minus the cache, or are they different cores? What speeds will they run at?

If the GPU is not DX11 class then Nintendo has absolutely failed in making a next-gen console and is again planning on releasing a current-gen extender. With these developments, and knowing the cheap route Nintendo is taking, I wish them the best, but I hope they get crushed by the competition now that MS and Sony will be going after the casual market with the same tenacity MS did with the Wii. Maybe it would be for the best if Nintendo gets pushed out of hardware altogether and is forced to go software-only.
 
The thing is, I can already see a lot of people blaming the lack of DX11 for a failure to garner support in the future, when it's lack of muscle, not any particular DX spec, that is the real factor.

To add to today's excitement factor, on a tangent, a gaffer posted a pic of a "Wii U in the wild":

[Image: FCmE8.jpg]


Though discussion suggests it is not really a wild Wii U but a debug unit for practice or something, not something the person will be allowed to keep.

Getting close!
 
We have a few options here based on the chips Nintendo are using.

1GB total system RAM, no OS. Seems unlikely.
1.5 GB, with 512 MB reserved. That's a stupid amount of RAM for OS use.
2 GB, with 1 GB reserved. 1 GB for the system is even more ludicrous.

But, if Nintendo are using really cheap RAM without much regard for speed, perhaps because eDRAM is doing all the BW work, then they don't really lose a great deal by chucking in 2GBs. Quite why they wouldn't then allow lots more for games is a quandary. Maybe, just maybe, the OS incorporates a significant disk cache?

I think the proper phrase that EG article should have used was "at least 1GB". Going by that vgleaks link, there is up to 3GB in the dev kit. However, according to some other sources, 1GB of it was inaccessible to developers. As Shifty and others have said, 512MB reserved for OS tasks sounds a bit extreme, so it's very likely that a noticeable amount of that 512MB may ultimately be available to developers in the final retail version (and maybe more in later OS revisions).
 
It's a lack of features and a lack of muscle. Developers will be hesitant to port down from PS4/720 if the machine is not only lacking in muscle but also in the feature set being used for the games. Obviously if the machine sells millions it won't get ignored by all devs, but just like this generation, a large number will avoid the machine like the plague if it's incapable of handling ports with ease.
 
I would like to dig deeper into the following statements:

Okay, you're way out of your depth here. ;) I won't give a thorough reply as some of your questions don't make much sense, as I hope this post clarifies.

So I would like to know, what technology are they talking about? And why do they only refer to Watson, and no other IBM system such as Blue Gene, etc?
References to Watson are PR. Watson is famous, so is an easy reference to communicate about a CPU architecture to the masses. Talking about the depth of the floating point pipelines, the cycle latencies of the caches, instructions per clock and such would mean nothing to Joe Public, so it's all summarised as, "well, you know that big computer that won Jeopardy? Well we use the same sort of processors as are in that." The details are unknown, but don't need to be known by the public.
Now why would the WiiU CPU have eDRAM and need SOI?
I don't know where you got the quote from. Reads like a clueless fanboy website. It's utter gibberish. eDRAM on the CPU won't enable "extreme gameplay".

A CPU can only work on the information it holds internally, in registers in its execution units. When it wants more data, it has to fetch that data from another store. Main RAM is a long way away, and if the CPU had to fetch data from RAM every time it wanted to do something, it'd spend most of its life waiting around. So special fast memory is placed very close to the CPU's execution units. This is static RAM (SRAM), which is very fast, but also expensive, and it becomes slower the larger the pool you have. So a very small pool of cache (L1) is used to keep the execution units busy. Data is copied to this cache and then from there into the execution units for computing.

When these caches don't have the needed data, they have to get filled from the slow RAM again. To speed that up, the data they want is copied into another, larger cache, which is slower than L1 cache if the execution units want data directly from it, but still much, much faster than trying to load data directly from main RAM. A third layer of cache can be used between main RAM and this L2. By being bigger, it has more chance of holding the data the execution units want when they go asking for more, but by being bigger it is also slower. Cache is a balancing act: small enough to stay fast enough to keep the execution units loaded, and large enough to hold enough data that whatever the execution units want is at hand, without having to go off to a slower RAM.

Now depending on your workload, it's quite possible to manage your data so the SRAM L1 and L2 caches are efficiently feeding the CPU, so it's always got data to work on. If you're reading and processing data linearly, like reading words on a page, you know exactly what data you'll need to load into the caches ready for the CPU to process, so you'd need to store just a few words at a time. It's only on certain workloads that even a couple of MBs of L2 cache isn't enough and data has to be fetched from main RAM. If you had to process the words on a page in a random order for some task, then you'd need enough space to store all the words in cache. That's where another level of cache is useful. These workloads are more common in large-dataset supercomputing and server jobs.

For gameplay, we're talking physics and AI mostly, I think. Physics is pretty linear and doesn't have complicated access patterns, so L3 cache won't be a benefit. AI can be handled in lots of different ways. I'm sure if Wii U has many MBs of cache then devs will make use of it, but it won't be a game changer. Whatever CPU is in any console, its contribution to the game experience is only whatever the devs use it for. Anyone claiming amazing advances in experience like "Emotion Engine" is trying to sell you something. ;) CPUs just crunch numbers, and it's developers that do wonderful things with them, which may or may not be amazing gameplay experiences.

So if WiiU's CPU is a SMP, it would need eDRAM. From what is stated, at minimum 30Mbyte for its main memory tank, and 2MB for each core. That means the CPU itself should have at minimum 38MB reserved for it?
SMP is any contemporary multicore processor with multiple copies of the same core. Intel's i7 is SMP. Xenon in XB360 is SMP. Quad-core ARM in an iPad is SMP. You also don't need 30 MBs eDRAM memory tank for many CPU tasks, and 2MBs cache per core is expensive. So at this point we've no idea what the cache will be in Wii U's CPU. It could be anything from 4 MBs (1 MB for each core) to 40+ MBs with 32 MBs eDRAM. Although the P7 architecture has 4 MBs eDRAM per core, so a four core version would be 16 MBs eDRAM. But that's a lot of expense for very little benefit in a console as I understand it.
So the question is, is WiiU's CPU a SMP, and is its architecture then similar to the Power7.
We keep being told its architecture is similar to P7, so we should accept that, but how it's similar is unknown beyond running the same code.
 
The thing is, I can already see a lot of people blaming the lack of DX11 for a failure to garner support in the future, when it's lack of muscle, not any particular DX spec, that is the real factor.
So you actually know how much muscle it has? :rolleyes:
 
lherre chimed in on the new info

http://www.neogaf.com/forum/showpost.php?p=41573982&postcount=2006

This "new info" is the same that was leaked some time ago; where is the surprise?

I said a long time ago that the memory amount will be 1-1.5 GB for applications (depending on Nintendo's choice), so... I mean, there is no new info in Eurogamer's article. This is why devkits have 3 GB (double the memory in the "better" case I mentioned a long time ago).

I guess this still doesn't exactly clear up for me whether there is a grand total of 2GB in the retail system or 1.5GB, but it seems to lean toward the latter.

I read somewhere also "there is 3GB in the dev kit with 1GB not accessible". Which kind of implies 2GB. But I'm still pretty sure it's 1.5.
 
I won't give a thorough reply as some of your questions don't make much sense, as I hope this post clarifies.


I simply want to establish whether the architecture of the WiiU CPU is based on the same symmetric multiprocessor architecture as the Power7. And if it is, what does this design offer? Why did Nintendo choose that over other options?


References to Watson are PR. Watson is famous, so is an easy reference to communicate about a CPU architecture to the masses.

You bring up a good point. Though I don't consider Watson followers and people paying attention to IBM press releases to constitute the general public, or the masses. I think those same people would know about Deep Blue and Blue Gene. So if IBM stated that the WiiU was based on Blue Gene architecture, it would give people a point of reference. In this case, Watson (& Power7) was mentioned, several times, from several sources.


I don't know where you got the quote from. Reads like a clueless fanboy website. It's utter gibberish. eDRAM on the CPU won't enable "extreme gameplay".
Great! IBM are clueless Nintendo fanboys, good to know! :LOL:
Sorry, I figured those quotes had been posted so many times by now, I didn't bother to source them, but here:

IBM's unique embedded DRAM, for example, is capable of feeding the multi-core processor large chunks of data to make for a smooth entertainment experience.
http://www-03.ibm.com/press/us/en/pressrelease/34683.wss
So you'll have to take it up with IBM's press people.


For gameplay, we're talking physics and AI mostly, I think. Physics is pretty linear and doesn't have complicated access patterns, so L3 cache won't be a benefit. AI can be handled in lots of different ways. I'm sure if Wii U has many MBs of cache then devs will make use of it, but it won't be a game changer.

OK, what about running more than one application? As in, a game is running, and in the background, a browser, or some other apps?

You also don't need 30 MBs eDRAM memory tank for many CPU tasks, and 2MBs cache per core is expensive. So at this point we've no idea what the cache will be in Wii U's CPU.

This is why I wonder: to make the 32MB of eDRAM worthwhile, wouldn't they design it so that the GPU also has access to it? The Power7 starts off with 4 cores. Why did Nintendo go with 3?
 

Looks made up to me. Why have different L2 sizes? And huge ones at that (and therefore slow). My guess is that it will have:
3 x souped-up 476FP CPUs; 32nm, clocked at 3GHz, 2-way SP FP similar to Wii/Gamecube

Each CPU with 32KB D$, 32KB I$, 256KB private L2 cache

2MB shared L3 (victim cache)

That adds up to 3008 KB cache in total. PR megabytes, mind you; the effective amount is lower.

Some developers mentioned porting existing PS360 games to Wii U wasn't straightforward. If it had Power7 cores it would stomp all over PS360.

Cheers
 
Looks made up to me. Why have different L2 sizes? And huge ones at that (and therefore slow). My guess is that it will have:
3 x souped-up 476FP CPUs; 32nm, clocked at 3GHz, 2-way SP FP similar to Wii/Gamecube

Each CPU with 32KB D$, 32KB I$, 256KB private L2 cache

2MB shared L3 (victim cache)

That adds up to 3008 KB cache in total. PR megabytes, mind you; the effective amount is lower.

Some developers mentioned porting existing PS360 games to Wii U wasn't straightforward. If it had Power7 cores it would stomp all over PS360.

Cheers


It's legit. The asymmetric L2 cache and the (total) amount was one of the first things we learned last year about Wii U.
 
No he does not, but he's making a reasonable assumption.

I reckon a whole bunch of the pessimists' assumptions are wrong. Not wrong in the sense of the specs, but wrong in what those specs mean for the console. The WiiU will be way further along the curve than the Wii was relative to the more powerful consoles.

So much of what makes games look good is art, and art can be at a much more comparable level with the WiiU; everything else from there is diminishing returns. Nintendo need to get this out ASAP with strong launch titles and really highlight how much better things like texturing look on the WiiU vs the Xbox 360 and PS3. They need to take advantage of their approach by maximizing the amount of time it is out before the 720/PS4.
 
Looks made up to me. Why have different L2 sizes? And huge ones at that (and therefore slow). My guess is that it will have:
3 x souped-up 476FP CPUs; 32nm, clocked at 3GHz, 2-way SP FP similar to Wii/Gamecube

Each CPU with 32KB D$, 32KB I$, 256KB private L2 cache

2MB shared L3 (victim cache)

That adds up to 3008 KB cache in total. PR megabytes, mind you; the effective amount is lower.

Some developers mentioned porting existing PS360 games to Wii U wasn't straightforward. If it had Power7 cores it would stomp all over PS360.

Cheers
Hi Gubbi :)

I read, and tried to understand, the following presentations; there are many more here.
I'm not sure I get it all properly, but I see several things that are troublesome to me.
The docs state that the supported sizes for the L2 are 256KB, 512KB and 1MB.
The so-called PLB6 interface seems to be as critical a part of the design as the core itself.
Actually, looking at the 470s, whereas the core is synthesizable, I've seen nothing to suggest that that part (the bus) of the design is meant to be modified.

Another thing I don't get: whereas the L2s are said to be private, I feel like they are meant to be implemented as a "block". It looks like the PLB6 bus / L2 interface, as well as the L2 itself, are set to run at half the speed of the core (or slower).
I would say they are part of the "uncore", as Intel would call it. To me the L2 as described seems pretty much "L3ish". If I compare to Nehalem or Power7, I would say that the L1 and L2 are part of the core, and the L3 is part of the uncore (even though slices of the L3 are tightly connected to a given core).
I don't know how to make myself clearer, so I'll use pictures: those PPC 47x parts, if I were to compare them to the two aforementioned CPUs, I would say they are more "L2-less" than "L3-less".
The L2 seems (to me; I may misunderstand) to be designed like the last level of cache in those CPUs.

Are you really sure that adding an L3 is a good option with respect to how the CPU works (from my POV, obviously)? I feel it may be less of a headache to support a bigger slice of L2 (from 1MB to 2MB) than to rework the cache hierarchy / L2 interface.

Then there is the clock speed; it really has a short pipeline. Do you think it would really reach 3GHz? For example, Bobcat is a real dog when it comes to overclocking (almost no headroom).

Either way, even if what you say is doable, my belief is that the issue is not whether Nintendo could use even off-the-shelf PPC 476FP cores (which would have saved them money and time) at the speed advertised by IBM, i.e. 1.6GHz. The issue is the number of cores. Those cores are damned tiny, just above 4mm^2 @45nm. The L2 interface / PLB6 supports up to 8 nodes/cores.
I see nothing in the docs that prevents Nintendo (or whoever, for that matter) from using eDRAM for the L2. As the L2 is to be clocked really slow (800MHz), there may not be that much of a difference in performance vs SRAM cells. From other presentations (Power A2), IBM says that eDRAM gains are -50% in size and -80% in power. It's cheap, and 2, 3 or 4MB of cache should be real cheap, both in silicon and power.

So the real question is: with those really tiny, low-power CPU cores, and this high-density, low-power last level of cache, why in hell did Nintendo settle for only three cores?

If Nintendo really is using this kind of CPU (which seems likely, despite late PR statements from IBM), it's really beyond my understanding, even if cost was a strong concern for them. They cut the CPU core count but go all out on 32MB of eDRAM.

One has to wonder if Nintendo has any idea what they are doing... I don't mean the engineers, but rather the decision process within the company :(
 