Wii U hardware discussion and investigation *rename

Are you calculating with the Kgate/mm² number and assuming 1 gate = 1 bit? In that case you're confusing MByte and Mbit.

edit: ok, you're using the 0.06 µm² DRAM cell size, but as Gipsel said, that isn't an indicator of DRAM array density.
 
1 square mm is 2 MByte in this case.
So 16 square mm would be 32 MByte?
In that case 64-128 MByte should be the number for a 130+ mm² die size.

But somehow that feels quite unrealistic to me.
It is. You need more than just bit cells to build a DRAM array. Someone quoted the numbers for TSMC (with a comparably sized cell). You can obviously easily (more than) double the needed space.
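For reference, here is how those numbers fall out. A quick back-of-envelope sketch in Python; the 0.06 µm² cell size is the figure quoted above, and the 2x array overhead is just the rough "more than double" estimate from this thread, not a foundry spec:

Code:
# eDRAM capacity estimate from the 0.06 um^2 bit-cell figure quoted above.
# The overhead factor stands in for sense amps, decoders, redundancy etc.;
# 2.0 is only the rough "more than double" guess from this thread.

cell_area_um2 = 0.06
bits_per_mm2 = 1e6 / cell_area_um2          # 1 mm^2 = 1e6 um^2 -> ~16.7 Mbit of raw cells
mbytes_per_mm2 = bits_per_mm2 / 8 / 1e6     # -> ~2.1 MByte/mm^2 of raw cells

array_overhead = 2.0                        # assumed: a real array needs >2x the raw cell area
for die_mm2 in (16, 32, 64):
    raw = die_mm2 * mbytes_per_mm2
    print(f"{die_mm2} mm^2: ~{raw:.0f} MByte of raw cells, ~{raw / array_overhead:.0f} MByte as a real array")

So the naive ~2 MByte/mm² figure halves (or worse) once the rest of the array is accounted for.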
 
I said that already and asked for other ideas. ;)
Argh, my mistake. :) Personally I see little benefit in 24-bit floats given that comparable consoles work just fine at 16-bit. 12-bit floats or fixed-point values would be interesting, but I'm not sure developers would want to deal with those (though lower-precision arithmetic would fit well with the low-power concept, as min10float in DX11.1 should prove).
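Just to put rough numbers on that, a small sketch of what those formats buy you. The 1/5/6 split for a 12-bit float is purely an assumed layout for illustration (and as far as I recall min10float only guarantees a minimum precision; the storage format is up to the hardware):

Code:
# Rough range/precision comparison for small float formats (Python).
# IEEE half (fp16) is 1 sign / 5 exponent / 10 mantissa bits; the 12-bit
# layout below (1/5/6) is purely an assumption for illustration.

def float_format_stats(exp_bits, man_bits):
    bias = 2 ** (exp_bits - 1) - 1
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** bias   # top exponent code reserved for inf/NaN
    min_normal = 2.0 ** (1 - bias)
    rel_step = 2.0 ** -man_bits                          # relative precision of the mantissa
    return max_normal, min_normal, rel_step

for name, (e, m) in {"fp16 (1/5/10)": (5, 10), "fp12 (1/5/6, assumed)": (5, 6)}.items():
    mx, mn, step = float_format_stats(e, m)
    print(f"{name}: max ~{mx:g}, smallest normal ~{mn:g}, relative step ~{step:.4f}")

In that layout a 12-bit float keeps fp16's range but only gives you about 1.5% relative precision, which is coarser than 8-bit colour steps, so banding would be a real concern.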
 
Maybe the memory is fast compared to the CPU.


Hasn't memory been a bottleneck for powerful CPUs?
Maybe Nintendo figured it wasn't necessary to invest in a more powerful CPU if the memory would become a bottleneck for it; the CPU and RAM are in balance.
That's perhaps why one developer stated:

“The performance problem of hardware nowadays is not clock speed but ram latency. Fortunately Nintendo took great efforts to ensure developers can really work around that typical bottleneck on Wii U,” he said.

“They put a lot of thought on how CPU, GPU, caches and memory controllers work together to amplify your code speed...

The developer said bottlenecks apply to any hardware, but Nintendo's decisions regarding cache layout, RAM latency and RAM size prove an effective solution.
 
After the unified memory failure that was the N64, Nintendo has always paid close attention to their consoles' memory subsystems. I could see this hardware being weak but smartly designed so it can be fully utilized.
 
The issue with memory bandwidth isn't just the CPU; in fact it's primarily the GPU.
Even with eDRAM, the textures have to come from main memory. In most of the recent performance captures I've seen for high-end GPUs, they are limited by texture memory bandwidth, not ALUs, and they have > 10x the bandwidth that the Wii U has.
The eDRAM alleviates the frame buffer bandwidth, and I'll be nice and assume it can be used for intermediate buffers, but it's still going to be an issue.

If the CPU is indeed the three enhanced Broadways at ~1.6 GHz or so, it's going to be an issue for some games regardless of the memory subsystem.
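To put very rough numbers on the bandwidth point (all assumptions: the 12.8 GB/s figure assumes the commonly reported 64-bit DDR3-1600 main memory interface, 192 GB/s is a contemporary high-end desktop card, and the 720p60 frame-buffer load is purely illustrative):

Code:
# Back-of-envelope bandwidth comparison (Python). All inputs are assumptions
# for illustration, not measured figures.

wiiu_main_bw   = 64 / 8 * 1600e6 / 1e9   # 64-bit DDR3-1600 (commonly reported) -> ~12.8 GB/s
highend_gpu_bw = 192.0                   # GB/s, a contemporary 384-bit GDDR5 desktop card

# Frame-buffer traffic a 720p60 title might generate: colour + depth,
# read-modify-write, ~3x overdraw. This is the part the eDRAM can absorb.
fb_traffic = 1280 * 720 * (4 + 4) * 2 * 3 * 60 / 1e9

print(f"Wii U main memory:         ~{wiiu_main_bw:.1f} GB/s")
print(f"High-end GPU of the era:   ~{highend_gpu_bw:.0f} GB/s ({highend_gpu_bw / wiiu_main_bw:.0f}x)")
print(f"720p60 frame-buffer load:  ~{fb_traffic:.1f} GB/s (fits in eDRAM)")

Even with the eDRAM soaking up that frame-buffer traffic, textures, geometry and the CPU all share the remaining ~12.8 GB/s, still an order of magnitude below what texture-limited high-end parts have to play with.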
 
The issue with memory bandwidth isn't just the CPU; in fact it's primarily the GPU.
Even with eDRAM, the textures have to come from main memory. In most of the recent performance captures I've seen for high-end GPUs, they are limited by texture memory bandwidth, not ALUs, and they have > 10x the bandwidth that the Wii U has.
The eDRAM alleviates the frame buffer bandwidth, and I'll be nice and assume it can be used for intermediate buffers, but it's still going to be an issue.

If the CPU is indeed the three enhanced Broadways at ~1.6 GHz or so, it's going to be an issue for some games regardless of the memory subsystem.

IBM did say it was an all-new, Power-based CPU back in the 2011 announcement, so I don't think it would be just an enhanced Broadway. But given the size of the CPU, it looks like it is around half the size (or less) of either Xenon or Cell, so it should be a limiting factor no matter what.
 
The CPU size is not enough for three Broadways.
On the other side, IBM mentioned that they are the manufacturer of the CPU and the eDRAM.
Does that mean IBM manufactured the GPU as well, considering that it probably contains the eDRAM?
 
I don't think anyone ever considered turning Broadway from in-order to out-of-order execution; it has to be something else.
 
Maybe the memory is fast compared to the CPU.

Almost every dev from Namco to Koei to Ninja Theory is pointing the finger of blame squarely at the CPU. I think that's because the low-bandwidth memory pool is not an issue. I can't tell you why, but if it was then developers would have said something by now. Historically, Nintendo are known for engineering good, efficient memory solutions into their consoles.
 
Almost every dev from Namco to Koei to Ninja Theory is pointing the finger of blame squarely at the CPU. I think that's because the low-bandwidth memory pool is not an issue. I can't tell you why, but if it was then developers would have said something by now. Historically, Nintendo are known for engineering good, efficient memory solutions into their consoles.
Or if you listen to the resident devs here, they are notorious for over-designing their memory solutions.
Devs here have pointed out multiple times that, as both the GC and Wii CPUs had an L2 cache, the inclusion of embedded RAM for the CPU was not necessary; it could have been replaced by more RAM off-chip.

I think it is getting more and more obvious that Nintendo's engineering capability is going down / executives hinder the ability of their teams to come up with the best solutions.
Mize made a good point about it, and if memory serves right he has a PhD and works in the field of semiconductors (/ runs his own company).
 
I don't think anyone ever considered turning Broadway from in-order to out-of-order execution; it has to be something else.

Gekko and Broadway were always out of order.

And the person who started the Broadway rumour has been pretty reliable so far. He's said that's what it's described as in the dev documents.
 
Gekko and Broadway were always out of order.

And the person who started the Broadway rumour has been pretty reliable so far. He's said that's what it's described as in the dev documents.
They are not exactly out of order; there is a catch to that. I don't remember exactly, maybe something about only the first stages of the pipeline being able to be fed out of order / the tech guys here will give you a proper answer. Sorry for the failed attempt, but those CPUs are not out of order in the same way as, say, the Celeron in the first Xbox.
 
They are not exactly out of order; there is a catch to that. I don't remember exactly, maybe something about only the first stages of the pipeline being able to be fed out of order / the tech guys here will give you a proper answer. Sorry for the failed attempt, but those CPUs are not out of order in the same way as, say, the Celeron in the first Xbox.
I wasn't aware of this. Do you have links to old posts? Because everything I've read says the chips are out of order, even PPC 750 docs on IBM's site.
 
I wasn't aware of this. Do you have links to old posts? Because everything I've read says the chips are out of order, even PPC 750 docs on IBM's site.

The point of out-of-order execution is that the front of the pipeline stores the instructions and the data, and if both are available and there is no dependency, it executes the instruction even if it is not the oldest one.

http://en.wikipedia.org/wiki/Out-of-order_execution

The key concept of OoO processing is to allow the processor to avoid a class of stalls that occur when the data needed to perform an operation are unavailable. In the outline above, the OoO processor avoids the stall that occurs in step (2) of the in-order processor when the instruction is not completely ready to be processed due to missing data.



The benefit of OoO processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens. On modern machines, the processor runs many times faster than the memory, so during the time an in-order processor spends waiting for data to arrive, it could have processed a large number of instructions
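A toy model of that last point (just a sketch; the latencies and the 32-entry window are made-up illustrative numbers, with the window standing in for how much OoO hardware a core actually has):

Code:
# Toy model of the stall described above: one load misses cache and is
# followed by N instructions that do not depend on it. All numbers are
# illustrative, not Wii U (or PPC750) figures.

def in_order_cycles(independent_insts, miss_latency):
    # An in-order pipeline waits out the whole miss before issuing anything else.
    return miss_latency + independent_insts

def out_of_order_cycles(independent_insts, miss_latency, window=32):
    # Independent work that fits in the instruction window executes under the
    # miss; anything beyond the window still has to wait.
    hidden = min(independent_insts, window)
    return max(miss_latency, hidden) + (independent_insts - hidden)

for miss in (20, 100, 400):   # the core/memory speed gap widening
    print(f"miss={miss:3d} cycles: in-order {in_order_cycles(50, miss):3d}, "
          f"out-of-order {out_of_order_cycles(50, miss):3d}")

The saving is real but capped by how much independent work the core can actually keep in flight, which is exactly why the amount of OoO machinery (tiny in a PPC750, huge in an Ivy Bridge) matters so much.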
 
“What surprises me with Wii U is that we don’t have many technical problems. It’s really running very well, in fact. We’re not obliged to constantly optimize things. Even on the PS3 and Xbox 360 versions [of Origins], we had some fill-rate issues and things like that. So it’s partly us – we improved the engine – but I think the console is quite powerful. Surprisingly powerful. And there’s a lot of memory. You can really have huge textures, and it’s crazy because sometimes the graphic artists – we built our textures in very high definition. They could be used in a movie. Then we compress them, but sometimes they forget to do the compression and it still works! [Laughs] So yeah, it’s quite powerful. It’s hard sometimes when you’re one of the first developers because it’s up to you to come up with solutions to certain problems. But the core elements of the console are surprisingly powerful.

“And because we’re developing for Wii U, we don’t have to worry about cross-platform optimization.

“We can push what the console can do; push it to its limits. And of course, we have a new lighting engine. In fact, the game engine for Origins was mostly just classic sprites in HD, but now we can light them and add shadows and all these things. So there is some technical innovation with the engine itself. “

It doesn't look like a "slow memory" issue. :?:
 
The issue with memory bandwidth isn't just the CPU; in fact it's primarily the GPU.
Even with eDRAM, the textures have to come from main memory. In most of the recent performance captures I've seen for high-end GPUs, they are limited by texture memory bandwidth, not ALUs, and they have > 10x the bandwidth that the Wii U has.
The eDRAM alleviates the frame buffer bandwidth, and I'll be nice and assume it can be used for intermediate buffers, but it's still going to be an issue.

Could you elaborate a bit on this? Such as - "given how rendering is typically done today, the eDRAM probably reduces overall traffic to main RAM by somewhere between X and Y percent".
 
They are not exactly out of order; there is a catch to that. I don't remember exactly, maybe something about only the first stages of the pipeline being able to be fed out of order / the tech guys here will give you a proper answer. Sorry for the failed attempt, but those CPUs are not out of order in the same way as, say, the Celeron in the first Xbox.

The effectiveness of out-of-order execution depends a lot on implementation (and of course what kind of code we're dealing with, and how well scheduled it is by the compiler). Of course, the PPC750 was fairly lightweight in the resources it devoted to OoO execution, by Intel's Ivy Bridge standards. How that relates to whatever the hell is in the Wii U is anybody's guess.

Additionally, as it says in the quote from Wikipedia: "The benefit of OoO processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens." Unless I misremember, the PPC750 had 5 pipeline stages. Couple that with a lowish clock and the extremely sympathetic memory subsystem on the Wii U (large caches that are fast compared to the core, and even implied access to the eDRAM on the GPU), and it simply doesn't need to be very sophisticated, so chasing the last few percent probably wouldn't be worth the price in complexity and power draw.


Now, die size alone (Anandtech: 33 mm² at IBM 45nm) makes it clear that we aren't dealing with plain Broadway: Gekko was 40 mm² on 180nm which, assuming linear scaling, would mean 2.5 mm² on 45nm. Scaling is never linear, but the Wii U CPU is over four times larger than three Gekkos shrunk together (and remember that's not just the cores but three times the I/O circuitry as well). So if we are indeed dealing with enhanced Broadway cores, it's impossible to tell just what "enhanced" means outside of multiprocessor support, BUT we can definitely say that there is die area enough to modify them fairly heavily. Frankly, the disparity between the approximated and real die areas is enough for me to take the "enhanced Broadway" claim with a grain of salt.
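Worked through (a sketch; the only inputs are the figures quoted above plus ideal feature-size-squared scaling, which flatters the shrink):

Code:
# Die-area estimate from the post above. Inputs are the quoted figures
# (Gekko ~40 mm^2 at 180nm, Wii U CPU ~33 mm^2 at IBM 45nm); ideal
# scaling with feature size squared is assumed, which is optimistic.

gekko_area_180nm = 40.0                 # mm^2
ideal_shrink = (45 / 180) ** 2          # area ~ feature size squared -> 1/16
gekko_area_45nm = gekko_area_180nm * ideal_shrink

three_cores = 3 * gekko_area_45nm
wiiu_cpu_die = 33.0                     # mm^2, Anandtech figure quoted above

print(f"One Gekko ideally shrunk to 45nm: ~{gekko_area_45nm:.1f} mm^2")
print(f"Three shrunk Gekkos:              ~{three_cores:.1f} mm^2")
print(f"Wii U CPU die vs three Gekkos:    ~{wiiu_cpu_die / three_cores:.1f}x")

Real shrinks never hit the ideal factor, but even a generous fudge leaves a lot of unexplained die area.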
 
Well the Xbox CPU and GPU dropped from ~2 x 180 mm^2 to one ~180 mm^2 SoC in two full process nodes (90nm to 45nm), so it's quite possible that scaling is rather less than linear.

What was the Wii CPU die size @ 90nm?
 