Wii U hardware discussion and investigation *rename

Last rumor I heard was 3MB of L2, which people inexplicably said was split as 2MB for the "fat" core and 512KB each for the other two. Seems to me like 1MB each makes more sense, but I'm not a secret keeper.

The cache could be shared, as with Xenon, so there's no real "size per core". With a locking mechanism you could well do 2048K + 2x 512K, which is mildly reasonable if you run the main rendering/game thread on the first core and vector/throughput work on the other two.

But maybe you can partition it whichever way you like.
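
Purely to illustrate the way-locking idea, a minimal sketch; the 12-way associativity is an invented figure for illustration, not a known spec:

```python
# Hypothetical sketch of way-based partitioning of a shared cache.
# The 3MB L2 is the rumored size; the 12 ways are an assumption.

TOTAL_KB = 3 * 1024  # rumored 3MB shared L2
WAYS = 12            # associativity: invented for this example

def partition(way_alloc):
    """Map a per-core way allocation to per-core capacity in KB."""
    assert sum(way_alloc.values()) <= WAYS, "can't lock more ways than exist"
    kb_per_way = TOTAL_KB // WAYS
    return {core: n * kb_per_way for core, n in way_alloc.items()}

# The 2048K + 2x 512K split discussed above:
print(partition({"core0": 8, "core1": 2, "core2": 2}))
# {'core0': 2048, 'core1': 512, 'core2': 512}
```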
 

It's possible these are the cache sizes that are physically local to each core, not unlike how each Power7 core has a fast link to 4MB of eDRAM while still being part of the larger shared L3 - makes you wonder what the layout of the chip is.
 
How much would it affect the console's power if the Wii U has a POWER7 CPU?

What does "same SOI design" mean? Can the Wii U's CPU be "weak" even if it's based on POWER7?
Well, versus a PPC 47x derivative, or worse, an enhanced Broadway?
I would say that per cycle the POWER7 is going to be massively faster.
It's way wider, more aggressively out-of-order, and can issue up to 8 instructions to the execution units.
It has fast L1 and L2 caches (versus an L2 clocked at half the core speed on the 47x). The thing should sustain significantly higher IPC than a PPC 47x or an enhanced Broadway (whatever that would be). I would not be surprised if it's closer to Ivy Bridge in per-cycle performance than anything else on the market.

Then there's FP performance: a massive gap. Broadway / PPC 476 have a paired-single FPU / 2-wide SIMD. They should be able to do a multiply and an add on two 32-bit elements per cycle => 4 FLOPS. A POWER7 core can execute 2 VSX instructions (which seem to include FMA) on 8 elements in total; that's 16 FLOPS per cycle.
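
Making that arithmetic explicit (a back-of-envelope sketch using the unit counts claimed above; the issue widths are the post's assumptions, not confirmed specs):

```python
# Broadway / PPC 476: paired-single FPU, i.e. one FMA on 2x 32-bit
# elements per cycle = 2 elements * 2 ops (mul + add) = 4 FLOPS/cycle.
broadway = 2 * 2

# POWER7: 2 VSX issues per cycle, together an FMA on 8x 32-bit elements
# = 8 elements * 2 ops = 16 FLOPS/cycle.
power7 = 2 * 4 * 2

print(broadway, power7, power7 / broadway)  # 4 16 4.0x per cycle
```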

POWER7 has built-in power management features akin to AMD/Intel CPUs.
The thing is not in the same ballpark; it's a monster, more akin to an Intel CPU than anything else.

It's possible these are the cache sizes that are physically local to each core, not unlike how each Power7 core has a fast link to 4MB of eDRAM while still being part of the larger shared L3 - makes you wonder what the layout of the chip is.
L3 local cache ("Fast L3 Region") = up to 4 MB of eDRAM per core, 128 B/line. Policy: Partial Victim.
L3 cache = 32 MB per chip (eDRAM), made up of the local L3 regions of all the cores, 128 B/line (it may send 2 lines per read request). Policy: Adaptive Victim.
I don't think they would touch the L2, as it's part of the core; changing it would completely alter the core's layout. A lot of work for not much benefit.
I could see one core having 2MB of local L3 cache and the other two 512KB each.
 
I don't think the POWER7 is only intended for massive data workloads. IBM only uses it (used it) in high-end server configurations, but I don't see why the CPU could not be used in a lesser setup.
As I've just posted in that other Wii U thread, a customised POWER7 could be anything. If it runs the same code but lacks most of the execution units and most of the cache, and clocks lower, then the resultant chip will be a POWER7 in name and DNA, but not in terms of what it is. It's like saying the GPU is an nVidia GeForce 600 series GPU: that could be anything from a GT 630 to a GTX 690. A PR tweet saying "PS4's new GPU is based on the same nVidia 600 series GPUs found in the Alienware Aurora R4" wouldn't be lying if the PS4 came with an underclocked GT 630.

All this POWER7 talk is just confusing things, and TBH I think that's what it's there for. It's PR word-smithing: giving people vague ideas and leaving them to fill in the blanks according to preference. If the Wii U had a monster CPU in it, it'd be good PR to come out and be clear about that. If it hasn't, but people are confused and think it has, there's nothing to be gained from setting the record straight. Why would Nintendo/IBM respond with, "No, it's nothing like the POWER7 in Watson. Sure, it has the same ISA and can run the same code, but it's a significantly smaller, cooler, less powerful part with a tiny fraction of the mathematical throughput"?
 
Not sure if this is the right place for this question but:

What is required to make a CPU backwards compatible? I assume having at least the same instruction set is a must, but beyond that? Intel/AMD have made huge changes to their CPUs over the years, but modern CPUs can still run code from the 90s. Is this the doing of the OS, or is it possible for a modified POWER7 chip to run code designed for the Wii?
 
Seems strange though. Why have two different core types unless you're trying to gun for a broad performance profile like CELL, i.e. a conventional CPU-like core for CPU workloads and a large vector engine (like the SPEs)? If the rumours of poor CPU vector performance are true (and I believe them to be), then what else could be the point of having an asymmetrical CPU without the SIMD grunt?
It doesn't strike me as strange; it strikes me as stupid. ;) But then upon reflection maybe it isn't.

That is, only certain workloads want parallel execution; others want the best single-thread performance you can manage. So what if the design is something like "take a Xenon, but beef up one of the cores"? The problem there is that you can't really beef up a core just by adding execution units. Single-threaded performance is all about data throughput, branch handling, and high clocks; big chips with lots of execution units are for handling multithreaded workloads. Now, I don't know enough about code execution on CPUs to know whether the units of a POWER7, such as the arithmetic unit, decimal FPU, and dual fixed-point units, come together to speed up single-threaded general code. If they do, then there's a reasonable design philosophy behind a mixed design (see the sketch below). Otherwise it just seems like added complication. Cell needed it because it had specialist cores, but mixing two flavours of general-purpose core is just making trouble for yourself. No one has ever done that AFAIK, beyond including another CPU as a BC part and repurposing it for specific tasks. You can't buy an Intel CPU with a mix of i7 and i3 cores, for example. Then again, looking at something like the DS with its dual ARMs, maybe Nintendo likes the idea of one powerful core plus ancillary cores?
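
For what it's worth, the "fat core plus helper cores" idea maps onto ordinary thread affinity. A minimal conceptual sketch, assuming a Linux box with at least three CPUs and that core 0 is the big core (both assumptions invented for illustration):

```python
# Pin the latency-sensitive game/render thread to the "fat" core and
# throughput work to the helper cores. Core numbering is assumed.
import os
import threading

FAT_CORE = {0}
SMALL_CORES = {1, 2}

def helper(job):
    os.sched_setaffinity(0, SMALL_CORES)  # pid 0 = the calling thread
    job()

def game_loop(frame):
    os.sched_setaffinity(0, FAT_CORE)
    frame()

threading.Thread(target=helper, args=(lambda: None,)).start()
game_loop(lambda: None)
```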
 

Well, that's exactly where we are heading. AMD recently announced that they will be using an ARM core in their x86 designs for TrustZone security. Some ARM A15-based SoCs will have an A7 core to deal with low-power requirements; Tegra 3 already has a similar feature. So mixing and matching CPU cores isn't that unusual.
 
Posted on NeoGAF:

756 : 名無しさん必死だな : 2012/07/13(金) 02:40:15.75 ID:RXdSjXLc0

Translation:

These are the real specs:

CPU: custom PowerPC 476FP, 3 cores, 1823MHz.
The L2 cache is independent per core: 2MB x1, 0.5MB x2.
Pretty underwhelming, but the per-clock performance is decent.

GPU: custom RV730, 320 SPs, 608MHz, 389 GFLOPS, 16MB embedded DRAM.
Used straightforwardly, it's 1.6 times faster than the Xbox 360.
Since it supports post-process AA, being able to save VRAM is a very big deal.
With game engines across the industry moving to deferred shading, the Xbox 360, which can't use MRTs for lack of VRAM, will probably reach the end of its life in the near future, while the Wii U is just barely standing at the entrance to the next generation.

Memory: 1.5GB DDR3, of which 0.5GB is reserved by the OS.
The usual Nintendo approach would be a small amount of expensive memory, but this time it's a bit different.
http://www.pssokuhou.jp/archives/14821058.html

EDIT: It says 16MB of embedded RAM, but the Wii U has 32MB.
 
Even if it does have a more powerful GPU than the Xbox 360, it may still offer similar performance, given that on the Xbox 360 the CPU lends a hand with rendering work, and the Wii U carries overhead for the extra tablet.

From the above rumour I would say the Wii U is only 1.2-1.6 times more powerful than the Xbox 360.
 
With minimal rounding, 1823MHz is 2.5x the Wii CPU clock (729MHz), and 608MHz is 2.5x the Wii GPU clock (243MHz).

So just how does BC mode work, exactly? *ahem*

I wonder if the specs are old (assuming they're not made up), though I can't recall when 32MB of eDRAM was actually confirmed.
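
A quick sanity check on those ratios, using the Wii clocks as published:

```python
print(1823 / 729)  # ~2.5007: rumored CPU clock vs the Wii's Broadway
print(608 / 243)   # ~2.5021: rumored GPU clock vs the Wii's Hollywood
```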
 
How do you figure 1.6x? AFAIK, 320 shaders vs 192 shaders is a 66.6% increase, and 608MHz vs 500MHz is a 21.6% increase; compounded, 1.666 x 1.216 ≈ 2.03, i.e. about twice the Xbox 360, in terms of shader throughput anyway.

I wonder if they've added VMX units to those CPU cores. Off-hand it would be a little slower than Xenon in terms of instructions per second, though it should theoretically achieve higher utilisation, being lower-clocked and out-of-order. If it has a VMX unit added per core then I'd say it's on par with Xenon, perhaps even a bit better, but I'm going to guess it doesn't have VMX units, which would leave it lacking in FP throughput compared to Xenon.
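
For comparison, here is the same arithmetic done both ways. The 240-shader count for Xenos (48 ALUs x 5 lanes) and 2 FLOPS per shader per clock (multiply-add) are the usual figures, and presumably where the rumored 1.6x comes from:

```python
wiiu_sp, wiiu_mhz = 320, 608
xenos_sp, xenos_mhz = 240, 500  # Xenos: 48 ALUs x 5 lanes

# Scaling vs a 192-shader count multiplies, rather than adds:
print((wiiu_sp / 192) * (wiiu_mhz / xenos_mhz))  # ~2.03x

# FLOPS = shaders * 2 (multiply-add) * clock:
wiiu_gflops = wiiu_sp * 2 * wiiu_mhz / 1000      # ~389 GFLOPS
xenos_gflops = xenos_sp * 2 * xenos_mhz / 1000   # 240 GFLOPS
print(wiiu_gflops / xenos_gflops)                # ~1.62x
```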
 
Given that the RAM doesn't align with the few specs Nintendo has released, I'd assume the rest is probably not accurate either.
Anything that refers to actual PC graphics chips like the RV730, or to actual released CPU parts, I would assume to be guesswork.
 
How do you figure 1.6x? AFAIK, 320 shaders vs 192 shaders is a 66.6% increase, and 608MHz vs 500MHz is a 21.6% increase; compounded, 1.666 x 1.216 ≈ 2.03, i.e. about twice the Xbox 360, in terms of shader throughput anyway.

I thought the 360's GPU has 240 shaders.
 
As I've just posted in that other Wii U thread, a customised POWER7 could be anything. If it runs the same code but lacks most of the execution units and most of the cache, and clocks lower, then the resultant chip will be a POWER7 in name and DNA, but not in terms of what it is. It's like saying the GPU is an nVidia GeForce 600 series GPU: that could be anything from a GT 630 to a GTX 690. A PR tweet saying "PS4's new GPU is based on the same nVidia 600 series GPUs found in the Alienware Aurora R4" wouldn't be lying if the PS4 came with an underclocked GT 630.
Well, what you describe is an ISA. I pointed out some pages ago that the PPC 47x and the POWER7 actually don't share the same ISA (it's mostly the same, though).
I don't get why you would start from something as big as the POWER7 only to end up with something more akin to a Broadway or a 476. I mean, those marketing people might be lying, but a POWER7 is a POWER7; the ISA is Power ISA v2.06 (2.05 for the PPC 476, something custom for Broadway).
Those are details in the grand scheme of things, but still, I would be wary of claiming that IBM is bending words to that extent (which is close to lying).

Their previous statements were IMHO not as precise as "custom 45nm POWER7 chip". It's the third time they've gone with it.
All this POWER7 talk is just confusing things, and TBH I think that's what it's there for. It's PR word-smithing: giving people vague ideas and leaving them to fill in the blanks according to preference. If the Wii U had a monster CPU in it, it'd be good PR to come out and be clear about that. If it hasn't, but people are confused and think it has, there's nothing to be gained from setting the record straight. Why would Nintendo/IBM respond with, "No, it's nothing like the POWER7 in Watson. Sure, it has the same ISA and can run the same code, but it's a significantly smaller, cooler, less powerful part with a tiny fraction of the mathematical throughput"?
Well, I do agree it's indeed confusing, but Nintendo may have its reasons, however much I disagree with their policies.
Think of the masses: do you think that if IBM could put together, within a sane power envelope, 3 POWER7 cores and a few MB of cache, running at a low clock (say, like my A8-3500M, ~1500MHz), it would look like a monster to a general audience?
It has a low clock speed, lower throughput than Cell, and lower than Xenon too (thanks to the dot-product instruction, a Xenon at twice the clock beats it in PR throughput...). In marketable terms it ain't that great.

I'm not sure that for the average Joe, who has no specific interest in Nintendo IPs and wants the biggest thing in town, Nintendo coming out of the woodwork to state such a thing would be that great. They would also have to disclose the specs of the GPU, etc.
Rumors have it that Durango has 8 or 16 cores, and both the PS4 and the next Xbox are rumored to have GPUs in the 1+ teraFLOPS range, etc.

For the sake of the discussion, I've been researching existing CPUs with similar characteristics. The closest I found is the Phenom II X3 P820, with a TDP of 25 Watts.
I would not be surprised if a POWER7 is better per cycle, but still, assuming a 1.6 GHz clock, so more than a 10% frequency deficit, I could see performance being close, with an edge in FP for the POWER7. Even so, it's hard to depict the thing as a monster once you've considered clock speed, core counts, and what is available in the PC realm.
I could not find proper gaming benchmarks for that CPU (and it's not a ceteris paribus comparison), but looking at that TechReport article, I suspect it's tough to pass the CPU off as a gaming monster.

With regard to power consumption, POWER7 has an edge on the Phenom II; I suspect the TDP can be capped and the chip adapts dynamically.

Honestly I don't know, but IBM's insistence is troubling, to say the least. W&S
 
How do you figure 1.6x? AFAIK, 320 shaders vs 192 shaders is a 66.6% increase, and 608MHz vs 500MHz is a 21.6% increase; compounded, 1.666 x 1.216 ≈ 2.03, i.e. about twice the Xbox 360, in terms of shader throughput anyway.

384 GFLOPS / 240 GFLOPS.

Honestly, those are the most reasonable specs I have seen. A 384 GFLOPS GPU at 40nm should be around 16-20 GFLOPS per watt, which would put the GPU at roughly 19-24W, probably around half the power budget. Another 15W for the CPU and another 5W for the WiFi and disc drive, and there's your ~45W typical power draw.
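
Running the post's own numbers (its efficiency range and its CPU/peripheral estimates, all rumored rather than confirmed):

```python
gpu_gflops = 384
gpu_watts = [gpu_gflops / eff for eff in (20, 16)]  # 19.2-24.0 W

cpu_watts, misc_watts = 15, 5  # CPU; WiFi + disc drive

low, high = (g + cpu_watts + misc_watts for g in gpu_watts)
print(low, high)  # ~39-44 W, in the ballpark of a ~45 W typical draw
```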
 

If they are really 476s at that clock, probably only 2W a core, maybe another watt for the cache: 7-8W tops on the CPU side. Unless they are more custom than we think, which I would bet against.
 
With minimal rounding, 1823MHz is 2.5x the Wii CPU clock (729MHz), and 608MHz is 2.5x the Wii GPU clock (243MHz).

So just how does BC mode work, exactly? *ahem*

Clocks at an exact 1.5x, 2x, etc. versus the former hardware are only useful when the hardware is identical, or the former hardware is a subset of the new one, e.g. Wii and GameCube, Commodore 128 and 64, Game Boy Color and the monochrome Game Boy.
You drop to the older clock and feature level and have insta-compatibility. But here the CPU is all new, so the timings would be wrong even if you ran it at the Wii frequency.
 