Wii U hardware discussion and investigation *rename

The 40nm eDRAM density is slightly higher than IBM's on 45nm SOI (0.067 µm² per bit). On IBM's process, that works out to 0.24 mm² per finished 1 Mbit macro, or 32 MByte in 61 mm², so the same amount on UX8 might be 55 mm² or so, leaving on the order of 100 mm² for the rest of the GPU if initial size estimates are correct.

For whatever reason, bgassassin seems adamant that the GPU is made using finer lithography. I have no idea what he bases that on.
Cache requires a lot of additional logic. The Wii U eDRAM is no cache, so I think it should achieve a higher density. The cells themselves require only ~16 mm² for 32 MB on UX8.
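
A rough back-of-the-envelope version of those area figures, as a Python sketch. The 0.067 µm²/bit cell and 0.24 mm²/Mbit finished macro are the IBM 45nm numbers quoted above; the UX8 cell size is an assumed value picked only to illustrate the ~16 mm² estimate.

Code:
MBIT = 1024 * 1024                    # bits per Mbit
capacity_mbit = 32 * 8                # 32 MByte = 256 Mbit

# IBM 45nm SOI: finished 1 Mbit macros (cells plus sense amps, decoders, redundancy)
ibm_macro_mm2 = 0.24
ibm_total_mm2 = capacity_mbit * ibm_macro_mm2              # ~61 mm^2

# Raw cell area only, IBM 45nm (0.067 um^2 per bit)
ibm_cells_mm2 = capacity_mbit * MBIT * 0.067 * 1e-6        # ~18 mm^2

# Hypothetical UX8 cell, assumed slightly denser (~0.06 um^2 per bit)
ux8_cells_mm2 = capacity_mbit * MBIT * 0.06 * 1e-6         # ~16 mm^2

print(ibm_total_mm2, ibm_cells_mm2, ux8_cells_mm2)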
 
So... about eDRAM.

Takeda:
First of all, adoption of a multi-core CPU for the first time. By having multiple CPU cores in a single LSI chip, data can be processed between the CPU cores and with the high-density on-chip memory much better, and can now be done very efficiently with low power consumption...

...The GPU itself also contains quite a large on-chip memory.


So, I had asked in an earlier post if eDRAM was split, or pooled.
It looks like, to me, that they split it.
It does sound like the GPU will have more eDRAM than the CPU.
 
By having multiple CPU cores in a single LSI chip, data can be processed between the CPU cores and with the high-density on-chip memory much better, and can now be done very efficiently with low power consumption...
"On chip memory" can just mean shared CPU cache. He's talking about a multicore CPU which is a first for Nintendo who otherwise have lots of experience of multicore architectures sharing data over RAM. The comparison here isn't with other architectures, but Nintendo's history. What he's describing is literally Nintendo catching up with the rest of the world.
It does sound like the GPU will have more eDRAM than the CPU.
This is a no-brainer. I repeat, eDRAM isn't needed for a console CPU but is needed for the GPU. One reason to use eDRAM in a CPU is its higher density than SRAM, so that supposed 2 MB CPU cache would be smaller and lower power if implemented in eDRAM. We may be looking at 2 MB CPU cache + 32 MB GPU eDRAM, or 32 MB with 2 MB shared. But I don't think so. Separating CPU cache from the die just slows the system down.
 
I wonder why bgassassin (and a few others) thought it was going to be closer to the next gen machines than the current consoles.

Not speaking for bgassassin, but let's be honest, we don't know the true performance of Sony's and MS's next consoles. A lot of nice numbers are being rumored, but we don't know where or what bottlenecks will make those numbers irrelevant in real-world application. It's not like this is the first time this has happened. Even Nintendo makes a point about it:

No matter how great the numbers are that you can boast, can you only draw that out under certain conditions, or can you actually draw out its performance consistently when you use it? Insisting on the latter way of thinking has always been at the root of hardware and system development at Nintendo.

That's why they probably can't talk about performance numbers: they would lose that battle due to their design approach. But it doesn't mean they won't be able to produce games of the same caliber as Sony or MS. That's why I will say I highly doubt PS4 or Durango will produce a game that blows Nintendo's best Wii U game out of the water by the end of the generation.
 
It'll be extraordinary for MS or Sony to produce a console with such tiddly components as Wii U is using. That won't be a conventional console. We can comfortably consider that Wii U will be like Wii was this gen compared to next-gen - souped-up last gen hardware that is no rival for the performance of the other two. Nintendo's PR comments are as worthless in that respect as all the other PR we get (and I don't understand why you continue to place faith in it as meaningful).
 
In 3-4 years cell phones and tablets will be on 20nm, but I doubt that will allow them to catch up to the Wii U and its vastly higher power budget. Anyway, power isn't everything: the Palm Pilot didn't threaten a 1989 Game Boy for gaming, because the Game Boy had gaming controls as standard and a real game library. Good luck getting phone games designed around physical controls rather than targeting the 99% of touch-screen-only devices.

Did the Palm Pilot start at ~$120, have an installed user base of hundreds of millions of users, a game library with several thousands of games that give hundreds of millions in profits to game publishers/developers, and did it have tens of gaming accessories, including dedicated gamepads that are being promoted by phone carriers?


No?

Kthxbye ;)
 
"On chip memory" can just mean shared CPU cache.

If you mean cache shared between the CPU and GPU, then no.
Clearly he states the CPU has on chip memory, as well as the GPU.

He's talking about a multicore CPU, which is a first for Nintendo, who otherwise have lots of experience with multi-processor architectures sharing data over RAM.

Working with a company, IBM, that implements eDRAM for their own multi-core CPUs. A company that stated last year that the Wii U CPU cores will also be fed by their eDRAM.
I repeat, eDRAM isn't needed for a console CPU but is needed for the GPU.

It might not make sense to you, but for whatever reason, Nintendo & IBM find it useful.
 
If you mean cache shared between the CPU and GPU, then no.
Clearly he states the CPU has on chip memory, as well as the GPU.
No, CPU L2 cache shared between all cores; the same as on every multicore CPU.

Working with a company, IBM, that implements eDRAM for their own multi-core CPUs.
Yes. Thus I believe the L2 cache is eDRAM as opposed to SRAM. Its function is exactly the same as any CPU L2 cache, and nothing special or exciting to Wii U.

That is:
CPU - 2 MB eDRAM L2 cache shared between three cores
GPU - 32 MB eDRAM workspace
It might not make sense to you, but for whatever reason, Nintendo & IBM find it useful.
I mean large eDRAM quantities, the typical use for eDRAM which some here believe in. eDRAM can be replaced with SRAM as cache, although it obviously brings something to the mix with regard to hitting Nintendo's targets.
 
Did the Palm Pilot start at ~$120, have an installed user base of hundreds of millions of users, a game library with several thousands of games that give hundreds of millions in profits to game publishers/developers, and did it have tens of gaming accessories, including dedicated gamepads that are being promoted by phone carriers?


No?

Kthxbye ;)

Samsung Galaxy Note II pictured in the article: 639.90 €.
Nice try :p

There's certainly potential, but issues as well. A cheap phone with Android 2.3, a cheap phone with Android 4.1, a former high-end phone stuck on Android 2.2, a high-end phone with Android 4.0: have fun making an AAA title (or AA, or A), choosing a subset of those platforms to support, and marketing the game to people who can run it, have a controller, and use the device for something other than phone, SMS, timer, and failbook in the first place.
You get down to < 1 million users, not 500 million.
Unless this really catches on with lots of people in more than one country.
 
If Nintendo speak of 4xAA at 720p, the eDRAM is probably in the 12 MB range? 16 MB max? No need for 32 MB if native 4xAA 1080p is not the target?
And with a good scaler, like on the 360, the 1080p output is largely good enough for the target audience.
 
In response to Al's deleted post (you cannot hide from me!)

eDRAM latency is something I looked up but couldn't find figures for. Is eDRAM slower than SRAM? My assumption is 'yes', which would explain why it hasn't seen wider use - otherwise it would offer smaller size and better power efficiency at the same speed. Given Nintendo's focus, I think they'd choose smaller and more power efficient over lower latencies, which could explain why we see an eDRAM cache being adopted.
 
If Nintendo speak of 4xAA at 720p, the eDRAM is probably in the 12 MB range? 16 MB max? No need for 32 MB if native 4xAA 1080p is not the target?
And with a good scaler, like on the 360, the 1080p output is largely good enough for the target audience.
16-bit HDR = 2 bytes x 4 channels = 8 bytes per pixel; x ~1 million pixels ≈ 8 MB for a non-AA FB. x4 = 32 MB for 4 samples per pixel.
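
That arithmetic spelled out as a quick Python check (assuming an FP16 RGBA colour buffer at 720p and counting colour samples only; a depth buffer would come on top of this):

Code:
bytes_per_pixel = 2 * 4               # FP16 x RGBA = 8 bytes per pixel
pixels_720p = 1280 * 720              # ~0.92 million pixels

no_aa_mb = pixels_720p * bytes_per_pixel / (1024 * 1024)   # ~7 MB ("~8 megs")
msaa4x_mb = no_aa_mb * 4                                    # ~28 MB, close to 32 MB

print(no_aa_mb, msaa4x_mb)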

You also don't just want the backbuffer in there. A full 32 MB scratchpad eDRAM would be superb for many graphics tasks. You could put your particle textures in there and read/write to your heart's content. You can render to multiple render targets and have direct access to those buffers with no need to export/import from system RAM. I would anticipate Wii U being strong in graphical special FX if not raw polygon and pixel power. Making the most of that bandwidth would also require a specialist engine, meaning ports not doing as well on the hardware at first.
 
No, CPU L2 cache shared between all cores; the same as on every multicore CPU.
This is nitpicking for sure, but anyway, not EVERY multicore CPU shares L2 between cores. Intel Core i-series CPUs do not share L2 between cores. I don't think AMD K6-derived CPUs do either. Older Intel Core-series CPUs do share L2 (Penryn and older). Those CPUs only had two cores per physical silicon die though, and quad-core chips were built using two dies on one substrate, so each pair of cores shared its L2, but not with the other pair.

On the AMD side, Bulldozer cores do share... depending on how you define a core, anyhow, since a BD core/"module" is essentially two integer cores/pipes ganging up on one single float pipe, so it's a bit half-and-half so to speak.

...Okay. Original programming will now resume. :D
 
Cache requires a lot of additional logic. The Wii U eDRAM is no cache, so I think it should achieve a higher density. The cells themselves require only ~16 mm² for 32 MB on UX8.

I don't think cache control logic is included in the IBM eDRAM macro cell size. However, that eDRAM runs with cycle and access times of 1.35 ns and 1.5 ns, which is simply beyond the needs of GPU eDRAM. A design that doesn't have to maintain signal integrity at those high frequencies definitely has an opportunity to increase density. On the other hand, I don't know the specifics of the Renesas design; it may be over-specced for this particular application.

Personally, I feel that the assumed/rumoured specifications (RV730 based (320 "shaders"), 500ish MHz, 32 MB eDRAM at 40nm) do not really match up with the physical size of the GPU (roughly 150 mm²) and the overall active power draw of 45 W.
The RV730 at 600 MHz and 55nm had a TDP of 25 W (link), so at 40nm and 500 MHz it should be in the 10-15 W ballpark. Why would the console draw four times as much when active? The CPU isn't going to use much. Also, the shader array at 40nm should be around 50 mm², and we've pegged the eDRAM to probably take similar or less. The various small bits and bobs added shouldn't take much.
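
A crude first-order sanity check of that 10-15 W ballpark, as a sketch only: dynamic power taken as proportional to capacitance x V² x frequency, with a rough linear capacitance shrink from 55nm to 40nm and a small assumed voltage drop. The scaling factors here are guesses, not measured values.

Code:
rv730_tdp_w = 25.0                    # RV730 @ 55nm, 600 MHz (figure quoted above)

freq_scale = 500.0 / 600.0            # 600 MHz -> 500 MHz
cap_scale  = 40.0 / 55.0              # rough shrink of switched capacitance
volt_scale = (1.10 / 1.15) ** 2       # small assumed Vdd reduction, squared

print(rv730_tdp_w * freq_scale * cap_scale * volt_scale)   # ~14 W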

So I feel there is something a bit off. The rumors, as they stand, seem a bit low judging by the physical reality of the actual chip. The discrepancy isn't huge. But still.
 
16-bit HDR = 2 bytes x 4 channels = 8 bytes per pixel; x ~1 million pixels ≈ 8 MB for a non-AA FB. x4 = 32 MB for 4 samples per pixel.

You also don't just want the backbuffer in there. A full 32 MB scratchpad eDRAM would be superb for many graphics tasks. You could put your particle textures in there and read/write to your heart's content. You can render to multiple render targets and have direct access to those buffers with no need to export/import from system RAM. I would anticipate Wii U being strong in graphical special FX if not raw polygon and pixel power. Making the most of that bandwidth would also require a specialist engine, meaning ports not doing as well on the hardware at first.

True.
The design assumes heavy use of the eDRAM; otherwise they would have been better off just spending those gates on ALUs and saving themselves R&D costs and additional time-to-market risks. It would be nice to hear someone from the graphics trenches expand a bit on how this could be utilized, and why making this decision, with its associated risks and obvious porting consequences, made sense instead of just going the safe route with a more off-the-shelf part.
 
In response to Al's deleted post (you cannot hide from me!)

eDRAM latency is something I looked up but couldn't find figures for. Is eDRAM slower than SRAM? My assumption is 'yes', which would explain why it hasn't seen wider use - otherwise it would offer smaller size and better power efficiency at the same speed.

DRAM and logic processes are optimized differently. For DRAM you want very low static power; for logic you want very high dynamic performance. DRAM processes typically have less than 1% of the leakage power of a typical logic process.

Integrating the DRAM, you end up compromising both DRAM and logic performance. You also get a more costly process: extra steps are needed to create the trench capacitors for the DRAM cells, and all the metal layer steps needed for the logic area are wasted on the DRAM cells.

The compromised performance has extra consequences in a console, where you cannot bin and sell slower units at a discount. In order to maximize yield, you'll need to provision for the higher power consumption of your lower quality bins. This impacts the cost of the entire system (cooling, reliability, PSU). I'm guessing that's why MS didn't opt for integrating the eDRAM of Xenos; they don't need the performance, so it is cheaper overall to have a separate die and spend a few dozen cents on adding a substrate to connect the CPU/GPU to the eDRAM die.

Cheers
 