Wii U hardware discussion and investigation *rename

Wait wait wait, in the previously posted teardown there are 4 separate DDR3-1600 chips (512 MB each) on the board. Would each of them have a bandwidth of 12.8 GB/s individually, giving a total bandwidth of 51.2 GB/s for the RAM, or would the total just flatly be 12.8 GB/s?

If it's the first case, then I can see why the ports still came out the way they did. They were not developed to access the RAM in such a fashion, so the de facto bandwidth was very low.

Would this have anything to do with the DDR3 memory controllers that IBM uses with POWER7? Because we don't know yet what technology the CPU shares with Watson.
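For what it's worth, here's the arithmetic both ways (a quick Python sketch; the x16 chip width is an assumption consistent with the 12.8 GB/s total discussed below, not something confirmed):

```python
# Sanity check on the two readings (assumption: the four chips are x16
# parts ganged into a single 64-bit bus, per the figures in this thread).
def ddr3_gbs(mt_per_s, bus_width_bits):
    """Peak DDR3 bandwidth in GB/s: transfers/s times bytes per transfer."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9

print(ddr3_gbs(1600, 16))   # 3.2 GB/s  - one x16 DDR3-1600 chip
print(ddr3_gbs(1600, 64))   # 12.8 GB/s - four of them side by side
print(ddr3_gbs(1600, 256))  # 51.2 GB/s - would need four x64 chips, which these aren't
```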


Nehalem-EX has 4 buffered DDR3 channels per chip, where, using on-board buffers, every channel splits into two actual 64-bit DDR3-1333 DRAM paths. If the buffers had capabilities like the FBD AMB (Advanced Memory Buffer) chips, you might be able to do simultaneous read and write transactions on each channel, effectively doubling the bandwidth. Either way, you're looking at some 50 GB/s of memory bandwidth per CPU chip, not bad at all.

In the case of POWER7, though, there are two 4-channel DDR3 memory controllers, for a total of 8 channels of memory and a claimed 100 GB/s of total memory bandwidth.

POWER7
Dual DDR3 memory controllers per chip. Each DDR3 memory controller:
- 8 KB scheduling window
- connects to up to 4 proprietary memory buffer chips over a differential-signalling interface. Each buffer chip:
  - 6.4 GHz * 2 bytes, buffer chip -> POWER7 chip bandwidth (read)
  - 6.4 GHz * 1.5 bytes, buffer chip <- POWER7 chip bandwidth (write)
  - dual high-frequency DDR3 DIMM ports (DDR3-800/1066/1333/1600)
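Plugging those buffer-link figures into a quick sketch (assuming all four buffers per controller are populated):

```python
# POWER7 buffer-chip link bandwidth, from the figures above.
LINK_GHZ    = 6.4
READ_BYTES  = 2.0   # per cycle, buffer -> POWER7
WRITE_BYTES = 1.5   # per cycle, POWER7 -> buffer
BUFFERS     = 4     # per memory controller
CONTROLLERS = 2     # per chip

read_gbs  = LINK_GHZ * READ_BYTES  * BUFFERS * CONTROLLERS   # 102.4 GB/s
write_gbs = LINK_GHZ * WRITE_BYTES * BUFFERS * CONTROLLERS   #  76.8 GB/s
print(read_gbs, write_gbs)  # the read side alone roughly matches the ~100 GB/s claim
```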

Maybe WiiU is using one of them?
 
Maybe WiiU is using one of them?

One what? One memory channel? One channel is 64-bit. POWER7 has a 512-bit interface, hence ~100 GB/s with DDR3-1600.

The DRAMs on a DIMM are usually either x4 or x8 wide each -> 16 chips at x4 or 8 chips at x8 = 64-bit I/O per DIMM. There is also rank switching, so you can expand capacity without using more I/O.
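In sketch form (the x16 reading of the Wii U board is my interpretation of the teardown photos, not confirmed):

```python
# Chip width vs. interface width: capacity comes from more/denser ranks,
# not from widening the 64-bit channel.
for chip_width in (4, 8, 16):
    chips_per_rank = 64 // chip_width
    print(f"x{chip_width} DRAMs: {chips_per_rank} chips per 64-bit rank")
# x4: 16 chips, x8: 8 chips, x16: 4 chips (the Wii U layout, by the look of it)
```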
 
No offense, but don't you think that if there were even a slight chance of what you are describing here being true, it would have already been pointed out?

12.8 GB/s is the total.

Actual speed can be debated. IMO Nintendo made quite a sacrifice by only using 64-bit DDR3, so it wouldn't be wise to run the chips at their lowest rated speed. Running them at 2133 makes sense; to hell with the tiny difference in power and reliability. Just use a high CAS latency setting (with the higher frequency, the absolute latency doesn't really move) and suck it up.
But the DDR3 memory controller would have to work well.
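Rough numbers for that trade-off (the CAS bins here are illustrative JEDEC-style values, not confirmed Wii U settings):

```python
# DDR3-1600 CL11 vs DDR3-2133 CL14: the higher clock buys ~33% more
# bandwidth while the absolute CAS latency stays roughly flat.
def cas_ns(mt_per_s, cl):
    clock_mhz = mt_per_s / 2        # DDR: I/O clock is half the transfer rate
    return cl / clock_mhz * 1e3     # CAS latency in nanoseconds

def bw_gbs(mt_per_s, width_bits=64):
    return mt_per_s * 1e6 * width_bits / 8 / 1e9

print(bw_gbs(1600), cas_ns(1600, 11))   # 12.8 GB/s, 13.75 ns
print(bw_gbs(2133), cas_ns(2133, 14))   # ~17.1 GB/s, ~13.1 ns
```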
 
Do we really know it's bandwidth starved? I mean, except for devs working on it, we don't really know how data is moved around the system, what the real penalties for slow main memory are, etc.
 
I think it's a safe assumption, unless there's actually something like 256-512 MB of embedded memory instead of just 32 MB. 32 MB isn't enough to magically compensate for the DDR3's limitations.
 
An interesting post from sebbbi a few months back: http://forum.beyond3d.com/showpost.php?p=1646788&postcount=15

Let me explain why a huge amount of low-bandwidth memory is not a good idea. Slow memory is pretty much unusable, simply because we can't access it :smile:

The GDDR3 memory subsystem in current-generation consoles gives a theoretical maximum of 10.8 GB/s read/write bandwidth (both directions). For a 60 fps game this is 0.18 GB per frame, or 184 MB, assuming of course that you are fully memory-bandwidth bound at all times and there's no cache thrashing etc. happening. In practice some of that bandwidth gets wasted, so you might be able to access, for example, 100 MB per frame (if you try to access more, the frame rate will drop).

So with 10.8 GB/s of theoretical bandwidth, you cannot access much more than 100 MB of memory per frame, and memory accesses do not change that much from frame to frame, as camera & object movement has to be relatively slow in order for animation to look smooth (especially true at 60 fps). How much more memory than 100 MB do you need, then? It depends on how fast you can stream data from the hard drive, and how well you can predict the data you will need in the future (latency is the most important thing here). 512 MB has proven to be enough for our technology, as we use virtual texturing. The only reason we couldn't use 4k*4k textures on every single object was the downloadable package size (we do digitally distributed games); the 512 MB of memory was never a bottleneck for us.

Of course there are games that have more random memory access patterns, and have to keep bigger portions of the game world in memory at once. However, no matter what, these games cannot access more than ~100 MB of memory per frame. If you can predict correctly and hide latency well, you can keep most of your data on your HDD and stream it on demand. Needless to say, I am a fan of eDRAM and other fast-memory techniques. I would always opt for small fast memory instead of large slow memory, assuming of course we can stream from the HDD or from flash memory (disc streaming is very awkward because of the high latency).
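Spelling out the arithmetic in that post (a sketch; the 20 MB/s HDD streaming rate is my own illustrative assumption, not a figure from the post):

```python
# sebbbi's per-frame budget, spelled out.
BW_GBS, FPS = 10.8, 60
per_frame_mb = BW_GBS * 1024 / FPS        # ~184 MB touchable per frame, tops
hdd_mb_per_frame = 20.0 / FPS             # ~0.33 MB streamed in per frame
frames_to_swap = 100 / hdd_mb_per_frame   # ~300 frames (~5 s) to replace a
print(per_frame_mb, frames_to_swap)       # ~100 MB working set from disk
```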

Would have to wonder if the 32MB eDRAM changes much considering how much of it will be used for just render targets and various buffers (post-processing/offscreen/shadows etc).

At the same time, I'd wonder about the ability to handle larger assets (i.e. texture filtering, triangle raster rate) anyway. hm...
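As a rough feel for how fast 32 MB goes on render targets alone (a sketch; the resolution and per-pixel formats are assumptions):

```python
# Rough render-target footprints against a 32 MB pool (assumptions:
# 720p, 4 B/px colour, 4 B/px depth/stencil).
def target_mb(w, h, bytes_per_px, samples=1):
    return w * h * bytes_per_px * samples / 2**20

print(target_mb(1280, 720, 4 + 4))             # ~7 MB: colour + depth, no MSAA
print(target_mb(1280, 720, 4 + 4, samples=4))  # ~28 MB: 4xMSAA nearly fills the pool
```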
 
It's the same specs as the Samsung & Hynix chips; it's just the JEDEC standard (the DDR3-1600K bin; K is the 11th letter of the alphabet, and incidentally it's also the least aggressive set of timings specified for DDR3-1600). The data sheets for them are on their websites too.
 
An interesting post from sebbbi a few months back: http://forum.beyond3d.com/showpost.php?p=1646788&postcount=15

Would have to wonder if the 32MB eDRAM changes much considering how much of it will be used for just render targets and various buffers (post-processing/offscreen/shadows etc).

At the same time, I'd wonder about the ability to handle larger assets (i.e. texture filtering, triangle raster rate) anyway. hm...
That's not really my area of expertise, but I just watched a comparison between Darksiders 2 on 360 and Wii U, and the Wii U version apparently has an extended texture draw distance. Wouldn't that require more bandwidth?
 
It's the same specs as the Samsung & Hynix chips; it's just the JEDEC standard (the DDR3-1600K bin; K is the 11th letter of the alphabet, and incidentally it's also the least aggressive set of timings specified for DDR3-1600). The data sheets for them are on their websites too.
I'm perfectly aware of that; I just linked info about the Micron RAMs that we can confirm from the pictures, and that hadn't been referenced here before, that's all.
 
Textures switching from lower resolutions to higher resolutions at a greater distance.

You mean this?


Something's quite off in the first part with the ground texture on the Wii U (missing normal map, or just a lower-res texture? The video is compressed to hell). Edit: is there a day/night cycle?

The last part of the video looks like they increased the cascaded shadow map resolution a bit (the cascade switches a bit further out). The video is again too blurry to tell much about texture filtering (other than that they're both equally shit).

---

Anyway, yes, higher filtering modes do mean increased bandwidth consumption via increased texture sampling requests to main memory. On the Wii U? Who knows how the texture cache is set up, but even so, anything from the last few years is significantly larger than the 32 KB that Xenos' texture units have. A larger texture cache would reduce the amount of traffic to the larger memory pool.

The other thing to consider is filtering rates.
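To put rough numbers on the filtering point (a sketch with assumed resolution and formats):

```python
# Worst-case texel traffic per frame if nothing hits the cache
# (illustrative assumptions: 720p60, one texture layer, 4 B/texel).
def texel_gbs(w, h, fps, taps, bytes_per_texel=4):
    return w * h * fps * taps * bytes_per_texel / 1e9

print(texel_gbs(1280, 720, 60, 4))   # ~0.88 GB/s, bilinear (4 taps)
print(texel_gbs(1280, 720, 60, 8))   # ~1.77 GB/s, trilinear (8 taps)
# Multiply by texture layers per pixel and it eats into 12.8 GB/s fast;
# whatever the texture cache absorbs never reaches the DDR3 at all.
```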
 
So I've been thinking about chip sizes and such. We don't know that TSMC are making the GPU, but just for the sake of throwing some numbers around, according to this (with "Taiwan Semicond. Manuf. Co. Ltd." listed as one of the authors):

http://ieeexplore.ieee.org/xpl/logi...re.ieee.org/xpls/abs_all.jsp?arnumber=5872231

... you're looking at 0.145 mm^2 per Mbit of eDRAM on their 40nm process. So that's 37.12 mm^2 for the rumoured 32 MB.

With a die size of 156.21 mm^2 for the GPU:

Give mucho thanks and condolences to Anand for sacrificing his unit for the photos and die size measurements. Wii hardly knew U. :p

CPU: 32.76mm^2
GPU: 156.21mm^2
3rd die: 2.65mm^2

... that leaves you with about 119 mm^2 for everything else.
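The same arithmetic as a one-liner sanity check:

```python
# eDRAM area estimate from the linked paper's 0.145 mm^2 per Mbit at 40nm.
MM2_PER_MBIT = 0.145
edram_mm2 = 32 * 8 * MM2_PER_MBIT            # 32 MB = 256 Mbit -> 37.12 mm^2

GPU_DIE_MM2 = 156.21
print(edram_mm2, GPU_DIE_MM2 - edram_mm2)    # 37.12 and ~119.09 mm^2 left over
```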
 
Well that pretty much excludes the RV740 due to the fact that the chip was 137mm^2, doesn't it?
With a few SIMDs removed (and the DP support nixed), it could be an option. But why not use a Redwood (104mm²) or Turks (118mm²) as the base right from the start (better suited for GPGPU)? Remove half of the memory controller (-10mm²) and you probably have some space for a bit of SB functionality/connectivity. That is integrated there too, isn't it? Or is it in the CPU chip?

Edit:
I wouldn't want to remove ROPs. With fast eDRAM on die, one would lose too much for the small die-size saving, in my opinion.
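A back-of-envelope for why the ROPs want that eDRAM (the ROP count and clock here are guesses for illustration, not known Wii U specs):

```python
# Peak pixel traffic the ROPs could generate vs. the main memory pool.
ROPS, CLOCK_GHZ = 8, 0.55
px_per_s = ROPS * CLOCK_GHZ * 1e9       # 4.4 Gpx/s peak fill
bytes_per_px = 4 + 4 + 8                # colour write + blend read + Z read/write
print(px_per_s * bytes_per_px / 1e9)    # ~70 GB/s worst case: eDRAM territory,
                                        # not 12.8 GB/s DDR3 territory
```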
 
Already mentioned. :p

(Actually, a while back too)

Thought I was actually being useful for once! o_O

I posted on that damn page too! Post without reading to the end of the thread, then forget to read the rest of the thread, then just continue reading from your last post next time, because, like, that must have been where you read up to last. :???:

Well that pretty much excludes the RV740 due to the fact that the chip was 137mm^2, doesn't it?

Well, RV740 has a double-width data bus, and it's GDDR5, which requires more pins IIRC, so maybe you could squeeze it into less than that in the Wii U. But on the other hand, the eDRAM would need some kind of controller and a CPU bus (presumably wider than PCI-E), and then there's Nintendo's audio DSP and ARM processor (or is that the third tiny die?) and probably some other stuff too, so... I dunno.

It would seem odd for Nintendo to skimp on the CPU, memory bandwidth, power and cooling, and then actually pack in a fairly fast GPU. And with the main memory bandwidth being so low, would 32 texture units (double the 360's) really be worth it?
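Putting rough numbers on that last question (the 550 MHz clock is an assumption for illustration):

```python
# Whether 32 TMUs could even be fed by a 12.8 GB/s pool.
TMUS, CLOCK_GHZ = 32, 0.55
texels_per_s = TMUS * CLOCK_GHZ * 1e9   # 17.6 Gtexels/s peak
uncached_gbs = texels_per_s * 4 / 1e9   # 70.4 GB/s at 4 B/texel, no cache hits
print(uncached_gbs / 12.8)              # 5.5x the DDR3 bandwidth: the texture
                                        # cache must absorb >80% of fetches
                                        # just to break even
```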
 