Wii U hardware discussion and investigation *rename

Are you sure it isn't 256 Gbit/s, which would be 32 GByte/s? If it's really 256 GByte/s then isn't that an extreme amount of overkill for the pixel pushing power? You won't see a total bandwidth like that on any of the new consoles, and they are showing up 6-8 years later.

So, with 256GB/s of bandwidth available in the eDRAM frame buffer there should always be sufficient bandwidth for achieving 8 pixels per clock with 4x Multi-Sampling FSAA enabled and as such this also means that Xenos does not need any lossless compression routines for Z or colour when writing to the eDRAM frame buffer.

http://www.beyond3d.com/content/articles/4/4
 
You won't see a total bandwidth like that on any of the new consoles, and they are showing up 6-8 years later.
That bandwidth is not really all that special for an on-chip buffer; the PS2 had 48GB/s to its eDRAM back in 1999, which is pretty crazy when you think about it. High-end PC boards at that time had only a small fraction of that.

The reason the 360 has 256GB/s is to provide "free" alpha blending and 4-sample multisample antialiasing, not that the latter is used much, due to the additional issues with tiling, which is required at any decent resolution + antialias.
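
The figure falls out of simple arithmetic if you assume the usual 500 MHz Xenos clock and a full read-modify-write of 32-bit colour and 32-bit Z per sample (back-of-envelope only, not an official breakdown):

Code:
# Why "free" 4x MSAA + alpha blend wants 256 GB/s:
# 8 ROPs, 4 samples each, read + write of colour and Z per sample.
rops, samples, clock_ghz = 8, 4, 0.5
bytes_per_sample = (4 + 4) * 2                          # colour + Z, read and write
print(rops * samples * bytes_per_sample * clock_ghz)    # 256.0 GB/s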
 
That bandwidth is not really all that special for an on-chip buffer; the PS2 had 48GB/s to its eDRAM back in 1999

Both the PS2 and 360 seem special in that regard to me, and it allowed them both to outperform newer and more expensive systems wrt z and transparent fillrate.

I don't see the Wii U having that kind of on-chip BW even 7 years later. It might not even have the 360's off-chip bus BW for its on-chip buffer - if it did, it shouldn't be having more trouble than Llano at 720p (or lower) resolutions.
 
Are you sure it isn't 256 Gbit/s, which would be 32 GByte/s? If it's really 256 GByte/s then isn't that an extreme amount of overkill for the pixel pushing power?
It's 256 GB/s for the ROPs only. That's equivalent to counting cache bandwidth for a CPU - eg. SPUs have 143/47 GB/s read/write from LS. The ROPs have 256 GB/s for reading and writing values, which is used for drawing the buffers. The rest of the GPU has 32 GB/s read/write to eDRAM. That means if you want to texture, you can read 22.4 GB/s from main RAM and 32 GB/s from eDRAM, but not 256 GB/s.

We had many discussions on this architecture back in the day. I'm not sure what the end understanding for GPU<>eDRAM BW was, whether it had a full 32 GB/s read available or not (there was a 32 GB/s write / 16 GB/s read BW figure once upon a time), and there was discussion about what use the eDRAM was to the overall render pipeline. Dave's article certainly says it's 32 GB/s bidirectional, and he should know. ;) Suffice to say, the GPU is split into two parts - shader units and ROPs. The shader units have the 22.4 GB/s main RAM and 32 GB/s eDRAM available to them. The ROPs have 256 GB/s available to them. This is effectively cache BW, only using eDRAM instead of SRAM to have lots of it. If a GPU was released with 10 MB of SRAM cache connected to the ROPs, it'd be effectively the same thing.
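
For completeness, the only external number in that list is the 22.4 GB/s one; a quick sketch of where it comes from, assuming the commonly published 128-bit GDDR3 interface at 700 MHz (1.4 GT/s effective):

Code:
# External main memory bandwidth on 360.
def bandwidth_gbps(bus_bits, gtps):
    return bus_bits / 8 * gtps

print(bandwidth_gbps(128, 1.4))   # 22.4 GB/s, shared by CPU and GPU
# The 32 GB/s (GPU <-> daughter die) and 256 GB/s (ROPs <-> eDRAM) paths
# are on-package buses, which is why they dwarf the external figure.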
 
I believe it's very likely limited in WiiU because all the ports are running at the same resolution as the 360 version. Turning up the resolution or enabling multisampling is trivial unless some aspect of the hardware gates performance - either because the WiiU is limited to 8 ROPs like the 360, or because the bandwidth to EDRAM is limited.
If it only has 8 ROPs there is no point in having much more memory bandwidth to EDRAM than about ~25GB/s, unless it can also be used as a source for textures.

On 720, if you believe the current rumors, it would need AT LEAST 100GB/s out of the EDRAM to be competitive.
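
Back-of-envelope for that cut-off, assuming the rumoured 8 ROPs at 550 MHz with 32-bit colour and Z (the exact number depends on how much Z and blend traffic you assume):

Code:
# How much eDRAM bandwidth can 8 ROPs at 550 MHz actually consume?
rops, clock_ghz = 8, 0.55
pixel_rate = rops * clock_ghz   # 4.4 Gpixels/s peak
print(pixel_rate * 4)           # 17.6 GB/s - colour writes only
print(pixel_rate * (4 + 4))     # 35.2 GB/s - colour + Z writes
print(pixel_rate * 16)          # 70.4 GB/s - worst case, blended colour + Z read/write

Even the pessimistic case stays well under 100GB/s without MSAA, so hundreds of GB/s would only buy you anything if the ROPs multisample the way Xenos' do.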

Hello ERP and fellow posters. If I may just chime in for a moment with my two cents on the matter. I surely don't have the in depth understanding as some on here, but I have followed the topic quite closely, and tried my best to weed out the nonsense.

Of the two scenarios you present, I find it hard to believe that eDRAM bandwidth is limited. I believe wsippel (whose overall conclusions I disagree with, but I don't dispute his findings here) dug up some info on Renesas' UX8GD eDRAM, and with 1 MB modules each carrying a 128-bit bus, Wii U's bandwidth could be in the ballpark of ~282 GB/s. There are other possibilities as well, but this figure does not seem outrageous, as it falls in the middle range of configurations, and having 32 distinct pools should help keep latency low.
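
For what it's worth, the arithmetic behind that ballpark is just the macro count multiplied by the per-macro bus width and the rumoured 550 MHz GPU clock (all of this is speculative, of course):

Code:
# 32 MB as 32 x 1 MB UX8GD macros, each on a 128-bit interface,
# at the rumoured 550 MHz GPU clock.
macros, bus_bits, clock_ghz = 32, 128, 0.55
print(macros * bus_bits / 8 * clock_ghz)   # 281.6 GB/s aggregate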

Furthermore, it seems to me that this eDRAM must be capable of storing textures. As Wii U boasts 100% backwards compatibility with Wii, Flipper/Hollywood's 1 MB texture cache must be accounted for. Integrating that into an R700 L1 or L2 cache seems needlessly complex, and so I'm thinking software might emulate the cache logic aspect, meanwhile pulling textures from the eDRAM. The hypothesis that textures can be pulled from the 32 MB in proper Wii U titles is strengthened by the end results of launch ports. As others have noted, would they even be possible at merely 12.8 GB/s?

I also keep going back to the ratio of shader cores to texture units found in the R700 series and how they scale. With Wii U, Nintendo seem to have gone "cache crazy" and it so happens that the GPUs in question scale favorably in regards to texture capabilities. For the commonly suggested 320 SPUs, AMD allots 32 TMUs and 8 distinct L1 texture caches. This seems like overkill perhaps, and AMD scaled back the TMUs in later series, but it does not seem at all unlikely that Nintendo may have found this configuration in line with their emphasis on an efficient memory hierarchy.

Taking all this into account, I believe Wii U's apparent parity with current-gen resolutions and the like is most likely explained by the eDRAM being forced into multiple duties, framebuffer and texture storage being among them. Perhaps the greatest hint is in the leaked specs from vgleaks, where the eDRAM is referred to as MEM1. This labeling points to general usage for a variety of tasks and, with 32 MB of the stuff, is the most logical explanation for the lack of MSAA or upgraded resolutions.
 
Renesas' UX8GD eDRAM and with 1 MB modules each carrying a 128-bit bus, Wii U's bandwidth could be in the ballpark of ~282 GB/s.
Such hopeful inferences of non-facts and speculation often turn out not only to be wrong, but also to be picked up by wishful thinkers and fanboys and repeated as (near, if lucky) fact, not necessarily here, but certainly on other boards, only to come back this way again some time later. We've seen it many times.

As the wuu has not been advertised to have "free" multisample antialiasing like xenos, this bandwidth figure is most likely grossly overinflated compared to reality.

Furthermore, it seems to me that this eDRAM must be capable of storing textures. As Wii U boasts 100% backwards compatibility with Wii, Flipper/Hollywood's 1 MB texture cache must be accounted for.
As the eDRAM in wuu isn't the same as the texture cache of flipper, it really doesn't matter if it stores textures or not. Emulation either relies on full register/hardware compatibility throughout (like when hollywood runs gamecube games), or quite possibly not at all (like when MAME runs anything, basically). IE; the presence of a dedicated texture cache is not an issue. Modern GPUs don't require large texture caches, and could quite easily emulate a late-90s design like flipper with no issues whatsoever. Actually, having to manually manage a smallish memory space is more of a hindrance than a help in many cases.
 
Such hopeful inferences of non-facts and speculation often turn out not only to be wrong, but also to be picked up by wishful thinkers and fanboys and repeated as (near, if lucky) fact, not necessarily here, but certainly on other boards, only to come back this way again some time later. We've seen it many times.

As the wuu has not been advertised to have "free" multisample antialiasing like xenos, this bandwidth figure is most likely grossly overinflated compared to reality.

I assure you I am not much "hopeful" for anything regarding Wii U at the moment. The information I passed on was not an inference but a calculation using publicly available information from Renesas. If I were that hopeful, I would have suggested an 8192-bit bus, as it is also possible to pair 1 MB of the eDRAM with a 256-bit interface. As it is, my suggestion is right in line with the bandwidth achieved on Xenos' daughter die between ROPs and eDRAM.

Meanwhile, this comes straight from the vgleaks specs sheet, which has been verified by multiple independent sources:

vgleaks said:
32MB high-bandwidth eDRAM, supports 720p 4x MSAA or 1080p rendering in a single pass.

This is the most solid info we have besides actual game performance and the clock speeds. Why choose to ignore it?
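
A quick footprint check at least shows that claim is self-consistent, assuming 32-bit colour plus 32-bit Z/stencil per sample:

Code:
# Do the render targets vgleaks describes actually fit in 32 MB?
def fb_mb(width, height, samples, bytes_per_sample=8):   # 4B colour + 4B Z per sample
    return width * height * samples * bytes_per_sample / (1024 * 1024)

print(fb_mb(1280, 720, 4))    # ~28.1 MB: 720p with 4x MSAA just fits
print(fb_mb(1920, 1080, 1))   # ~15.8 MB: 1080p without MSAA fits easily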


Grall said:
As the eDRAM in wuu isn't the same as the texture cache of flipper, it really doesn't matter if it stores textures or not. Emulation either relies on full register/hardware compatibility throughout (like when hollywood runs gamecube games), or quite possibly not at all (like when MAME runs anything, basically). IE; the presence of a dedicated texture cache is not an issue. Modern GPUs don't require large texture caches, and could quite easily emulate a late-90s design like flipper with no issues whatsoever. Actually, having to manually manage a smallish memory space is more of a hindrance than a help in many cases.

The possibility of using the eDRAM to emulate Flipper's 1t-SRAM is not my own, but was suggested on neogaf by blu, who has coded for the Gamecube. Similar to how 3DS emulates DS, it is not an "all or nothing" approach to hardware BC, but uses the hardware to create a similar enough environment that the software emulator doesn't get hung up, as often happens in pure software solutions like Dolphin. And surely you aren't saying that Wii U boasts the grunt of a PC capable of running Dolphin and all Wii games 100% perfectly.
 
It's 256 GB/s for the ROPs only. That's equivalent to counting cache bandwidth for a CPU - eg. SPUs have 143/47 GB/s read/write from LS. The ROPs have 256 GB/s for reading and writing values, which is used for drawing the buffers. The rest of the GPU has 32 GB/s read/write to eDRAM. That means if you want to texture, you can read 22.4 GB/s from main RAM and 32 GB/s from eDRAM, but not 256 GB/s.

We had many discussions on this architecture back in the day. I'm not sure what the end understanding for GPU<>eDRAM BW was, whether it had a full 32 GB/s read available or not (there was a 32 GB/s write / 16 GB/s read BW figure once upon a time), and there was discussion about what use the eDRAM was to the overall render pipeline. Dave's article certainly says it's 32 GB/s bidirectional, and he should know. ;) Suffice to say, the GPU is split into two parts - shader units and ROPs. The shader units have the 22.4 GB/s main RAM and 32 GB/s eDRAM available to them. The ROPs have 256 GB/s available to them. This is effectively cache BW, only using eDRAM instead of SRAM to have lots of it. If a GPU was released with 10 MB of SRAM cache connected to the ROPs, it'd be effectively the same thing.
Xenos' shader units don't have the ability to read from edram so they have only the 22.4 GB/s of bandwidth. When there's a render to texture the texture must be copied from edram to main memory before it can be used as a texture. This restriction is something many developers didn't like. I believe there's a thread where Humus discusses this.
 
Xenos' shader units don't have the ability to read from edram so they have only the 22.4 GB/s of bandwidth. When there's a render to texture the texture must be copied from edram to main memory before it can be used as a texture. This restriction is something many developers didn't like. I believe there's a thread where Humus discusses this.
You may well be right. I remember at the time there was a clear difference between Xenos eDRAM and PS2's, and the ability to freely R/W to it as I've described didn't gel with my feelings about that, but I'm unclear (hell, we were all damned unclear for ages after we were told it anyhow!). It's all in that thread. Or another. ;)
 
The information I passed on was not an inference but a calculation using publicly available information from Renesas.
Yes, but there's not the slightest shred of evidence that says that information relates in any way whatsoever to the edram integrated into wuugpu. That's where your inference fails. It's just baseless speculation; it has no foundation.

As it is, my suggestion is right in line with the bandwidth achieved on Xenos' daughter die between ROPs and eDRAM.
Yes, but as mentioned before, that bandwidth exists solely to support single-cycle 4x antialias combined with alpha blend. If wuu isn't designed with that capability in mind there'd be no point to have that much bandwidth at hand, especially if it only has 8 ROPs as suspected. As a comparison, the contemporary (from wuugpu's base technology) radeon 4890 has 16 ROPs and 128GB/s bandwidth.

This is the most solid info we have besides actual game performance and the clock speeds. Why choose to ignore it?
I didn't say you should ignore it; that information is years old and nebulously diffuse. Nothing specific can be derived about the hardware itself from that tiny fragment of information beyond what is stated.

The possibility of using the eDRAM to emulate Flipper's 1t-SRAM is not my own, but was suggested on neogaf by blu, who has coded for the Gamecube.
Yeah, but as mentioned, you don't need to emulate the texture cache specifically, in hardware, to run wii games on another system. You just let the game think it moved textures into the cache, while in reality they just sit right where they always were in main memory, and the GPU renders transparently from there with no penalty - since that's exactly what it's designed to do. Flipper on the other hand probably can't texture directly from main RAM at all, i.e. it would be limited in capability compared to modern GPUs.

And surely you aren't saying that Wii U boasts the grunt of a PC capable of running Dolphin and all Wii games 100% perfectly.
Why not? The underlying wii hardware is almost 15 years old by now. The wuu CPU is still code compatible with the wii from what we know; it should be able to run the game logic directly with little to no overhead. The differences in GPU, sound and I/O can be offloaded to a software abstraction layer running on the other CPU cores. Remember that dolphin is a hobbyist amateur project; nintendo obviously has a lot more resources to dedicate to their own emulator, including full hardware documentation of both systems and so on.
 
Yes, but there's not the slightest shred of evidence that says that information relates in any way whatsoever to the edram integrated into wuugpu. That's where your inference fails. It's just baseless speculation; it has no foundation.

Why is speculation so bad? If we didn't speculate the thread would be 2 pages long and finished months ago. If it's pulled out of someone's arse sure, but there are some pieces of information (past partners etc) that lead to such speculation.



Yes, but as mentioned before, that bandwidth exists solely to support single-cycle 4x antialias combined with alpha blend. If wuu isn't designed with that capability in mind there'd be no point to have that much bandwidth at hand, especially if it only has 8 ROPs as suspected.

Do we really have evidence of it only having 8 ROPS? Serious question, because I see it mentioned from time to time but don't understand why people are limiting it to 8.
 
Do we really have evidence of it only having 8 ROPS? Serious question, because I see it mentioned from time to time but don't understand why people are limiting it to 8.

Well, we don't have any evidence of it having more, put it that way. It runs like an 8 ROP system with BW issues when using MSAA and high levels of transparent overdraw (so no magic ROPs like the 360 either).
 
Why is speculation so bad?
Making educated guesses isn't bad, if you have industry insight and/or experience for example. Just simple speculation, however, doesn't fit the purpose of the Beyond3D discussion board. Worse, plain speculation tends to breed fanboyism and hardware worship, which is the antithesis of B3D.

Do we really have evidence of it only having 8 ROPS?
Wuu's real-world game rendering performance, as well as its power draw, doesn't really support it having more. Also, wuu's terrible main memory bandwidth can't really support more either, as ROPs, even with on-chip eDRAM, consume quite a lot of texture data when drawing pixels, and if the bandwidth doesn't exist to support them then there's no point in cramming more ROPs in there in the first place. It'd just be a waste of money and power (both of which Nintendo is loath to waste, especially these days).
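
To put a number on "terrible", assuming the reported DDR3-1600 chips on a 64-bit bus (still a teardown-based inference rather than anything official):

Code:
# Rumoured wuu main memory: DDR3-1600 (1.6 GT/s effective) on a 64-bit bus.
print(64 / 8 * 1.6)   # 12.8 GB/s, shared between CPU, GPU and everything else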
 
Well, we don't have any evidence of it having more, put it that way. It runs like an 8 ROP system with BW issues when using MSAA and high levels of transparent overdraw (so no magic ROPs like the 360 either).

Wuu's real-world game rendering performance, as well as its power draw, doesn't really support it having more. Also, wuu's terrible main memory bandwidth can't really support more either, as ROPs, even with on-chip eDRAM, consume quite a lot of texture data when drawing pixels, and if the bandwidth doesn't exist to support them then there's no point in cramming more ROPs in there in the first place. It'd just be a waste of money and power (both of which Nintendo is loath to waste, especially these days).

OK, thanks. Does the fact that these are simple ports from current gen have any bearing? Couldn't it be a higher ROP system, but bottlenecked by developers not really utilising all the eDRAM and/or resolving too much to main RAM? I guess what I'm getting at is that it seems a little weird to look at launch ports and assume that's all the machine will be capable of. Or, is an 8 ROP system actually pretty good if it's utilised well?

Shit, I wish someone would photograph those chips.
 
So you're saying the Nintendo Super Mario Brothers team weren't able to utilize their system to at least generate higher resolution because they were launch ports?
 
OK, thanks. Does the fact that these are simple ports from current gen have any bearing? Couldn't it be a higher ROP system, but bottlenecked by developers not really utilising all the eDRAM and/or resolving too much to main RAM.
Nope. You send work to the GPU - it completes it as quick as it can. More ROPs with adequate BW would mean more drawing. The only bottleneck would be BW, and if Nintendo included more ROPs than BW to support them, then they are stupid. ;)
 
Nope. You send work to the GPU - it completes it as quick as it can. More ROPs with adequate BW would mean more drawing. The only bottleneck would be BW, and if Nintendo included more ROPs than BW to support them, then they are stupid. ;)

Haha. OK, so basically people are saying that 12.8 GB/s means 8 ROPs at the frequency that's rumoured (550 MHz)? Sorry for the noob questions, just trying to get an understanding.

So you're saying the Nintendo Super Mario Brothers team weren't able to utilize their system to at least generate higher resolution because they were launch ports?

When I made that comment I was talking about the third party ones, which basically just hit parity (and worse in some cases). I imagine that Nintendo would have just chosen 720 because it's easier and cheaper all around and 720 is enough. I would be pretty shocked if Wii U couldn't handle NSMBU at 1080.
 
If the WiiU is capable of running at higher resolution with the same performance due to untapped hardware performance being available then Nintendo would have simply upped the target resolution. It's quite simple to do. The fact that they did not is informative.
 
Everything taken together makes that the most likely scenario.

OK, thanks.

If the WiiU is capable of running at higher resolution with the same performance due to untapped hardware performance being available then Nintendo would have simply upped the target resolution. It's quite simple to do. The fact that they did not is informative.

I just don't buy that, but I have no problem being proven wrong. Back in the day when I did Amiga development we could quite easily do a platformer along the lines of NSMB with dual playfield parallax at 60 fps. I can't see how 32 bit, 720p, a couple more layers and some lighting and maybe physics could undo 20 years of power gains from technological advancement.

It boggles my mind but someone please explain how this could be the case. Anyway, sorry if this has gone off topic.
 