Wii U hardware discussion and investigation *rename

So we have a couple of developers now who are using higher resolution textures in the Wii U version of a game (vs. Xbox 360/PS3). You don't need more shader resources (ALU/TMU/ROPs) to handle higher resolution textures; it simply takes more bandwidth. This means that they would probably not be anywhere close to bandwidth limited if using the same texture quality as the Xbox 360/PS3.

They also have a larger draw distance and better lighting.
 
So we have a couple of developers now who are using higher resolution textures in the Wii U version of a game (vs. Xbox 360/PS3). You don't need more shader resources (ALU/TMU/ROPs) to handle higher resolution textures; it simply takes more bandwidth.
Internal bandwidth (L1, L1<=>L2) usually scales with the number of SIMDs and hence with the number of TMUs. ;)

Getting the best out of the Wii U hardware probably requires some tricky memory management to keep as many objects requiring high-bandwidth access as possible in the limited eDRAM. You only want to use the really slow main RAM for stuff where you can afford it. Hasty ports of Xbox 360 games probably don't try very hard and get bitten by the considerably lower main memory bandwidth of the Wii U compared to the Xbox 360 (and PS3, too). The larger eDRAM of the Wii U only helps if you use it efficiently, of course.
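To illustrate the kind of eDRAM vs. main RAM placement described above, here's a minimal, purely hypothetical sketch in C (no actual Wii U SDK names; the 32 MB pool size is the widely reported eDRAM figure, everything else is invented for the example):

```c
/* Hypothetical illustration only: greedily place the most bandwidth-hungry
 * resources into a small, fast eDRAM pool and fall back to main RAM once it
 * is full. Sizes and resource names are made up for the example. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define EDRAM_BYTES (32u * 1024u * 1024u)   /* widely reported 32 MB eDRAM */

typedef struct {
    const char *name;
    size_t      size;   /* bytes */
} Resource;

static size_t edram_used = 0;

/* Resources are assumed to be listed in descending order of per-frame
 * bandwidth demand, so the hungriest ones get first pick of the eDRAM. */
static bool place_in_edram(const Resource *r)
{
    if (edram_used + r->size > EDRAM_BYTES)
        return false;               /* doesn't fit -> main RAM */
    edram_used += r->size;
    return true;
}

int main(void)
{
    const Resource res[] = {
        { "G-buffer",           12u * 1024u * 1024u },
        { "shadow map",          4u * 1024u * 1024u },
        { "particle buffers",    8u * 1024u * 1024u },
        { "texture stream pool", 64u * 1024u * 1024u },
    };
    for (size_t i = 0; i < sizeof res / sizeof res[0]; ++i)
        printf("%-19s -> %s\n", res[i].name,
               place_in_edram(&res[i]) ? "eDRAM" : "main RAM");
    return 0;
}
```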
 
Internal bandwidth (L1, L1<=>L2) usually scales with the number of SIMDs and hence with the number of TMUs. ;)

Will internal bandwidth requirements tend to go up with higher resolution textures as well? I figure those are the kind that will stream through/thrash texture cache.
 
I'm definitely having an off moment while replying about the power consumption. That's what you get when you're busy at work and reading + commenting at the same time.

A Bluray drive probably uses more than 2.5W, since the external ones mostly come with a Y USB cable. I also saw a model that comes with an adaptor... and these are all BD reader + DVD burner drives (not BD burners).

Reading http://www.anandtech.com/show/2769/6 shows that merely attaching a BD drive can result in a 4W increase. Of course the article is old and efficiency usually improves over time.
 
Will internal bandwidth requirements tend to go up with higher resolution textures as well?
Of course! If higher resolution textures are visible on screen, it means they need to get to the TMUs first!
I figure those are the kind that will stream through/thrash texture cache.
The TMUs always fetch through the texture cache(s). There is no way to bypass that. So if the TMUs need to fetch more high-resolution texture data, one needs more aggregate bandwidth. If you double the number of TMUs, you have automatically doubled the aggregate bandwidth from the L1s to the TMUs (and also the aggregate L1 size). Between the L1s and the L2 slices sits a quite wide crossbar connecting all L1s on one side with the L2 slices on the other side. The bandwidth to the L1s basically scales with the number of L2 slices (usually bound to the number of memory channels). But even with a constant number of L2 slices, more TMUs with more L1s will help a bit, as you increase the cached amount of texture data (a bit less thrashing of the texture L1 caches).
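As a back-of-the-envelope illustration of that scaling (the per-SIMD L1 width and the 550 MHz clock are assumptions in the style of AMD parts of that era, not confirmed Wii U GPU specs):

```c
/* Sketch of how aggregate internal (L1 -> TMU) bandwidth scales with SIMD
 * count. 64 bytes/clock per L1 and a 550 MHz clock are assumptions for
 * illustration, not confirmed figures. */
#include <stdio.h>

int main(void)
{
    const double clock_hz        = 550e6;  /* assumed GPU clock             */
    const double bytes_per_clock = 64.0;   /* assumed L1 -> TMU width/SIMD  */

    for (int simds = 2; simds <= 8; simds *= 2) {
        double aggregate = simds * bytes_per_clock * clock_hz / 1e9;
        printf("%d SIMDs: ~%.0f GB/s aggregate L1->TMU bandwidth\n",
               simds, aggregate);
    }
    return 0;
}
```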
 
I have one question wrt the eDRAM size, G-buffer size and virtual texturing.
Not speaking specifically of one title, but I may use Trials HD/Evolution as a reference point (sebbbi, you are welcome to give your take on the matter).

It seems that 32MB of eDRAM is double confirmed at this point.
Games like Trials HD or Crysis 2 use a pretty well packed, lightweight G-buffer (to fit in the 360 eDRAM).
Trials HD runs slightly below 720p to fit into the (360) eDRAM.
I won't make the calculation (as I don't remember the format of the G-buffer in the aforementioned games), but I will assert that a lightweight G-buffer (such as the ones used in those games) @720p would be barely above 10MB. That is quite some room left.
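For a rough sanity check of that ~10MB figure, assuming a hypothetical lightweight layout of two 32-bit colour targets plus a 32-bit depth/stencil buffer at 720p (not the actual format of any of the games mentioned):

```c
/* Rough G-buffer size estimate for an assumed lightweight layout at 720p:
 * two 32-bit colour targets + a 32-bit depth/stencil buffer. Illustrative
 * only, not the actual format used by Trials HD or Crysis 2. */
#include <stdio.h>

int main(void)
{
    const int width  = 1280, height = 720;
    const int bytes_per_pixel = 4 /* RT0 */ + 4 /* RT1 */ + 4 /* depth/stencil */;

    double mb = (double)width * height * bytes_per_pixel / (1024.0 * 1024.0);
    printf("%.1f MB\n", mb);   /* ~10.5 MB, leaving ~21 MB of a 32 MB eDRAM pool */
    return 0;
}
```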

I do not remember the exact figure, but IIRC the amount of RAM used by megatexture was pretty low (between 20 and 30 MB). So it got me wondering if a neat use of the system would be to use a tight G-buffer along with virtual textures, unique/mega texture or not, or even a "blend"; for example, some elements are stored in a big virtual texture and others are dealt with as standard textures (loaded from RAM).
In which case the eDRAM would hold the G-buffer and some virtual textures (some could be streamed from RAM; in a game like NFS, for example, you are sure to have the road to display every frame, so the relevant part of the virtual texture that holds the road / part of the scenery information could be kept in the eDRAM).

I don't know how much room you need for shadow maps too.

Overall I hope sebbbi will read this and tell us what he would see as a good balance (in what you keep/stream in the eDRAM).

Edit: the whole idea is to relieve the pressure on the external bandwidth from the GPU's POV.

Edit 2

DeanoC's POV would be interesting too, as he worked with multiple virtual textures in Blink (I'm not sure of the game's name, sorry if wrong), so he has to know the memory usage, which parts of the virtual textures it would be relevant to keep in RAM, and which it would be worth loading into the eDRAM (I guess it depends on the game).
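Purely as speculation, a budget along the lines discussed in this post might break down something like this (none of these numbers are confirmed; they only show that the pieces could plausibly share 32 MB):

```c
/* Purely speculative eDRAM budget for the scheme sketched above. Figures are
 * assumptions: the G-buffer estimate from earlier in the post, one modest
 * shadow map, and a resident virtual-texture page cache. */
#include <stdio.h>

int main(void)
{
    const double gbuffer_mb    = 10.5;  /* 1280x720 x 12 bytes/pixel (see above) */
    const double shadow_map_mb =  4.0;  /* one 1024x1024 32-bit depth map        */
    const double vt_cache_mb   = 16.0;  /* resident virtual-texture page cache   */

    double total = gbuffer_mb + shadow_map_mb + vt_cache_mb;
    printf("total: %.1f MB of 32 MB eDRAM (%.1f MB spare)\n", total, 32.0 - total);
    return 0;
}
```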
 
Of course! If higher resolution textures are visible on screen, it means they need to get to the TMUs first!
The TMUs always fetch through the texture cache(s). There is no way to bypass that. So if the TMUs need to fetch more high-resolution texture data, one needs more aggregate bandwidth. If you double the number of TMUs, you have automatically doubled the aggregate bandwidth from the L1s to the TMUs (and also the aggregate L1 size). Between the L1s and the L2 slices sits a quite wide crossbar connecting all L1s on one side with the L2 slices on the other side. The bandwidth to the L1s basically scales with the number of L2 slices (usually bound to the number of memory channels). But even with a constant number of L2 slices, more TMUs with more L1s will help a bit, as you increase the cached amount of texture data (a bit less thrashing of the texture L1 caches).

This is getting off track. The question isn't whether adding TMUs increases demand on a shared L2 texture cache, nor whether TMU fetches need to go through the cache. I phrased the question about internal bandwidth badly - what I really meant is: can this actually become a new bottleneck before external bandwidth could?

Unless the textures have strong reuse in the cache, scaling texture resolution will scale external bandwidth requirements. It'll scale internal bandwidth requirements too, but that doesn't matter if external bandwidth was the bottleneck in the first place. If they had room to increase the texture resolution in a Wii U game (assuming they didn't do it at the expense of performance), it means they had leftover external bandwidth, but it does not mean they needed more TMUs to handle it - the number of texture fetches that the TMUs make would be the same. The L1 misses will go up, so the L2 bandwidth demand would be higher, but that only matters if the L2 too was a bottleneck in the first place.
 
I'm definitely having an off moment while replying about the power consumption. That's what you get when you're busy at work and reading + commenting at the same time.

A Bluray drive probably uses more than 2.5W, since the external ones mostly come with a Y USB cable. I also saw a model that comes with an adaptor... and these are all BD reader + DVD burner drives (not BD burners).
A normal USB 2.0 plug uses a max of 500mw; devices that require more will use a Y USB cable, which bumps that up to 1 watt.
This is the first model that google popped up:
http://www.amazon.com/Blu-Ray-USB-External-Player-DVDRW/dp/tech-data/B001QA2Y9S/ref=de_a_smtd

Says it's powered off USB power exclusively. It may require a Y cable on typical USB ports, or a single cable on a higher-output USB port, but I would wager it maxes out at 1W.
 
A normal USB 2.0 plug uses a max of 500mw; devices that require more will use a Y USB cable, which bumps that up to 1 watt.
This is the first model that google popped up:
http://www.amazon.com/Blu-Ray-USB-External-Player-DVDRW/dp/tech-data/B001QA2Y9S/ref=de_a_smtd

Says it's powered off USB power exclusively. It may require a Y cable on typical USB ports, or a single cable on a higher-output USB port, but I would wager it maxes out at 1W.

Correction, normal USB 2.0 sockets can provide 500mA at 5V. That equates to 2.5W.

Is a Bluray + DVDRW a good comparison, or does it require substantially more energy to write optical media than to read them? But if the BD-only ones still need the Y-cable, I guess that makes it pretty moot.
 
So essentially, the Most Wanted port is due to the 1GB of DDR3 available to devs, and the 32MB of eDRAM being managed carefully to get the most use out of it.

I wonder, has the rumor of the 32MB of eDRAM running at 70GB/s ever been validated by anyone?
 
I phrased the question about internal bandwidth badly - what I really meant is: can this actually become a new bottleneck before external bandwidth could?

IMO, from one point of view, no, higher resolution textures don't increase the bw requirements, because the TMUs will be querying the memory at the same rate (we still draw the same number of pixels to screen, but the TMUs just get more resolution to select texels from). A few pages back, a cacheline was mentioned to be 64 bytes, so this will lead to extra bw requirements when no mipmaps are used.

Something like that?
 
IMO, from one point of view, no, higher resolution textures don't increase the bw requirements, because the TMUs will be querying the memory at the same rate (we still draw the same number of pixels to screen, but the TMUs just get more resolution to select texels from). A few pages back, a cacheline was mentioned to be 64 bytes, so this will lead to extra bw requirements when no mipmaps are used.

Something like that?

Not just when no mipmaps are used, but when you would have saturated the available mip levels. The number of TMU fetches won't increase, but the bandwidth requirements will, so long as you've increased the average LOD that the TMU selects. Very roughly speaking, anywhere the average rate of change of texels was less than one per pixel is a place where it could have gone to a higher LOD if it were available. Increasing the texture resolution adds these extra LODs and increases the number of texture cache misses.

Of course this will only make a difference if the TMUs were ever hitting this point to begin with, but I don't think they would have increased the resolution if they weren't.

What it doesn't say is how often this happens. If it only applies to a small percentage of pixels on average, like those nearest to the camera, then it might not make a big bandwidth difference. But you'd still need the same amount of extra memory for the textures.
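A minimal sketch of the usual mip LOD selection rule may help show why adding higher-resolution mip levels lowers texel reuse. The formula is the simplified textbook version (not necessarily what any specific GPU does), and the derivative values are illustrative only:

```c
/* Illustrative sketch of standard mip selection: lambda = log2(rho), where
 * rho is the texel footprint of one pixel. If rho < 1 at the current top mip
 * (magnification), a higher-resolution texture simply exposes a new, finer
 * level, lowering texel reuse between neighbouring pixels and raising cache
 * misses. Simplified: real hardware uses both u and v derivatives in x and y. */
#include <math.h>
#include <stdio.h>

static double mip_lod(double du_dx, double dv_dy, int tex_size)
{
    double rho = fmax(fabs(du_dx), fabs(dv_dy)) * tex_size; /* texels per pixel */
    return fmax(0.0, log2(rho));                            /* clamp to top mip */
}

int main(void)
{
    /* Same surface, same screen-space UV gradients, two texture resolutions. */
    const double du_dx = 1.0 / 1024.0, dv_dy = 1.0 / 1400.0;
    printf("512x512   texture: LOD %.2f\n", mip_lod(du_dx, dv_dy, 512));  /* 0.00: magnified top mip */
    printf("2048x2048 texture: LOD %.2f\n", mip_lod(du_dx, dv_dy, 2048)); /* 1.00: a finer level now exists */
    return 0;
}
```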
 
The TMU fetch rate will increase, but the bandwidth requirements will as well so long as you've increased the average LOD that the TMU selects.

You mean in the sense of, it ends up in more texels on screen and therefore an increased fetch rate? I'm not sure I get it!

EDIT: oh, I responded a bit too quick, I see you edited your post a little
EDIT2: I do agree with the rest of your post, sampling from different mipmaps bumps it up. And somehow we're always looking at and comparing theoretical maximums.

BTW, would it be plausible that Nintendo might have gone with 1 texture unit per SIMD? They did so in Flipper/Hollywood too, after all. It would reduce the texel rate to 4.4 Gtexels/s.
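A quick check of that 4.4 figure, assuming 8 SIMDs with one TMU each and the commonly reported 550 MHz GPU clock (both assumptions):

```c
/* Quick check of the 4.4 Gtexels/s figure above. The TMU count per SIMD and
 * the 550 MHz clock are assumptions, not confirmed specs. */
#include <stdio.h>

int main(void)
{
    const int    tmus      = 8;      /* 1 TMU per SIMD x 8 SIMDs (assumed) */
    const double clock_ghz = 0.550;  /* commonly reported Wii U GPU clock  */
    printf("%.1f Gtexels/s\n", tmus * clock_ghz);   /* 4.4 */
    return 0;
}
```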
 
You mean in the sense of, it ends up in more texels on screen and therefore an increased fetch rate? I'm not sure I get it!
You got it. The higher the tex LOD, the lesser the texel reuse, the lower the cache efficiency, the greater the BW stress on the bus.
 
Correction, normal USB 2.0 sockets can provide 500mA at 5V. That equates to 2.5W.

Is a Bluray + DVDRW a good comparison, or does it require substantially more energy to write optical media than to read them? But if the BD-only ones still need the Y-cable, I guess that makes it pretty moot.
Whoops, I got confused because a USB device is supposed to only use 100mA @ 5V, but it can request up to 5 units of that = 500mA. My bad. I was assuming that the BD drive linked might need a Y cable on some systems, but I don't know for sure.
 
Success! This review has both the GDDR5 and DDR3 versions.

Green bar is 625 MHz with DDR3 @ 1333 (so less bandwidth in total than the Wii U main memory alone).
Blue bar is 750 MHz with GDDR5 @ 3600.

And for anyone else reading this, remember these tests are with all settings on max (so Crysis 2 suffers), and that these are 8:160:4 parts. That's right folks, only 4 ROPs!

http://www.techpowerup.com/reviews/Sapphire/HD_6450_Passive/9.html

Despite being only 20% faster, the GDDR5 version is pulling in figures 40%+ higher. And check out Hawx at 1024 x 768:

http://www.techpowerup.com/reviews/Sapphire/HD_6450_Passive/16.html

Holy crap! 76% faster for the GDDR5 version!

So yeah, double up the ROPs and TMUs and give it a chunk of fast memory, and it looks like a 16:160:8 Wii U could hang with the 360 pretty well. The GDDR5-equipped 6450 can do it with only 8 TMUs and 4 ROPs.
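For reference, the raw bandwidth gap between those two cards works out roughly as below, assuming the HD 6450's standard 64-bit bus and the commonly reported DDR3-1600 on a 64-bit bus for the Wii U main memory (rough numbers, not taken from the review):

```c
/* Rough memory bandwidth comparison. Bus widths and the Wii U memory speed
 * are assumptions (64-bit buses, DDR3-1600 for the Wii U). */
#include <stdio.h>

static double gbps(double effective_mtps, int bus_bits)
{
    return effective_mtps * 1e6 * (bus_bits / 8) / 1e9;  /* GB/s */
}

int main(void)
{
    printf("HD 6450 DDR3-1333 : %5.1f GB/s\n", gbps(1333.0, 64));  /* ~10.7 */
    printf("HD 6450 GDDR5-3600: %5.1f GB/s\n", gbps(3600.0, 64));  /* ~28.8 */
    printf("Wii U DDR3-1600   : %5.1f GB/s\n", gbps(1600.0, 64));  /* ~12.8 */
    return 0;
}
```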

To follow up on my last post, while I commend your efforts to figure this thing out, I don’t think these Call of Duty numbers prove your point. You are trying to say that if we were looking at a 320 SPU part, we’d be seeing vastly better frame rate and resolutions, correct? In reaffirming the statement about the effect of resolution on GPU and CPU loads that I quoted before, I started to notice some peculiarities in those benchmarks.

Let’s look at the HD 6450 for comparison. At 1024x768, we are seeing comfortable frame rates. This makes sense since the GPU is barely being taxed at that resolution. The next bump up in resolution/IQ and look what happens to the frame rate. It takes a nose dive. Is it any coincidence that when we look at low resolutions, CPU bottlenecks are easier to discern? So when looking at the lowest numbers in this chart vs avg figures for BLOPS II Wii U, it makes sense that we would see a difference due to Espresso being no i7. And then there are other performance factors like the locked vsync and characters on screen (which seems to be a cpu thing) on Nintendo’s console. Meanwhile the chart also displays the clear effect of memory bandwidth on performance.

It's pretty amazing that on the same card, the difference between 1280x1024 with 2x AA/8x AF and 1920x1200 with 4x AA/16x AF is only ~8 frames!

In short, while I agree that the jury is still out on whether it’s a 160 SPU part, I don’t think you rule out a 320 SPU part by making the rightful observation that games thus far haven’t automatically featured increased resolution and framerate. If getting the image quality to where they felt comfortable resulted in a merely acceptable framerate together with everything else that affects performance, why would the developer then go ahead and increase the settings?
 
^ The difference being, however, that from the ports we have seen, it's not just one or two games that render at these lower performance ratios; it's the vast majority of launch titles.

Most Wanted and Trine 2 are the only two games so far that have been superior to their HD twin equivalents from a framerate, resolution or texture perspective.

The other side to your argument would be: why would only these two games so far have a handle on the Wii U to that extent?

We also know that, especially in Criterion's case, they already had the game essentially made; they just didn't want to ship before knowing Nintendo's online plans.

So does that say more about Nintendo's documentation, the hardware itself, or the developers?

It could be a mix of all 3.
 
I emailed Richard what function responded to you; this is what he wrote me back:

I’m not doing an article on why Wii U ports are inferior. A private email was taken out of context.

I assume you are Function on B3D?

What is your take on the far larger shader cores vs. Bobcat? It’s pretty much the only remaining argument I see against your 160SP theory.

(function) I will PM you his email and you can get back to him if you like.

I'd actually already PM'd function Richard's email address when he posted his reply to my quote of Richard's email.

Function, did you end up emailing Richard? The last thing he wanted to know was whether anyone had a 'theory at all that accounts for the Wii U shader cluster being 66% larger than Bobcat's'.
 