So if you are doing all reads or all writes and have a nice stream of addresses you can keep the data bus busy >90-95% of the time. Start mixing writes and reads and the % drops. Use an address stream which isn't as friendly and you drop even further. Make less than optimal arbitration decisions in order to give lower latency to CPU fetches and you drop even more.
Thanks, and great job on the write-up!
Correct as far as I can see, and with better judged presentation and length than I would be capable of.
It was needed to balance your first post, which created an impression of single-cycle access and 100% bus utilization. (For less experienced readers, there's one heck of a lot more to be said - the last two sentences of BobbleHead's post above are easily textbook/PhD thesis material.)
What we haven't touched much yet is - what about code that doesn't just stream large contiguous chunks of data? Also, whether data can always, or even typically, be laid out optimally for streaming from a data structure point of view, never mind hardware issues such as page boundaries.
I'll introduce the problem of the first issue I raised above. With a 128-bit bus using 8-deep bursts, accessing a single word causes 1024 bits to be transferred over the bus. (Let's ignore burst chop.) Now, if all you were interested in was that particular word (say, a 32-bit pointer), then 31/32, or 97%, of your effective read bandwidth was spent transmitting junk data. Which, adding insult to injury, evicts possibly relevant data in your cache hierarchy.
As an introduction to my second issue, consider a simple three-dimensional matrix. If the data is laid out as {x1,x2,...,xn,y1,y2,...,yn,z1,z2,...,zn} you will get drastically different bus utilization depending on which axis you traverse it along, and also on whether you only want a single coordinate value, or the whole (x1,y1,z1) triad. Alternatively, the data could be laid out as {x1,y1,z1, x2,y2,z2,...,xn,yn,zn}, which would provide yet another set of bus utilization numbers, again dependent on just how you traverse the matrix. And if even a simple matrix is tricky, how about more complex data structures? And what if they are not organized linearly, but as, for instance, trees?
I'm sorry that I can't dig deeper into this -- I'm a fairly slow typist, and I'm strapped for time. But in my experience from performance computing, data flow is THE main issue. And as soon as you move away from the very simplistic cases, it gets really messy really quickly. As with multiprocessing, I think it would be good if the people here who aren't active programmers gained an understanding that what we are dealing with are really thorny and complex problems, which as often as not simply don't have optimal solutions.
(For Exophase, since I know you have an interest in mobile computing, take a look
here at the compiler dependence of even such a simple benchmark as STREAM, even using a single compiler (and version). "So depending on gcc optimization options, we get some nice semi-random benchmark numbers." Moral: It's not trivial.)