What should the Wii U hardware have been?

The XBO specs don't match up with an 8-core Jaguar and 12 CUs very well to come in at over 400 mm². I guess we'll need to see the PS4 SoC to get a better idea though (even then, of course, it won't be concrete).

It could be that the Xbox One's SoC is done on a much less dense process than Jaguar or Pitcairn were, which is very possible if it's GF 28nm vs. TSMC 28nm.
 
I've thought at times it would be nice to see a competitive console that does nothing but play games. A back-to-basics, stripped-down system where the UI would literally have three simple options, Play Games, Download Games, and System Settings, in plain black and white.
I don't know if that's even a real option for a console anymore; maybe the genie is out of the bottle for good. But since this is a speculation thread...
Nintendo could build a system capable of playing games at the same level as the upcoming gen even with an on-paper disadvantage.
For example, they could get away with, say, 6GB of RAM, a smaller HDD, maybe even a slightly less powerful CPU/VPU, and a really good Classic-style controller. But all the resources would go 100% to games. No setting aside or reserving resources for non-gameplay functions. And of course a better name.
Use the money they save on BOM to price it $50-75 less than their competitors.
A risk, yes, but could it be any worse than the situation they are in now?
 
Then you first have to have a specific purpose. Building a general purpose CPU out of FPGAs is incredibly wasteful and achieves very little performance compared to an ASIC.
It appears you have a serious misconception about FPGAs. They are used because of their flexibility to model the behaviour of basically arbitrary logic/memory combinations. But this comes with a severe overhead (power, performance, die size [cost]) compared to an ASIC, which integrates that directly (where you lose the flexibility, of course). As others have said, the most common use of FPGAs is to avoid the very steep costs of designing and manufacturing an ASIC for a specific purpose if the target market is relatively small.
FPGAs can of course beat general purpose CPUs on some algorithms which are not suited to a CPU. But there is no way they beat them across the board. And you don't want to reprogram the FPGA each time you do something slightly different (your code calls some subroutine). And there is also no way it will beat a GPU on graphics-related tasks. After all, a GPU is an ASIC built for such tasks. You would need A LOT of FPGAs to rival the performance of a GPU.
How would one do that?

Sorry, forgot about this.
Here are a few papers about FPGAs used for graphics and general purpose computing I dug out of my bookmarks:

http://www.doc.ic.ac.uk/~wl/papers/05/iee05tjt.pdf
http://www.ece.wisc.edu/~kati/Publications/Compton_ReconfigIntro.pdf
http://synergy.cs.vt.edu/pubs/papers/lin-reconfig11-fft.pdf
http://caxapa.ru/thumbs/282284/ISVLSI_FINAL.pdf
http://liu.diva-portal.org/smash/get/diva2:20165/FULLTEXT01.pdf
http://www.seas.upenn.edu/~andre/pdf/aitr1586.pdf

There are many other studies out there; just do searches on, for example, "reconfigurable computing", "FPGA vs. GPU" or "FPGAs for general purpose computing".

Almost exactly your first gripes were once leveled against microprocessors and microcontrollers.
This is a complex subject with many facets and I don't feel like going into a multipage discussion about something as yet immaterial.
FPGAs essentially break the heat/power wall, the one thing more than any other holding raw hardware back at the moment.
You can achieve very, very good general purpose computing results with FPGAs.
Costs will come down more than exponentially (have already done so for 20 years) and tools will mature.
Once that happens, a paradigm change (for lack of better word) will happen.
 
Sorry, forgot about this.
Here are a few papers about FPGAs used for graphics and general purpose computing I dug out of my bookmarks:

http://www.doc.ic.ac.uk/~wl/papers/05/iee05tjt.pdf
http://www.ece.wisc.edu/~kati/Publications/Compton_ReconfigIntro.pdf
http://synergy.cs.vt.edu/pubs/papers/lin-reconfig11-fft.pdf
http://caxapa.ru/thumbs/282284/ISVLSI_FINAL.pdf
http://liu.diva-portal.org/smash/get/diva2:20165/FULLTEXT01.pdf
http://www.seas.upenn.edu/~andre/pdf/aitr1586.pdf

There are many other studies out there; just do searches on, for example, "reconfigurable computing", "FPGA vs. GPU" or "FPGAs for general purpose computing".
I'm far from impressed. Have you really read them? Like the examples where a CPU beats FPGAs on some subcategory of BLAS performance, or the other one where a massive setup of 16 FPGAs just edged out a single GT200 GPU on FFTs (and this 16-FPGA setup used more power btw., so performance/power wasn't better, only for a single small FPGA which delivered an order of magnitude lower performance to start with). And FFTs are a task GPUs are not really built for. If what you are doing all day long is FFTs, any ASIC built for the job will stomp it anyway.

And at least one of the articles basically mentioned most of the reasons I have given already. For instance, it is mentioned that FPGAs are usually far too small to hold all parts of a program (if it has any complexity). That means you have to reprogram them for a function call, or you have to invest far more total die space in multiple FPGAs or some special multi-context FPGAs which enable faster partial reprogramming. Anyway, all this severely adds to the overhead for general purpose computing in practice. You really don't want to wait even a few milliseconds for reprogramming the FPGA so it can execute a function call during the runtime of some application/game. That's half an eternity for a microprocessor.
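
To put that "half an eternity" into rough numbers, here is a minimal back-of-the-envelope sketch; the 3 GHz clock, the IPC of 2 and the 1 ms reconfiguration time are illustrative assumptions on my part, not figures from the papers:

Code:
# Rough estimate of what a millisecond-scale FPGA reconfiguration costs a CPU.
# All inputs are illustrative assumptions, not measurements.

cpu_clock_hz    = 3.0e9   # assumed 3 GHz CPU clock
ipc             = 2.0     # assumed average instructions retired per cycle
reconfig_time_s = 1.0e-3  # assumed 1 ms (partial) reconfiguration time

lost_cycles       = cpu_clock_hz * reconfig_time_s
lost_instructions = lost_cycles * ipc

print(f"Cycles spent waiting: {lost_cycles:,.0f}")                      # ~3,000,000
print(f"Instructions forgone at IPC {ipc:.0f}: {lost_instructions:,.0f}")  # ~6,000,000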

So I still don't change my opinion that an FPGA is only helpful if you run a well defined algorithm for an extended period of time and you still need the flexibility to run different ones if you want (with a high latency when switching them), or you shy away from the steep startup costs of an ASIC (which beats FPGAs on both power consumption and performance and is also significantly cheaper in larger volumes). So while it may make sense as an accelerator chip in some HPC nodes for some niche applications, it's far from a general solution. Even while to some extent the same is true for GPU accelerators, they lend themselves to this a bit more easily because they fit better with common programming paradigms and are actually fairly straightforward vector architectures with (quite) a few performance pitfalls. Developers are far more used to this and there are far fewer problems with compilers (compiling some general C code to an FPGA configuration is WAY more complicated and basically largely unsolved).
Almost exactly your first gripes were once leveled against microprocessors and microcontrollers.
Really? How so?
This is a complex subject with many facets and I don't feel like going into a multipage discussion about something as yet immaterial.
I fail to see that it will materialize anytime soon or even at all. Choosing an FPGA design as you proposed for a game console like the Wii U would be beyond crazy. It wouldn't have worked at all commercially. They could have sold a few hundred/thousand to some research projects. That's all. And there would be hardly any games for it because it would be an almost unworkable system.
FPGAs essentially break the heat/power wall, the one thing more than any other holding raw hardware back at the moment.
Except they don't.
You can achieve very, very good general purpose computing results with FPGAs.
Except you don't for a broad range of realistic general purpose scenarios.
Costs will come down more than exponentially (have already done so for 20 years) and tools will mature.
Tools mature, okay. But how do the costs come down faster than what the whole industry experiences with shrinks (with ever less advantages)? The overhead for transistor count and speed is basically a constant factor.
 
Hah! It's fun - with what has been discussed in mind - listening to programming prodigy John Carmack singing the praises of the programming languages Haskell and Lisp in his keynote speech from this year's QuakeCon.

He's got a lot of good things to say about them.
 
I'm far from impressed. Have you really read them? Like the examples where a CPU beats FPGAs on some subcategory of BLAS performance, or the other one where a massive setup of 16 FPGAs just edged out a single GT200 GPU on FFTs (and this 16-FPGA setup used more power btw., so performance/power wasn't better, only for a single small FPGA which delivered an order of magnitude lower performance to start with). And FFTs are a task GPUs are not really built for. If what you are doing all day long is FFTs, any ASIC built for the job will stomp it anyway.

And at least one of the articles basically mentioned most of the reasons I have given already. For instance, it is mentioned that FPGAs are usually far too small to hold all parts of a program (if it has any complexity). That means you have to reprogram them for a function call, or you have to invest far more total die space in multiple FPGAs or some special multi-context FPGAs which enable faster partial reprogramming. Anyway, all this severely adds to the overhead for general purpose computing in practice. You really don't want to wait even a few milliseconds for reprogramming the FPGA so it can execute a function call during the runtime of some application/game. That's half an eternity for a microprocessor.

Being research papers, they have to keep a neutral tone and be scientific about it. Therefore you will of course find some weighing back and forth and a lot of, sometimes seemingly contrived, self-criticism to "be scientific". And not a lot of gushing and swooning over the possible implications.
Also, the FPGAs used for research, especially the older research, are clearly not designed with performance computing in mind, but more for prototyping, glue logic, industrial use, etc. Therefore the results are that much more impressive.

So I still don't change my opinion that an FPGA is only helpful if you run a well defined algorithm for an extended period of time and you still need the flexibility to run different ones if you want (with a high latency when switching them), or you shy away from the steep startup costs of an ASIC (which beats FPGAs on both power consumption and performance and is also significantly cheaper in larger volumes). So while it may make sense as an accelerator chip in some HPC nodes for some niche applications, it's far from a general solution. Even while to some extent the same is true for GPU accelerators, they lend themselves to this a bit more easily because they fit better with common programming paradigms and are actually fairly straightforward vector architectures with (quite) a few performance pitfalls. Developers are far more used to this and there are far fewer problems with compilers (compiling some general C code to an FPGA configuration is WAY more complicated and basically largely unsolved).

FPGAs are really in many ways the natural continuation of the idea of microprogramming. And again you mention current tools and compilers. It goes without saying that a major amount of the work, if not the biggest part, of building an FPGA computer would lie in doing tools and languages. And compilers? I was talking about interpreted languages, extreme late binding and dynamic type checking. That's one of the major draws: that you could use these previously somewhat slow and underdeveloped, but very, very powerful "paradigms" for vastly speeding up the development process.
ASICs and specialized hardware in general will always have a place for their speed and effectiveness. An FPGA architecture would probably also include a certain amount of specialized circuitry in a mixed die, for example for GPU abilities. As is already being done today with FPGAs in realms other than graphics.


Really? How so?

That it was too expensive and wasteful to have general purpose computing hardware used for industrial process control or in consumer goods, when ASICs were faster, cheaper and more reliable. Or, with regards to general purpose computing, that microprocessors would never be fast enough or good enough to really be considered for general purpose computing.

Without pushing it too much, it could be said that today's ISAs are really hyper-beefed-up microcontrollers. That's what they started out as.
The whole field kind of got rebooted around 1980 when micros really caught on. And mostly not for the better. A lot of good ideas were forgotten, diluted or distorted beyond recognition. Things were balkanized and were run mostly by talented amateurs and hacks that lacked the deeper understanding and wisdom of the older generation. And they weren't willing to learn and be humble about it.
And that is pretty much where we are today. In an extrapolated version of that reality. With optimized versions of things that are really not very good at all and don't scale well.
Heed Donald Knuth's words: premature optimization is the root of all evil.

I fail to see that it will materialize anytime soon or even at all. Choosing an FPGA design as you proposed for a game console like the Wii U would be beyond crazy. It wouldn't have worked at all commercially. They could have sold a few hundred/thousand to some research projects. That's all. And there would be hardly any games for it because it would be an almost unworkable system.
Except they don't.
Except you don't for a broad range of realistic general purpose scenarios.
Tools mature, okay. But how do the costs come down faster than what the whole industry experiences with shrinks (with ever less advantages)? The overhead for transistor count and speed is basically a constant factor.

As FPGAs become more and more popular, the economies of scale will make them far, far cheaper than they are today. What's more, they are usually not made on the newest node. So with just those two things in mind it is self-evident that FPGAs are in for some major gains in price/performance and performance/power.

Doing the next console with an FPGA CPU would have been daunting. But if the project had been started in time and everything ran smoothly, Nintendo would have had the major advantage of having an architecture they owned wholly, and of being able to do development in record time (late binding would make it possible to upend the tea table very late in the development process if things for some reason didn't pan out).
That would have been worth a thousand times the gimped tablet they have to find uses for now.

Here is a slightly old but still interesting video I missed linking to the last time around:
http://www.youtube.com/watch?v=ckFUXWKMymU
 
Being research papers, they have to keep a neutral tone and be scientific about it. Therefore you will of course find some weighing back and forth and a lot of, sometimes seemingly contrived, self-criticism to "be scientific". And not a lot of gushing and swooning over the possible implications.
I'm very familiar with this concept. The result is usually that you avoid claiming some pipe dream will come true. ;)
That it was too expensive and wasteful to have general purpose computing hardware used for industrial process control or in consumer goods, when ASICs were faster, cheaper and more reliable. Or, with regards to general purpose computing, that microprocessors would never be fast enough or good enough to really be considered for general purpose computing.
How was it ever a question that general purpose CPUs could not be fast enough for general purpose tasks? How would you have done general purpose computing with specialized ASICs? That really lacks the flexibility of controlling the complete behaviour with a program. For a different task you would have to reconfigure the hardware. That's basically almost the FPGA model; the reconfiguration of FPGAs is just faster, at the expense of execution speed.
As FPGAs become more and more popular the economies of scale will make them far far cheaper than they are today.
Is that really true?
What's more they are usually not made on the newest node.
That's not true. One can easily get FPGAs using leading edge processes (even Intel's 22nm FinFET process). They often use new processes earlier than GPUs, for instance, as the repetitive structure lends itself to very simple methods of redundancy-based salvaging of chips. It is relatively easy to get even very large FPGAs to yield on a bleeding edge process compared to other types of chips like complex CPUs or GPUs. And Xilinx, for instance, has already been using "2.5D stacking" on interposers to combine several FPGA tiles (done with TSMC 28HPL) into one large device for more than a year. So if anything, they lead rather than follow the introduction of new manufacturing technologies.
Doing the next console with an FPGA CPU would have been daunting. But if the project had been started in time and everything ran smoothly, Nintendo would have had the major advantage of having an architecture they owned wholly, and of being able to do development in record time (late binding would make it possible to upend the tea table very late in the development process if things for some reason didn't pan out).
That would have been worth a thousand times the gimped tablet they have to find uses for now.
An architecture they wholly own but which sucks completely for normal tasks. Great!
Is there any evidence (let alone a demonstration) that it would have been a feasible solution? I don't think so. From the available data points it looks to me that outside of a few niches an FPGA isn't going to cut it.

Edit:
I just skimmed through the slides in the video. And you could easily interpret the reasoning given there in a way that explains why FPGA accelerators aren't that much of a favoured solution over GPUs (I have said something about this before).
Basically he claims that FPGAs have roughly a factor of 100 advantage over CPUs in terms of performance/Watt, but only for massively parallel problems, while specialized ASICs have another 10-fold or greater advantage over FPGAs. I could agree with that. The interesting thing is that general purpose vector architectures like GPUs (or the Larrabee offspring) can claw back at least one order of magnitude, if not two, for a lot of those massively parallel problems (even the SIMD extensions on CPUs can pull off almost one order of magnitude compared to the scalar CPUs he uses as a baseline), i.e. pretty close to FPGAs on a logarithmic scale. And all that while retaining and just augmenting the usual programming principles and without requiring a completely new ecosystem. Combine that with the economy of scale of consumer products and you have a clear winner.
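
Just to put those rough factors on a common scale, here's a toy sketch; the numbers are only the orders of magnitude claimed above, plus my own guess of roughly 30x for a GPU (somewhere between one and two orders of magnitude), nothing measured:

Code:
import math

# Rough performance/Watt factors relative to a scalar CPU, taken from the
# order-of-magnitude claims discussed above (illustrative only).
perf_per_watt = {
    "scalar CPU": 1,
    "CPU + SIMD": 8,     # "almost one order of magnitude"
    "GPU":        30,    # assumed: claws back one to two orders of magnitude
    "FPGA":       100,   # the claimed factor ~100 over CPUs
    "ASIC":       1000,  # another ~10x or more over FPGAs
}

for arch, factor in perf_per_watt.items():
    print(f"{arch:11s} ~10^{math.log10(factor):.1f} of scalar-CPU perf/W")
# On this logarithmic scale a GPU lands within roughly half an order of
# magnitude of an FPGA, while keeping conventional programming models.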
Or let's take some numbers from this 2006 talk:
A full rack with (I think) 200 FPGAs used for computation, 250 in total, running in parallel could generate 1.5 TFlop/s (theoretical peak, so probably not a useful algorithm; and let's assume it's double precision, although a lot of the operations he talks about are usually just 16 or 32-bit operations, but anyway) at a power consumption of 6kW (and a cost of 500k$). 2006 to 2012 is 6 years, so let's assume 3 full node shrinks, each of them ideally improving performance by a factor of 2 at the same cost (or halving the cost and power consumption, which wasn't exactly true for the shrinks in question). That makes a factor of 2^3=8. That means 6 years later, in 2012, one could have expected 1.5 TFlop/s with 750W power consumption, just 25/31 FPGAs (compute/total) and a cost of ~60k$.
In 2012 one could buy a single K20X with 235W power consumption and 1.3 TFlop/s. Even assuming that example was done with the older 130nm FPGAs, which would increase the ideal scaling factor to 28nm from 8 to 22.6, one would end up with 1.5 TFlop/s from 9/11 FPGAs with 265W consumption and 22k$. Hmm. Still doesn't look like a winner. And keep in mind, I assumed really ideal scaling for performance and costs.
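
The arithmetic of the last two paragraphs, written out as a small sketch; the inputs are the 2006 figures quoted above and the deliberately ideal factor-of-two-per-shrink assumption:

Code:
# Idealized scaling of the 2006 FPGA rack to 2012, as estimated above.
# All inputs come from the figures quoted in this post; the scaling itself
# is deliberately optimistic (factor 2 per full node shrink).

rack_tflops_2006   = 1.5        # theoretical peak of the full rack
rack_power_w       = 6000.0
rack_cost_usd      = 500_000.0
rack_fpgas_compute = 200
rack_fpgas_total   = 250

shrinks = 3               # assumed full node shrinks between 2006 and 2012
scale   = 2 ** shrinks    # = 8: same performance at 1/8 the power and cost

print(f"Projected 2012 FPGA setup for {rack_tflops_2006} TFlop/s peak:")
print(f"  power : {rack_power_w / scale:8.0f} W")      # 750 W
print(f"  cost  : {rack_cost_usd / scale:8.0f} $")     # ~60k$
print(f"  FPGAs : {rack_fpgas_compute / scale:.0f} compute / "
      f"{rack_fpgas_total / scale:.0f} total")         # 25 / 31

# For comparison, the single GPU mentioned above (2012):
k20x_tflops = 1.3
k20x_power  = 235.0
print(f"\nK20X: {k20x_tflops} TFlop/s DP at {k20x_power} W in one chip")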
 
The whole field kind of got rebooted around 1980 when micros really caught on. And mostly not for the better. A lot of good ideas were forgotten, diluted or distorted beyond recognition. Things were balkanized and were run mostly by talented amateurs and hacks that lacked the deeper understanding and wisdom of the older generation. And they weren't willing to learn and be humble about it.
Wut.

Who're you to so casually, presumptively and arrogantly dismiss over thirty years of enormously successful exponential computational growth?
 
How was it ever a question that general purpose CPUs could not be fast enough for general purpose tasks? How would you have done general purpose computing with specialized ASICs? That really lacks the flexibility of controlling the complete behaviour with a program. For a different task you would have to reconfigure the hardware. That's basically almost the FPGA model; the reconfiguration of FPGAs is just faster, at the expense of execution speed.

The initial slew of 8-bit and, to a certain extent, 16-bit CPUs were never meant to be used in true general purpose computers. The only reason Chuck Peddle ever got the idea to put a 6502 in a standalone computer was after a visit to the Homebrew Computer Club and the garage with the two Steves. Likewise with the 8080 and others.
Initially they were actually not great successes as microcontrollers either, or in dumb terminals. They were either too expensive or slow, or people were suspicious of them and thought that good old bipolar would win out in the long run.
But they really caught on with the hobbyists.
That's kind of where FPGAs are today, although they are pretty successful in industrial applications.

Is that really true?

How could it not be true?

That's not true. One can easily get FPGAs using leading edge processes (even Intel's 22nm FinFET process). They often use new processes earlier than GPUs, for instance, as the repetitive structure lends itself to very simple methods of redundancy-based salvaging of chips. It is relatively easy to get even very large FPGAs to yield on a bleeding edge process compared to other types of chips like complex CPUs or GPUs. And Xilinx, for instance, has already been using "2.5D stacking" on interposers to combine several FPGA tiles (done with TSMC 28HPL) into one large device for more than a year. So if anything, they lead rather than follow the introduction of new manufacturing technologies.

Ok, that is a rather recent development to my knowledge. But that just goes further in favor of FPGAs. Better yields are always a good thing. The second you start to introduce mixed mode dies and the like, yields would start to drop off though.

Edit:
I just skimmed through the slides in the video. And you could easily interpret the reasoning given there in a way that explains why FPGA accelerators aren't that much of a favoured solution over GPUs (I have said something about this before).
Basically he claims that FPGAs have roughly a factor of 100 advantage over CPUs in terms of performance/Watt, but only for massively parallel problems, while specialized ASICs have another 10-fold or greater advantage over FPGAs. I could agree with that. The interesting thing is that general purpose vector architectures like GPUs (or the Larrabee offspring) can claw back at least one order of magnitude, if not two, for a lot of those massively parallel problems (even the SIMD extensions on CPUs can pull off almost one order of magnitude compared to the scalar CPUs he uses as a baseline), i.e. pretty close to FPGAs on a logarithmic scale. And all that while retaining and just augmenting the usual programming principles and without requiring a completely new ecosystem. Combine that with the economy of scale of consumer products and you have a clear winner.
Or let's take some numbers from this 2006 talk:
A full rack with (I think) 200 FPGAs used for computation, 250 in total, running in parallel could generate 1.5 TFlop/s (theoretical peak, so probably not a useful algorithm; and let's assume it's double precision, although a lot of the operations he talks about are usually just 16 or 32-bit operations, but anyway) at a power consumption of 6kW (and a cost of 500k$). 2006 to 2012 is 6 years, so let's assume 3 full node shrinks, each of them ideally improving performance by a factor of 2 at the same cost (or halving the cost and power consumption, which wasn't exactly true for the shrinks in question). That makes a factor of 2^3=8. That means 6 years later, in 2012, one could have expected 1.5 TFlop/s with 750W power consumption, just 25/31 FPGAs (compute/total) and a cost of ~60k$.
In 2012 one could buy a single K20X with 235W power consumption and 1.3 TFlop/s. Even assuming that example was done with the older 130nm FPGAs, which would increase the ideal scaling factor to 28nm from 8 to 22.6, one would end up with 1.5 TFlop/s from 9/11 FPGAs with 265W consumption and 22k$. Hmm. Still doesn't look like a winner. And keep in mind, I assumed really ideal scaling for performance and costs.

What makes you think you can scale things as simply as that? SIMD extensions and GPUs were also in existence back then.
The project evolved into the BEE2 and BEE3, which among others involved Charles Thacker, and they actually shipped boards that are used for image processing in astronomical observatories, among other places.
Think I linked this one before, but anyway:
http://research.microsoft.com/pubs/130834/ISVLSI_FINAL.pdf
You wouldn't be able to accelerate HLL with a GPU either.

Changing the tools wouldn't be a big hurdle if it were worth it. Things can change (and have done so), as Brodersen says in the video.
Politics would be a much more likely explanation for reluctance to adopt FPGAs for GPP.
 
Wut.

Who're you to so casually, presumptively and arrogantly dismiss over thirty years of enormously successful exponential computational growth?

The only thing really driving the industry today is Moore's law and the explosive growth in the consumer sector that it has led to. Things have gotten quantitatively better, not qualitatively.
The problem right now is that it becomes increasingly harder to scale the same old ideas.

This has led to the stagnation of imagination and a lack of excitement for and in the industry. IBMism, or as it is today Microsoftism, has become rampant.
We're basically running on fumes.

I challenge you to come up with just one really truly new idea that has appeared in the field after 1980.
 
How could it not be true?
Where do you see that FPGAs are getting increasingly popular? They show about the same (relatively small) growth as the whole industry.
Ok, that is a rather recent development to my knowledge. But that just goes further in favor of FPGAs.
Not really. If it is indeed a rather recent development, as you say, they already got their period of faster-than-Moore growth. That is basically over now, as you can't adopt new processes before they become available.
What makes you think you can scale things as simply as that?
Do you want me to put in non-ideal scaling factors for the costs, which would show FPGAs in a worse light?
SIMD extensions and GPUs were also in existence back then.
That example was specifically about double precision FLOPs. GPUs were incapable of these back then. And they were also a bit away from having largely general purpose vector cores.
The project evolved into the BEE2 and BEE3, which among others involved Charles Thacker, and they actually shipped boards that are used for image processing in astronomical observatories, among other places.
I have said several times already that for niche applications FPGAs can be a very valid and good solution. But using one as the core element of a really general purpose machine isn't the best idea in my opinion. I gave plenty of reasons for that.
A task where a single core of an Intel Core 2 Duo E8500 beats the BEE3 setup with 4 FPGAs by a factor of two isn't actually proving your point. And to be frank, it's a pretty contrived benchmark to start with. That the CUDA version is hardly faster than the FPGA setup can also be partly attributed to this. The other reason would be that, it being a double precision benchmark, the GT200-based GPU doesn't exactly excel at DP.
To expand on how it is contrived, just look at this quote from the paper:
Due to the on-chip memory limitations of the FPGA, the Gaxpy implementations are evaluated using a set of relatively small matrix sizes to measure on-chip computation time on the 3 platforms without considering the I/O impact of main memory access, which is the focus of our future work.
One can easily see that the GPU never gets enough to do to load all of its SMs (actually it does for the matrix sizes on the very left of the graphs, but the kernel's execution time is so short that it is completely dominated by the calling overhead for the kernel launch :rolleyes:). The two matrix sizes on the very right of the graphs probably even start just a single warp (it calculates just 32 or even only 8 elements)! Is that a realistic scenario for HPC accelerators? I don't think so. They have chosen such small matrix sizes so they can keep all data completely on chip for the FPGA setup.
And their conclusion that one can build DGEMM kernels from gaxpy also doesn't hold any water (one uses different cache blocking strategies to reduce the memory bandwidth requirements). But let's assume they do this as they say. The FPGA performance (a tiny 3.1 GFlop/s peak for the 4-FPGA setup, which a modern smartphone can probably beat today) wouldn't increase in that case (as they kept everything on chip for gaxpy, which wouldn't work for dgemm [or any larger matrix sizes], so one would lose performance due to off-chip memory accesses; they basically worked completely out of the on-chip memory). But CPUs and GPUs routinely achieve ~90% of their peak flops in optimized DGEMM kernels (even distributed over very large HPC installations one easily gets >60% of the whole machine).
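
To illustrate why such tiny problem sizes say little about a GPU, here is a crude model; the ~10 microsecond launch overhead and the peak rate are assumed round numbers of mine, not measurements of the GT200 used in the paper:

Code:
# Crude model of effective GPU throughput for a gaxpy-like kernel (~2*n^2 flops)
# when a fixed kernel-launch overhead dominates. Assumed round numbers only.

launch_overhead_s = 10e-6   # assumed ~10 microseconds per kernel launch
peak_gflops       = 80.0    # assumed sustainable DP rate of the GPU

def effective_gflops(n: int) -> float:
    flops = 2.0 * n * n                          # gaxpy: roughly 2*n^2 operations
    compute_time = flops / (peak_gflops * 1e9)   # time if the GPU ran at peak
    total_time = launch_overhead_s + compute_time
    return flops / total_time / 1e9

for n in (32, 128, 512, 2048, 8192):
    print(f"n = {n:5d}: {effective_gflops(n):8.2f} GFlop/s effective")
# For n in the low hundreds the launch overhead alone caps the result far
# below peak; only large matrices let the GPU show its actual throughput.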
You wouldn't be able to accelerate HLL with a GPU either.
HLL? You mean HPL? Of course it works. Look at the machines in the Top500. I've heard they use that HPL benchmark for ordering that list. But of course they use really large matrix sizes (which is also contrived in some sense, but the only possibility to load such huge machines as a monolithic block [in practice, most of the time a lot of much smaller jobs run concurrently] and avoid the pitfall of that paper you linked).
 
I think the Wii-U hardware is fine as it is, and this is after going through an initial phase of bewilderment.

The focus now should be on taking advantage of its shader-based/multi-core architecture, which has a lot of potential, the GPU particularly compared to current gen consoles.

The framework of shaders and visual effects we're seeing in the little PlayStation Vita also reminds me just how much can be achieved with limitations in mind. With similar approaches, I would be excited to see what could/could have been done on the PS3/360 and what we could see going forward on a platform like the Wii-U.

The tech is there to achieve magnificent visuals and I'm happy with the Gamepad inclusion. Don't have the console yet personally but I'm quite excited about its future.
 
I'm finding time and time again that it is currently too expensive. The Wii was successful partly because you could get it right away at launch at $249/EUR. That seemed like something you could afford to get next to an Xbox 360 or PS3, or get a Wii first and then, after one or two years when the HD consoles had become a bit cheaper, get one of those besides.

The Wii U started out too expensive, and it doesn't seem like it will be capable of coming down in price enough. If both a PS4 and a Wii U are available this Christmas, then the Wii U will be relegated to (I'm exaggerating a little) the Nintendo fanboys and pre-teens, and the latter category (i.e. their parents) isn't generally in a hurry to get the latest, and is moving to 3DS and iPad already anyway.

So I think all we (Nintendo) can do really is hope they can get the device to come down in price enough, because otherwise they don't just need one killer-app with mainstream appeal - they'll need a good few of them.
 
The price is OK for 320-400 shader units' worth of performance, I guess, given that you can get a graphics card with those specs for ~50 euros.

I wouldn't pay 300+ euros for a 360 + gamepad. Unfortunately I already did, but back then I expected ~400-500 GFLOPS.
 
I had an epiphany... the Wii U should have just been an Ouya-style micro console with the very latest chipset (OK, this actually may not have been an option for Nintendo one year ago, as mobile chips have doubled in power since the end of 2012).

It certainly could not have sold worse than Wii U is!

A Snapdragon 800 should compare well with a PS3 in power. Which means it's likely about as powerful as Wii U. 2GB RAM sounds right.

I would imagine they could price this at $149 at most. And I think it would sell more than 30k per month in NPD.

They could then do their high end Zelda and Mario style games in HD, sell SNES and NES classics for a few bucks emulated all day long...Would probably draw 10 watts or something. Even if you clock it super high and it goes way over and draws 20 watts, who cares.

About all you tangibly lose is the tablet controller, which doesn't seem to be working out anyway and conversely adds a high cost to the system, and backwards compatibility, ditto.

It's brilliant I tell you.

Of course to fully realize the potential of a micro console you'd need a company more internet savvy than Nintendo, but that's tangential.

Probably the one big decision regarding this console is whether you'd include an optical drive. Nintendo is kind of a legacy company so I suspect they would. Which would add some bulk and cost. But I think either way you went there they would still do much better than Wii U is.
 
I don't think the Wii U could ever sell better. There are three markets: families, fans and core gamers.

Sales to families are slow, especially since a tablet controller today is not as attractive as motion controls were back in '05. It's okay for our kids though; not that mine are old enough to play yet, but it saves battles over TV ownership in future.

Sales to core gamers are faster but depend on hardware and graphics appeal. These core gamers already own a PS360 and so won't be interested in another PS3-like machine, whatever tech is used, IMO.

Obviously Nintendo has fewer than 10 million fans, hence these bad sales. I don't think the PS4 would sell well either if we didn't know the specs and got bad PS360 ports instead of the expected boost in graphics.
 
I don't think the PS4 would sell well either if we didn't know the specs and got bad PS360 ports instead of the expected boost in graphics.
Worse, actually. Bad PS360 ports upwards of a year old - if we're lucky, seeing as most publishers simply refuse to port their stuff to wuu.
 
I don't think Nintendo had much of a better chance; the alternatives just weren't there at the time. GF's 32nm was not there when they would have needed it (Power7+ on IBM's identical (?) 32nm process also only started the same year as the Wii U).

AMD could only have provided a doubled-up Bobcat or a downclocked Llano, both of which would probably be more power-hungry nonetheless with not much more performance (Llano more, though with much higher power draw). Neither design would be backwards compatible with the Wii architecture (okay, you could solve this with a small separate chip, though that drives complexity and costs), and both would lack memory bandwidth (neither uses eDRAM). They could have chosen a GDDR5 interface instead, but that again drives cost.

I am not saying that they couldn't have shown more effort from a performance point of view, but it isn't as bad as some people are making it out to be, keeping in mind the power envelope and the time of the design freeze.

I would have probably doubled the RAM, which would also mean double the bandwidth, and maybe they could have put in 2 more CPU cores (they are tiny).

If they had known that they would only sell 3.5 million units in half a year, they could have also gone with a not-so-mature 32nm process and made it an SoC.

As I said, a launch a year (or even half a year) later would have made a lot of new tech / processes possible, but I guess they had the design ready and wanted to keep the head start.
 
I don't think Nintendo had much of a better chance; the alternatives just weren't there at the time. GF's 32nm was not there when they would have needed it (Power7+ on IBM's identical (?) 32nm process also only started the same year as the Wii U).

GF has had 32nm products out since the middle of 2011 (Llano). Maybe you mean their HP 28nm process. Not sure why you'd be restricted to them though, it's not like they're being used by Nintendo now anyway.

TSMC 28nm would have been perfectly viable for Wii U.

All of that aside, even on 45nm the CPU is positively tiny on Wii U. They could have gone much bigger there. They could have included the Wii CPU separately at a very modest die cost - there's no reason why it'd have to be a separate chip. Something better would have probably needed more power, but that's not exactly the critical metric here, even if Nintendo is acting like ~33W is a huge challenge (maybe for them it was, and maybe that's part of the problem...)

We still know too little about the GPU but every indication thus far is that it's rather poor vs modern offerings from a perf/area perspective.

I think Nintendo's biggest problem here is their determination to stick with evolving old technologies rather than taking chances with new things. This extends to the choices they make in who they buy from, Renesas for example could be an anchor around their neck.

I would have probably doubled the RAM, which would also mean double the bandwidth, and maybe they could have put in 2 more CPU cores (they are tiny).

RAM capacity is the one area where they're well suited, IMO. Doubling the interface width to 128-bit would have been good, but it's uncertain if the GPU + eDRAM is really that starved for external bandwidth.

You can't just throw as many CPU cores as you have space for onto a design; the infrastructure has to support it. No idea what theirs supports.
 
As a side note, looking at what a 79mm² chip like the GK208 can do using a single memory channel (though connected to fast GDDR5) makes the Wii U look really, really bad, even more so than looking at the "AMD side of the coin", which would be the Mars mobile GPU.
 