Nintendo GOing Forward.

And this amounts to some CPU cycles being freed up as well, correct?
Yes, in the sense that the CPU does not have to perform the actual data copying. It does have to set up the DMA channel to do it, though, and that adds a little bit of overhead. On the Amiga, there was a certain data size below which it just wasn't worth using the blitter, and copying with the CPU was actually faster. You could pre-build "copper lists", though, if you had a specific set of tasks you needed taken care of.
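To make the break-even idea concrete, here's a minimal sketch of how a game loop might choose between a CPU copy and a DMA/blitter copy based on size. The API name `dma_submit_copy` and the threshold value are made up for illustration; any real driver call and break-even point would be hardware-specific.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical break-even point: below this size, the cycles spent
 * programming the DMA/blitter registers outweigh the copy itself.
 * The real threshold depends entirely on the hardware. */
#define DMA_BREAK_EVEN_BYTES 256

/* Placeholder for a real driver call that would program a DMA channel
 * (source, destination, length) and kick off the transfer. */
static void dma_submit_copy(void *dst, const void *src, size_t len)
{
    /* On real hardware this would write the channel's registers;
     * here it just falls back to the CPU so the sketch is runnable. */
    memcpy(dst, src, len);
}

/* Pick the cheaper copy mechanism based on transfer size. */
void smart_copy(void *dst, const void *src, size_t len)
{
    if (len < DMA_BREAK_EVEN_BYTES)
        memcpy(dst, src, len);          /* CPU wins: no setup overhead          */
    else
        dma_submit_copy(dst, src, len); /* DMA wins: CPU is freed for other work */
}
```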

The copper was a simple co-processor that ran independently of the MC68k CPU; it could read lists of instructions and write specific values into the various hardware registers without CPU intervention. It had its limitations, though, since the copper was very simplistic and couldn't perform any game-logic operations. For example, in a game it couldn't set up the blitter to move enemy ships based on player input, but it could do some nifty other things for its time...
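For the curious, a copper list is literally just pairs of 16-bit words sitting in chip RAM. Something like the sketch below (written as a C array rather than assembler; the encodings are from memory, so treat the exact bit patterns as illustrative) would change the background colour partway down the screen with zero CPU involvement once the list is installed:

```c
#include <stdint.h>

/* A minimal copper list, expressed as the raw 16-bit word pairs the copper
 * fetches from chip RAM. Register offsets are relative to the custom-chip
 * base at $DFF000; COLOR00 lives at offset $180. */
static const uint16_t copper_list[] = {
    0x2C07, 0xFFFE,   /* WAIT: stall until the beam reaches scanline $2C         */
    0x0180, 0x0F00,   /* MOVE: write $0F00 (red) into COLOR00 - no CPU involved  */
    0x8C07, 0xFFFE,   /* WAIT: stall until scanline $8C                          */
    0x0180, 0x000F,   /* MOVE: write $000F (blue) into COLOR00                   */
    0xFFFF, 0xFFFE,   /* end of list: wait for an impossible beam position       */
};
```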

More modern DMA channels may also have some mitigation techniques, or just plain require less setup before starting the transfer. It would depend on the hardware, and I have no such information sadly. :) Then again, today's CPUs are brutally fast, so perhaps you don't notice any overhead. :p
 
There have been lots of Nintendo fans insisting that Nintendo won't deliver that cardboard coffin to the Wii U this early, with it having spent only 4 years on the market once 2016 hits. This left me wondering: if this scenario is plausible, would it be possible for them to launch a handheld NX in 2016 at comparable or even better performance than the Wii U, so that they can continue to support their home console while also leveraging their lucrative handheld business?

I see that they've consolidated their game development teams for handhelds and home consoles:
http://www.wired.com/2015/09/nintendo-ead-spd-merge/

And they're in the middle of releasing games across the Wii U/3DS, or have already done so, as with Smash Bros. 4 on the 3DS/Wii U.
On top of that, there have been other notable dual releases like Monster Hunter 3 Ultimate and the soon-to-be-released Hyrule Warriors on 3DS.
These releases made me wonder if Nintendo would try their hand at that, which could let them please the hardcore fan base that went out and bought their home console while still satisfying people who find the 3DS to be showing its age by now.

My second question is about what they might do with their next home console (which I'm assuming will also be a part of NX): would the CPU in the Wii U be able to compete with what the PS4/Xbone have under the hood if it were paired with a much better GPU? Or is it too old and possibly too inefficient to even think about competing with their newer designs?
 
I could see some overlap, with some of the Wii U's final games also being available on NX. Zelda U is the most probable example. NX may be so different that it would be tough to classify it as strictly a replacement for the 3DS or the Wii U; it's potentially the successor to both. It doesn't seem likely that Nintendo would simply make a Nintendo-branded PlayStation.

As for the CPU in the Wii U, it's got to go. I see no reason why Nintendo wouldn't go with a modern ARM CPU that would absolutely destroy the Wii U's CPU in terms of floating-point performance. Not to mention that as power efficient as the PPC750-based Espresso chip is, it still runs a lot hotter than a better-performing ARM chip these days.
 
There have been lots of Nintendo fans insisting that Nintendo won't deliver that cardboard coffin to the Wii U this early with it having spent only 4 years on the market once 2016 hits.

I don't think those people have any perception of how bad the Wii U actually is for Nintendo.

The Wii U is by far Nintendo's worst-selling console of the last 20 years. After 3 years on the market (until November 2015), the Wii U will have sold close to 11 million units.
- The Gamecube, which was said to have given Nintendo a borderline decent result, sold almost 19 million units in the same time frame.
- The Dreamcast, which blew Sega out of the hardware competition for good, sold 10.7 million units during the 2 years it was on the market, whereas in 2 years the Wii U sold fewer than 6 million. Let that sink in: the Dreamcast - a factual failure - sold almost twice as many consoles as the Wii U did during its first two years.
- Nintendo has been losing money, mindshare and brand value since the Wii U was released, even with the handheld market going very well for them.

With the PS4 and XBone gaining millions of new users every month, there's really nothing else that can save the Wii U anymore. As long as that console is in the market, Nintendo is losing money, period.


By the end of fiscal 2016 (March 2017), the Wii U will have had almost 4 and a half years on the shelves.
Before the Wii U, almost all of Nintendo's other home consoles had close to a 5-year period between them. On the handheld side, things are even more erratic, since the very successful Game Boy Advance only had 3 years before the DS practically took over (and the GBA had sold 40 million units during those 3 years).

So should Nintendo cut the Wii U's planned lifetime by ~20% relative to its other home consoles in order to stave off their doom?
Yes, they should. In fact, they should've done that this very same year.
 
Ok, liolio. I'm ready.
The Wii U was the biggest Nintendo system ever as far as silicon sq. mm are concerned. As a counterbalance, it is also their only system that went with a pretty outdated process. Now, 130mm2 might be greedy using a now widely available process; say somewhere in between an A6X and an A5X (or Cape Verde/Bonaire, speaking of AMD GPUs).
And perhaps 180mm2 on a new process is greedy on my part. If and when Sony/MS do slim versions of their consoles w/ finFET, they would probably end up around that size. If Nintendo want smaller/cheaper, they need to go w/ less. Between 130-150mm2 maybe.
Actually I always leaned slightly toward the challenging side, but ultimately I'm rational: AMD IPs are not cutting it at the moment. I absolutely get that they come at a discount compared to Nvidia or Intel ones, but I wonder to what extent they are competitive with mobile IPs within the budget and performance envelope I can see them aiming for.
Yeah, it's more business than technology at this point. But I would be interested in seeing the PowerVR GT7900 in a "low power console" like they mention in the blog.
Unreal Engine 4 runs on mobile; they have a thing for making huge statements, as they benefit from the sustained upgrade cycle in GPUs, etc. But TFLOPS are not even close to being an enabling factor for ports or engine support. Money comes first; then I would put forward CPU and RAM, as graphics are hugely scalable. I would again point to the difference in muscle between the XB1 and the 360, and to the last AAA Tomb Raider game. DF speak of slowdowns in some huge open areas or some hub in the game. What is the issue here? RAM? CPU power? GPU power?
I would bet that doubling Xenon's L2 and the RAM would go a longer way than spending the same dollars on Xenos and the daughter die.
True, UE4 runs on mobile, but I seem to recall reading about there being different rendering methods depending on GPU grunt. Don't mobile games use a forward rendering technique? Would this matter to Nintendo?
I agree that ~130 sq. mm is going to buy them a lot. Kabini is 104 sq. mm IIRC, and the Apple A7 is mostly the same, though the latter is much better. Give it anywhere close to Kabini's TDP and it will make a showing. Neither of those two SoCs uses up-to-date versions of the IPs they rely on. The bandwidth available through standard RAM made a big jump thanks to DDR4. And neither of those two SoCs was designed as a gaming, budget-constrained chip.
The A7, as an SoC powering mobile smart devices, includes things that may not be needed in an APU. The A7 and Kabini emphasize single-thread performance; other choices can be made in order to make room for more GPU, for example. A cluster of Jaguar/Puma and a cluster of A72, both with 2MB of L2, should be really close in size, but there could be other ways (than SMP) to provide fitting CPU resources, through asymmetric multi-processing akin to ARM's big.LITTLE approach or the STI Broadband Engine/Cell. Not as easy, but at some point, if you want to be both affordable and performant and still secure some margins, something has to give.
Big.little is something they could be looking at if they just want some more SIMD capabilities w/ the CPU. I think they'll try to get the CPU comparable to Xbone/PS4, as CPU tasks don't scale as easily as GPU ones do. Also, going by several of their comments, Nintendo have clearly heard the developer complaints about the Wuu CPU. Even Miyamoto called it out as a bottleneck in Star Fox. I think back to the paltry N64 texture cache and how they subsequently decided to include a whopping 1 Megabyte of the stuff in Flipper. Does anyone know if that is a true cache, btw? Or is it more a scratchpad?

Back to die size: looking at Carrizo, that's 250mm2 for 8 GCN cores, and that doesn't include any embedded memory. It has a nice TDP, but is it binned, or does it carry a laptop premium cost? If not, Carrizo, even at 28nm, would be a nice choice for them if they could get around the memory limitations. I mean a similar SoC w/ ARM CPU cores, though.
I hope they pass on both HBM and scratchpad memory, whatever level of performance they are pursuing. It is a crazy expense when Nvidia has its mid-end GPUs getting a hell of a lot out of a 128-bit bus and fast GDDR5. Nvidia's low-to-mid-end offering, the GTX 750 Ti, often beats the PS4 and its big, wide memory bus.

I wish for a 128-bit bus to a reasonable amount of either DDR4 or GDDR5.
Overall I suspect Nvidia is too expensive, so the only partners up to the challenge Nintendo is facing are either ARM on its own (Cortex CPU and Mali GPU) or ARM and PowerVR. I've said it already: I would favor an all-ARM design for convenience.
I don't think Nintendo would ever allow memory to be a bottleneck. Even Wii U has 30 GB/s eDRAM, which is more than the 25.6 GB/s standard which AMD uses for similarly specced consumer GPUs. If we scratch any embedded memory or HBM, the most they could probably get for a decent cost is ~50 GB/s DDR4 (128-bit bus). There are these 12 Gb lpDDR4 chips from Samsung that are faster. They would need 8 chips, however, in order to get the 72 GB/s which AMD seem to find appropriate for their ~1 TFLOP cards.
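For reference, the ~50 GB/s figure falls straight out of the usual peak-bandwidth arithmetic (bytes per transfer times transfer rate); a quick back-of-envelope check, nothing vendor-specific:

```c
#include <stdio.h>

int main(void)
{
    const double bus_bytes = 128.0 / 8.0;  /* 128-bit bus = 16 bytes per transfer */

    /* Peak bandwidth = bytes per transfer x transfer rate.                       */
    printf("128-bit DDR4-3200 : %.1f GB/s\n", bus_bytes * 3200.0 / 1000.0); /* ~51 GB/s */

    /* Transfer rate a 128-bit bus would need to reach the 72 GB/s target.        */
    printf("for 72 GB/s       : %.0f MT/s needed\n", 72000.0 / bus_bytes);  /* 4500 MT/s */
    return 0;
}
```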
 
True, UE4 runs on mobile, but I seem to recall reading about there being different rendering methods depending on GPU grunt. Don't mobile games use a forward rendering technique? Would this matter to Nintendo?

That's correct, the renderers are vastly different, so if Nintendo wants "easy" ports from the other consoles, they'd need something in the same ballpark. At least the new console would be done mid-gen as opposed to at the very end of last gen, but as mentioned already, they'll probably want to be somewhat conscious of cost.

That said, I'm not sure they should be chasing ports at this point. Folks will be deeply entrenched with the other two consoles, so Nintendo will still be a secondary console....
 
I don't think Nintendo would ever allow memory to be a bottleneck. Even Wii U has 30 GB/s eDRAM, which is more than the 25.6 GB/s standard which AMD uses for similarly specced consumer GPUs. If we scratch any embedded memory or HBM, the most they could probably get for a decent cost is ~50 GB/s DDR4 (128-bit bus). There are these 12 Gb lpDDR4 chips from Samsung that are faster. They would need 8 chips, however, in order to get the 72 GB/s which AMD seem to find appropriate for their ~1 TFLOP cards.

Hum.. that was before delta color compression, though. The latest Tonga card, the R9 380X, does 4 TFLOP/s with 180GB/s of bandwidth. That's 45GB/s-per-TFLOP/s. Even the Fury X, which is a bandwidth behemoth, has close to 60GB/s-per-TFLOP/s. (Of course, since bandwidth is generally closely related to ROPs, perhaps a less inaccurate measurement would be bandwidth-per-fillrate.)

Then again, it's Nintendo, so it's perfectly acceptable to think they'll pass on 2014 technology for their late 2016 / early 2017 console.

In the event that hell freezes over, Nintendo turns inside out and decides to use new technology, in 2016 they'll have a crapload of options to use for very high bandwidth at low power consumption and small footprint.
With GDDR5X at 10,000 MT/s they would only need a 128-bit bus to match the PS4 with half the number of chips. With JEDEC's Wide I/O2 they would need only two stacks for 136GB/s. With HBM2 they would need only one stack for up to 256GB/s.
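Running the same bytes-per-transfer times transfer-rate arithmetic over those options, as a sanity check (the HBM2 per-pin rate and the Wide I/O2 interface width are my assumptions, not quoted figures for any particular part):

```c
#include <stdio.h>

/* Peak bandwidth (GB/s) = bus width in bytes x transfer rate in GT/s. */
static double peak_gbps(double bus_bits, double gtps)
{
    return (bus_bits / 8.0) * gtps;
}

int main(void)
{
    /* 128-bit GDDR5X at 10 GT/s, as in the post above.                 */
    printf("GDDR5X, 128-bit @ 10 GT/s : %6.1f GB/s\n", peak_gbps(128.0, 10.0));       /* 160, near the PS4's 176 */
    /* One HBM2 stack: 1024-bit interface at ~2 Gb/s per pin (assumed). */
    printf("HBM2, 1 stack             : %6.1f GB/s\n", peak_gbps(1024.0, 2.0));       /* 256 */
    /* Wide I/O2: assuming a 512-bit interface per stack at ~1.067 GT/s. */
    printf("Wide I/O2, 2 stacks       : %6.1f GB/s\n", 2.0 * peak_gbps(512.0, 1.067)); /* ~136 */
    return 0;
}
```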

Late 2016 would've been a great year for Sony and Microsoft to launch new consoles.
 
Shrink Durango, cut off the esram and a couple CUs, 128-bit bus, update GCN and strap it to DDR4-3200? :p There, <140mm^2. /s
 
Yap, Carrizo with its GPU clocked towards 1GHz would get those results... Except that chip is 250mm^2 at 28nm.
 
As in hang on to PS360 for another 2 years before the next gen? :oops:

Rather as in the eighth generation took waaaay too long to come. It should have come 5 years after the seventh in late 2011 (maybe 32nm VLIW4, 4GB GDDR5, Bobcat cores, etc.), and in late 2016 we would have GCN2.0 + Zen + HBM2 + SSD drive.
 
Shrink Durango, cut off the esram and a couple CUs, 128-bit bus, update GCN and strap it to DDR4-3200? :p There, <140mm^2. /s
Hey, if AMD's claims about delta color compression are true, that might just work!
Yap, Carrizo with its GPU clocked towards 1GHz would get those results... Except that chip is 250mm^2 at 28nm.

It's a largish chip, but that would be a cheap and efficient solution. But can Nintendo really order millions of Carrizo-esque SoCs and expect good yields? 15w seems really low for that performance. I know AMD claim advancements and 28nm is mature now, but that still seems optimistic.

Speaking of node maturity, would AMD stick a couple redundant GCN cores on the NX SoC the way they did w/ Xbone/PS4? I notice that Carrizo (and Latte for that matter) don't have any unused cores. Is that due to the maturity of the node? The graphics architecture? Both?
 
Rather as in the eighth generation took waaaay too long to come. It should have come 5 years after the seventh in late 2011 (maybe 32nm VLIW4, 4GB GDDR5, Bobcat cores, etc.), and in late 2016 we would have GCN2.0 + Zen + HBM2 + SSD drive.
This gen is only just a decent upgrade to last gen. Launching that much earlier wouldn't have been a particularly impressive generational advance. That's another discussion though. Importantly, N. have a few tech advances they could use.
 
And perhaps 180mm2 on a new process is greedy on my part. If and when Sony/MS do slim versions of their consoles w/ finFET, they would probably end up around that size. If Nintendo want smaller/cheaper, they need to go w/ less. Between 130-150mm2 maybe.
I'm still unconvinced that MSFT or Sony will make a FinFET version of the XB1 or the PS4, for the simple reason that if AMD were planning to have Puma+ cores available on a 14/16 nm process, they would be only too happy to inform shareholders and investors. The writing (for me at least) is on the wall... and oh well, it is not: the wall is blank.
Yeah, it's more business than technology at this point. But I would be interested in seeing the PowerVR GT7900 in a "low power console" like they mention in the blog.
It is both technology and business; even Nvidia can't touch PowerVR performance in the low-end, low-power segment. The configuration you are pointing to is a bit too high-end, as I don't think Nintendo will try to compete with Sony or MSFT, and that doesn't sound harsh; actually I would agree with such a move, as competing there is not in their best interest. Now, where do consoles sit in the greater realm of graphics? AMD GPUs, in consoles and as discrete cards, are the GPUs pushing the most FLOPS, and they also happen to have the worst performance measured in FPS per FLOP.
Ultimately I'm saying one thing: whether or not Nintendo wants to compete with, outdo, or undercut (in price) its competitors, it has nothing to do with FinFET or HBM; it has a lot more to do with the IPs you choose. I am convinced that you can outdo both the PS4 and the XB1 using a 28nm process and no fancy memory technology, while spending less on silicon (on the IP side it is difficult to know).
True, UE4 runs on mobile, but I seem to recall reading about there being different rendering methods depending on GPU grunt. Don't mobile games use a forward rendering technique? Would this matter to Nintendo?
Well, hardware deferred renderers (/GPUs) do their thing through hardware and drivers; if I'm not mistaken, it is transparent to software.
Big.little is something they could be looking at if they just want some more SIMD capabilities w/ the CPU. I think they'll try to get the CPU comparable to Xbone/PS4, as CPU tasks don't scale as easily as GPU ones do.
Indeed, it seems it does not, by any public accounts available to us, though it could very well be that AMP is here to stay, and not only for power-saving purposes. When I read the ongoing research I linked, my understanding is that the Cell Broadband Engine's issue was not heterogeneity but the memory model. One thing is clear: 8 middle-of-the-road CPU cores connected through a sucky interconnect is not efficient ;) As for the SIMD capability, I don't know, though devs would know through profiling what they need, +/-. I would assert that low-IPC/low-power cores are good for significantly parallel tasks with a decent amount of dependencies: tasks where both the "mojo" of high-IPC cores (and usually their big caches) and the brute-force approach (/GPUs) fail. That would require real profiling, and it's something AMD can't deliver, but it would be interesting to see how 2 clusters of 4x A72, each with 2MB of L2, compare to one cluster of 4x A72 with 2MB of L2 backed by 2 clusters of A35, each linked to 1MB of L2. The latter might come in slightly tinier: looking at the old Exynos 5433 (4x A15 and 4x A7) and matching AnandTech data, you could almost fit 4 clusters of A35 with 512KB of L2 in the same space as another A72 cluster.
Now, thinking of a system that does not operate within the razor-thin TDP constraints mobile chips have to live with, but with other cost-related constraints, one could make further trade-offs: pass on the benefits of the ARMv8 ISA and on some other (numerous, in fact) architectural improvements, but compensate through higher clock speed and the matching power cost. Looking at where AMD's Jaguar cores stand compared to mobile CPU IPs, both in performance per watt and per mm2, saving some sq. mm on the CPU at the expense of power for the sake of your GPU could make sense. A72 and Jaguar should be around the same size on the same process, the former being better in pretty much every way. Now, if you cut corners, A17/A7 are going to save you a lot of silicon; that might be enough to justify a bump of a couple hundred MHz (and the watts) to, if not make up for the loss, at least sweeten things up.
Also, going by several of their comments, Nintendo have clearly heard the developer complaints about the Wuu CPU. Even Miyamoto called it out as a bottleneck in Star Fox. I think back to the paltry N64 texture cache and how they subsequently decided to include a whopping 1 Megabyte of the stuff in Flipper. Does anyone know if that is a true cache, btw? Or is it more a scratchpad?
What I understood from lots of conversation on that very topic is that the Wii U CPU cores are far from bad if you look at performance per cycle, per mm2 or per watt. The thing is, 3 low-power cores at 1.2GHz only get you so far, even if those cores are good at being low-power cores.
Back to die size: looking at Carrizo, that's 250mm2 for 8 GCN cores, and that doesn't include any embedded memory. It has a nice TDP, but is it binned, or does it carry a laptop premium cost? If not, Carrizo, even at 28nm, would be a nice choice for them if they could get around the memory limitations. I mean a similar SoC w/ ARM CPU cores, though.
IMHO, sales tell the whole story about the overall chip's merits, CPU and GPU alike. It is harsh, and I wish AMD were doing better, but wishful thinking only gets you so far.
I don't think Nintendo would ever allow memory to be a bottleneck. Even Wii U has 30 GB/s eDRAM, which is more than the 25.6 GB/s standard which AMD uses for similarly specced consumer GPUs. If we scratch any embedded memory or HBM, the most they could probably get for a decent cost is ~50 GB/s DDR4 (128-bit bus). There are these 12 Gb lpDDR4 chips from Samsung that are faster. They would need 8 chips, however, in order to get the 72 GB/s which AMD seem to find appropriate for their ~1 TFLOP cards.
A heavenly bottleneck, that is what they should aim for ;) Now, my hopes are low if not nil. My bet is that if AMD is inside, the hardware won't cut it in any performance bracket: either not performant enough, or too costly, etc.
 
Harsh on AMD, liolio! haha. But yeah, I almost got a bit upset earlier this year when I bought a 970 for my desktop and then saw that AMD's R9 390 had on-paper much better specs. More FLOPs and whatnot. Then, when I looked at the benchmarks, it was pretty even w/ the Nvidia card. Sometimes a few more fps and sometimes a few less. Considering I got 3 free AAA games w/ the Nvidia card, I think I made the right choice...even if I only get 3.5 GB of VRAM.
 
What I understood from lots of conversation on that very topic is that the Wii U CPU cores are far from bad if you look at performance per cycle, per mm2 or per watt. The thing is, 3 low-power cores at 1.2GHz only get you so far, even if those cores are good at being low-power cores.

Indeed. They seem to mostly be just lacking in clocks & SIMD.

It seems strange they wouldn't take advantage of all the dead space on the die though, unless they couldn't?
 
We could get darkblu in here to extol the virtues of the PPC750, but yeah, I think it's mostly the short pipeline which prevents it from clocking higher. That and the primitive SIMD functionality are the big limiters. It's almost too bad IBM left the race. An SoC w/ FD-SOI and eDRAM would be an interesting shakeup. AMD have stated they're not interested in specialty nodes, though. Blah.
 