Nintendo 3DS hardware thread

I have often wondered where we would have been today if desktop graphics had remained within the 25W limit imposed by the AGP socket, and/or had not evolved completely at odds with consumer trends towards mobile computing.
The path might have been somewhat different, but I think we would still have ended up with similar hardware today, just less powerful. Programmability in desktop graphics essentially took off when just outputting more multitextured pixels became less and less feasible (because bandwidth growth couldn't keep up), and no one really knew which advanced algorithms developers were longing for. You could argue that there was room for one or two additional years of "more pixels" – certainly anyone who wanted a V5 6000 would agree. But without agreement on what to implement, features like fixed function EMBM or tessellation were doomed. In an open platform, "more programmability" is sort of the lowest common denominator. And more programmability can have efficiency advantages, too.

If using some fixed functionality lets the 3DS do the job at lower cost and lower battery drain, those are very compelling arguments. After all, those kinds of arguments are what justifies having dedicated logic in the first place.
As long as you're able to define closely enough what the job is. I guess it could work in a closed platform, especially if it's from Nintendo.
 
The path might have been somewhat different, but I think we would still have ended up with similar hardware today, just less powerful. Programmability in desktop graphics essentially took off when just outputting more multitextured pixels became less and less feasible (because bandwidth growth couldn't keep up), and no one really knew which advanced algorithms developers were longing for. You could argue that there was room for one or two additional years of "more pixels" – certainly anyone who wanted a V5 6000 would agree. But without agreement on what to implement, features like fixed function EMBM or tessellation were doomed. In an open platform, "more programmability" is sort of the lowest common denominator. And more programmability can have efficiency advantages, too.
Your points are well taken. "Programmability" isn't an absolute value though. Would we be where we are today? Or would we have stuck around at, for instance, DX9 SM3 level? "no one really knew what advanced algorithms developers were longing for" also implies to me that the process was driven by other considerations than developer interest, for example
* desire to raise the cost of entry for Intel, and of course companies like SiS, S3 and so on.
* desire to maintain an indisputable benefit over integrated graphics.
* desire to go out with a bang, so to speak - increase complexity and performance at the cost of huge dies and power draw, so that you can point to a faster-than-CPU rate of "progress" with some legitimacy. This helps raise your market cap, so that once the inevitable happens, you'll get the highest possible price for your company.

By and large, I'd say that game developers would have gained from a slower rate of development, with greater adoption and lower cost of what would have been considered decent gaming graphics. So it makes sense to look at other contributing justifications. But it isn't a discussion for this thread, unfortunately, other than to note that the mobile market is not desktop graphics, and shouldn't necessarily be expected to follow in those footsteps.

As long as you're able to define closely enough what the job is. I guess it could work in a closed platform, especially if it's from Nintendo.
Of course it works. And Nintendo knows that they are making a handheld game console. It needs to produce 3D graphics inexpensively, at low power draw, and with reasonable performance. While nothing says that the particular embodiment of this IP that Nintendo will use is the optimum compromise at this point in time, it is probably decent enough. But the point JohnH raised is an important one, and particularly so in mobile space. The various reasons desktop graphics had for going down the path it did are not necessarily valid here.
 
These kinds of arbitrary limitations on programmability definitely won't play nice with OpenCL, an important means of accessing the full potential of mobile GPUs.
 
These kinds of arbitrary limitations on programmability definitely won't play nice with OpenCL, an important means of accessing the full potential of mobile GPUs.
Those are marketing words.

When viewed as a general co-processor, the GPU is a horribly inefficient use of transistors (and thus money and power). This is true for PCs, and in mobile space, where until recently having an FPU at all was considered optional, the issue is far more critical.

GPGPU processing was something that was pushed by ATI and nVidia in order to try to strengthen their market legs outside gaming, or at least appear to investors as if they did. (I take the somewhat cynical view that this was done to increase the likelihood of being bought out on good terms for their shareholders.) So it received a lot of PR attention.
But it was only ever a valid proposition on the condition that the GPU was already in the system "for free", and any extra use you could put it to was gravy. While this can actually be considered somewhat true in mobile space, it is equally true that a GPU for mobile use will be subject to very stringent power limitations - and unlike a standalone graphics card, it will have to use system RAM, so there is no benefit to be had there.

So it is a very relevant question how focused a GPU for mobile space should be on its main task, producing 3D graphics. Will the extra complexity/cost/power draw required for more general programmability really pay for itself in general computing? My take is no. Hell no. The limitations on which problems are suitable are too severe to ever amount to much when the overall usage of a general purpose device is considered. Add that even those problems which can be addressed, where someone spends the effort and time to explicitly code for the GPU, will still be limited by other subsystems. The idea doesn't fly.

But we are not children. We know that ultimately this is about marketability. Will it put a mark in a checkbox that consumers care about? Performance is an easier sell than battery life, at least in reviews by PC tech sites; consumers may not agree. But ease of conformance to and performance under specific programming APIs? Is that even sellable? And if you are willing to trade battery life for performance, wouldn't it be far better to simply raise clocks, to the benefit of all applications? Is a more complex GPU better from a marketing perspective than improving all those benchmark graphs, not just the specific accelerated app?

However, this is Nintendo 3DS hardware. It's a handheld game console, not a general purpose computer. It has a job to do, and the more efficiently it does it, the better.
 
I'm not going to chime in on an overall GPGPU debate, but the fact of the matter is that it makes less sense on handhelds because ARM and MIPS are more efficient than x86. Just look at OMAP4: with dual-core A9s at 1GHz and NEON, you get 16GFlops. The SGX540 inside is capable of 6.4GFlops iirc (4 cores*4 flops*200MHz). In terms of efficiency, you'll certainly lose more for a variety of practical reasons than what you might gain through SGX's MIMD nature versus NEON's 4-way SIMD.

So it's not entirely clear to me why you'd bother using the SGX540 in OMAP4 for anything but 3D rendering. Certainly on Tegra the calculation is very different due to the lack of NEON, but Tegra2's GPU architecture isn't suitable to GPU Compute anyway. Tegra3 might be a more intriguing case, and I could see NV pushing GPU Compute (via OpenCL while teaching everyone to optimise for their arch rather than via CUDA I suspect) on it more aggressively than everyone else combined. I remain far from convinced, but we'll see...

In the 3DS case though, it's fair to say they are not missing out on anything.
 
Just look at OMAP4: with dual-core A9s at 1GHz and NEON, you get 16GFlops.
Where did you get that number from or how did you compute it? The NEON unit can only issue one 64-bit wide computation per cycle, so that means 2 32-bit single precision FP. Let's count MUL-ADD as 2 ops and this gives 8 GFLOPS. Add to that NEON isn't IEEE-compliant.
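For reference, here is that arithmetic written out as a quick back-of-envelope sketch (Python purely for the bookkeeping), under exactly the assumptions stated above: one 64-bit NEON issue per cycle, a multiply-accumulate counted as two ops, and two cores at 1GHz.

```python
# Peak single-precision throughput for dual Cortex-A9 + NEON at 1 GHz,
# under the assumptions above (one 64-bit NEON issue/cycle, MUL-ADD = 2 ops).
cores = 2
clock_ghz = 1.0
sp_lanes = 64 // 32     # one 64-bit issue per cycle -> two 32-bit lanes
ops_per_mac = 2         # count a multiply-accumulate as two operations

peak_gflops = cores * sp_lanes * ops_per_mac * clock_ghz
print(peak_gflops)      # 8.0 (a 128-bit-wide unit would give 16.0)
```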
 
Where did you get that number from or how did you compute it? The NEON unit can only issue one 64-bit wide computation per cycle, so that means 2 32-bit single precision FP. Let's count MUL-ADD as 2 ops and this gives 8 GFLOPS. Add to that NEON isn't IEEE-compliant.
Hmm, A8 was certainly 64-bit but I can't seem to remember whether A9 switched to 128-bit, it's been too long since I looked into this stuff. You're probably right given what I remember of the die size. Either way, Qualcomm's Scorpion (as used in Snapdragon) is definitely 128-bit and scales between 800MHz and 1.5GHz on 45/40nm, so you can get up to 24GFlops on that at least. And assuming they kept the same fillrate-ALU ratio as on Snapdragon1, I think we're looking at only 5.3GFlops on the GPU there!

And I'm aware that NEON isn't IEEE compliant, but neither is SGX afaik. I don't know how they differ compliance-wise though...
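Running the same back-of-envelope arithmetic for Scorpion, as a sketch only: the dual-core configuration at the top 1.5GHz bin and the MAC-counted-as-two-ops convention are assumptions needed to reproduce the 24GFlops figure above, and the 5.3GFlops GPU estimate is simply taken as quoted.

```python
# Peak single-precision GFLOPS for Scorpion's 128-bit NEON unit, assuming a
# dual-core part and counting a multiply-accumulate as two operations.
def peak_gflops(cores, ghz, simd_bits, ops_per_mac=2):
    sp_lanes = simd_bits // 32
    return cores * sp_lanes * ops_per_mac * ghz

print(peak_gflops(cores=2, ghz=1.5, simd_bits=128))  # 24.0 at the 1.5 GHz bin
print(peak_gflops(cores=2, ghz=0.8, simd_bits=128))  # 12.8 at the 800 MHz bin
```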
 
Hmm, A8 was certainly 64-bit but I can't seem to remember whether A9 switched to 128-bit, it's been too long since I looked into this stuff. You're probably right given what I remember of the die size.
Yep, A9 kept the same NEON unit as A8 so it's still 64-bit wide. Look here for instance.

Either way, Qualcomm's Scorpion (as used in Snapdragon) is definitely 128-bit and scales between 800MHz and 1.5GHz on 45/40nm, so you can get up to 24GFlops on that at least. And assuming they kept the same fillrate-ALU ratio as on Snapdragon1, I think we're looking at only 5.3GFlops on the GPU there!
Impressive numbers for sure :)
 
Your points are well taken. "Programmability" isn't an absolute value though. Would we be where we are today? Or would we have stuck around at, for instance, DX9 SM3 level?
Some of the features beyond SM3 are certainly GPGPU driven (e.g. double precision, IEEE compliance, shared memory, etc.). However, even ignoring that influence I don't think sticking around at some level is an option in an open market/open platform. There are features which increase flexibility and allow graphics algorithms to run more efficiently. Every new GPU generation is influenced by feedback from developers, and this experience leads to chips which are better suited for the existing tasks while at the same time being more flexible. It's not like DX10 GPUs are poor at DX9, quite the opposite.

Of course if you dropped the DX10 requirements you could build an even better DX9 GPU using the same area and power, but I'm not convinced it'd be the better GPU overall.

"no one really knew what advanced algorithms developers were longing for" also implies to me that the process was driven by other considerations than developer interest, for example
These may have been factors, too. But what I meant was that, while there were plenty of ideas for beautiful effects, there was no consensus. The GPU companies had largely failed to have their own fixed-function ideas picked up by developers, or it took several generations to get everyone on the same page (think EMBM, tessellation, displacement mapping, PCF). Having a somewhat programmable pipeline helps not to end up with units that never get used.

This kind of ties in with JohnH's "insanity" argument. In the end it's all about what developers want and do, not what may be the most sensible compromise.

Of course it works. And Nintendo knows that they are making a handheld game console. It needs to produce 3D graphics inexpensively, at low power draw, and with reasonable performance. While nothing says that the particular embodiment of this IP that Nintendo will use is the optimum compromise at this point in time, it is probably decent enough. But the point JohnH raised is an important one, and particularly so in mobile space. The various reasons desktop graphics had for going down the path it did are not necessarily valid here.
I'd argue that the divide is much deeper along the open/closed platform line than between mobile and PC.

Mobile GPUs can't follow the same development as desktop GPUs regarding power consumption, but all the lessons learnt about using more flexibility to create better graphics are equally valid in the mobile space.
 
Xmas, you make a lot of good points. But I also think your post illuminates how Nintendo has been able to avoid very extensive levels of programmability thus far.

Nintendo, having the top position in both the console and handheld markets, commands a lot of third party support regardless of what hardware decisions they make, and thus gets to dictate to those companies what feature set they will get instead of the other way around. Furthermore, with their first party games representing such a great majority of their platform sales, they can truly pick hardware best suited to their desires. The platforms are locked between games and hardware, so they don't have to worry about excluding anyone. The only real weakness is that they haven't been designing their own 3D for 3DS. Fortunately for them, a platform existed tuned to fixed function needs. Of course, the variety of graphical techniques will still ultimately suffer, and we won't see as much innovation throughout the life of the console, so it's still a tradeoff.

It's a shame that Nintendo doesn't have a very aggressive mentality towards graphics, and doesn't have the muscle to push a bleeding edge design. Because then we might see a really stunning platform hand-tuned around an optimized fixed function mentality, and we'd get to judge how well the divergent approach works in comparison to the programmable mainstream alternatives, with both pushed to their hardest. 3DS still looks impressive though, much more so than the DS did at release... maybe this is just a byproduct of what was available, or maybe Nintendo's mindset is shifting slightly.
 
I don't think Nintendo have shifted their mindset, as with the 3DS they still (seem to) have delivered a very focused, quintessential product - a toy, first and foremost.

Also, I don't think that the trend in the GPU space toward general-purpose computing somehow conflicts with what Nintendo do. Developers are seeking new spheres of application for that computational power, but that does not somehow validate electric toothbrushes with OpenCL. On the contrary, it shows the industry realizes that modern GPUs are past the stage of justifying themselves as a means for teenagers to play the FPS of the day.

I guess I'm somewhat of the anti-convergence type, but I have the faint feeling I'm not alone. I guess that without really thinking about it, on a subconscious level, adults make the distinction between a tool and a toy. So back to Nintendo - I think they are absolutely honest with themselves about what business they're in. Perhaps that's why they don't seem to have much success with the 13-25 male crowd, as at that age boys tend to be generally confused about their tools and their toys (some of them never pass that stage, judging by the 40-year-olds commuting in sports cars).
 
I don't think Nintendo have shifted their mindset, as with the 3DS they still (seem to) have delivered a very focused, quintessential product - a toy, first and foremost.

I guess I'm somewhat of the anti-convergence type, but I have the faint feeling I'm not alone. I guess that without really thinking about it, on a subconscious level, adults make the distinction between a tool and a toy. So back to Nintendo - I think they are absolutely honest with themselves about what business they're in. Perhaps that's why they don't seem to have much success with the 13-25 male crowd, as at that age boys tend to be generally confused about their tools and their toys (some of them never pass that stage, judging by the 40-year-olds commuting in sports cars).
I disagree with you. The DS is pretty much seen as a tool more than a toy; I think the software variety bears that out. But I still think the stigma Nintendo usually gets associated with (which you yourself admit: the 13-25-year-olds who like to regard it as a toy) is what's holding people back. And one thing that's undeniable about the 3DS is that the 3D effect works in the horizontal orientation only, so I don't think it's going to be of benefit to the "casual crowd" (whom said 13-25 or "macho hardcore" crowd frown upon), considering most of the non-game titles use the vertically oriented format. Add the fact that they added an analog slider and you can see how they seem to have "stepped up" their positioning towards said 13-25-year-olds. The lineup shown was also pretty aggressive.

If you're still wondering why Nintendo hasn't stepped up with bleeding edge hardware, for one there is the Gunpei Yokoi principle, and for another there are the experiences with the N64 and the GameCube's return on investment. (A third would be how they often tend to give conservative specs, which are of no use to either the hardcore or the typical consumer.) Custom-tailoring hardware to something 3rd parties can agree upon doesn't necessarily have to be uber expensive.

And lastly, I'm guessing the hardware they've settled on more than does the job, because Hideki Konno, who is designing it, is coming from a programmer's standpoint.
 
I'm not going to chime in on an overall GPGPU debate, but the fact of the matter is that it makes less sense on handhelds because ARM and MIPS are more efficient than x86. Just look at OMAP4: with dual-core A9s at 1GHz and NEON, you get 16GFlops. The SGX540 inside is capable of 6.4GFlops iirc (4 cores*4 flops*200MHz). In terms of efficiency, you'll certainly lose more for a variety of practical reasons than what you might gain through SGX's MIMD nature versus NEON's 4-way SIMD.

So it's not entirely clear to me why you'd bother using the SGX540 in OMAP4 for anything but 3D rendering. Certainly on Tegra the calculation is very different due to the lack of NEON, but Tegra2's GPU architecture isn't suitable to GPU Compute anyway. Tegra3 might be a more intriguing case, and I could see NV pushing GPU Compute (via OpenCL while teaching everyone to optimise for their arch rather than via CUDA I suspect) on it more aggressively than everyone else combined. I remain far from convinced, but we'll see...

In the 3DS case though, it's fair to say they are not missing out on anything.

Comparing GFlops of a CPU to GFlops of a GPU isn't particularly informative.

Many of the applications that you might want to accelerate with the GPU lend themselves to massively parallel threading, which the GPU takes advantage of to hide memory latency, allowing it to achieve a significant proportion of its quoted GFLOPS. However, on the CPU you're unlikely to be able to hide a significant proportion of the memory latency, so it's likely to struggle to achieve anything close to its quoted peak.

Obviously there will be some tasks that are better suited to one or the other; however, when it comes to a lot of the likely applications of compute in the mobile space, you're likely to find that a GPU will easily outperform a CPU with a much higher claimed throughput.

John.
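To make the latency-hiding point a bit more concrete, here is one rough way to sketch the reasoning (all parameter values below are purely illustrative placeholders, not measurements of any particular chip): sustained bandwidth is limited by how many memory requests can be kept in flight (Little's law), and achieved FLOPS is then capped both by that bandwidth times the kernel's arithmetic intensity and by the peak ALU rate.

```python
# Rough model: a processor can keep some number of cache-line requests in
# flight. Little's law gives sustained bytes/ns (= GB/s) as
# in_flight * line_bytes / latency_ns. Achieved GFLOPS is then capped by
# arithmetic intensity * sustained bandwidth, and by the peak ALU rate.
# All parameter values here are illustrative only.

def achieved_gflops(peak_gflops, flops_per_byte,
                    in_flight_requests, line_bytes=64, latency_ns=200):
    sustained_gbps = in_flight_requests * line_bytes / latency_ns
    return min(peak_gflops, flops_per_byte * sustained_gbps)

# A memory-heavy image-processing kernel, say 2 FLOPs per byte touched:
cpu = achieved_gflops(peak_gflops=8.0, flops_per_byte=2, in_flight_requests=4)
gpu = achieved_gflops(peak_gflops=6.4, flops_per_byte=2, in_flight_requests=32)
print(cpu, gpu)  # 2.56 6.4 -- the many-threaded side gets far closer to its peak
```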
 
There's nothing wrong with that - especially if the car is nearly half your age!
There are plenty of things wrong with that (and I'm saying that as somebody who almost bought a Ginetta once for daily driving), but I don't think this forum is the place to discuss them, so I apologize for bringing up the subject (even though it was relevant to the topic) and I'll shut up now.
 
Here are some shots of the 3DS version of Super Street Fighter IV taken from the recent Comic Con event:


[screenshots]


It looks like a very impressive conversion.

Oh, and in case people suspect these are just PS3/360 shots, they're clearly not; see this direct comparison of the same scene:


3DS: [screenshot]

PS3/360: [screenshot]

The assets are clearly very slightly tweaked and the shading isn't quite the same, and there are noticeable aliasing artifacts. These are the real deal.
 
Comparing GFlops of a CPU to GFlops of a GPU isn't particularly informative.

Many of the applications that you might want to accelerate with the GPU lend themselves to massively parallel threading, which the GPU takes advantage of to hide memory latency, allowing it to achieve a significant proportion of its quoted GFLOPS. However, on the CPU you're unlikely to be able to hide a significant proportion of the memory latency, so it's likely to struggle to achieve anything close to its quoted peak.

Actually, all available data points to the reverse. CPUs tend to get a higher percentage of their peak than GPUs, with differences ranging from 30% to 6x. Look at it this way: GPUs have trouble getting 50% of peak on Linpack, which is about the easiest thing in the world to make parallel.
 
Actually, all available data points to the reverse. CPUs tend to get a higher percentage of their peak than GPUs, with differences ranging from 30% to 6x. Look at it this way: GPUs have trouble getting 50% of peak on Linpack, which is about the easiest thing in the world to make parallel.

Linpack is not representative of the types of algorithms that are likely to be run on a mobile GPU, and its results are counter to the majority of the image processing algorithms I've looked at to date.

John.
 