NV got DS2 contract according to BSN

I won't go into detail, but the sticking point is rendering in emulation.

You may be thinking that 3D acceleration is suitable for this, but unfortunately OpenGL ES 2 is not well suited to this task, both due to technical incompatibilities and trends towards high overhead for certain tasks (like reading back the framebuffer into CPU visible userspace memory). OpenCL et al may prove differently but I'm not sure how useful it'll be in the most immediate next generation of handhelds.
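To make the overhead concrete: the only portable readback path in ES2 is glReadPixels, and it is synchronous, so its cost is the whole pipeline's latency rather than just the copy. A minimal sketch, assuming a current GLES2 context (the 256x192 size is just illustrative, DS-sized):

/* Synchronous framebuffer readback in GLES2. The driver must flush and
 * wait for every queued command to retire before copying pixels out. */
#include <GLES2/gl2.h>
#include <stdint.h>

void read_back_frame(uint8_t *dst /* caller-owned, 256*192*4 bytes */)
{
    /* GL_RGBA + GL_UNSIGNED_BYTE is the one combination GLES2
     * guarantees to support for glReadPixels. */
    glReadPixels(0, 0, 256, 192, GL_RGBA, GL_UNSIGNED_BYTE, dst);
}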

It's not a game breaker or anything, I'd just prefer to have NEON.

Which handheld parts have separate video memory? Tegra is unified, and there are extensions for things like render to vertex buffer etc. There may be issues implementing an emulator, but memory hierarchy isn't one of them.
 
At the cost of wrecking your state batching, which remains one of the most critical things for pretty much all architectures out there.
State multiplicity is addressable by equally simple measures (texture atlasing, shader reuse, etc).
 
Which handheld parts have separate video memory? Tegra is unified, and there are extensions for things like render to vertex buffer etc. There may be issues implementing an emulator, but memory hierarchy isn't one of them.

I didn't say non-unified, I said CPU userspace (this is the key word) visible, provided by the drivers. Being unified doesn't alleviate this in typical use cases because the rendering pipeline on platforms like SGX is very deep. So at any given moment you expect the 3D rendering to be quite a ways behind you - closing the loop on this and waiting for the framebuffer turns all that latency into bandwidth throttling.

How much this is physically mandated is uncertain to me, but current drivers certainly don't perform well on framebuffer reads. Go ahead and try this on any IMG-sporting handheld out there; I'll be thrilled if you get something within usable overhead (let's say < 2ms for some decent resolution and scene complexity). Even a decently asynchronous solution would be interesting, but only about as useful as the CPU's ability to get ahead of things, which in this case is not very far.
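For anyone who wants to try that measurement, a minimal harness, assuming a current GLES2 context; time_readback is just an illustrative helper, and the numbers will of course be driver- and scene-specific:

#include <GLES2/gl2.h>
#include <stdio.h>
#include <time.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Call right after submitting a frame's draw calls: the measured time
 * then includes the implicit wait for all of them to retire, which is
 * the cost an emulator actually pays when it closes the loop. */
void time_readback(int w, int h, unsigned char *dst)
{
    double t0 = now_ms();
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, dst);
    printf("glReadPixels %dx%d: %.2f ms\n", w, h, now_ms() - t0);
}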
 
Clearly not that addressable, as batch size remains one of the biggest issues for game engines.
There's very little a game engine can do there. It's mainly about smart production pipelines and purposeful scene/asset design. As long as those are not in place (and game engines do not generate their own assets) there will always be a problem with state batching.
 
There's very little a game engine can do there. It's mainly about smart production pipelines and purposeful scene/asset design. As long as those are not in place (and game engines do not generate their own assets) there will always be a problem with state batching.

Exactly, and doing any kind of loose front to back sort will only compound the problem, so games generally don't do it.
 
I didn't say non-unified, I said CPU userspace (this is the key word) visible, provided by the drivers. Being unified doesn't alleviate this in typical use cases because the rendering pipeline on platforms like SGX is very deep. So at any given moment you expect the 3D rendering to be quite a ways behind you - closing the loop on this and waiting for the framebuffer turns all that latency into bandwidth throttling.

How much this is physically mandated is uncertain to me, but current drivers certainly don't perform well on framebuffer reads. Go ahead and try this on any IMG-sporting handheld out there; I'll be thrilled if you get something within usable overhead (let's say < 2ms for some decent resolution and scene complexity). Even a decently asynchronous solution would be interesting, but only about as useful as the CPU's ability to get ahead of things, which in this case is not very far.

I can see where SGX would cause you to jump through a few extra hoops (I'm very familiar with chunkers)

Tegra isn't a chunker, and in any case there are very few use cases where you are constrained by chained data dependencies (i.e. you can usually read buffer N after sending commands for N+1 so at least the pipeline doesn't stall out). There are Tegra extensions to render directly to a mappable buffer, so apart from some page-table munging you get copy-free access to rendering results, and I seem to recall they support async readbacks via one of the ARB extensions though I'd have to research that a bit more. The biggest win is that with GLES2.0 you can usually write shaders that handle everything in the pipeline once the initial data has been sent, freeing the CPU to run game logic etc.
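A sketch of that "read N after submitting N+1" scheme, assuming a GLES2 context and two already-created FBOs; fbo[], render_frame() and consume_pixels() are hypothetical emulator hooks, not any real API:

#include <GLES2/gl2.h>

extern GLuint fbo[2];                                /* set up elsewhere */
extern void render_frame(int frame);                 /* hypothetical */
extern void consume_pixels(const unsigned char *px); /* hypothetical */

void run_frames(int w, int h)
{
    static unsigned char px[1024 * 1024 * 4]; /* assumes w*h <= 1M */
    for (int frame = 0; ; ++frame) {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo[frame & 1]);
        render_frame(frame);              /* submit frame N's commands */
        if (frame > 0) {
            /* Read back the previous frame's target; the GPU has had a
             * whole frame to finish it, so the wait inside glReadPixels
             * is much shorter than for a same-frame readback. */
            glBindFramebuffer(GL_FRAMEBUFFER, fbo[(frame - 1) & 1]);
            glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, px);
            consume_pixels(px);
        }
    }
}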
 
Exactly, and doing any kind of loose front to back sort will only compound the problem, so games generally don't do it.
Spatial sorting becomes unavoidable the moment you have translucent elements in your scene. The more of those there are, the more your scene's draw order will be governed by spatial properties and less by state batching.
 
Spatial sorting becomes unavoidable the moment you have translucent elements in your scene. The more of those there are, the more your scene's draw order will be governed by spatial properties and less by state batching.

Engines generally handle translucency completely differently to opaque data. Alpha test also continues to be used extensively (for example in foliage simulations) to avoid unnecessary sorting. Translucency itself is also often handled without Z updates in order to avoid the need to sort, although some effects might be implemented with a loose sort. The fact remains that modern applications still tend to sort by state, not depth; we still see this in the data that the drivers receive from those applications, and I don't see that there is any debate over this.
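In ES2 terms the submission order described above comes down to a handful of state toggles. A minimal sketch, assuming a current context, with draw_opaque/draw_alpha_tested/draw_blended as hypothetical stand-ins for the actual draw calls:

#include <GLES2/gl2.h>

extern void draw_opaque(void);       /* hypothetical */
extern void draw_alpha_tested(void); /* hypothetical */
extern void draw_blended(void);      /* hypothetical */

void draw_scene(void)
{
    glEnable(GL_DEPTH_TEST);

    glDepthMask(GL_TRUE);            /* opaque + alpha test write Z */
    glDisable(GL_BLEND);
    draw_opaque();                   /* sorted by state, not depth */
    draw_alpha_tested();             /* shader discards cut-out pixels */

    glDepthMask(GL_FALSE);           /* no Z updates for translucency, so
                                        wrong order can't punch holes via Z */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    draw_blended();                  /* a loose sort at most */

    glDepthMask(GL_TRUE);            /* restore for the next frame */
}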

John.
 
I can see where SGX would cause you to jump through a few extra hoops (I'm very familiar with chunkers)

Tegra isn't a chunker, and in any case there are very few use cases where you are constrained by chained data dependencies (i.e. you can usually read buffer N after sending commands for N+1 so at least the pipeline doesn't stall out).

In practice for games I'm sure you're right, but we're talking about emulating platforms that have, for instance, the capability to read back the framebuffer. Determining when the emulated system really needs the data is problematic.

There are Tegra extensions to render directly to a mappable buffer, so apart from some page-table munging you get copy-free access to rendering results, and I seem to recall they support async readbacks via one of the ARB extensions though I'd have to research that a bit more.

I don't doubt that Tegra is better at this, and nVidia tends to be more proactive regarding extensions. I realize that this thread is about Tegra, but the notion of developing Tegra-specific software for some platforms and some other solution for others isn't exactly appealing, especially when I don't expect Tegra to dominate. For game vendors, maybe; for emulator authors, less likely.

The biggest win is that with GLES2.0 you can usually write shaders that handle everything in the pipeline once the initial data has been sent, freeing the CPU to run game logic etc.

OGL ES 2 shaders just don't cover everything when emulating these platforms.
 
Engines generally handle translucency completely differently to opaque data..
If by completely differently you mean translucent data coming after opaque, sure. From the POV of draw logic, everything undergoes the same draw sorting by (usually) multiple keys - normally a combination of state, spatial, and other characteristics.

..alpha test also continues to be used extensively (for example in foliage simulations) to avoid unnecessary sorting.
Unfortunately alpha test does not address translucency. Alpha test is a scissor operation by its nature, not a blending one. The foliage example is not a good one, as foliage is normally opaque in its visible pixels. The one translucent-heavy scenario where depth sorting is omitted is particles. In most other cases loose sorting of translucent draws is unavoidable.
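To make the scissor point concrete, here is what "alpha test" looks like in ES2, where it is a discard in the fragment shader: a pixel is either fully kept or fully killed, with no blending against what is behind it. The sampler/varying names and the 0.5 threshold are just illustrative:

/* Minimal GLES2 fragment shader illustrating alpha test as a cut-out
 * operation rather than a blending one. */
static const char *alpha_test_fs =
    "precision mediump float;              \n"
    "uniform sampler2D tex;                \n"
    "varying vec2 uv;                      \n"
    "void main() {                         \n"
    "    vec4 c = texture2D(tex, uv);      \n"
    "    if (c.a < 0.5) discard;           \n"
    "    gl_FragColor = c;                 \n"
    "}                                     \n";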

Translucency itself is also often handled without Z updates in order to avoid the need to sort, although some effects might be implemented with a loose sort.
The majority of the translucent scenes/assets coming from artists to me here would go to hell if they were not depth-sorted.

The fact remains that modern applications still tend to sort by state, not depth; we still see this in the data that the drivers receive from those applications, and I don't see that there is any debate over this.
I'm not disputing that *some* modern applications may get away with sorting only by state. That does not mean others don't do other things. From my POV, I don't see the point of debating spatial sorting - it's a necessary operation that addresses fundamental problems of current rasterizers. Conversely, the sort-by-state problem can be (and, in my experience, often is) addressed by deliberate state-multiplicity control at stages of the pipeline much earlier than drawing.
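For what it's worth, the two positions compose in code: a typical multi-key sort just packs different priorities into one key per draw, state-major for opaque and depth-major for blended. A minimal sketch; the key layout and helper names are assumptions, not any particular engine's:

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t key;     /* packed sort key */
    int      draw_id; /* index into the actual draw list */
} DrawCmd;

/* Opaque draws: state-major key, so batches survive the sort. */
static uint64_t opaque_key(uint32_t shader, uint32_t texture)
{
    return ((uint64_t)shader << 32) | texture;
}

/* Blended draws: depth-major key, back to front (larger view-space
 * distance sorts first); assumes view_z >= 0. */
static uint64_t blended_key(float view_z)
{
    return UINT64_MAX - (uint64_t)(view_z * 1024.0f);
}

static int cmp_key(const void *a, const void *b)
{
    uint64_t ka = ((const DrawCmd *)a)->key;
    uint64_t kb = ((const DrawCmd *)b)->key;
    return (ka > kb) - (ka < kb);
}

void sort_draws(DrawCmd *cmds, size_t n)
{
    qsort(cmds, n, sizeof *cmds, cmp_key);
}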
 
I doubt they have sufficient design expertise for semiconductors, let alone the IP. Even Apple doesn't do much of the IP.
 
Since Apple's in-house designed the A4, is there a slight chance Nintendo could also develop their own chip with only a manufacturer to print them? (in other words, no ATI/AMD/nVidia involvement)

Putting aside your reference to Apple, yes, Nintendo could license intellectual property (IP) from various sources and use those building blocks to assemble a suitable device, either themselves or by outsourcing the task. Their volumes are certainly large enough to make this feasible, and their coffers more than deep enough.
However, using something like an OMAP, a Snapdragon, or indeed a Tegra could conceivably save them time to market, for instance. This is not a given though, particularly if they want backwards compatibility, in which case they either have to spend time writing a really solid and performant emulator without corner cases or, just as likely, simply drag the old hardware along with the new (a la the early PS3).

Nintendo has said that the new device would be backwards compatible. One way of achieving that would be to extend their current architecture rather than emulate or duplicate their old hardware. This wouldn't earn them any architectural beauty pageant points, but it would be a rational option.
 
Since Apple's in-house designed the A4, is there a slight chance Nintendo could also develop their own chip with only a manufacturer to print them? (in other words, no ATI/AMD/nVidia involvement)

We don't actually know who did the 3D in the DS, but it doesn't even moderately resemble any existing design from AMD, nVidia, IMG, etc., so it's pretty moot. I believe ARM themselves implemented the GBA's 2D (probably under very specific technical specifications from Nintendo), so it's not impossible that they had similar involvement in implementing the DS's 3D, which extends/duplicates the 2D portions and implements 3D in a manner that shares a lot in common with traditional 2D sprite rasterizers.

So doing handheld 3D that doesn't resemble a big licensed design is certainly not without precedent for Nintendo.

Entropy said:
Nintendo has said that the new device would be backwards compatible. One way of achieving that would be to extend their current architecture rather than emulate or duplicate their old hardware. This wouldn't earn them any architectural beauty pageant points, but it would be a rational option.

I talked a lot about this earlier in the thread; this is certainly the Nintendo approach to things (it applies to both DS and Wii, and to a lesser extent GBA). The belief is that the DS die itself is not very big and that some portions, such as RAM blocks, could easily be shared in another design, so Nintendo might not have a problem bolting it on the side.

The way I see it, with Nintendo advertising the glasses-free 3D display and making no mention of high-performance graphics, it would seem they're once again not very interested in that. And if they really don't care about high-performance graphics, then I doubt they'd want to go for an nVidia license.
 
I can see where SGX would cause you to jump through a few extra hoops (I'm very familiar with chunkers)

I'd be surprised if you weren't, with that screen name. It reminds me a lot of the Bitboys-style smoke and mirrors of the past.
 