NV got DS2 contract according to BSN

If you have plenty of computing power available and only free time to spend, you write an interpreter; if you get paid to create a BC layer for a new architecture, you write a binary translator. Two different things.
Either of these addresses mainly the CPU emulation, though. I believe what Exophase was getting at (hey, Exophase) was that the overall DS architecture is fairly complex. Which I can only agree with - it's an asymmetric, highly exotic piece of pocket tech. The GPU is particularly exotic.
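To make the interpreter/translator distinction concrete for readers following along, here's a toy sketch (an invented two-opcode ISA of my own, nothing to do with actual ARM/DS code): an interpreter pays the decode-and-dispatch cost on every executed instruction, while a translator pays it once up front.

```c
#include <assert.h>
#include <stdint.h>

/* Toy guest ISA: opcode 0 = r0 += imm, opcode 1 = r0 -= imm. */
typedef struct { uint8_t op; int32_t imm; } Insn;

/* Interpreter: decode and dispatch every single time an instruction runs. */
int32_t interpret(const Insn *code, int n) {
    int32_t r0 = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case 0: r0 += code[i].imm; break;
        case 1: r0 -= code[i].imm; break;
        }
    }
    return r0;
}

/* Translator, in spirit: decode once up front into a cheaper form.
 * A real binary translator would emit host machine code here instead. */
typedef struct { int32_t delta; } Translated;

void translate(const Insn *code, int n, Translated *out) {
    for (int i = 0; i < n; i++)
        out[i].delta = (code[i].op == 0) ? code[i].imm : -code[i].imm;
}

int32_t run_translated(const Translated *t, int n) {
    int32_t r0 = 0;
    for (int i = 0; i < n; i++)
        r0 += t[i].delta;   /* no decode, no dispatch */
    return r0;
}
```

The tradeoff is visible even at this scale: `translate` costs extra work and memory once, then every subsequent run of the block is cheaper, which is why it's the route you take when performance is a hard requirement.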
 
If you have plenty of computing power available and only free time to spend, you write an interpreter; if you get paid to create a BC layer for a new architecture, you write a binary translator. Two different things.

So you think paid people are the only ones capable of writing a recompiler for DS? Interesting.

"Binary translator" is not magic. See the performance deltas in Mupen64plus for Cortex-A8, which is employing one of the best dynarecs out right now.

Nintendo may have lots of money, but they can't hire passion. So far if I were to compare Nintendo's emulation showing (Virtual Console, Zelda bonus disk, lack of software emulation for one-off generations) vs the best emulator authors I know of I would say they're not sitting on top of the mountain. There's a reason why Sega has a history of poaching the best independent emulator authors to do emulators for them. This doesn't appear to be something Nintendo has picked up yet, or at least I see no solid evidence of it.

For that matter, if you compare the officially available DS emulator, Ensata, to what enthusiasts - often single people such as Martin Korth - have done then Nintendo's money is way behind. From what I hear Ensata is pretty much useless and loses in every way to the big players in DS emulation that are out right now.

Simply put, you might be surprised at what money alone can't do.
 
Actually, the more I think of the problem the more I start to wonder if Nintendo would not just go with a (partial) silicon carry-over this time. Do we have any idea what the DS GPU die area is? And that's sans the multitude of memory banks, as those can actually be 'borrowed' from non-BC silicon.
 
What I'm curious about is what the backwards compatibility situation will be like if Nintendo indeed uses Tegra 2, or really anything "Gamecube level" like the rumors have specified.

Nintendo has already announced that 3DS will be backwards compatible with DS. Until now Nintendo has only implemented previous-generation backwards compatibility with compatible hardware, and with the current generation (DS and Wii) Nintendo has done this by augmenting the previous-generation hardware (GBA and Gamecube, respectively).

If Nintendo chooses to augment existing hardware again then it won't be using Tegra 2, or any third party IP for that matter. I also think that the basic DS design has some limitations that will make it difficult to scale significantly. ie, I don't think that it will be bumped from 2048 polygons per frame to over 100,000, which it would have to be to really be Gamecube level. This is also not taking into account the need to render things twice for stereoscopic viewing.

Another alternative is redundant hardware. Nintendo could be including the original DS 2D and 3D hardware, but I doubt this because it'd be all but useless in conjunction with a much higher end GPU. So I think it'd be a waste of die space.

Finally, there's software emulation. It's not that much less likely, but I think Nintendo would be introducing compromises, especially if they tried to emulate it with the 3D hardware of the 3DS. I'd be pretty impressed if they actually pulled off an emulator so soon after Tegra 2 would have been sampled, but it's certainly not impossible. If Nintendo has done this then it'd be a good indication of what other handhelds can do.

The "Gamecube level" rumors came a day before the official announcements, which included several details but not this one. I think its omission is somewhat telling. I wouldn't be especially surprised if 3DS hardware was based on the same core DS design and not really even Gamecube level.
Who claimed that ARM11 could handle DS emulation? I don't know what you mean by "leaving the actual GPU to do something else", but there's way more that has to be done in emulating DS than CPU. A conventional GPU won't have a very easy time emulating either the 2D or 3D of DS without missing features. At the end of the day it could probably be done, but with quite a lot of shader resources that an embedded GPU may not have. Being able to quickly read back the framebuffer into user memory is also a must. Having direct access to the GPU hardware would help, but we don't know if that's something you get, even if you're Nintendo.

DS emulators out for x86 require at least around 1.8GHz Core 2 Duo and even then tend to run with some degree of frameskip a lot of the time. I don't think Nintendo wants frameskip in their backwards compatibility. And a 1.8GHz Core 2 Duo is grossly more powerful than even the Cortex-A9s in a Tegra 2. Granted, none of these emulators are state of the art in terms of efficiency, but are you experienced enough to say exactly what it takes to emulate DS?

DS hardware actually is pretty complex, just not that powerful. And if you try to virtualize ARM code you'll run into different problems.. just "slapping on a more powerful ARM" isn't a good solution - in DS, for instance, the ARM7 runs in a full GBA compatibility mode that makes the rest of the hardware behave exactly like a GBA. The DS memory map in particular is pretty GBA compatible. This is just not something they're going to get with Tegra 2, which I doubt they'd have any leverage to customize.



Agreed completely. Though that's probably no surprise to you lol. It's not as if I disagree with you often.
 
So you think paid people are the only ones capable of writing a recompiler for DS? Interesting.

......

Simply put, you might be surprised at what money alone can't do.

Ahahah. I'd listen to him guys. He is an expert on this. Moreso than anyone else here.
 
Ahahah. I'd listen to him guys. He is an expert on this. Moreso than anyone else here.

I don't recall anyone claiming to be an expert here, but that's most definitely an interesting post especially the last sentence. Now can we get back on topic here since I'm not particularly fond of that kind of tone?
 
I don't recall anyone claiming to be an expert here, but that's most definitely an interesting post especially the last sentence. Now can we get back on topic here since I'm not particularly fond of that kind of tone?

I don't recall saying anyone claimed to be an expert. I'm just saying I know Exophase is.

There was no tone, my post was not meant to offend anyone. I find your response kind of funny, was it meant to be?
 
I don't recall saying anyone claimed to be an expert. I'm just saying I know Exophase is.

There was no tone, my post was not meant to offend anyone. I find your response kind of funny, was it meant to be?
You know Exophase, but you don't know other people, so how can you claim "He is an expert on this. Moreso than anyone else here."? And don't forget that no matter how clever one is, one will say stupid things and make mistakes (no, I don't mean to claim Exophase did so here; he raised interesting points).

I agree with Ailuros, let's get back on topic.

Regarding nVidia doing a dedicated chip for Nintendo (which could include enough of hardware to support DS games), does anyone know if nVidia ever did such a thing for another company?
 
Btw, I kept thinking this morning of what effort it would take Nintendo to carry out BC on the new device, so I'll jump right back. But first a few things I omitted yesterday.

Having direct access to the GPU hardware would help, but we don't know if that's something you get, even if you're Nintendo.
Nintendo would not have to have direct access - it wouldn't be Nintendo buying off-the-shelf silicon and then trying to figure out how to implement their target feature set. It would be the silicon vendor's task to make sure the silicon meets Nintendo's feature bill. From my experience in my end of the embedded markets, it's usually a multi-stage process of the sort:

1. Client gives rough specs to vendor, vendor decides which of their silicon could meet those. Vendor offers to client a list of candidate silicon.
2. Client checks those and gives detailed requirements to vendor, with potential contention points. From there on it's the vendor's task to meet those, be that by pushing their silicon (in drivers/HAL) in ways beyond what they normally do for the off-the-shelf market, or even customizing it if the deal justifies that.

What I'm saying is that it won't be Nintendo trying to solve the roadblocks on their own - it will be mainly the silicon vendor. Nintendo will just point out the roadblocks to the vendor.

.. in DS, for instance, the ARM7 runs in a full GBA compatibility mode that makes the rest of the hardware behave exactly like a GBA. The DS memory map in particular is pretty GBA compatible. This is just not something they're going to get with Tegra 2, which I doubt they'd have any leverage to customize.
Well, other than the fact that Nintendo has not mentioned anything about GBA compatibility, shaping memory maps to look a certain way from the POV of emulated software is not the crux of the problem. Nintendo could afford for this logic to be integrated in a unit of otherwise legitimate non-BC usage, like they did in the Wii - there the Starlet takes care of all memory mappings for BC, among other things.


Ok, let me get back now to the fun part - speculation mode.

Of the contemporary mobile GPU designs, there are a few that do very non-trivial things, like shaders exporting results to main RAM, reading inputs from weird places, etc. I think Nintendo could start with some such part, and then work with the vendor to further bring it to DS GPU emulation levels. The curious part is, none of the mobile parts I'm aware of being capable of such things comes from nvidia ; )
 
You know Exophase, but you don't know other people so how can you claim "He is an expert on this. Moreso than anyone else here."?

Quite easily lol. Even if you worked for Nintendo he'd still be more of an expert. Can't really emphasize it enough, you're talking about Exo's specialty.

I didn't mean to start an argument by saying he's an expert.

darkblu said:
Well, other than the fact that Nintendo has not mentioned anything about GBA compatibility,

GBA compatibility (processor wise, not the slot) is required for DS compatibility.
 
Nintendo would not have to have direct access - it wouldn't be Nintendo buying off-the-shelf silicon and then trying to figure out how to implement their target feature set. It would be the silicon vendor's task to make sure the silicon meets Nintendo's feature bill. From my experience in my end of the embedded markets, it's usually a multi-stage process of the sort:

1. Client gives rough specs to vendor, vendor decides which of their silicon could meet those. Vendor offers to client a list of candidate silicon.
2. Client checks those and gives detailed requirements to vendor, with potential contention points. From there on it's the vendor's task to meet those, be that by pushing their silicon (in drivers/HAL) in ways beyond what they normally do for the off-the-shelf market, or even customizing it if the deal justifies that.

What I'm saying is that it won't be Nintendo trying to solve the roadblocks on their own - it will be mainly the silicon vendor. Nintendo will just point out the roadblocks to the vendor.

Well, okay - I think my general response to this is that it only applies if nVidia is doing custom hardware for Nintendo, which I agree would make the most sense (see PS3, for instance). It's just that the buzz (in this topic, particularly) has been "Nintendo is going to be using Tegra 2 in upcoming DS", when if it's really "nVidia is doing hardware for upcoming DS" it could be anything. It makes much more sense for it to be "Tegra-like" than not, but if it's incorporating much DS functionality in hardware then it could be pretty different. It could also be much lower performance than we'd otherwise expect from something Tegra-like; Nintendo likes their chips cheap.

Well, other than the fact that Nintendo has not mentioned anything about GBA compatibility, shaping memory maps to look a certain way from the POV of emulated software is not the crux of the problem. Nintendo could afford for this logic to be integrated in a unit of otherwise legitimate non-BC usage, like they did in the Wii - there the Starlet takes care of all memory mappings for BC, among other things.

I was just using GBA compatibility as an example of how Nintendo did things in DS. Chances are the next handheld won't have GBA compatibility, like the DSi doesn't. I think the memory mapping changes between GC and Wii are small enough that this could easily work; I'm thinking of something less than a 1:1 translation.
 
So you think paid people are the only ones capable of writing a recompiler for DS?
No, I think they are the only ones who would bother.
"Binary translator" is not magic. See the performance deltas in Mupen64plus for Cortex-A8, which is employing one of the best dynarecs out right now.
Quick look ... dynamic recompilation of an addition, for instance, translates into replacing it with this:

Code:
gencheck_cop1_unusable();                  /* trap if FPU access is disabled */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.fs]));
fld_preg64_qword(RAX);                     /* push fs onto the x87 stack */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.ft]));
fadd_preg64_qword(RAX);                    /* add ft */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.fd]));
fstp_preg64_qword(RAX);                    /* pop result into fd */

It might be some of the best in the console emulator world, but it's still pretty primitive compared to IR translators like good old FX!32.
 
No, I think they are the only ones who would bother.

HA! You should find out who you are talking to lol. He has bothered. You don't need to tell him how recompilation works.

Also, Nintendo has been buddy-buddy with ATI for years.
N64 used a GPU made by a team that went to ATI after leaving SGI.
GCN used ATI
Wii used ATI.

I don't see them going with nVidia.
 
No, I think they are the only ones who would bother.

You're wrong. There is already a Nintendo DS emulator that employs dynamic recompilation, NeonDS, released a few years ago. There will most likely be more (I have perfect confidence in this, in fact..)

Quick look ... dynamic recompilation of an addition, for instance, translates into replacing it with this:

Code:
gencheck_cop1_unusable();                  /* trap if FPU access is disabled */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.fs]));
fld_preg64_qword(RAX);                     /* push fs onto the x87 stack */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.ft]));
fadd_preg64_qword(RAX);                    /* add ft */
mov_xreg64_m64rel(RAX, (unsigned long long *)(&reg_cop1_double[dst->f.cf.fd]));
fstp_preg64_qword(RAX);                    /* pop result into fd */
It might be some of the best in the console emulator world, but it's still pretty primitive compared to IR translators like good old FX!32.

Please look at the source for the emulator I mentioned, Mupen64plus, in particular its ARM output. The author of the dynarec himself says that Mupen64's original dynarec was very slow. Mupen64plus is anything but primitive. In my experience, recompilers employing intermediate languages suffer in performance vs the more sophisticated ones using annotated analysis, because the ILs tend to be a common subset (or worse) of both endpoints and end up filtering out a lot of instruction information that then has to be reconstructed through propagation analysis (or not, taking an even bigger hit).
 
... so apparently these boards don't have an edit option. Or at least I don't have one, or can't find it :<

A couple other things:

You shouldn't judge the quality of N64 recompilation based on how it does double precision floating point addition, an uncommon operation in N64 games.

Comparing FX!32 to a console emulator is pretty apples to oranges because FX!32 is a user mode emulator and a console emulator is system-wide (even for the "HLE" styles of N64 emulation). FX!32 has the benefit of being able to abstract all system hardware and the operating system itself, but at least as important is that the Windows binaries it executes will operate within a safe memory environment that can also be easily abstracted by the OS without having to have equal pointers for dynamically allocated resources. This makes memory emulation much cheaper.
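A toy sketch of that last point (invented region layout and sizes, not FX!32's or any DS emulator's actual scheme): a user-mode emulator can back the entire guest address space with one flat host allocation, while a system-level emulator has to route every single access through the guest's memory map.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-mode style: the guest address space lives in one flat host
 * allocation, so a guest load is base + offset - a single add. */
static uint8_t *flat_ram;

uint8_t user_mode_read8(uint32_t guest_addr) {
    return flat_ram[guest_addr];
}

/* System-emulator style: every access is routed through the guest's
 * memory map (RAM banks, VRAM, MMIO...), costing a lookup and a branch
 * per access even on the fast path. Region layout here is made up. */
static uint8_t main_ram[0x1000];
static uint8_t io_regs[0x100];

uint8_t system_read8(uint32_t guest_addr) {
    switch (guest_addr >> 24) {
    case 0x02: return main_ram[guest_addr & 0xFFF];  /* work RAM mirror */
    case 0x04: return io_regs[guest_addr & 0xFF];    /* MMIO: reads can have side effects */
    default:   return 0;                             /* open bus */
    }
}
```

Every emulated load and store pays the `system_read8`-style routing cost, which is a big part of why system-wide emulation is so much more expensive than user-mode translation per guest instruction.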
 
That is Mupen64plus. If you mean Ari64's port, be more specific ... here is some of his code:
Code:
void emit_add(int rs1,int rs2,int rt)
{
  if(rs1==rt) {
    assem_debug("add %%%s,%%%s\n",regname[rs2],regname[rs1]);
    output_byte(0x01);
    output_modrm(3,rs1,rs2);
  }else if(rs2==rt) {
    assem_debug("add %%%s,%%%s\n",regname[rs1],regname[rs2]);
    output_byte(0x01);
    output_modrm(3,rs2,rs1);
  }else {
    assem_debug("lea (%%%s,%%%s),%%%s\n",regname[rs1],regname[rs2],regname[rt]);
    output_byte(0x8D);
    if(rs1!=EBP) {
      output_modrm(0,4,rt);
      output_sib(0,rs2,rs1);
    }else{
      assert(rs2!=EBP);
      output_modrm(0,4,rt);
      output_sib(0,rs1,rs2);
    }
  }
}
More efficient mapping, but still primitive ... IRs don't have to be extremely high level, but without at least trying to get back to an infinite register set representation, optimization is very hard. AFAICS the assembler doesn't really do any optimization in Mupen64plus. As I said though, just a quick look.
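For the curious, here's roughly what "getting back to an infinite register set" buys you, on a made-up three-field IR (nothing to do with FX!32's real internals): once values live in unlimited virtual registers, an optimization like constant folding becomes a trivial scan.

```c
#include <assert.h>
#include <stdint.h>

/* Toy SSA-ish IR: every op writes a fresh virtual register (its own
 * index), so there are as many "registers" as needed; squeezing them
 * into the finite host register file is deferred to allocation. */
typedef enum { IR_CONST, IR_ADD } IrOp;
typedef struct { IrOp op; int32_t imm; int src1, src2; } IrInsn;

/* Constant folding: when both inputs of an ADD are known constants,
 * replace the ADD with a constant. On virtual registers this is a
 * trivial scan; on already-emitted 1:1 host code it is much harder. */
int fold_constants(IrInsn *ir, int n) {
    int folded = 0;
    for (int i = 0; i < n; i++) {
        if (ir[i].op == IR_ADD &&
            ir[ir[i].src1].op == IR_CONST &&
            ir[ir[i].src2].op == IR_CONST) {
            ir[i].imm = ir[ir[i].src1].imm + ir[ir[i].src2].imm;
            ir[i].op  = IR_CONST;
            folded++;
        }
    }
    return folded;
}
```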
 
I'd like to know a more efficient transformation of add than an add/lea, save for one eliminated entirely due to liveness analysis. Such optimizations are usually wasted on recompilers working from code generated by a compiler that's anything less than terrible.

There are other propagation optimizations that actually do make sense merely cross-arch, and it's not as if Ari64's recompiler doesn't make use of any (ie, constant propagation). Or is there something else you're looking for here? Is it scheduling? What is it that you allege FX!32 is doing?
 
I'd like to know a more efficient transformation of add than an add/lea
That's not really my point, my point is that there is 1:1 mapping without higher level optimizations.
Or is there something else you're looking for here? Is it scheduling? What is it that you allege FX!32 is doing?
It does some block level transformations for stack manipulation and to try to get rid of needless register spills through memory on the IR, and it does indeed reschedule the assembly (or rather the IR, but not a huge difference).

BTW, the original post I responded to claimed DS emulators require a very fast PC (and thus emulation for BC is not an option). To what extent is that true for NeonDS, which you mention uses BT (regardless of whether it's primitive or not, it's still at least not an interpreter like the more popular ones)?
 
That's not really my point, my point is that there is 1:1 mapping without higher level optimizations.

There is translation that is beyond 1:1 instruction mapping though, for instance register allocation/liveness analysis for register file writeback, constant propagation, 32-bit reduction analysis, etc.
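To illustrate the first of those on a made-up block format (not Ari64's actual data structures): liveness analysis for register-file writeback just asks whether a guest register gets overwritten before its next read, in which case the intermediate writeback to the in-memory register file can be skipped.

```c
#include <assert.h>

/* Toy block format: each translated step reads one guest register and
 * writes one. Real blocks read/write sets of registers, but the
 * liveness question is the same. */
typedef struct { int reads; int writes; } Step;

/* Returns 1 if the value a guest register holds after step 'pos' is
 * dead, i.e. it is overwritten before being read in the rest of the
 * block; a dead value never needs a register-file writeback. */
int dead_after(const Step *block, int n, int pos, int reg) {
    for (int i = pos + 1; i < n; i++) {
        if (block[i].reads == reg)  return 0;  /* read first: live */
        if (block[i].writes == reg) return 1;  /* overwritten: dead */
    }
    return 0;  /* value survives the block: must be written back */
}
```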

It does some block level transformations for stack manipulation and to try to get rid of needless register spills through memory on the IR, and it does indeed reschedule the assembly (or rather the IR, but not a huge difference).

What you're describing is basically just memory->register allocation, which is only a worthwhile optimization if you have more registers in the target language. This is not the case with N64->x86 (or ARM).

On most modern x86 with decent OOE, hand scheduling doesn't really do that much, especially if you avoid heavier instructions that would overburden the decoders (and that are generally not that useful either, unless you're especially fetch limited instead). You might benefit on Atom, but the two-operand addressing and overall lack of registers would make things pretty painful, when your emulated register file is already so much larger than your target one.

BTW, the original post I responded to claimed DS emulators require a very fast PC (and thus emulation for BC is not an option). To what extent is that true for NeonDS which you mention uses BT (regardless whether it's primitive or not, it's still at least not an interpreter like the more popular ones).

I think you're drawing a little more from what I said than what I intended. I said that software emulation could involve compromises and might prove itself difficult, not that it's "not an option" - bear in mind, this is for far more reasons than just CPU emulation. sfried responded by saying that it is well claimed that ARM11 class CPUs can emulate DS, and I disputed this. He used the fact that available DS emulators can run full speed on PCs as evidence to this, and I said that it's irrelevant since these emulators require much higher spec computers. I didn't say DS emulation requires a very fast PC.

Unfortunately I don't know how NeonDS ran, and it's a bit of a loose comparison since it used inaccurate hardware accelerated 3D and didn't emulate sound, and I doubt the compatibility was that amazing. I just brought it up to show that enthusiasts are in fact willing to program a DS emulator with dynamic recompilation, and I know of two more that are in development.
 