Modern GBA port of Original 3D Tomb Raider

milk

I know a whole new thread about one recent homebrew "tech demo" for the niche scene of contemporary GBA development might feel a bit uncalled for, but this one seems like such an exceptional and impressive instance that I thought it might be new-thread-worthy enough to bring to the attention of some retro-programming enthusiasts and spark some potentially interesting conversation.

So, some wizard managed to port the original Sega Saturn/PS1/PC Tomb Raider to the GBA. Not a demake, but an actual port running the original game (or rather, the open-source, fan-made, reverse-engineered reimplementation of the game) on the old Game Boy Advance. That means running the code and rendering the game's 3D polygon-based environments and characters entirely in software on the GBA's relatively weak (for the given task) processor. This is a system with no 3D rendering acceleration, meant to run SNES/Mega Drive-level 2D games, which had a graphics mode for fully software-driven graphics mostly as a bonus feature for edge cases. Oh, and the GBA also lacks robust sound hardware, so most sound has to be mixed in software too.

The port is still using the original assets, and it does run at playable speeds. Talk about a case-study of extreme software optimization.

 
Well, if you think about a 1996 PC running the game in DOS with plain framebuffer access, the GBA hardware doesn't seem that bad...
 
Eh, I'm not so sure. Compared to the GBA hardware, PCs in 1996 were highly likely to have floating point processors, a wider 32-bit datapath to a bigger and faster main memory pool, helped by bigger and faster L1 and potentially L2 caching tiers, and SVGA cards with basic VESA offload abilities backed by (again, compared to the GBA) a decent-sized framebuffer.
 

i486sx CPUs that were lacking FPU capabilities were still fairly common. The Cyrix 6x86 was also relatively common around that time with abysmal FP performance but pretty good integer performance. AMD's K5 and K6 CPUs also weren't really competitive with Intel on the floating point front, although they were at least better than the Cyrix 6x86. Most of the consumer market was made up of these CPUs. Pentiums were quite rare due to their high price and i486dx CPUs were less common in consumer households than their sx counterparts.

So, I wouldn't necessarily say "highly likely". For PC gamers it was certainly more common to find i486dx (higher budget gamers) or Pentiums (high end enthusiast machines). But at least for the LAN gaming group (roughly around 100 people) that I was a part of, probably half or a bit more of them were running a CPU that was either lacking FPU (i486sx) or with pretty bad FP performance (Cyrix and AMD CPUs).

Lack of FP performance wasn't all that much of a hindrance in most PC games of the time. Exceptions being Flight Simulators and a few other games.

Regards,
SB
 
i486sx CPUs that were lacking FPU capabilities were still fairly common.
Remember, they were missing hardware floating point; however, even the 386 had floating point emulation in hardware. Granted, it certainly wasn't fast, but it was still substantially faster than the literally-no-floating-point-at-all GBA. Even if you want to hand-wave off the FPU emulation, the 286 and above supported integer division, which is also absent from the GBA CPU. Despite being hardware emulated, the floating point and integer division functions in x86 would radically accelerate the OpenLara platform over the timeline-equivalent GBA hardware.

You also missed the part where literally everything else surrounding the CPU was still faster on the PC platform, too ;)
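
For readers wondering what "no floating point and no integer divide" means in practice, here is a minimal C sketch of the usual workaround: fixed-point arithmetic plus a reciprocal table. This is not taken from OpenLara; the names (`fx16`, `fx_mul`, `fx_div_small`, `recip_table`) and the 16.16 format are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits are the integer part, lower 16 bits
   the fraction. All names here are hypothetical, for illustration only. */
typedef int32_t fx16;
#define FX_SHIFT 16
#define FX_ONE   ((fx16)1 << FX_SHIFT)

static fx16 fx_from_int(int32_t v) { return (fx16)(v * FX_ONE); }

/* Multiply through a 64-bit intermediate so the product does not overflow
   before the shift. Assumes >> on a negative value is an arithmetic shift,
   which holds on common compilers but is implementation-defined in C. */
static fx16 fx_mul(fx16 a, fx16 b) {
    return (fx16)(((int64_t)a * b) >> FX_SHIFT);
}

/* Division by a small positive integer without a divide instruction:
   multiply by a precomputed 16.16 reciprocal instead. */
#define RECIP_MAX 256
static fx16 recip_table[RECIP_MAX];

static void fx_init_recip(void) {
    /* The divisions happen once, up front (or offline), not per pixel. */
    for (int i = 1; i < RECIP_MAX; i++)
        recip_table[i] = FX_ONE / i;
}

static fx16 fx_div_small(fx16 a, int d) {
    assert(d > 0 && d < RECIP_MAX);
    return fx_mul(a, recip_table[d]);   /* approximate: reciprocal is truncated */
}
```

After calling `fx_init_recip()` once, `fx_div_small(fx_from_int(10), 2)` gives exactly 5.0 in 16.16, while a divisor such as 3 comes out a few fractional units low because the stored reciprocal is truncated. That trade of a little precision for speed is the point of the technique.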
 
Was the N-Gage more powerful than the GBA?
The N-Gage had a 104MHz ARM CPU while the GBA had a 16.7MHz ARM CPU. I don't know enough about the various ARM variants from back then, and they use really strange model numbers (I think some with higher model numbers are older or worse), but I have a hard time believing that the GBA's CPU can make up enough speed through a model or two's difference in instruction set to match the clock speed of the N-Gage. I also doubt Nintendo would spring for the newest processor in the stack, and given that the N-Gage was released later and used a fairly performant cell phone/PDA CPU, I would guess it's faster. The N-Gage would have to run the phone's OS, though, and I don't think it has any specialized graphics hardware.
 

What I find most impressive is running it with the little memory the GBA has. I have a feeling that the difference in CPU processing power isn't _that_ dramatic, and e.g. an integer multiply, given its cycle count, might actually be faster in wall clock time on the GBA than on a 486...
 
Oh, we are in complete agreement on how impressive this is on the GBA hardware, no questions whatsoever. My statements about x86 being bigger, faster and "better" than the GBA in this same timeframe were intended as a testament to how insanely good this work is to get OpenLara even working on this platform, let alone at a playable framerate.

I'm curious if we can find any cycle-count details on that old GBA processor regarding specific ops, such as the integer multiply. Got any good sources to look at? I'm not knowledgeable at all in that space...
 
Thank you! Looks like a signed 8-bit integer MUL could complete in as little as two cycles (one for the fetch, one for the MUL), which is RADICALLY faster than I expected to find. Looks like I have some walking back to do!

Comparison: this is whoppingly faster than the 386 without a math coprocessor, which needed a minimum of 9 clocks <!> to do the same. Source: 80386 Programmer's Reference Manual -- Opcode MUL (mit.edu)
The 486SX line is actually slower than the 386 line, needing a minimum of 14 <!!> cycles. Source here: http://www.penguin.cz/~literakl/intel/m.html

However, there are faster ways to get this done... Fast multiplication by bit-shifting can carve off a LOT of this time, cutting it to half or less of the "native" MUL time. Source: http://users.rowan.edu/~tang/courses/ref/Assembly/Lecture 7.ppt
The examples in there show a 486 doing the same integer math op in as little as six cycles.
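
As a rough illustration of the bit-shifting idea (a sketch in C, not the code from the linked slides; the function names are made up for this example), a multiply by a known constant can be rewritten as shifts and adds, and a general multiply can be built from one add per set bit of the multiplier:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by the constant 10 using two shifts and one add:
   10*x = 8*x + 2*x. */
static uint32_t mul10_shift(uint32_t x) {
    return (x << 3) + (x << 1);
}

/* General shift-and-add multiply: walks the bits of b from least to most
   significant and adds the correspondingly shifted a for every set bit.
   The result wraps modulo 2^32, like an ordinary unsigned multiply. */
static uint32_t mul_shift_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Small self-check showing the intended use. */
static void mul_self_check(void) {
    assert(mul10_shift(7) == 70);
    assert(mul_shift_add(13, 11) == 143);
}
```

The constant case is where the savings come from: the shift amounts are known at compile time, so there is no loop. The general loop is shown only to make the principle visible and would not by itself beat a hardware MUL.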

The math coprocessors for the 386 and 486 were mostly focused on floating point acceleration, although the 486DX / i487 FPU did squeeze two more cycles out of the unsigned MUL.

TL;DR version: The ARMv4 proc in the GBA was pretty damned fast at multiplying compared to the x86 stuff at the time. With the GBA running at 16.7MHz and a 486 running at 33MHz, the GBA would still be faster at integer multiplies. Not until the double-clocked 486/66MHz and later lines would the Intel processors finally be able to outperform (in certain cases) the ARMv4 on integer MULs.
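
To make the wall-clock comparison concrete, here is a tiny C helper (the name `op_time_ns` is just for this sketch) that turns a cycle count and a clock speed into nanoseconds. It uses only the best-case cycle counts quoted in this thread and assumes no memory wait states or pipeline stalls, so treat the results as ballpark figures.

```c
#include <assert.h>

/* Time for one operation in nanoseconds.
   cycles / (clock in MHz) gives microseconds; times 1000 gives ns. */
static double op_time_ns(unsigned cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

Plugging in the numbers from above: `op_time_ns(2, 16.7)` is about 120 ns for the GBA, `op_time_ns(14, 33.0)` is about 424 ns for a native 486 MUL at 33MHz, and `op_time_ns(6, 33.0)` is about 182 ns for the six-cycle shift version. Only at 66MHz does the six-cycle version, at about 91 ns, get ahead of the GBA.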

I hereby retract my statements and concede to the power of the ARM :D :D
 