Normal Mapping Wii demo

What a nice project!

I imagine you want to take a stab at this yourself, but if you need any pointers I'll be happy to lend a hand or a snippet of code. :)

Hi Volmer, I couldn't download the file from the link you posted somehow (it says it's a 15KB file, but nothing happens when I open it). But I'd be happy to receive some of your snippets, since the only other source I have on this is YAGD. Besides, I have run into so many issues with the hardware and my own bugs that it would be nice if, for once, I didn't have to spend weeks getting this stuff to work. I have sent you a private message with my mail address.

Normal maps indeed don't compress well. I have read some articles on Doom 3. Their 'simple' solution for maintaining a low memory footprint was using low-res normal and gloss maps, but even then they required 64MB of video memory. Makes me kind of wonder how they fitted it into the Xbox version of Doom.

BTW, at this point I'm experiencing some dot product problems with the alias shadow optimizations. Some triangles that are backfacing (to the light) get through the checks, resulting in annoying gaps in the shadow since they draw inverted volume walls (try to debug a 400-polygon model, not funny at all). So if I can't find the cause quickly enough I'll just release a version which allows for disabling alias shadows.
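For reference, the facing check boils down to something like the sketch below; a minimal sketch assuming Quake-style per-triangle planes (the struct and function names are illustrative, not the demo's actual code):

[code]
/* Minimal sketch of the usual light-facing test for shadow volumes.
 * Assumes a Quake-style plane (normal + dist) per triangle; names are
 * illustrative, not taken from the actual demo code. */
#include <stdbool.h>

typedef struct { float normal[3]; float dist; } plane_t;

static float DotProduct3(const float a[3], const float b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

/* A triangle faces the light if the light position lies on the front
 * side of the triangle's plane. Only triangles facing the light should
 * contribute volume walls; letting backfacing ones through is what
 * produces inverted walls and gaps in the shadow. */
static bool TriangleFacesLight(const plane_t *p, const float lightpos[3])
{
    return DotProduct3(p->normal, lightpos) - p->dist > 0.0f;
}
[/code]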
 
Mmm, it turned out I had to restart IE... I have the file now.

Ok, here is my BETA release. It is still full of issues, but it works. Here is what you need to run it:
* WiiBrew Quake GX.
* Water vissed maps.

Then download http://www.sdengineering.nl/quake/boot.dol and replace the file in the Quake GX folder.


Issue list:
* framerate: 10-30fps, depending on the amount of overdraw and complexity. For example, E1M6 has lots of overdraw, which results in too many non-visible lights being rendered. I'm currently working on implementing portals to determine visibility. According to some old Quake articles by Michael Abrash the amount of overdraw may vary between 50 and 150% (i.e. each pixel is drawn 1.5 to 2.5 times on average), so being able to cull out those surfaces may actually help quite a bit.

* black flickery noise: it seems to happen when I use the HW to clear the framebuffer after a stencil-to-texture export. Perhaps a timing issue or something. Anyway, it doesn't happen if I clear the framebuffer by drawing a rect on top of it. However, using the hardware is about 10% faster, so that's the way it is done. Hopefully I can find a fix for that sometime.
* minor z-fighting issues: it seems that Z-buffering isn't that precise, which sucks when doing volumes. Most of it could be solved by just projecting the volume front caps a pixel behind the casting surface. However, in some cases you may still see it.
* visible seams in shadows: not sure why, because in my opinion all edges overlap. But perhaps that's the issue
* memory starvation: I'm currently on the edge of memory usage, mostly because I precalculate all static volumes, and since there are many lights, there are many static volumes. The problem is that Quake seems to load models into the cache without freeing enough memory to fit them, which results in a HANG. Haven't looked into the mechanics yet.
* Slightly too tight culling: some surfaces may be culled from the light while they shouldn't be. If you can get through E1M6 you'll see it happen.
* Water not completely okay
* Shadowing is done using frontfaces only. Using backfaces is about 20% faster, but sometimes results in holes.
* When there aren't enough stencil textures available you may see some shadow popping.
* Issues with brush model (i.e. doors and ammo boxes) lighting. It seems to be view-position based, but the weird thing is that I don't use the player position at all when it comes to lighting (lights are in world space and are transformed back to model space, so when the model doesn't move this transformation is constant and therefore shouldn't affect lighting this way).
* Smooth shading sometimes doesn't look that smooth at all (perhaps not all normals, tangents and bitangents are calculated properly; see the sketch after this list).
* [edit] no dynamic lighting yet.
* [edit] only lights with a radius bigger than 200 are being processed. These have shadows enabled. However, it results in some rooms being dark, because Quake uses lights with smaller radii in those rooms.
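On the smooth shading point: the standard per-triangle tangent/bitangent derivation looks roughly like the sketch below (illustrative only, not the demo's actual code; you would accumulate these per vertex and orthonormalize against the vertex normal afterwards):

[code]
/* Per-triangle tangent/bitangent derivation (the standard approach,
 * e.g. Lengyel). Illustrative only -- not the demo's actual code. */
typedef struct { float x, y, z; } vec3_t;
typedef struct { float u, v; } uv_t;

static void TriangleTangentBasis(const vec3_t p[3], const uv_t t[3],
                                 vec3_t *tangent, vec3_t *bitangent)
{
    /* Position and texcoord deltas relative to the first vertex. */
    float x1 = p[1].x - p[0].x, y1 = p[1].y - p[0].y, z1 = p[1].z - p[0].z;
    float x2 = p[2].x - p[0].x, y2 = p[2].y - p[0].y, z2 = p[2].z - p[0].z;
    float s1 = t[1].u - t[0].u, t1 = t[1].v - t[0].v;
    float s2 = t[2].u - t[0].u, t2 = t[2].v - t[0].v;

    float det = s1 * t2 - s2 * t1;
    float r = (det != 0.0f) ? 1.0f / det : 0.0f;   /* guard degenerate UVs */

    tangent->x   = (t2 * x1 - t1 * x2) * r;
    tangent->y   = (t2 * y1 - t1 * y2) * r;
    tangent->z   = (t2 * z1 - t1 * z2) * r;
    bitangent->x = (s1 * x2 - s2 * x1) * r;
    bitangent->y = (s1 * y2 - s2 * y1) * r;
    bitangent->z = (s1 * z2 - s2 * z1) * r;
}
[/code]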

Some new shots:

quake16.jpg
quake25.jpg
quake29.jpg
quake32.jpg

Note: the shot below is taken from the custom map The Occursed by Elek.
quake22.jpg


Have fun, let me know if you find any issues other than listed above!
 
Nice work. Your efforts with the Wii hardware have been very interesting and impressive.

I'm interested in trying this but I'd really like to see more framerate so I'll probably hold off until you tweak it some more.
 
This works on the Dolphin emulator, correct? I wonder if mouse controls are possible.

Didn't succeed in that. I'm not sure if the latest version of Dolphin supports SD card emulation. If so, it might work. In my case Dolphin doesn't go beyond "loading DOL". If you do get it to work in Dolphin, let me know how you did it.

BTW: if I'm not mistaken you asked for triangle performance figures a while ago, right? Well, they're not impressive: about 5-6M verts/s when generating 8 texcoords per vertex and about 35-40M when using per-vertex colors or a single texcoord. In all cases, you are looking at about 24 pixels/triangle. A smaller number of pixels results in TEV waiting for T&L; a larger number results in T&L waiting for TEV to finish.
 
How does it compare to the GC? I'm not a very tech-savvy person, so I don't get the polygon count this way; I'm not sure what 24 pixels/triangle means. Nintendo puts the GC at 9-12 million with everything turned on. That's a little easier for a fool like me to understand.
 
The 24 pixels/triangle is a crappy idealistic calculation: the Wii has a 960MPix fill rate, so if we draw 40M triangles, each triangle can have 960/40 = 24 pixels. In the real world you can't draw 40M triangles using only 40M vertices, since a triangle consists of 3 vertices. However, you could draw several thousands of triangle strips so most vertices can be reused. Also, the rasteriser produces 2x2 pixel blocks per 'cycle', and at the triangle edges it is likely that a 2x2 block isn't completely covered, resulting in an actual limit of less than 24 pixels.

When generating 8 texture coordinates the vertex rate decreases. However, we need to use those 8 coords in TEV. This requires at least 8 TEV stages, reducing the fill rate to 960/8 = 120MPix. With a maximum of about 5M triangles, this again works out to about 24 pixels/triangle.
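Just to make the back-of-the-envelope arithmetic explicit (these are the theoretical peak figures quoted above, not measurements):

[code]
/* Restating the idealized pixels-per-triangle arithmetic from above.
 * All figures are the theoretical peaks quoted in this thread. */
#include <stdio.h>

int main(void)
{
    const double fillrate_mpix  = 960.0;  /* Wii peak fill rate, MPix/s     */
    const double tris_simple_m  = 40.0;   /* ~40M tris/s, 1 texcoord/color  */
    const double tris_8coords_m = 5.0;    /* ~5M tris/s with 8 texcoords    */
    const double tev_stages     = 8.0;    /* >= 1 TEV stage per texcoord    */

    /* Simple vertices: the fill rate is the limit. */
    printf("simple:   %.0f pixels/triangle\n", fillrate_mpix / tris_simple_m);

    /* 8 texcoords: 8 TEV stages cut the effective fill rate to 960/8 MPix/s. */
    printf("8 coords: %.0f pixels/triangle\n",
           (fillrate_mpix / tev_stages) / tris_8coords_m);
    return 0;
}
/* Prints 24 in both cases, matching the estimate above. */
[/code]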

About the triangle performance: I based it on some documentation and some measurements I did (though I'm not sure if I can trust the numbers libogc reports; they have a limited implementation). I guess Nintendo was referring to real-world usage with their figures. Back in 2000 games didn't have many effects going on, and I think 12M would be a good estimate if you use something like 4 texture coordinates and some fixed light attenuation functions.
 
Rogue Squadron 2 definitely has that "DirectX 7 T&L" look to it with simplistic lighting. That's the Cube fan's favorite poly count example.

I also remember it being said that some types of lighting had to be done on the CPU (Luigi's Mansion did this I believe). The IBM guys talked about that at a talk I went to at college back then.

Care to compare and contrast this thing to a DX8-style shader 1.1 GPU? :) Specifically on my mind are games like Far Cry Vengeance and CoD: WaW looking really basic, whereas on such PC hardware you could run with most effects, especially since Far Cry was essentially built for DX8. Even The Conduit, a graphically hyped game, reminds me of Halo 1 more than anything really modern.
 
Rogue Squadron 2 definitely has that "DirectX 7 T&L" look to it with simplistic lighting. That's the Cube fan's favorite poly count example.
According to their article on Gamasutra they use vertex lighting to paint the polys and emboss mapping to detail it. That's only 2 texture coords... and another one to do self-shadowing.

Care to compare and contrast this thing to a DX8 style shader 1.1 GPU? :)
Not really, since I'm not a DX programmer, so I'm not aware of SM1.1's restrictions (if any). But SM1.1 is more flexible; you can perform texture lookups using color registers as texture coordinates. On Wii you can only look up textures using texture coordinate registers. The latter wouldn't be much of a problem if you were able to store modified texture coordinates back to the corresponding registers, but you can't (though you can reuse the result of a previous texcoord calculation in your shader). This isn't a problem for stuff such as normal mapping. It becomes a problem for, for example, parallax mapping. For parallax mapping you could perform something like 8 heightmap samples, but you will never be able to store the texture coordinate that represents the sample with the greatest height. So, to do such a thing on Wii, you must sample another texture to convert the picked coordinate to color data. Since you can't reuse this color data for a texture lookup you must write it to the framebuffer (creating a scene graph with displacement data), then in a second pass export it to a texture and use it to displace the texels. In shader model 1.1 this would probably be easier from a programming point of view.
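To make the parallax example concrete, the kind of loop that is trivial with SM1.1-style dependent reads but awkward on TEV looks roughly like this (plain C for illustration, not GX code; SampleHeight() is a made-up stand-in for a heightmap fetch):

[code]
/* Conceptual parallax-style height search in plain C, just to show why
 * the fixed texcoord registers make this awkward on Wii: the loop keeps
 * the coordinate of the best sample, which TEV cannot feed back into a
 * later texture lookup. Illustrative only. */
#include <math.h>

typedef struct { float s, t; } texcoord_t;

static float SampleHeight(texcoord_t tc)        /* placeholder heightmap */
{
    return 0.5f + 0.5f * sinf(tc.s * 10.0f) * cosf(tc.t * 10.0f);
}

static texcoord_t ParallaxPick(texcoord_t base, texcoord_t step, int samples)
{
    texcoord_t best = base;
    float best_h = SampleHeight(base);
    for (int i = 1; i < samples; i++)
    {
        texcoord_t tc = { base.s + step.s * i, base.t + step.t * i };
        float h = SampleHeight(tc);
        if (h > best_h) { best_h = h; best = tc; } /* keep the best coordinate */
    }
    return best;  /* the coordinate TEV has no register to store */
}
[/code]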
 
Thank you very much, your efforts are impressive.

Could you please tell me where I can get the water vissed maps? Thx.
 
Have been a bit busy lately.
The water vissed maps come from a patch that modifies your original pak files. I'm sure you'll find it using Google.
You can find a lot of "registered" Quake maps as well.

Anyway, you can download quakeGXSDshareware.zip. It's the executable for the Homebrew Channel with water vissed shareware maps (PAK0). I haven't tested it, but you can extract the zip directly to your SD card and it should be recognized by the Homebrew Channel. This is also a slightly improved version (I got rid of the black noise, though there are still some z-fighting artifacts). I haven't finished the portal stuff yet either, so no framerate improvements.

The engine is default Quake. It handles character models up to 2000 verts and several thousands of environment brushes. The Wii can be pushed a little more when it comes to vertices, but obviously the load on T&L and the pixel pipelines must be kept in balance, otherwise the framerate goes down. I got that info from playing around with drawing quads as 4-sided triangle fans (5 vertices instead of 4) and keeping T-junctions (to reduce artifacts with fast normal mapping). There was no degradation in framerate.
 
I haven't measured yet, but it takes 2400 frames to rotate a cube, and with vsync turned off a complete rotation takes 5-6 seconds.

So, we're looking at at least 400 frames per second. I think the cubes fill about half the screen, so full screen we would be looking at 200 frames per second. Since SD resolution is only 1/3 MPix, this means the fill rate is reduced to about 67MPix/sec. Now, the GameCube has about a 640MPix fill rate using a single TEV stage; I suppose that includes a texture read as well. Mapping that to the 67MPix, it means it takes about 10 GPU cycles to render a pixel. This number includes screen blanking, vertex operations and CPU overhead.
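Redoing that estimate step by step (same ballpark figures as above, nothing newly measured):

[code]
/* Rough per-pixel cost estimate from the cube measurement above; all
 * inputs are the ballpark figures quoted in this post. */
#include <stdio.h>

int main(void)
{
    const double frames          = 2400.0;  /* frames per full rotation     */
    const double seconds         = 6.0;     /* vsync off, slower case       */
    const double screen_coverage = 0.5;     /* cubes fill ~half the screen  */
    const double sd_mpix         = 1.0/3.0; /* SD resolution, ~1/3 MPix     */
    const double peak_fill_mpix  = 640.0;   /* single-TEV-stage fill rate   */

    double fps_half   = frames / seconds;             /* ~400 fps            */
    double fps_full   = fps_half * screen_coverage;   /* ~200 fps full screen */
    double mpix_per_s = fps_full * sd_mpix;           /* ~67 MPix/s           */
    double cycles     = peak_fill_mpix / mpix_per_s;  /* ~10 GPU cycles/pixel */

    printf("%.0f fps (half screen), ~%.0f MPix/s, ~%.0f GPU cycles/pixel\n",
           fps_half, mpix_per_s, cycles);
    return 0;
}
[/code]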

To add a little about the hardware: it is able to perform a texture lookup, the result of which is 'blended' with a specified second texture coordinate using the indirect unit (a 3x2 dot product and scaling); it then looks up a texture using those 'blended' coordinates and passes that to the recirculating shader unit. From a logic point of view this would most likely take 3 GPU cycles. But since the Wii has a 960MPix fill and 960MPix texel rate (see above), I figure that the GPU might perform a texture lookup and a shader operation in the same cycle (which means an indirect stage would take only 2 cycles to perform).
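Conceptually, the indirect unit computes something like the following (plain C to illustrate the math only, not GX API code; the names are made up):

[code]
/* Conceptual model of the indirect texture unit described above: the
 * texel fetched in the indirect stage is run through a 3x2 matrix and a
 * scale, and the result offsets the regular texture coordinate before
 * the second lookup. Plain C to illustrate the math, not GX API code. */
typedef struct { float s, t; } texcoord_t;

static texcoord_t IndirectOffset(texcoord_t base,      /* regular coordinate  */
                                 const float m[2][3],  /* 3x2 indirect matrix */
                                 float scale,          /* indirect scale      */
                                 const float texel[3]) /* indirect texel      */
{
    texcoord_t out;
    out.s = base.s + scale * (m[0][0]*texel[0] + m[0][1]*texel[1] + m[0][2]*texel[2]);
    out.t = base.t + scale * (m[1][0]*texel[0] + m[1][1]*texel[1] + m[1][2]*texel[2]);
    return out;   /* this offset coordinate feeds the next texture lookup */
}
[/code]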

So based on those assumptions and figures, it takes 2 or 3 cycles to perform diffuse mapping and an additional 1 or 2 cycles to blend in a material texture. The demo also uses reflection and fade-out, so it takes either 7 or 10 cycles to perform.

Besides that, don't forget that a game such as Quake ran well on a 50MPix Voodoo board, ran well in full software rendering on a Pentium 90, and had dynamic lighting too. The Wii is FAR more capable than that. And Quake looks quite nice using normal mapping; there is a PC version going around =)


There is an ARM926EJ-S in Hollywood. I mean, could you use the ARM9 for anything? According to ARM's website the inclusion of a vector floating point unit is optional. If there is a VFP10 included running at around the same clock as the GPU, which their documentation says is supported, how would it improve anything as far as picture quality goes? At 1.3 MFLOPS per MHz that's an estimate of 315 MFLOPS. Is it useless?

There are several articles where hackers compile and run code on the "Starlet" core on the GPU, like this one...

http://www.wehackwii.com/2008/05/reading-data-on-wii-and-usb-mass.html

Is there a way to use this DSP to enhance Hollywood's performance in any way?
 
I doubt it. I haven't really read anything about that unit, but the GX API doesn't allow TEV to read from or write to the DSP, that's for sure. They are all separate cores. Also, the DSP's RAM might not be directly accessible by the CPU, meaning you get a single (or a few) I/O address to read/write data to the thing. In the best case it would have FIFOs to read or write data. In the end you will also have to cope with synchronisation issues (how to synchronize the program running on the DSP with the main thread).

So this thing should probably just be used to decompress and/or mix several streams of audio data and output them to the speakers. Still, some wizards may find other uses for it. If it were fast enough it could be used for decompressing vertex data along with audio, for example. I don't think it's running at 240MHz though; the number 80MHz does pop up.
 
80MHz, where did you get that from?

Maybe you can use it to decompress textures or stream pre-optimized low-res textures from the DVD-ROM.
 
I have a feeling that it isn't just sitting around idle..
The Hollywood also contains an ARM926 core, which has been unofficially nicknamed the Starlet[3]. This embedded microprocessor performs many of the I/O functions, including controlling the wireless functionality, USB, the disc drive, and other miscellaneous functions. It also acts as the security controller of the system, performing encryption and authentication functions. The Hollywood includes hardware implementations of AES and SHA-1, to speed up these functions. Communication with the main CPU is accomplished via an IPC mechanism. The Starlet performs the WiiConnect24 functions while the Wii console is in standby mode[3].
 
I got that from some GameCube doc. But that is the DSP, not the ARM. And it is the GC, not the Wii. So perhaps the Wii DSP is clocked 1.5 times faster too.

I have seen some homebrew articles on the ARM stuff. Mostly about hacking it. I don't expect that Nintendo intends programmers to mess around and upload code in there (especially since it also does security). If we could though, we would indeed be able to do the stuff you mentioned, Flux.
 
Does anyone know DRS, or would anyone be able to contact them for some help? I am trying to continue where they left off, and I've managed to implement normal mapping the way they describe in a devkitPro post: https://devkitpro.org/viewtopic.php?t=1564 but this is not the full per-pixel lighting implementation, as it still relies somewhat on per-vertex lighting. I may try to reverse engineer DRS's Quake demo using Dolphin's FIFO viewer, but the TEV stages come out a bit out of order and are hard to understand.

I would love to have their help or be able to contact them at all, as I'm considering starting a long-term project of porting Doom 3 to the Wii, and DRS's demo here has convinced me that it is possible: Doom 3 throws around far fewer lights and shadows than DRS's demo, even if it has much higher-poly geometry. So if anyone knows how to get in touch with them, that would be very appreciated. They also never published the source code for the demo as far as I can tell, which is a bit odd, but yeah.
 