I know you're more of an R&D type of guy and that these tests don't really display anything (meant as raw benchmarks), and you have released code/tools to help with DC. Will you ever show off the fruit of your labor, maybe in a tech demo of some sort? I've always been curious if you would ever construct a visual representation of what you have achieved.
I'm not doing these measurements for nothing, I'm planning to release stuff. My end goal is to make a game (a platformer, similar to the no-FLUDD stages from Mario Sunshine), and most of what I'm doing is working towards that.
At this moment, I'm working on three things for the Dreamcast: a new PVR driver, a C interpreter, and a port of Micropolis (SimCity Classic).
The new PVR driver has several improvements over the KOS driver:
- Better performance metrics. KOS lists a couple of stats (in integer milliseconds, so the precision is low); my driver gives a list of events in nanoseconds (relative to the last page flip), so it's possible to see exactly when and how long things like renders, DMA, vertex submission, etc. happened.
- Basic draw commands (lines, rectangles, monospace text)
- Basic model rendering, with a converter based on the Open Asset Importer.
- KOS's render-to-texture (R2T) only supports framebuffer sized textures; my driver supports any size. It also has better performance with back-to-back R2T (for things like blurs). The performance increase comes from avoiding a stall caused by double buffering the scene data.
If you want to do something like bloom, you need multiple R2Ts (create the texture containing the bright areas, a couple blur passes, then combine with the regular framebuffer). Normally, the PVR has two buffers to store scene data, but with back-to-back R2T you can get into a situation where the PVR is rendering the first buffer, the second buffer is full with the next render, but the CPU is ready to send a third scene (or more). In this case, the CPU would have to stall until a buffer frees up. My driver allows you to combine multiple renders in a single buffer, as well as have more than two buffers. So instead of each render consuming its own buffer, you could combine all renders for a single frame into one buffer, or you could pregenerate individual scene buffers for the blur passes and reuse them between frames.
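As a rough illustration of the stall described above, here is a toy timing model. It is not the driver's code and does not model real PVR timings; `sim_config`, `simulate_cpu_stall`, and the cost units are hypothetical. The CPU fills scene buffers back-to-back, the PVR renders them in order, and the CPU must wait whenever every buffer is still occupied.

```c
#include <assert.h>

#define MAX_SCENES 64

/* Hypothetical parameters for a toy model (arbitrary time units). */
typedef struct {
    int num_buffers;  /* scene buffers available */
    int submit_cost;  /* CPU time to fill one scene buffer */
    int render_cost;  /* PVR time to render one scene */
} sim_config;

/* Returns the total time the CPU spends waiting for a free scene buffer
 * while submitting num_scenes renders back-to-back. */
int simulate_cpu_stall(sim_config cfg, int num_scenes)
{
    int done[MAX_SCENES];  /* time at which each scene finishes rendering */
    int cpu_t = 0;         /* current CPU time */
    int pvr_t = 0;         /* time at which the PVR becomes idle */
    int stall = 0;

    assert(cfg.num_buffers >= 1);
    assert(num_scenes >= 0 && num_scenes <= MAX_SCENES);

    for (int i = 0; i < num_scenes; i++) {
        /* Scene i reuses the buffer of scene (i - num_buffers), so the CPU
         * cannot start filling it until that older render has finished. */
        if (i >= cfg.num_buffers && done[i - cfg.num_buffers] > cpu_t) {
            stall += done[i - cfg.num_buffers] - cpu_t;
            cpu_t = done[i - cfg.num_buffers];
        }
        cpu_t += cfg.submit_cost;

        /* The PVR starts once the scene is submitted and it is idle. */
        int start = cpu_t > pvr_t ? cpu_t : pvr_t;
        pvr_t = start + cfg.render_cost;
        done[i] = pvr_t;
    }
    return stall;
}
```

With four back-to-back renders (say bright pass, two blurs, combine), a submit cost of 1 and a render cost of 4, this model reports 6 units of CPU stall with two buffers and none with four. It only covers the "more than two buffers" case, not combining several renders into one buffer.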
- List redirection and tile multipass. The PVR reads separate lists of modifier volumes for opaque and transparent polygons; normally, if you want the same volumes to affect both, you have to submit the polygons twice. The driver allows the list for one set to point to the other.
When the PVR renders a tile, it normally draws all the opaque and alpha-tested polygons, then the transparent polygons, and writes it out to the framebuffer. Tile multipass lets you render multiple lists of opaque and transparent polygons, while retaining the existing color and depth buffers, before writing to the framebuffer. One thing this allows is mixing order-independent transparency and presorted transparency, to reduce the overhead of having many levels of per-pixel sorting.
Probably the most useful thing it can do is mid-render depth clearing by drawing transparent opaque polygons set to always pass the depth test. Yes, transparent opaque; opaque on the PVR means “draw only the visible pixel”, blending still works and will blend onto whatever was already in the internal color buffer. (Try to guess what happens if you use transparency in the first opaque pass.) The obvious uses for clearing the depth buffer mid-frame would be to prevent the player’s gun from clipping into the map in an FPS, or avoid depth precision issues with a flight simulator’s landscape and the cockpit.
The PVR’s write-opaque-pixels-once approach to rendering also allows for very efficient OGL accumulation buffer/3dfx t-buffer effects, to implement antialiasing, motion blur, and depth-of-field. For a four pass scene, draw the first opaque pass at 25% brightness, then draw additional 25% brightness opaque passes with additive blending turned on. (Alternatively, you can draw the first pass normally, then draw additional passes with regular lerp alpha blending, reducing the alpha each pass.) There is zero additional cost for combining the passes this way, but correctly combining actual transparent polygons is tricky/impossible depending on the method used and blend mode. You could also do anaglyphs with this.
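The two ways of combining passes can be checked with a little arithmetic. The sketch below works on a single color channel in floating point; the function names are hypothetical, and the alpha schedule for the lerp variant (1, 1/2, 1/3, 1/4, ...) is an assumption about what "reducing the alpha each pass" means, chosen because it yields a running average. Real hardware blends at limited precision, so the two methods would round differently.

```c
#include <assert.h>

/* Additive method: every pass is drawn at 1/n brightness and summed. */
float accumulate_additive(const float *passes, int n)
{
    float acc = 0.0f;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        acc += passes[i] * (1.0f / (float)n);
    return acc;
}

/* Lerp method: pass i (0-based) is blended over the running result with
 * alpha = 1/(i+1), so the first pass is drawn normally (alpha = 1). */
float accumulate_lerp(const float *passes, int n)
{
    float acc = 0.0f;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        float alpha = 1.0f / (float)(i + 1);
        acc = acc * (1.0f - alpha) + passes[i] * alpha;
    }
    return acc;
}
```

For four passes with channel values 0.2, 0.4, 0.6, and 0.8, both functions come out at roughly 0.5, the plain average.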
List redirection and tile multipass can be combined to allow single submit dual textured polygons. The PVR has a single TMU and does not have a multitexture vertex format. It is possible to specify a pair of textures and UVs/color with “affected by modifier volume” polygons, but only one set is drawn, depending on whether the pixel is inside a modifier volume or not. By setting up two passes and using list redirection, it’s possible to get the modifier mode vertex format to work as a dual texture format. The CPU only has to write the data once, and the scene buffer only stores one copy of the vertex positions. The first pass doesn’t have any modifier volumes, so everything gets drawn once with the outside-volume settings. The second pass opaque list points to the same data from the first pass, with a full screen modifier polygon so everything gets redrawn with inside-volume settings. Set the depth compare to be >=, so that things get drawn a second time. To mix in single texture polygons, set the depth compare to ==, and you won’t waste fillrate on the second pass.
I think there are some other things you can do with tile multipass and modifier volumes, but I haven’t tested that and don’t know for sure if the PVR works in a way that makes it possible.
- Better render state management. With KOS, it's impossible to change global settings like fog color or palettes without potentially causing rendering glitches. My driver buffers changes and applies them in between renders. KOS’s functions for handling per strip settings are very inefficient. You give the KOS driver one giant struct with all per strip settings and have a function go through it and pack it into the PVR’s format. One minor change requires reprocessing and packing everything else too. My driver gives you a bunch of functions to manipulate the PVR command directly. They’re designed as inline functions so that the compiler can optimize and combine changes.
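To give a feel for the "inline functions that edit the packed command directly" idea, here is a minimal sketch. The bit layout and every name in it (`CMD_*`, `cmd_set_*`) are made up for illustration; they are not the real PVR command format and not the driver's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout of a packed command word (NOT the PVR's). */
#define CMD_DEPTH_CMP_SHIFT 0u
#define CMD_DEPTH_CMP_MASK  (0x7u << CMD_DEPTH_CMP_SHIFT)
#define CMD_CULL_SHIFT      3u
#define CMD_CULL_MASK       (0x3u << CMD_CULL_SHIFT)
#define CMD_DEPTH_WRITE_OFF (1u << 5)

/* Replace one bitfield and leave all the other bits untouched. */
static inline uint32_t cmd_set_field(uint32_t word, uint32_t mask,
                                     unsigned shift, uint32_t value)
{
    assert(((value << shift) & ~mask) == 0);  /* value must fit the field */
    return (word & ~mask) | (value << shift);
}

static inline uint32_t cmd_set_depth_compare(uint32_t word, uint32_t mode)
{
    return cmd_set_field(word, CMD_DEPTH_CMP_MASK, CMD_DEPTH_CMP_SHIFT, mode);
}

static inline uint32_t cmd_set_culling(uint32_t word, uint32_t mode)
{
    return cmd_set_field(word, CMD_CULL_MASK, CMD_CULL_SHIFT, mode);
}

static inline uint32_t cmd_set_depth_write(uint32_t word, int enabled)
{
    return enabled ? (word & ~CMD_DEPTH_WRITE_OFF)
                   : (word | CMD_DEPTH_WRITE_OFF);
}
```

Because each setter is a small mask-and-or on a single word, a compiler can typically fold several constant updates into one operation, and changing one field does not require repacking the others.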
- OpenGL style texture management, which allows the driver to do video memory defragmentation (KOS just works with raw pointers and the allocator can't defragment). Setting the texture with this is much faster than KOS’s giant function; depending on cache misses, it should only take 15-40 cycles. This texture management isn't mandatory, you can still do KOS style raw pointers.
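The reason handles make defragmentation possible can be shown with a small sketch: callers hold an index into a table rather than a raw address, so the allocator is free to move the data and update the table. Everything here (`tex_pool`, `tex_alloc`, the 256-byte stand-in for video memory, the bump allocator) is a hypothetical illustration and not how the driver is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VRAM_SIZE    256  /* tiny stand-in for video memory */
#define MAX_TEXTURES 8

typedef struct { uint32_t offset, size; int used; } tex_slot;

typedef struct {
    uint8_t  vram[VRAM_SIZE];
    tex_slot slots[MAX_TEXTURES];
    uint32_t top;  /* bump-allocation pointer */
} tex_pool;

/* Slide all live textures down to close gaps. Handles stay valid because
 * only the offsets stored in the table change. */
void tex_defrag(tex_pool *p)
{
    uint32_t write = 0;
    for (;;) {
        int next = -1;  /* live texture with the lowest not-yet-packed offset */
        for (int i = 0; i < MAX_TEXTURES; i++)
            if (p->slots[i].used && p->slots[i].offset >= write &&
                (next < 0 || p->slots[i].offset < p->slots[next].offset))
                next = i;
        if (next < 0)
            break;
        memmove(p->vram + write, p->vram + p->slots[next].offset,
                p->slots[next].size);
        p->slots[next].offset = write;
        write += p->slots[next].size;
    }
    p->top = write;
}

/* Returns a handle (table index), or -1 on failure. */
int tex_alloc(tex_pool *p, uint32_t size)
{
    if (size == 0 || size > VRAM_SIZE)
        return -1;
    if (p->top + size > VRAM_SIZE)
        tex_defrag(p);  /* try to reclaim the gaps left by freed textures */
    if (p->top + size > VRAM_SIZE)
        return -1;
    for (int i = 0; i < MAX_TEXTURES; i++) {
        if (!p->slots[i].used) {
            p->slots[i].offset = p->top;
            p->slots[i].size = size;
            p->slots[i].used = 1;
            p->top += size;
            return i;
        }
    }
    return -1;
}

void tex_free(tex_pool *p, int handle)
{
    assert(handle >= 0 && handle < MAX_TEXTURES);
    p->slots[handle].used = 0;
}

/* Resolve a handle to its current address; this may change after a defrag. */
uint8_t *tex_address(tex_pool *p, int handle)
{
    assert(handle >= 0 && handle < MAX_TEXTURES && p->slots[handle].used);
    return p->vram + p->slots[handle].offset;
}
```

In this sketch, allocating two 100-byte textures, freeing the first, and then asking for a third 100-byte texture succeeds only because the second texture gets moved down to offset 0. With raw pointers that move would invalidate whatever address the caller was holding.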
- More flexible framebuffer support. It has support for 24/32 bit color framebuffers. You can render to a window of a frame buffer, to do pixel scissoring. It supports dynamic resolutions by adjusting supersampling and vertical scaling per frame. I plan to add triple/quadruple buffering (with the way the video display hardware reads the framebuffer, you are basically forced to allocate framebuffers in pairs; R2T does not have this limitation). I want to see if I can get strip buffers working, which renders the screen in sync with the video out to avoid having to store the entire frame buffer in memory, similar to the Nintendo DS or certain sprite based systems.
There are several other miscellaneous changes, but those are probably the most interesting ones.
The driver won't really affect rendering performance (render-to-texture and dual texture mode aside) as that depends on T&L code quality. The basic rendering code included should be better than most of what's out there, though.
When I release the driver, I plan to have examples, but I’m not sure about any demoscene-style demo. Maybe?
The C interpreter is for scripting and faster development. It's to replace a Forth interpreter I had created. Being able to try out code without having to shut down the game, recompile, and reload it is great. There are some existing C interpreters, but none of them did everything I wanted: compile to position independent bytecode, good preprocessor support (I want protothreads to work), and low memory usage (with no risk of fragmentation). I’m basing the bytecode off the Forth interpreter, with some changes to handle the different coding style better. The bytecode can easily be embedded into level data. Since the script sources are C, there’s also the option to give the finished code to GCC and bake it into the executable for greater speed. One feature I want to add is to be able to run the interpreter on the host, feed it all the headers for the game engine, and have it automatically generate the code required to hook everything up (structs, #defines, global vars., function defs.) into the in-game interpreter.
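For context on why protothreads put pressure on the preprocessor: they are built from macros that expand into a `switch` on a saved line number, so a function can return in the middle and later resume where it left off. Below is a simplified sketch of that trick. The `CO_*` names and the `door_script` example are hypothetical and much smaller than the real protothreads library.

```c
#include <assert.h>

/* Saved resume point: 0 means "start from the top". */
typedef struct { int line; } co_state;

/* The switch jumps to the case label recorded by the last wait. */
#define CO_BEGIN(co) switch ((co)->line) { case 0:

/* Record the current line, then return 0 until the condition holds.
 * On the next call the switch jumps straight back to this case label. */
#define CO_WAIT_UNTIL(co, cond)                      \
    do {                                             \
        (co)->line = __LINE__; case __LINE__:        \
        if (!(cond)) return 0;                       \
    } while (0)

/* Reset so the script can be restarted, and report completion with 1. */
#define CO_END(co) } (co)->line = 0; return 1

typedef struct {
    co_state co;
    int door_open;  /* set by the game when the door opens */
    int steps;      /* counts how far the script has progressed */
} door_script;

/* Returns 0 while waiting and 1 once the script has finished. */
int door_script_run(door_script *s)
{
    assert(s != 0);
    CO_BEGIN(&s->co);
    s->steps++;                           /* runs once, on the first call */
    CO_WAIT_UNTIL(&s->co, s->door_open);  /* yields until the door opens */
    s->steps++;                           /* runs after the wait completes */
    CO_END(&s->co);
}
```

Calling `door_script_run` repeatedly returns 0 without re-running the first step until `door_open` is set, then finishes and returns 1. Note that local variables do not survive a yield, which is why the state lives in the struct.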
I'm also working on a port of Micropolis, the open source version of the original SimCity, to the Dreamcast. I started it years ago, but I recently decided I was going to finish it. The original SimCity is kind of basic compared to its sequels, so I was worried that the port would be forgettable, but I came up with a hook: split screen multiplayer, with either cooperative building or a competitive race to see who gets the best score in a certain amount of time (or who beats a scenario first).
Right now, what I’m actively working on is the preprocessor for the C interpreter. Macro expansion works, and I almost have #ifs working. Once I get the C preprocessor finished, I was planning to go back to focusing on Micropolis and finishing that, but after writing this I’m thinking I should finish the driver first.