A new benchmarking tool...

Colourless

Maybe not the exact right forum for this, but it's a 'technology'.

I've been working on a new benchmarking tool for a while now and it's at a stage where people might be interested in it. Its exact usefulness at this stage is debatable, but it does what it's supposed to. Consider it a tech preview if anything.

So, what is it?

Quite simple: it captures the output from Direct3D programs and writes it to disk, allowing you to play back the exact same sequence as often as you like! Textures, vertex buffers, index buffers, shaders and function calls... it all gets written to disk! Well, almost all of it; some of the more obscure features are unimplemented.

If you are curious, download here: http://www.users.on.net/triforce/d3dbench.zip
Just a note it will ONLY work under WindowsXP with Direct3D8 and Direct3D9 apps. Eventually I'd like OpenGL support too, but it's not coming anytime soon.

Using it is pretty simple. To record the output of a program, copy the dlls to the application's directory, then run the application. It's that simple! :) The process will create 3 files: bufstream.lcs, comstream.lcs and texstream.lcs. These files can get VERY large very quickly, especially if the program does skinning in software or, heaven forbid, does its own TnL.

While you are in the application you can use the 'Scroll Lock' key to create 'benchmark sequences'. The sequences just slightly change how the csv file is formatted when doing a benchmark.

To play back what you recorded you should use LCSPlayer. Just a note that LCSPlayer is purely a frontend for SimpleStreamPlayer, so they must be in the same directory to work. LCSPlayer lets you set a number of options, including the mode you want to play back in. Demo mode will replay the streams using exactly the same timing that was recorded. Benchmark mode will attempt to play back the streams as fast as possible and will output benchmark timing info to the file "benchmark.csv". The file will be created in the directory containing the 3 stream files.
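
To make the difference between the two modes concrete, here is a rough Python sketch of a replay loop. It is not how SimpleStreamPlayer is implemented; the function name replay, the render callback and the injectable clock/sleep parameters are all made up for illustration. Demo mode waits until each frame's recorded timestamp before drawing it, while benchmark mode draws frames back to back and only measures them.

```python
import time

def replay(frame_times_ms, render, demo_mode,
           clock=time.perf_counter, sleep=time.sleep):
    """Replay frames and return how long each one took to render, in ms.

    frame_times_ms: recorded absolute time of each frame (hypothetical input).
    render: callback that draws frame number `index` (stand-in for D3D calls).
    """
    start = clock()
    durations = []
    for index, recorded_ms in enumerate(frame_times_ms):
        if demo_mode:
            # Demo mode: hold the frame back until its recorded timestamp.
            wait = start + recorded_ms / 1000.0 - clock()
            if wait > 0:
                sleep(wait)
        # Benchmark mode falls straight through and renders immediately.
        frame_start = clock()
        render(index)
        durations.append((clock() - frame_start) * 1000.0)
    return durations
```

Passing the clock and sleep functions in as parameters is only there so the loop can be exercised without real waiting.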

Here is what LCSPlayer looks like. Pretty basic, but this is only sort of a tech preview release.
lcsplayer.png


Ideally when you do a benchmark you should have all three Precache options enabled. However, note the amount of memory required; precaching can require a hell of a lot of memory. If you don't precache, the files will be streamed from disk instead. While the streaming is fairly efficient, I've noticed that on my system there is about a 5% penalty to framerates in benchmarks. You should really aim to have more than 128 MB of memory unused after precache. There is no rule about how much memory is required though, as each application that has been recorded has different requirements. Lots free is always good though.
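
As a rough way to judge whether precaching is realistic, the Python sketch below adds up the sizes of the three stream files and compares the total against the memory you have free, keeping 128 MB of headroom. It assumes the precached data takes about as much memory as the files take on disk, which the player may or may not do; the function name precache_fits is made up, and the amount of free memory has to be supplied by you (for example, read off Task Manager).

```python
import os

STREAM_FILES = ("bufstream.lcs", "comstream.lcs", "texstream.lcs")
HEADROOM_BYTES = 128 * 1024 * 1024  # the "128 MB unused" guideline

def precache_fits(stream_dir, free_bytes):
    """Return (total stream size in bytes, True if precaching looks feasible).

    Assumption: memory needed for precaching is roughly the on-disk size.
    """
    total = sum(os.path.getsize(os.path.join(stream_dir, name))
                for name in STREAM_FILES)
    # Feasible only if more than the headroom would remain afterwards.
    return total, free_bytes - total > HEADROOM_BYTES
```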

SimpleStreamPlayer is the program that actually interprets the streams and sends their data to Direct3D. While the program does accept command line args, you can't do anything with them that you can't already set with LCSPlayer. If people want to know them, I will give details.

ComStreamDump is something that only programmer types would be interested in. It will read comstream.lcs and convert the 'bytecode' into a pseudo code. The code is 'almost' C++, but not quite. Many of the function arg values are not printed so there is no chance you'd actually be able to compile it. What it will do is show you what the recorded app was doing. It's only useful to a certain degree though since the volume of text that is produced is phenomenal.

Now for a little bit of info on the fields in the benchmark.csv file. Firstly, timing information for every frame is always printed. I will make an option in the future to only output the details for frames that are part of 'benchmark sequences'.
Frame: The absolute frame number from the start
Bench ID: The number of the Benchmark Sequence. This is incremented for each new sequence you create.
Bench Frame: The frame number in the sequence.
Time ms: The absolute time in ms from the start during the recording phase
Rec ms/frame: How long it took to render the frame when recording
Rec FPS: The FPS for the frame when recording
Bench ms/frame: How long it took to render the frame when benchmarking
Bench FPS: The FPS for the frame when benchmarking

If a frame is not part of a benchmarking sequence, Bench ID and Bench Frame will be blank. The first frame of a Benchmarking sequence will have the word Begin to the right of the frame data. The last frame of a benchmarking sequence will have the word end to the right of the frame data. There will also be a set of averages and totals written next to the 'end' marker.
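
For anyone who wants to post-process the file, here is a minimal Python sketch that averages the recorded and benchmarked frame rates for each benchmark sequence. It assumes benchmark.csv is comma-separated with a header row whose column names match the field names listed above; the real layout may differ, and the function name sequence_averages is made up for this example. Anything to the right of the frame data (the Begin/end markers and the totals) is simply ignored.

```python
import csv
from collections import defaultdict

def sequence_averages(csv_file):
    """Return {bench_id: (avg recorded FPS, avg benchmark FPS)}.

    Column names are assumed from the field list, not taken from a real file.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # rec ms, bench ms, frames
    for row in csv.DictReader(csv_file):
        bench_id = (row.get("Bench ID") or "").strip()
        if not bench_id:
            continue  # blank Bench ID: frame is outside any sequence
        try:
            rec_ms = float(row.get("Rec ms/frame"))
            bench_ms = float(row.get("Bench ms/frame"))
        except (TypeError, ValueError):
            continue  # skip rows without usable timing data
        entry = totals[bench_id]
        entry[0] += rec_ms
        entry[1] += bench_ms
        entry[2] += 1
    result = {}
    for bench_id, (rec, bench, frames) in totals.items():
        if rec > 0 and bench > 0:
            # Average FPS = frames / total seconds, not the mean of per-frame FPS.
            result[bench_id] = (1000.0 * frames / rec, 1000.0 * frames / bench)
    return result
```

Dividing the frame count by the total time keeps a handful of very fast frames from dominating the average, which matters given how noisy the per-frame numbers can be.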


If you are going to do anything with this stuff, here are a few notes:

When you make a recording, make sure you have VSync on. The reason is that basic 2D user interface screens can render at thousands of frames per second. This tends to create an excessive amount of useless data. You can find that more than half of the benchmark.csv file ends up being frames you don't want. Keeping VSync on keeps the amount of data down to a useful amount. Then when you do a benchmark, make sure VSync is off.

You shouldn't record for more than a few minutes. You'd be surprised at how quickly the stream files can grow. Don't be surprised if <insert your selected program here> generates hundreds of megabytes of data every few minutes. If this starts to happen, it might seriously impact the ability to make benchmarks as you will not be able to precache all the files. This depends on the app though. Older apps that do software skinning are likely to produce huge amounts of vertex data. New apps that use vertex shaders for skinning are likely to produce far less data.

Be warned with Halo: its cinematics are rendered to D3D surfaces. D3DInterceptor will capture these and will attempt to write them to disk. Not only is the process really really slow, you'll generate an excessive amount of data. Just skip through them quickly. Other games might be similar, so skip through all cinematics as quickly as you can.

A limitation with how things currently are: benchmark.csv will be overwritten when you begin to benchmark again. SimpleStreamPlayer/LCSPlayer will bring up a dialog box after loading, before it starts the benchmark, giving you an opportunity to Cancel before the file is overwritten. You should rename your existing csv file as soon as the benchmark is finished, or convert it to an XLS.
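
If you run benchmarks often, the rename can be scripted. The Python sketch below moves benchmark.csv to a timestamped name in the same directory. The function name archive_benchmark and the naming scheme are just an example and have nothing to do with the tool itself.

```python
import os
import time

def archive_benchmark(stream_dir, label=None):
    """Rename benchmark.csv so the next run cannot overwrite it.

    Returns the new path, or None if there is no benchmark.csv to archive.
    """
    src = os.path.join(stream_dir, "benchmark.csv")
    if not os.path.exists(src):
        return None
    # Default label is the current local time, e.g. 20040131-235959.
    label = label or time.strftime("%Y%m%d-%H%M%S")
    dst = os.path.join(stream_dir, "benchmark-%s.csv" % label)
    if os.path.exists(dst):
        raise FileExistsError(dst)  # never clobber an earlier archive
    os.rename(src, dst)
    return dst
```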

The SimpleStreamPlayer benchmark dialog box also serves as something to hold up execution till Windows stops playing with its Virtual Memory. You should wait till the hard drive stops being busy before actually starting the benchmark.

The app does NOT do any sort of Caps or Format checking. It assumes that everything is fine. This means it's possible that a sequence recorded on one card or even driver set will not work properly with another. Not much can be done about this though.

Lastly, do not be an idiot. The 3 stream files will contain copyrighted materials so do NOT distribute them!


The compatibility of the tool seems pretty good at this stage. A few programs have had a few problems, some of which appear to be unfixable. What seems to be the biggest problem is that there tends to be a huge amount of 'noise' in the per-frame times. I'm guessing this is being caused by frame buffering by the card's drivers. I'll probably add features in a future version to hopefully work around that problem. Here is an example of how bad things can be. It is a benchmarking run from 3DMark2001's Dragothic High Detail. I added in a trendline to show the approximate actual FPS.

draghigh.png
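
One simple way to get a trendline like that out of noisy per-frame data is a centered moving average. The Python sketch below is an illustration only, not necessarily the method used for the graph; the function name smoothed_fps and the default window size are arbitrary. It averages frame times rather than FPS values, so a few very short frames don't inflate the result.

```python
def smoothed_fps(frame_ms, window=5):
    """Return a moving-average FPS value for each frame.

    frame_ms: per-frame render times in ms (e.g. the Bench ms/frame column).
    window: number of neighbouring frames averaged together.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if any(ms <= 0 for ms in frame_ms):
        raise ValueError("frame times must be positive")
    half = window // 2
    smoothed = []
    for i in range(len(frame_ms)):
        # The window is clipped at the start and end of the run.
        chunk = frame_ms[max(0, i - half): i + half + 1]
        smoothed.append(1000.0 * len(chunk) / sum(chunk))
    return smoothed
```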


It would pretty much appear that whenever the scene is too simple and the FPS climbs, things go a bit crazy. Here is a run from Halo. The first part is when I was indoors, and the rest was entirely outside. Note how much faster the apps are running when benchmarking than when they were recorded. I'm on an Athlon 1GHz w/ Radeon 9700 Pro here and it would appear that I am quite CPU limited (like I didn't already know that).

halo.png


And here is a final graph showing FarCry. Yeah... at the recorded speeds it's pretty much totally unplayable, but hey, it's only one of the demos. What is particularly interesting is how high the FPS goes when benchmarking: 4 times faster in many places.

farcry.png



Now onto some game specific info. These are a few of the apps I've tested, and a few notes about them:

Unreal Tournament 2004: Seems to work fine.

Far Cry: Seems to work fine.

Halo: Seems to work fine. Seems to produce an above average amount of function call data. Might indicate a slightly inefficient engine.

Max Payne 2: Crashed for me on playback, BUT I've made a number of changes since I last tried it.

War Craft 3: Works, but produces lots of vertex data. Hits gigabytes of data after about 5 minutes. The faster your recording FPS, the more data it will make. You'll probably end up being limited by hard drive speed. The game itself is hugely CPU limited.

Giants Citizen Kabuto (with GeForce 3 patch): Totally unusable. Produces about a hundred megabytes of data per FRAME! Seems to be due to issues with Vertex and Index buffer locking. Also vertex/index buffers seem to become corrupted.

GTA 3: Works fine. Only sends post transformed vertices to Direct3D so it does all of the TnL itself.

GTA Vice City: Same as GTA 3.

3DMark2001: Seems to work fine. Scroll Lock key stops functioning though if you attempt to use it in more than one test.

3DMark03: Grass and Leaves in Mother Nature test don't render properly (they turn black at a distance). Other than that it appears to be fine. Wings of Fury is mostly CPU limited for me. Proxycon seems to be almost entirely VPU bound on my system. Troll's Lair tends to be limited a bit by both for me. Mother Nature looks VPU limited.

Well, that's it from me for now...
 
Call me a lying bastard, but I was thinking about this the other day. "Why couldn't someone write a program to capture all D3D calls and such like 3DAnalyze captures shaders?" And lo and behold, Mr. Genius over here did it. Now here's my question that I'm pretty sure I already know the answer to. There's no sort of dependency on any files included with the application, it's just all recorded... right?

Man, I need to play with this.
 
Great job. To be honest, I have been searching for a long time for a tool like this that records Direct3D events. It might be useful for analysing performance bottlenecks in certain games. :)
 
...or sending bug reports to an IHV without needing to send the actual application and explaining complex reproduction instructions...
 
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually telling the card to render?
 
Hyp-X said:
Colourless said:
Just a note it will ONLY work under WindowsXP with Direct3D8 and Direct3D9 apps.

What are the chances you make it work with Win2000?

I can do it. I'm just using a few WinXP specific function calls in a few places. I can work around it though.
 
Natoma said:
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually rendering?
No, because the driver is at a lower level than where the captured data came from. You can capture data at the API or DDI level, but neither of these is what the HW is receiving.
 
OpenGL guy said:
Natoma said:
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually rendering?
No, because the driver is at a lower level than where the captured data came from. You can capture data at the API or DDI level, but neither of these is what the HW is receiving.

Ok then my next question is, could you write something like Colourless has done that captures that data, or are you saying that driver level data is off limits to any kind of tool such as this?
 
Natoma said:
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually telling the card to render?

That would depend on exactly what the driver was doing. This will stop some level of application detection. However, if the driver is looking for certain combinations of function calls or specific textures, then this might not do anything.

It is something I am curious about though. My experience has been that the benchmark runs have always been faster than the recorded runs. However, a telling sign would be if the benchmarked run was slower than the recorded one. This sort of thing though might only be noticeable on a fast CPU.
 
Colourless said:
Natoma said:
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually telling the card to render?

That would depend on exactly what the driver was doing. This will stop some level of application detection. However, if the driver is looking for certain combinations of function calls or specific textures, then this might not do anything.

I was thinking about that. Hasn't the problem with the driver optimizations we've seen over the past few years been that they only work with pre-recorded demos, and that if you "go off the rails," as it were, they fail? So for a tool such as this, why would this necessarily be an issue?

Colourless said:
It is something I am curious about though. My experience has been that the benchmark runs have always been faster than the recorded runs. However, a telling sign would be if the benchmarked run was slower than the recorded one. This sort of thing though might only be noticeable on a fast CPU.

Why exactly would this be a telling sign?

<-- Not a 3D junkie. :)
 
86: lpD3DDev9_2->Unknown command (filepos 0x27865)

It crashes both the player and ComStreamDump on playback.
 
Natoma said:
OpenGL guy said:
Natoma said:
Wow. Uhm, would this in any way shape or form help with driver cheats and what not by intercepting what D3D is asking for vs what the driver is actually rendering?
No, because the driver is at a lower level than where the captured data came from. You can capture data at the API or DDI level, but neither of these is what the HW is receiving.

Ok then my next question is, could you write something like Colourless has done that captures that data, or are you saying that driver level data is off limits to any kind of tool such as this?

I think it's rather a limitation caused by the fact it isn't an "open" driver.... :D
 
Hyp-X said:
86: lpD3DDev9_2->Unknown command (filepos 0x27865)

It crashes both the player and ComStreamDump on playback.

Ok, I can fix ComStreamDump since 86 isn't implemented, but the player has that implemented so it's another problem. I'll update the player to give better error messages, which should help find the problem, IF it gave you an error.

Just curious, what program did you capture from? If I have access to it, it might help fix the problem.
 
Colourless said:
Hyp-X said:
86: lpD3DDev9_2->Unknown command (filepos 0x27865)

It crashes both the player and ComStreamDump on playback.

Ok, I can fix ComStreamDump since 86 isn't implemented, but the player has that implemented so it's another problem. I'll update the player to give better error messages, which should help find the problem, IF it gave you an error.

Ok, I think 86 is CreateDepthStencilSurface, just from analyzing the output.

The player just exited without displaying anything or giving any error message.

Just curious, what program did you capture from? If I have access to it, it might help fix the problem.

It's our internal build of Panzers, so unfortunately no.
I could send some of the captured stuff, although 'bufstream.lcs' is pretty big.

I'll try another recording with shadows disabled...
 
Can you check to see if you have an lcs_errors.txt being created? It's created by the capture process when 'bad' things start happening. Though if it's just crashing on playback I don't think it will help.
 