Porting to DX9 Issues

Jamm0r

In the process of porting a D3D7 engine to D3D9, I have, as expected, encountered a few issues.

First and foremost among these is performance. The D3D9 engine runs approximately 30-40% slower than the D3D7 one.

One of the major goals for the new engine was to keep the fundamentals as close to the original as possible. I was pretty much able to accomplish this, albeit with one major hiccup.

To make a long story a little shorter: in the original implementation, the landscape engine builds a number of tiles (textures), which depends on too many factors to get into here. It renders each tile to a single render target texture, created in video memory at app initialization, and then blits the result across vid mem to the appropriate landscape texture, also located in vid mem. The landscape texture table uses up to 64 MB of video RAM, and the textures are created in DXT1 format if supported.

Now, in DX9, I have not been able to find any method to copy from a render target texture, obviously located in vid mem, to another texture also located in vid mem. It seems that this is not possible. Therefore, I render directly into the landscape texture table. Every landscape tile is now a render target texture, and directly receives the rendering.

Unfortunately, I can no longer use DXT compression on these tiles, as they are render targets. The tiles vary in size (64x64 min to 1024x1024 max). For each tile, I grab the Level 0 surface, create a matching DepthStencil surface, set them as the current RenderTarget, and begin rendering. At EndScene, I Release these two surfaces, switch to another tile, and repeat.

Are there any obvious problems with this new implementation that could explain the performance hit? Is there any fast way to copy a render target texture into another texture across vid mem in DX9? Are there any other snafus to keep in mind when porting to DX9?

Thanks in advance.
 
I use D3DXLoadSurfaceFromSurface to upload textures into video memory on app initialization, but have found that it is way too slow to call dozens of times per frame.
 
Jamm0r said:
Now, in DX9, I have not been able to find any method to copy from a render target texture, obviously located in vid mem, to another texture also located in vid mem. It seems that this is not possible. Therefore, I render directly into the landscape texture table. Every landscape tile is now a render target texture, and directly receives the rendering.
How about IDirect3DDevice9::StretchRect? Although I think this currently requires both source and dest to be marked as render targets, which might cause various problems depending on the HW you're running on...

Edit...
Hmm, ok you're using every texture as a render target now which should be fine. You could be seeing perf problems due to the graphics card you're using having problems when using a render target as a texture (e.g. probs with page breaks due to surface being untwiddled etc)...


John.
 
John,

The performance hit is similar on all cards we have tested it on so far, including my 9700 Pro, some Radeon 9500s and 8500s, and a few GeForce 4s. Could you elaborate on which cards have these problems, the problem itself, and how, if possible, to solve / avoid it? This is the first I've heard of such an issue. Many thanks.
 
It seems all methods for copying texture/surface to another texture/surface have something against non-render target textures or format conversions, StretchRect, UpdateSurface and UpdateTexture included...
 
Jamm0r said:
John,

The performance hit is similar on all cards we have tested it on so far, including my 9700 Pro, some Radeon 9500s and 8500s, and a few GeForce 4s. Could you elaborate on which cards have these problems, the problem itself, and how, if possible, to solve / avoid it? This is the first I've heard of such an issue. Many thanks.

From what you've described I don't think this is your issue, but anyway...

Basically, some cards cannot render in a format that can be used efficiently as a texture source. For example, you might only be able to render in a strided linear format (the pixels of each line, and the lines themselves, are stored sequentially in memory), and this format is prone to page breaks when addressed arbitrarily. Other cards simply cannot directly read a render target as a texture (many 3dfx, TNTx, GF1-2 etc); in these cases the driver is doing a data xfer behind the scenes to get things to work. There are probably other reasons as well.
----------------------------------------

Re-reading your post, are you sure you're not just thrashing vid mem now that your landscape textures aren't compressed ?
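
To put rough numbers on the thrashing question, here is a minimal sketch of the per-texture footprint. It assumes the render-target tiles use a 16-bit or 32-bit uncompressed format (the thread does not say which), and it ignores mip chains and the per-tile depth-stencil surfaces. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one mip level stored as DXT1.
// DXT1 packs each 4x4 pixel block into 8 bytes (0.5 bytes per pixel).
std::uint64_t dxt1Bytes(std::uint32_t width, std::uint32_t height) {
    const std::uint64_t blocksWide = (width + 3) / 4;   // round up to whole blocks
    const std::uint64_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * 8;
}

// Size in bytes of one mip level stored uncompressed.
// bytesPerPixel is an assumption: 2 for a 16-bit format, 4 for a 32-bit one.
std::uint64_t uncompressedBytes(std::uint32_t width, std::uint32_t height,
                                std::uint32_t bytesPerPixel) {
    assert(bytesPerPixel > 0);
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}
```

Under those assumptions a 1024x1024 tile grows from 512 KB in DXT1 to 4 MB at 32 bits per pixel, so a table that filled 64 MB in DXT1 would need roughly 512 MB uncompressed (256 MB at 16 bits).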

Edit... One other comment: I have heard that some cards take a big hit when the render target is set repeatedly. Not sure how true/bad this is in reality, but it could perhaps be alleviated by grouping multiple 64x64 tiles into a single texture.

John.
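
As a rough illustration of the grouping idea, the sketch below computes where a small tile would live inside one larger render-target texture. The square row-major layout, the `TileRect` type and the `atlasTileRect` name are all assumptions made for this example, not anything from the engine being discussed.

```cpp
#include <cassert>
#include <cstdint>

// Pixel rectangle of one tile inside a larger atlas texture.
struct TileRect {
    std::uint32_t x, y, width, height;
};

// Returns the rectangle of tile `index` in a square atlas whose tiles are
// laid out row-major. atlasSize must be a multiple of tileSize.
TileRect atlasTileRect(std::uint32_t atlasSize, std::uint32_t tileSize,
                       std::uint32_t index) {
    assert(tileSize > 0 && atlasSize % tileSize == 0);
    const std::uint32_t tilesPerRow = atlasSize / tileSize;
    assert(index < tilesPerRow * tilesPerRow);
    return { (index % tilesPerRow) * tileSize,   // x offset in pixels
             (index / tilesPerRow) * tileSize,   // y offset in pixels
             tileSize, tileSize };
}
```

A 1024x1024 atlas holds 256 tiles of 64x64, so one render-target change could cover up to 256 tile updates. In D3D9 the rectangle would presumably feed the X, Y, Width and Height fields of a D3DVIEWPORT9 before drawing each tile, and the landscape texture coordinates would need a matching scale and offset.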
 
John,

In reality, my latest tests indicate a HUGE geometry processing hit. I disabled the landscape altogether, and believe it or not, frametimes are almost identical! When looking straight up at the sky, FPS are obviously through the roof. With just a few aircraft or buildings in view (100 - 1000 faces each), framerates drop drastically.

Funny thing is, besides changing the DX calls and interfaces, everything in the poly processing / drawing is identical. We use the fixed-function pipeline and software vertex processing, using a non-indexed D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY + D3DPOOL_DEFAULT vertex buffer with a 1024 vertex limit. We batch polys by material, and draw triangle fans, locking the buffer once per 1024 vertices with D3DLOCK_DISCARD to prevent stalls. We then DrawPrimitive for every fan in the vertex buffer.

This is obviously not the most optimal method, but is lightning fast using DX7. The same code, using, as I said, DX9 interfaces, slows to a crawl. Any ideas?
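
One thing that may be worth testing, since the code above issues one DrawPrimitive per fan: expanding the fans into a plain triangle list on the CPU, so that everything in a material batch can be submitted with a single call. Nothing in this thread confirms that draw-call count is the bottleneck, so treat this as a sketch of an experiment. The `Vertex` layout and the `appendFanAsList` name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder vertex; a real one would match the engine's FVF layout.
struct Vertex {
    float x, y, z;
};

// Appends the triangles of one fan to `out` as a non-indexed triangle list.
// A fan of n vertices yields n - 2 triangles: (v0, v1, v2), (v0, v2, v3), ...
void appendFanAsList(const std::vector<Vertex>& fan, std::vector<Vertex>& out) {
    if (fan.size() < 3) {
        return;  // not enough vertices for a triangle
    }
    for (std::size_t i = 1; i + 1 < fan.size(); ++i) {
        out.push_back(fan[0]);      // shared hub vertex
        out.push_back(fan[i]);
        out.push_back(fan[i + 1]);
    }
    assert(out.size() % 3 == 0 || out.size() < 3);
}
```

The trade-off is vertex count: a fan of n vertices becomes 3(n - 2) list vertices, so a 1024-vertex buffer holds fewer polys per lock. An indexed triangle list would avoid the duplication at the cost of managing an index buffer.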
 
maven said:
We use the fixed-function pipeline and software vertex processing

Did you use software VP in DX7 as well? Also check the Renderstates, as DX9 has some different defaults IIRC.

Yes, we do. I realize that many of the Renderstates have different names and such, and I spent quite some time converting them, looking each one up, etc. I'll have another look, thanks.
 
Right, so I disabled the landscape, then turned wireframe mode on (just replaces all D3DPT_TRIANGLEFAN with LINELIST), and the problem is still there, very evident.
 
Are you still mixing SW VP with HW fixed-function T&L?

Your use of VBs seems fine. Hmm, you're not using the same VBs for SW VP and HW T&L, are you? Or maybe the same index buffers (which didn't exist in DX7)? Using the same buffer for both could account for your problems...

You say you've removed the landscape stuff; does that include the setting of the render targets?

Otherwise, without a closer understanding of what your code does, I'm reasonably stumped. Have you asked the guys on the DX dev mailing list? Someone may have had the same problem in the past...

John.
 