I've got a D3D9 app that needs to load textures on the fly while maintaining a solid 60fps (16.66ms per frame). There are only a few hardware targets and we control them, but they are currently all low-end (5200, 6200 turbo cache).
I'm pretty stuck. So, thanks in advance for any ideas/pointers.
-Zach
First the questions:
Are there any examples/tutorials on loading textures and minimizing pipeline stalls online?
Is there a way to disable the 'virtual memory' on the 6200 turbo cache card for testing?
Can you pick out anything goofy below?
Are there any (free?) profiler tools to help me pinpoint my bottleneck more than I have? I'm about to try NVPerfHUD.
And here's a pile of details:
Currently, the app has a 'draw thread' that just draws like mad in a window with vsync on. A second 'control thread' receives instructions over the network and does all the setup/teardown of textures/shaders, etc. We were using D3DXCreateTextureFromFileEx(), but it was taking a lock that stalled the draw thread and caused 'hitching' in the frame rate. So, I've separated the disk I/O from the d3d stuff. We use 3-5 large non-pow2 textures at a time - 640x480, 800x600, 1280x720, etc.
The draw thread basically spends all of its time inside device->Present(NULL, NULL, NULL, NULL).
Currently the control thread works like this. Keep in mind that this texture is completely unknown to the draw thread. I'm just fighting the d3d global lock.
load png to gdi+ bitmap
D3DXCreateTexture(D3DUSAGE_DYNAMIC)
tex->LockRect(D3DLOCK_DISCARD)
memcpy rows to fill d3d texture
tex->UnlockRect(0)
application lock (enter critical section)
add new objects/textures to draw thread work queue
application unlock
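A rough, portable sketch of two of the steps above (the row copy and the queue handoff), not the actual app code. It assumes the locked texture exposes a row pitch that can be larger than the row's pixel bytes, which is why the copy goes row by row. std::mutex stands in for the critical section, and CopyRows, PendingItem and WorkQueue are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <utility>
#include <vector>

// Copy `rows` rows of `rowBytes` bytes each. dstPitch and srcStride are the
// byte distances between row starts. dstPitch may exceed rowBytes (padding);
// srcStride may be negative if the source bitmap is stored bottom-up.
inline void CopyRows(std::uint8_t* dst, std::ptrdiff_t dstPitch,
                     const std::uint8_t* src, std::ptrdiff_t srcStride,
                     std::size_t rowBytes, std::size_t rows)
{
    assert(dstPitch >= static_cast<std::ptrdiff_t>(rowBytes));
    for (std::size_t y = 0; y < rows; ++y) {
        const std::ptrdiff_t row = static_cast<std::ptrdiff_t>(y);
        std::memcpy(dst + row * dstPitch, src + row * srcStride, rowBytes);
    }
}

// Hypothetical queue entry; a real one would carry the texture pointer
// and whatever the draw thread needs to start using it.
struct PendingItem {
    int id;
};

// Handoff between the control thread (Push) and the draw thread (Drain).
// std::mutex is a portable stand-in for the application critical section.
class WorkQueue {
public:
    // Control thread: called only after the texture is completely filled.
    void Push(PendingItem item)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }

    // Draw thread: take everything queued so far. The lock is held only
    // for the swap, so the draw thread is never blocked for long.
    std::vector<PendingItem> Drain()
    {
        std::vector<PendingItem> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(items_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<PendingItem> items_;
};
```

Draining by swap is a design choice of this sketch: the application lock is then never held while any d3d call is in flight, so it cannot add to the waits on the d3d lock.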
This has allowed the hated 5200 (bad pixel shader perf) to run smoothly at 640x480 even under an artificial worst-case stress test (continuous texture churn), but the 6200 TC still has a small 'hitch' in framerate when textures load. Under the stress test it drops to an unsmooth 30FPS, and the drop appears to be independent of texture and window resolution. (I tried forcing all 256x256 textures in a 640x200 window.) The 6200 TC can spend ~60ms inside device->Present() when adding a texture. I think the shader is fine; when textures aren't changing, it's happy running the pixel shaders at 720p.
I'm using QPC (QueryPerformanceCounter) for timings, but it can be hard to tell what is happening in the draw thread since the Nvidia driver appears to buffer more than 1 frame of commands. But it looks like the d3d lock is only held during texture create and lock/fill/unlock. When running multithreaded, it can take quite a while for the control thread to acquire the d3d lock (CreateTexture and LockRect), but when running a single thread it is ~0.5ms. So, I think it's fine if the control thread waits on the draw thread. The texture fill/unlock takes only 2ms after the lock is acquired.
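One possible way to make the hitches easier to spot is to log the interval between successive Present() returns rather than the time spent inside a single call. This is only a minimal sketch: std::chrono::steady_clock stands in for QPC so that it runs anywhere, HitchLog and its threshold are hypothetical, and the 25ms value in the usage note is just an assumed amount of slack over the 16.66ms budget.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Counts frame intervals that exceed a threshold and remembers the worst one.
class HitchLog {
public:
    using Clock = std::chrono::steady_clock;  // portable stand-in for QPC

    explicit HitchLog(double thresholdMs) : thresholdMs_(thresholdMs)
    {
        assert(thresholdMs > 0.0);
    }

    // Record one measured interval, in milliseconds.
    void Record(double intervalMs)
    {
        ++frames_;
        if (intervalMs > thresholdMs_) {
            ++hitches_;
        }
        worstMs_ = std::max(worstMs_, intervalMs);
    }

    // Call right after each Present() returns. The first call only stores a
    // timestamp; every later call records the time since the previous call.
    void Tick()
    {
        const Clock::time_point now = Clock::now();
        if (havePrev_) {
            Record(std::chrono::duration<double, std::milli>(now - prev_).count());
        }
        prev_ = now;
        havePrev_ = true;
    }

    std::size_t frames() const { return frames_; }
    std::size_t hitches() const { return hitches_; }
    double worstMs() const { return worstMs_; }

private:
    double thresholdMs_;
    std::size_t frames_ = 0;
    std::size_t hitches_ = 0;
    double worstMs_ = 0.0;
    Clock::time_point prev_{};
    bool havePrev_ = false;
};
```

In the draw loop this would be a single `log.Tick();` after Present(), with something like `HitchLog log(25.0);` created up front. Since the driver appears to queue more than one frame, a spike logged this way may show up a frame or two after the texture work that caused it.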