Junior Member
Join Date: Mar 2008
Posts: 21
Hi all,
I'm using DX9 to render and capture the image to system memory. The capture is done with IDirect3DDevice9::GetRenderTargetData(). I'm on Windows Vista with a GF8800GT. The render target is 782x160 at 4 bytes per pixel (~0.5 MB). The capture takes 0.5 ms on my system, which works out to a bandwidth of about 1 GB/s. The real bandwidth, as measured with CUDA's bandwidthTest, is ~3 GB/s.

I always wondered why this was, and I assumed there was some kind of constant overhead on each GetRenderTargetData call. Now, using the new GPUView tool, I could finally look into it. To my surprise, I discovered that GetRenderTargetData is implemented as 3 separate command buffers submitted to the kernel driver and the GPU. Moreover, there's quite a lot of "dead" GPU time in between them, during which the UMD (user-mode driver) is working. Looking at the GPU time of these 3 command buffers, their total is 170 us, which gives almost exactly the real 3 GB/s bandwidth.

So, the questions are:
1. Why 3 submissions (with kernel-mode switch overhead, Dxgk overhead, etc.)?
2. What is the UMD doing during the rest of the time (330 us), given that there's no format conversion or any other processing involved?

Thanks.
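For anyone who wants to check the figures, here is a minimal sketch of the arithmetic behind them. The function and constant names are made up for illustration, and it assumes 1 GB = 10^9 bytes and that the 0.5 ms and 170 us timings are exact; it only does the math and does not touch Direct3D at all.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of a surface with the given dimensions.
constexpr std::uint64_t surface_bytes(std::uint64_t width,
                                      std::uint64_t height,
                                      std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Hypothetical helper: effective bandwidth in GB/s for moving `bytes`
// in `microseconds`. Assumes 1 GB = 1e9 bytes.
constexpr double bandwidth_gb_per_s(std::uint64_t bytes, double microseconds) {
    return (static_cast<double>(bytes) / 1e9) / (microseconds / 1e6);
}

// 782x160 render target at 4 bytes per pixel.
constexpr std::uint64_t kBytes = surface_bytes(782, 160, 4);

// Bandwidth over the whole GetRenderTargetData call (0.5 ms = 500 us).
constexpr double kCallBandwidth = bandwidth_gb_per_s(kBytes, 500.0);

// Bandwidth over the GPU time of the 3 command buffers only (170 us).
constexpr double kGpuBandwidth = bandwidth_gb_per_s(kBytes, 170.0);

// Time per call not accounted for by GPU work, in microseconds.
constexpr double kUnaccountedUs = 500.0 - 170.0;
```

Under those assumptions the surface is 500,480 bytes, the whole call comes out at roughly 1 GB/s, the GPU-only time at just under 3 GB/s, and 330 us per call is spent outside the GPU.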