27-Jun-2009, 13:25   #1
eigers
Junior Member
 
Join Date: Mar 2008
Posts: 21
Driver overhead in GetRenderTargetData implementation

Hi all,

I'm using DX9 to render a scene and capture the image to system memory. The capture is done with IDirect3DDevice9::GetRenderTargetData(). I'm running Windows Vista on a GeForce 8800 GT.
The render target is 782x160 pixels at 4 bytes per pixel (= ~0.5 MB). A capture takes 0.5 ms on my system, which gives a bandwidth of about 1 GB/s. The real bandwidth, as measured with the CUDA bandwidthTest sample, is ~3 GB/s.
I always wondered why this was, and I assumed there was some kind of constant overhead to each GetRenderTargetData call.
Now, using the new GPUView tool, I could finally look into this.
To my surprise, I discovered that GetRenderTargetData is implemented as 3 separate command buffers submitted to the kernel-mode driver and the GPU. Moreover, there's quite a lot of "dead" GPU time in between them, during which the user-mode driver (UMD) is working. The GPU time of these 3 command buffers adds up to 170 us, which corresponds almost exactly to the real 3 GB/s bandwidth.
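
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic in C++. It only uses the figures quoted above (782x160x4 bytes, 0.5 ms total, 170 us of GPU time). The helper names are made up for illustration and are not part of any API, and 1 GB is taken as 10^9 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Frame size from the post: 782 x 160 pixels, 4 bytes per pixel.
constexpr std::uint64_t kFrameBytes = 782ull * 160ull * 4ull;  // 500,480 bytes

// Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes)
// for a transfer of `bytes` bytes that took `microseconds` microseconds.
double bandwidth_gb_per_s(std::uint64_t bytes, double microseconds) {
    assert(microseconds > 0.0);
    // bytes per microsecond is numerically MB/s; divide by 1000 for GB/s.
    return (static_cast<double>(bytes) / microseconds) / 1000.0;
}

// Hypothetical helper: time per call that is not accounted for by GPU work.
double non_gpu_time_us(double total_us, double gpu_us) {
    return total_us - gpu_us;
}
```

Calling bandwidth_gb_per_s(kFrameBytes, 500.0) gives roughly 1.0 GB/s for the whole call, bandwidth_gb_per_s(kFrameBytes, 170.0) gives roughly 2.9 GB/s for the GPU time alone, and non_gpu_time_us(500.0, 170.0) gives 330 us spent outside the GPU on each call.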

So, the questions are:
1. Why 3 submissions (each with kernel-mode switch overhead, Dxgk overhead, etc.)?
2. What is the UMD doing during the rest of the time (330 us), given that there's no format conversion or any other processing involved?

Thanks.