Multithreading rendering engine

Hi all,

I've recently added multithreaded rendering support to my engine (Win32, D3D9). The work is split across 3 threads (logic, load, render).

The logic thread is the main thread (this thread creates the window and the device) and runs the main loop (and handles the window procedure events).

All rendering calls are done only from the rendering thread (VSYNC is enabled).

The performance is really good in windowed mode (logic runs 15000-20000 times per second in a small scene), but when I switch to fullscreen, the logic thread's rate drops sharply (to between 30 and 120 times per second) and I don't understand why.

Should I run the rendering in the main thread and the logic in a dedicated thread? Before running a profiler, I would like to ask whether there is any conceptual error in this architecture.

I know Windows moves threads between the available processors, and I've read somewhere that the main thread should stay on the same processor for the whole run (SetThreadAffinityMask).

I know there are some calls that should only be made from the logic thread (for example QueryPerformanceCounter) in order to avoid timing problems (negative delta times). Any other hints?
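
A minimal sketch of guarding against negative delta times, assuming timestamps come from C++'s `std::chrono::steady_clock` rather than raw QueryPerformanceCounter calls. `clamped_delta_seconds` is a hypothetical helper name, not something from the engine described here.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: seconds elapsed between two samples, never negative.
// steady_clock is monotonic, but the clamp also protects callers that mix
// timestamps taken on different threads or from a less reliable source.
inline double clamped_delta_seconds(Clock::time_point previous,
                                    Clock::time_point current) {
    const std::chrono::duration<double> delta = current - previous;
    return delta.count() < 0.0 ? 0.0 : delta.count();
}
```

Sampling the clock in one thread only, as suggested above, and passing the resulting delta to the other threads keeps every consumer on the same timeline.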

Thanks a lot for your time and happy coding,

s.
 
I do not see any conceptual problems with the way you have organized your threads. However I would move all DirectX code, including the device creation and resource creation (vertex buffers, textures, etc) to the same thread as rendering, and only access DirectX from that thread. DirectX is not normally thread safe (but can be made so with the D3DCREATE_MULTITHREADED device creation flag), so accessing it from multiple threads is not advisable.
 
However I would move all DirectX code, including the device creation and resource creation (vertex buffers, textures, etc) to the same thread as rendering...
It's already working in that way.

On the other hand, the DX9 multithreading feature has a noticeable impact on performance and it is better not to use it in gaming applications (the main target of my engine).
 
The DX9 multithreading feature has a noticeable impact on performance and it is better not to use it in gaming applications (the main target of my engine).

Agreed completely. Better use one thread only for rendering. You can do scene management, object culling, animation setup, physics, etc in other threads.

It's already working in that way.

Hmm... In your earlier post you said that the main thread creates the d3d device, and the rendering happens in rendering thread.

The logic thread is the main thread (this thread creates the window and the device) and runs the main loop (and handles the window procedure events).

All rendering calls are done only from the rendering thread (VSYNC is enabled).
 
Given how few D3D calls you will be making, it's probably best to leave all of them in one thread. Use other threads for any sort of sorting, culling, etc.

If any of your threads block on locks, it's possible vsync is the problem. In windowed mode vsync might be disabled, and when going fullscreen it turns on and cuts the framerate from something really high to 60 fps.

Also you have to be doing some really light rendering if you're getting 15k-20k frames. In my experience, with nothing in the scene, getting over 1k FPS usually means something is going wrong.
 
As Microsoft states http://msdn.microsoft.com/en-us/library/bb147224(VS.85).aspx

IDirect3DDevice9::Reset, IDirect3D9::CreateDevice calls can only be called from the same thread that handles window messages. Other D3D calls can be called in a different thread without deadlock problems.

You can always move your window creation and window message handling to the rendering thread (or move the rendering to the main thread and the game state handling code to its own thread). Or are you planning to implement a synchronization mechanism to handle your IDirect3DDevice9::Reset() and IDirect3DDevice9::TestCooperativeLevel() calls in a separate thread from the rendering?
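
One way to make the rule quoted above checkable in code is to record which thread owns the window and test for it before creating or resetting the device. This is only a sketch in portable C++: `WindowThreadGuard` and `on_window_thread` are hypothetical names, and the actual CreateDevice and Reset calls are left out.

```cpp
#include <cassert>
#include <thread>

// Hypothetical guard: remembers which thread constructed it (intended to be
// the thread that creates the window and pumps its messages) and lets callers
// check that device creation and reset happen on that same thread.
class WindowThreadGuard {
public:
    WindowThreadGuard() : owner_(std::this_thread::get_id()) {}

    // True only when called from the thread that constructed the guard.
    bool on_window_thread() const {
        return std::this_thread::get_id() == owner_;
    }

private:
    std::thread::id owner_;
};
```

A wrapper around device creation or reset could then start with `assert(guard.on_window_thread());` so that a call from the wrong thread fails loudly in debug builds instead of deadlocking later.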
 
Thanks for the replies.

Also you have to be doing some really light rendering if you're getting 15k-20k frames. In my experience, with nothing in the scene, getting over 1k FPS usually means something is going wrong.

I get 15000-20000 iterations per second in the logic thread; the rendering thread runs at 60 fps at most because vsync is enabled. I have checked windowed performance with FRAPS and it also gives 60 fps. If I disable VSYNC I get 1000-2000 fps in the rendering thread with a small scene (ambient pass only).

My logic thread uses a green thread (software threads) philosophy to limit the execution of tasks n times per second. Obviously it's insane to run object culling 20000 times per second but I let this happen only for debugging purposes (algorithm performance).
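
To illustrate the idea of limiting a task to n runs per second inside a free-running logic loop, here is a rough sketch. `RateLimitedTask` and its `update` method are hypothetical and not taken from the engine; the sketch uses a simple time accumulator, which is one possible approach among several.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical cooperative task: runs on average `rate_hz` times per second
// of elapsed time, no matter how often the logic loop itself spins.
class RateLimitedTask {
public:
    RateLimitedTask(double rate_hz, std::function<void()> work)
        : interval_(1.0 / rate_hz), work_(std::move(work)) {
        assert(rate_hz > 0.0);
    }

    // Call once per logic-loop iteration with the elapsed time in seconds.
    // Returns how many times the task ran during this call.
    int update(double delta_seconds) {
        accumulated_ += delta_seconds;
        int runs = 0;
        while (accumulated_ >= interval_) {
            accumulated_ -= interval_;
            work_();
            ++runs;
        }
        return runs;
    }

private:
    double interval_;
    double accumulated_ = 0.0;
    std::function<void()> work_;
};
```

With this shape, a loop spinning 20000 times per second would call `update` on every iteration, while a culling task created with `rate_hz = 60` would still only do its work about 60 times per second.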

Or are you planning to implement a synchronization mechanism to handle your IDirect3DDevice9::Reset() and IDirect3DDevice9::TestCooperativeLevel() calls in separate thread from the rendering?

I already have a synchronization mechanism to handle window mode changes and buffer lost events. When the system needs to apply these changes, the logic thread "stops" the rendering thread (with a critical section), video resources are recreated from the logic thread and finally the rendering thread is enabled again. In this way, my rendering thread only has to worry about render description buffers (a vector with a description of the objects to render).
I don't want to bother the rendering thread with windows message event handling if I can avoid it because windows message events are usually involved with logic tasks.
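
The stop-and-recreate scheme described above can be sketched roughly as follows. This is an assumption-laden illustration, not the engine's code: `std::mutex` stands in for the Win32 critical section, a counter stands in for the video resources, and `RenderGate`, `render_frame` and `recreate_resources` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins: a render description is reduced to a string and
// the video resources to a generation counter.
struct RenderGate {
    std::mutex device_lock;          // stand-in for the critical section
    int resource_generation = 0;     // bumped whenever resources are recreated
    std::vector<std::string> drawn;  // records what the render thread "drew"
};

// Render thread: holds the lock for the duration of one frame.
inline void render_frame(RenderGate& gate,
                         const std::vector<std::string>& descriptions) {
    std::lock_guard<std::mutex> hold(gate.device_lock);
    for (const std::string& d : descriptions) {
        gate.drawn.push_back(d + "@gen" +
                             std::to_string(gate.resource_generation));
    }
}

// Logic thread: taking the same lock keeps the render thread out while
// resources are released and recreated (for example after a lost device).
inline void recreate_resources(RenderGate& gate) {
    std::lock_guard<std::mutex> hold(gate.device_lock);
    ++gate.resource_generation;
}
```

Note that in this arrangement the logic thread waits for the frame in flight to finish before it can recreate anything, so with vsync enabled that wait can last up to one refresh interval.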

However I don't know whether (with D3D) it's better to handle window message events in the rendering thread in order to achieve better performance.
 
My initial assumption would be that you're tying your logic to the rendering. Be that some actual usage detail of the API or OS or some unintentional blocking operation with inter-thread communication. Your sudden drop in performance and the numbers you're reporting seem very VSYNC-related ;)

I'd personally keep the Win32 and D3D code in the same thread. Yes there is some logical separation between the two, but in practice the majority of W32 messages you'll be looking for are going to be tied with the rendering behaviour (minimize/maximize/lock etc..etc..) with the main exception being input processing (keyboard/mouse).

Conversely, other than for clean design I can't see much advantage to having the two separated. My impression is that the real gains for multithreading are from having game/physics/ai (etc..) logic and processing running in parallel to code that directly interfaces with the API or hardware.

hth
Jack
 
My initial assumption would be that you're tying your logic to the rendering. Be that some actual usage detail of the API or OS or some unintentional blocking operation with inter-thread communication. Your sudden drop in performance and the numbers you're reporting seem very VSYNC-related.

I think you are right. I will check my synchronization mechanisms (I am already using a triple description buffer though).
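
For readers unfamiliar with the idea, a triple description buffer could look roughly like the sketch below. The poster's actual implementation is not shown in the thread, so everything here is an assumption: the `TripleDescriptionBuffer` class, the `int`-based description type, and the choice of a short mutex-protected index swap instead of a lock-free exchange.

```cpp
#include <array>
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical description type: one int per object to render.
using RenderDescription = std::vector<int>;

class TripleDescriptionBuffer {
public:
    // Logic thread: slot currently being filled; the renderer never reads it.
    // It may still hold older data, so the writer should clear it first.
    RenderDescription& write_slot() { return slots_[write_]; }

    // Logic thread: hand the filled slot over and pick up a free one.
    void publish() {
        std::lock_guard<std::mutex> hold(lock_);
        std::swap(write_, ready_);
        fresh_ = true;
    }

    // Render thread: switch to the newest published slot if there is one,
    // otherwise keep drawing the previous one.
    const RenderDescription& acquire() {
        std::lock_guard<std::mutex> hold(lock_);
        if (fresh_) {
            std::swap(read_, ready_);
            fresh_ = false;
        }
        return slots_[read_];
    }

private:
    std::array<RenderDescription, 3> slots_;
    std::mutex lock_;  // held only while indices are swapped
    int write_ = 0, ready_ = 1, read_ = 2;
    bool fresh_ = false;
};
```

Because the lock only covers the index swap, neither thread waits for the other's frame work here. If the logic thread still stalls in fullscreen, the blocking is more likely somewhere else in the synchronization.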

I'd personally keep the Win32 and D3D code in the same thread. Yes there is some logical separation between the two, but in practice the majority of W32 messages you'll be looking for are going to be tied with the rendering behaviour (minimize/maximize/lock etc..etc..) with the main exception being input processing (keyboard/mouse).

I decided to create a dedicated rendering thread for 2 reasons:
- 16 ms dedicated only to rendering tasks (render shadow maps, impostors, reflections, cubemaps, viewports, etc)
- enable VSYNC without blocking main thread (triple buffer is not an option for me)

My target application (a game) usually runs in fullscreen, so minimize, maximize, and change display mode events are rarely needed. If the render thread has to worry about user input events, the previous conditions can never be achieved.

IMHO, the user input events (mouse, keyboard) have to be processed as soon as possible, so that the changes they cause are visible 2 frames after they happen (the logic thread has to process these events before the results are rendered).
 
As Microsoft states http://msdn.microsoft.com/en-us/library/bb147224(VS.85).aspx

IDirect3DDevice9::Reset, IDirect3D9::CreateDevice calls can only be called from the same thread that handles window messages. Other D3D calls can be called in a different thread without deadlock problems.

I've inherited a multiple-viewport application that has a dedicated rendering thread. It tries to call CreateDevice and Reset from this dedicated thread, and it seems to work.

How can I let the render thread call CreateDevice while keeping everything else in the app on the main GUI thread? Or, what will cause deadlock in this situation?

MS says that we can do better than their multithreaded implementation, so I'd rather not use the multithreaded device creation flag.
 
I've inherited a multiple-viewport application that has a dedicated rendering thread. It tries to call CreateDevice and Reset from this dedicated thread, and it seems to work.

It's doomed to fail. Maybe not now as you test it, but later as you try to expand the application to handle full screen.

How can I let the render thread call CreateDevice while keeping everything else in the app on the main GUI thread? Or, what will cause deadlock in this situation?

Some explanation from Microsoft :
http://msdn.microsoft.com/en-us/library/bb147224(VS.85).aspx

MS says that we can do better than their multithreaded implementation, so I'd rather not use the multithreaded device creation flag.

The multithreaded device creation flag doesn't prevent an error such as the one above (calling CreateDevice from the wrong thread). So you have to follow the rules and actually enforce them in your code BY DESIGN (not by sprinkling mutexes here and there).

For example, once you have destroyed the device you cannot call into it from another thread anymore, no matter what. So you have to stop new commands from being issued by the render thread as soon as the window thread destroys the device (or let the render thread signal the window thread to destroy the device and then go to sleep).
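
A hedged sketch of the ordering described above: the window thread asks the render thread to stop, and destroys the device only after the render thread has acknowledged. `FakeDevice` and `DeviceLifetime` are hypothetical stand-ins; no real Direct3D calls are made.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a device: counts commands, records destruction.
struct FakeDevice {
    int commands_issued = 0;
    bool destroyed = false;
};

class DeviceLifetime {
public:
    explicit DeviceLifetime(FakeDevice& device) : device_(device) {}

    // Render thread: issues one command unless a stop has been requested.
    // Returns false once the render loop should exit.
    bool try_issue_command() {
        if (stop_requested_.load(std::memory_order_acquire)) {
            render_exited_.store(true, std::memory_order_release);
            return false;
        }
        assert(!device_.destroyed);
        ++device_.commands_issued;
        return true;
    }

    // Window thread: ask the render thread to stop issuing commands.
    void request_stop() {
        stop_requested_.store(true, std::memory_order_release);
    }

    // Window thread: destroys the device only after the render thread has
    // acknowledged the stop request. Returns whether it was destroyed.
    bool destroy_if_render_exited() {
        if (!render_exited_.load(std::memory_order_acquire)) return false;
        device_.destroyed = true;
        return true;
    }

private:
    FakeDevice& device_;
    std::atomic<bool> stop_requested_{false};
    std::atomic<bool> render_exited_{false};
};
```

The point of the design is that the rule is built into the control flow: after acknowledging the stop, the render thread has no path left that touches the device, so no mutex around individual calls is needed.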

Here are some thoughts :
Multithreaded devices in Direct3D 9
 
I finally found the problem: it was a timing synchronization issue. I changed the affinity masks of the threads to resolve it. The logic thread now stays on processor 1 and the render thread on processor 2 for the whole run.
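
The exact code used for the fix is not shown in the thread. A minimal sketch of pinning a thread to one processor with the Win32 `SetThreadAffinityMask` call might look like this; `single_core_mask` and `pin_current_thread` are hypothetical helper names, and outside Windows the sketch deliberately does nothing.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: bit mask selecting one logical processor (0-based).
inline std::uint64_t single_core_mask(unsigned core_index) {
    assert(core_index < 64);
    return std::uint64_t{1} << core_index;
}

// Pins the calling thread to one logical processor. Returns true on success.
// Outside Windows this sketch does nothing and reports failure.
// On 32-bit Windows the mask is 32 bits wide, so core_index must be below 32.
inline bool pin_current_thread(unsigned core_index) {
#ifdef _WIN32
    const DWORD_PTR mask =
        static_cast<DWORD_PTR>(single_core_mask(core_index));
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)core_index;
    return false;
#endif
}
```

Under these assumptions the logic thread would call `pin_current_thread(0)` and the render thread `pin_current_thread(1)` at startup, matching "processor 1" and "processor 2" above with 0-based indices.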

Thanks all for your hints!!!

Cheers,

s.
 
Hi there,
I was fascinated by your post. I am working on a large project and I am looking for people who have a strong programming background and are aware of the development possibilities of isolating individual CPU threads. Do you know anything about working in OpenGL and Linux? Thanks,
Eric
 