How graphics drivers affect isolated realtime processes

Hi,

I'm new here and failed to find this topic on this site, so if this question has already been answered, please point me in the right direction.

I'm trying to understand how a 3D visualization program and its graphics driver affect a realtime process that runs on an isolated CPU. See my setup, test results, and questions below.

Computer Setup:
  • Computer: Dell T7810
  • Graphics Card: NVIDIA Quadro K2200 4GB
  • Operating System: Ubuntu 14.04.1 LTS, Linux 3.13.0-44-lowlatency x86_64
  • Processors: 16 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  • Desktop Manager: Unity or Gnome (compiz) or Gnome (metacity) (tried all three)

Realtime Process:
I have a realtime process that is isolated to CPU 0; its address space (current and future) is locked into physical memory, and it is given the highest realtime priority in the low-latency kernel (99) with SCHED_RR (round-robin scheduling; I also tried SCHED_FIFO). Its control loop runs at 1 kHz (1-ms loop). The timing is controlled and checked using clock_gettime(CLOCK_MONOTONIC, &current_time) and clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wakeup_time, NULL).

3D Graphics Process:
Simple OpenGL 3D rendering of a robot on a grid

Investigation:
How do the NVIDIA and Nouveau graphics drivers affect the realtime process?
  • NVIDIA binary driver - version 340.76 from the nvidia-340 package (proprietary)
  • X.Org X server - Nouveau display driver from xserver-xorg-video-nouveau (open source)
Test Results:
  • NVIDIA driver: 3D graphics is very fast, but the realtime process exceeds its 1-ms loop time occasionally. Opening up a web browser also causes the realtime process to exceed its 1-ms loop time.
  • Nouveau driver: 3D graphics is very laggy, but the realtime process can run at 2 kHz without exceeding its 0.5-ms loop time.
Questions:
  1. What fundamental differences exist between the two types of graphics drivers?
  2. How does the NVIDIA driver cause an isolated process to slow down?
  3. Is there a way to use the NVIDIA driver while fixing the timing problem?
Thanks a lot!
Pete
 
I don't know much about Linux, but in regards to #3, have you tried moving the realtime process to a different core? Maybe one of the driver's threads has an affinity for CPU 0 for whatever reason.
 
Thanks for the response. I have tried core 15 and didn't notice any difference. I also tried writing a different CPU affinity to the NVIDIA IRQ's smp_affinity file, but that didn't seem to change anything, and it wasn't permanent.
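For reference, pinning a process to a core can be done from the shell with taskset -c, or in code with sched_setaffinity(); a minimal sketch (illustrative, not my exact code):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to a single CPU; returns 0 on success.
 * Setting your own affinity needs no special privileges. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```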
 
Looking at /proc/interrupts I notice that with the nvidia driver running, there's an nvidia irq, but with the nouveau driver I don't see any irq for it.
 
Your CPU is an 8-core, 16-thread part; try cores 1 through 7 just to be sure.
edit - 1 through 8 for a total of 9 cores including 0.
 
Pete, any reason why you are using Nvidia version 340.76? The slowdown might be driver specific and may have been fixed in a more recent version.
 
Pete, any reason why you are using Nvidia version 340.76? The slowdown might be driver specific and may have been fixed in a more recent version.

Not at this point. The latest available from the Software & Updates menu's Additional Drivers tab is 352.21, which is the latest for my graphics card on the NVIDIA website. I tried this, and if anything it made timing worse for the realtime process.
 
I know recent Nvidia drivers have had "timing" issues that also affect Quadro cards. A hotfix (driver ver. 353.38) was released a day ago that seems to resolve the issue, but I'm not sure there is a Quadro hotfix yet. The official version (which will include a Quadro version) should be released shortly (I think they said within a week).

If doing any serious troubleshooting, I'd definitely wait for the hotfix driver. A Quadro user below has the issue ...
what about those of us who have quadro cards? I have been experiencing all the issues described in the thread, yet there is no hotfix for quadro drivers.
https://forums.geforce.com/default/...ce-hotfix-driver-353-38/post/4584713/#4584713
 
Might I inquire as to the nature of the realtime process? Is it CPU heavy? Is it memory-bandwidth heavy? Did you try a different OpenGL application and see if it still happens?
 
Might I inquire as to the nature of the realtime process? Is it CPU heavy? Is it memory-bandwidth heavy? Did you try a different OpenGL application and see if it still happens?

The realtime process runs at about 20% CPU on its isolated core. But I replaced everything in the control loop with a usleep between 100 and 500 us, so that there's just a usleep and then the higher level clock_nanosleep(), and the overruns still persist.

Just opening a web browser or doing anything with a GUI window can cause overruns, so it seems like the culprit has something to do with the interaction between the graphics driver and the window manager. Are you suspecting the OpenGL application is buggy?

Thanks
 
I know recent Nvidia drivers have had "timing" issues that also affect Quadro cards. A hotfix (driver ver. 353.38) was released a day ago that seems to resolve the issue, but I'm not sure there is a Quadro hotfix yet. The official version (which will include a Quadro version) should be released shortly (I think they said within a week).

If doing any serious troubleshooting, I'd definitely wait for the hotfix driver. A Quadro user below has the issue ...

https://forums.geforce.com/default/...ce-hotfix-driver-353-38/post/4584713/#4584713

That'd be pretty convenient. The hotfix is Windows only currently. I guess I'll just have to wait to try it out.

Thanks
 
Back in the day of AGP graphics accelerators, graphics drivers sometimes hogged the bus to such an extent (to improve benchmarks) that sound buffer overruns occurred.

It might be something similar happening here, in the chase for additional frames per second, CPU processes get displaced.
 
Back in the day of AGP graphics accelerators, graphics drivers sometimes hogged the bus to such an extent (to improve benchmarks) that sound buffer overruns occurred.

It might be something similar happening here, in the chase for additional frames per second, CPU processes get displaced.

I wondered if this was possibly happening, I just don't know enough about the details. If this is happening, what options are there besides switching graphics drivers or waiting for an update that fixes this intentional "issue"?
 
Back in the day of AGP graphics accelerators, graphics drivers sometimes hogged the bus to such an extent (to improve benchmarks) that sound buffer overruns occurred.

Ah yes... I remember those days. Need for Speed 3 (IIRC) was majorly problematic here. Back then, I always thought my CPU was too slow to handle "awesome graphics and sound". But that explanation makes way more sense now (nearly 20 years later and having studied CS in the meantime^^)
 
I did notice some Linux Quadro drivers were released June 23, ver. 346.82. Not sure if you tried these, but the changelog is below:

  • Fixed a bug in nvidia-settings that caused the application to crash when saving the EDID to a file.
  • Fixed a bug that prevented the "mkprecompiled" utility included in the driver package from reading files correctly.
  • Fixed a bug that could cause an Xid error when terminating a video playback application using the overlay presentation queue in VDPAU.
  • Updated nvidia-installer to avoid recursing too deeply into kernel source trees under /usr/lib/modules, mirroring an existing restriction on recursion under /lib/modules.
  • Fixed a rare deadlock condition when running applications that use OpenGL in multiple threads on a Quadro GPU.
  • Fixed a bug which caused truncation of the EGLAttribEXT value returned by eglQueryDeviceAttribEXT() on 64-bit systems.
  • Fixed a kernel memory leak that occurred when looping hardware-accelerated video decoding with VDPAU on Maxwell-based GPUs.
  • Fixed a bug that caused the X server to crash if a RandR 1.4 output provided by a Sink Output provider was selected as the primary output on X.Org xserver 1.17 and higher.
  • Fixed a bug that caused waiting on X Sync Fence objects in OpenGL to hang indefinitely in some cases.
  • Fixed a bug that prevented OpenGL from properly recovering from hardware errors or sync object waits that had timed out.
http://www.nvidia.com/download/driverResults.aspx/86821/en-us
 
usleep between 100 and 500 us, so that there's just a usleep and then the higher level clock_nanosleep()
Maybe it's a timer/measurement issue? Or maybe the drivers have some interaction with the timer code. Just guessing at this point. Have you tried doing actual work in a loop and seeing how much work gets done in a fixed amount of time with both driver sets?

Just opening a web browser
On the Windows side, both Firefox and Chrome, and maybe Opera (now based on Chromium), are GPU accelerated (IIRC for compositing). I think, don't quote me, but IIRC they use OpenGL on Linux. Look into it, I guess.

or doing anything with a GUI window can cause overruns
Are you using Wayland? I seem to recall that it might use OpenGL, and I'm pretty sure Compiz does. Try different windowed apps, one that is only text and widgets, and see if you still get the overruns. edit - IIRC LXDE doesn't use a compositing manager by default; you might want to look into that. I heard XFCE might be the same.

Are you suspecting the OpenGL application is buggy?
Buggy or maybe just the way it is written is heavy on the CPU side of resources.

One last avenue to look into is whether the driver has optional components you don't have to install or can uninstall. If you're lucky, it's not the main driver causing your problems. I will say I find it hard to believe the drivers cause problems on all the cores (especially with the number you have)... if so, maybe pharma is right and it'll be fixed in a new driver version. I don't really know enough about Linux to help you in depth.
edit - If I were you, I'd post on a Linux board, preferably one specializing in the low-latency kernel you're using.
 
Amusingly, we have some NVIDIA DLL that does nasty things and parasitizes an app that doesn't do any 3D rendering, making it totally unusable; it seems some functions were overridden or some such.
I don't remember the details at the moment, but we found it out last week.

Using another version of the video card's graphics drivers solved the problem, but it's still odd that they can have an impact on an app that doesn't do any 3D rendering.
(Although with composite desktop I can guess that might happen.)
 
Some NVIDIA drivers raise DPC latency and input lag in games; there have been recent threads about it on the AT and GeForce forums.
 