Dual core affinity performance gains

RFC Rudel

Newcomer
I finished my research on CPU affinity performance for games on dual-core or dual-CPU systems.

I use Windows 2003 (w2k3) Enterprise because it is the only available OS that can run the Windows Resource Manager (WRM). http://www.microsoft.com/windowsserver2003/downloads/wsrm.mspx

To check process affinity and CPU usage I use Sysinternals Process Explorer:
http://www.sysinternals.com/Utilities/ProcessExplorer.html


The Windows Resource Manager lets you create affinity policies for processes and services.

Example: you configure WRM to run the whole OS on CPU0 and the .exe of your games or other single-threaded apps on CPU1.

The WRM runs as a service, so if you set the service to manual start, your OS boots like an average Joe dual-core system and all of your games and apps run on both cores, or wherever the OS thread scheduler thinks they should run.

The WRM does not allow you to manage OS processes, but with a quick registry edit that clears the System Exclusion List you can make one of your cores totally empty and ready for your CPU-hungry single-threaded apps/games.

I chose to make CPU1 free because CPU0 is the default for several things.
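
For anyone unfamiliar with how affinity is expressed: Windows, like most OSes, describes a process's affinity as a bitmask with one bit per logical CPU. Below is a minimal Python sketch of the CPU0/CPU1 split described above. The helper names `cpu_mask` and `cpus_in_mask` are made up for illustration and are not part of WRM or of any Windows API.

```python
def cpu_mask(cpus):
    """Build an affinity bitmask from an iterable of zero-based CPU indexes."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Return the sorted list of CPU indexes whose bit is set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


OS_MASK = cpu_mask([0])        # OS processes and services stay on CPU0
GAME_MASK = cpu_mask([1])      # the single-threaded game gets CPU1 to itself
BOTH_MASK = cpu_mask([0, 1])   # the default: the scheduler may use either core

print(f"OS mask     : {OS_MASK:#04x} -> CPUs {cpus_in_mask(OS_MASK)}")
print(f"game mask   : {GAME_MASK:#04x} -> CPUs {cpus_in_mask(GAME_MASK)}")
print(f"default mask: {BOTH_MASK:#04x} -> CPUs {cpus_in_mask(BOTH_MASK)}")
```

Any tool that sets affinity by hand (Process Explorer included) is ultimately choosing one of these masks per process; an affinity policy just applies the choice automatically.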


Performance Gains

I used PI calculations, Sandra, 3DMark 2003 & 2005, CPU-only benchmarks, etc.

On PI calculations or any other single-threaded CPU benchmark, running it with the WRM properly configured gives you 5% or less in performance gains, but your OS stays totally responsive because the thread of the benchmark app does not interfere with all the OS and user processes. (The score of a single-threaded app is half the score of a dual-CPU-capable app.)

3DMark gives about 5% or less.

In games it is a totally different story.

If the game is pretty much VGA limited (like BF2), your gains will come in the parts where the CPU becomes the limit. I get a higher max FPS number, but my average FPS gains in BF2 are 8-10% at most.

In BF2 I went from a max of 80 FPS to 92 FPS. As I said before, in games that are seriously VGA limited the gains are limited, but they exist.

In HL2 my X800XL is not much of a bottleneck, so in normal mode (no affinity optimizations) Dr. Freeman sits steady at 150 FPS, and when I start the WRM and HL2 runs alone on CPU1 my FPS jumps to 182!!!
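
To put those FPS numbers on the same scale as the benchmark percentages, here is the arithmetic as a quick Python sketch. The function name `pct_gain` is just for illustration; the FPS figures are the ones quoted above. The BF2 max FPS change works out to 15% and the HL2 change to a bit over 21%.

```python
def pct_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


# (FPS without affinity policy, FPS with the game alone on CPU1)
results = {
    "BF2 max FPS": (80, 92),
    "HL2 steady FPS": (150, 182),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} -> {after} FPS ({pct_gain(before, after):.1f}% gain)")
```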

I use a Creative X-Fi, so in theory the board offloads audio calculations from my CPU; I think people with onboard audio should get better results.

UPDATE
Using WRM and my X-Fi, the unused core (CPU0) sits at 10% CPU usage. I uninstalled my X-Fi, switched to the onboard sound and configured BF2 to use software audio at high quality.

The unused core went to 15-20%, with spikes of 25% CPU usage!!!! And the audio quality has no impact on FPS when running with the WRM.

This shows that the X-Fi really works, and that redirecting all OS operations to CPU0 has a clear advantage for onboard audio users.


IRQ

I used a resource kit utility to redirect all IRQs to CPU0, but I was unable to get a tangible performance gain.


Application Needed


I modified the WRM .msi to run on XP, but it fails to install the WRM service.
The only way to manage OS process affinity in XP is with Sysinternals Process Explorer, but it is not automatic.

I think an application is needed because it is a pain to set up all the affinities by hand, not to mention that there are IRQ and thread/process priority performance gains that have not been deeply explored.

I found the NCR SMP Utilization Manager very awkward, but it has some nice features.
http://www.ncr.com/support/pcfiles/Utility/NTTOOLS/SMPUT200.EXE


So far the WRM is the best tool for managing process affinity.


WE NEED SOMEONE TO MAKE AN APP LIKE THE WRM FOR XP.

Please, if you want to recreate this, feel free to ask.


my machine
LANPARTY UT nF4 SLI-DR
AMD 3800 Dual core@2.8Ghz
Thermalright SI-120
2x1G Ram TCCC
ATI X800XL
Creative X-FI EM
3x sata2 hitachi 80GB Raid0
1 WD 80GB
Nec DVD-R
Samsung DVD-CDR combo
Thermaltake Armour
Thermaltake 680W PSU
Aerocool Coolpanel (Front Panel)
Windows 2003 enterprise
 
Also I was thinking... with dual dual-core machines now coming out, do you think we are reaching a point of diminishing returns? Do you think XP or Win 2k3 is OK at handling 2 x 2 cores? Will we honestly start seeing better performance as the number of cores increases? I think it depends heavily on how the software is written. I am guessing 2 cores is the most that the majority of software can use and still show performance increases, especially since Intel has HT, which makes the software think it is running on a multiprocessor setup. I think having a computer with 2 dual cores is not a sound idea, especially right now.
 
This may be of interest

"The WSRM CD will install the administrative client on any 32-bit version of Windows newer than or equal to Windows 2000 Service Pack 3 and any 64-bit version of Windows at the SP1 level or higher."
 
YeuEmMaiMai said:
This may be of interest

"The WSRM CD will install the administrative client on any 32-bit version of Windows newer than or equal to Windows 2000 Service Pack 3 and any 64-bit version of Windows at the SP1 level or higher."


The administrative client is for administering the computer that has the WSRM service.


The WSRM is an application that shows that, in the current state of OSes and single-threaded games, dual cores need more than the OS's sucky balancing of the threads.
 
suryad said:
Also I was thinking... with dual dual-core machines now coming out, do you think we are reaching a point of diminishing returns? Do you think XP or Win 2k3 is OK at handling 2 x 2 cores? Will we honestly start seeing better performance as the number of cores increases? I think it depends heavily on how the software is written. I am guessing 2 cores is the most that the majority of software can use and still show performance increases, especially since Intel has HT, which makes the software think it is running on a multiprocessor setup. I think having a computer with 2 dual cores is not a sound idea, especially right now.


The problem is that the Windows NT kernel has been developed, since 3.1, with SMP designed for multi-threaded applications that demand less raw CPU power than file and RAM I/O. Database applications, application serving, and media serving are where the Windows kernel shines at SMP. Games are a different breed of application. In theory SMP is just SMP, but the demands that games place on the CPU, with calculation algorithms such as AI, physics and sound, aren't the same as database bubble sorts. AMD is not to blame here; I think it is a case of the CPUs demanding more than the OS can handle reliably. The same thing happened when we moved past 1 GHz and off of Windows 98/ME. The platform was not built to scale with the hardware for the types of apps being used, and system reliability was iffy at best.
 
Can you check the context switches variables in process explorer?

In theory, the scheduler should keep them really low when a single threaded app is given a full core to play with, but the threads will still be going through the selection process.
OTOH, if they're similar in both test cases, what you're experiencing is somewhat like an increase of the process priority, because there really isn't anything else to run on that CPU.
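
Process Explorer shows these counters on Windows. For anyone who wants to run the same kind of comparison on a Unix-like system, here is a rough Python sketch using the standard `resource` module (which does not exist on Windows). The `busy_loop` workload and the `measure` helper are made up for illustration; a real test would wrap the actual game or benchmark rather than a spin loop.

```python
import time

try:
    import resource  # Unix only
except ImportError:
    resource = None


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this
    process, or None where the resource module is unavailable."""
    if resource is None:
        return None
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def busy_loop(seconds):
    """Stand-in for a single-threaded, CPU-bound workload."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


def measure(seconds=0.2):
    """Run the workload and return the context switch deltas, or None."""
    before = context_switches()
    if before is None:
        return None
    busy_loop(seconds)
    after = context_switches()
    return after[0] - before[0], after[1] - before[1]


if __name__ == "__main__":
    delta = measure()
    if delta is None:
        print("context switch counters are not available on this platform")
    else:
        print(f"voluntary: {delta[0]}, involuntary: {delta[1]}")
```

Involuntary switches are the interesting number here: they count the times the scheduler took the CPU away from the thread, which is what should drop if the thread really has a core to itself.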

IIRC the cache is completely shared so that cross cpu context switching doesn't trash it, right? (not sure about that one)

You also have to keep in mind that the quanta for threads in XP-server are longer than in the desktop variants, to prioritize threads' responsiveness over pure number-crunching. Results may differ between the two kernels depending on the context switch frequency and cost.

Nevertheless, very interesting test and results... ;)
 
So Rudel, you are suggesting that XP has its limitations just as 98/ME did, only this time it is not with the processors themselves but with the number of processors? This is of massive interest to me because I am building a DCC, and instead of going through a vendor I have decided to build it from scratch. I have been drooling over the Tyan Thunder K8WE, which has 2 CPU sockets where we can stick in 2 AMD Opteron 280 procs. This is why I posed my question... because if XP and other software applications are not able to take advantage of all those processors, then maybe it would be wise to just stick with dual proc, or look to a different OS? Maybe wait till OS X becomes available on AMD platforms? The problem is I do not know of any good software for editing and doing other graphics-intensive stuff in Linux.
 
t0y said:
IIRC the cache is completely shared so that cross cpu context switching doesn't trash it, right? (not sure about that one)

Each core has 2x64KB L1 (D$ and I$) and 1MB (or ½MB) L2 cache. Only the core that the cache is attached to can demand load it (fill it). When a process is scheduled to another core, all loads will miss initially; these misses will be served by the caches of the first core (or the main memory subsystem). This causes the performance loss experienced by the first poster.

IMO, Win XP's scheduler is broken, plain and simple.

Cheers
Gubbi
 
suryad said:
So Rudel, you are suggesting that XP has its limitations just as 98/ME did, only this time it is not with the processors themselves but with the number of processors? This is of massive interest to me because I am building a DCC, and instead of going through a vendor I have decided to build it from scratch. I have been drooling over the Tyan Thunder K8WE, which has 2 CPU sockets where we can stick in 2 AMD Opteron 280 procs. This is why I posed my question... because if XP and other software applications are not able to take advantage of all those processors, then maybe it would be wise to just stick with dual proc, or look to a different OS? Maybe wait till OS X becomes available on AMD platforms? The problem is I do not know of any good software for editing and doing other graphics-intensive stuff in Linux.


XP will support SMP, but for applications that use only one CPU, XP/w2k3 tries to balance the load. I use a second monitor to run Task Manager during testing, and the load balancing really sucks.

Sometimes you have 100% spikes on each processor and sometimes the game runs on only one CPU.
Why the thread scheduler makes those decisions is a mystery.

Nothing beats an empty core to run your single-threaded app/game.
 
Another OS that would be great to try, if you could, is MS XP x64. Apparently it is a better performer than 32-bit XP, though I fail to understand why. MS just used the base code from Windows 2k3 and created the 64-bit OS...
 
Pardon my ignorance, but is this the reason single threaded apps seem to use 50% of both CPUs (on task manager's performance graph)? I've seen this on my dual core Opteron and have wondered why that happens. Obviously it's not really utilizing SMP if it's only using 50% of each.
 
suryad said:
That makes sense Rudel. It would be great if you could do a similar thing with Linux.
You have been able to set CPU affinity in Linux since the kernel 2.5-something days. That said, I don't know if it's really useful for that, as it looks like the kernel tries to avoid migrating processes between CPUs anyway, and might potentially do a much better job than what Windows does (just speculation, feel free to flame me...).
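
For reference, a minimal Python sketch of what setting affinity looks like on Linux, using `os.sched_getaffinity` and `os.sched_setaffinity` from the standard library (these exist only on some Unix platforms, Linux among them, and not on Windows). The `pin_to_cpu` helper is a made-up name, and picking the highest-numbered allowed CPU is just an assumption that mirrors the "leave CPU0 to the OS" idea from the first post.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the calling process) to a single CPU.

    Returns the resulting affinity set, or None on platforms where
    os.sched_setaffinity does not exist.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        start = os.sched_getaffinity(0)
        print("allowed CPUs before:", sorted(start))
        # Pin to the highest-numbered allowed CPU, leaving CPU0 alone if possible.
        print("allowed CPUs after :", sorted(pin_to_cpu(max(start))))
        # Put things back so the rest of the program is unaffected.
        os.sched_setaffinity(0, start)
    else:
        print("sched_setaffinity is not available on this platform")
```

Note that this only pins one process. It does not move everything else off that CPU, which is the part WRM handles on Windows.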
 
swaaye said:
Pardon my ignorance, but is this the reason single threaded apps seem to use 50% of both CPUs (on task manager's performance graph)? I've seen this on my dual core Opteron and have wondered why that happens. Obviously it's not really utilizing SMP if it's only using 50% of each.
Yes, actually by setting affinity on such a task you can gain a few percent (2%-5%) of speedup. I think in the thread I posted above (where Diplo posted the tool) there were some experiments.
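
A toy model of why Task Manager shows that picture: one thread that is always runnable, bounced between two cores by the scheduler, keeps each core busy about half the time. The Python sketch below is a deliberate simplification. It assumes the thread strictly alternates cores every time slice, which is not how the real Windows scheduler decides, and `simulate` is a made-up name.

```python
def simulate(slices, n_cpus=2, pinned_cpu=None):
    """Return per-CPU utilization (0.0 to 1.0) for one always-busy thread.

    If pinned_cpu is None the thread moves to the next CPU on every time
    slice (a crude stand-in for scheduler migration); otherwise it stays
    on pinned_cpu for the whole run.
    """
    if slices <= 0:
        raise ValueError("slices must be positive")
    busy = [0] * n_cpus
    for t in range(slices):
        cpu = pinned_cpu if pinned_cpu is not None else t % n_cpus
        busy[cpu] += 1
    return [b / slices for b in busy]


print("migrating:", simulate(1000))
print("pinned   :", simulate(1000, pinned_cpu=1))
```

Either way the thread gets one core's worth of CPU time in total. The difference is only where it runs, which matters because of the per-core caches Gubbi described earlier in the thread.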
 
mczak said:
You have been able to set CPU affinity in Linux since the kernel 2.5-something days. That said, I don't know if it's really useful for that, as it looks like the kernel tries to avoid migrating processes between CPUs anyway, and might potentially do a much better job than what Windows does (just speculation, feel free to flame me...).
Well, Linux does migrate processes between the CPUs, but it waits much longer than Windows does (on the order of a minute).
 