Bring back high performance single core CPUs already!

With that specific game, yes. There are other engines that scale better with multiple cores.
Yes, you are correct. I chose to drag Skyrim in here as a game that had previously demonstrated scaling beyond 4 cores, and someone rightfully asked if that held true after all the recent patching.

I felt it necessary to properly answer the question, and the answer was generally "no", scaling did NOT hold true after the newer patches, at least when playing using graphics settings that the ultra-enthusiast is probably going to use. I guess you could say that I was doing the proper due diligence to either support or refute my claim, and it kinda went 50/50 for me :D

Negatives: six-core scaling appears to be zero (or perhaps even slightly negative?). Meh.
Positives: it still needs a minimum of two cores to be playable, preferably four.

If I can figure out how to get vsync turned off this weekend, I'll re-run all these tests without the FPS cap. I think we may still find some additional data lurking under there...
 
I find Skyrim to run ok on a dual core. I know some kids who've put hundreds of hours into it on old low-clocked Core 2 Duo CPUs with a 3850 and a 4670. Granted they play at 1360x768 / 1280x1024 and they settle for medium detail. Also, the fixed compiler settings made nice gains for old CPUs.

The game engine seems to scale similarly to Oblivion with core count (ie, maxes out with 2 cores essentially). I'm sure you wouldn't want to play on a middling Athlon 64 X2 or a Pentium D. The graphics are probably the most demanding aspect compared to Oblivion, and you really want at least a 4850 / GTX 260. An 8800GT tears up Oblivion - not so with Skyrim.
 
Yes, you are correct. I chose to drag Skyrim in here as a game that had previously demonstrated scaling beyond 4 cores, and someone rightfully asked if that held true after all the recent patching.

I felt it necessary to properly answer the question, and the answer was generally "no", scaling did NOT hold true after the newer patches, at least when playing using graphics settings that the ultra-enthusiast is probably going to use. I guess you could say that I was doing the proper due diligence to either support or refute my claim, and it kinda went 50/50 for me :D

Negatives: six-core scaling appears to be zero (or perhaps even slightly negative?). Meh.
Positives: it still needs a minimum of two cores to be playable, preferably four.

If I can figure out how to get vsync turned off this weekend, I'll re-run all these tests without the FPS cap. I think we may still find some additional data lurking under there...

Skyrim has an internal 64Hz limitation. This limit, coupled with the usual 59/60Hz refresh rate, is the reason for the odd and occasional juddering (not stuttering) where the game appears to skip frames while displaying "60fps" on your favourite fps counter.

There are some options for getting around it but most of them have the side effect of throwing the physics out of whack (even more than normal for TES).
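The 64Hz-vs-60Hz mismatch can be worked through with a little arithmetic. The sketch below is my own illustration (not game code): it counts how many 64Hz sim ticks complete inside each 60Hz display frame, and every fifteenth frame gets two ticks, which is the periodic skip described above.

```python
# My own illustration (not from the game): a 64Hz simulation sampled by a
# 60Hz display. Some display frames span two sim ticks, so the game appears
# to skip a frame even while the fps counter reads a steady "60".
SIM_HZ, DISPLAY_HZ = 64, 60

def ticks_per_display_frame(n_frames):
    """How many 64Hz sim ticks complete within each 60Hz display frame."""
    counts, prev = [], 0
    for frame in range(1, n_frames + 1):
        total = frame * SIM_HZ // DISPLAY_HZ  # sim ticks done by end of frame
        counts.append(total - prev)
        prev = total
    return counts

counts = ticks_per_display_frame(60)  # one second of display frames
print(counts.count(2))  # -> 4 double-tick (frame-skipping) frames per second
```

So four times a second a display frame swallows two sim ticks, which matches the "occasional" skip people see despite the counter reading 60.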

Anyway, great machine and thanks for your contribution. It also matches my anecdotal evidence: the game makes use of multiple cores, but doesn't use up all my 8 threads.

Do you play at uGrids=7? I play at 9 and have a mere i7 2700k + 8GB + R6970 @ 1080p max details + ini tweaks. You ought to be able to raise it to 11 (it started crashing for me at 11).

Can you indulge me a little bit more and try setting your shadow buffers to 8192? It's playable for me and it noticeably increases the quality since I'm using "real-time" shadow updates (0 delay, 0 interval), but it means an average fps of 30 instead of 60.
 
I find Skyrim to run ok on a dual core. I know some kids who've put hundreds of hours into it on old low-clocked Core 2 Duo CPUs with a 3850 and a 4670. Granted they play at 1360x768 / 1280x1024 and they settle for medium detail. Also, the fixed compiler settings made nice gains for old CPUs.
It is my observation that there is still performance left on the table if you're only using a dual core. I've severely bottlenecked my rig on the GPU by my excessive use of SSAA and high res. If I flip to MSAA, there is a measurable change between 2c / 2t and 2c / 4t or similarly 4c / 4t. But this is not to say that you couldn't play (and enjoy) Skyrim on a truly dual core rig, especially if you're at lower settings.

Skyrim has an internal 64Hz limitation. This limit, coupled with the usual 59/60Hz refresh rate, is the reason for the odd and occasional juddering (not stuttering) where the game appears to skip frames while displaying "60fps" on your favourite fps counter.
Ah, that makes more sense now. I hadn't done any research on it yet to discover why....

Do you play at uGrids=7? I play at 9 and have a mere i7 2700k + 8GB + R6970 @ 1080p max details + ini tweaks.
Yeah, the game settings that I used are my normal play settings, to include the SSAA as there's a lot of shader aliasing in this game and it drives me nuts. I think I tried ugrids=9 a while back, but I encountered performance issues on my old Q9450 + 5850 rig. Makes sense to go back, so I'll check it out.

Can you indulge me a little bit more and try setting your shadow buffers to 8192? It's playable for me and it noticeably increases the quality since I'm using "real-time" shadow updates (0 delay, 0 interval) but it means an average fps of 30 instead of 60.
Yeah, I can do that. Maybe I'll run 1024, 2048, 4096, 8192 buffers through a few core options. Need to find a place with shadows all over the place, but that's not hard. Do you recall the settings for 'real-time' shadow updates? If not, I'm sure a small bit of time on Das Google will get me straight.
 
Regardless, I did some testing on my rig with Skyrim as I said I would. First, the machine specs:
CPU: i7-3930k (24 x 125 = 3.0GHz, 36 x 125 = 4.5GHz)
MB: Intel DX79SI Firmware 430
RAM: Mushkin Redline Enhanced 8 x 4GB 8-9-8-24-1T @ 1666MHz
VID: Sapphire 7970 OC edition (1150/1575, fastest it will do at stock volts)
DISK: Highpoint 2720SGL SAS PCI-E 8x card with 6 x 240GB OCZ Agility 3 SSDs
OS: Win7 64-bit Ultimate

Very interesting, and thanks for running that. So basically, for an enthusiast who overclocks (looking at your 4.5GHz numbers), all you need is a dual core without HT to just about max out in-game performance, at least with that OC'd video card. For non-overclocking situations, a 2-core/4-thread part seems optimal.

With 2 cores and 4 threads there's no difference. Moving to 4 cores with 4 threads gives you ~1.7% more perf. Moving to 4 cores with 8 threads gives you the max perf bump, at ~4.2%.

That last setting is a bit weird as higher core/thread counts revert back to the 4 core 4 thread speed. I wonder if the CPU is throttling at those higher core counts due to increased heat generation from more cores being active.

But it basically reinforces what Carsten was saying about him not really needing to move up from his dual core CPU, assuming he can reach high enough clockspeeds.

Regards,
SB
 
That last setting is a bit weird as higher core/thread counts revert back to the 4 core 4 thread speed. I wonder if the CPU is throttling at those higher core counts due to increased heat generation from more cores being active.
I want to stress again, we're hitting a GPU bottleneck around 47FPS with my SSAA usage. There is still performance on the table after 2c/2t if you're not choking your card to this degree.

Also, the CPU temperature peaked at 56°C during this whole exercise, and "turbo" is effectively null on the K and X series CPUs when you overclock them. It operates at 4.5GHz under *any* load, although it will still idle down to 1.5GHz if you're not doing anything.
 
I want to stress again, we're hitting a GPU bottleneck around 47FPS with my SSAA usage. There is still performance on the table after 2c/2t if you're not choking your card to this degree.

Also, the CPU temperature peaked at 56°C during this whole exercise, and "turbo" is effectively null on the K and X series CPUs when you overclock them. It operates at 4.5GHz under *any* load, although it will still idle down to 1.5GHz if you're not doing anything.

On my K series CPU, turbo still works fine. I think support for it is entirely up to the motherboard vendor. On my motherboard when I set overclock speeds I set speeds for 1 core active, 2 core active, etc... So I can have very high single core speeds without being limited to the max OC for 4 cores active.

But anyway, back to that. Yes, that's true, so obviously it's not throttling on the CPU. But it still hits the limit where an enthusiast's GPU is going to cap any potential benefit of more than 2 cores.

And yes, as I said before there are games that are an exception to that. And as sebbbi mentioned, it's quite likely a byproduct of the majority of AAA PC games being ports from console.

So when the next generation of consoles hit, hopefully we'll also see better use of multiple cores in the PC space.

Also, that isn't to say that more than 2 cores aren't useful in non-gaming situations. :) You don't have to preach to me about multi-core. I was using dual CPUs in the desktop space all the way back when the Celeron 300A could not only overclock from 300 to 450MHz reliably, but also worked with certain server motherboards sporting dual sockets. :) I've been a convert ever since.

Regards,
SB
 
On my K series CPU, turbo still works fine. I think support for it is entirely up to the motherboard vendor. On my motherboard when I set overclock speeds I set speeds for 1 core active, 2 core active, etc... So I can have very high single core speeds without being limited to the max OC for 4 cores active.
Oh yeah, I guess I forgot about that pain in the ass ;) The Intel DX79SI does allow for that kind of tweaking for six cores, but doing it per-core on a six-core rig sucks. I just flipped the bit that allows one turbo multiplier for everything, and set it to 36. There is also a 'non-turbo' multiplier, which is still set to 32, and the processor will still continue to downclock below that point if you aren't using it.

Actually, by capping the "All Turbo" multiplier to 24, it limits the entire CPU to 3GHz even though the "non turbo" multiplier is still set at 32. I didn't expect it to work that way...
 
Yeah, I can do that. Maybe I'll run 1024, 2048, 4096, 8192 buffers through a few core options. Need to find a place with shadows all over the place, but that's not hard. Do you recall the settings for 'real-time' shadow updates? If not, I'm sure a small bit of time on Das Google will get me straight.

The worst perf hit with 8K buffers happened in the Dwemer ruins of Markarth. But you can see the difference immediately with shadows in a town.

4K: [screenshot attachment]

8K: [screenshot attachment]


For having the sun shadows update continuously, you need to edit these lines in your Skyrim.ini (not SkyrimPrefs.ini):

Code:
fSunShadowUpdateTime=0.000 
fSunUpdateThreshold=0.000

It's a little weird at first.
 
Unfortunately most games are mainly designed for consoles, and do not properly scale up on PC. It's very easy to draw wrong conclusions by using modern multicore PC CPUs to run game code that is designed for ancient (7 year old) in-order console CPUs.

A 1.6 GHz 17W Sandy Bridge is (considerably) more powerful than an old 6-thread in-order PPC CPU at 3.2 GHz. Basically, if you are running a direct port designed originally for consoles, a high end Sandy Bridge could execute all six threads sequentially every frame using just a single core, and still hit the required frame rate. And that's why you don't see any scaling when you add more cores, even if the game is programmed to utilize up to six of them.

If you want to properly test multithread scaling of games, you should get an entry level CPU with lots of cores/threads. For example a 4-thread Atom or a lowest-clocked 6-core Phenom. Or even better, downclock an 8-core Bulldozer to less than 1 GHz. The scaling will be much better, and you will see huge gains by enabling extra cores.
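sebbbi's back-of-envelope point can be made concrete with some simple arithmetic. The numbers below are illustrative assumptions of mine (the 3x per-thread advantage, and the idea that the six console threads fill the whole 30fps frame budget), not measurements:

```python
# Back-of-envelope sketch of the point above. The 3x per-thread advantage of
# Sandy Bridge and the assumption that the six console threads fill the whole
# 30 fps console frame budget are illustrative guesses, not measured numbers.
CONSOLE_FRAME_MS = 1000 / 30   # console frame budget at 30 fps
PC_FRAME_MS = 1000 / 60        # target frame budget at 60 fps on the PC
SNB_SPEEDUP = 3.0              # assumed per-thread advantage over the in-order PPC

# One Sandy Bridge core running all six console threads back to back:
snb_serial_ms = CONSOLE_FRAME_MS / SNB_SPEEDUP
print(round(snb_serial_ms, 1))  # -> 11.1, which fits inside the 16.7 ms budget
```

Under those assumptions a single fast core finishes all six threads' work with time to spare, so adding cores shows no scaling even though the code is threaded.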

Cheers sebbbi, that's the kind of post I come here for: great insight into the relative performance of those CPUs.
 
@Davros: Netburst was a very interesting CPU architecture. Not a good one, but very interesting. Pushed to the limits, at a 4GHz base clock, it was running internally at an amazing 8GHz(!!). Problem is, the delta gained with a higher base clock (something like 33% if I remember right) was lost to the compromises (a 32-stage pipeline) required to push such clocks. AMD has done the same with BD to raise its clock, bringing its pipeline to more or less the same length as the original Netburst architecture (around 20-23 stages). A risky choice, considering the precedent, at least (but BD sucks hard because of the shared decoder, anyway).

@Albuquerque: you missed my point. In order to discuss on the same basis using a complex toy like Skyrim, you would need to:
* isolate the multithreaded parts of Skyrim (usually sound, AI, the script engine);
* isolate the impact of the memory/cache subsystem;
* analyse the percentage of the work that is multithreaded (i.e. the single-threaded part of the rendering engine, plus the time spent multithreading, e.g. the parallel octree descents for occlusion, plus the sync/issue time spent on threads).

I was referring to the boundaries you hit when trying to maximize the performance of an application. If you want to measure them, you can just write a simple integer app that uses a #pragma omp parallel for to issue a percentage of its work to threads (with inlined prefetches!). There you can see such data 'cleanly'; Skyrim (or Windows bootup!!) is just too complex for that, unless you can satisfy the points above...

The chart I attached implies that the benefit gained in a multithreaded application decreases more than linearly with core count, due to a number of factors, which is probably why Intel didn't come out with a 12+ core (with HT) CPU for the consumer market.
So, once the benefit of adding cores scales down to minimal values, at that point any IPC increase can affect average system speed more than adding another core would.

In a sense, this can apply to GPUs as well: raising the clock can give better performance than adding more cores, if the time wasted scheduling/issuing the additional work to the added cores eats too much of their advantage.
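The more-than-linearly-decreasing benefit described above is essentially Amdahl's law. A minimal sketch, assuming (purely for illustration) that 80% of the frame work parallelizes:

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction of
# the work that parallelizes. p = 0.80 here is an assumed, illustrative figure.
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

for n in (1, 2, 4, 6, 12):
    print(n, round(amdahl_speedup(0.80, n), 2))
# Each doubling of cores buys less: 1.0, 1.67, 2.5, 3.0, then only 3.75 at 12.
```

Once the curve flattens like this, a modest IPC or clock gain beats adding another core, which is the point being made above.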
 
@Davros: Netburst was a very interesting CPU architecture. Not a good one, but very interesting. Pushed to the limits, at a 4GHz base clock, it was running internally at an amazing 8GHz(!!).

Ahh, now I understand where you're getting confused: the ALUs were double-pumped, so at 4GHz they were running at 8GHz effective, not actual; they were still clocked at 4GHz.
It's like DDR-200: it doesn't actually run at 200MHz, it's 100MHz, but because it deals with two lots of data per cycle it's the equivalent of SDR running twice as fast.
 
Ahh, now I understand where you're getting confused: the ALUs were double-pumped, so at 4GHz they were running at 8GHz effective, not actual; they were still clocked at 4GHz.
It's like DDR-200: it doesn't actually run at 200MHz, it's 100MHz, but because it deals with two lots of data per cycle it's the equivalent of SDR running twice as fast.

No you are wrong, sorry.
Dual-pumped DDR is just a way of transferring more data on the same clock wave, and has nothing to do with it.
The Netburst ALU really was running at double frequency; let me quote you the IA architecture manual on my desk:
"Netburst... Arithmetic Logic Units (ALUs) run at twice the processor frequency", Vol. 1, 2-7.
 
Sebbi brought up a point that I hadn't considered: low power (ie, low speed) processors that try to 'make up for it' by having more cores; do they succeed? My next batch of testing now has a 1.5GHz speed to try to test that out. I also liked Richard's 8192 shadowmap resolution, but I couldn't get uGridsToLoad=9 to be stable... So I went for uGridsToLoad=11 :D Don't ask me why the higher one worked and the lower one didn't...

I also turned off SSAA (so I'm only using 4xMSAA + FXAA now) for this group of tests, to leave a bit more room for the CPU to show us what's going on. Besides, the hardest-core enthusiasts would probably trade off my love for SSAA and go back to MSAA to get their framerate into the 60s.

Here are the pertinent changes to Skyrim.ini:
Code:
[General]
uExterior Cell Buffer=144
uGridsToLoad=11
iPreloadSizeLimit=126877696

[Display]
iShadowMapResolutionPrimary=8192
fSunShadowUpdateTime=0.000 
fSunUpdateThreshold=0.000
And here are the pertinent changes to SkyrimPrefs.ini:
Code:
[Display]
iShadowMapResolutionSecondary=8192
iShadowMapResolutionPrimary=8192

I also added a new 'cave' location, actually a chunk of the ruins under Markarth. It's almost purely fillrate limited, as it's just an active shadow cast against an otherwise static backdrop. I put this in here to see if the CPU could bottleneck even something as 'simple' as this scene...
[screenshot attachment: ScreenShot46.png]


And here are the results:
Code:
c/t	Ghz	City	Cave
----------------------------
6/12	1.5	29.5	59.1
	3.0	59.1	59.1
	4.5	59.1	59.1
	
6/6	1.5	29.5	59.1
	3.0	58.1	59.1
	4.5	59.1	59.1

4/8	1.5	29.5	59.1
	3.0	58.1	59.1
	4.5	59.1	59.1

4/4	1.5	29.5	59.1
	3.0	58.1	59.1
	4.5	59.1	59.1

2/4	1.5	23.3	59.1
	3.0	50.1	59.1
	4.5	59.1	59.1

2/2	1.5	16.5	59.1
	3.0	40.5	59.1	
	4.5	59.1	59.1

1/2	1.5	15.5	43.3
	3.0	32.0	59.1
	4.5	52.2	59.1

1/1	1.5	10.5	32.5
	3.0	24.5	58.1
	4.5	34.4	58.1

Look at the 1.5GHz data! Sebbi is on to something, I believe :) The "cave" scene shows that even a mostly fillrate-limited scene still needs two physical cores, and so does the "City" scene (same one from my first test, but now with the enhanced ugrids and shadows), although four threads is best if you're not going to overclock.
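For reference, the 1.5GHz "City" column of the table can be re-expressed as speedups over the single core / single thread result (computed straight from the figures above):

```python
# Speedups from the 1.5GHz "City" column of the results table above,
# relative to the 1 core / 1 thread figure.
city_fps = {"1/1": 10.5, "1/2": 15.5, "2/2": 16.5,
            "2/4": 23.3, "4/4": 29.5, "6/12": 29.5}
base = city_fps["1/1"]
speedups = {cfg: round(fps / base, 2) for cfg, fps in city_fps.items()}
print(speedups)  # scaling stops completely after 4 cores / 4 threads
```

A lone core gets 1.48x just from its second thread, and 4c/4t reaches 2.81x, but nothing past that moves the needle at all.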
 
Very nice! Either my 6970's 2GB or 256bit bus (or both) is the bottleneck for 8K buffers. Thanks for the test, and thanks for some hard numbers on core/thread versus clock scaling.
 
Get Core i7 2600k. Overclock to 5GHz, should be easy with any decent cooling, then disable cores you don't need. Problem solved ;)
 
Get Core i7 2600k. Overclock to 5GHz, should be easy with any decent cooling, then disable cores you don't need. Problem solved ;)

You don't even need the i7-2600k; your best bang for the buck is more likely the i5-2500k. Use the extra $80 to buy more video card, or one of the Corsair H80 watercooler setups on sale. Lots of clock without lots of noise :)

The 3930k will do 5Ghz with some VRM cooling, but there's zero reason for me to run it that fast. At the highest settings, I run out of GPU before I run out of CPU.
 
Very nice! Either my 6970's 2GB or 256bit bus (or both) is the bottleneck for 8K buffers.
Can't be the video RAM; if the GPU were redrawing basically its entire on-board memory space each frame, there wouldn't be enough bandwidth to maintain even a semi-decent framerate.

Besides, maxing out 2GB is quite hard. A framebuffer at 2560*1440 with 8x MSAA "only" eats 112.5MB, so there's loads of room left.
 