HyperThreading: Overrated for Games?
Hi,
I was just wondering what you guys think of HT. Am I right in saying HT is just overkill for games most of the time?
Most games run a single thread, which works very well for all of them. An extra thread could be useful if you have an extra processor. With HT, you only use otherwise unused cycles to run other threads, which is of no use for most games as they normally use all the cycles they can get their hands on.
So my question: Is HT just overkill for games?
Jurgen
HT might actually be pretty decent for games. Though, I'm sure you'd have to code for it. That is to say, you could potentially have situations where graphics and physics are chewing up the FPU/vector units while the ALU is handling game and AI logic.
Again, the question is what the returns would be like. The catch is that this only works if the P4 happens to be running said threads at the same time. That is controlled by the OS, in which case I don't think there is any way you can hint at which threads should run simultaneously.
As a game developer you can pretty much count on having 99% of CPU time, so if you have 2 active threads you can be fairly certain they will be running.
Of course, writing coarse-grained parallel code is a drag and not something compilers can help with, so most developers will want to have a big installed base of HT processors available before they go to the trouble.
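For illustration, here is a minimal sketch of what a coarse-grained split of one frame might look like: a physics update and an AI update run as two threads that are joined before the frame ends. The names `update_physics`, `update_ai` and `run_frame` and the toy workloads are hypothetical and not taken from any real engine. Note also that in CPython the global interpreter lock keeps these threads from executing bytecode at the same instant, so treat this as a picture of the structure, not a benchmark.

```python
import threading

def update_physics(state, dt):
    # Hypothetical floating-point-heavy work: integrate positions.
    state["positions"] = [
        p + v * dt for p, v in zip(state["positions"], state["velocities"])
    ]

def update_ai(state):
    # Hypothetical integer/branch-heavy work: pick a target per agent.
    count = len(state["positions"])
    state["targets"] = [(i * 7 + state["frame"]) % 5 for i in range(count)]

def run_frame(state, dt):
    # Coarse-grained split: one thread per subsystem, joined once per frame.
    physics = threading.Thread(target=update_physics, args=(state, dt))
    ai = threading.Thread(target=update_ai, args=(state,))
    physics.start()
    ai.start()
    physics.join()
    ai.join()
    state["frame"] += 1
    return state

state = {
    "positions": [0.0, 1.0, 2.0],
    "velocities": [1.0, 1.0, 1.0],
    "targets": [],
    "frame": 0,
}
run_frame(state, dt=0.5)
```

The two subsystems write to different keys of `state`, which is what keeps the split simple; anything they both needed to modify would require locking, and that is where the "drag" comes in.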
Crusher
15-Nov-2002, 20:43
You've never listened to MP3's while gaming?
97% then, point being that at any given time while playing a game the game will almost certainly have the only active threads.
You've never listened to MP3's while gaming?
Sure, but Winamp2 takes 0-1% CPU time, so that isn't a problem.
Encoding DivX and playing a game would be an interesting test
LeStoffer
16-Nov-2002, 12:06
I think the main benefit will be that HT in some cases can help out when some of those OS background tasks pop up.
If we're lucky it could reduce stuttering in games and level out the average FPS, so we get fewer sudden dips in framerate.
It'll be interesting to see some real world gaming experiences with HT (and maybe even running on a Granite Bay chipset! :wink: )
I agree with LeStoffer. The first set of benchmarks I saw a month back or so only measured synthetic tasks or were the typical benchmarks. They didn't talk about responsiveness. With HT enabled, most showed lower performance. They only measured upper-bound maximums. What would be nice to see is a histogram or graph of frames over time. I'm thinking that the highs may not be as high, but the lows should be higher. Overall performance should be smoother. I'd happily settle for that.
--|BRiT|
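As a rough sketch of the kind of measurement suggested above, the snippet below summarises a list of per-frame times with an average, a worst-case figure and a simple histogram instead of one number. The function name `frame_stats`, the 10 ms bucket size and the sample data are all made up for illustration; no real benchmark results are implied.

```python
def frame_stats(frame_times_ms, bucket_ms=10):
    """Summarise per-frame times (in milliseconds) beyond a single average."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    # Average FPS over the whole run.
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # FPS implied by the single slowest frame (the worst dip).
    min_fps = 1000.0 / max(frame_times_ms)
    # Histogram: bucket start in ms -> number of frames in that bucket.
    histogram = {}
    for t in frame_times_ms:
        bucket = int(t // bucket_ms) * bucket_ms
        histogram[bucket] = histogram.get(bucket, 0) + 1
    return {
        "avg_fps": avg_fps,
        "min_fps": min_fps,
        "histogram": dict(sorted(histogram.items())),
    }

# Made-up data: same average frame rate, very different dips.
smooth = frame_stats([20, 20, 20, 20, 20])
spiky = frame_stats([10, 10, 10, 10, 60])
```

Both made-up runs average the same frame rate, but only the histogram and the worst-frame figure show that the second one has a long stall in it.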
I would say that just about all problems with smoothness in games, if you have your computer reasonably configured, come from the game engine ... HT can give a performance boost, but it won't make it any easier for developers to maintain an even framerate.
Tagrineth
17-Nov-2002, 05:42
Well P4 can only issue three instructions per thread per cycle... so HT could allow developers to issue a full four instructions. :) That would probably give a somewhat good performance increase...
Well P4 can only issue three instructions per thread per cycle... so HT could allow developers to issue a full four instructions. That would probably give a somewhat good performance increase...
How are you working that out and where did you get your information?
Tagrineth
18-Nov-2002, 00:42
It's common knowledge at this point.
Oh, and a few Intel docs.
arjan de lumens
18-Nov-2002, 01:55
According to the Intel documentation (http://developer.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p05_front_end.htm) I could find, the trace cache can, when doing hyper-threading, only supply one of the running threads with instructions during any given clock cycle. Also, the trace cache is limited to 3 instructions per clock total.
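A toy model of the two limits described in that post (one thread served per clock, 3 instructions per clock in total) might look like the sketch below. The strict round-robin arbitration between threads is an assumption made for the sketch, and `frontend_supply` is a hypothetical name; this is not a simulation of the real P4 front end.

```python
def frontend_supply(cycles, active_threads, uops_per_cycle=3):
    """Toy model: each cycle the trace cache feeds exactly one thread."""
    supplied = [0] * active_threads
    for cycle in range(cycles):
        # Assumption: active threads take turns, one per clock cycle.
        supplied[cycle % active_threads] += uops_per_cycle
    return supplied

one = frontend_supply(cycles=100, active_threads=1)  # HT idle or disabled
two = frontend_supply(cycles=100, active_threads=2)  # both logical CPUs busy
```

Under these assumptions the combined supply is the same in both cases, so in this model a second thread shares the front-end bandwidth rather than adding to it.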
arjan de lumens,
Thanks, that's what I thought.
Tagrineth,
As I was alluding to before, only one thread can have its instructions decoded by the trace cache, which decodes 6 instructions every other clock; then they go to the reorder buffer. From there, the uOPs are issued, which means uOPs from either thread can be issued.
As I was alluding to before, only one thread can have its instructions decoded by the trace cache, which decodes 6 instructions every other clock; then they go to the reorder buffer.
Nitpick: The instructions in the trace cache are already decoded (that's the purpose of the trace cache). But you're right about it being able to fetch and issue 6 instructions every other cycle.
Cheers
Gubbi
Most current games won't benefit, but I'm curious if Dungeon Siege will. I've got a dual proc system (P3 733) and Dungeon Siege only gains a couple of frames per second in the benchmark, but watching the benchmark shows considerably less stuttering. I'm curious how HT looks. Although 3GHz is probably fast enough to not stutter regardless, so it probably doesn't matter.
The instructions in the trace cache are already decoded (that's the purpose of the trace cache). But you're right about it being able to fetch and issue 6 instructions every other cycle.
Either I don't understand the trace cache properly or you misunderstood my statement.
Doesn't the trace cache basically store uOPs from previously decoded x86 instructions, so that as x86 instructions come in, their uOP equivalents are retrieved? Isn't this done rather than putting down lots of ugly decoder logic in the chip?
Ok, but I was thrown by your statement that said the tracecache decodes the instructions (up to 6 at a time). It doesn't. The tracecache only stores the decoded instructions. The decoder stage sits in front of the trace cache and decodes instructions from the lvl 2 cache, and then stores the decoded instructions in the tracecache (uOps, nothing micro about them though, with 6 times expansion over regular instructions).
In carefully scheduled code (like games), I think HT will gain little, in particular because of trace cache pressure: the 96KB is only equivalent to 12 KB of normal I-cache, and having two threads with no spatial locality running out of it seems to be pushing things.
Cheers
Gubbi
Most current games won't benefit, but I'm curious if Dungeon Siege will. I've got a dual proc system (P3 733) and Dungeon Siege only gains a couple of frames per second in the benchmark, but watching the benchmark shows considerably less stuttering. I'm curious how HT looks. Although 3GHz is probably fast enough to not stutter regardless, so it probably doesn't matter.
At home right now I'm running a single Athlon 2000+ (1666MHz), and with the last two 3D cards I've owned, a GF4 Ti4600 (now the wife's) and my current Radeon 9700 Pro, I've never experienced stutter in the game (or any 3D games, actually.) I did note that between the cards my peak frame rate with the GF4 was about 45 fps--with the 9700P it's about 75fps. (Note that this is without FSAA and AF for the GF4--with the 9700 Pro I can get the same frame rates with 4x FSAA and 16x AF @ 1024x768x32. However, I actually played the game in 1280x1024 x2FSAA, 16xAF with the 9700 Pro--have to edit the config files to do so.) So while cpu bound, it's nowhere near as cpu bound as I originally thought. I don't think faster cpu processing will address your stuttering problems with DungeonSiege. Need to look in other areas.
http://www.flickerdown.com/phpBB2/viewtopic.php?t=2303
http://www.pcworld.com/news/article/0,aid,107492,00.asp
Some info against HT.