Why does Intel lose in Gaming?

Tahir2 said:
We are not talking about per clock, and even then it is debatable whether the A64 is faster per clock at SSE2 code than the Pentium 4 architecture.

Per clock is also irrelevant; the Pentium 4 runs at a higher clock than the A64 as it is.

You have to talk about per-clock performance if you are talking about SSE2.

It is not irrelevant at all. NetBurst is designed to run at high frequencies for a reason, and this is it! If it had half the per-clock SSE2 performance of the K8 design, going for high frequency would not have been worthwhile. That's the whole point. SSE2 is simple logic that can run at high frequency quite easily, but of course IPC still matters.
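To put that frequency-versus-IPC trade-off in rough numbers: throughput is roughly IPC times clock. The figures below are purely hypothetical, just to show the arithmetic, not measurements of either CPU.

Code:
#include <stdio.h>

/* Back-of-the-envelope model: useful work per second ~= IPC * clock.
 * Both IPC figures are made up for illustration only. */
int main(void)
{
    double p4_ipc = 0.9, p4_clock_ghz = 3.2;  /* hypothetical NetBurst-style: lower IPC, higher clock */
    double k8_ipc = 1.3, k8_clock_ghz = 2.2;  /* hypothetical K8-style: higher IPC, lower clock */

    printf("P4-style: %.2f billion instructions/s\n", p4_ipc * p4_clock_ghz);  /* ~2.88 */
    printf("K8-style: %.2f billion instructions/s\n", k8_ipc * k8_clock_ghz);  /* ~2.86 */
    return 0;
}

Two very different design points can land at roughly the same throughput, which is why arguing about per-clock numbers alone settles nothing.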
 
What is it?
The Pentium 4 NetBurst architecture is designed to run at high frequencies at the expense of IPC, as Intel thought this was the best way to get increased overall performance.

What you said was that SSE and SSE2 were designed to combat the IPC issue, but since the A64 has SSE and SSE2 anyway, we are back to square one even if the A64's implementation is not as fast as Intel's.

I don't really know what point you are making... or maybe I am not being clear.
 
Tahir2 said:
What is it?
The Pentium 4 NetBurst architecture is designed to run at high frequencies at the expense of IPC, as Intel thought this was the best way to get increased overall performance.

What you said was that SSE and SSE2 were designed to combat the IPC issue, but since the A64 has SSE and SSE2 anyway, we are back to square one even if the A64's implementation is not as fast as Intel's.

I don't really know what point you are making... or maybe I am not being clear.

I believe the point is: Intel gains more from SSE/SSE2 usage -> Intel gains more performance in games -> they end up more even in games.
 
wireframe said:
Are you sure about this? I don't think this is right at all. I think the Athlon 64 actually does a bit better in SSE2 per clock than a Pentium 4.

Well, I don't know about per clock, but that's pretty uninteresting anyway. If a comparable P4 and A64, say a 3.2 GHz P4 vs a 3200+, run the same SSE2 code, you'll probably find that the P4 runs it faster. If you use 3DNow on the A64, however, it will probably be faster.
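For concreteness, "the same SSE2 code" would look something like the packed-double loop below. This is a generic sketch of SSE2 intrinsics, not code from any particular game; both CPUs would execute identical instructions and only differ in how many cycles they take.

Code:
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Add two arrays of doubles, two elements per SSE2 instruction.
 * Assumes n is a multiple of 2 and the pointers are 16-byte aligned. */
void add_arrays_sse2(double *dst, const double *a, const double *b, int n)
{
    for (int i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(&a[i]);
        __m128d vb = _mm_load_pd(&b[i]);
        _mm_store_pd(&dst[i], _mm_add_pd(va, vb));
    }
}

A 3DNow path would be a separate loop built on the packed single-precision 3DNow intrinsics instead, which is exactly the kind of extra code path developers tend not to bother with.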
 
And if you run at 1600x1200 with lots of eye candy, you won't see a lick of meaningful difference between the two.. :)
 
Himself said:
And if you run at 1600x1200 with lots of eye candy, you won't see a lick of meaningful difference between the two.. :)

Not true; P4s often drop down really low out of nowhere for some weird reason. Also, there are oftentimes meaningful differences between the two. And then there's price-to-performance, where, when it comes to gaming, AMD owns the shit out of Intel. But you know, that's all not important......
 
I can't believe no one has mentioned this, but they only looked at ONE game! I seriously doubt that most games are the same, and everyone knows Battlefield 2 was sloppily coded. I mean there is absolute proof that the Doom 3 engine uses SSE and all that, and I have no doubt that UE3 does too. I'm pretty sure that AMD will still have the advantage there, so it's obviously not the use of SSE that's the problem.

Imho, it's branches. Now the P4 may have a great predictor, but that's not the point. The point is that when the P4 does need to take a branch it's WAY more expensive than on an AMD, and there's your slowdown.
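To make the branch argument concrete, here is a contrived sketch (not code from any game): on hard-to-predict data the first loop eats a full pipeline flush every time the predictor guesses wrong, and that flush costs far more cycles on a very long pipeline like NetBurst's; the second loop turns the condition into arithmetic, so there is nothing to mispredict.

Code:
#include <stddef.h>

/* Branchy version: the data-dependent 'if' is hard to predict on random
 * input, and every mispredict flushes the pipeline. */
long sum_positive_branchy(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0)
            sum += v[i];
    }
    return sum;
}

/* Branch-free version: the condition becomes a mask, so the branch
 * predictor has nothing to get wrong. */
long sum_positive_branchless(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(v[i] > 0);   /* all ones if positive, else zero */
        sum += v[i] & mask;
    }
    return sum;
}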
 
DudeMiester said:
I can't believe no one has mentioned this, but they only looked at ONE game! I seriously doubt that most games are the same, and everyone knows Battlefield 2 was sloppily coded. I mean there is absolute proof that the Doom 3 engine uses SSE and all that, and I have no doubt that UE3 does too. I'm pretty sure that AMD will still have the advantage there, so it's obviously not the use of SSE that's the problem.

Imho, it's branches. Now the P4 may have a great predictor, but that's not the point. The point is that when the P4 does need to take a branch it's WAY more expensive than on an AMD, and there's your slowdown.
http://techreport.com/reviews/2005q3/athlon64-x2-3800/index.x?pg=4
 
DudeMiester said:
I can't believe no one has mentioned this, but they only looked at ONE game! I seriously doubt that most games are the same, and everyone knows Battlefield 2 was sloppily coded. I mean there is absolute proof that the Doom 3 engine uses SSE and all that, and I have no doubt that UE3 does too. I'm pretty sure that AMD will still have the advantage there, so it's obviously not the use of SSE that's the problem.

D3, HL2 and Far Cry are all almost exclusively x87, at least on AMD-64 processors.

DudeMiester said:
Imho, it's branches. Now the P4 may have a great predictor, but that's not the point. The point is that when the P4 does need to take a branch it's WAY more expensive than on an AMD, and there's your slowdown.

IMO, it's the memory latency.
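Since memory latency is the suspect, the classic way to expose it is a pointer-chasing loop like the sketch below (a generic technique, not something taken from the review): every load depends on the previous one, so out-of-order execution can't hide the trip to DRAM, and the A64's on-die memory controller shows its advantage.

Code:
#include <stddef.h>

/* Pointer-chasing latency test. The nodes are assumed to be linked in a
 * random order across a buffer much larger than the caches, so each
 * p->next load is a cache miss whose latency nothing can hide. */
struct node { struct node *next; char pad[60]; };  /* pad toward one cache line */

const struct node *chase(const struct node *p, long iters)
{
    while (iters--)
        p = p->next;   /* serialized, latency-bound loads */
    return p;          /* return p so the loop isn't optimized away */
}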

Cheers
Gubbi
 
Here are some branch statistics for Doom 3, from the Hell level just before the quad.

Event legend:
c0 is retired instructions (all instructions)
c3 is mispredicted branches
c4 is taken branches
c5 is taken branches that were mispredicted.

So roughly one instruction in 10 is a branch, and roughly one in seven of those is mispredicted (the actual number is one mispredicted branch per 66 instructions). The P4's (Prescott's) branch predictor is likely to do better. Even though the mispredict penalty is larger, the overall (real, wall-clock) time spent cleaning up after mispredicts is likely to be lower because of the better prediction and the much higher clock (more resources to replay instructions).
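Just to spell out the arithmetic behind those ratios (the counter values below are placeholders with the same rough proportions, not the actual measurements):

Code:
#include <stdio.h>

int main(void)
{
    /* Placeholder counts, chosen only to match the ratios described above. */
    double c0 = 1000000.0;   /* retired instructions */
    double c4 = c0 / 10.0;   /* taken branches: ~1 instruction in 10 */
    double c5 = c4 / 7.0;    /* mispredicted taken branches: ~1 in 7 of those */

    printf("branches per instruction:    %.2f\n", c4 / c0);  /* 0.10 */
    printf("mispredict rate:             %.2f\n", c5 / c4);  /* 0.14 */
    printf("instructions per mispredict: %.0f\n", c0 / c5);  /* 70 with these round numbers; the measured figure was 66 */
    return 0;
}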

Cheers
Gubbi
 
http://www.extremetech.com/article2/0,1558,1900386,00.asp?kc=ETRSS02129TX1K0000532


Pretty rough text; this is from the mouth of someone who worked on a fairly high-profile MMORPG:
I personally have been using AMD CPUs since the Thunderbird days. Thus my optimization is flavoured by that CPU. I don't have time to optimize for both AMD and Intel; I hope that an improvement on AMD will also be an improvement for Intel. Obviously we hope for the best performance on both platforms, but the last couple of percent of performance is not that important. Maybe more 'performance freaks' than myself prefer AMD over Intel? Maybe that's why games get optimized for AMD?

We use the 'Blended' optimization because on AMD that generally gives the best performance. We know that G7 is supposed to give better performance on Intel, but as long as we don't see it ourselves (we use AMD, remember), we don't use it. We also optimize at maximum, with global opt, max speed, inline functions, no frame pointers, etc. I certainly hope that is the case for BF2, although I have not had the pleasure of debugging it. And yes, we have played around with vector operations in assembly: MMX, 3DNow, SSE/2. But the overall improvements have been too small to actually ship.

We did not use the SSE(2) switches because older CPUs (Thunderbird, etc.) don't support them. Although we know Intel performs ~30% better with SSE2, for AMD there is zero difference (in the case of scalar operations). We don't have time to test and ship two versions of the game.
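For anyone who hasn't met those switches, here is my guess at what that setup maps to on a Visual C++ 7.x command line; the exact flags are an assumption on my part, not something the developer listed:

Code:
rem "Blended" build: max speed, aggressive inlining, no frame pointers, whole-program optimization
cl /O2 /Ob2 /Oy /GL /GB game.cpp

rem The Intel-leaning build they describe skipping (would leave Thunderbird-class CPUs behind)
cl /O2 /Ob2 /Oy /GL /G7 /arch:SSE2 game.cpp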
 
There would also be a lot of questions about why a 3DNow path wasn't created. Either way, it's much easier for the game developer to go for middle-of-the-road performance.
 