Why does Intel lose in Gaming

Kaotik

We all know AMD dominates the gaming market at the moment when it comes to performance - but exactly why is this?
Part of the reason surely lies in the long pipelines of the P4, which take a hit when there are a lot of branches - when the program flow is more or less linear, P4s rock.

However, this might not be the only reason - take a look at this ExtremeTech article:
http://www.extremetech.com/article2/0,1697,1895945,00.asp

It suggests that by choosing slightly different options when compiling a game, P4s especially would see large benefits (they mention 5-10%, 10-15% and even 20%), while possibly not harming AMD performance at all (it's apparently a bit unclear whether it would affect AMD chips at all, or only certain models).
The main point is that the games (BF2 being the example) are apparently compiled with Pentium Pro / Pentium II / Pentium III optimizations instead of taking real advantage of SSE/SSE2 etc., which would only require a single compiler switch to be flipped by the developer.
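For illustration, here's a minimal sketch of what that single switch amounts to in practice (the flags are real MSVC/GCC options, but the file and function names are my own example, not from the article):

```c
/* The same scalar C source compiles to x87 or to scalar SSE2 code
   depending on one compiler switch -- no source changes needed:

     MSVC:  cl /O2 /arch:SSE2 scale.c      (default is blended x86/x87)
     GCC:   gcc -O2 -msse2 -mfpmath=sse scale.c

   With the switch set, the compiler emits SSE instructions (mulss etc.)
   instead of x87 stack code, which NetBurst runs considerably faster. */
#include <stddef.h>

void scale_positions(float *pos, size_t n, float s)
{
    /* A plain scalar loop: with /arch:SSE2 or -msse2 it maps to scalar
       (or even packed) SSE, with no change to the C code itself. */
    for (size_t i = 0; i < n; ++i)
        pos[i] *= s;
}
```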
 
Riddick has several builds for plain x86, SSE and SSE2, as well as AMD64, so some games do do this.

I'd say it's because the P4 was really designed around Rambus memory, and without it, it's not working in an ideal environment. Its bus hasn't been upgraded in forever, and it doesn't have an on-die memory controller. Games are mostly about moving data to the video card - the same reason the Quake games ran so much better on the Celerons than on the K6.

It could also be that at the resolutions people play at, the CPU isn't the determining factor anymore, so developers don't need to super-tweak CPU performance.
 
AFAICS, after hardware T&L, deep command queues etc. started appearing, the CPU effort involved in pushing data to the GPU has dropped by a rather large amount and is not by itself very important these days. The difference between Intel and AMD performance in games is AFAIK mainly due to branch mispredict penalties, memory latency and perhaps the use of dynamic dispatch; in particular, AI and physics code tend to have poor branch/memory-access predictability, which hurts the P4 much more than the A64.
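A hedged sketch of that branch-predictability point (the data layout and cycle counts are illustrative assumptions, not measurements):

```c
/* AI-style code often branches on effectively random data, so the branch
   predictor fails roughly half the time. Each mispredict flushes the
   pipeline: very roughly ~30 cycles on a 31-stage Prescott versus ~10-12
   on the much shorter A64 pipeline. */
#include <stddef.h>

/* Branchy version: the per-entity branch is data-dependent and hard to
   predict, which is exactly what hurts NetBurst. */
float threat_branchy(const float *dist, size_t n, float radius)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (dist[i] < radius)          /* unpredictable branch */
            sum += radius - dist[i];
    }
    return sum;
}

/* Branchless version: compilers typically turn the select into cmov or
   maxss, trading a little extra arithmetic for zero mispredicts. */
float threat_branchless(const float *dist, size_t n, float radius)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = radius - dist[i];
        sum += (d > 0.0f) ? d : 0.0f;  /* a select, not a real branch */
    }
    return sum;
}
```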

The P4 also has some odd, more technical performance issues; e.g. if two data elements are spaced exactly 64 Kbytes (or a multiple thereof) apart, the P4 cannot hold both in L1 cache at the same time, causing you to randomly lose 20-30% performance every once in a while. IIRC, this issue has caused Nvidia in particular a lot of headaches; for every driver change they make, they run an extensive set of performance regression tests, and this issue basically ensured that no matter what part of the driver they changed, some other unrelated part would go 20% slower.
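A minimal sketch of how that 64K aliasing could be reproduced (buffer names and sizes are my own; this assumes the 64 KB aliasing distance described above):

```c
/* Two buffers whose addresses differ by an exact multiple of 64 KB share
   the same low address bits, so on the P4 they keep evicting each other
   from L1. Shifting one buffer by a single cache line avoids it. */
#include <stdlib.h>

#define ALIAS_DIST (64 * 1024)    /* 64 KB aliasing distance on NetBurst */

void copy_test(int shift_bytes)   /* try 0, then 64 */
{
    /* Over-allocate so the distance between src and dst is controllable. */
    char *pool = malloc(2 * ALIAS_DIST + 4096);
    char *src  = pool;
    char *dst  = pool + ALIAS_DIST + shift_bytes; /* exactly 64 KB apart if 0 */

    /* With shift_bytes == 0 every src/dst pair aliases in L1; with
       shift_bytes == 64 (one cache line) the same loop runs noticeably
       faster on a P4, per the 20-30% figure above. */
    for (int i = 0; i < ALIAS_DIST; ++i)
        dst[i] = src[i];

    free(pool);
}
```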
 
Kaotik said:
We all know AMD dominates the gaming market at the moment when it comes to performance -
Not really, at least not at high res. If you run a CPU-limited benchmark, the FX-57 is 20 frames faster - huge! But when you run at 1600x1200 it's a wash... The A64 still outperforms, just not by very much.
 
Sxotty said:
A64 supports SSE2 as well so why would the P4 get a boost and not the A64?

The Athlon 64, and its predecessor, were designed to handle floating-point math very well by having multiple FPUs. The FPU pipeline is not as straightforward as the integer pipeline or SSE (Streaming SIMD). What the K7 and K8 actually do well is mask the intermediate steps, giving you high floating-point throughput. However, once you switch to SSE you have a simpler processing pipeline, and all that logic the K7/K8 carries is no longer needed. That is to say, SSE, or streaming SIMD operation, can use reduced logic with a more straightforward pipeline. It becomes about clock rates again, because the NetBurst and K8 SSE units/pipelines are roughly comparable. So the K8 gains a little over its FPU path, but NetBurst gains a lot, especially because it doesn't like x87 FPU operations in the first place.
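A small sketch of the two paths being contrasted (function names are mine; alignment and size assumptions are noted in the comments):

```c
/* Scalar loop vs. packed SSE: the scalar path exercises the x87/FPU
   pipeline where K7/K8's multiple FPUs shine; the packed path uses the
   simpler SSE pipeline where NetBurst and K8 are much closer per clock. */
#include <xmmintrin.h>   /* SSE intrinsics */

/* x87-style scalar path (when compiled without SSE codegen). */
void scale_scalar(float *v, int n, float s)
{
    for (int i = 0; i < n; ++i)
        v[i] *= s;
}

/* Packed SSE path: four floats per instruction. For brevity this assumes
   n is a multiple of 4 and v is 16-byte aligned. */
void scale_sse(float *v, int n, float s)
{
    __m128 vs = _mm_set1_ps(s);
    for (int i = 0; i < n; i += 4)
        _mm_store_ps(v + i, _mm_mul_ps(_mm_load_ps(v + i), vs));
}
```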

Another factor is the Athlon 64's memory controller. The controller's proximity to the core makes it excellent for highly granular data. However, it doesn't do a whole lot for streaming memory operations.
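To make the granular-versus-streaming distinction concrete, a hedged sketch (the structures are my own illustration):

```c
/* The pointer chase is latency-bound: each load's address depends on the
   previous load, so the A64's on-die controller pays off directly. The
   sequential sum is bandwidth-bound: the prefetcher can stream it, and
   lower latency helps far less. */
#include <stddef.h>

struct node { struct node *next; int payload; };

/* Latency-bound: dependent loads, typical of linked game-state data. */
int walk_list(const struct node *n)
{
    int sum = 0;
    while (n) {
        sum += n->payload;
        n = n->next;   /* next address is unknown until this load completes */
    }
    return sum;
}

/* Bandwidth-bound: independent sequential loads the hardware can prefetch. */
long sum_array(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```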

Of course both - or perhaps I should say a new architecture - could have multiple parallel SSE units, but these would require a lot of memory bandwidth to feed. Another solution would be separate clock domains. By using streaming operations you are, in many ways, paying for reduced complexity with bandwidth.
 
Because NV and ATI are just now figuring out that HT can help performance if you multi-thread the drivers?

P.S. This was mostly a quip, btw, so AMD-lovers don't pin my ears back too badly --it did blow my mind tho to see ATI going "Oh, yeah, huh --works with HT too."
 
Kaotik said:
We all know AMD dominates the gaming market at the moment when it comes to performance - but exactly why is this?

I would say the fact that the AMD has a shorter pipeline has a lot to do with it.
 
geo said:
Because NV and ATI are just now figuring out that HT can help performance if you multi-thread the drivers?
Yeah, but ideally your CPU would already be maxed out with AI, physics and other stuff. I think this is more of a stopgap measure personally, but we will see. Maybe with quad cores etc. coming out it will always be useful, since game developers perhaps won't be up to using all the CPUs - but then again, perhaps not. (Maybe I just don't like wasting system resources to check whether there are excess resources available ;) - not to mention the whole thermal-throttling bit if you run certain CPUs at max load.)
 
Superior FPU power, larger L1 cache, shorter pipelines, an onboard memory controller, and inherently lower DDR1 latencies.

It is only (relatively) recently that AMD's processors have shown their advantage over Intel's. This is not only down to AMD but also to the direction Intel has taken with its Prescott core.
 
Kaotik said:
The main point is, that the games (based on BF2) apparently are compiled using Pentium Pro / Pentium II / Pentium III optimizations, instead of taking real advantage of SSE/SSE2 etc, which would only need one switch "turned" by the developer.

The problem with that is that it will crash on CPUs that don't support SSE/SSE2. Compiling for P2/P3 is probably done because those CPUs need the extra performance more than the top CPUs do. Still, it's not that hard to provide several different executables for a range of different CPUs.
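Or even one executable that picks a code path at startup; a minimal sketch using CPUID (the MSVC __cpuid intrinsic is real, but the function-pointer setup and names are my own illustration):

```c
/* Select an SSE2 or x87 code path at runtime instead of shipping
   separate executables. CPUID leaf 1 returns feature flags in EDX:
   bit 25 = SSE, bit 26 = SSE2. */
#include <intrin.h>   /* MSVC: __cpuid */

extern void update_particles_x87(void);   /* fallback build of the routine */
extern void update_particles_sse2(void);  /* SSE2 build of the routine */

void (*update_particles)(void);

void select_code_path(void)
{
    int regs[4];                  /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);             /* leaf 1: processor feature flags */

    if (regs[3] & (1 << 26))      /* EDX bit 26: SSE2 supported */
        update_particles = update_particles_sse2;
    else
        update_particles = update_particles_x87;
}
```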
 
Kaotik said:
How's the SSE2 speed on A64?

Not bad, but not excellent either. You'll get better performance with 3DNow, and P4 will generally run SSE/SSE2/SSE3 code faster than A64.
 
Humus said:
P4 will generally run SSE/SSE2/SSE3 code faster than A64.
Are you sure about this? I don't think this is right at all. I think the Athlon 64 actually does a bit better in SSE2 per clock than a Pentium 4.


...Which goes back to my argument above that once SSE code is used, the IPC difference between K8 and NetBurst becomes smaller, and it becomes a battle of clocks and memory bandwidth.
 
So the problem is that they could compile the games for a number of instruction sets, each one giving that CPU a gain. Instead of doing this, they just compile the game for a lower-end CPU and expect the higher-end ones to bear it out - which AMDs do better than Intels.
 
suryad said:
Really? I wonder why they compile the games like that...lowest common denominator?

To sell to a wider user base. The people with top-of-the-line CPUs don't need the extra attention, so they compile for the group that does.
 
wireframe said:
Are you sure about this? I don't think this is right at all. I think the Athlon 64 actually does a bit better in SSE2 per clock than a Pentium 4.

We are not talking about per clock, and even then it is debatable whether the A64 is faster per clock at SSE2 code than the Pentium 4 architecture.

Per clock is also irrelevant: the Pentium 4 runs at higher clocks than the A64 as it is.

I don't remember where, but there was an article that measured the SSE2 speed of the A64 vs the P4, and the A64 was behind in all instances.

This could be similar to the way the A64 is faster at x86-64 than the Pentium CPUs are.
 