2005/2006 PC versus PS3/Xbox 360

I can tell you that a 3.2GHz SNB (Sandy Bridge) dual core is tremendously faster than my 3.2GHz Conroe, which lacks HT. There are certainly other factors at play that make the SNB so much faster, but I would estimate that HT is playing a rather large role here.
I think that the value of HT depends on a few factors. I don't think it's of much use for most games; for example, the i5-2500 and i7-2600 are essentially the same for gaming. Sometimes the 2500 even wins!

See here for a review that looks at the SNB Pentium, which has no HT. It is very strong against the i3-2100 in games. Pretty glorious budget hardware.
http://www.anandtech.com/show/4524/...-review-pentium-g850-g840-g620-g620t-tested/3

So I think it's not HT that makes SNB great for games. It's the other goodies.
 
I think that the value of HT depends on a few factors. I don't think it's of much use for most games; for example, the i5-2500 and i7-2600 are essentially the same for gaming. Sometimes the 2500 even wins!

I think this is because most games don't scale beyond 4 threads.

See here for a review that looks at the SNB Pentium, which has no HT. It is very strong against the i3-2100 in games. Pretty glorious budget hardware.
http://www.anandtech.com/show/4524/...-review-pentium-g850-g840-g620-g620t-tested/3

So I think it's not HT that makes SNB great for games. It's the other goodies.

Sure, but look at the more highly threaded games like Far Cry 2, F1 2011, and Metro 2033 with PhysX.
http://www.xbitlabs.com/articles/cpu/display/pentium-g850-g840-g620_4.html#sect0

The i3-2100 is quite a bit faster than the SNB Pentiums in those cases.

It's also interesting to see how much faster the Pentium G850 is than the equally clocked C2D E7500. SNB is really a spectacular architecture for gaming. It's a shame we won't see anything like it in the next-gen consoles.
 
HT makes a huge difference for dual-core CPUs in some games:
http://www.computerbase.de/artikel/prozessoren/2011/test-intel-core-i3-2100-2120/29/
http://www.computerbase.de/artikel/prozessoren/2011/test-intel-core-i3-2100-2120/31/
http://forums.overclockers.com.au/showthread.php?p=11398817#post11398817
 
I was looking for an optimized LZMA implementation and found an interesting benchmark: http://www.7-cpu.com/

LZMA decompression speed:

Cell PPU (3.2 GHz):
- 1 thread = 1060 mips
- 2 threads (SMT) = 1500 mips

For this kind of code, you get a 41.5% speed boost by using SMT (two HW threads per core). That's a bit better than you usually get from Hyper-Threading (on recent Intel CPUs).

LZMA decompression isn't that well suited to the simple PPC core: it has lots of integer multiplies (not pipelined), variable shifts (microcoded), and hard-to-predict branches (branch mispredicts), so SMT might benefit it a bit more than code that utilizes the processor better. See here for more info: http://cellperformance.beyond3d.com...iding-microcoded-instructions-on-the-ppu.html.
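To show why, here is a rough Python sketch (my own illustration, not code from the linked article or the benchmark) of an LZMA-style range-decoder bit decode. Note that every single decoded bit costs an integer multiply and a data-dependent branch, exactly the operations the in-order PPU handles worst:

```python
# Hypothetical sketch of the per-bit inner loop of an LZMA-style range
# decoder. Constants follow the usual LZMA conventions (11-bit bit-model,
# move-bits = 5, renormalize below 2^24).
kNumBitModelTotalBits = 11
kTopValue = 1 << 24

def decode_bit(state, prob_index):
    """Decode one bit; `state` is (range, code, probs, stream, pos)."""
    rng, code, probs, data, pos = state
    prob = probs[prob_index]
    bound = (rng >> kNumBitModelTotalBits) * prob    # integer multiply
    if code < bound:                                 # data-dependent branch
        rng = bound
        probs[prob_index] = prob + ((2048 - prob) >> 5)  # adapt toward 0
        bit = 0
    else:
        rng -= bound
        code -= bound
        probs[prob_index] = prob - (prob >> 5)           # adapt toward 1
        bit = 1
    if rng < kTopValue:                              # renormalize
        rng <<= 8
        code = ((code << 8) | data[pos]) & 0xFFFFFFFF
        pos += 1
    return bit, (rng, code, probs, data, pos)
```

The branch outcome depends on the compressed data itself, so the predictor has little to work with, and the multiply sits on the critical path of every bit.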

In comparison, a modern mobile out-of-order processor:

Dual core Cortex A9 Exynos (1.2 GHz):
- 1 thread = 1080 mips
- 2 threads = 2140 mips
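Working the quoted figures through (simple arithmetic on the numbers above):

```python
# Scaling factors from the LZMA decompression numbers quoted above
# (7-cpu.com figures; "mips" is the benchmark's own unit).
def scaling(one_thread, two_threads):
    """Return the fractional throughput gain from the second HW thread."""
    return two_threads / one_thread - 1.0

cell_ppu = scaling(1060, 1500)    # SMT: two HW threads on one core
cortex_a9 = scaling(1080, 2140)   # true dual core: two full cores

print(f"Cell PPU SMT gain:   {cell_ppu:.1%}")   # ~41.5%
print(f"Cortex A9 dual core: {cortex_a9:.1%}")  # ~98.1%
```

The A9's near-100% scaling is the giveaway that it is two real cores, not two threads sharing one core's execution resources.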
 
It would be nice if there were a comparison of an i7 2600k with turbo on and off, and HT on/off.

The i5 had turbo off, so the i7-to-i5 comparison is not one of equals; otherwise the improvement would be in the ballpark of what the PPE showed.
 
In comparison, a modern mobile out-of-order processor:

Dual core Cortex A9 Exynos (1.2 GHz):
- 1 thread = 1080 mips
- 2 threads = 2140 mips
Right, this is exactly what I mean... that's the most trivial "SMT" implementation ever. It's just switching between instructions from the two threads to cover pipeline/memory latency, so if the pipeline isn't fully occupied you're getting *half* throughput. That's fine, but it's a trade-off of parallelism for throughput, so a theoretical processor that could get 2140 mips on a single thread would be more desirable, all else being equal.
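To make that trade-off concrete, here is a toy round-robin issue model (my own illustration, nothing from the thread): with a one-cycle stall after every issue, a single thread gets half throughput, while a second hardware thread simply fills the gaps.

```python
# Toy model of fine-grained multithreading: a core that issues one
# instruction per cycle but stalls `stall` cycles after each issue.
# With one thread the issue slot sits idle during the stall; with two
# threads the scheduler alternates and the gaps get filled.
def throughput(n_threads, stall=1, cycles=1000):
    """Instructions retired per cycle with round-robin issue."""
    ready_at = [0] * n_threads   # cycle at which each thread may issue again
    retired = 0
    for cycle in range(cycles):
        for t in range(n_threads):
            if ready_at[t] <= cycle:
                retired += 1
                ready_at[t] = cycle + 1 + stall
                break            # only one issue slot per cycle
    return retired / cycles

print(throughput(1))  # 0.5 IPC: every other cycle is a stall
print(throughput(2))  # 1.0 IPC: the second thread hides the stall
```

Total throughput doubles, but each individual thread still runs at the half-speed rate, which is exactly the parallelism-for-throughput trade being described.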

Would be interesting to see the same thing run on the 360 CPU.
 
I know it was stated earlier in the thread that my E6750 has a 28.6% "instruction advantage" over Xenon, but could there be something else at play here? I can run most console ports at a steady 60FPS at console settings. Could it be that most Xbox games are not maxing out the CPU?

Admittedly I run the thing at 3.2GHz but this is not a large enough overclock to make such a huge difference, I think?
 
Considering the CPU often makes very little difference in PC games, why would this surprise you? You're running a graphics card that's well in advance of those in the consoles.
 
I would think that the consoles would be maxing out every available resource.
 
I think games would run terribly on an X1800.

I said "why does my theoretically-not-that-much-faster-than-Xenon CPU perform so much better than Xenon in videogames?" Of course I know the GTX260 is far faster than Xenos, but my understanding is that console devs are squeezing every ounce of power out of their systems, including the CPU. And I am interested in the CPU limitations here, nothing to do with GPU.
 
Your CPU may have absolutely nothing to do with why your system runs games faster; it may in fact be hindering them from reaching their true potential, and Xenon could be better.
 
Right, this is exactly what I mean... that's the most trivial "SMT" implementation ever. It's just switching between instructions from the two threads to cover pipeline/memory latency, so if the pipeline isn't fully occupied you're getting *half* throughput. That's fine, but it's a trade-off of parallelism for throughput, so a theoretical processor that could get 2140 mips on a single thread would be more desirable, all else being equal.
Yes, a processor that gets 1500 mips out of a single thread is more desirable than one that gets 1000 mips out of a single thread and 1500 mips out of two. However, designing a processor with 50% better IPC is a much harder job than just adding SMT (and getting a similar 50% extra throughput from the simpler CPU). Also, you can add SMT to the 150%-IPC core as well (pretty much Intel's strategy), and it seems to work pretty well.
Would be interesting to see the same thing run on the 360 CPU.
We are/were running a modified version of it in Trials Evolution... It doesn't suit current console architectures that well. I have been evaluating dozens of different compression algorithms, as we are streaming all our data during gameplay (textures, meshes, terrain height field, vegetation map, decals, etc.). Very good compression rate and very fast decompression speed are a pretty hard combination to achieve.
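For a rough feel of that tension, here is a small Python sketch using zlib as a stand-in (the actual engine evaluated other codecs; the payload and numbers here are invented for illustration):

```python
# Illustration of the ratio-vs-speed tension using zlib's levels as a
# stand-in codec. Higher levels buy compression ratio mostly at
# *compression* time; switching codec families is what really moves
# decode speed, which is the axis that matters for streaming.
import time
import zlib

payload = b"terrain height field, vegetation map, decals, " * 4000

for level in (1, 6, 9):
    packed = zlib.compress(payload, level)
    t0 = time.perf_counter()
    out = zlib.decompress(packed)
    dt = time.perf_counter() - t0
    assert out == payload
    print(f"level {level}: ratio {len(payload) / len(packed):5.1f}x, "
          f"decode {dt * 1e3:.2f} ms")
```

On repetitive data like this, all levels decode at roughly the same speed; to get decompression several times faster you have to change the algorithm, not the level, which is why the choice of codec family dominates for streaming workloads.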
 
Very good compression rate and very fast decompression speed are a pretty hard combination to achieve.

No doubt! I think I read that SPEs can be quite good at it though.

Tried to find an example of it, but I only have this screen below of Killzone 2, from around or just before release, and the only compression bit I can recognise in there is the MP3 stuff. I wish we could just pull up screens like this whenever we felt like it in our retail games. :LOL:

http://www.niwra.nl/tmp/killzone_stats.png
 
No doubt! I think I read that SPEs can be quite good at it though.
PPC cores can also be quite good at decompression if the algorithm is designed for it. Pure LZMA is pretty bad, but if you are willing to sacrifice a few percent of compression ratio you can get over a 5x performance boost (with a different algorithm). Lossy sound and video compression algorithms are of course quite different beasts. If the sound/video decompression algorithm benefits from vectorization, SPEs could be quite efficient (but, as usual, the devil is in the details: some algorithms could perform very well, and some not so well).
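As an illustration of the kind of trade-off being described, here is a hypothetical byte-aligned LZ decoder (an invented toy format, not the algorithm actually used): no bit-level entropy coding, no multiplies, just short, predictable byte copies, which is why such formats decode so much faster on simple in-order cores while giving up a little ratio.

```python
# Toy byte-aligned LZ format (invented here for illustration):
#   control byte < 0x80  ->  (ctrl + 1) literal bytes follow
#   control byte >= 0x80 ->  match of length (ctrl & 0x7F) + 3,
#                            at distance (next byte) + 1
def decode(src):
    out = bytearray()
    i = 0
    while i < len(src):
        ctrl = src[i]; i += 1
        if ctrl < 0x80:                    # literal run
            n = ctrl + 1
            out += src[i:i + n]
            i += n
        else:                              # match: copy from history
            length = (ctrl & 0x7F) + 3
            dist = src[i] + 1; i += 1
            for _ in range(length):        # byte-by-byte handles overlap
                out.append(out[-dist])
    return bytes(out)

# "abc" as literals, then a 6-byte overlapping copy at distance 3:
print(decode(bytes([0x02]) + b"abc" + bytes([0x80 | 3, 2])))
```

Everything here is byte loads, adds, and compares against a constant, so branches are far more predictable than an entropy coder's data-dependent ones.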
Those GPU/CPU stall times look really high... 10.5 ms GPU stall per frame might be manageable in a 30 fps game, but losing almost 1/3 of your frame time to stalls doesn't sound right. Maybe the debug output is slowing it down.
 
However, designing a processor with 50% better IPC is a much harder job than just adding SMT (and getting a similar 50% extra throughput from the simpler CPU). Also, you can add SMT to the 150%-IPC core as well (pretty much Intel's strategy), and it seems to work pretty well.
Yup, totally agreed - the serial-performance path obviously hits brick walls in hardware and power, so you're forced into parallelism. I just like to note that when comparing two processors that achieve similar throughput, the more serial one is actually desirable, if all else were equal (which it often is not).
 
As some may have noticed, I've been playing with an X1950XTX. Its biggest problem is drivers that are almost two years old; as a result it can't even run a lot of recent DX9 games.

But while it's much slower than an 8800GT, I'm sure it's well beyond the consoles. It's easy to lose perspective on how many upgrades have happened since the consoles arrived. Today's midrange GTX 560 Ti is several times faster than an 8800GTX.
 