Most games are 80% general purpose code and 20% FP...

Inane_Dork said:
Shifty Geezer said:
Prior to E3, MS was all about the FP power of their system. They never mentioned GP until that Major Nelson article. It was, IMO, never a point they considered until they found differences between their system and Cell for their PR campaign.
Uh... I dunno about that. MS would have to be very stupid indeed to have no sense of what the Cell was capable of prior to E3.
I meant that they didn't consider GP an issue until after E3, where Sony trumped all of MS's FP figures. MS knew about Cell, yet prior to E3, with '1 teraflop total system performance' and such, MS never once mentioned GP or its importance. Only after E3, when the world was raving about '2 teraflops total PS3 system performance', did MS analyse the differences between XeCPU and Cell (designed on similar philosophies) and say 'Hey, we've got three GP cores, they've only got one. And... um... it's the GP that really matters. Just don't think about that too hard or you'll start to wonder why on earth we didn't choose GP power, or talk about it, before now.'

It's all marketing bluster. KK talked up a 1 teraflop supercomputer Cell. MS followed with a 1 teraflop XB360. Sony parried with 2 teraflops of PS3 performance. MS have run out of flops claims (they can't claim more than Sony now), so they shift strategy to refocus people's understanding of power away from FP and onto GP.

None of this waffle is worth considering for technical purposes. It's just mind-numbing General Populace (GP) processing, confusing the masses as to what actually is a measure of performance and which console has the bigger number of that measure. Joe Public, having just about got over the MHz Myth, must be reeling from the double blow of teraflops and GP performance, plus a few slaps about the face with fantastic non-real-time footage and mediocre real-time footage, with added '30% of finalised hardware' complexity.
 
What Sweeney said was:

Tim Sweeney said:
According to Tim, a lot of things aren’t appropriate for SPE acceleration in UE3, mainly high-level game logic, artificial intelligence and scripting. But he adds that “Fortunately these comprise a small percentage of total CPU time on a traditional single-threaded architecture, so dedicating the CPU to those tasks is appropriate, while the SPE's and GPU do their thing."

Looking at the thread title, he says that most games aren't spending 80% of their time executing general purpose logic (although 80% of the program might consist of GP logic), but that it is only a small part of the total execution time. And those other, compute-heavy tasks can be accelerated.

And the main focus on memory access in this thread is about totally random access to single words, which is the worst case for CPUs with lots of cache as well.

Memory architecture being what it is, it is much more efficient to load a whole block of data at a time than to read only a single word. RAM is addressed in whole rows, and is much faster at fetching the next row (especially when interleaved) than at jumping to data at a completely different location.

So, while the penalty for random accesses is smaller with a large hardware cache, accessing a whole block of data in local memory through DMA is much more efficient.
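A minimal sketch of that pattern, in generic C++ (the Particle struct, block size and the memcpy standing in for a real DMA transfer are all just illustrative): fetch one contiguous block into a small local buffer, crunch it, write it back, repeat.

    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    // Made-up data type and block size, purely for illustration.
    struct Particle { float x, y, z, pad; };

    void process_blocks(Particle* main_mem, std::size_t count)
    {
        Particle local[256];                        // stand-in for a small local store
        for (std::size_t base = 0; base < count; base += 256) {
            std::size_t n = std::min<std::size_t>(256, count - base);
            // "DMA in": one big contiguous transfer instead of many single-word reads
            std::memcpy(local, main_mem + base, n * sizeof(Particle));
            for (std::size_t i = 0; i < n; ++i)     // crunch the whole block locally
                local[i].x += local[i].y * 0.1f;
            // "DMA out": write the finished block back in one go
            std::memcpy(main_mem + base, local, n * sizeof(Particle));
        }
    }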

When running multiple threads, the most important thing is not to have two threads trying to access the same data independently. Which also calls for restructuring your data into independent blocks. And if you do that and want to start crunching, a local memory that is much bigger than a normal level 1 cache has the upper hand.
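A rough illustration of that idea, again in generic C++ (std::thread is used purely for brevity, nothing console-specific): each thread owns a disjoint slice of the data, so nothing is shared and no locking is needed.

    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    // Each worker touches only its own half-open range [begin, end).
    void scale_range(std::vector<float>& data, std::size_t begin, std::size_t end)
    {
        for (std::size_t i = begin; i < end; ++i)
            data[i] *= 0.5f;
    }

    void scale_in_parallel(std::vector<float>& data)
    {
        const std::size_t mid = data.size() / 2;
        std::thread t1(scale_range, std::ref(data), std::size_t{0}, mid);
        std::thread t2(scale_range, std::ref(data), mid, data.size());
        t1.join();
        t2.join();
    }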

Which leaves branching. And that is essentially the same problem: loops and other predictable structures are fine, but random branches are bad. That goes for every architecture, although the ones that use branch prediction or hints generally pay a smaller penalty. And there are many good ways to minimize random branches.
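For instance, a data-dependent if/else can often be rewritten as a select; this generic sketch (not tied to any of these CPUs) shows the kind of transformation compilers and programmers use to avoid unpredictable jumps:

    // Data-dependent branches: each 'if' is a potential mispredict on random input.
    float clamp_branchy(float v, float lo, float hi)
    {
        if (v < lo) return lo;
        if (v > hi) return hi;
        return v;
    }

    // Branchless formulation: the ternaries map to conditional selects / min-max.
    float clamp_branchless(float v, float lo, float hi)
    {
        v = (v < lo) ? lo : v;
        v = (v > hi) ? hi : v;
        return v;
    }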

So, unless you expect them to run random, badly written code, they can perform very well if you keep those things in mind. And developers who program exclusively for the PC would be wise to take the same things into account, given the transition to multi-core PCs, PPUs and GPUs that can write data back to main memory directly.
 
DiGuru said:
Looking at the thread title, he says that most games aren't spending 80% of their time executing general purpose logic (although 80% of the program might consist of GP logic), but that it is only a small part of the total execution time. And those other, compute-heavy tasks can be accelerated.

But that's exactly the opposite of what Gubbi's profiler data actually shows.

Of the total ops actually executed in modern games like HL2, Farcry, or Doom3, the majority are non-FP. Those that are FP are mostly scalar, not vector, so even then the number of FP ops is inflated, since vectorized code will need fewer FP ops to do the same math.

Assuming roughly similar throughput for each kind of op, there's no way a modern game can spend the majority of its time executing FP instructions, even ignoring things like the fact that branch and memory accesses will eat up the most time out of anything.

That is assuming Gubbi's data are accurate.

Now, there's no doubt that because the next generation platforms are so float-oriented, the software will adapt to take advantage of their strengths.

But the interesting question is, like I've already said previously, who made the better tradeoff?
 
aaaaa00 said:
But that's exactly the opposite of what Gubbi's profiler data actually shows.

Of the total ops actually executed in modern games like HL2, Farcry, or Doom3, the majority are non-FP. Those that are FP are mostly scalar, not vector, so even then the number of FP ops is inflated, since vectorized code will need fewer FP ops to do the same math.

Assuming roughly similar throughput for each kind of op, there's no way a modern game can spend the majority of its time executing FP instructions, even ignoring things like the fact that branch and memory accesses will eat up the most time out of anything.

True. But we should take some other things into account here as well.

For starters, it would be better to look at the type of calculation being done: is it executing general purpose game logic, or crunching through (preferably a stream of) data blocks? While the latter might not be very FP heavy in current/older games, that doesn't mean it couldn't run very well on an SPE.

And currently those things run on CPUs that are much better at integer calculations than at complex floating point ops. That changes when you run the same functions on something that can do those complex FP ops much more efficiently, and faster, than integer ops. In that case it makes sense to use as many floating point variables, structures and calculations as possible, and the picture might flip around.

Edit: and it wouldn't surprise me if a lot of those integer ops are actually fixed-point ops, for maximum speed.
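For what it's worth, that kind of fixed-point code typically looks something like this (a generic 16.16 sketch in C++; the format and helper names are made up for illustration). To a profiler counting opcodes it's all integer work, even though it's doing fractional math:

    #include <cstdint>

    using fixed16 = int32_t;   // 16 integer bits, 16 fractional bits

    inline fixed16 to_fixed(float f)   { return static_cast<fixed16>(f * 65536.0f); }
    inline float   to_float(fixed16 x) { return static_cast<float>(x) / 65536.0f; }

    // Fractional multiply done entirely with integer instructions.
    inline fixed16 fixed_mul(fixed16 a, fixed16 b)
    {
        return static_cast<fixed16>((static_cast<int64_t>(a) * b) >> 16);
    }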
 
aaaaa00 said:
DiGuru said:
Looking at the thread title, he says that most games aren't spending 80% of their time executing general purpose logic (although 80% of the program might consist of GP logic), but that it is only a small part of the total execution time. And those other, compute-heavy tasks can be accelerated.

But that's exactly the opposite of what Gubbi's profiler data actually shows.

Of the total ops actually executed in modern games like HL2, Farcry, or Doom3, the majority are non-FP. Those that are FP are mostly scalar, not vector, so even then the number of FP ops is inflated, since vectorized code will need fewer FP ops to do the same math.
However, you're not going to see many FP stream-based algorithms written to run on x86 ;)
 
Would it be wrong to state that current games are mostly made up of non-FP ops because they have to run on architectures that favour that kind of operation?

And would it be wrong to state that applications written for machines that are FP monsters will obviously have more FP ops than older games that were designed with INT in mind?
 
london-boy said:
Would it be wrong to state that current games are mostly made up of non-FP ops because they have to run on architectures that favour that kind of operation?

And would it be wrong to state that applications written for machines that are FP monsters will obviously have more FP ops than older games that were designed with INT in mind?

No, you're correct - applications do mould themselves around the hardware, particularly in closed boxes like a console. Which is why drawing conclusions from data on one platform isn't really possible, IMO.
 
Titanio said:
london-boy said:
Would it be wrong to state that current games are mostly made up of non-FP ops because they have to run on architectures that favour that kind of operation?

And would it be wrong to state that applications written for machines that are FP monsters will obviously have more FP ops than older games that were designed with INT in mind?

No, you're correct - applications do mould themselves around the hardware, particularly in closed boxes like a console. Which is why drawing conclusions from data on one platform isn't really possible, IMO.

What I thought. So the point of the initial comment, and of the thread, is...?
 
The original point was Oda asking whether the statement was true or not (fair enough), and the subsequent debate was about how it could or could not be true, turning into expected processor performance and speculation, as we do.
 
Shifty Geezer said:
None of this waffle is worth considering for technical purposes. It's just mind-numbing General Populace (GP) processing, confusing the masses as to what actually is a measure of performance and which console has the bigger number of that measure.
So, if according to you MS is using FUD to talk about their system performance, what is the real metric they should be using?

.Sis
 
There is no one metric. Just as a car's performance on a race track can't be predicted by a BHP or weight figure alone, a guess needs a collection of BHP, torque, weight, turning rate, tyre grip, yadayadayada.

I've no beef with these companies releasing tech specs, so long as they stop trying to stretch the truth or create new figures just to one-up their opponents. Like politicians who spend all their time talking about how bad their opponent's policies are without describing their own. Tell us about the hardware, peak metrics and system architecture, using the same standards across the board. Course, it's not helped by the media wanting conflict and a handbag-smackin' bitch-fight.

For marketing there'll always be figures like RAM and such, but the key is just to ignore them unless they're known quantities. And these companies should know by now that metrics mean sod-all to the masses. PS2 is the most expensive and least numerically endowed of all the consoles. Whatever spiel you create before launch, it's people seeing real games running in shop windows that'll sell the system, followed by word of mouth and popular culture.
 
london-boy said:
Would it be wrong to state that current games are mostly made up of non-FP ops because they have to run on architectures that favour that kind of operation?

And would it be wrong to state that applications written for machines that are FP monsters will obviously have more FP ops than older games that were designed with INT in mind?

I don't think so. FP arithmetic is already as fast as, or faster than, integer arithmetic on current PC processors. I think it's fair to say that all arithmetic in current gen games is FP. As shaderguy pointed out earlier, "integer performance" is a bit of a misnomer since it covers everything that isn't FP: arithmetic, load/store and program flow control.

As for FP usage: I profiled three current gen games, Far Cry, Doom 3 and Half-Life 2, and all of them used a lot less FP than they could (i.e. they certainly weren't limited by FP performance). The most tell-tale sign was that most of it used x87 FP ops (scalar) instead of the SIMD equivalents.

Far Cry showed that animation (skinning) and physics were the two components with the most FP ops, at 30% and 50% respectively; the rest of the engine has lower FP usage. Doom 3 had the highest overall FP ratio, at around 25% FP ops (but exclusively x87). While Doom 3 has limited physics, it does shadow volume extrusion on the CPU, which goes a long way towards explaining its relatively high FP usage (compared to the other games). So with these three games in mind, 80% GP and 20% FP seems about right.
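To make the scalar-vs-SIMD distinction concrete, here's a small illustrative C++ sketch (the Vec4 type is made up): the first function is the kind of per-component float code that compiles to scalar FP ops with default settings, while the second uses a single SSE intrinsic to handle all four components at once.

    #include <xmmintrin.h>   // SSE intrinsics

    struct Vec4 { float x, y, z, w; };

    // Plain per-component code: with default settings this compiles to
    // scalar FP instructions, one op per component.
    Vec4 add_scalar(const Vec4& a, const Vec4& b)
    {
        Vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
        return r;
    }

    // Vectorised equivalent: a single addps handles all four lanes.
    __m128 add_simd(__m128 a, __m128 b)
    {
        return _mm_add_ps(a, b);
    }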

However, games on PCs are made for a whole host of performance targets and hence can only push the envelope so far in terms of physics etc. The game profiles do show that a significant amount of "integer" footwork is required, though.

That doesn't mean the SPEs will sit idle, though. Devs will find things to run on them; since the SPEs are champs at crunching dense vectors/matrices, stuff that fits that bill will be the obvious candidate. For example, I'm sure we'll see awesome water (fluid dynamics) on next gen consoles.
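As a flavour of the "dense vector/matrix" work being described (a generic C++ sketch, not actual SPE code): one transform applied to a long, contiguous array of vectors, with a regular access pattern, no branches, and an obvious way to split it into blocks.

    #include <cstddef>

    struct Vec3 { float x, y, z; };
    struct Mat3 { float m[3][3]; };

    // Transform a long, contiguous array of points by one 3x3 matrix:
    // regular access, no branches, easy to carve into per-block work.
    void transform_points(const Mat3& m, Vec3* pts, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i) {
            const Vec3 p = pts[i];
            pts[i].x = m.m[0][0]*p.x + m.m[0][1]*p.y + m.m[0][2]*p.z;
            pts[i].y = m.m[1][0]*p.x + m.m[1][1]*p.y + m.m[1][2]*p.z;
            pts[i].z = m.m[2][0]*p.x + m.m[2][1]*p.y + m.m[2][2]*p.z;
        }
    }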

But there is a lot of non-FP work to be done as well, and much of it will be a poor fit for the SPEs.

Cheers
Gubbi
 
Shifty Geezer said:
There is no one metric. Just as a car's performance on a race track can't be predicted by BHP or weight figure alone, but a guess needs a collection of BHP, torque, weight, turning rate, tyre grip, yadayadayada.
You have a good point, but I sympathize with them trying to come up with easy-to-grasp performance metrics.

It's interesting, because even with a lot of technical details about both systems out there--beyond just marketing goop--you still find the random poster who asks, "So, uhhh, which system is faster?" and inevitably you get the answer, "We donno."

It holds true even with the car analogy: the specs on paper don't always play out in the actual car (the BHPs didn't convert to enough WHPs where it counts).

I guess my point is: even with full technical specs, you still don't get a clear picture of performance until you put the thing out on the road. And so if determining system performance is that "nebulous" based on specs, can you really blame companies for using dumbed down metrics?

I mean, look at it this way. When MS brings up the "our cpu has 3 times the GP perf of Sony's cell" they don't follow that with, "so we will be 33% faster." Instead, they say, "so they have twice the flops but we have 3 times the GP perf, so it's a wash." And by everything I've heard, this is not far from the truth.

.Sis
 
Slightly off-topic, but what would be the reason they don't use more SSE(2) or 3DNow! instructions? I would have expected those to be used wherever possible, instead of just x87 FP.
 
Sis said:
It holds true even with the car analogy: the specs on paper don't always play out in the actual car (the BHPs didn't convert to enough WHPs where it counts).

I guess my point is: even with full technical specs, you still don't get a clear picture of performance until you put the thing out on the road.
Indeed, a slower car can beat a faster car in a race by having a better driver. ;)

I mean, look at it this way. When MS brings up the "our cpu has 3 times the GP perf of Sony's cell" they don't follow that with, "so we will be 33% faster." Instead, they say, "so they have twice the flops but we have 3 times the GP perf, so it's a wash." And by everything I've heard, this is not far from the truth.
Save that MS's claim is based on a very weak line of reasoning. They totally discount the SPEs, and count one XeCPU core as equal to one PPE core, and so come to 3x the cores = 3x the GP performance.

As we know, running a dual core CPU in your PC doesn't give 2x the performance of the same CPU in single core form. So how come 3 cores = 3x the power? And if PPE = XeCore, and an SPE is a quarter as effective as a PPE, Cell has a total of 1 + (7/4) = nearly 3 cores' worth of GP capability.

Maybe XeCPU does have 3x the GP performance of Cell, but if so it's not because of the reasons MS gave!
 
Shifty Geezer said:
Prior to E3, MS was all about the FP power of their system. They never mentioned GP until that Major Nelson article. It was, IMO, never a point they considered until they found differences between their system and Cell for their PR campaign. And the reason it was never an issue is because stream processors can handle all the workload needed, and very well, if you learn and write to the hardware.

Yeah, it must have just been an accident that MS designed in a large cache, and three general purpose cores. No doubt IBM was running a three-cores-for-the-price-of-one special that week.

A much more likely explanation is that they didn't realize how much traction Sony would get with their streaming-floating-point-is-the-primary-measure-of-game-console-power PR message.
 
DiGuru said:
Slightly off-topic, but what would be the reason they don't use more SSE(2) or 3DNow! instructions? I would have expected those to be used wherever possible, instead of just x87 FP.

Probably because these games aren't floating point bound, so there's no reason for developers to waste time increasing the FP performance. It wouldn't make the games noticeably faster.

The developers probably just compile their code with the default compiler floating point settings, which use x87 instructions for compatibility with older CPUs.

Edit: Oops. I see Gubbi already explained this far more eloquently than I did. :oops:
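For reference, and hedging that the exact switches depend on the compiler version and the project's minimum-CPU target, opting into SSE code generation instead of the x87 default looks something like this:

    g++ -O2 -msse2 -mfpmath=sse game.cpp     (GCC: use SSE for scalar FP as well)
    cl /O2 /arch:SSE2 game.cpp               (Visual C++: allow SSE/SSE2 code generation)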
 
Shifty Geezer said:
As we know, running a dual core CPU in your PC doesn't give 2x the performance of the same CPU in single core form.
Is this true in a closed box environment?

So how come 3 cores = 3x the power?
It doesn't, in the same way that PS3 doesn't have 2 tflops of power. But in the dance of system comparisons, they can play loose with technical accuracy as long as the intent is relevant.
Maybe XeCPU does have 3x the GP performance of Cell, but if so it's not because of the reasons MS gave!
But the way they said it--if it's not a lie--is just the easiest way to express it to the public.

I'm not trying to defend marketing lies, but I am saying that MS's defense of their system doesn't seem disingenuous in the face of Sony trumpeting double the flops, a media that parrots this as 'double the system performance', and MS only trying to say, "yeah, but it washes out."

.Sis
 
Gubbi said:
As for FP usage: I profiled three current gen games, Far Cry, Doom 3 and Half-Life 2, and all of them used a lot less FP than they could (i.e. they certainly weren't limited by FP performance). The most tell-tale sign was that most of it used x87 FP ops (scalar) instead of the SIMD equivalents. Far Cry showed that animation (skinning) and physics were the two components with the most FP ops, at 30% and 50% respectively; the rest of the engine has lower FP usage. Doom 3 had the highest overall FP ratio, at around 25% FP ops (but exclusively x87). While Doom 3 has limited physics, it does shadow volume extrusion on the CPU, which goes a long way towards explaining its relatively high FP usage (compared to the other games).

Just to clarify, since I didn't grab this earlier - you're looking at the ratio of floating point to other code in terms of code volume? Or execution time?

On a side note, I think that with games designed for a closed box you do see the code map itself around the hardware. Devs will look at what it can do and play toward its strengths.
 