Most games are 80% general purpose code and 20% FP...

(Anand saying XeCPU was 2x XB CPU)

In a worst-case scenario this may well be true. Which makes me wonder how the X360 will emulate Xbox games, since that could be very close to the worst case.

BTW, I wonder what CPU Nintendo will use. I doubt they'll go for a stream processor; multicore is a possibility, but only if they're going to use multiple Geckos. I doubt IBM will be willing to design a powerful yet low-power out-of-order chip for Nintendo, though; they'll probably push their console Power core at Nintendo as often as possible.

Oh, aren't there game developers on this forum? Why not ask one of them how much FP code is in their games?

Slightly off-topic, but what would be the reason they don't use more SSE(2) or 3DNow! instructions? I would have expected those to be used wherever possible, instead of just x87 FP.

Well, I thought 3DNow! would be almost dead by now, but I'm surprised not to see a larger amount of SSE and SSE2 code being used, especially when nearly any processor capable of running current games supports them.

I remember back in the day that 3DNow! gave a huge performance increase in games on the K6 processors; perhaps FP use hasn't increased quite as much as the FP capabilities of processors.
 
So, it does look like searching through all the data is using up most of the time, and transformations are done by the GPU as much as possible. Which makes sense, but it still surprises me.

I think that the way to go would be to dynamically generate as many objects as possible, and to calculate as many textures as bandwidth and processing power allow. That way, scene management becomes a lot easier, and the workload is much easier to stream and parallelise.

But that would not only require a new programming model, but a different way to make artwork as well, which would be the biggest hurdle, I think.
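To make that concrete, here's a toy sketch (entirely illustrative; the pattern and sizes are my own invention) of what generating a texture procedurally rather than streaming it from storage might look like:

Code:
/* Toy example: generate a 256x256 greyscale texture procedurally
 * instead of loading it from storage. Purely illustrative; a real
 * engine would use proper noise functions and generate per tile. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define W 256
#define H 256

int main(void)
{
    unsigned char *tex = malloc(W * H);
    if (!tex)
        return 1;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            /* cheap "wood grain" from a few sines: no texture
               memory traffic, just arithmetic per texel */
            double v = sin(x * 0.07) + sin(y * 0.11) + sin((x + y) * 0.05);
            tex[y * W + x] = (unsigned char)((v + 3.0) / 6.0 * 255.0);
        }
    }
    printf("generated %d bytes of texture\n", W * H);
    free(tex);
    return 0;
}

The point being that every texel is independent arithmetic with no data dependencies, which is exactly the kind of work that streams and parallelises well.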
 
Titanio said:
Just to clarify, since I didn't grab this earlier - you're looking at the ratio of floating point to other code in terms of code volume? Or execution time?

More precise than that. It's retired ops, i.e. work actually done (speculatively executed ops are not counted).

It measures the c1 ("retired microop", i.e. all ops) event and the cb ("retired FP") event. The FP ratio is then given by cb/c1. You can set a mask on the cb event to count x87, MMX+3DNow!, SSE+SSE2 scalar or SSE+SSE2 vector ops, in any combination of the four. I did three runs on all games: catch all FP events, catch all SSE+SSE2 (i.e. scalar+vector), and catch all MMX+3DNow!. x87 ops were then calculated as x87 = all FP events - 3DNow! - SSE.
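In code form the bookkeeping amounts to something like this (a minimal sketch; the counter values are made up purely to show the arithmetic):

Code:
/* Sketch of the ratio arithmetic. c1 = retired uops (all ops),
 * cb = retired FP ops, read in three runs with different masks.
 * All values below are invented for illustration. */
#include <stdio.h>

int main(void)
{
    unsigned long long c1_all_uops = 10000000000ULL; /* all retired uops   */
    unsigned long long cb_all_fp   =   900000000ULL; /* mask: all FP types */
    unsigned long long cb_sse      =   250000000ULL; /* mask: SSE+SSE2     */
    unsigned long long cb_mmx3dnow =    50000000ULL; /* mask: MMX+3DNow!   */

    unsigned long long x87 = cb_all_fp - cb_sse - cb_mmx3dnow;
    double fp_ratio = (double)cb_all_fp / (double)c1_all_uops;

    printf("x87 ops : %llu\n", x87);
    printf("FP ratio: %.1f%%\n", fp_ratio * 100.0);
    return 0;
}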

Cheers
Gubbi
 
Titanio said:
Just to clarify, since I didn't grab this earlier - you're looking at the ratio of floating point to other code in terms of code volume? Or execution time?

number of uops retired over the sampling session. which is not as definitive a measurement as people take it to be: it does not give you the workload per particular unit (e.g. int alus vs fp alus). you need a bunch of extra info in conjunction with the retirement statistic to obtain the workload picture: average op latencies for the different units, the throughput of the various uop dispatchers, potentially the hyperthreading factor (which alone can completely convolute the workload picture, depending on the configuration of contention points within the cpu).

what i'm trying to say is that you may get a retirement statistic of, say, hypothetically, 5:1 odds in favor of one calculation type, and still be bound by the throughput of the other.
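a contrived numeric illustration (all figures invented):

Code:
/* contrived example: type A retires 5x as many ops as type B,
 * yet B is the bottleneck because its units have far lower
 * throughput. all numbers are made up for illustration. */
#include <stdio.h>

int main(void)
{
    double ops_a = 5e9, tput_a = 4.0;  /* 4 A-ops/cycle across A's units */
    double ops_b = 1e9, tput_b = 0.5;  /* e.g. long-latency, unpipelined */

    double cycles_a = ops_a / tput_a;  /* 1.25e9 cycles worth of A work */
    double cycles_b = ops_b / tput_b;  /* 2.00e9 cycles worth of B work */

    printf("retirement ratio 5:1 for A, yet bound by %s\n",
           cycles_b > cycles_a ? "B" : "A");
    return 0;
}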
 
Gubbi said:
Titanio said:
Just to clarify, since I didn't grab this earlier - you're looking at the ratio of floating point to other code in terms of code volume? Or execution time?

More precise than that. It's retired ops, i.e. work actually done (speculatively executed ops are not counted).

It measures the c1 ("retired microop", i.e. all ops) event and the cb ("retired FP") event. The FP ratio is then given by cb/c1. You can set a mask on the cb event to count x87, MMX+3DNow!, SSE+SSE2 scalar or SSE+SSE2 vector ops, in any combination of the four. I did three runs on all games: catch all FP events, catch all SSE+SSE2 (i.e. scalar+vector), and catch all MMX+3DNow!. x87 ops were then calculated as x87 = all FP events - 3DNow! - SSE.

Cheers
Gubbi

Thanks. Sorry if I'm not following precisely - so it's a count of all ops, and of all FP/MMX/SSE etc. ops executed? Is the length of time for the execution of each op the same, or variable? Sorry if these seem like basic questions ;) Thanks again..
 
darkblu said:
what i'm trying to say is that you may get a retirement statistic of, say, hypothetically, 5:1 odds in favor of one calculation type, and still be bound by the throughput of the other.

True, but the Athlon (all of them) can issue 3 FP instructions per cycle (an add, a mul and a store), so for arithmetic (add+mul) you'd be looking at up to 66% of the total instruction throughput that could be FP arithmetic. So the FP usage ratio is still a lot less than what it could be, unless these games are doing primarily FP division, but I doubt that :)
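As a back-of-the-envelope sketch (the measured ratio here is hypothetical, just to show the headroom):

Code:
/* Ceiling vs. measured FP ratio on the Athlon. Of the three FP
 * pipes (FADD, FMUL, FSTORE), two do arithmetic, hence the 2/3
 * ceiling. The measured ratio is a made-up placeholder. */
#include <stdio.h>

int main(void)
{
    double fp_arith_ceiling  = 2.0 / 3.0; /* ~66% of issue slots    */
    double measured_fp_ratio = 0.10;      /* hypothetical, e.g. 10% */

    printf("games sit roughly %.0fx below the FP arithmetic ceiling\n",
           fp_arith_ceiling / measured_fp_ratio);
    return 0;
}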

Cheers
Gubbi
 
Gubbi said:
darkblu said:
what i'm trying to say is that you may get a retirement statistic of, say, hypothetically, 5:1 odds in favor of one calculation type, and still be bound by the throughput of the other.

True, but the Athlon (all of them) can issue 3 FP instructions per cycle (an add, a mul and a store), so for arithmetic (add+mul) you'd be looking at up to 66% of the total instruction throughput that could be FP arithmetic. So the FP usage ratio is still a lot less than what it could be, unless these games are doing primarily FP division, but I doubt that :)

i concur. and no, i don't think those titles are fp-bound either : )
but at the end of the day people are still going to run around waving those retirement statistics as evidence for the '80/20' workload ratio nevertheless.. *shrug*
 