Are all FLOPS created equal?

Jaws said:
No, I got the end of your post alright...your initial analogy with dual G5's and a single Athlon was way off and completely misleading. Unless you care to clarify further as I hate misunderstandings you know...?

I think my main points were quite clear.

Point 1: The workload has to fit the architecture to get anywhere optimal performance.

Point 2: It is easier to make workloads that fit general purpose processors like the ones in the Xbox CPU than specialized processors like in Cell.

Point 3: As long as we know as little as we do about the real life workloads on the next gen consoles it is hard to make conclusions.

The G5 vs. A64 analogy is far from perfect (neither the analogy itself nor the way I presented it), but it illustrates the point that if the workload does not fit the architecture it does not matter how big your theoretical power advantage is.

Maybe analogies from the open PC world are bad when speaking about consoles, but they beat car analogies any day of the week. (I would rather shoot myself than write about F1 cars with bicycle wheels*)


* not really
 
Tim said:
I think my main points were quite clear.

Point 1: The workload has to fit the architecture to get anywhere optimal performance.

Point 2: It is easier to make workloads that fit general purpose processors like the ones in the Xbox CPU than specialized processors like in Cell.

Point 3: As long as we know as little as we do about the real life workloads on the next gen consoles it is hard to make conclusions.

Tim +1. <applauds politely>
 
Tim said:
Jaws said:
No, I got the end of your post alright...your initial analogy with dual G5's and a single Athlon was way off and completely misleading. Unless you care to clarify further as I hate misunderstandings you know...?

I think my main points were quite clear.

Yes, as I've said already, I understand your points as it's been discussed to death on this forum in hundreds of threads. My specific point made with my reply was on how completely misleading your initial analogy was that you were seemingly presenting as fact. Then you followed on with your post to justify that analogy.

Tim said:
Point 1: The workload has to fit the architecture to get anywhere optimal performance.

Agreed as mentioned above.

Tim said:
Point 2: It is easier to make workloads that fit general purpose processors like the ones in the Xbox CPU than specialized processors like in Cell.

Agreed as mentioned above.

Tim said:
Point 3: As long as we know as little as we do about the real life workloads on the next gen consoles it is hard to make conclusions.

Not really, as EVERYONE knows they will run GAMES. We may not have official specs yet and if we did it would still be difficult to make apples to apples conclusions but we can get a general sense with 'peak' metrics and what's been discussed on these forums for months.

Tim said:
The G5 vs. A64 analogy is far from perfect (neither the analogy itself nor the way I presented it), but it illustrates the point that if the workload does not fit the architecture it does not matter how big your theoretical power advantage is.

Agreed with your point. However, my earlier point was that you need to take into account that these consoles are embedded, closed architectures and will be optimised for their respective architectures, whether Xenon or PS3. This makes the point less valid IMO, when comparing general purpose PC CPUs to ANY console CPU.

Tim said:
Maybe analogies from the open PC world are bad when speaking about consoles, but they beat car analogies any day of the week. (I would rather shoot myself than write about F1 cars with bicycle wheels*)


* not really

Not necessarily, it's a matter of material and presentation. The best analogies sometimes have nothing to do with the subject at hand. ;)


-----------------------------

@ aaaaa00

- 'infinity' for bringing GAF into this place!

- 'a' for above!

- '0' for above!


aaaa0

Ahh f*eck it!

- 'a' again!

aaa0

:p
 
Tim said:
Looking at Cell we have one general purpose processor and eight special streaming processors - offering an extremely high peak flops score, but also extremely high performance penalties when running workloads that are not suitable for this kind of processor (easily 10x or more).

Where does this figure come from? Claimed 10x speedup with certain workloads = 10x slowdown with others less suitable? 10x slowdown versus peak cell performance, or versus a general CPU plucked from the air?

Tim said:
What the best solution is, is hard to say - it depends purely on the workload.

Agreed, but I think if you're going to optimise for games, your first port of call is still more floating point power (the "heaviest" workloads in games are still flops-orientated, or so I'm told). From that perspective I think STI have probably taken the right route..(although, of course, there are other factors to include beyond pure computational prowess as mentioned by others already).
 
Flops are needed for physics, 3D structures, and audio. AI traditionally doesn't use floats, but that'll be reworked. My own AI models are float-friendly. The big question regarding Cell's performance limitations is how much can be designed as efficient data streams? Traditional approaches have a lot of 'out of order' processing. Some see this as a limiting factor. I think Cell programs will develop new Cell-friendly ways to model their worlds.
 
Shifty Geezer said:
Flops are needed for physics, 3D structures, and audio. AI traditionally doesn't use floats, but that'll be reworked. My own AI models are float-friendly. The big question regarding Cell's performance limitations is how much can be designed as efficient data streams? Traditional approaches have a lot of 'out of order' processing. Some see this as a limiting factor. I think Cell programs will develop new Cell-friendly ways to model their worlds.

Agreed, programming to it/around it should yield benefits for "other" workloads..

How is integer performance on the SPEs, out of curiosity? Do we know?
 
Gubbi said:
Tim: 1 - Jaws: 0

:D

Cheers
Gubbi

Gubbi, that hurts! :devilish:

Anyway, this thread is just another XeCPU vs CELL in disguise again for the 1245322th time...and the usual crew are taking sides... ;)

Prepare your damage control engines as E3 approaches! :p
 
Titanio said:
How is integer performance on the SPEs, out of curiosity? Do we know?
Weren't the SPEs supposed to have equal performance with ops and flops?
 
Vysez said:
Titanio said:
How is integer performance on the SPEs, out of curiosity? Do we know?
Weren't the SPEs supposed to have equal performance with ops and flops?

Well, that's what I wondered...

Purely from a computational perspective, do integer workloads "suffer" relative to floating point performance on SPEs?

I'm tending to look purely at the number crunching here (I'm ignoring issues to do with splitting data between SPEs efficiently etc.), so forgive me..
 
Titanio said:
Tim said:
Looking at Cell we have one general purpose processor and eight special streaming processors - offering an extremely high peak flops score, but also extremely high performance penalties when running workloads that are not suitable for this kind of processor (easily 10x or more).

Where does this figure come from? Claimed 10x speedup with certain workloads = 10x slowdown with others less suitable? 10x slowdown versus peak cell performance, or versus a general CPU plucked from the air?

Workloads that could benefit from a demand loaded cache could be > 10x faster on a general purpose CPU (PPE) compared to an SPE.

Cheers
Gubbi
 
Shifty Geezer said:
In these FLOPS you get general/programmable flops, and targeted flops. ...Targeted flops serve only one purpose. You get these in a graphics card. E.g. a vertex pipeline might be doing so many flops transforming vertices, but if you want to calculate weather patterns it's no good to you - it can only be used to transform vertices.

What we're seeing more and more is companies digging out as many FLOPS as they can find. Every time any action is performed on a floating point number, it's a flop. This is where Allard's 1 teraflop number comes from.

Great explanation SG. I always wondered about Allard's claim of "more than 1 teraflop of targeted computing performance". "Targeted" was always the term that I found a little funny. His language was VERY specific I thought in that case. So SG, is it your opinion that he was focused specifically on GPU performance during that statement?
 
Titanio said:
Vysez said:
Titanio said:
How is integer performance on the SPEs, out of curiosity? Do we know?
Weren't the SPEs supposed to have equal performance with ops and flops?

Well, that's what I wondered...

Purely from a computational perspective, do integer workloads "suffer" relative to floating point performance on SPEs?

I'm tending to look purely at the number crunching here (I'm ignoring issues to do with splitting data between SPEs efficiently etc.), so forgive me..

[Image: kaigaip039.jpg]


Above, the SPE is a 16-way integer unit too.

It's easily forgotten, when the subject of 'Flops' is brought up, that the SPE is a unified vector/scalar/integer/float unit...and a CELL has 8 such units in addition to the VMX/FPU/INT units in the PPE.

CELL is a very capable flops/integer/vector/scalar processor. To make it sing, you have to take advantage of its strengths...
 
blakjedi said:
So SG, is it your opinion that he was focused specifically on GPU performance during that statement?

I'm not SG, but I think that the 1 teraflop thing Allard said was nothing more than a counter to the teraflop talk which was associated with PS3 even if nobody ever said it directly.
 
Titanio said:
Vysez said:
Titanio said:
How is integer performance on the SPEs, out of curiousity? Do we know?
Weren't the SPEs supposed to have equal performance with ops and flops?

Well, that's what I wondered...

Purely from a computational perspective, do integer workloads "suffer" relative to floating point performance on SPEs?

I'm tending to look purely at the number crunching here (I'm ignoring issues to do with splitting data between SPEs efficiently etc.), so forgive me..

Well if by integer workload you mean a workload that just crunches integers instead of floating point numbers (like say signal processing audio data), performance would be similar.

But traditionally integer workloads signify a class of programs that have a significant amount of dereferencing/pointer chasing of some sort. Anytime you have a tree structure of sorts you'll have a significant amount of "integer" work.

Even workloads that have a significant amount of say FP work, can be categorized as an integer workload if the program is bottlenecked by the integer/pointer component.

A good example of this is 252.eon in the SPEC2000 benchmark suite. Eon is a probabilistic raytracer and one would naively categorize it as an FP workload - yet it is in the integer suite. This is because raytracing is an exercise in pointer chasing more than anything else.

Cheers
Gubbi
 
blakjedi said:
So SG, is it your opinion that he was focused specifically on GPU performance during that statement?
The 1 TFLOP figure came from everything: CPU, GPU, sound processing, IO. Everything doing maths contributes to this teraflop figure. Even then it's unlikely to be valid.

As for focus, Allard was focussing on the teraflop nonsense for PS3, derived from web-talk. If not for these web-rumours I doubt he'd have mentioned any flop rating whatsoever for Xenon.

It's in reality a marketing strategy (or counterstrike) and in no way a performance measure.
 
Tim said:
An example from the PC world: a P4, Celeron, A64 and Athlon XP all have the same peak flops per clock cycle, while a G5 actually has twice the peak FP power per clock of these chips. What chip would you buy as a gamer? In real life a single 2.2GHz A64 would outperform a dual 2.5GHz G5 in games in spite of the fact that the dual G5 is 4.5 times faster in peak flops.

I think the problem is not in the chip architectures but rather in the OS. The Athlon XP runs Windows games while the G5 runs these games ported to OS X. The performance difference is strongly affected by the differing operating systems.

Doom 3 can only run at ~50FPS on the fastest dual core G5 with a 9800. Does that mean the G5 sucks at games? Or the 9800 is slow? No. It just shows that there are issues with optimization or drivers for gaming on the Mac platform. And this is well acknowledged, even by the Mac community.

You seem to say that the G5 is poorly targeted for games. I don't think this is the case. Rather, the G5 is always running an OS which is poorly targeted for games.
 
My question is: are all single-precision FLOPs created equal, and the same for double-precision?


Let's say I have a low-end SGI workstation with two MIPS CPUs that together provide 20 GFLOPs and then I have a high-end Mac with dual PowerPCs that together provide 20 GFLOPs - are these machines about the same in CPU power?
 
It is easy to see how Allard might have come up with "more than a teraflop"


The Xbox1 has close to 100 GFLOPs (1/10th of a TFLOP) of total theoretical floating point performance if you add up everything that Nvidia (NV2A), Intel (733 MHz CPU) and Nvidia (MCPX) quote for their processors.


Xbox360 would only need like 10 to 12 times more total theoretical fp performance than Xbox to reach more than a "teraflop"


Xbox360 CPU is more than 10x more fp than Xbox CPU

Xbox360 GPU is probably around 10x more (or more) fp than Xbox GPU.


so anyway, Xbox360 is probably going to have (peak) between 1/10th and 1/20th of a teraflop of general purpose/programmable floating point computing performance (CPU), and, either at, or approaching, a teraflop of targeted computing performance (GPU), also peak.
 
Megadrive1988 said:
My question is: are all single-precision FLOPs created equal, and the same for double-precision?


Let's say I have a low-end SGI workstation with two MIPS CPUs that together provide 20 GFLOPs and then I have a high-end Mac with dual PowerPCs that together provide 20 GFLOPs - are these machines about the same in CPU power?

They have the same peak floating point power.

But such a comparison is so flawed that the 'yes' doesn't really mean anything. ;)

Floating point operations per second is hardly the best way to measure a processor's performance. No single metric is. The only valid metric is the real world performance of the application for which the processor is designed. And that is a function of integer, FP, memory, bandwidth, programming model, OS, etc.
 