Confused about GFLOPS ratings

Entropy · Feb 20, 2005

Shifty Geezer said:
Is the CELL architecture really that customizable, though, that it can support different SPU designs and still work? If you replace the SPUs with some specialized pixel pipelines, Cell code running apulets will keel over, as the specialist units won't be able to run them.

I dunno. I guess the SPUs could easily accommodate a design change from SP to DP for more scientific purposes, but I can't see Cell working as a system that seamlessly glues different techs together.

On another note, are there any instances where existing supercomputers use SP? Will Cell find its way to the top of the Top500 list, or will it be consigned to mainstream uses only?

It is definitely the case that the SPUs could be modified to do DP work - it would just require more on-die real estate. Whether IBM can make a profitable business out of it is the only question. And besides, there are definitely instances where supercomputers could use just SP - hell, we've had single-bit supercomputers!

It's just another (fairly small) narrowing of the set of problems where you can apply the computer. But we typically don't model reality to the 15th significant digit - we're happy if we can get one or two. The precision used in representing numbers is there for numerical stability only. While this is a real issue for sure, DP is only a band-aid. The problem should really be addressed at the algorithmic level. This is not always possible though, be it for fundamental reasons or, more typically, pure lack of time.
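To illustrate the point about fixing stability at the algorithmic level, here is a minimal sketch comparing a naive single-precision running sum with Kahan (compensated) summation. Kahan summation is just one example technique, not something named in the post above. The helper `f32` is hypothetical and emulates single precision by rounding each intermediate result through a 4-byte float; the test data (one million copies of 0.1) is also an assumption chosen for illustration.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Accumulate in single precision with no error correction."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def kahan_sum_f32(values):
    """Accumulate in single precision, carrying the rounding error forward."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = f32(v - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)  # what was lost when forming t
        total = t
    return total


# Illustrative data: one million copies of 0.1, pre-rounded to single precision.
values = [f32(0.1)] * 1_000_000
exact = 100000.0

naive = naive_sum_f32(values)
kahan = kahan_sum_f32(values)

print("naive SP error:", abs(naive - exact))
print("Kahan SP error:", abs(kahan - exact))
```

Both sums use the same single-precision storage; only the algorithm changes. The naive version drifts far from the expected total, while the compensated version stays within rounding distance of it.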

Will Cell find its way to the top 500? I should think so. Will it be a hit in computationally intensive areas outside gaming? Damned if I know. Someone has to modify the chip a bit and build good systems based on it, and has to market it as a computational powerhouse and evangelize it to potential customers. IBM might do it, or they might not. The indications are that they will do something of the sort, but how strong an effort they will make is impossible to say. The potential is certainly there, but it's the practice that counts.

gamepower · Feb 20, 2005

Laa-Yosh said:
There's a lot of GFLOPS ratings thrown around nowadays, but it seems that marketing has got the advantage and it's not that easy to actually compare them...

For example, we've heard the 256GFLOPS value for the Cell CPU. But it turned out that this is for single precision FP operations, and once you change to double precision, it sinks down to ~25 GFLOPS.

Now, IBM is said to have plans to build CELL workstations and renderfarms for users like the CG industry. But most rendering applications are using double precision AFAIK, and thus the advertised performances (like the 16 TFLOPS rack) are suddenly not that good looking.

So my question is, how do the actual performances compare? How many FLOPS can a 3GHz P4 Xeon do, and is that single or double precision? Are supercomputers rated for single or double precision? Am I right that scientific applications and CG require double precision? Can SSE accelerate double precision?
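For reference, peak FLOPS figures like the ones quoted are normally simple products of hardware counts, which is why they say little about sustained performance. A rough sketch follows; the inputs are assumptions not stated in the thread (8 SPUs, 4-wide single-precision SIMD, a fused multiply-add counted as 2 FLOPs, a 4 GHz clock), and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(units, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak: every lane of every unit retires work every cycle."""
    return units * simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Assumed Cell configuration (not given in the thread):
# 8 SPUs x 4 SP lanes x 2 FLOPs per fused multiply-add x 4.0 GHz.
cell_sp = peak_gflops(units=8, simd_lanes=4, flops_per_lane_per_cycle=2, clock_ghz=4.0)

# How many such chips the advertised 16 TFLOPS rack would imply.
chips_for_rack = 16_000 / cell_sp

print(cell_sp, "GFLOPS single precision (theoretical peak)")
print(chips_for_rack, "chips for a 16 TFLOPS rack")
```

Under these assumptions the product reproduces the quoted 256 GFLOPS single-precision number. The double-precision figure depends on how often the hardware can issue DP operations, which this sketch does not model.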

The discussion about the FLOP rate is as redundant as the discussion about the 75 million polygons/s of the PS2, and the same thing goes for the hype about the 16-teraflop workstation. It is all a diversionary tactic from Sony, because they know they suck when it comes to RAM amount. If PS3 and XBOX2 have only 256MB of RAM and graphics bandwidth under 50 GB/s, they suck and are going down.

randycat99 · Feb 20, 2005

Got Meds?

aaronspink · Feb 21, 2005

Jaws said:
Even though an x86 FP unit may be 80-bit precision capable, the rendering software would be re-compiled for the appropriate architecture if ported, and the default would be SP unless DP is explicitly required.

Actually, the default in pretty much all processor/operating systems/programmers is DP unless you know that SP is sufficient. Most rendering software is largely DP. SP can require a lot of normalization work or result in the odd artifact.

Aaron Spink
speaking for myself inc.

aaronspink · Feb 21, 2005

passerby said:
A browse of the official site reveals that it is quite a specialised unit: an array processor with 96 processor elements, but each PE only has 6kb mem. A 128kb scratchpad sits on the chip, and I don't see any "general-purpose type CPU" for dealing with applications that do not map to array processing. Of course, that is why it is also positioned as a coprocessor to the "normal" pentium/athlon!

It's wonderful for certain applications, but for many other things we're better off relying on our "miserable" years-old pentiums and athlons.

And this is different from Cell in what way? Cell is also quite specialised.

Aaron Spink
speaking for myself inc.

hovz · Feb 21, 2005

anyone have GFLOPS figures for the current high end pentiums and a64s

j^aws · Feb 21, 2005

aaronspink said:
Jaws said:

Even though an x86 FP unit may be 80-bit precision capable, the rendering software would be re-compiled for the appropriate architecture if ported, and the default would be SP unless DP is explicitly required.

Actually, the default in pretty much all processor/operating systems/programmers is DP unless you know that SP is sufficient. Most rendering software is largely DP. SP can require a lot of normalization work or result in the odd artifact.

Aaron Spink
speaking for myself inc.

I actually meant it the other way around, as a best practice: for efficiency, and so as not to be lazy, ensure everything is SP unless DP is required. If in doubt, then DP should be used as a fallback. Of course, if the compiler didn't know what to do with it, then accuracy would take precedence over speed. If the end result then happens to have more DP code than SP, then so be it; at least it has been optimised.

I'd be interested to know what part of the rendering pipeline actually benefits from DP be it off-line or real-time?

aaronspink · Feb 21, 2005

Jaws said:
I'd be interested to know what part of the rendering pipeline actually benefits from DP be it off-line or real-time?

There are cases of artifacting in high end rendering due to the limited accuracy of SP. These could be fixed by re-normalization, but that is a big pain and would interact with the art flow. So most of the rendering software has moved to DP. The speed differential is generally minimal due to things like data set sizes, etc.
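A small sketch of how this kind of artifact can arise, assuming a scene with a vertex at coordinate 100000.0 and a detail offset of 0.001 (both values are made up for illustration). The hypothetical `f32` helper rounds through a 4-byte float to emulate SP, and "re-normalization" is interpreted here as re-centering coordinates around a local origin, which is only one possible reading of the term.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


world = 100000.0  # assumed large world-space coordinate
offset = 0.001    # assumed small geometric detail

# Single precision: representable values near 1e5 are 2**-7 (about 0.0078)
# apart, so the detail is smaller than half a step and gets rounded away.
sp_moved = f32(f32(world) + f32(offset))
sp_detail_lost = sp_moved == f32(world)

# Double precision keeps the detail at the same coordinate.
dp_moved = world + offset
dp_detail_kept = dp_moved != world

# Re-centering (one reading of "re-normalization"): work relative to a local
# origin so that single precision has enough resolution for the detail.
origin = world
local = f32(f32(world - origin) + f32(offset))
sp_recentered_kept = local != 0.0

print(sp_detail_lost, dp_detail_kept, sp_recentered_kept)
```

The re-centering step is what makes SP workable here, and it is also the part that has to be threaded through the whole content pipeline, which fits the complaint that it interacts with the art flow.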

Aaron Spink
speaking for myself inc.

passerby · Feb 21, 2005

aaronspink said:
And this is different from Cell in what way? Cell is also quite specialised.

I don't believe I mentioned the 'C-word' in the entire post.
