The forums and linked papers on GPGPU.org sometimes go into deep detail about performance differences. Usually the comparisons are done at the architecture level, not the app level, i.e. latency hiding, bandwidth, peak flop rates, etc. There isn't a "physics" benchmark per se, but there are fluid codes, bioinformatics codes, etc. available through GPGPU.org and the various paper links that you can run against the CPU.
At the moment, very few algorithms on current GPUs are more than 10x faster than a CPU, and nobody should be claiming 100x over a CPU unless the comparison is against untuned CPU code, or the algorithm is actually very different between the two processors. Stay tuned for the next round of GPGPU high performance apps over the next year at conferences like Graphics Hardware, Supercomputing, ASPLOS, PACT, Micro, etc. I'm sure you'll also hear more from Nvidia/ATI and companies like Havok and Microsoft over the coming year.
Now, if you are going to render your simulation, doing the sim+render all on the GPU might be MUCH faster than sim on the CPU + render on the GPU since you've taken the costly feedback loop between the GPU and CPU out of the equation.
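To make that concrete, here's a minimal sketch (not from the original discussion; CUDA, the kernel, and the buffer names are just illustrative stand-ins) contrasting a sim-on-CPU loop that has to push its state across the bus every frame with a sim-on-GPU loop where the state never leaves device memory:

```cuda
// Hypothetical sketch: compares "sim on CPU + render on GPU" (per-frame
// host-to-device copy) with "sim + render on GPU" (state stays resident).
#include <cuda_runtime.h>
#include <cstdlib>

__global__ void step_sim_gpu(float *state, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        state[i] += dt * 0.5f;          // stand-in for a real integration step
}

static void step_sim_cpu(float *state, int n, float dt)
{
    for (int i = 0; i < n; ++i)
        state[i] += dt * 0.5f;          // same toy step, done on the host
}

int main()
{
    const int n = 1 << 20;
    const float dt = 0.016f;
    const bool sim_on_gpu = true;       // flip to compare the two paths

    float *h_state = (float *)calloc(n, sizeof(float));
    float *d_state = nullptr;
    cudaMalloc(&d_state, n * sizeof(float));
    cudaMemset(d_state, 0, n * sizeof(float));

    for (int frame = 0; frame < 100; ++frame) {
        if (sim_on_gpu) {
            // Sim + render on the GPU: the state stays in device memory and
            // a renderer would read it directly (e.g. via graphics interop),
            // so there is no per-frame transfer across the bus.
            step_sim_gpu<<<(n + 255) / 256, 256>>>(d_state, n, dt);
        } else {
            // Sim on the CPU, render on the GPU: every frame pays a full
            // host-to-device copy before anything can be drawn -- this is
            // the feedback loop the post is talking about.
            step_sim_cpu(h_state, n, dt);
            cudaMemcpy(d_state, h_state, n * sizeof(float),
                       cudaMemcpyHostToDevice);
        }
    }

    cudaDeviceSynchronize();
    cudaFree(d_state);
    free(h_state);
    return 0;
}
```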
And remember, not everything will be fast on the GPU, only applications that fit (or can be shoehorned into fitting) the characteristics and limitations of these architectures.