AquaMark3 (tm) (r) (patent pending) ...

Kristof

New PDF with some details:

http://ds1.jowood.at/aquamark3/am3-technical-med.pdf

Erhmm... IIRC the old Intel GPT allowed you to do all these things (or similar) on any application.

And I am worried about this:

But in AquaMark3 two other AvgFPS values can be found in the output files: The AvgFPS CPU and the AvgFPS GFX. These are based on single frame measurements. Each frame takes a certain amount of time to complete. One part of each frame calculation is spent on simulating the game physics and artificial intelligence on the CPU. The other part is spent on rendering the graphics on screen. Both parts of each frame are measured independently, and based on these values the two theoretical AvgFPS values are computed, as if only that part of the frame had been processed. Note that the absolute value of these two measurements is of a theoretical nature, but considering these values in relation to values on other test systems allows a performance judgment of the two mainly stressed components.
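Reading that, I assume the two theoretical values just come out of the independently measured per-frame parts, roughly like this (the names are mine, not AquaMark3's):

Code:
#include <cstdio>
#include <vector>

// Per-frame timings in seconds: the CPU part (physics/AI) and the graphics
// part, measured independently as the PDF describes. Hypothetical names.
struct FrameTiming { double cpuSeconds; double gfxSeconds; };

// The theoretical averages: frames per second "as if only that part of the
// frame had been processed".
void PrintTheoreticalAvgFps(const std::vector<FrameTiming>& frames)
{
    double cpuTotal = 0.0, gfxTotal = 0.0;
    for (size_t i = 0; i < frames.size(); ++i) {
        cpuTotal += frames[i].cpuSeconds;
        gfxTotal += frames[i].gfxSeconds;
    }
    std::printf("AvgFPS CPU: %.1f\n", frames.size() / cpuTotal);
    std::printf("AvgFPS GFX: %.1f\n", frames.size() / gfxTotal);
}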

I hope they don't intend to:

1) Start Timer
2) Send scene to hardware
3) Lock frame
4) Stop Timer when lock succeeds

The above would completely stop any parallelism and would hurt performance and give completely unrealistic performance numbers.
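I.e. something like this under Direct3D 9 (just a sketch of the pattern, definitely not their actual code; the device and surfaces are assumed to already exist and the draw calls are omitted):

Code:
#include <windows.h>
#include <d3d9.h>

// Sketch of the feared per-frame measurement: timing around a surface lock
// forces the CPU to wait until the GPU has finished the frame, so CPU and GPU
// work can never overlap. (device/backBuffer/sysMemCopy assumed to exist;
// sysMemCopy is a same-sized D3DPOOL_SYSTEMMEM surface.)
double MeasureFrameSerialized(IDirect3DDevice9* device,
                              IDirect3DSurface9* backBuffer,
                              IDirect3DSurface9* sysMemCopy)
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);              // 1) start timer

    device->BeginScene();                         // 2) send scene to hardware
    /* ... DrawIndexedPrimitive calls ... */
    device->EndScene();

    device->GetRenderTargetData(backBuffer, sysMemCopy);   // 3) "lock frame":
    D3DLOCKED_RECT lr;                                      //    this read-back
    sysMemCopy->LockRect(&lr, NULL, D3DLOCK_READONLY);      //    stalls until the
    sysMemCopy->UnlockRect();                               //    GPU is finished

    QueryPerformanceCounter(&stop);               // 4) stop timer when the lock succeeds
    return double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
}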

All in "my own humble personal opinion of course..." (tm) (r) (patent pending) :rolleyes:

K-
 
Wouldn't that also skew the results, insofar as you wouldn't be able to determine how efficient the driver is at dividing up work between the GPU and CPU? :?:
 
Kristof said:
The above would completely stop any parallelism and would hurt performance and give completely unrealistic performance numbers.
Can't they do:

1) start CPU timer
2) do physics/AI/etc.
3) stop CPU timer

And, erm..., subtract the CPU time from the average time it takes to render the whole frame :)
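Something like this, I mean (rough sketch, helper names made up):

Code:
#include <windows.h>

// Sketch of the idea: time only the CPU-side work of each frame with
// QueryPerformanceCounter and sum it up; the render calls themselves run
// outside the timed region. (RunPhysicsAndAI / SubmitRenderCalls are
// placeholders for the game's own code.)
void RunPhysicsAndAI();
void SubmitRenderCalls();

double MeasureCpuSeconds(int frameCount)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    LONGLONG cpuTicks = 0;
    for (int i = 0; i < frameCount; ++i) {
        QueryPerformanceCounter(&t0);   // 1) start CPU timer
        RunPhysicsAndAI();              // 2) do physics/AI/etc.
        QueryPerformanceCounter(&t1);   // 3) stop CPU timer
        cpuTicks += t1.QuadPart - t0.QuadPart;

        SubmitRenderCalls();            // GPU works on this in parallel
    }
    return double(cpuTicks) / double(freq.QuadPart);
}

The subtraction would then be (total wall-clock time of the run) minus the returned CPU seconds, averaged over the frames.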
 
Kristof said:
The above would completely stop any parallelism and would hurt performance and give completely unrealistic performance numbers.
Of course it would only hurt performance much on deferred renderers as IMRs don't have much inter-frame-parallelism. ;)
If they do as you suggest, they only measure latency, not throughput. But that can be solved by rendering the same scene (-> low CPU utilization) 50 or 100 times before locking the frame buffer.
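Roughly like this (again only a sketch; the single read-back at the end is the only sync point, so its cost gets amortized over the repeats):

Code:
#include <windows.h>
#include <d3d9.h>

// Sketch: submit the same (CPU-cheap) scene many times and sync only once at
// the end, so the measured time approaches GPU throughput rather than the
// latency of a single frame. (device/backBuffer/sysMemCopy assumed to exist.)
double MeasureGpuSecondsPerScene(IDirect3DDevice9* device,
                                 IDirect3DSurface9* backBuffer,
                                 IDirect3DSurface9* sysMemCopy,
                                 int repeats)   // e.g. 50 or 100
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (int i = 0; i < repeats; ++i) {
        device->BeginScene();
        /* ... draw the same scene ... */
        device->EndScene();
        device->Present(NULL, NULL, NULL, NULL);
    }

    device->GetRenderTargetData(backBuffer, sysMemCopy);   // single sync point
    D3DLOCKED_RECT lr;
    sysMemCopy->LockRect(&lr, NULL, D3DLOCK_READONLY);
    sysMemCopy->UnlockRect();

    QueryPerformanceCounter(&stop);
    double total = double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    return total / repeats;   // approximate GPU time per scene
}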
 
NeARAZ said:
Kristof said:
The above would completely stop any parallelism and would hurt performance and give completely unrealistic performance numbers.
Can't they do:

1) start CPU timer
2) do physics/AI/etc.
3) stop CPU timer

And, erm..., subtract the CPU time from the average time it takes to render the whole frame :)
No, because the GPU renders in parallel. So (time per frame) - (time for physics/AI) != (time it takes the GPU to render the scene)
 
Xmas said:
No, because the GPU renders in parallel. So (time per frame) - (time for physics/AI) != (time it takes the GPU to render the scene)
That's why I said "average"... So the time includes the driver work, the 3D API runtime work, and the time you spend waiting if the GPU is lagging behind (e.g. the render queue is full, or you are already several frames ahead).
I guess a raw "GPU benchmark" can't be done in a straightforward way, as there's no means to run a timer on the GPU...
 
NeARAZ said:
That's why I said "average"... So the time includes the driver work, the 3D API runtime work, and the time you spend waiting if the GPU is lagging behind (e.g. the render queue is full, or you are already several frames ahead).
I guess a raw "GPU benchmark" can't be done in a straightforward way, as there's no means to run a timer on the GPU...
But taking the average doesn't change anything. Imagine your benchmark runs for 60 seconds, and your CPU timing tells you that 59s are spent on AI/physics. So there's one second left for the render calls and the OS/other applications. But that could mean the graphics card needs anything between 1s and 60s to actually render the frames.
 
Xmas said:
Of course it would only hurt performance much on deferred renderers as IMRs don't have much inter-frame-parallelism.
Actually, there is a lot of intraframe parallelism in most apps.

The main reason for this is that primitives tend to come along in groups in which they are quite alike, so each group tends to hit one particular bottleneck inside the hardware. Only when you have plenty of groups in flight does the hardware get the best opportunity to absorb bottlenecks by taking up the slack.
 
Dio said:
Actually, there is a lot of intraframe parallelism in most apps.

The main reason for this is that primitives tend to come along in groups in which they are quite alike, so each group tends to hit one particular bottleneck inside the hardware. Only when you have plenty of groups in flight does the hardware get the best opportunity to absorb bottlenecks by taking up the slack.
By interframe parallelism I meant transforming/binning the geometry of one frame while rendering another, like a deferred renderer can do.

Of course there is intraframe parallelism. You have a lot of low-level parallelism (multiple pipelines, several objects in a pipeline at once) and parallelism between multiple functional units separated by caches (VS, triangle setup, PS). But the hardware/driver is usually not going to reorder the given tasks if, for example, you first render some screen-sized polys (PS limited) and then some pixel-sized polys (VS/triangle setup limited).
 
Why not just do this measurement with a null driver for the graphics card for avgFPS CPU, and a trace of rendering calls for avgFPS GFX?
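For the AvgFPS CPU part that could be as simple as putting the renderer behind the game's own interface and swapping in a do-nothing implementation (hypothetical interface, of course; AquaMark3's code isn't public):

Code:
struct Scene;   // whatever per-frame data the game produces

// Hypothetical renderer interface: the game only talks to this, so a null
// implementation turns a benchmark run into a pure CPU (physics/AI/draw-call
// preparation) measurement.
struct IRenderer {
    virtual void DrawFrame(const Scene& scene) = 0;
    virtual ~IRenderer() {}
};

struct NullRenderer : public IRenderer {
    virtual void DrawFrame(const Scene&) {}   // swallow all rendering work
};

The AvgFPS GFX half would then presumably come from replaying a recorded trace of the rendering calls with the CPU work stripped out.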
 
Ailuros said:
How about they concentrate on creating a game worth playing for a change? :oops:
I agree. Aquanox was a beautiful, smooth-playing game. Too bad it had no depth...
 