Luminescent
I was sitting on the toilet when it hit me! The puzzle of NV35's rather odd FillrateTester scores has been solved! While relaxing, I came to the conclusion that fragment (not pixel) output and throughput are two completely different performance metrics. Output is the tangible result obtained from an input, while throughput is

Dictionary.com said:
the amount passing through a system from input to output (especially of a computer program over a period of time)

Thus, the fragment throughput of a system can be used to measure the vpu/gpu's pipeline efficiency; that is, how readily a given output is reached.
If one conveyor belt has 2 lanes of 2 workers each (all workers and lanes held equal) assembling a two-piece component, while another has 4 lanes of 1 worker each assembling the same component, the 2-lane set of workers maintains a greater likelihood of upholding a certain ratio of output per lane, since 2 workers are assigned to every single (global) task.
Similarly, we find NV35 (the little cinematic engine that tried to do the right thing) only capable of writing 4 pixels per clock (color ops, with z-buffering). Although NV35 can work on 8 pixels simultaneously (in apps with shaders of more than 1 instruction or texture per pixel) because it contains more than 1 fp unit per pipeline (like the 2-lane, 2-worker belt), it can only output 4 pixels per clock, since it contains only 4 parallel pipelines (lanes). R300/350, on the contrary, is able to output 8 pixels per clock because it houses 8 discrete pipelines (although each pipeline is composed of only 1 fp unit). Thus, given a two-instruction pixel shader (assuming both instructions are micro/core ops, executable in 1 cycle), NV35 should be able to sustain its fillrate (assuming the ops are dependent/optimized for 2 serially arranged units), but R300/350's will drop by a factor of 2, since R3xx can only execute 1 fp op per pipeline per clock and there are two (per pixel) in the shader.
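To make that arithmetic concrete, here is a minimal toy model in C. The pipeline and fp-unit counts come from the paragraph above; everything else (perfect scheduling, every instruction retiring in exactly 1 cycle, at most 1 pixel written per pipeline per clock) is a simplifying assumption for illustration, not a description of how the real hardware schedules shaders.

#include <stdio.h>

/* Toy throughput model: each pipeline can write at most 1 pixel per clock
   and contains fp_units units, each assumed to execute 1 shader
   instruction per clock with perfect scheduling (an idealization). */
static double pixels_per_clock(int pipelines, int fp_units, int shader_instructions)
{
    /* Clocks a single pipeline needs per pixel, rounded up. */
    int clocks_per_pixel = (shader_instructions + fp_units - 1) / fp_units;
    if (clocks_per_pixel < 1)
        clocks_per_pixel = 1;
    return (double) pipelines / clocks_per_pixel;
}

int main(void)
{
    for (int instr = 1; instr <= 2; instr++) {
        printf("%d-instruction shader:\n", instr);
        printf("  NV35     (4 pipes x 2 fp units): %.1f pixels/clock\n",
               pixels_per_clock(4, 2, instr));
        printf("  R300/350 (8 pipes x 1 fp unit):  %.1f pixels/clock\n",
               pixels_per_clock(8, 1, instr));
    }
    return 0;
}

For a 1-instruction shader the model gives 4 and 8 pixels per clock respectively; with 2 instructions both fall to 4, which is exactly the factor-of-2 drop described above for R300/350.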
Now, because of the inherent nature of fillrate testers (such as MDolenc's), which are limited to reporting the number of pixels written per second, not ops executed, we must do a little computational work of our own to determine the average number of instructions a given processor is executing per cycle.
One way to do this with a fillrate tester is to divide the pure pixel fillrate (no shaders/texturing) by the fillrate obtained for the particular fragment program in question. The result tells us the factor by which the original fillrate was cut, i.e. the number of cycles the shader took to complete. Next, count the number of instructions in the fragment program and compare the two results: assuming each fragment op takes roughly 1 cycle (true for most pixel shader ops, excluding things like pow, lit, lrp, etc.), divide the number of pixel shader instructions by the fillrate-cut factor, and the average instruction execution rate for each test falls out.
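As a purely hypothetical worked example (the figures are made up for illustration, not measured): if the maximum fillrate is 1500 Mpixels/s and an 8-instruction fragment program scores 375 Mpixels/s, the fillrate was cut by a factor of 1500/375 = 4, so the shader took about 4 cycles; 8 instructions / 4 cycles gives an average execution rate of 2 instructions/cycle/pipeline.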
If I were to write a program which did this work for the end user/reviewer, the code would read the following way (I only know 1 semester's worth of basic C programming):
Luminescent said:
#include <stdio.h>

int main(void)
{
    double max_rate, app_rate, number_cycles, exec_rate;
    int number_instruc;

    /* Prompt user for the necessary fillrate tester inputs */
    printf("Enter maximum fillrate: ");
    scanf("%lf", &max_rate);
    printf("Enter application fillrate: ");
    scanf("%lf", &app_rate);
    printf("Enter number of instructions in application: ");
    scanf("%d", &number_instruc);

    /* Determine the number of cycles required for the desired application */
    number_cycles = max_rate / app_rate;

    /* Extrapolate the average number of instructions executed per cycle */
    exec_rate = number_instruc / number_cycles;

    /* Return results to the user */
    printf("Your vpu averages a shader execution rate of %f instructions/cycle/pipeline\n", exec_rate);
    return 0;
}
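For what it's worth, a sample session with the hypothetical numbers from the earlier example (again, made-up figures, not measurements) would look something like this:

Enter maximum fillrate: 1500
Enter application fillrate: 375
Enter number of instructions in application: 8
Your vpu averages a shader execution rate of 2.000000 instructions/cycle/pipeline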