Hey everyone,
As the world of graphics moves forward (yeah, that phrase sounds dumb, stop complaining...), the old ways of describing an architecture have become inadequate. Many people, including nVidia's PR, may have said this before, but I haven't seen many proposals for a better way to describe architectures. That's why I'm writing this now.
Now, why would we need this? Well, there are many reasons, among them:
1. Easier description of a complex architecture: you don't have to write several sentences to describe something that could easily fit in a dozen characters.
2. If it becomes a standard, the PR machines of the GPU companies would have to use it, and would thus be required to describe their architectures better, avoiding disasters like "NV30: 8x1 or 4x2?"
The current system is:
Number of pipelines X Number of TMUs per pipeline
This results in things like 2x4, 4x1, 8x2, ...
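To see why that isn't enough, here's a tiny Python sketch. The function name is made up for illustration, and it assumes each TMU does one texture op per clock. The old notation really boils down to a single multiplication, so very different layouts end up with the same number:

```python
def texture_ops_per_clock(notation: str) -> int:
    """Peak texture ops per clock implied by the old "pipelines x TMUs" notation.

    Assumes every TMU delivers one texture op per clock.
    """
    pipelines, tmus_per_pipeline = (int(part) for part in notation.lower().split("x"))
    return pipelines * tmus_per_pipeline


# Very different layouts collapse to the same figure, and none of them
# says anything about shader (arithmetic) throughput.
for layout in ("2x4", "4x2", "8x1", "8x2"):
    print(layout, "->", texture_ops_per_clock(layout), "texture ops/clock")
```

Nothing in "8x1" or "4x2" tells you anything about shader arithmetic, which is exactly the information we're missing.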
So, how could we do this better? Well, first, let's think about what we need to know about an architecture.
The easiest way to describe an architecture is by saying how fast it can do specific things. Basically, that's ops. We also need to consider the type and precision of those ops.
While you may want to include vertex shader performance in this, it would make it all significantly messier. I would suggest leaving VS performance to vertices/s, plus perhaps benchmarks of a few specific VS programs.
What I propose is using T for textures, FP for Floating Point, FX for FiXed point and S for Scalar. You'd write the number of operations per clock, then the suffix (followed by the precision in bits, where relevant), and then use "+" when a unit can work in parallel with another, or "/" when the chip can do only one or the other (for example, the NV30 can do either texture or FP ops, but not both in parallel).
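To show that a single term is easy to read, whether by a human or by a program, here's a minimal Python sketch that splits a term into its parts. The syntax it accepts (count, then T/FP/FX, then an optional S, then the precision in bits) is my reading of the examples below, and `Term`/`parse_term` are hypothetical names, not part of any existing tool:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed term syntax: <ops per clock><T | FP | FX>[S][<precision in bits>]
TERM = re.compile(r"(?P<count>\d+)(?P<kind>T|FP|FX)(?P<scalar>S?)(?P<bits>\d*)")


@dataclass(frozen=True)
class Term:
    count: int           # operations per clock
    kind: str            # "T" (texture), "FP" (floating point) or "FX" (fixed point)
    scalar: bool         # True when the "S" (scalar) suffix is present
    bits: Optional[int]  # precision in bits; None when not given (e.g. "8T")


def parse_term(text: str) -> Term:
    """Decode one term such as "8FX9" or "8FPS24"; raise ValueError otherwise."""
    m = TERM.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a valid term: {text!r}")
    return Term(
        count=int(m["count"]),
        kind=m["kind"],
        scalar=m["scalar"] == "S",
        bits=int(m["bits"]) if m["bits"] else None,
    )


print(parse_term("8FPS24"))  # Term(count=8, kind='FP', scalar=True, bits=24)
print(parse_term("8FX9"))    # Term(count=8, kind='FX', scalar=False, bits=9)
```

Texture terms such as "8T" simply come back with `bits=None`.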
Let's begin with an easy example.
NV25: 8T + 8FX9 (AFAIK, the NV25 has 9-bit precision in the PS and can do two integer operations per clock)
As you can see, that's 8 texture operations and 8 fixed-point (9-bit) operations per clock. Let's move on to the R300...
R300: 8T + 8FP24 + 8FPS24
This is already slightly messier, but I believe it still tells us a lot more about the architecture than, say, "8x1".
Now let's move on to the big bad monster...
NV30: 8T / 4FP32 + 8FX12
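To make the difference between "+" and "/" concrete, here's a rough Python sketch of a throughput model. The model itself is my own assumption, not anything documented for the hardware: units joined by "+" work in parallel, so the slowest side dominates, while units joined by "/" share the same slot, so their times add up. The expressions are written by hand as nested tuples, the workload numbers are made up, and things like the register usage penalty or instruction dependencies are ignored entirely:

```python
from fractions import Fraction

# An expression is either a leaf (unit_name, ops_per_clock)
# or an operator node ("+", left, right) / ("/", left, right).


def clocks_needed(expr, workload):
    """Estimate clocks to issue `workload` (unit_name -> op count) on `expr`.

    Assumed model: "+" sides run in parallel, so take the max;
    "/" sides share one slot, so their times add up.
    """
    if expr[0] in ("+", "/"):
        op, left, right = expr
        a = clocks_needed(left, workload)
        b = clocks_needed(right, workload)
        return max(a, b) if op == "+" else a + b
    unit, rate = expr
    return Fraction(workload.get(unit, 0), rate)


# NV30: 8T / 4FP32 + 8FX12
NV30 = ("+", ("/", ("T", 8), ("FP32", 4)), ("FX12", 8))
# Hypothetical chip with the same units, but everything in parallel.
NV30_IF_PARALLEL = ("+", ("+", ("T", 8), ("FP32", 4)), ("FX12", 8))

work = {"T": 8, "FP32": 4, "FX12": 8}  # made-up workload
print(clocks_needed(NV30, work))              # 2 (texture and FP32 take turns)
print(clocks_needed(NV30_IF_PARALLEL, work))  # 1
```

Same unit counts, twice the clocks: that's the kind of information "8x1" or "4x2" simply can't carry.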
What this notation does not show us, in the NV30's case, is the performance hit from register usage. We could express it like this:
NV30: ( 8T / 4FP32 + 8FX12 ) / R
But this might become way, way too messy... You could even try to describe the performance hit mathematically, but then, frankly, I couldn't even read it myself...
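Messy as it gets, though, the grouped form is still regular enough to parse mechanically. Here's a minimal recursive-descent parser in Python as a sketch. `tokenize` and `parse` are hypothetical names, the terms and the R factor are kept as raw strings, and making "/" bind tighter than "+" is my assumption, chosen so that "8T / 4FP32 + 8FX12" reads as "texture or FP, with FX alongside":

```python
import re

# One token: "(", ")", "+", "/" or an alphanumeric run such as "8FX12" or "R".
TOKEN = re.compile(r"\s*(?:(\(|\)|\+|/)|([0-9A-Za-z]+))")


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def parse(text):
    """Parse a description into nested tuples: ("+" | "/", left, right)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def atom():
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        tok = take()
        if tok in ("+", "/", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok  # a unit term such as "8FX12", or a factor such as "R"

    def alternatives():  # "/": the chip does one OR the other
        node = atom()
        while peek() == "/":
            take("/")
            node = ("/", node, atom())
        return node

    def expr():  # "+": the chip does both in parallel
        node = alternatives()
        while peek() == "+":
            take("+")
            node = ("+", node, alternatives())
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError(f"trailing input: {tokens[pos:]}")
    return tree


print(parse("( 8T / 4FP32 + 8FX12 ) / R"))
# ('/', ('+', ('/', '8T', '4FP32'), '8FX12'), 'R')
```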
I do realize that PR is unlikely to use such complex notation for end consumers. But if this system, or a similar one, became popular, graphics companies might be more willing to describe the underlying architecture this way when asked, in interviews for example.
I don't think end consumers will ever use something so complex, but who knows...
Uttar