Let's describe architectures better

Arun

Hey everyone,

As the world of graphics moves forward ( yeah, that phrase seems dumb, stop complaining... ;) ), the old ways to describe an architecture have become inadequate. While many might have said so in the past, including nVidia's PR, I haven't seen many proposals for a way to describe them better. And that's why I'm writing this now.

Now, why would we need this? Well, there are many reasons, among which:
1. Easier description of a complex architecture: you don't have to write several sentences to describe something that could fit easily in a dozen characters.
2. If it becomes a standard, the PR machines of the GPU companies would have to use it, and thus would be required to describe their architectures better, avoiding disasters like "NV30: 8x1 or 4x2?"

The current system is:
Number of pipelines X Number of TMUs per pipeline
This results in things like 2x4, 4x1, 8x2, ...
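
To make the limits of that system concrete, here is a minimal sketch of everything a "pipelines x TMUs" string can tell you. The function name `describe_legacy` is made up for illustration, and the only assumption is that textures per clock is simply pipelines times TMUs per pipeline.

```python
def describe_legacy(config: str) -> dict:
    """Split a legacy 'PxT' string such as '4x2' into its two numbers."""
    pipes_text, tmus_text = config.lower().split("x")
    pipes, tmus = int(pipes_text), int(tmus_text)
    return {
        # One colour pixel per pipeline per clock (assumption of the old scheme).
        "pixels_per_clock": pipes,
        # Every pipeline can sample with each of its TMUs in one clock.
        "textures_per_clock": pipes * tmus,
    }


for config in ("8x1", "4x2"):
    print(config, describe_legacy(config))
```

Both strings come out at 8 textures per clock, and neither says anything about shader ops or precision, which is the gap the rest of this post tries to fill.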

So, how could we do this better? Well, first, let's think about what we need to know about an architecture.
The easiest way to describe an architecture is by saying how fast it can do specific things. Basically, that's ops. It is also required to consider the type and precision of those ops.
While you may want to include vertex shader performance in this, it would make it all significantly messier. I would suggest leaving VS performance to vertices/s, and maybe a few benchmarks of specific VS programs.

What I propose is using T for textures, FP for Floating Point, FX for FiXed point and S for Scalar. You'd put the number of operations/clock, the postfix, and the precision in bits where it matters, and then use "+" when it can do it in parallel to something or "/" when it can do either ( for example, the NV30 can do either Textures or FP ops: it cannot do both in parallel )

Let's begin with an easy example.
NV25: 8T + 8FX9 ( AFAIK, the NV25 has 9-bit precision in the PS, and can do two integer operations per clock )

As you can see, that's 8 texture operations and 8 Fixed Point ( 9-bit ) operations per clock. Let's move on to the R300...

R300: 8T + 8FP24 + 8FPS24

This is already slightly more messy, but I believe that it still tells us a lot more about the architecture than, say, "8x1"
Now let's move on to the big bad monster...

NV30: 8T / 4FP32 + 8FX12
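
The three strings so far are regular enough to be read by a program. The following is only a rough sketch, not a finished grammar: the names `Unit`, `parse_term` and `parse_arch` are invented here, and it assumes that "/" binds tighter than "+", which matches the NV30 description above ( textures or FP, with FX alongside ). It does not handle parentheses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# count, kind (T / FP / FX), optional S for scalar, optional precision in bits
TERM = re.compile(r"^(\d+)(T|FP|FX)(S?)(\d*)$")


@dataclass(frozen=True)
class Unit:
    count: int            # operations per clock
    kind: str             # "T", "FP" or "FX"
    scalar: bool          # True when the S postfix is present
    bits: Optional[int]   # precision, None when not given (e.g. textures)


def parse_term(text: str) -> Unit:
    match = TERM.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse term: {text!r}")
    count, kind, scalar, bits = match.groups()
    return Unit(int(count), kind, scalar == "S", int(bits) if bits else None)


def parse_arch(text: str) -> List[List[Unit]]:
    """Outer list: groups that run in parallel ('+').
    Inner list: alternatives sharing one slot ('/')."""
    return [
        [parse_term(alternative) for alternative in group.split("/")]
        for group in text.split("+")
    ]


for name, spec in [("NV25", "8T + 8FX9"),
                   ("R300", "8T + 8FP24 + 8FPS24"),
                   ("NV30", "8T / 4FP32 + 8FX12")]:
    print(name, parse_arch(spec))
```

With this reading, the NV30 comes out as two parallel groups, the first of which holds two alternatives, while the R300 comes out as three parallel groups of one unit each.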

What this does not show us, in the NV30's case, is the register usage performance hit. We could display this in the following way:

NV30: ( 8T / 4FP32 + 8FX12 ) / R

But this might become way, way too messy... And you might even try to describe the performance hit mathematically, but then I frankly couldn't even read it myself...

What I do realize is that PR is unlikely to use such complex notation for end-consumers. But if this system, or a similar one, became popular, then graphics companies might be more likely to express the underlying architecture in such a way if you asked them for it in interviews, for example.

I don't think the end-consumer will ever use something so complex, but who knows...


Uttar
 
Hi there Uttar
I think in principle your idea is great, but like you said, for the end consumer it is a little daunting. Hell, even for me it is pretty damn daunting, as I don't understand all of it at a quick glance, but looking at it again it makes complete sense. Making complete sense but taking two or three glances to get there is not really good IMHO.

What can I say as I have no idea how it could be adequately simplified whilst still retaining the crucial info, just that if I have problems understanding it then I am sure the average consumer is gonna go all crazy and dizzy trying to decipher what all them numbers and letters mean.

Perhaps remove the S for scalar and FX for integer, as we are moving forward and we have had these features on older gfx cards for a long time already.... but FP is new, a textures number could help, and perhaps a bandwidth figure thrown in for good measure.. e.g. 20GB/Sec Max Theoretical Bandwidth (that is, without the bandwidth-enhancing features, just real raw bandwidth).

Also back to the FX abbreviation it sounds too close to the name of the GFFX and could possibly confuse customers... like yea the FX can only do FX cos it is an FX card man.. ;)

Just thinking about the end consumer rather than this board in general as most people in this board would actually love it I bet!

Sorry I'm babbling.
 
But this might become way, way too messy... And you might even try to describe the performance hit mathematically, but then I frankly couldn't even read it myself...

I think this is due to the complicated design of some GPUs. We didn't have these issues before. If things fix themselves in the NV35, this will only be a blip on the radar and we can go back to using the old system.

If it's not, then a new lingo can be made up.
 
Tahir: I'm gonna tell you a small secret. The REAL reason the GeForce FX got an "FX" is because of its amazing FiXed point performance and crappy FP performance :p j/k
Yeah, it's confusing, what about using P & X instead of FP & FX, maybe?

As I said, I doubt it's gonna become a huge standard for the end-consumer. More like a standard for technical sites.
Including bandwidth, VS, ... might be useful, but it might be better to give those as separate figures.

Also, I was thinking of doing a ( ) * 4 to say, for example, that it natively supports 4x MultiSampling. Does seem to make MultiSampling too godly, though.

Saem: The NV35 won't be much easier, although it might have decoupled FP & Tex units, making it slightly easier. But I must admit I'm unsure as to whether that's truly the case.
The NV40 ( probably ) won't sport any FX units anymore, so it might get easier then :) Although I'm ready to bet big time other things will complicate matters.


Uttar
 
I'm not terribly certain that it's important (or even useful) to attempt to create new terminology here. That is, architectures are sure to change very quickly, and the only truly important numbers are the benchmark results in real games.
 
Chalnoth said:
I'm not terribly certain that it's important (or even useful) to attempt to create new terminology here. That is, architectures are sure to change very quickly, and the only truly important numbers are the benchmark results in real games.
Man....that was a good answer.
 
Well, at the risk of censure...

Well, there is my infamous "proxel pipeline" shoehorning of this into the existing terminology structure.

An explanation of the much detested proxel concept.

A discussion in a different context, but providing some additional clarification. I think this also refers to a discussion about alternative names for "proxel" for "people who don't want to be reminded of medications they might have to take all too soon". :p

My reasoning for why I proposed it is presented in those links pretty completely, I think.
 
Chalnoth said:
I'm not terribly certain that it's important (or even useful) to attempt to create new terminology here. That is, architectures are sure to change very quickly, and the only truly important numbers are the benchmark results in real games.

I think this is the first time I have read something sensible from Chalnoth.

:D
 
Benchmarks in real games are, on the other hand, only half the story. They don't tell you what to expect in future games. Thus the technical details are still needed.
 
Humus: That's my opinion too. Heck, the NV30 has formidable FX performance, so it obviously wins in current games ( besides AA, of course ).
But in future ones, well, it won't rule as much, or it'll lose, depending on the game.

Demalion: From my understanding ( must admit I didn't read it all ) , a "proxel pipeline" does not give ANY information about FX/FP performance, whether FP & TEX are decoupled, scalar performance, precision, ... - while it might be nicer than current systems in some ways, I still find it lacking in many others.

That said, I realize there really are still "pipelines", mostly because of things like registers ( having an infinity of registers isn't too practical :p ) , so it might still be useful to give that type of info in a definition, because an architecture which requires a lot more parallelism is often less efficient than one which requires little.

So, you could have:
NV30: 4 x ( 2T / 1FP32 + 2FX12 )
R300: 8 x ( 1T + 1FP24 + 1FPS24 )
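
As a quick sanity check that the per-pipeline form carries the same totals as the flat strings from the first post, here is a small sketch that multiplies the pipeline count back in. The helper name `expand` is hypothetical, and it only handles the single "N x ( ... )" shape used above.

```python
import re


def expand(grouped: str) -> str:
    """Turn 'N x ( per-pipeline terms )' into the flat per-chip string."""
    match = re.fullmatch(r"\s*(\d+)\s*x\s*\(\s*(.*?)\s*\)\s*", grouped)
    if match is None:
        raise ValueError(f"expected 'N x ( ... )', got {grouped!r}")
    pipes = int(match.group(1))
    body = match.group(2)
    # Scale only the leading op counts (digits followed by a unit letter);
    # precision suffixes such as the 32 in FP32 are left untouched.
    return re.sub(
        r"\b(\d+)(?=[A-Z])",
        lambda term: str(pipes * int(term.group(1))),
        body,
    )


print(expand("4 x ( 2T / 1FP32 + 2FX12 )"))
print(expand("8 x ( 1T + 1FP24 + 1FPS24 )"))
```

The point of keeping the grouped form is that the multiplier is information in its own right: two chips with the same flat totals can still differ in how many pipelines those totals are spread over.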

Of course, it still isn't complete because different types of instructions take different amounts of time on each architecture. But hey, stop complaining! :D


Uttar
 
Humus,

Isn't that the point where techdemos or synthetic applications can come into play? Most of them still measure performance.

As a layman I have a much easier time understanding how an architecture behaves in a hypothetical case A or B with a techdemo/synthetic application than with just paper specs and/or definitions.

***edit: to avoid misunderstandings: I consider all supplied information as useful from my rather simplistic POV.
 
Why is it that I think you'd never have written this, Uttar, in the absence of nVidia selling an 8x1 marchitecture which is actually a 4x2 architecture?....;) I really don't see a whole lot of value in describing an "8x0" organization, frankly--at least not in the context in which nVidia has framed it.

Personally I subscribe to the KISS point of view because complexity for the sake of complexity is rarely invoked to do much of anything except obscure the fundamentals. A non-reduced equation always has something to hide, IMO... I also don't think the idea of counting how many color pixels a graphics chip can produce per clock is in any danger of obsolescence for the foreseeable future. Personally, I'll be delighted when nVidia is able to produce a competitive 8x1 architecture so we can all quit trying to figure out ingenious ways to make a 4x2 organization sound much better than it is...;) However, it's nVidia's marketing that started all of this and it'll be a long while before I forget it.
 
WaltC said:
Why is it that I think you'd never have written this, Uttar, in the absence of nVidia selling an 8x1 marchitecture which is actually a 4x2 architecture?....;)

Unlike what many might think, I can be quite objective :) And if I'm insisting so much on NV35 availability dates, it's obviously not because I'm looking into a crystal ball, but because I know them :p ( or anyway, the expected ones, that is )

IMO, however, the number of colored pixels/clock is completely outdated.
Just look at the NV31! Officially, it's a 4x1. But just look at the shader performance, and you'll easily realize it isn't when you compare it to ATI's solutions. It's more like a 2x2 when using FP.


Uttar
 
I think it will be hard to find a simple metric to compare future GPUs. The expanding capabilities of DirectX/OpenGL 2 also expand the design choices that can be made when making a GPU.

It's already hard to compare nVidia's and ATI's DX 9 GPUs:

nVidia has:
8 fragment processors (pixel shaders)
4 framebuffer blend units.
16 Z-comparators with stencil operations.

ATI has a more rigid scheme:
8 fragment processors
8 framebuffer blend units
48/8 Z/stencil units. The extra Z-comparators are only good for MSAA testing.

Apparently organized in 8 pipes.

On top of this, each of the fragment processors has different performance with different data types.

Then you have the size and organization of the texture cache which accounts for a great deal, especially with high quality AF on.

Cheers
Gubbi
 