ExtremeTech: A New Way to Talk about GPUs

Dave Baumann

http://www.extremetech.com/article2/0,1697,1841940,00.asp

IMO the problem with the entire line of thought presented here is that there really isn't much of an answer for comparing the hardware for an ordinary consumer - we'll still be looking at certain key metrics, such as fill rates and texture rates, but realistically speaking all the ordinary consumer can be concerned about is the price, performance and features. Even then the water gets muddy - with DX10 we have less room within the API to differentiate features, and reading performance from synthetic benchmarks is going to relate even less to reality when we have unified architectures. So, the choice is: go deep into the architecture and try to explain, or hover over the top of the architectural issues and present a bunch of numbers, IMO.
 
Unfortunately, the majority of websites still totally ignore certain IQ (image quality) points of a given video card. It's all nice to explain how the architecture works and that it's fast, but if they fail to tell you, or ignore, that the quality of the output is only mediocre in certain areas, then it was all for nothing IMO.

Recent case: GF7800GTX reviews
 
tEd said:
Unfortunately, the majority of websites still totally ignore certain IQ (image quality) points of a given video card. It's all nice to explain how the architecture works and that it's fast, but if they fail to tell you, or ignore, that the quality of the output is only mediocre in certain areas, then it was all for nothing IMO.

Recent case: GF7800GTX reviews

That's the second time I've seen you comment on the IQ on the GTX. Do you have specific examples and is it only with optimizations enabled?
 
Heh, interesting article, but as I often do, I disagree with most of its points :) - besides the obvious one, that "the current way is way too confusing for Joe Customer".
What really struck me is how the fans of 3D graphics are sticking hard and fast to a certain way of looking at GPUs. They discuss everything in terms of "pipelines," with some even going so far as to say that the GeForce 7800 GTX isn't a "true" 24-pipeline chip because it only has 16 raster operation units (ROPs), and can therefore only really draw 16 pixels per clock, max.

That's hardly a reason to say that it *is* a real 24-pipeline GPU. Yes, it'd be completely stupid to go with more than 16 ROPs within the next 3-6 months or so (and even afterwards, it's doubtful), but there would still be advantages to having more ROPs, even if minimal ones. It all depends on the kind of workload, really; Z passes eat them like there's no tomorrow, and such a pass is *temporarily* not too bandwidth limited because there's no color and no textures.
Once again, I fully agree with NVIDIA's decision to keep 16 ROPs for this generation, but indirectly implying that a 24-ROP part with more bandwidth and possibly more texel rate wouldn't benefit from those extra ROPs strikes me as a bit too much of a "these technical details don't truly matter" stance - but that might be me misunderstanding the article, and, if so, I of course apologize for it.
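To put a very rough number on the Z-pass point, here's a back-of-the-envelope sketch in Python. Every figure in it (clock, ROP count, double-speed Z, compression ratio, bandwidth) is an assumption for illustration, not a quoted spec for any real part; the point is only that a Z-only pass moves so few bytes per sample that the ROPs, rather than the memory bus, tend to be the wall.

```python
# Back-of-the-envelope check: is a Z-only pass ROP-limited or bandwidth-limited?
# All numbers are illustrative assumptions, not any vendor's specification.

core_clock_hz = 430e6    # assumed core clock
rops          = 16       # assumed ROP count
z_per_rop_clk = 2        # assume double-speed Z when color writes are disabled
bytes_per_z   = 4        # 32-bit depth value
z_compression = 4.0      # assumed average Z-compression ratio
bandwidth_bps = 38.4e9   # assumed memory bandwidth, bytes per second

rop_limit_zps = core_clock_hz * rops * z_per_rop_clk          # Z samples/s the ROPs can retire
bw_limit_zps  = bandwidth_bps / (bytes_per_z / z_compression) # Z samples/s the bus can sustain

print(f"ROP limit:       {rop_limit_zps / 1e9:.1f} GZ/s")
print(f"Bandwidth limit: {bw_limit_zps / 1e9:.1f} GZ/s")
print("Bottleneck:", "ROPs" if rop_limit_zps < bw_limit_zps else "bandwidth")
```

With these assumed numbers the ROPs run out well before the bus does, which is why extra ROPs could still pay off in Z-heavy passes even when the rest of the frame is shader- or bandwidth-bound.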

Pipelines, clock rates, and fill rate were a useful shorthand a couple of years ago, but that's no longer the case.

That's what I used to think, too, until some people like Tridam decided to prove me wrong with some very logical arguments around the "NV40's launching any day now" era :) Just because shading is the bottleneck much of the time doesn't mean it always is, and peak output matters too - it doesn't when bandwidth limited, admittedly, but Z read/write is only very rarely bandwidth limited. Peak color rate has its uses as well, though only so much of it is required.
What I'm saying, basically, is that if we see 20- or 24- (or even 28/32-) ROP parts soon, we shouldn't just say the extra ROPs are a useless feature because on current cards they would only give a 5% increase (which seems like a correct number to me), since in that timeframe it might be nearer to 10%, or maybe even 15%.
Once again, I find the writer a little bit vague on these points, so if I'm nitpicking or misunderstanding, I apologize - my points still stand, though they wouldn't be related to the article in that case.

---

As for the actual raison d'être of the article: my personal preference would be a benchmarking suite that is weighted and sufficiently universal for many people to agree that it is objective. The problem is, asking people to believe that weighting different benchmarks like that is objective seems a little out-of-this-world to me, but it would most certainly be a more elegant solution for Joe Customer should it happen. Not perfect, but simple and efficient.

Such an (auto-?)benchmarking suite should ideally mix game benchmarks with a few synthetic ones for future-proofing, but it should also be updated frequently to stay current with the latest games and IQ features. IQ should also be tested; "objective, number-crunching" *benchmarking* of IQ is a little more difficult, but possible at least for "basic" AA/AF IMO with a little programming effort.

Overall, the problem is taking several benchmarks and making all reviewers and customers agree it's roughly objective and representative. Unless the suite is RIDICULOUSLY big, that seems a little bit hard to me. Still, the "best way" to measure architectures doesn't exist because it changes with the way games are programmed, and different architectures have to be summarized in different ways.
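For what it's worth, the aggregation itself doesn't have to be complicated - a weighted geometric mean of per-test results, normalized against a reference card, would do. A minimal sketch, with made-up test names, weights and scores (agreeing on exactly those things is the hard part described above):

```python
import math

# Minimal sketch of a weighted overall score for a benchmark suite.
# Test names, weights and scores are made up for illustration.

WEIGHTS = {                  # must sum to 1.0
    "game_a":           0.30,
    "game_b":           0.30,
    "synthetic_fill":   0.15,
    "synthetic_shader": 0.15,
    "iq_aa_af":         0.10,
}

def overall_score(results, baseline):
    """Weighted geometric mean of per-test results, normalized to a baseline card (= 100)."""
    log_sum = 0.0
    for test, weight in WEIGHTS.items():
        ratio = results[test] / baseline[test]   # >1.0 means faster/better than the baseline
        log_sum += weight * math.log(ratio)
    return 100.0 * math.exp(log_sum)

# Hypothetical numbers only:
baseline = {"game_a": 60, "game_b": 45, "synthetic_fill": 6.4, "synthetic_shader": 120, "iq_aa_af": 7.0}
card     = {"game_a": 78, "game_b": 52, "synthetic_fill": 8.6, "synthetic_shader": 170, "iq_aa_af": 7.5}
print(f"Overall score: {overall_score(card, baseline):.1f}")
```

A geometric mean keeps one outlier test from dominating the total, which helps when arguing that the final number is "roughly objective".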

Lots of people have asked the theoretical question, but I would tend to believe it's more rhetorical until someone manages to establish an independent benchmarking suite composed of a high number of separate tests. And I still wonder if I'll ever see that day, or anything better for that matter - Dave? :)


Uttar
P.S: trinibwoy: http://www.behardware.com/articles/574-5/nvidia-geforce-7800-gtx.html - it doesn't seem like a big issue to me, but still, I fully agree with Tridam & Marc on the POV that this is NOT acceptable for an ultra-high-end part!
 
Jason Cross said:
I was told by one engineer that the performance benefit of moving from 16 to 24 ROPs would be less than 5%, but it would come at a considerable cost in transistor count.
It was mentioned here too...
+50% ROPs = +<5% FPS

The GF6600 has 4 ROPs/8 pipes... so a 6600 with 8 ROPs would be up to 10% faster, OK? Isn't this the reason why a 6600 clocked at the same frequency as an X700 isn't much faster, but a 6800 (16p/16R) clocked at the same frequency as an X800 (16p/16R) is markedly faster?
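The way a +<5% figure for a 16-ROP part and an "up to 10%" guess for a 4-ROP part can both be plausible falls out of a simple "slowest unit wins" picture. A toy sketch (all workload and clock figures invented, purely to show the shape of the argument, not to model any real chip):

```python
# Toy "slowest unit wins" frame-time model: the frame takes as long as the most
# heavily loaded unit needs. All workload and clock figures below are invented.

def fps(clock_hz, rops, shader_pipes, pixels_per_frame, shader_cycles_per_pixel):
    rop_time    = pixels_per_frame / (rops * clock_hz)                               # seconds of ROP work
    shader_time = pixels_per_frame * shader_cycles_per_pixel / (shader_pipes * clock_hz)
    return 1.0 / max(rop_time, shader_time)

# Invented "high-end" part: firmly shader-bound, so going 16 -> 24 ROPs does almost
# nothing (in this crude model, literally nothing; in reality a few percent).
print(fps(430e6, 16, 24, 6_000_000, 4), "->", fps(430e6, 24, 24, 6_000_000, 4))

# Invented "mid-range" part with a 1:2 ROP:pipe ratio: much closer to ROP-bound,
# so doubling the ROPs buys a noticeably larger share of extra frame rate.
print(fps(400e6, 4, 8, 3_000_000, 1.5), "->", fps(400e6, 8, 8, 3_000_000, 1.5))
```

In practice a frame mixes ROP-bound and shader-bound stretches, so real gains land somewhere between the two extremes the toy model spits out.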
 
trinibwoy said:
That's the second time I've seen you comment on the IQ on the GTX. Do you have specific examples and is it only with optimizations enabled?

So far I can only base my comments on some other people's experience; I have no first-hand proof. Technically, the AF calculations with the review drivers used are borked, they tell me.

Word is that the AF on HQ with optimizations disabled on G70 is only slightly better than on an NV40 at Q settings, which means not good enough. Tendency to shimmering/texture aliasing, as we are used to on NV40 with default settings.

I am waiting for further tests on the subject, but if it is indeed as bad as it sounds, benchmarks done with HQ AF on G70 shouldn't be compared to NV40 with HQ, or X800 with AI off - maybe not even with AI on low.
 
no-X said:
It was mentioned here too...
+50% ROPs = +<5% FPS

The GF6600 has 4 ROPs/8 pipes... so a 6600 with 8 ROPs would be up to 10% faster, OK? Isn't this the reason why a 6600 clocked at the same frequency as an X700 isn't much faster, but a 6800 (16p/16R) clocked at the same frequency as an X800 (16p/16R) is markedly faster?
And it was he who mentioned that here, and I questioned its validity. *oops*

*edit* I never said the G70 isn't a true 24 pp GPU, that was someone else. :neutral:
 
tEd said:
So far I can only base my comments on some other people's experience; I have no first-hand proof. Technically, the AF calculations with the review drivers used are borked, they tell me.

Word is that the AF on HQ with optimizations disabled on G70 is only slightly better than on an NV40 at Q settings, which means not good enough. Tendency to shimmering/texture aliasing, as we are used to on NV40 with default settings.

I am waiting for further tests on the subject, but if it is indeed as bad as it sounds, benchmarks done with HQ AF on G70 shouldn't be compared to NV40 with HQ, or X800 with AI off - maybe not even with AI on low.

Although my finances have been disastrous lately, there's a chance that I might upgrade soon. If and whenever that happens, I plan on another write-up about it. I call it a write-up because I can't stand the senseless concentration on colourful benchmark graphs, and I actually play through the games I'm testing, which means I'm actually looking at the screen *cough*.
 
Uttar said:
P.S: trinibwoy: http://www.behardware.com/articles/574-5/nvidia-geforce-7800-gtx.html - it doesn't seem like a big issue to me, but still, I fully agree with Tridam & Marc on the POV that this is NOT acceptable for an ultra-high-end part!

Ah, thanks. I haven't noticed that in-game as yet, but I'll keep my eyes and ears open for similar reports from other people.

tEd said:
I am waiting for further tests on the subject, but if it is indeed as bad as it sounds, benchmarks done with HQ AF on G70 shouldn't be compared to NV40 with HQ, or X800 with AI off - maybe not even with AI on low.

OK.
 
Dave Baumann said:
So, the choice is: go deep into the architecture and try to explain, or hover over the top of the architectural issues and present a bunch of numbers, IMO.
Are you asking for feedback on site direction, or is this advice to sites?
 
I believe that my suggestions for the way GPUs should be reviewed are already being followed quite well by B3D, but I'd just like to make some further clarifications about my view.

Dave Baumann said:
http://www.extremetech.com/article2/0,1697,1841940,00.asp

IMO the problem with the entire line of thought presented here is that there really isn't much of an answer for comparing the hardware for an ordinary consumer - we'll still be looking at certain key metrics, such as fill rates and texture rates, but realistically speaking all the ordinary consumer can be concerned about is the price, performance and features. Even then the water gets muddy - with DX10 we have less room within the API to differentiate features, and reading performance from synthetic benchmarks is going to relate even less to reality when we have unified architectures. So, the choice is: go deep into the architecture and try to explain, or hover over the top of the architectural issues and present a bunch of numbers, IMO.
I'd go with a combination of both. Consumers need to be able to correlate advertised/propagated hardware capabilities with their theoretical function, so that they know what to look for (and not look for) when selecting hardware, based on how these "features" affect performance and versatility. They do need, however, some tangible way to measure how this translates into productivity, and to what degree.

In addition, I believe it is important to add the substance of real-world performance metrics/tests through a combination of cutting-edge and typical/generic software applications, so that the consumer is exposed to the ways a given architecture addresses different real-world scenarios. There needs to be mention of what each app typifies scenario-wise and, as a result, what it can expose about the hardware. This way the consumer knows exactly why each test is relevant and what its implications are for hardware performance, assuming all possible bottlenecks/sources of error are mentioned and addressed so that conclusions can be drawn with accuracy.

Also, it is important to include ISV input on where future software is headed and what there is to like/dislike about cutting-edge hardware, with perhaps some synthetic apps to see how hardware addresses up-and-coming scenarios.
 
At least the commentary/reviews in 3D is better than for the new car industry where the number of cup holders seems to be important. There must be a similar feature for video cards.
 
IgnorancePersonified said:
At least the commentary/reviews in 3D is better than for the new car industry where the number of cup holders seems to be important. There must be a similar feature for video cards.

Yeah, I heard it's called a "desk" ;)
 
Ailuros said:
Albeit my financials are disastrous lately, there's a chance that I might upgrade soon. If and whenever that happens I plan on another write-up about it. I call it write up because I can't stand the senseless concentration on colourful benchmark graphs and I actually play through the games I'm testing, which means I'm actually looking at the screen *cough*.

That'd be interesting, because quite frankly NV40 shimmering on some settings is bad enough. What I'm quite pleased about is that I have no hankering after the latest high end GPU for the first time in 4-5 years, price point barrier notwithstanding.
 
_xxx_ said:
Yeah, I heard it's called a "desk" ;)

Last I heard they were called CD-drives... :eek:

Anyway, I prefer to actually know the specifics: what is changing, how it is changing, why it matters. But if the IHVs are not being forthcoming enough (re: different clock speeds for different units of the core), I'm entirely comfortable using slightly outdated terms.
 
Randell said:
That'd be interesting, because quite frankly NV40 shimmering on some settings is bad enough. What I'm quite pleased about is that I have no hankering after the latest high end GPU for the first time in 4-5 years, price point barrier notwithstanding.

It depends on where the shimmering comes from on NV40. Often, if you don't use the clamp for the LOD bias, neither high quality nor supersampling will save the day.

I'm not sure yet, but from what I've been gathering, that issue and any possible issues on G70 shouldn't be connected. The complaints so far point at G70 HQ not being any better than NV40 Q. I don't recall any review having dealt with high quality anisotropic yet when benchmarking the G70.
 
As for the topic at hand: it can or should be a task of responsible websites and/or reviewers to at least attempt to clarify a few tidbits for the public to understand what exactly they are looking at.

No matter how complicated architectures get, there will always be a way to gain a fair enough understanding of the underlying details. Those interested can always read up and inform themselves. The more, and the better, websites try to "educate" their readers, the better the understanding will be.

While I have nothing against ExtremeTech - rather the contrary - simple users can ramble as much as they want on message boards without knowing what they're talking about. They'd do better to watch out for their own mistakes/mischief in the future before worrying about that one.
 
The point of the editorial (and that's what it is, just an editorial opinion) is NOT that tech sites and reviewers should ignore talking about pipelines, or using that terminology in the future.

The point was this: people have been using "pipelines" and megahertz as the shorthand descriptors for GPUs for a long time now. "It's a 520MHz, 16-pipeline card" or "it's a 400MHz, 12-pipeline card." I think as we move on, those aren't going to be very useful descriptors anymore. We need other "shorthand" ways to talk about GPUs in general terms. I'm not sure what that should be... ALUs, maybe? We probably need one more generation of video cards, moving past the current architectures, before we really come down on what's the best way to talk about it.

But already we're at a point where the "pipelines" in different GPUs have different capabilities - they can retire different amounts of various operations per cycle, under different conditions. Some have separate texture address units, some don't. We have three separate "sets" of pipelines - vertex shaders, pixel shaders, and ROPs - that all exist in different amounts.
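One way to see the problem with the one-number shorthand: even a minimal structured description of a part needs several independent counts, and two chips that would both be marketed under the same "pipeline" figure can differ on most of them. The numbers here are invented, not any real chip's specs:

```python
from dataclasses import dataclass

@dataclass
class GpuConfig:
    """The counts the old 'N pipelines @ M MHz' shorthand tries to collapse into two numbers."""
    core_mhz: int
    vertex_shaders: int
    pixel_shaders: int
    texture_units: int
    rops: int

# Two invented parts that might both be described as "16 pipelines",
# yet differ in exactly the units that end up setting their limits.
chip_a = GpuConfig(core_mhz=430, vertex_shaders=6, pixel_shaders=16, texture_units=16, rops=16)
chip_b = GpuConfig(core_mhz=500, vertex_shaders=8, pixel_shaders=16, texture_units=8, rops=8)
print(chip_a, chip_b, sep="\n")
```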

Naturally I expect full-blown reviews to give detailed descriptions of what is in the chip: how many of each type of pipeline, what they can do, and how it's different from other architectures. I'm only talking about the casual, man-on-the-street, one-sentence descriptors for GPUs that we've clung to for years and that are becoming less and less relevant.
 
One "ALU" is not the same as another "ALU" - they have different capabilities, different instruction throughputs and different compositions. On top of that, how the IHVs class "an ALU" can differ from one to the next without disclosing the full details of what "the ALU" can do - for instance, ATI classify the entire shader unit per pipeline as "an ALU" despite the fact that it can dual-issue instructions (and they didn't disclose that R300/R350's shader architecture was composed in such a manner for a year). Add to that, there are other factors in the architecture that can affect shader performance away from what you might expect from looking at, and trying to understand, the purely theoretical performance.

Graphics processors are becoming more like stream processors, and influencing factors such as DirectX10 are likely to make them converge more in function, but they can quite possibly, much like desktop processors, diverge in implementation.
 
It seems to me there are two separate issues here.

1. Is just agreeing on what language means, so when I say "A", enthusiast X hears "A" and not "B". This is best exemplified by the "pipelines" discussion and whether the GTX is a 24-pipeline beast or a 16-pipeline beast. Frankly, I don't give a rat's a**, just let's all agree to be consistent on it so we can have a rational discussion. I lean 24-pipe-ish in this day and age, but I'd cheerfully use the other definition if everyone else agrees to use it.

2. After the bloodletting of #1 is resolved, we're still left with a situation where it doesn't tell you enough. #1 helps the mid-core and hard-core techie talk "speeds and feeds" (to date myself), but things have gotten too complex for it to be useful in marketing terms for the great mass. And they generally don't care about the hoary details of the back end - they want to know what comes out the front end, so the metric needs to be front-end based in its terminology. Something that combines fill rate, shading, and AA/AF in one number seems to make a lot of sense to me. The shading part, IMHO, needs to be synthetic and relatively simple. Maybe that means it only lasts a couple of years before you need a version 2.0, but oh well. The shader should be relatively short, straightforward, and as "typical" as it can be. The source of course has to be public so everyone knows what is going on.
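As a strawman of what such a front-end number could look like: one short public fill-rate test, one short public shader test, one AA/AF test, each normalized against a published reference card and folded into a single figure. Every name, weight and reference value below is invented; picking them (and the test shader) is precisely what would have to be public and agreed upon:

```python
# Strawman composite "front-end" metric: fill rate + shading + AA/AF in one number.
# Weights and reference values are invented placeholders.

REFERENCE = {"fill_gpix": 4.0, "shader_fps": 60.0, "aa_af_fps": 40.0}  # hypothetical reference card
WEIGHTS   = {"fill_gpix": 0.3, "shader_fps": 0.4, "aa_af_fps": 0.3}    # must sum to 1.0

def frontend_score(measured: dict) -> float:
    """Weighted sum of results relative to the reference card (reference scores 100)."""
    return 100.0 * sum(WEIGHTS[k] * measured[k] / REFERENCE[k] for k in WEIGHTS)

# Hypothetical measurements for some newer card:
print(round(frontend_score({"fill_gpix": 6.9, "shader_fps": 105.0, "aa_af_fps": 72.0}), 1))
```

Whether the shader part stays representative for more than a generation is exactly the "version 2.0" problem mentioned above.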
 