Guru3d: 5700XT & 5900 have "8 (2x4) pixel pipelines"

WaltC

Veteran
http://www.guru3d.com/article/Videocards/112/


Here are some of the relevant quotes in context, with the marketing spin in bold:

Hilbert Hagedoorn in a Guru3d review said:
When I first heard of the new XT series I figured it only would be fun marketing yet what I did not expect at that time was a product of this caliber in the price range. I'm surprised by this product as it in direct competition with the FX 5700 (Ultra). Performance wise this product will be quite a bit faster then the 5700 series yet only 20 buck more expensive. What makes this product so dominant over the 5700 is due to two reasons: First of all it has 8 (2x4) pixel pipelines compared to only 4 on the 5700. Next to that it is armed with 128 MB running over a fantastic 256-bit memory interface which guarantees high computational bandwidth for it's frame buffer.

Basically this is a somewhat slowed down GeForce FX 5900/5950 Ultra, NVIDIA's current flagship. The differences can be found on core and memory clockspeed and the result means value.

The GeForce FX 5900 produced around a GPU that is profiled as Cinematic GPU as it is capable of bringing cinematic visual effects on your PC with the combination of some brutal power and an excellent feature set. The CineFX GPU is of course capable of utilizing DirectX 9 Pixel Shaders 2.0+, Vertex Shader 2.0+ and OpenGL. Basically this product is in the high-end range and offers with its 8 pixel pipeline (4x2) a lot of gaming pleasure.


I'm guessing that the author is of the opinion that the way to divine the number of pixel pipes in a chip is to multiply the number of pixel pipes by the number of texture units, hence his description of both "2x4" and "4x2" as equating to "8 pixel pipes." I'll give him the benefit of the doubt on the "2x4" and suggest it is a typo representing "4x2." I'll also wager that he doesn't know what the numbers in these descriptions represent, because if he did he would have known that "4x2" tells us we have 4 pixel pipelines with 2 texture units serving each of those four pixel pipelines, and so he'd never have said "8 pixel pipelines" in the first place.
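To make that reading of "4x2" concrete, here is a minimal Python sketch. The names `PipeConfig` and `parse_pipe_config` are made up for illustration; the only thing taken from the post is the convention that the first number counts pixel pipelines and the second counts texture units per pipeline.

```python
from typing import NamedTuple


class PipeConfig(NamedTuple):
    pixel_pipelines: int     # pipelines, i.e. pixels worked on per clock
    tex_units_per_pipe: int  # texture units serving each pipeline


def parse_pipe_config(label: str) -> PipeConfig:
    """Read a 'PxT' label as P pipelines *by* T texture units each."""
    pipes, tex = label.lower().split("x")
    return PipeConfig(int(pipes), int(tex))


cfg = parse_pipe_config("4x2")
total_tex_units = cfg.pixel_pipelines * cfg.tex_units_per_pipe

# The product 4 * 2 counts texture units, not pipelines.
print(cfg.pixel_pipelines, "pixel pipelines")   # 4 pixel pipelines
print(total_tex_units, "texture units total")   # 8 texture units total
```

Multiplying the two numbers is a legitimate operation, but what it yields is the texture unit count; the pipeline count stays at 4.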

I can only guess that he received an informative email, probably from the desk of nVidia's "chief scientist," which tutored him on the benefit of using common multiplication tables to work the formulae necessary to unravel the complex reality of the pixel pipe arrangement in a modern, technically advanced gpu such as those produced by nVidia. I get warm and fuzzy all over when I consider how nVidia has unselfishly devoted so much of its public relations efforts over the last year to the task of applying numerology to aid the common man in answering some of life's most fundamental questions, such as, "Why is fp16 perfect while fp24 is both too much and too little?", and "Why is a 4-pixel-per-clock chip actually an 8-pixel-per-clock chip?", and "Why is a benchmark not a benchmark but in reality a game, instead?"

Yes, thanks to nVidia, 3d-card reviewers around the world are discovering the benefits of numerology as a handy tool they can use to unravel such baffling mysteries. See "4x2" but don't know what it means? No problem, just multiply 4 x 2 for the result of 8 and then label the result "pixel pipelines," and thus we have the mathematical proof for 8 pixel pipes in a modern, complex (and thus non-traditional), nVidia gpu. Numbers don't lie, do they?

Yes, thanks to nVidia, people around the world interested in 3d-chip fundamentals are enjoying a revival of ignorance and confusion not seen since the year 3dfx shipped the V1, the material difference being that then there was a legitimate excuse for confusion and ignorance because the 3d industry was brand new and navigating through wholly uncharted seas. Today's brand of ignorance is artificially imposed by the relentless PR efforts of a single company, unswerving in its dedication to the timeless proposition of the used-car salesman: "It's not the deal they get, but the deal they think they got, that counts."
 
Don't give them hits... don't forget Guru3D promotes cheating and applauded Nvidia's driver optimizations (by none other than the author of this article, Hilbert), and thought the 5800 was a great card... the bias shines greatly on Guru3D.



http://www.guru3d.com/comments.php?category=1&id=1852

I don't care wether its a 2% or a 25% difference, cheating is cheating. I actually applaud nVIDIA for the way they did it, if you do it then have the b@lls to do it well

Only Morons on Guru3D, the only person worth the time is Unwinder that spends time on their forums.
 
...

We need to destroy this 2x4/4x2/8x1/8x0/1x32 BS. It's not benefiting anyone but NVIDIA and XGI nowadays. And both are using it in the most lame ways imaginable by men.

I say we change the standard. Heck, isn't B3D strong enough nowadays to try to influence other sites to change such things? Even if B3D is alone on it, other forums such as nV News' are guaranteed to follow the new standard, as there is an extreme B3D influence in those places.

I suggest this new standard should include:
- TEX ops/clock; if trilinear/AF/... is done in a single clock cycle, which is extremely rare, include that fact.
- MAD ops/clock (different precision formats if required); showing the number of operations possible on both scalars & Vec4s
- Other instruction ops/clock if noteworthy
- Maximum Pixel(P) , Zixel(Z) and Complex Pixel(CP) outputs/clock
- Short description of how the loopback/bypass paths/... work.

Condensated techy examples:
NV30: 8 TEX, 4FP32 MAD, 4+8 FX12 MAD, V4, 4CP/4P/8Z, register penalties & 8Z bypass path
NV36: 4 TEX, 4x4FP32 MAD, 6FP32 M|A, 6FP16 MAD, V4, 2CP/4P/4Z, register penalties & 4P bypass path
R350: 8 TEX, (3+1)x8FP32 MAD, V3+S, 8CP/8P/8Z

Eventually, you might want to add RCP/COSIN/RSQ/...

Marketing example:
GeForce FX 5800 Ultra: 8 high-speed texturing units and 48 arithmetic mini-processors provide for over 10 billion fully programmable operations per second. Associated with state-of-the-art memory technologies, between 2 and 4 billion pixels can be put on your screen every single second, allowing for higher resolutions and industry leading performance!

That's just an example though, ain't gonna say BB what he has to do of course! ;) :)
Seriously though, I really think this whole 8x1/2x4/... BS has to stop. Those conventions are NOT acceptable nowadays and they're just confusing non-techy reviewers. For the sake of the industry, it has to go.


Uttar
 
Uttar said:
Condensated techy examples:
NV30: 8 TEX, 4FP32 MAD, 4+8 FX12 MAD, V4, 4CP/4P/8Z, register penalties & 8Z bypass path
NV36: 4 TEX, 4x4FP32 MAD, 6FP32 M|A, 6FP16 MAD, V4, 2CP/4P/4Z, register penalties & 4P bypass path
R350: 8 TEX, (3+1)x8FP32 MAD, V3+S, 8CP/8P/8Z

Eventually, you might want to add RCP/COSIN/RSQ/...

Doesn't quite roll off the tongue like 8x1 or 4x2 though does it? ;)
 
Hellbinder said:
Wow Walt.. I dont usually see you get worked up about this stuff so much.

Of Course I completely Understand. :(

I guess I'm just weary of seeing the same fundamental specification fallacies popping up in print, over and over again, in blissful ignorance of the facts as so many people have worked hard to establish them and to correct the public record concerning them in the last year. It would be easy for me to just blame the reviewer, but I thought about the fact that had nVidia been honest pre-nV30 and ever since about the number of pixel pipelines in nV3x as pertain to nV30/35/38, that there are four (4) of them, then even the most ardently pro-nVidia reviewer in the world would still have to present this basic product specification accurately, and would be doing so. So, the blame here is 100% nVidia's as such confusion as exists on the topic is entirely the result of the company's continuing misrepresentation of its nV3x chips. It's as if nVidia's telling the world "We don't care if it's 4 pixels per clock. We prefer to represent it as 8 pixels per clock, so that's what it is."

Right, sure. And we've all got subcutaneous, uv light-reflecting tattoos engraved across our foreheads which read: "I'm a dummy and I'll buy anything."

But in one sense certainly this has all become so routine and predictable it's almost boring. If nVidia ever wonders why people like me have a difficult time even associating the words "3d" and "nVidia" together these days...well, I guess they actually don't care, do they? Up until writing this post I hadn't stopped to reflect on it, but the truth is that when I think of 3d, nVidia never crosses my mind anymore, except in a peripheral sense, much as I presently think of 3dfx, or Matrox. It's been so long since I've seen nVidia do anything credible or relevant in 3d as I think about it that I honestly don't expect anything out of nVidia any more. What's really sort of surprising to me is the realization that...I really don't care, either. The thought is actually liberating...:)
 
Hanners said:
Doesn't quite roll off the tongue like 8x1 or 4x2 though does it? ;)

Okay, I shouldn't have put "condensated" behind it :LOL:
If you want something that "rolls off the tongue", then I suggest:

NV30: 8/4 & 4/4/8
NV36: 4/4 & 2/4/4
R350: 8+8 & 8/8/8
RV350: 4+4 & 4/4/4

But good luck making NVIDIA accept that one! :LOL:
Maybe if you accounted for the chip's maximum & minimum precision:
NV30: 8/4+8 & 4/4/8
NV36: 4/4+2 & 2/4/4
R350: 8+8 & 8/8/8
RV350: 4+4 & 4/4/4

But it's still slightly too confusing to my tastes... So, what about:

NV30: 4x(2&1+2) + 4Z
NV36: 2x(2&2+1) + 4P
R350: 8x(1+1+0)
RV350: 4x(1+1+0)

Or, if you want to step into the wonderful world of sheer idiocy, ATI-biased version:

NV30: 4x2&1
NV36: 2x2&2
R350: 8x1+1
RV350: 4x1+1

But if you want it to be NVIDIA-biased, you could do this:

NV30: 4x2&2 + 4
NV36: 2x2&3 + 2

---

Another solution is to use more averages:
For example, the number of "base" pipelines, multiplied by the average of TEX and arithmetic units. Arithmetic units is an average too, due to different precision modes. And because of the Vec3+Scalar advantage of ATI, you could use a 0.85 exponent. And to make lower precision formats disadvantaged, you could also use a 1.5 exponent for the precision. A Vec4 FP24 standard is assumed.

I could try to make an example, but it's been 10 mins I've been trying to calculate one, and I'm not finished yet! :LOL:

---

No, but seriously, what about:
NV30: 4x2&1 + 4Z
NV36: 2x2&3 + 2P
R350: 8x1+1
RV350: 4x1+1

Not as catchy as good ole 4x2/2x2/8x1/4x1, but an awful lot more accurate IMO (if you know what it means, of course!)
And what does it mean?

NV30: 4 base pipelines capable of 2 texop or 1 full precision MUL or ADD, and also capable of 4 zixels/clock on a bypass path.
R350: 8 base pipelines capable of 1 texop and 1 full precision MUL or(and) ADD.
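As a rough illustration of how that legend could be written down, here is a small Python sketch. The `PipeSpec` class and its field names are hypothetical, and it assumes my reading of the two explained entries: "&" joins units that share a slot (texop OR arithmetic per clock) and "+" joins units that work in the same clock. Only NV30 and R350 are encoded, since those are the two entries spelled out above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeSpec:
    base_pipes: int        # base pixel pipelines
    tex_ops: int           # texops per pipeline per clock
    alu_ops: int           # full precision MUL/ADD ops per pipeline per clock
    alu_shares_slot: bool  # True: tex OR alu ("&"); False: tex AND alu ("+")
    bypass_zixels: int = 0 # zixels/clock on a bypass path, 0 if none

    def label(self) -> str:
        joiner = "&" if self.alu_shares_slot else "+"
        text = f"{self.base_pipes}x{self.tex_ops}{joiner}{self.alu_ops}"
        if self.bypass_zixels:
            text += f" + {self.bypass_zixels}Z"
        return text


# Values copied from the two explained entries in the post above.
NV30 = PipeSpec(base_pipes=4, tex_ops=2, alu_ops=1, alu_shares_slot=True, bypass_zixels=4)
R350 = PipeSpec(base_pipes=8, tex_ops=1, alu_ops=1, alu_shares_slot=False)

print(NV30.label())  # 4x2&1 + 4Z
print(R350.label())  # 8x1+1
```

Keeping the fields separate means the short label can always be regenerated, while a reader who wants the details does not have to decode the punctuation.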

Imposing the "zixel" and "complex pixel" conventions standard is a big part of making this representation possible IMO, as otherwise, NVIDIA is never going to accept it. But then again, it certainly makes the whole thing much more messy...


Uttar
 
:oops: AAIIIIEEEEEEEEE!!! MY BRAIN HURTS!!!! :oops:

It's too complex, it'll melt the average persons brain or else they'll just let it slide right over 'em. I'm dizzy from just trying to figure out your last post still! :rolleyes: :LOL:
 
Uttar said:
...

We need to destroy this 2x4/4x2/8x1/8x0/1x32 BS. It's not benefiting anyone but NVIDIA and XGI nowadays. And both are using it in the most lame ways imaginable by men.

I say we change the standard. Heck, isn't B3D strong enough nowadays to try to influence other sites to change such things? Even if B3D is alone on it, other forums such as nV News' are guaranteed to follow the new standard, as there is an extreme B3D influence in those places.

I suggest this new standard should include:
- TEX ops/clock; if trilinear/AF/... is done in a single clock cycle, which is extremely rare, include that fact.
- MAD ops/clock (different precision formats if required); showing the number of operations possible on both scalars & Vec4s
- Other instruction ops/clock if noteworthy
- Maximum Pixel(P) , Zixel(Z) and Complex Pixel(CP) outputs/clock
- Short description of how the loopback/bypass paths/... work.

Condensated techy examples:
NV30: 8 TEX, 4FP32 MAD, 4+8 FX12 MAD, V4, 4CP/4P/8Z, register penalties & 8Z bypass path
NV36: 4 TEX, 4x4FP32 MAD, 6FP32 M|A, 6FP16 MAD, V4, 2CP/4P/4Z, register penalties & 4P bypass path
R350: 8 TEX, (3+1)x8FP32 MAD, V3+S, 8CP/8P/8Z

Eventually, you might want to add RCP/COSIN/RSQ/...

Marketing example:
GeForce FX 5800 Ultra: 8 high-speed texturing units and 48 arithmetic mini-processors provide for over 10 billion fully programmable operations per second. Associated with state-of-the-art memory technologies, between 2 and 4 billion pixels can be put on your screen every single second, allowing for higher resolutions and industry leading performance!

That's just an example though, ain't gonna say BB what he has to do of course! ;) :)
Seriously though, I really think this whole 8x1/2x4/... BS has to stop. Those conventions are NOT acceptable nowadays and they're just confusing non-techy reviewers. For the sake of the industry, it has to go.


Uttar


The issue is an extremely simple and fundamental one relative to all 3d architectures:

How many pixels per clock does a chip render to screen?

Every single one of the things you've listed is a sub-process, an "op," occurring in a gpu at various stages and in various ways and in various combinations (dependent on the architecture and software one runs) *prior* to the final pixel being rendered to screen.

In other words, no matter how many "ops" are done in a gpu relative to the formation of a pixel prior to its being rendered to screen, there is still a concrete maximum number of pixels a gpu may render to the screen per clock. With nV30/35/38 that absolute limit is 4.
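A quick sketch of why that per-clock ceiling matters: peak pixel output is just pixels per clock times clock rate. The function name is made up, and the 500 MHz core clock is an assumed figure for illustration only; it is not stated anywhere in this thread.

```python
def peak_fill_rate_mpixels(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Upper bound on pixels written per second, in millions."""
    return pixels_per_clock * core_clock_mhz


ASSUMED_CLOCK_MHZ = 500.0  # assumption for illustration, not from the thread

# A chip limited to 4 pixels per clock:
print(peak_fill_rate_mpixels(4, ASSUMED_CLOCK_MHZ))  # 2000.0

# What the same chip would be credited with if described as 8 per clock:
print(peak_fill_rate_mpixels(8, ASSUMED_CLOCK_MHZ))  # 4000.0
```

Whatever happens inside the pipelines, the first number is the one that caps color pixels reaching the screen; doubling the claimed pixels per clock doubles the headline figure without changing the hardware.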

So, what I think we need to do is relearn some basics and rediscover the difference between a pixel rendered to the screen, and operations of all kinds which occur inside a gpu that are not in themselves rendered to screen in any form, but are descriptions of how specific architectures may operate internally with respect to pixel formation/creation prior to rendering the pixel to screen.

Like it or not, pixels are the ultimate product of 3d chips--doesn't matter what the architecture is. As such, the number of pixels a 3d chip renders to screen per clock cannot be anything but the bedrock fundamental specification it has always been. Sure seems that way to me. I think confusing the number of finalized pixels per clock a gpu can render to screen with internal architecture-specific operations occurring in a given gpu that concern pre-render pixel creation steps is a huge mistake. They are two very separate and distinct categories of specification. Being able to render a certain number of pixels to screen per clock is a bedrock requirement for all 3d chips--it's universal--if a gpu doesn't render any pixels to the screen per clock then it simply doesn't matter what else it does internally as it will be useless as a 3d chip.

Along those lines, the thing all 3d chips regardless of architecture *must do* is render pixels to screen per clock. When pixels are rendered to screen they are, of course, final, and all sub operations relative to pixel formation and creation and treatment such as you mention in your post have already been done to completion. Again, operations-per-clock are architecture-dependent; pixels rendered to screen per clock are absolute and universal--3d architectures can be fundamentally different as to how they operate internally in regards to their approaches to pixel creation and associated "ops per clock," but they must all render pixels per clock to the screen in the same way. That's why looking at pixels per clock is anything but an arbitrary and disposable exercise. The confusion here exists because of a misunderstanding that "ops", like pixels, get rendered to screen. They don't, of course, which is the distinct difference.

This entire debate originated simply because nVidia did not wish to state nV3x renders 4 pixels per clock to screen, and when asked, cleverly (or not, depending on your point of view) concealed the issue with "ops per clock" numbers to obfuscate it, while couching the actual answer to the original question in the phrase "4 color pixels per clock" alongside other misdirecting info such as "8 B&W Z-pixels per clock." Because of the indirect, misleading fashion nVidia chose to answer the question of "How many pixels per clock does nV3x render to screen?", some were induced to wrongly infer that "ops" and "b&w z-pixels" were elements that could be rendered to screen just like what nVidia described as a "color pixel." Only final, and of course, color pixels are what "pixel-per-clock" specifications are concerned with. The rest of nVidia's answer to that question was irrelevant at best, and deliberately misleading at worst. Come to think of it, the rest of it was both irrelevant and misleading, as to the original, "Does nV3x render 8 pixels to screen per clock?" question. To which the answer is "No."

Uttar, since rendering pixels per clock to the screen is an absolute requirement for all 3d architectures, and is quite separate and apart and distinct from the various and often very different internal operations by which those architectures create their pixels prior to rendering them to screen, it boggles the mind to hear you say that knowing the pixel pipeline organizational structure of any gpu is "BS." What I think is "BS" is to confuse "ops per clock" with "Pixels rendered per clock to screen." Regardless of how many operations per clock nV3x performs internally it is still limited to an absolute limit of 4 pixels per clock rendered to screen, and that limit is set in stone by the 4x2 pipeline organization of the chip. I find it inconceivable that such knowledge might ever be construed as "BS," sorry...:)
 
Even if B3D is alone on it, other forums such as nV News' are guaranteed to follow the new standard, as there is an extreme B3D influence in those places.
The problem with the current (pixel pipeline) x (texture units) is it is simple. Every Kyle Bennett incorrectly infers that the x is a multiplication symbol rather than something like the "by" in "2x4." But any shmuck thinks that they can understand it. Uttar's method is much more accurate, but it's not useful to the majority of people out there. To them, it doesn't make sense. It's as incomprehensible as the FP32 shader being replaced with an FX12 that somehow looks equivalent according to their pal Kyle Bennett (god, I'm picking on Kyle today! FEELS GOOD!). "If it looks equivalent, so what?" they think.

But, I like the new system. It's nice; it needs a definite legend somewhere, though. What does M|A mean, for example? MAD? The average person who just wants to buy the best card for their money doesn't know what these mean. Maybe they should (but god, if the intelligence levels on Counter-Strike servers are any indication of the average intelligence of humanity... just no), but they don't, nor do they care. They just want to know--is it a good card. For the slightly more technical folks, the new format should be used and should definitely replace the current completely inaccurate nomenclature.

But, most reviews are for the sake of buyers, not the technically inclined.
 
:LOL:
I say it's all NV's fault.
If you were to limit yourself to ATI's simple yet elegant R3xx architecture, you could get away with:
R350: 8x1+1+3
NV35: 4x2+4

Problem is NVIDIA got a lot of bypass paths; so the NV35 is best described as:
NV35: 4x(2+4) + 4x(0+0)
And the NV36:
NV36: 2x(2+4) + 2x(0.5+0)

*sigh*


Uttar
 
Sorry for the double post, was only responding to Dig above...
Walt: Problem is, your view on this favors NV31/NV34/NV36. Because they can't get to their maximum of 4 pixels/clock much of the time.

Those chips are 2x2, but they got a bypass path which (ab)uses the texturing units of the original path, which results in a 4x1 chip, officially. Problem is, in 4x1 mode, you can't use the loopback functionality of the chip, as it's not implemented in the bypass path!
So, for 3 textures, you're wasting 1/3 of your pixel output: For a 4x1 it's 4 pixels/3 clocks, thus 4/3. For a 2x2, it's 2 pixels for 2 clocks, thus 2/2.
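The arithmetic in that line can be reproduced with a short Python sketch. It uses a deliberately simplified model, which is my assumption and not a description of the real hardware: each pipeline applies all of its texture units once per clock and keeps going until every texture has been applied. The function name is hypothetical.

```python
import math
from fractions import Fraction


def pixels_per_clock(pipes: int, tex_units_per_pipe: int, textures: int) -> Fraction:
    """Pixel output per clock when each pixel needs `textures` texture lookups."""
    # Clocks a pipeline spends on one pixel (at least one, even with no textures).
    clocks = max(1, math.ceil(textures / tex_units_per_pipe))
    return Fraction(pipes, clocks)


print(pixels_per_clock(4, 1, 3))  # 4/3  (the "4x1" case: 4 pixels per 3 clocks)
print(pixels_per_clock(2, 2, 3))  # 1    (the "2x2" case: 2 pixels per 2 clocks)
```

The 2x2 case spends its second clock with one texture unit per pipeline idle, which is where the wasted capacity for odd texture counts comes from.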

Furthermore, for all NV3x chips, the bypass path is ONLY available with less than 4x AA. Otherwise, all ROPs are taken anyway, and the bypass path can't be used. Just like 8x1 can't be used on the R300 with less than 4x AA because they got only 2 ROPs/pipeline. So it doesn't really matter, but still, it's not like those chips ALWAYS were 2x2/4x1 or 8x1. Sometimes, they're just 2x2 or 4x2 (2.67x3 in 6x AA mode for the R300(!))

Disregarding that AA thing, I'd still say that if you claim the NV30/NV35 are only 4x2, and completely disregard their 4x0 bypass path, then it's unfair to say the NV31/NV34/NV36 are 4x1. Yet, it's also (less though IMO) unfair to say they're 2x2. Annoying, isn't it?

The maximum pixel output/clock is fully dependent on the operations being done in the pipelines and disregarding that fact is okay today, but it won't be going forward. In the next 3-4 years, I suspect all pixels will take either 1 clock in the pipelines, or hundreds of them. No in-betweens.
Let's imagine Doom 4 and let's say it uses the same rendering model, but a lot more shaders. That means the Z & Stencil paths would be completed in 1 clock/pixel/pipeline, while all other operations would use heavy shading which would take a looong while, and thus it wouldn't matter there how many pixels/zixels you can output per clock.

My point is you focus too much on numbers which are really nothing but old conventions regarding the maximum possible output. We should be beyond that IMO, because nowadays it's only part of the story. Taking the FSAA example again, did you know that if you activate 4x FSAA, the R350 practically becomes a 4x2 while the NV35 doesn't become a 2x4 but remains a 4x2? That's because the R350 has 2 ROPs/pipeline, and the NV35 has 4 ROPs/pipeline.
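Here is a small Python sketch of the ROP argument. It assumes a simple model inferred from the numbers in these posts, not from any vendor documentation: every ROP handles one AA sample per clock, so pixel output is capped by total ROPs divided by samples per pixel. The function name is made up.

```python
from fractions import Fraction


def aa_limited_pixels_per_clock(pipes: int, rops_per_pipe: int, aa_samples: int) -> Fraction:
    """Pixels per clock once the ROPs have to write `aa_samples` per pixel."""
    total_rops = pipes * rops_per_pipe
    rop_limit = Fraction(total_rops, max(1, aa_samples))
    # Output can never exceed the number of pipelines.
    return min(Fraction(pipes), rop_limit)


# R350: 8 pipelines, 2 ROPs/pipeline (figures from the post above)
print(aa_limited_pixels_per_clock(8, 2, 4))  # 4    -> behaves like a 4-pipe chip
print(aa_limited_pixels_per_clock(8, 2, 6))  # 8/3  -> the "2.67" quoted for 6x AA

# NV35: 4 pipelines, 4 ROPs/pipeline
print(aa_limited_pixels_per_clock(4, 4, 4))  # 4    -> unchanged
```

Under this model both chips end up at 4 pixels per clock with 4x AA, which is the point being made about the 8-pipe advantage disappearing in that mode.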

If a part is engineered towards 8x MSAA non-stop, then a 1x8 design with 8 ROPs/pipeline is just as good as an 8x1 one with 1 ROP/pipeline, and yet, it's less expensive transistor-wise. Just like if the NV30/NV35 *really* were engineered towards 4x MSAA non-stop, they wouldn't have that 4x0 path! *grins*

I'm *not* saying NVIDIA is right to lie and say their architecture is an 8x1 when it's clearly not one. Problem is, saying it's a 4x2 makes it look much worse than it really is, considering the disadvantage is nullified compared to the R3xx when:
1) in Z/Stencil-only mode.
2) using 4x MSAA or more.
3) using a large amount of arithmetic instructions or textures.

I'd love to see something along the lines of:
NV35: 4x2 + 4x0
NV36: 2x2 + 2x0.5
Or even better, including full precision MADs/clock, but as Dig says - Joe Consumer most likely won't understand it, at all.

Ultimately, a 4x8 + 4x0.5 + 4x0 + 4x0 might be smarter than a 16x2 in a few years. And how are you going to describe this now? 4x8? 8x4? 8x4 + 16x0? 16x2? If I'm right and a time comes when this type of architecture becomes mainstream, the current way to describe GPUs just won't cut it. Heck, even a MIPS number would be more accurate then!

Baron: M|A is Multiplication OR Addition. MAD is Multiplication And Addition. The NV35 is capable of 8FP32 M|As and 8FP16 MADs, but only 6FP32 MADs.
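A tiny Python sketch of the difference, counting how many ops it takes to evaluate a number of multiply-adds of the form a*b + c. The function is hypothetical and ignores everything else about the hardware; it only encodes the two definitions given above.

```python
def ops_needed(multiply_adds: int, unit: str) -> int:
    """Ops required to evaluate `multiply_adds` expressions of the form a*b + c."""
    if unit == "MAD":
        # Multiplication And Addition: both halves done in a single op.
        return multiply_adds
    if unit == "M|A":
        # Multiplication OR Addition: one op for the multiply, one for the add.
        return 2 * multiply_adds
    raise ValueError(f"unknown unit type: {unit!r}")


print(ops_needed(3, "MAD"))  # 3
print(ops_needed(3, "M|A"))  # 6
```

So a count of M|A units and a count of MAD units are not interchangeable, which is why the two are listed separately.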


Uttar
 
What does M|A mean, for example? MAD?
Psst, M|A is not the same thing as MAD, so you didn't know ;)
Pretty normal too, considering I just invented that M|A thing :p

Makes sense though since | is OR in logic/code.


Uttar
 
No, those two sentences were, "What does M|A mean? (What does) MAD (mean as well)?" It was an understood thing. Sorry about not being clear--bell was going to ring in like 15 seconds at school.

ps--yes, the | thing makes perfect sense. oh, PHP, you have ruined me.
 
"Ouch!", sayeth the Dig and his brain in unison.

I'll stick with building the kittyputer for the rest of the day, this discussion makes me feel too bloody stupid. :(
 
NV30: (8T[FP32]|4A[FP32]#1) + 8 Reg-Combiners[FX12]#2)
NV31: (4T[FP32]|2A[FP32]#3) + 4 Reg-Combiners[FX12]#2)
NV34: (4T[FP32]|2A[FP32]#3) + 4 Reg-Combiners[FX12]#2)
NV35: (8T[FP32]|4A[FP32]#1) + ((8MUL[FP32]|4MAD[FP32]|4DP4[FP32])|8 Reg-Combiners[FX12]#2)
NV36: (4T[FP32]|2A[FP32]#3) + ((4MUL[FP32]|2MAD[FP32]|2DP4[FP32])|4 Reg-Combiners[FX12]#2)
NV38: (8T[FP32]|4A[FP32]#1) + ((8MUL[FP32]|4MAD[FP32]|4DP4[FP32])|8 Reg-Combiners[FX12]#2)

#1 1/SQRT(X) = 2A[FP32]|4A[FP16]
#2 Reg-Combiner:
RGB/XYZ 2MUL|1MAD|1ADD|2DP|1MUL+1DP|1A*B+C*D
A/W 2MUL|1MAD|1ADD|1A*B+C*D
#3 1/SQRT(X) = 1A[FP32]|2A[FP16]
 
I think places like Beyond3D need a "Everything you wanted to know about graphics cards, but were afraid to ask" page. I'm sure a lot of reviewers write crap because they know crap. Give some of them a free lesson.
 