The Inquirer Trying to Scoop B3D!

Yeah, I know, so what's his point?

the sentence makes no sense to me....

Is he trying to say the idea came from 3dfx?

That's what I thought, but it makes no sense to be worded the way it is, to me at least.
 
RoOoBo said:
Or 4 pipes with 4 32 bit fp units each and 2 fps units for each TMU and 2 TMUs per pipe. As we don't know what NVidia calls a 'functional unit', that means nothing. And it isn't just the functional units: how many read/write ports to memory or caches? How many Z units?

Hmm, wasn't it Hellbinder who said the FX is 4x4?
 
Is it possible that the 4 "legacy" int 2x TMU pipes are completely separate from the 4 fp pipes, and this is why we see half the performance we'd expect in either mode, yet it still has "8 pipes" like nvidia says?
 
Mulciber said:
Is it possible that the 4 "legacy" int 2x TMU pipes are completely separate from the 4 fp pipes, and this is why we see half the performance we'd expect in either mode, yet it still has "8 pipes" like nvidia says?

I doubt it, that would really make it just a GF4 with FP bolted on.
 
Mulciber said:
Is it possible that the 4 "legacy" int 2x TMU pipes are completely separate from the 4 fp pipes, and this is why we see half the performance we'd expect in either mode, yet it still has "8 pipes" like nvidia says?

That would be incredibly clumsy. R300's convert-everything-to-float method seems much more sensible from a design perspective. And why did NVIDIA choose to strap on all this legacy support when this is their first new architecture since conception? The only benefits we are seeing are for the Quadro line.
 
Randell said:
Hmm, wasn't it Hellbinder who said the FX is 4x4?

The hell's wrong with you people? The amount of anti-nVidia bias in here nowadays is sickening.


Like this comment above, are you a moron? Why do you have such a closed mind about this? Kirk just stated, if you're still dumb enough to think, that the era of fixed function pipelines with TCUs bolted on to a set-piece pipeline is over.

The time when you have a 4*4 or 8*2 architecture is over. Whether this is in the nV30 is irrelevant as you're still thinking like we did way back in 1997. With the rise in transistor budgets has come the move to added programmability while increasing performance - at a fundamental level, the architectures are the same.

So, how do you know the fragment/pixel back-end of the nV30 isn't based around 32 processing elements that can form up to 8 virtual pipelines, with the actual number dependent upon the task/fragment being worked on? IMHO, if the architecture is anything like this, regardless of whether it's generally outputting at a 4-pixel-comparable rate, it's a hell of a lot smarter of a design than anything else this side of the P10.

The point is, it's too early to state that it's case A or B. For you to do it is not only ignorant, it shows just how rabid many of you are in your hatred of a particular IHV.

ATI also has a 'network of processing units on a single die'. It is just that we know how it is arranged and what most of those units are: 4 vertex shader units with a scalar and a SIMD vector unit each, and 8 pixel pipes with 1 TMU, one scalar and one SIMD vector unit per pipe. And the TMU is capable of doing bilinear filtering in one cycle (4 reads from a single texture). It is just that they aren't playing the FUD game.

And if I'm not mistaken, the P10 can actually dynamically reassign processing elements. It's not FUD unless you know the story and it's wrong - you clearly don't yet. Let's wait and see before talking.
 
Eloquently stated, Vince....

The point is, 4x2 or 8x1 or flibbity by floo, the NV30 is only slightly faster than the 9700 (without AA or AF) despite a 50% clock speed advantage. We'll see if drivers are holding it back soon enough, but NV30's highly-flexible architecture doesn't seem so impressive ATM.
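
A rough back-of-the-envelope sketch of why the pipe count matters for this comparison. It assumes peak single-textured fill rate is simply pipes times clock, normalises the 9700's clock to 1.0, and takes the 1.5 figure from the "50% clock speed advantage" above. The function name and the two NV30 pipe counts are illustrative assumptions, not confirmed figures.

```python
def relative_fill(pipes, relative_clock):
    """Peak single-textured pixels per unit time, in arbitrary units.

    Assumes every pipe retires one pixel per clock and nothing else
    (bandwidth, drivers) is the limit.
    """
    return pipes * relative_clock


# R300 as the baseline: 8 pipes at a normalised clock of 1.0.
r300 = relative_fill(pipes=8, relative_clock=1.0)

# NV30 at a 50% higher clock, under the two pipe counts being argued about.
nv30_if_4_pipes = relative_fill(pipes=4, relative_clock=1.5)
nv30_if_8_pipes = relative_fill(pipes=8, relative_clock=1.5)

print("R300:", r300)
print("NV30 with 4 pipes:", nv30_if_4_pipes)
print("NV30 with 8 pipes:", nv30_if_8_pipes)
```

Under these assumptions a true 8-pipe part at the higher clock would have 1.5x the R300's peak single-textured fill, while a 4-pipe part would have 0.75x. This is only peak arithmetic and says nothing about real benchmark results.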
 
There was a rumour a while ago that NV30 would contain a similar thing to the texture computer used in Rampage. However, IIRC, this was either dropped at the last minute, or is non/partially functional. Could this possibly be the reason that we are seeing poor performance out of NV30? BTW, I'm not an EE, nor studying to be one, just interested in this sort of stuff. Therefore, if I'm way off base, excuse my ignorance.
 
Vince said:
Randell said:
Hmm, wasn't it Hellbinder who said the FX is 4x4?

The hell's wrong with you people? The amount of anti-nVidia bias in here nowadays is sickening.

How the heck is mentioning Hellbinder's "4x4" comment anything remotely resembling "anti-nvidia bias"? It seems likely now that Hellbinder's description may have been based on the description Randell was replying to...was he asking anything more than that?

Like this comment above, are you a moron?

Err..what? Where did that come from? The observed information has the GF FX exhibiting performance equivalent to a 4 pipe architecture in many cases. Why is he a "moron" for asking the question you quoted? Or, is there some other piece of text you forgot to quote that is the reason for your description?

Why do you have such a closed mind about this? Kirk just stated, if you're still dumb enough to think, that the era of fixed function pipelines with TCUs bolted on to a set-piece pipeline is over.

Is this statement supposed to make sense? "if you're still dumb enough to think,"? Eh? The simple observation is the R300 exhibits behavior consistent with being able to output 8 pixels per clock, and the nv30 does not (most of the time)...the "why" is under investigation, and it doesn't take "anti-nvidia" bias to wonder about it in the meantime.

The time when you have a 4*4 or 8*2 architecture is over. Whether this is in the nV30 is irrelevant as you're still thinking like we did way back in 1997.

You mean in 1997, when performance mattered...as opposed to now, when...? :oops:
Are you saying the nv30's performance isn't relevant now, or do you think there is some other reason this 4x? discussion is being brought up?

BTW, please take a look and at least begin to try to show others the least bit of courtesy?
 
demalion said:
How the heck is mentioning Hellbinder's "4x4" comment anything remotely resembling "anti-nvidia bias"? It seems likely now that Hellbinder's description may have been based on the description Randell was replying to...was he asking anything more than that?

It's the mentality.

Err..what? Where did that come from? The observed information has the GF FX exhibiting performance equivalent to a 4 pipe architecture in many cases. Why is he a "moron" for asking the question you quoted? Or, is there some other piece of text you forgot to quote that is the reason for your description?

Are you missing this whole idea? Instead of a 4 pipe architecture, the architecture is composed of an array of processing elements, ALUs, whatever. From there, these can be formed into virtual pipelines, based upon what's being rendered.

Thus, the whole paradigm of a 4 or 8 pipe architecture always exhibiting the same, set-piece performance is over with. As Kirk stated in the quote that you all glossed over in your continued nVidia bashing.

Is this statement supposed to make sense? "if you're still dumb enough to think,"? Eh? The simple observation is the R300 exhibits behavior consistent with being able to output 8 pixels per clock, and the nv30 does not (most of the time)...the "why" is under investigation, and it doesn't take "anti-nvidia" bias to wonder about it in the meantime.

It takes anti-nVidia bias to dismiss the answer:

Pipes don't mean as much as they used to. In the [dual-pipeline] TNT2 days you used to be able to do two pixels in one clock if they were single textured, or one dual-textured pixel per pipe in every two clocks, it could operate in either of those two modes. We've now taken that to an extreme. Some things happen at sixteen pixels per clock. Some things happen at eight. Some things happen at four, and a lot of things happen in a bunch of clock cycles four pixels at a time. For instance, if you're doing sixteen textures, it's four pixels per clock, but it takes more than one clock. There are really 32 functional units that can do things in various multiples. We don't have the ability in NV30 to actually draw more than eight pixels per cycle. It's going to be a less meaningful question as we move forward...[GeForceFX] isn't really a texture lookup and blending pipeline with stages and maybe loop back anymore. It's a processor, and texture lookups are decoupled from this hard-wired pipe.

And instead talk about nVidia's lying and failure, ohh, and throw in some 3dfx bashing too.

Did you ever pause to contemplate that perhaps the reason why the nV30's performance isn't fixed like the R300's is because it is using the approach that Kirk talked about? If you have an array of processing elements, then you can only use their flexibility to do so much, as you only have so many elements to share. Thus, it won't have the consistency of a fixed pipeline, but a virtual pipeline has more plasticity and - as stated - can achieve 16-odd ops in some tasks.

You mean in 1997, when performance mattered...as opposed to now, when...? :oops:
Are you saying the nv30's performance isn't relevant now, or do you think there is some other reason this 4x? discussion is being brought up?

What? Performance matters, but so does architectural elegance and efficiency. I think you have absolutely no idea what I'm talking about.

I do fear, mind you, that this topic is sticking and growing in presence because of two factors: (1) ignorance of the Nv30's true design, and (2) the rampant anti-nVidia rhetoric that's rallying around this ideology that the Nv30 has only "4 pipes" and is thus inferior in the nomenclature race with the R300's "8 pipes".
 
What? Performance matters, but so does architectural elegance and efficiency.

I don't care how elegant it is if it runs too slow. These cards are made for gaming and in gaming, performance matters. This whole thing started because the nv30 isn't performing like most thought it should. How you get this to be Nvidia bashing is beyond me.
 
@vince: do you think it's wrong/bashing to talk about the architecture of the GeForce FX? I want an explanation of why nvidia stated that the FX card would be 8x1. If this isn't the truth, why would it be moronic to discuss the merits or lack of them? Anyway, as long as the discussion is relevant to the topic I would like to hear more.

later,
 
I hope this 8x1 or 4x2 confusion isn't blown out of proportion. Personally, I wouldn't place too much importance on this as long as the drivers can handle things in a smart way. It is generally better to have an 8x1, but I fail to see a driver team not being able to make things work like a 4x2.

If you're someone that is more concerned with exploring PS 2.0 then this becomes probably even less important.

My opinion of course.
 
As a side note, has anybody checked texturing speed with an odd number of textures? It seems possible that the NV30 would be capable of, say, 3-texture performance at the speed of an 8x1 pipeline (depending on how flexible the pipelines are).

If this is true, then the NV30 working as a 4-pipeline architecture when single-texturing is used will be of little consequence, as it will usually be bandwidth-limited in that situation anyway.

Then the real question becomes how much processing power at each pixel shader accuracy does the NV30 have?
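
The odd-texture-count idea above can be sketched with a toy throughput model. Everything here is an assumption for illustration: the function names, the idea that a rigid NxM pipeline needs ceil(textures / M) passes per pixel, and the "pooled" variant in which 8 TMUs can be shared freely between pixels. None of it is confirmed NV30 behaviour.

```python
from fractions import Fraction
from math import ceil


def rigid_rate(pipes, tmus_per_pipe, textures):
    """Pixels per clock for a fixed pipes x TMUs layout.

    Each pixel stays in its pipe and loops until all textures are applied,
    so an odd texture count on a 2-TMU pipe wastes one TMU slot.
    """
    passes = ceil(textures / tmus_per_pipe)
    return Fraction(pipes, passes)


def pooled_rate(total_tmus, max_pixels, textures):
    """Pixels per clock if TMUs could be shared between pixels (hypothetical)."""
    return min(Fraction(max_pixels), Fraction(total_tmus, textures))


for textures in (1, 2, 3, 4):
    print(
        textures,
        "textures:",
        "4x2 rigid =", rigid_rate(4, 2, textures),
        "| 8x1 rigid =", rigid_rate(8, 1, textures),
        "| 8 pooled TMUs, 4 px max =", pooled_rate(8, 4, textures),
    )
```

In this model a rigid 4x2 layout drops to 2 pixels per clock with 3 textures, while 8 pooled TMUs would manage 8/3, the same as a rigid 8x1. A 3-texture fill-rate test would be one way to tell the two cases apart.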
 
I think the important factor is not so much what format is better under what conditions, but rather why Nvidia did not state exactly what format the NV30 is. Surely if their configuration was technically superior they would have PR'ed the thing to death by now.
 
Thus, it won't have the consistency of a fixed pipeline, but a virtual pipeline has more plasticity and - as stated - can achieve 16-odd ops in some tasks.

:rolleyes: By the same token, R300 can achieve 24 ops in certain situations. (8*[1 tex lookup + 1 scalar op + 1 vector op]); Kirk is almost certainly just referring to GFfx's ability to do (8*[1 tex lookup + 1 math op]), and it is far from clear that GFfx's shader is as flexible in terms of the circumstances in which this can occur.
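
The per-clock counts in that comparison are just a multiplication, sketched here. The per-pipe unit lists come from the expressions above; the function name and dictionary keys are made up for illustration, and the GFfx figure is the reading of Kirk's statement given above, not a confirmed spec.

```python
def ops_per_clock(pipes, ops_per_pipe):
    """Peak operations per clock: pipes times the ops each pipe can issue."""
    return pipes * sum(ops_per_pipe.values())


# R300: 8 pipes, each with 1 texture lookup + 1 scalar op + 1 vector op.
r300 = ops_per_clock(8, {"tex_lookup": 1, "scalar_op": 1, "vector_op": 1})

# GFfx as interpreted above: 8 x (1 texture lookup + 1 math op).
gffx = ops_per_clock(8, {"tex_lookup": 1, "math_op": 1})

print("R300 peak ops/clock:", r300)
print("GFfx peak ops/clock:", gffx)
```

These are best-case peaks only; how often either chip can actually keep every unit busy is exactly what is in dispute.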

The point is, both chips (indeed any DX9 chip) have a fixed-function pipeline and a shader pipeline, and both parts are crucial to performance on the workloads these GPUs will render. The fixed-function portion of GFfx is 4x2 with the ability to do z/stencil writes at 8/clock. The fixed-function portion of R300 is 8x1. It does not make sense to categorize pixel shader pipelines as "NxM", but R300's has more throughput per clock (and at higher precision, FWIW).

Now. The Kirk paragraph you keep repeatedly quoting is little more than misleading marketing flim-flam to cover up the fact that the "8 pixel pipelines" explicitly claimed on NV30 PR are a fiction. This is not to diss Kirk, who is a great engineer, but only to put the quote in its proper context, i.e. something impressive-sounding to tell a non-technical audience in place of answering a question that marketing does not want answered. From all available evidence, the NV30 is no more flexible (with two exceptions, noted below), and quite a bit less successful at reusing functional units in the most efficient way, than R300.

The first of those exceptions is the ability to use FP32 at half-rate by combining some functionality (and adding more) from 2 FP16 units. This is, unfortunately, of extremely dubious utility outside the DCC market. The second exception is the ability to run z/stencil pass at double the fillrate of normal fixed-function rendering, which is indeed a clever and worthwhile design choice.

Other than that, though, there's no reason to call NV30's pixel pipeline any more flexible, elegant or plastic than R300's. The decision not to make (the fixed-function portion of) NV30 8x1 was the correct one, given the 128-bit DDR interface and the ability to output z/stencil at 8/clock. And Kirk's statement is at least commendable in the sense that he did not repeat the outright lie that NV30 has 8 pixel pipelines. But to try to make a virtue of Nvidia's deception is truly going too far.
 