The Inquirer Trying to Scoop B3D!

I'd be interested in fillrate tests with 3 textures/arithmetic operations. And I expect the GFFX to act like an 8 pipeline design there (i.e. 8 pixels/3 clocks average), because it is not output limited.
 
THe_KELRaTH said:
Heh, it may be a tad difficult to get any info from the NVnews forum, as any threads that might put Nvidia in a bad light are being closed.

Damn, you're right! I was there last night, and there was a poll asking which card you would buy now...... and now it's gone! How sad.....
 
Crusher said:
It seems you're viewing me as defending Vince as opposed to simply addressing your statements for their own value. I never said Vince's response was called for, and I think the only time I mentioned it was when I was talking about one of the underlying points he made. That point being that you can't blame something for being 4x2 instead of 8x1 if it's really neither of those things, and I extended that line of reasoning with my own view, which was to say the reason marketing called it 8x1 was probably because some engineer informed them that it's capable of producing 8 single textured pixels per clock under some circumstances. That doesn't mean it's capable of running a single textured fillrate test at 8 pixels per clock, but that doesn't make it a useless ability either, as you seem to claim it is. Wait, I'd better make a complete quote of exactly what you said so you can't claim I'm manipulating your meaning again...

demalion said:
What isn't useful is calling something 8x1 when it can't exhibit the characteristics associated with the name in any useful circumstance.

And from the Inquirer article:

some NVIDIA guy said:
"GeForce FX 5800 and 5800 Ultra run at 8 pixels per clock for all of the following:
a) z-rendering
b) stencil operations
c) texture operations
d) shader operations"

Therefore, your statement means that you consider z-rendering, stencil operations, texture operations, and shader operations to all be useless.

First, let's get over your incomplete quoting habit:

"Only color+Z rendering is done at 4 pixels per clock".

Then, let's dismiss texturing, as a 4x2 architecture would operate equivalently to an 8x1 architecture, and the stated problem is when it behaves differently than a 4x2 and like an 8x1. EDIT: except for Chalnoth's stipulation.

We are left with "shader operations that do not output z buffer/stencil and color data for the same pixel".

You're right, outside of replying to Vince's post, that statement of mine you quoted is incorrect, as a useful circumstance for this behavior will be when performing stencil shadow operations, depending on how effective Z compression is. That's why I had "so far" in my original comment, but I guess I slipped up when addressing your reply.

I'm not sure in what other circumstances, where 8x1 differs from 4x2 on the 128-bit bus, the GF FX will exhibit 8x1 characteristics, and we won't know until we discover why the shader performance is exhibiting the characteristics it is. The info the Inquirer is stating, and that Tech-Report says they received from nvidia, seems to indicate these opportunities do exist, but "so far" it hasn't been demonstrated.

Please note, Crusher, that Chalnoth's reply addressing your concern was a bit more brief.
 
Crusher said:
The only two reasons it underperforms expectations are that people had the erroneous idea that being delayed would make it significantly faster, and that it was supposed to stomp all over the R300 no matter what.

Wrong.

The MAIN reason why it underperforms expectations is because it has been touted as an 8 pipe card running at 500 MHz (a pixel fill rate of 4 GPix/sec), meaning that it was understood to have a large pixel fill rate advantage over the R300.

Had the card been advertised / previewed with something more like a 4x2 architecture (or 4 pixel pipes), it would be performing more or less very close to expectations. It would have a "texel rate" advantage, but a pixel rate disadvantage. And thus, it would accordingly have performance advantages and disadvantages more in line with what we've seen.
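To put rough numbers on that, here is a minimal sketch of the peak-rate arithmetic. It assumes the textbook formulas (pixel rate = pipes x clock, texel rate = pipes x texture units per pipe x clock) and ignores memory bandwidth entirely. The 325 MHz figure for an 8x1 R300 is an assumption (the 9700 Pro's shipping clock), and the function names are made up for illustration.

```python
def pixel_rate_mpix(pipes, clock_mhz):
    """Peak pixel fill rate in Mpixels/s: one pixel per pipe per clock."""
    return pipes * clock_mhz


def texel_rate_mtex(pipes, tmus_per_pipe, clock_mhz):
    """Peak texel rate in Mtexels/s: one texel per texture unit per clock."""
    return pipes * tmus_per_pipe * clock_mhz


# (pipes, texture units per pipe, clock in MHz); labels are illustrative only.
configs = {
    "NV30 as advertised, 8x1 @ 500": (8, 1, 500),
    "NV30 read as 4x2 @ 500": (4, 2, 500),
    "R300 8x1 @ 325 (assumed clock)": (8, 1, 325),
}

for name, (pipes, tmus, mhz) in configs.items():
    pix = pixel_rate_mpix(pipes, mhz)
    tex = texel_rate_mtex(pipes, tmus, mhz)
    print(f"{name}: {pix} Mpix/s, {tex} Mtex/s")
```

Under these assumptions the 4x2 reading gives 2000 Mpix/s against 2600 for the R300, but 4000 Mtex/s against 2600, which is the texel rate advantage and pixel rate disadvantage described above.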

All we're asking for is an honest representation (via specs) of the card's architecture. We didn't get it, and thus, there is a backlash. Quite simple.
 
"GeForce FX 5800 and 5800 Ultra run at 8 pixels per clock for all of the following:
a) z-rendering
b) stencil operations
c) texture operations
d) shader operations"

Z-rendering and stencil ops can use those units that are used for 'free' MSAA. As others have said, that would make a very good design choice.

Texture ops: 4x2 = 8 textures per cycle. Nothing surprising here.

Shader ops: if they can do two 16-bit FP instructions per cycle and they have 4 pipes, that makes sense too. It could also mean 1 arithmetic op and 1 texture op (something like the R300 but without the scalar unit).

And they are hinting or saying (not sure about that) that they can't do more than 4 color writes per cycle.

Short of clearly and officially stating that the NV30 is a 4x2 architecture (which seems to be what they are trying to avoid, maybe for PR reasons), I think those comments are pointing to something very near to a 4x2 architecture. It looks like they don't want to lie, but they don't want to say explicitly what those numbers are telling either.
 
THe_KELRaTH said:
If it turns out that the NV30 does in fact have a flexible pipeline where the format can change in order to provide the highest level of efficiency, why on earth didn't the NV PR ppl supply this info?

Perhaps because it doesn't work quite the way it is supposed (or intended) to... :?
 
Ichneumon, CineFX doesn't work as a usable package: it's far too slow at high precision and it lacks gamma correction, but it's a central part of their PR.
 
Fuad emailed me back regarding the article posted at The Inquirer. I have asked his permission to post his reply in full and if he lets me I will post it here. Basically he does acknowledge the Beyond3D forums ;)
 
Rev,

I agree with it. To me it's not a big deal what the card has (8x1, 4x2, 16x.5, etc.). I am just a little miffed that we were told 8 pipes without a little footnote saying (under said conditions) until now.
 
jb, I said that because I personally have yet to encounter a possible scenario where this really matters. I could be wrong but I'll bet most developers don't care either.

But for clarification's sake, it is better to have this sorted out.
 
I just don't agree that it does not matter. If it did not matter we would not even be having this discussion right now.

How can you say it does not matter when they have half the single texture performance, half the DX9 shader performance, etc.? How conveniently everyone forgets that the card has a 175 MHz clock speed advantage. That's like benching a 9700 Pro against a 9700 Pro + 1/2 of another one. Yet they barely win by 10% in most cases.

In all the traditional tests that do use multitexture they clearly have a clock for clock fill rate advantage of 1.4G Texels, which is the only thing keeping their head above water right now. In raw shader performance they fall by the wayside.
 
HellBinder,

What I meant is I don't care what the internal specs of the card are (8x1, 4x2, etc.) so long as the performance is there. I am not debating if this is the case or not. I was just saying the specs are not what matters, it's the performance, and what we were told by NV is what I have issues with.
 
Joe DeFuria said:
Had the card been advertised / previewed with something more like a 4x2 architecture (or 4 pixel pipes), it would be performing more or less very close to expectations. It would have a "texel rate" advantage, but a pixel rate disadvantage. And thus, it would accordingly have performance advantages and disadvantages more in line with what we've seen.

Joe, hypothetical question for ya. Just chew it around (no forward-looking pun intended) and comment if you'd like:


I give your kid 10 wooden blocks to play around with and tell him that by grouping them together into lines of various sizes, he'll be rewarded with candy that corresponds to the line he formed.

4 blocks in a line = Reese's (my favorite, not his)
3 blocks in a line = Snickers bar
2 blocks in a line = Butterfinger bar
1 block in a line = Kit Kat bar (wouldn't really be a line, more like a point, but)

So, being the young enterprising scholar that he is - he quickly finds that because I only gave him 10 blocks, there is a finite upper bound on the amount of candy that he can get from my cheap ass at any point x in time.

After a bit more conjecture and play, he finds that at any one time he can only receive 2 Reese's, or 3 Snickers, 5 Butterfingers, or 10 Kit Kats.

But, being your typical kid, he finds that his candy preference changes as time and his tastes vary. Hey, who wants to eat nothing but Snickers bars? So, he plays a bit more and finds that, in reality, he only has an upper bound on what combination of candy he can have with respect to a point t in one frame of time. In theory, he sees that he has tremendous flexibility to pick and choose any combination he wants - as long as it fits under the bound imposed by Vince of 10 blocks.
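For anyone who wants to play with the block budget, here is a minimal sketch of it. The per-candy block costs and the 10-block total come straight from the list above; the function names are made up for illustration, and this models only the story, not any actual hardware.

```python
BLOCKS = 10  # total blocks handed to the kid

# Blocks needed to form the line that earns each candy (from the list above).
COST = {"Reese's": 4, "Snickers": 3, "Butterfinger": 2, "Kit Kat": 1}


def max_of_one_kind(candy, blocks=BLOCKS):
    """Most pieces of a single candy the block budget allows at one time."""
    return blocks // COST[candy]


def blocks_used(order):
    """Total blocks needed for an order given as {candy: count}."""
    return sum(COST[candy] * count for candy, count in order.items())


def fits(order, blocks=BLOCKS):
    """True if the mixed order stays within the block budget."""
    return blocks_used(order) <= blocks


for candy in COST:
    print(candy, max_of_one_kind(candy))

# A mixed order is fine as long as the total stays at or under 10 blocks.
print(fits({"Snickers": 2, "Butterfinger": 2}))
print(fits({"Reese's": 2, "Snickers": 1}))
```

The only constraint in this sketch is the total; nothing fixes how the blocks are grouped ahead of time.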


So, his dad :LOL: comes home and sees 2 Snickers wrappers lying in the middle of the floor next to 2 Butterfinger wrappers. So he assumes the blocks are grouped in 2 main 'Snickers' pipelines of 3 blocks apiece, each of these having a 'Butterfinger' pipeline. All is good.

The next day he comes home and sees 1 Reese's wrapper lying beside 3 Butterfinger wrappers. Whoa! This would require 1 'Reese's' pipeline composed of 4 blocks, with it having 3 'Butterfinger' pipelines strapped on. He goes into shock and asks how this could be. He thinks about it and decides that he was probably mistaken based on this new observed data. But, instead of mouthing off in typical Anti-Snickers fashion, he decides to ask his kid first.

His son, being the prodigy he is, didn't even think this was a problem or source of confusion and says, "Dad, you're so old-school - nobody superglues their blocks together anymore since I thought this up." And then goes on to describe the above process, that there are no fixed "4*4"-esque pipelines, but rather virtual pipelines that the son changes depending on what he wants at time t.

But then the dad asks, "But, but... is it 1*3 or 2*2?" And the son walks away....

So, the question is... did nVidia superglue their blocks or not?

PS. You'll be waiting a long time for that apology
 
who cares?

Here is a new candy story for you:
You tell your kid "I can give you 8 units worth of candy every day. A Reese's is worth 1 unit, Butterfinger 2, Snickers 4."
The first day, all is well - your kid asks for 4 Butterfingers. But then the next day, your kid asks for 4 Reese's and two Butterfingers, and you tell him "whoops! It's really 8 units of candy, with a maximum of 4 pieces!"
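Spelled out as a quick sketch, with the unit values and the 4-piece cap taken from the story and the function names made up for illustration. Reading units as texture operations and pieces as pixels written per clock is only my loose mapping, not anything stated by NVIDIA:

```python
UNITS_PER_DAY = 8  # the budget the kid was promised
MAX_PIECES = 4     # the cap that only gets mentioned on day two

UNIT_COST = {"reeses": 1, "butterfinger": 2, "snickers": 4}


def units(order):
    """Total candy units in an order given as {candy: count}."""
    return sum(UNIT_COST[candy] * count for candy, count in order.items())


def pieces(order):
    """Total number of individual pieces in the order."""
    return sum(order.values())


def allowed_as_promised(order):
    """The rule as originally stated: only the unit budget applies."""
    return units(order) <= UNITS_PER_DAY


def allowed_in_practice(order):
    """The rule as it turns out: unit budget plus the piece cap."""
    return allowed_as_promised(order) and pieces(order) <= MAX_PIECES


day_one = {"butterfinger": 4}
day_two = {"reeses": 4, "butterfinger": 2}

for label, order in (("day one", day_one), ("day two", day_two)):
    print(label, allowed_as_promised(order), allowed_in_practice(order))
```

Both orders come to 8 units, but day two is 6 pieces, so it passes the promised rule and fails the real one.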

The question is, does it perform?

The answer is:
Not up to expectations created by marketing.
Not up to expectations created by 8x1 claims and 500 MHz clockspeed.

That is the problem.
 
Only we haven't yet seen benchmarks with multitexturing and an odd number of textures (well, I guess non power-of-two would be a better way to put it). As long as the only performance deficiency is at single texturing, then it really is not a big deal. If there are also performance issues at 3, 5, 6, and 7 textures per clock, then the FX does have some architectural problems.
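As a rough illustration of why the odd texture counts are the interesting ones, here is a sketch of an idealized model. It assumes each texture unit applies one texture per clock, that a pipe with 2 texture units needs ceil(n/2) clocks for n textures, and that bandwidth and output limits never bite. The function name is made up, and real hardware may well not behave this way.

```python
from fractions import Fraction


def pixels_per_clock(pipes, tmus_per_pipe, textures):
    """Idealized peak pixels per clock for a pipes x tmus_per_pipe design.

    Each pipe spends ceil(textures / tmus_per_pipe) clocks on one pixel.
    """
    clocks_per_pixel = -(-textures // tmus_per_pipe)  # integer ceiling
    return Fraction(pipes, clocks_per_pixel)


for n in range(1, 9):
    as_8x1 = pixels_per_clock(8, 1, n)
    as_4x2 = pixels_per_clock(4, 2, n)
    marker = "differs" if as_8x1 != as_4x2 else "same"
    print(f"{n} textures: 8x1 = {as_8x1}, 4x2 = {as_4x2} ({marker})")
```

In this model the two layouts only diverge at odd texture counts (6 comes out the same for both), so a benchmark at 3, 5 or 7 textures is what would separate them.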
 
Althornin said:
who cares? Here is a new candy story for you:

Cute :)

The question is, does it perform?

Performance as such isn't a function that can actively and reliably be used to gauge a modern underlying architecture. If you stick with my example, you'll get unexpected speed-ups when doing certain tasks in comparison to a legacy architecture that can't supply the processing power per cycle - which would seem to indicate, say, a 16*0 architecture. You'll also have situations, most of them heavily taxing, that the legacy architecture with fixed processing elements may actually win at, and where the flexible architecture is outputting at 4*2-like levels.

The answer is:
Not up to expectations created by marketing.
Not up to expectations created by 8x1 claims and 500 MHz clockspeed.

Expectations of an 8*1 claim? What the heck? That's your fault then. If the architecture can architecturally output 8 pixels, then it's most comparable - especially in marketing eyes - to a fixed function 8*1 architecture that's easy for people to comprehend.

Don't you see this yet? The idea is that 'expected' performance is irrelevant as you don't have fixed pipelines like you once did. It's plastic, it's flexible, it's intelligently designed.

Thus, perhaps it's time to stop saying xxx architecture is an "8*1 design", but rather that xxx architecture outputs like 8*1 when doing this, and like 4*2 when doing this....

You'd think that people would have eased their way into this thinking with the progression in 3D architectures: from totally fixed -> to loop-back -> to pipeline flexibility. Apparently, pipelines are a great way to compare IHVs.
 
Reverend said:
Then tell him to do so officially in the future.

I did, Rev.. I seriously doubt he would listen to me. I am not even an acquaintance of his and this was the first time I had e-mailed him.
 
Vince said:
Don't you see this yet? The idea is that 'expected' performance is irrelevant as you don't have fixed pipelines like you once did. It's plastic, it's flexible, it's intelligently designed.

Don't you see this yet?

All this "anti-nvidia rhetoric" you keep complaining about comes from what I've said.
If you are too dense to get it, that's YOUR problem.
 