NV40: 6x2/12x1/8x2/16x1? Meh. Summary of what I believe

Vince said:
OpenGL guy said:
Except that every product since the 5800 Ultra has shipped with clocks under 500 MHz. Where's the alleged superiority of stencil ops?

  • In this discussion are we referring to the 5800U?
  • If so, is the 5800U superior in absolute [theoretical] performance over the then-contemporary competition?

End of story as that's all I'm stating. Stop trying to enlarge this debate so you can show ATI superiority/nVidia inferiority. We all realize we're viewing a singular case at a specific point in time.
Maybe you should read people's posts more carefully.
Democoder said:
p.s. what's with this fixation on the 5800?
Vince said:
So, it's only "real" if it meets some arbitrary bound you describe? :rolleyes: Come on, bud, the product in question is being utilized in people's PCs to good effect; it's most definitely real.

I still consider most high-end sports cars as "real" even though there are only a handful in circulation. For example, the Ferrari F50 (one of my favorite looking cars) only saw 349 produced and sold at high price. Is it not "real" either? This is semantics cum insanity.
Sure, that's a real close comparison. How many other graphic chip models have sold in such low quantities as the NV30 Ultra? Cars, like the Ferrari, that are sold in such low quantities cost hundreds of thousands of dollars, and perform like it. Can't say the same about the 5800 Ultra on either account.
Yes, it's a close comparison for the reasons you stated. Both are marketed by their producer as special, high-end, high-price, low-production-volume parts which cater to a niche during a limited production run, after which they are superseded by cheaper, lower-performing products.
Except that the 5800 Ultra has to be the lowest production run of any high end chip ever, excepting perhaps the Savage 2000. Sure, this is common business practice in the 3D graphics world. :rolleyes: If you can't see the difference, then there's no point talking to you.
And get this, both are "real." True story too.
Take your attitude and shove it.
Except that I state, again, where's the superiority when every newer product has shipped at less than 500 MHz? I also use this as evidence that the 5800 Ultra was not a product. If the architecture is so "pipelined and revolutionary" then newer products should be shipping at ever higher speeds.
Get this through your head: nobody is debating ATI's superiority since then. We all know it's there, we all know the current situation in the 3D world. Nobody in this current debate cares about anything but what is being debated here - namely the 5800U and its stencil performance relative to the competition at the time. What ATI and nVidia and Michael Jackson did since then has no effect whatsoever on how the 5800U compares with the competition. Get off your frickin' selfish, self-reinforcing, pathetic ATI high-horse and try to act like an engineer and not a Derek Perez disciple.
Get this through your head: I was questioning Democoder's conclusions about stencil optimizations. Now you butt in and get everything completely wrong and even resort to personal attacks. Good show!

P.S. Again, take your attitude and shove it.
 
Sigh. It's such a simple concept: someone can "design for" something and still not be the performance leader. AMD went with a CPU design specifically intended to maximize per-clock performance. Intel went with a design prioritized to achieve high clock scalability. AMD isn't beating the pants off Intel, even though, if they could clock an Athlon64 at 3.2 GHz (real), it would blow the doors off a 3.2 GHz P4.

Nvidia designed in the ability to double stencil pipeline performance per clock, but their pipelines ended up so complex they could only put half as many on a chip, and moreover, big problems with their semiconductor process threw a wrench into their plans to scale clocks up to make up for potentially having fewer pipelines.

It turns out that NVidia and ATI's stencil performance are roughly similar. So how does that invalidate the fact that NVidia added a bunch of extra stencil features and a special double-pumped stencil mode? If I said "AMD went for per-clock instruction efficiency", but it turned out that Intel still equaled or beat them, does it invalidate the fact that Intel and AMD engineers in fact DID take radically different design philosophies?

It is quite clear to me that ATI put way more effort into making shaders run well, and supporting all the various FP texture modes and formats, whereas NVidia did not. But it is equally clear to me that ATI's stencil performance is more of a serendipitous nature -- it fell out of their design naturally, rather than being a deliberate high-priority item, e.g. "make stencils as fast as we can".

Can't people understand that this is not a value judgement on either company or product?
 
webmedic said:
Man this thread is a sick joke. Which architecture was superior? Which one worked and which one did not? Which card performed and which one did not? Which one did not have to cheat in the drivers for the last year and which one did just to keep up?

Hey...we're trying to warm up for next month here.... 8)
 
Joe DeFuria said:
At least DC can be civil in his disagreements, and not blindly hypocritical.

Ahh, right. Civil... like you? But, that's not an excuse; nor do I need one. Especially when debating one who wants me to believe that the 5800U isn't a "real" "product." Others have mentioned his prejudice in response, which is unwarranted; I'm not the only one.
 
Joe DeFuria said:
webmedic said:
Man this thread is a sick joke. Which architecture was superior? Which one worked and which one did not? Which card performed and which one did not? Which one did not have to cheat in the drivers for the last year and which one did just to keep up?

Hey...we're trying to warm up for next month here.... 8)


thanks at least somebody read my post.


I assume more of the same. Nvidia hasn't had enough lumps to learn its lessons yet. I think you'll find in Uttar's comments the same kind of general idea.
 
Vince said:
Ahh, right. Civil... like you?

Absolutely not like me. I'm too "Italian" to not return the personal cracks once they are levied.

But, that's not an excuse; nor do I need one. Especially when debating one who wants me to believe that the 5800U isn't a "real" "product." Others have mentioned his prejudice in response, which is unwarranted; I'm not the only one.

Oh...others have mentioned that the 5800U isn't a "real product" too, we're not the only ones.
 
Well, sometimes I am deliberately uncivil in the General Forums to troll. :)

Maybe OGL Guy can enlighten us as to the changes made to stencil implementation from R200 to R300, or R100 for that matter, that were deliberate optimization attempts, i.e. to squeeze more stencils per-transistor, or per-clock, or per-pipeline, or whatever metric you want to choose. Again, this is for education purposes, and not to criticize ATI's design decisions.
 
Joe DeFuria said:
Vince said:
Ahh, right. Civil... like you?

Absolutely not like me. I'm too "Italian" to not return the personal cracks once they are levied.

But, that's not an excuse; nor do I need one. Especially when debating one who wants me to believe that the 5800U isn't a "real" "product." Others have mentioned his prejudice in response, which is unwarranted; I'm not the only one.

Oh...others have mentioned that the 5800U isn't a "real product" too, we're not the only ones.


it wasn't a real product? I heard it was a leaf blower that doubled as a jet engine.
 
OpenGL guy said:
Maybe you should read people's posts more carefully.
Democoder said:
p.s. what's with this fixation on the 5800?

Ok, first off, I'm not Democoder. Second, let's put this in perspective and see why I'm singling it out:

OpenGL guy said:
I know that. My question was: How many 5800 Ultras were shipped?

This is the response upon which I entered the debate. I then stuck with the 5800U, as that's why I entered, and you kept firing back responses, so it couldn't have bothered you. So try to remember.

Take your attitude and shove it.

Sorry... is that a "real" comment?

Get this through your head: I was questioning Democoder's conclusions about stencil optimizations. Now you butt in and get everything completely wrong and even resort to personal attacks. Good show!

Actually I entered here when I asked you very nicely how the number of shipped products is indicative of its underlying architectural ideology.

I was then treated to repeated comments such as this or such as this, which took my response out of context as per my original comment about shipping volume and put it into one of superiority in products other than the 5800U as compared with ATI's SKUs.

Now, I don't think it takes a neo-Ramanujan to see that you need to put people's comments into perspective before laying on the standard ATI denial/shift to favorable ground argument.
 
Sidenote and a small reminder on stencil op performance:

FableMark 1024*768*32bpp:

9800XT@400MHz:

noAA: 76.0 fps
2xAA: 64.9 fps
4xAA: 38.3 fps
6xAA: 23.9 fps

5900@400MHz:

noAA: 62.0 fps
2xAA: 42.6 fps
4xAA: 36.7 fps

Yes it is a synthetic application, yet it is also very fill-rate limited.

Links from B3D reviews (I suspect those were run at default 16bpp):

NV38/NV35:

http://www.beyond3d.com/previews/nvidia/nv38/index.php?p=9

http://www.beyond3d.com//previews/nvidia/nv35/index.php?p=22#stencil

R350/R360/R300:

http://www.beyond3d.com/reviews/ati/9800xt_r360/index.php?p=19

Going by Wavey's results (where results come from the same system AFAIK) I can see a 9700PRO getting 65.5 fps and a 5800Ultra getting 66.1 fps.

1024*768*32bpp
9700PRO@325MHz:

noAA: 57.0 fps
2xAA: 46.7 fps
4xAA: 28.6 fps

Now, despite this being obviously OT, considering stencil performance, can someone kindly explain to me where exactly it has been proven so far that the NV30, due to its higher clock speed, yields better stencil performance, and that in a purely stencil-op-concentrated synthetic?
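
As a rough aid to reading the numbers above, here is a minimal Python sketch that normalizes the 1024*768*32bpp noAA FableMark results by core clock and works out how much of the noAA frame rate survives at 4xAA. The dictionary layout and function names are made up for illustration; the figures are only those quoted in this post. Dividing by core clock ignores memory clocks, drivers and any system differences, so treat it as a sanity check rather than a measurement. Note that it covers the 5900 (NV35), not the NV30 itself.

```python
# FableMark figures quoted above: (core clock in MHz, {AA mode: fps}).
RESULTS = {
    "9800XT": (400, {"noAA": 76.0, "2xAA": 64.9, "4xAA": 38.3}),
    "5900": (400, {"noAA": 62.0, "2xAA": 42.6, "4xAA": 36.7}),
    "9700PRO": (325, {"noAA": 57.0, "2xAA": 46.7, "4xAA": 28.6}),
}


def fps_per_mhz(card, mode="noAA"):
    """Frames per second divided by core clock (a crude per-clock metric)."""
    mhz, fps = RESULTS[card]
    return fps[mode] / mhz


def aa_retention(card, mode="4xAA"):
    """Fraction of the noAA frame rate that is kept in the given AA mode."""
    _, fps = RESULTS[card]
    return fps[mode] / fps["noAA"]


if __name__ == "__main__":
    for card in RESULTS:
        print(f"{card}: {fps_per_mhz(card):.3f} fps/MHz noAA, "
              f"{aa_retention(card):.0%} of noAA fps kept at 4xAA")
```

On these figures the 5900 does less per core clock than either Radeon without AA, but gives up proportionally less of its frame rate at 4xAA.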
 
Ailuros said:
Now, despite this being obviously OT, considering stencil performance, can someone kindly explain to me where exactly it has been proven so far that the NV30, due to its higher clock speed, yields better stencil performance, and that in a purely stencil-op-concentrated synthetic?

The discussion was on the differing ideologies behind each respective architecture and that each IHV has talented people capable of producing valid and high-performance ICs that are specialized to their beliefs of what's important in the marketplace. So, it's a theoretical discussion and comments like this serve as the proof you seek:

OpenGL Guy, in response to me (http://www.beyond3d.com/forum/viewtopic.php?p=225324#225324), said:
If you want to say that the 5800 Ultra was 50% faster because of its higher clock speed, I won't dispute that.
 
DemoCoder said:
Maybe OGL Guy can enlighten us as to the changes made to stencil implementation from R200 to R300, or R100 for that matter, that were deliberate optimization attempts, i.e. to squeeze more stencils per-transistor, or per-clock, or per-pipeline, or whatever metric you want to choose. Again, this is for education purposes, and not to criticize ATI's design decisions.
How about 16 Z/stencil ops per cycle when AA is enabled?

Can I go now or is class still in session?
 
Vince said:
Ailuros said:
Now, despite this being obviously OT, considering stencil performance, can someone kindly explain to me where exactly it has been proven so far that the NV30, due to its higher clock speed, yields better stencil performance, and that in a purely stencil-op-concentrated synthetic?

The discussion was on the differing ideologies behind each respective architecture and that each IHV has talented people capable of producing valid and high-performance ICs that are specialized to their beliefs of what's important in the marketplace. So, it's a theoretical discussion and comments like this serve as the proof you seek:

OpenGL Guy, in response to me (http://www.beyond3d.com/forum/viewtopic.php?p=225324#225324), said:
If you want to say that the 5800 Ultra was 50% faster because of its higher clock speed, I won't dispute that.

It's obvious that I haven't followed the discussion for the past few pages and I really don't have much to disagree with the paragraph above. Au contraire I DO disagree with OpenGL Guy's note that the NV30 was 50% faster. Faster than what and exactly where, because I'm obviously blind.

I can see more raw fill-rate on paper and that's about it; I can also see severe possible bandwidth constraints for that very same fill-rate, again on paper. In the case of the NV30 it's 8*500 = 4.0 GPixels/s vs. 8*325 = 2.6 GPixels/s stencil fill-rate on the R300, yet it's also 16 GB/s vs. 19.84 GB/s bandwidth. Performance numbers are up there.
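
The paper arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and the "bytes per stencil pixel" figure is just bandwidth divided by peak fill-rate; it assumes no compression or caching and that every stencil op goes out to memory, which real hardware does not do, so it only illustrates why the higher paper fill-rate may be hard to feed.

```python
def stencil_fill_gpixels(ops_per_clock, core_mhz):
    """Peak stencil fill-rate in GPixels/s: ops per clock times core clock."""
    return ops_per_clock * core_mhz / 1000.0


def bytes_per_pixel(bandwidth_gb_s, fill_gpixels):
    """Memory bandwidth available per stencil pixel at peak fill-rate."""
    return bandwidth_gb_s / fill_gpixels


# Figures taken from the post: 8 stencil ops/clock on both chips,
# 500 MHz vs. 325 MHz core, 16 GB/s vs. 19.84 GB/s memory bandwidth.
nv30_fill = stencil_fill_gpixels(8, 500)
r300_fill = stencil_fill_gpixels(8, 325)

nv30_budget = bytes_per_pixel(16.0, nv30_fill)
r300_budget = bytes_per_pixel(19.84, r300_fill)

if __name__ == "__main__":
    print(f"NV30: {nv30_fill:.1f} GPixels/s, {nv30_budget:.2f} bytes/pixel")
    print(f"R300: {r300_fill:.1f} GPixels/s, {r300_budget:.2f} bytes/pixel")
    print(f"Paper fill-rate ratio NV30/R300: {nv30_fill / r300_fill:.2f}")
```

The paper fill-rate ratio comes out at roughly 1.5, which is where a "50% faster" figure would come from, while the per-pixel bandwidth budget at peak rate is markedly smaller on the NV30 than on the R300.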
 
Ailuros said:
Links from B3D reviews (I suspect those were run at default 16bpp)
As far as any B3D testing is concerned, FableMark is always done in 32bpp (as are all the benchmarks).

Edit: Colour depth makes virtually no difference on a 5900U anyway:

Code:
	640	800	1024	1280	1600
16 bit	168.0	111.4	70.4	43.5	30.3
32 bit	167.3	110.9	69.9	43.0	29.9

That's using the 52.16 drivers on the same test machine used in all the testing that I do.
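
To put a number on "virtually no difference", here is a small Python sketch that computes the percentage drop from 16-bit to 32-bit for each column of the table above. The column headers are assumed to be horizontal resolutions, and the variable and function names are made up for illustration.

```python
# Columns of the table above (assumed to be horizontal resolution).
RESOLUTIONS = [640, 800, 1024, 1280, 1600]
FPS_16BIT = [168.0, 111.4, 70.4, 43.5, 30.3]
FPS_32BIT = [167.3, 110.9, 69.9, 43.0, 29.9]


def percent_drop(fps_16, fps_32):
    """Percentage of frame rate lost when going from 16-bit to 32-bit."""
    return (fps_16 - fps_32) / fps_16 * 100.0


DROPS = [percent_drop(a, b) for a, b in zip(FPS_16BIT, FPS_32BIT)]

if __name__ == "__main__":
    for res, drop in zip(RESOLUTIONS, DROPS):
        print(f"{res}: {drop:.2f}% slower at 32-bit")
```

Every column comes out at under 1.5%, with the gap widening slightly as the resolution goes up.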
 
OpenGL guy said:
How about 16 Z/stencil ops per cycle when AA is enabled?

Can I go now or is class still in session?


Well, let's talk per pipe: Any changes besides adding multisampling? Most of the discussion of zixel fillrate to date has been with multisampling off.

e.g. any stencil/z fill changes with multisampling off?
 
Neeyik said:
Ailuros said:
Links from B3D reviews (I suspect those were run at default 16bpp)
As far as any B3D testing is concerned, FableMark is always done in 32bpp (as are all the benchmarks).

Thank you for the clarification.
 
D3DFableMark is a benchmark program featuring a puppet theatre in which a well-known fable, "The Hare and the Tortoise", is performed. Every object in the scene has soft-edged shadows projected onto the stage background and the rendering of these accurate shadows form the basis of this benchmark.

All shadows in D3DFableMark are soft-edged, and are rendered using the hardware stencil buffer. The CPU uses the light position and the model for each object to calculate multiple shadow volumes. The scene is rendered in multiple passes; first the scene is rendered using ambient lighting, which also prepares the depth buffer for stencil lighting. Then the shadow volumes are submitted and the contribution from each light source to each soft-edge volume is summed into the scene. Finally, the scene is rendered once again, this time modulating the textures by the results of the previous passes. The final result is a relatively high polygon count scene with approximately 95% of the polygons being translucent.

http://www.pvrdev.com/pub/PC/extra/h/FableMark.htm
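
For readers unfamiliar with stencil shadow volumes, here is a one-pixel Python sketch of the common depth-pass ("z-pass") counting scheme. It is not taken from FableMark, whose exact stencil setup is not described above; the function names and the (front, back) depth representation are hypothetical. It only illustrates why such a scene is fill-rate bound: every shadow volume touches the stencil buffer for both its front and back faces at each covered pixel.

```python
def stencil_count(scene_depth, volumes):
    """Return the stencil value for one pixel after all shadow volumes.

    scene_depth: depth of the visible surface at this pixel (from the
                 ambient pass that primed the depth buffer).
    volumes:     list of (front_depth, back_depth) pairs, one per shadow
                 volume crossed by the view ray through this pixel.
    """
    stencil = 0
    for front, back in volumes:
        if front < scene_depth:  # front face passes the depth test: increment
            stencil += 1
        if back < scene_depth:   # back face passes the depth test: decrement
            stencil -= 1
    return stencil


def in_shadow(scene_depth, volumes):
    """A non-zero count means the surface lies inside some shadow volume."""
    return stencil_count(scene_depth, volumes) != 0


if __name__ == "__main__":
    print(in_shadow(5.0, [(2.0, 8.0)]))  # surface sits inside the volume
    print(in_shadow(5.0, [(1.0, 3.0)]))  # volume is entirely in front
```

A surface inside a volume only sees the front-face increment, so its count stays non-zero; a volume entirely in front of or behind the surface contributes a net zero.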
 