3:8 vertex and pixel shader in X800 and NV40 - coincidence?

You said that they wouldn't get near their theoretical triangle limit. Yet both do; in fact, the GeForce is even closer, it's just that the Radeon's limit is that much higher, so it's still far ahead.

And I still stand by it in the context in which I mentioned it. Maximum theoretical triangle rates are just that: theoretical numbers. They'll never be reached under real-time gaming conditions, unless we're talking about an extremely simplistic scenario, which my former rather weird example attempted to illustrate. If I were getting in excess of 150-200M triangles/sec out of each of those accelerators under today's real-time gaming conditions, then I guess the initial GF2 MX was also reaching its advertised 20M triangles/sec under the real-time gaming conditions of its day.
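To put some purely made-up numbers on that (a back-of-the-envelope sketch; none of these figures are measurements, and the "theoretical peak" is just a hypothetical placeholder):

```cpp
#include <cstdio>

int main() {
    // All numbers below are invented purely for illustration.
    const double trianglesPerFrame = 500000.0;  // a fairly heavy game scene
    const double framesPerSecond   = 60.0;      // a comfortable frame rate

    // The rate a game actually sustains is scene complexity times frame rate.
    const double sustainedRate   = trianglesPerFrame * framesPerSecond;  // triangles/sec
    const double theoreticalPeak = 600e6;       // hypothetical advertised peak

    std::printf("Sustained rate needed: %.0f MTris/sec\n", sustainedRate / 1e6);
    std::printf("That is %.1f%% of a %.0f MTris/sec theoretical peak\n",
                100.0 * sustainedRate / theoreticalPeak, theoreticalPeak / 1e6);
    return 0;
}
```

Even a scene that heavy only asks for 30M triangles/sec sustained, which is why advertised peaks never show up in actual games.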

What does that have to do with anything?

Everything, since those VS units get used for everything from simple T&L up to VS2.0 or VS2.0+ calls, depending on their maximum compliance. Kindly take into consideration all possible scenarios and not just those that serve your own point best. There are tons of recent games out there that still use simple T&L code, some use VS1.1 calls, and the more advanced ones VS2.0, the last obviously being the fewest.


Then you say they don't get near their theoretical limit, so I show some figures that indicate that they do.

The links I provided before included those very same numbers. You showed me zilch, nada, that I hadn't seen or linked to before.

Who cares if they use VS1.1, 2.0 or fixed T&L? Especially on the Radeon it doesn't matter, since everything uses the same unit. I'm not too sure how the GeForce does it, I believe it had a separate fixed unit still.

ROFL :LOL: NOT SINCE THE NV20. You were saying? And yes, I obviously care, for the simple reason that what matters is what a unit actually gets used for, not just what its exact compliance is. For me that means that, in all fairness, I'd have to evaluate its usability in all possible scenarios and for what it mostly gets used for in games.

We WERE concentrating solely on the vertex processing aspect, yes.

Again, today's vertex processors handle everything from simple T&L functions up to the highest vertex shader calls of their compliance.

You mean the completely horrible PS2.0 implementation? Yes, but that's not what we are discussing now.

Yes, that one, and it was truly close to horrible. Irrespective of that, the NV3x line's weaknesses lay mostly there and NOT in the vertex processing department (gladly take the static flow control performance of the NV3x as an exception).

If it is a fact, there must be proof. I don't have to acknowledge anything that isn't proven. So come on, present the transistor counts of all the parts then. Else it's your word against mine, and I will continue to say that since pixel shader units are less complex than vertex shader units, they require fewer transistors, which of course makes perfect sense.

When I say comparative terms I obviously do not mean only Pixel Shader SIMDs, but NV4x's Vertex Shader MIMDs too. Texture samplers are expensive in hardware; MIMDs even more so. You need proof for what? Even if I had exact transistor counts I wouldn't be so foolish as to hand them out. Feel free, on the other hand, to think that the ~60M transistor difference between R420 and NV40 is exclusively due to the latter's PS3.0 support and nothing else. And it's by far not my word against yours; you'd better have a closer look at what most people had to say in this thread.

Gee, interesting. Also the fact that they don't test against any gaming cards from ATi or NVIDIA, and don't bother to test any games.
Apparently that card is nothing more than a low-budget professional card. I wouldn't be surprised if it didn't actually have Direct3D drivers.
In short, this is NOT a mainstream card, but aimed solely at professional applications, quite unlike the ATi and NVIDIA cards.

Of course it had D3D drivers. Here's the entire family of former-generation 3DLabs accelerators:

http://www.3dlabs.com/products/family.asp?fami=6

Here the VP560:

http://www.3dlabs.com/products/product.asp?prod=264

The new, unified 3Dlabs Acuity Driver Suite runs across the entire Wildcat VP family and includes highly optimized OpenGL and Direct3D drivers, a customized driver for 3D Studio Max and the new 3Dlabs Acuity Window Manager that provides precision window control over multiple displays.

There was in fact one review with games that I can remember of the P9, from a guy named Modulor, yet the site sadly no longer exists. Of course it performed really badly in games, and it was obviously aimed at professional applications only.

Current price ranges of the older generation products (lowest):

VP560= 140$
VP760= 246$
VP870= 329$
VP880 Pro= 329$
VP990 Pro= 544$

That's a complete former-generation product line from top to bottom, and while the 560 is the lowest player of them all, the other offerings apparently aren't. Digit-Life compared it to the same-generation products from ATI/NV back then, against which it did more than just well, especially its direct competitors, the Fire GL 8700 and the Quadro 550 (if my memory serves me well, the GL 8700 was some sort of "8500LE" for the professional market).

ATI today:

http://www.ati.com/products/workstation/fireglmatrix.html#agp

http://www.xbitlabs.com/news/video/display/20040806084441.html

The FireGL Visualization series includes the entry-level FireGL V3100 with 128MB of memory, 4 pixel pipelines and 2 vertex processors; the FireGL V3200 which adds stereo 3D capabilities; the mid-range FireGL V5100 with 128MB of memory, 12 pixel pipelines, 6 vertex processors and stereo 3D output and the high end FireGL V7100 with 256MB of GDDR3 memory, 16 pixel pipelines, 6 geometry engines, stereo 3D and dual link capabilities.
 
Ailuros said:
There was in fact one review with games that I can remember of the P9, from a guy named Modulor, yet the site sadly no longer exists. Of course it performed really badly in games, and it was obviously aimed at professional applications only.

You may also remember that they implemented "Slipstream", software deferred rendering, on the P9 - something that was not normally associated with workstation parts, and IIRC they only used it for D3D.
 
Yes, I do, and I actually wonder whether they'll continue investing in and building on the idea in the foreseeable future. However, I don't recall seeing any D3D professional applications run on the P9 vs. the P10; yet in D3D games, synthetics, overdraw tests etc. the humble P9 did fairly well, considering it's only a professional accelerator and not a gaming card, compared to its bigger brother.

I also wonder if there's any kind of future expansion to OpenGL for Slipstream planned.
 
A slight aside, but it's interesting to note that, as hinted at in the name, Slipstream was originally developed to ensure that both the vertex units and texture pipes were working at all times by providing a buffer between them, and it sort of "grew" into a tile-based deferred rendering approach.
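To illustrate the basic decoupling idea (a toy sketch only; the names and structure here are invented, not anything 3DLabs has documented):

```cpp
#include <cstddef>
#include <queue>

// Toy model of a FIFO that decouples vertex processing (producer) from the
// texture/pixel pipes (consumer), so a stall on one side does not
// immediately idle the other. Purely conceptual; all names are invented.
struct ProcessedVertex { float x, y, z, w; };

class VertexFifo {
public:
    explicit VertexFifo(std::size_t capacity) : capacity_(capacity) {}

    // The vertex units push results as long as there is room; otherwise they stall.
    bool push(const ProcessedVertex& v) {
        if (buffer_.size() >= capacity_) return false;  // buffer full: vertex side stalls
        buffer_.push(v);
        return true;
    }

    // The pixel pipes pull work whenever it is available; otherwise they stall.
    bool pop(ProcessedVertex& out) {
        if (buffer_.empty()) return false;              // buffer empty: pixel side stalls
        out = buffer_.front();
        buffer_.pop();
        return true;
    }

private:
    std::size_t capacity_;
    std::queue<ProcessedVertex> buffer_;
};
```

Size that buffer to hold a whole frame's worth of post-transform output, sorted into screen tiles, and you effectively have the scene capture a deferred renderer needs, which fits the "grew into" description above.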
 
Thank you for completely missing the point, Ailuros.
I don't feel like continuing this discussion because it is not going anywhere, and you seem to get less reasonable every post.
 
If slipstream has its roots in tbdr and keeping optimal usage of vertex & pixel units, might this develop in future parts or is the merging of units a more likely outcome?

Would it be easier to keep the chip more active & productive, sharing textures and data, if the units could perform both vertex & pixel calculations?

What is the most likely outcome here for future shader models (4/dxNext)?
 
Scali said:
Thank you for completely missing the point, Ailuros.
I don't feel like continuing this discussion because it is not going anywhere, and you seem to get less reasonable every post.

I've heard better excuses from someone dumping a debate in which he not only tried to turn the presented documentation in his favour by deliberately selecting what supports his points best, but also threw in a bunch of non-existent facts.

Of course the P9 obviously doesn't contain any D3D drivers, of course by your perception the VS arrangement on the NV3x line is underpowered, of course the only theoretical weakness of the NV3x line is its geometry units alone, of course VS2.0+ units are so overly expensive that there's no logic in ATI or any other IHV keeping all 6 of them on a mainstream part, and of course there isn't a single threat to foresee from the competition in the shape of 3DLabs and the possible strength of their future mainstream parts. As, of course, geometry throughput is totally irrelevant too, for, let's say, CAD.

And now if you'll excuse me there's still some interest left in this conversation and I'd like to concentrate on that.
 
aZZa said:
If slipstream has its roots in tbdr and keeping optimal usage of vertex & pixel units, might this develop in future parts or is the merging of units a more likely outcome?

Would it be easier to keep the chip more active & productive, sharing textures and data, if the units could perform both vertex & pixel calculations?

What is the most likely outcome here for future shader models (4/dxNext)?

Good question. From what I've read so far, it's not entirely impossible that the units will remain separate on IMRs, regardless of whether the code gets unified in future APIs. No idea, of course, for how long.

As for Slipstream, it seems to me that it'll be able to gain just as much as any software solution would. While it could most certainly gain, in theory, some of the advantages of a TBDR, I have doubts (even more so with the former review and its results in mind) that it'll ever reach the efficiency of a true hardware-based deferred renderer. However, if it were able to gain a couple of extra points without inheriting the disadvantages of a TBDR, the idea might prove interesting.

Early-Z optimized applications have already been labelled (and not unjustifiably so) as application-driven deferred rendering; the Deferred Shading article here at B3D also comes to mind. If there is a way to combine advantages from both sides, then the idea might stand a chance.

One point on which TBDRs have always been in doubt is parameter bandwidth, or rather storage space and possible overflow. 3DLabs, though, has been using virtual memory for quite some time, and from the looks of it the latter will be an essential part of WGF.
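To make explicit what I mean by parameter storage, here's a crude toy binner (purely illustrative; no real TBDR bins its geometry this naively): every screen tile keeps a list of the triangles, plus their interpolants, that touch it, and that list grows with scene complexity, which is exactly where overflow, and hence virtual memory, becomes interesting.

```cpp
#include <algorithm>
#include <vector>

// Toy illustration of TBDR "parameter storage": every screen tile keeps a list
// of the triangles that touch it, and that storage grows with scene complexity.
// All names and sizes are invented for illustration.
struct ScreenTriangle {
    float minX, minY, maxX, maxY;  // screen-space bounding box
    // ...plus the per-vertex interpolants (colours, texcoords, etc.) that make
    // the parameter storage expensive in practice.
};

constexpr int kTileSize = 32;

std::vector<std::vector<int>> binTriangles(const std::vector<ScreenTriangle>& tris,
                                           int screenW, int screenH) {
    const int tilesX = (screenW + kTileSize - 1) / kTileSize;
    const int tilesY = (screenH + kTileSize - 1) / kTileSize;
    std::vector<std::vector<int>> bins(tilesX * tilesY);

    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const ScreenTriangle& t = tris[i];
        const int x0 = std::max(0, static_cast<int>(t.minX) / kTileSize);
        const int y0 = std::max(0, static_cast<int>(t.minY) / kTileSize);
        const int x1 = std::min(tilesX - 1, static_cast<int>(t.maxX) / kTileSize);
        const int y1 = std::min(tilesY - 1, static_cast<int>(t.maxY) / kTileSize);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                bins[ty * tilesX + tx].push_back(i);  // unbounded growth = possible overflow
    }
    return bins;
}
```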

I'd love to see something like a re-evaluation of WGF from someone like Ilfirin, based on the more recent data. I somehow had second thoughts/doubts when reading his conclusion on the future API and TBDRs, especially when thinking about the I/O model, but then I could be completely wrong.

And to avoid misunderstandings, I'm not trying to turn this into another TBDR vs. IMR debacle. I'm mostly interested in future possibilities, especially related to WGF, and in what the options could be to further increase efficiency; and yes, even if it's just through software, since WGF so far seems to be quite a revolutionary software layer, bringing along some significant changes.
 
I've heard better excuses from someone dumping a debate in which he not only tried to turn the presented documentation in his favour by deliberately selecting what supports his points best, but also threw in a bunch of non-existent facts.

Ah nice, so we're going for the personal attack again. Look, you missed the point, and completely misinterpreted what I meant. Just forget it, instead of these lame personal attacks.

Of course the P9 obviously doesn't contain any D3D drivers

I never said it didn't. I merely said that I believed that 3DLabs at one point sold cards without (proper?) Direct3D drivers. I wasn't sure if that was still the case or not; obviously it isn't. But I never said they don't contain D3D drivers, so don't harass me about that.


of course by your perception the VS arrangement on the NV3x line is underpowered

Again, this is not what I said. I was speaking of the 5600 and the change to the more powerful unit in the 5700.

of course the only theoretical weakness of the NV3x line is its geometry units alone

Never said that either, I in fact stated the opposite, but it was not relevant in this case.

of course VS2.0+ units are so overly expensive that there's no logic in ATI or any other IHV keeping all 6 of them on a mainstream part

Never said that either.
I just said that it would be cheaper without those units, and there'd be little reason not to remove them, especially when half the pixel pipelines were removed anyway. And I also said that if they wanted to base a new line of FireGL cards on the same chip, it would be a different story.

and of course there isn't a single threat to foresee from the competition in the shape of 3DLabs and the possible strength of their future mainstream parts. As, of course, geometry throughput is totally irrelevant too, for, let's say, CAD.

Never said that either, in fact I stated the opposite.

Apparently you either didn't read anything I said, or you completely misinterpreted everything. And these remarks are just wrong and... well, lame... since I already told you you misinterpreted it, and should leave the discussion alone, since it's not getting anywhere.

But I guess you had to vent your aggression, or get some weird kind of ego boost out of it or something. Grow up.
That is exactly what I meant by you getting less reasonable every post, thanks for proving that this discussion should have ended, as I said.
 
It seems rather odd that the Radeon X700 will (according to all early reports) contain 1.5x the vertex units of the previous series, 6 vs. 4. Would this be a hint at future requirements for more vertex power, and possibly a combined vertex & pixel shading architecture? Or could it relate to TBDR being featured in future APIs, possibly requiring more vertex power to sustain performance?
 
aZZa said:
It seems rather odd that the Radeon X700 will (according to all early reports) contain 1.5x the vertex units of the previous series, 6 vs. 4. Would this be a hint at future requirements for more vertex power, and possibly a combined vertex & pixel shading architecture? Or could it relate to TBDR being featured in future APIs, possibly requiring more vertex power to sustain performance?

I dunno, perhaps they looked at the 9800 series and wanted to surpass it in the midrange while still making more money per chip?
 
The 9700 & 9800 series had ample T&L and vertex power for gaming - especially compared to the competition - but perhaps that merely related to vertex shading 1.1 rather than 2.x shading tech?

I guess this new series will be much more beneficial in workstation performance, but also offer a nice little boost per clock, whilst maintaining the same number of pixel pipelines as the previous series cards.

It will be interesting to see how the X700s match up to the 9500 Pro/9800s on a clock-for-clock basis - and to outline the overall improvements in the various rendering areas - probably best compared to the 9500 Pro, to match up against a card with a 128-bit memory interface.
 
aZZa said:
It seems rather odd that the Radeon X700 will (according to all early reports) contain 1.5x the vertex units of the previous series, 6 vs. 4. Would this be a hint at future requirements for more vertex power, and possibly a combined vertex & pixel shading architecture? Or could it relate to TBDR being featured in future APIs, possibly requiring more vertex power to sustain performance?

Professional applications like CAD require an ungodly amount of geometry power. My speculations/assumptions this far are of course also made with the comparable results from the P20 in mind, where the other two IHVs seem to have ground to cover.

I doubt that this is directly related to gaming requirements. In fact, if I concentrate exclusively on ATI, I see 2 quads/4 VS on R3xx and 4 quads/6 VS on R4xx, which doesn't suggest, at least to me, that pure VS requirements are going to grow at the same rate as PS requirements.

Future APIs don't contain anything TBDR-specific; the early DX-Next preview from Ilfirin here at B3D implied in its conclusion that the API is somewhat tailored for TBDRs. As I said, I don't entirely agree with that, especially because of the suggested I/O model, but again I might be entirely wrong.

As for combining VS and PS calls, that's rather a topic for SM4.0 and beyond. The R4xx could hardly be related to that, especially given the lack of texture samplers for its VS units.
 
Ailuros, I'm not sure about future APIs being tailored to any architecture either, and I don't think it's advisable to advantage or disadvantage either IMRs or TBDRs. They both need to work mainly within the same open guidelines/API (to a degree anyway), otherwise we could end up with another Glide-like situation where everything is written for one card/company/architecture.

Looking at the latest (and currently most advanced) integrated graphics from Intel, some alternative needs to be implemented. Future integrated chips will have access to more bandwidth with the push to DDR2 and beyond; however, the performance gap between integrated and high-end standalone cards has grown substantially in recent years. The integrated PC graphics world is in need of a higher-performing low-bandwidth solution.

Would Windows Longhorn benefit much from the extra vertex power offered by an X700 over older cards for its 3D environment?

ATI obviously has a few reasons for the increased vertex potential - which may relate to future-proofing, offering cheaper yet powerful CAD cards, and reducing development effort through reuse of existing core modules. Maybe someone could ask some questions in a future interview - after the X700's release, if the 6 vertex units do indeed exist.
 
Actually, looking at DeanoC's game and thinking about where consoles are going, vertex shader performance may become somewhat more of a focus than it has been in the past. The PS3's arrangement appears to be very heavy on the geometry side from what I've seen of it and heard from devs, and the Xbox2 can also have plenty of geometry performance from both the instructions added to the CPUs and the unified shader model - given the influence that the PS3 is likely to have on Xbox titles, this might cross over to PC games in the future as well.

Of course, none of this is really a consideration for something like X700, but I suspect we may see geometry getting a bit more of a focus in the future.
 
Of course, none of this is really a consideration for something like X700, but I suspect we may see geometry getting a bit more of a focus in the future.

One possibility could also be that (manual) deferred rendering will be the future.
A z-only pass, then shading every pixel only once. That would shift the focus from pixel processing to vertex processing: you do more geometry processing, since you render multiple passes, while you perform less pixel shading, because the z-buffer eliminates all occluded pixels.
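A minimal sketch of what such a z-only prepass could look like in classic fixed-function OpenGL; drawScene() is just a hypothetical stand-in for whatever submits the frame's geometry:

```cpp
// Z-prepass followed by a single shading pass.
// (On Windows, <windows.h> must be included before <GL/gl.h>.)
#include <GL/gl.h>

extern void drawScene();  // hypothetical: submits all geometry for the frame

void renderFrameWithZPrepass() {
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // Pass 1: depth only. The geometry is transformed once more per frame,
    // but no colour writes or pixel shading work happen.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawScene();

    // Pass 2: full shading. Only fragments matching the stored depth survive,
    // so each visible pixel is shaded exactly once.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_LEQUAL);
    drawScene();

    glDepthMask(GL_TRUE);  // restore state for the next frame's clear
}
```

The trade-off is exactly the one described above: the vertex side pays for the extra geometry pass, while the pixel side only shades what actually ends up visible.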
 
I think Dave may be spot on with the console-gaming geometry influence being brought over to PC gaming, hopefully making ports across multiple platforms easier.

BTW, any idea if the new Futuremark rendering engine will require any significant increase in geometry power? They are going to want to really separate the low-end standalone DX9 graphics cards from Intel's integrated chips, so could we really begin to see the hardware vertex shaders' influence?
 
DaveBaumann said:
Actually, looking at DeanoC's game and thinking about where consoles are going, vertex shader performance may become somewhat more of a focus than it has been in the past. The PS3's arrangement appears to be very heavy on the geometry side from what I've seen of it and heard from devs, and the Xbox2 can also have plenty of geometry performance from both the instructions added to the CPUs and the unified shader model - given the influence that the PS3 is likely to have on Xbox titles, this might cross over to PC games in the future as well.

Of course, none of this is really a consideration for something like X700, but I suspect we may see geometry getting a bit more of a focus in the future.

If you were targeting advanced HOS in some way, then I can only agree. Yet in that case we're obviously talking about far more programmable geometry, where it's also important to note that geometry can be generated and deleted on-chip.
 
aZZa said:
Ailuros, I'm not sure about future APIs being tailored to any architecture either, and I don't think it's advisable to advantage or disadvantage either IMRs or TBDRs. They both need to work mainly within the same open guidelines/API (to a degree anyway), otherwise we could end up with another Glide-like situation where everything is written for one card/company/architecture.

Absolutely true. TBDRs, though, have shone in the past few years mainly through their absolute absence *shoots a nasty look up north* :oops:

Looking at the latest (and currently most advanced) integrated graphics from Intel, some alternative needs to be implemented. Future integrated chips will have access to more bandwidth with the push to DDR2 and beyond; however, the performance gap between integrated and high-end standalone cards has grown substantially in recent years. The integrated PC graphics world is in need of a higher-performing low-bandwidth solution.

I know I might get crucified for saying so, yet I'd prefer that Intel stayed away from graphics in general. Or else raised their bar a bit above what they usually deliver, in terms of image quality at least. No one really expects standalone PC graphics performance from an integrated part, but the washed-out stuff I've seen from their parts so far is simply unacceptable.

Would Windows Longhorn benefit much from the extra vertex power offered by an X700 over older cards for its 3D environment?

No idea, really. But I have a hard time imagining that Longhorn's 3D environment will be all that complex and demanding. The question is whether, and to what degree, such parts could offload the CPU with some functions in the end. On the other hand, even mainstream platforms will be many times more powerful in 2006 and beyond, so I don't think it'll be that important after all.
 
Talking about geometry, can they improve the triangle setup rate? They seem to be scaling setup rate only with clock, which is why all the focus is on pixel shading rather than geometry.
 