5200 vertex shader(s)?

horvendile

Over the last few days, I have gathered that something approaching consensus here is that the 5200 lacks hardware vertex shader(s). Was that an educated guess or confirmed by nVidia?

Anand's 5600/5200 review is up, and there is no mention of missing vertex hardware in the 5200. Of course, that is no proof of anything, and he did use a fast CPU, but what do you make of it?

Both cards were Ultra. Not sure if there is any qualitative difference between those and non-ultra.

Edit: Typo
 
Yes, I'm not seeing any mention of it... nor any benchmarks. What are the chances that the nv34 DOES have a full DX9 vertex shader?

nv30 = 3, nv31 = 2, nv 34 = 1?
 
It's unclear. It seems more likely now that NV34 (GFfx 5200) does indeed have (at least one) hardware vertex shader(s). This is--at the least--very impressive, considering it also includes a fully functional 4x1 pipeline with full PS/VS 2.0+ support, all in ~45 million transistors.

It's downright amazing when you consider NV31 (GFfx 5600) is also a 4x1 with full PS/VS 2.0+, but takes 80 million transistors, almost double. The only functional difference between the two that Nvidia has specified is that NV31 does color and z-compression, while NV34 does neither. Fine, and that takes up some amount of transistors...but not 35 million.

Pixel and vertex shader results from the reviews we've seen today seem to indicate NV34 has ~25-50% less shader performance per-clock than NV31. Some of that is surely explained by NV31 having more physical shader units, although some is probably also a result of the lack of z-compression on NV34. I'd hold out for some more "pure" shader tests (i.e. ones that don't actually display anything useful and thus don't use z at all, taking that variable out of the equation) before making any firm pronouncements, but it's almost certain there are transistors saved here, too. Again, it doesn't seem like nearly enough to get us from 80 to 45 million.

Anand's review, interestingly enough, mentions a third source of saved transistors: they claim NV31 is more deeply pipelined than NV34, and thus trades off extra transistors for increased clockability. Presumably they didn't come up with this on their own, so Nvidia must have told them, in which case it may be true. Then again, it may not: NV31 does seem to overclock higher than NV34 (reviewers have been able to get the reference NV31s above 400MHz, while the NV34s don't seem to get above the 325MHz Ultra clock), but NV31 is also made on the .13u process while NV34 is .15u.

Meanwhile, several reliable members of the forum have been implying for some time that not only does NV34 lack full hardware vertex shader functionality, but NV31 may also use some part-hardware/part-software scheme.

So what gives?

I dunno. The NV34 vertex shading scores we saw today were not remarkably bad, but then again considering most reviewers used ~3GHz class CPUs, this wouldn't be surprising even if software VS was being used. By far the simplest way to sort this all out would be to have someone do a vertex shader limited test (the 3DMark03 VS test should work very nicely) at two very different CPU speeds. That should tell us all we need to know.

Until then, I don't think we can say one way or the other.

(edit: P.S. - the only difference between Ultra and non-Ultra should be core and memory clock rates.)
 
Dave H said:
Until then, I don't think we can say one way or the other.

Dave, please download our GeometryProcessingSpeed synthetic utility and check this out if you want!!!

It's pretty geometry limited: the results barely depend on resolution at all, and it lets you pick different lighting complexity and different versions of shaders/TCL to check...

It also gets very close to the theoretical peaks of the chips in its simplest, most aggressive mode: it can produce 50 million tris per sec on the previous generation and 100 or more on the current one...

http://www.digit-life.com/rm3d/DX9Synth/GeometryProcessingSpeed.zip
 
By far the simplest way to sort this all out would be to have someone do a vertex shader limited test (the 3DMark03 VS test should work very nicely) at two very different CPU speeds.

Alternatively, keep the CPU speed constant but vary the clock speed of the 3D processor - if the score varies then it has hardware VS; if it doesn't, it's software (or at least, partially so).
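
Both versions of the test come down to the same measurement: how a vertex-limited score scales with one clock while the other is held fixed. A minimal sketch of that reading, assuming you have a score at two clock speeds. The function names, the 0.8/0.2 thresholds and the sample numbers are all hypothetical placeholders, not results from any review.

```python
import math


def scaling_exponent(clock_lo, score_lo, clock_hi, score_hi):
    """Return k where score is roughly proportional to clock**k.

    k near 1 means the score tracks the clock that was varied;
    k near 0 means the score ignores it.
    """
    return math.log(score_hi / score_lo) / math.log(clock_hi / clock_lo)


def classify(k, tracks=0.8, ignores=0.2):
    """Bucket an exponent; the thresholds are arbitrary assumptions."""
    if k >= tracks:
        return "tracks clock"
    if k <= ignores:
        return "ignores clock"
    return "partial"


# Placeholder numbers only: a made-up VS test score at two GPU core clocks,
# with the same CPU used for both runs.
k_gpu = scaling_exponent(250, 5.0, 325, 6.5)
print(round(k_gpu, 2), classify(k_gpu))
```

When the GPU clock is the one being varied, "tracks clock" points to hardware VS. When the CPU clock is varied instead, "tracks clock" points to software VS. "partial" in either case would fit a mixed hardware/software scheme.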
 
Mulciber said:
nv30 = 3, nv31 = 2, nv 34 = 1?

No. NV30 doesn't have 'discrete' VS operating in parallel, but something more akin to the FP array on P10. It's more likely the number of processing units in the array would be varied from chip to chip.
 
DaveBaumann said:
No. NV30 doesn't have 'discrete' VS operating in parallel, but something more akin to the FP array on P10. It's more likely the number of processing units in the array would be varied from chip to chip.

yep, also think so
 
DaveBaumann said:
Mulciber said:
nv30 = 3, nv31 = 2, nv 34 = 1?

No. NV30 doesn't have 'discrete' VS operating in parallel, but something more akin to the FP array on P10. It's more likely the number of processing units in the array would be varied from chip to chip.

Yes, so perhaps I should have typed 1, 1/2, and 1/3
Better?
 
No. NV30 doesn't have 'discrete' VS operating in parallel, but something more akin to the FP array on P10. It's more likely the number of processing units in the array would be varied from chip to chip.

Could this confusion about the vertex shaders come from Nvidia stating that the Quadro FX has 3 vertex shaders?
Is that true?

The NV34 apparently has 47 million transistors; the GeForce3 has 57 million if I remember right (two different core architectures, but it could give some indication of the difference).
It would be very nice to see the vertex shader score in 3DMark03, and then hopefully get a clearer picture of what this core is and understand the NV30 line better.
Because it seems that Nvidia won't give a clear picture.
 
Could this confusion about the vertex shaders come from Nvidia stating that the Quadro FX has 3 vertex shaders?

Do you have a link for that? I don't remember NVIDIA saying anything official like that; all that I know of is the quote of "1.5 x NV25 vertex performance clock for clock" from the developer documents - I believe some sites (naming no Russian sites ;)) took that to mean three VS operating in parallel. When GFFX was launched I only remember quotes about an FP array in various previews and from what Geoff Ballew said to us.
 
It really looks like the NV31 and NV34 have the exact same vertex engine, differing only by clockspeed. From nVidia's website:

GeForce FX 5600 Ultra: 88MVerts/sec
GeForce FX 5200 Ultra: 81MVerts/sec

It seems pretty certain that the only difference in the chips between the NV31 and NV34 is that the NV34 lacks the color and z compression. Curious how it still manages to outperform the GeForce4 Ti 4200 frequently in FSAA tests...
 
Chalnoth said:
It really looks like the NV31 and NV34 have the exact same vertex engine, differing only by clockspeed. From nVidia's website:

GeForce FX 5600 Ultra: 88MVerts/sec
GeForce FX 5200 Ultra: 81MVerts/sec

They are both also 4 pixel pipeline architectures, according to nVidia. Though we know their pipelines work a bit differently.

In any case, these particular theoretical numbers don't tell us much about their vertex engines. Vertex throughput as a single number is getting almost as useless as "fill rate" as a single number describing pixel pipeline engines. (Are those numbers via vertex shaders? T&L? Is T&L emulated with vertex shaders on one or both parts?)
 
Yep, the numbers fit the 5800 at 400/400 core/mem.

NV31 = 88 (MVerts/sec) / 350 (MHz) = ~0.25 MVerts/sec per MHz

NV30 = ~0.25 * 2 (twice the number of shaders, pipes) * 400 (MHz) = ~200 million vertices/sec

Assuming Ultra = ~250 million in vertex output
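
The same arithmetic as a quick sketch. The vertex rates are the nVidia website figures quoted above, and the 350, 325 and 400 MHz clocks are the ones mentioned in this thread. The 500 MHz clock for the 5800 Ultra and the "NV30 has twice the per-clock resources" factor are assumptions, not confirmed numbers.

```python
def verts_per_mhz(mverts_per_sec, core_mhz):
    """Clock-normalized vertex rate, in MVerts/sec per MHz of core clock."""
    return mverts_per_sec / core_mhz


nv31 = verts_per_mhz(88, 350)   # GeForce FX 5600 Ultra
nv34 = verts_per_mhz(81, 325)   # GeForce FX 5200 Ultra

NV30_FACTOR = 2                 # assumption: NV30 has twice the vertex resources

nv30_5800 = nv31 * NV30_FACTOR * 400        # 5800 at 400 MHz
nv30_5800_ultra = nv31 * NV30_FACTOR * 500  # 5800 Ultra, assuming 500 MHz

print(f"NV31 {nv31:.3f}, NV34 {nv34:.3f} MVerts/sec per MHz")
print(f"NV30 estimate: {nv30_5800:.0f} / {nv30_5800_ultra:.0f} MVerts/sec")
```

Both of the smaller chips come out at roughly 0.25 MVerts/sec per MHz, which is what makes the "same vertex engine, different clock" reading plausible.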
 
DaveBaumann said:
Alternatively, keep the CPU speed constant but vary the clock speed of the 3D processor - if the score varies then it has hardware VS; if it doesn't, it's software (or at least, partially so).

/smacks forehead

Of course. And the hardware.fr review gives us all the information we need. Namely:
  • 3DMark03 VS 2.0 test is scaling perfectly with clock speed within both the 5200 and 5600 families. This demonstrates that all vertex shading is done in hardware. There is a slight discrepancy between 5600 and 5200 Ultra, despite the identical clock speeds, but presumably this is due to z-compression.
  • As overclocked has pointed out, Nvidia's theoretical T&L numbers fit with a scheme in which NV30 has twice the clock-normalized hardware T&L throughput of NV31 and NV34. The results of both the 3DMark03 VS 2.0 test and the 3DMark01 1 light T&L test match this extremely closely. NV31 and NV34 lag a bit (compared to where they ought to be) on the 8 lights test, but presumably this is due to NV30 natively allowing a larger number of simultaneous lights (if more lights are used than the hardware supports, does this force the geometry to be broken into 2 passes?).

The obvious conclusion is that all three NV3x cores have T&L/VS units in hardware, and that NV30 has exactly twice the resources of NV31/34. An easy conclusion would be that the T&L and VS units are therefore in fact the same (i.e. that T&L is executed on the general-purpose VS units as a normal vertex program), but comparison with R300--which is confirmed to do things this way--has made us conclude that NV30 has separate fixed-function T&L units in addition to the VS functionality.

Another possibility--suggested by the fact that T&L and VS functionality both seem to exactly double between NV31/34 and NV30--is that NV30 doesn't really have separate fixed-function T&L units, but rather the general-purpose VS units are in some way optimized to provide greater throughput on standard T&L. Or, conversely, that NV3x has trouble running the particular VS programs in the synthetic vertex shader tests we've seen with the same efficiency as it realizes when running the straight T&L program. (This would, of course, not be a new issue with NV3x, although previously we've only seen it on the pixel shader side of things...)
 
The results of both the 3DMark03 VS 2.0 test and the 3DMark01 1 light T&L test match this extremely closely. NV31 and NV34 lag a bit (compared to where they ought to be) on the 8 lights test, but presumably this is due to NV30 natively allowing a larger number of simultaneous lights (if more lights are used than the hardware supports, does this force the geometry to be broken into 2 passes?).

AFAIK, the maximum number of lights (per primitive lighting calc) for fixed function is 8 anyway, and since the original GeForce supported this number, it won't be any different across the NV3x range.
 
So, now that we can guess that the VS are in hardware, what part of these two chips is not DX9?

Or was that just somebody crowing before the dawn?
 
That's the weird part: the NV31 is, in round numbers, twice as fast as NV34.
I think the NV34 pipelines are lacking two shader pipes. What Anandtech said about "shorter" pipes and the "memory controller" is given as the reason for the transistor decrease. The last seems logical, but I don't think that's all.
Going by what Nvidia officially says about the Quadro FX, it has 3 vertex engines; that statement doesn't sound like a pool of small vertex arrays. But is the Quadro FX really so different from NV30? Until now the Quadros have always been regular GeForces with the "hack" in the hardware/driver.
As someone said earlier, it seems like the pixel shader performance of NV31 is half that of NV30, and NV34 half that of NV31.
 
I talked with NVIDIA and got an official confirmation that both the 5600 and 5200 do vertex shaders in hardware. As you may know, there are 4 shader pipes for both cards.
 