5200 vertex shader(s)?

RussSchultz said:
So, now that we can guess that the VS are in hardware, what part of these two chips is not DX9?
Um, I think it's pretty clear by now that all of the NV3x chips have the exact same core programming-side functionality.

The only question remaining is "fringe" support. That is, do they support the same number of instructions as the NV30? Do they have support for the high-precision log/exp/sin/cos functions? But the NV31 and NV34 certainly are fully-DX9.
 
Dave H said:
DaveBaumann said:
Alternatively, keep the CPU speed constant but vary the clock speed of the 3D processor - if it varies then it has hardware VS, if it doesn't its software (or at least, partially so).

/smacks forehead

Of course. And the hardware.fr review gives us all the information we need. Namely:
  • 3DMark03 VS 2.0 test is scaling perfectly with clock speed within both the 5200 and 5600 families. This demonstrates that all vertex shading is done in hardware.

No, this shows us that if there is a CPU/system limited portion of the vertex shading, it is not the bottleneck on the system in question (Pentium 4 2.8 GHz system), for the benchmark in question. Which, considering it is the 3dmark vertex shading test (presumably a complex workload), is still a good thing, assuming nothing funky is going on in the drivers that is specific to that benchmark program. Wouldn't it be nice not to have to wonder, given the other things that have gone on, including nvidia's request for a delay of benchmark results? :-?
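
To make that concrete, here's a rough sketch in plain C (the numbers are made up for illustration, not measured hardware.fr results) of how a CPU-side contribution can hide behind perfect clock scaling as long as the CPU never becomes the bottleneck:

Code:
#include <stdio.h>

/* Toy model: per-frame time is whichever of the (overlapped) CPU and GPU
   portions takes longer. All numbers are made up for illustration; they
   are NOT measured hardware.fr results. */
static double fps(double cpu_ms, double gpu_work, double gpu_mhz)
{
    double gpu_ms = gpu_work / gpu_mhz;   /* GPU time shrinks with clock */
    double frame_ms = cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
    return 1000.0 / frame_ms;
}

int main(void)
{
    /* Hypothetical: the driver burns 20 ms of CPU per frame helping with
       the VS work, while the GPU portion dominates at ~180 ms on a
       250 MHz part. */
    double cpu_ms = 20.0, gpu_work = 180.0 * 250.0;

    printf("250 MHz: %.1f fps\n", fps(cpu_ms, gpu_work, 250.0));
    printf("325 MHz: %.1f fps\n", fps(cpu_ms, gpu_work, 325.0));
    /* Both readings scale exactly with core clock (to 0.1 fps) even though
       the CPU is doing real work - it just never becomes the bottleneck on
       a fast enough CPU. */
    return 0;
}

The scaling only breaks once the CPU portion grows large enough to dominate, e.g. on a much slower CPU or with a heavier VS workload, and that's exactly the data we don't have yet.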

BTW, the drivers used were 42.72, which AFAIK is the latest "benchmark driver" release. I'm wondering how other driver releases would compare, and how other VS benchmarks would compare.

Note, that if you replace "all the information we need" with "a very good indication", I don't disagree at this point.

EDIT: they weren't VS 2.0 tests
 
No, this shows us that if there is a CPU/system limited portion of the vertex shading, it is not the bottleneck on the system in question (Pentium 4 2.8 GHz system), for the benchmark in question. Which, considering it is the 3dmark vertex shading test (presumably a complex workload), is still a good thing, assuming nothing funky is going on in the drivers that is specific to that benchmark program.

I dunno about this. The results scale exactly with clock speed (to the precision given by the benchmark, i.e. .1 fps). It's pretty darn clear.

EDIT: they weren't VS 2.0 tests

Correct, VS 1.1. My bad.
 
I talked with NVIDIA and got an official confirmation that both the 5600 and 5200 do vertex shaders in hardware. As you may know, there are 4 shader pipes for both cards.

Well, if it's official from Nvidia it must be true. ;)

From the benches I've seen, the NV34 lags behind the NV31 by so much in the shader tests that it seems like it could well be a 2*2 when not using shaders and can only work as a 2*1 with shaders.
That would still be an "effective" 4 shader pipes.
 
Well, your point about the results scaling exactly, and 0.1 fps being precise enough, raises another issue.

Let me illustrate with a counter-example that is just as (in)valid, depending on your assumptions. I'll present it in template form since, with the blanks filled in across enough system variations, it should answer our question conclusively (the key being that AFAIK we don't have that data yet, hence all the "?"). Of course, maybe I just missed the data somewhere, but it isn't in the page you linked to.

Systems:

TH: Athlon XP 2700+, nforce2, 512MB PC333
HF: Pentium 4 2.8 GHz, ASUSTeK P4G8X Deluxe, 512MB PC3200

Drivers:

TH: "6307", 42.72
HF: Cat 3.1, 42.72

3dmark 2001SE VS results
Code:
    9(2/0)00  9000P    5200U    5600U

TH:  81.5     82.1     58.4     73.0
HF:   ?        ?        ?        ?


3dmark 03 VS results
Code:
    9(2/0)00  9000P    5200U    5600U

TH:   3.4      3.7      5.3      6.2
HF:   3.4      3.7      5.4      6.2

3dmark 03 VS 2.0 results
Code:
    5200U    5600U
TH:  7.4     16.4  
HF:  ?        ?

The 5.3 and 5.4 figures are the ones that address our question, and the difference between them is just as meaningful (AFAICS) as the lack of deviation from what you expected in the data you are proposing as the final word on the issue. In other words, the significance of the numbers needs further context for comparison, and we currently lack the information for that context. Note that it is your insistence that what you link to is "all the info we need" that I am disputing, not the conclusion you are reaching, which seems reasonable barring something unexpected. The problem is we don't (yet) have enough info to rule out something unexpected being a factor in the data you are focusing on.

This isn't a big deal one way or the other; we'll get that info eventually. But your comment is of the nature of "we don't need to investigate this any more", which I don't agree with.
 
Well, I just grabbed my thoughts as a layman.
Not going too deep into it now, but I should say that I "meant" the pixel shader pipes' performance/output being half that of the NV31, and that was what was slowing it down IMO.

I draw the conclusion that the VS is much the same in NV31 and NV34, but I still want to know more.
As to vertex output, I still think it's weird that Nvidia went out with 350M vertices/sec; this is what many look at when judging performance on the back of the box.
Now the Ultra has 250M vertices/sec and the regular 200 million.
I don't care what the spec tells you; it's about performance and quality.
But I still want to know, as do all the 3D geeks here...
 
Chalnoth said:
OpenGL guy said:
Chalnoth said:
But the NV31 and NV34 certainly are fully-DX9.
Only if you consider the NV30 fully DX9. Where is MRT support?
The packed 128-bit framebuffer offers similar functionality.
It's not enough to allow creation of interesting vertex arrays via the pixel shader. x,y,z alone would be 96-bits. See the displacement mapping presentation from GDC using uberbuffers.
 
OpenGL guy said:
Chalnoth said:
OpenGL guy said:
Chalnoth said:
But the NV31 and NV34 certainly are fully-DX9.
Only if you consider the NV30 fully DX9. Where is MRT support?
The packed 128-bit framebuffer offers similar functionality.
It's not enough to allow creation of interesting vertex arrays via the pixel shader. x,y,z alone would be 96-bits. See the displacement mapping presentation from GDC using uberbuffers.

Under Dx9 NV3x doesn't support ANY high precision render targets, let alone multiple ones, yet! Don't assume that fully Dx9 means anything BUT PS_2_0 or higher.

We will get D3DFMT_R32F and D3DFMT_R16G16F when the drivers catch up, and MET support should appear at DX9.1. But from what I've been told, there will be no 4-channel >8-bit formats (so you can't even store position easily!).

For now I'm having to split 16-bit integers into two 8-bit integers, pass them, and combine them in a pixel shader when I use them. My 1-pass R300 shader (MRT with 4 D3DFMT_A16B16G16R16 surfaces) is looking like 5 passes on NV3x (5 D3DFMT_A8R8G8B8 surfaces, as I only need 1 surface to be effectively D3DFMT_A16B16G16R16).

What NV3x can do under OpenGL bears no relation to what it does under Dx9!
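
For what it's worth, the split-and-recombine trick described above boils down to arithmetic along these lines (a rough C sketch of the encode/decode only, not the actual D3D surface or shader code):

Code:
#include <stdio.h>
#include <stdint.h>

/* Store a 16-bit value in two 8-bit channels (e.g. two bytes of an
   A8R8G8B8 surface), then rebuild it the way a pixel shader would:
   value = hi * 256 + lo. Rough sketch only - real shader code works on
   normalized 0..1 channel values and has to mind precision/rounding. */
static void split16(uint16_t v, uint8_t *hi, uint8_t *lo)
{
    *hi = (uint8_t)(v >> 8);
    *lo = (uint8_t)(v & 0xFF);
}

static uint16_t combine16(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(hi * 256 + lo);
}

int main(void)
{
    uint8_t hi, lo;
    split16(40000, &hi, &lo);
    printf("hi=%u lo=%u recombined=%u\n", hi, lo, combine16(hi, lo));
    return 0;
}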
 
overclocked said:
I talked with NVIDIA and got an official confirmation that both the 5600 and 5200 do vertex shaders in hardware. As you may know, there are 4 shader pipes for both cards.

Well, if it's official from Nvidia it must be true. ;)

From the benches I've seen, the NV34 lags behind the NV31 by so much in the shader tests that it seems like it could well be a 2*2 when not using shaders and can only work as a 2*1 with shaders.
That would still be an "effective" 4 shader pipes.

I talked to an engineer, not a PR person. I'm not here to make you believe, I'm just passing on the answer. You guys asked if it was done in hardware, so I went to the source and asked.
 
Matt said:
I talked to an engineer, not a PR person. I'm not here to make you believe, I'm just passing on the answer. You guys asked if it was done in hardware, so I went to the source and asked.

But did you explicitly ask if it was done in hardware on the GPU/VPU? Even if the CPU does the work, technically it's still done in hardware. ;)
 
NV31/34 have the same vertex CALCULATION performance at the same clockspeed. They are 2.5 times slower than the NV30.

But the NV34 probably has caches/FIFOs around half the size and a different memory controller, so sometimes its VS speed is lower.
 
BRiT said:
Matt said:
I talked to an engineer, not a PR person. I'm not here to make you believe, I'm just passing on the answer. You guys asked if it was done in hardware, so I went to the source and asked.

But did you explicitly ask if it was done in hardware on the GPU/VPU? Even if the CPU does the work, technically it's still done in hardware. ;)

Yes, I specifically asked if it was done on the GPU (I used GPU, not VPU), and he said yes.
 
demalion-

I'm not sure I understand exactly what you're getting at... :oops:

But if your point is that 5200U and 5600U show large clock-normalized performance differences on some VS tasks and not on others, yes, I'm quite aware of that but that doesn't really concern me because we know there are plenty of other differences between NV31 and NV34 that could explain this.

What I am sure of--and AFAIK the review at hardware.fr is the only one to address this--is that comparisons of 5200 to 5200U, and 5600 to 5600U (i.e. differently clocked versions of the same chip) demonstrably show that VS performance scales linearly with clock rate i.e. is done completely in hardware.
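
For reference, the check being described here amounts to comparing fps ratios with clock ratios; a minimal sketch in C (the clock speeds and fps values are placeholders, not the hardware.fr figures):

Code:
#include <stdio.h>
#include <math.h>

/* Does fps scale linearly with core clock, to within the benchmark's
   0.1 fps rounding? The clocks and fps below are placeholders, not the
   hardware.fr numbers. */
int main(void)
{
    double clk_a = 250.0, fps_a = 5.3;   /* e.g. non-Ultra card     */
    double clk_b = 325.0, fps_b = 6.9;   /* e.g. Ultra of same chip */

    double expected_b = fps_a * clk_b / clk_a;   /* pure clock scaling */
    printf("expected %.2f fps, measured %.1f fps\n", expected_b, fps_b);

    /* Allow 0.05 fps rounding error on each published figure. */
    if (fabs(expected_b - fps_b) <= 0.05 + 0.05 * clk_b / clk_a)
        printf("consistent with VS running entirely in hardware (on this CPU)\n");
    else
        printf("some non-clock-bound component is showing\n");
    return 0;
}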
 
Dave H said:
demalion-

I'm not sure I understand exactly what you're getting at... :oops:

But if your point is that 5200U and 5600U show large clock-normalized performance differences on some VS tasks and not on others, yes, I'm quite aware of that but that doesn't really concern me because we know there are plenty of other differences between NV31 and NV34 that could explain this.

What I am sure of--and AFAIK the review at hardware.fr is the only one to address this--is that comparisons of 5200 to 5200U, and 5600 to 5600U (i.e. differently clocked versions of the same chip) demonstrably show that VS performance scales linearly with clock rate i.e. is done completely in hardware.

Well, I thought I made it clear that my issue was with the bolded words (not one or the other set alone, but both together).

You quoted VS 1.1 benchmarks (in fact, my comment that the benchmarks were a good indication and a complex workload was based on them being VS 2.0, before I corrected that in an edit), and one that doesn't seem likely to use all of even VS 1.1 functionality. But we aren't talking about a vertex shader 1.1 benchmark accelerator; we are talking about a vertex shader 2.0 (and not just benchmark) accelerator, which to me leaves two issues:

1) All we have an indication of (in what you propose is the complete picture) is that, on the CPU in question (a very fast one), the CPU workload for the specific (VS 1.1) benchmark in question is not limiting.

2) As far as I've seen, we have no idea how the CPU workload changes when implementing other VS functionality (as I tried to illustrate, among other things, with my charts). That means other VS 1.1 instructions, register counts, macros, VS 2.0... some pretty significant items (in fact, it would be interesting to see this tested for a whole host of cards at the same time, but reviewers seem to have things like lives and stuff that get in the way of them doing some of the testing I'm curious about :rolleyes: :p ).

It is your insistence that this is a complete picture that continues to puzzle me, that's all. What if the behavior isn't the same with a 1 GHz Athlon or P III? Is it conceivable that the limitations might be different from those for a 2.8 GHz P4? And that is not even the bottom end of the range the bargain cards will address.
An example that comes to mind: watching Quake III scale perfectly with GPU clock speed on a Rage Pro, in the absence of any benchmarks involving changes in CPU performance, and concluding that the graphics speed of the game is determined solely by the GPU. Or, as in my counter-example, observing a change based solely on CPU performance, ignoring the graphics card and the resolution at which it is observed, and concluding the game is accelerated completely by the CPU.
 