NV40: Surprise, disappointment, or just what you expected?

BRiT said:
Chalnoth said:
Hence my statement that when the rest of the NV4x line comes out, it will come out unchallenged (in the low-mid range). I don't think ATI will have a low-end part in the R4xx line by that time, and maybe not even a lower-mid range part.

:rolleyes: The NV40 Ultra and nonUltra aren't even out (available) yet. What makes you think Nvidia will have any of their parts out before ATI?

Exactly.

All I have heard is the NV40 is THE card, the KING!!

That thing isn't even out, the R420 MAY kick the snot out of it, AND the 420 MAY be on the shelves BEFORE the 40.

NV fans are really loving life right now, yet a blowup doll just won't do it for me.
Sorry
 
Ardrid said:

The thought of posting twice as many links did cross my mind. :devilish:

But my contention was essentially that unless you restrict yourself to certain circumstances, NV40 does not quite offer twice the performance of a 9800XT overall. Twice that of a 9700 Pro might be closer to the truth, which I believe was also the target for R420.
 
kemosabe said:
But my contention was essentially that unless you restrict yourself to certain circumstances, NV40 does not quite offer twice the performance of a 9800XT overall. Twice that of a 9700 Pro might be closer to the truth, which I believe was also the target for R420.
I was primarily talking about shader performance, where average performance in synthetic benchmarks is over twice that of the Radeon 9800 XT.

As far as "normal" game performance is concerned, I don't think it's possible today for one architecture to really separate itself from another with similar pipeline configuration, core clock, and memory speeds. In other words, I don't expect major performance differences in older games with parts designed for the same market. I expect the main battleground to be fought in shader performance.
 
Ardrid said:


There are many games there that autodetect the chipset and set which shader version to use. The Stalker leak doesn't even have shaders in it (it's a very old leak), UT2003 uses very, very few, and HL2 only has R300 and NV3x profiles. The same goes for Halo unless the reviewer forced PS 2.0 with the command line, so before we start showing graphs let's make sure we are looking at an apples-to-apples comparison. The Far Cry screenshots I've seen show that the 6800 may have been running PS 1.1, which would give it a significant advantage.
 
What percentage of transistors in a GPU would be unrelated to "pipelines"? I am referring to transistors for the memory controller, etc., the number of which would be static regardless of the number of pipes. I would like to be able to put claims like "double the performance with less than twice the transistors" into perspective.
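To frame what I mean, a rough back-of-the-envelope sketch (the transistor totals are just the commonly quoted ballpark figures, and the overhead numbers are pure guesses on my part):

Code:
# Back-of-the-envelope only.  Assumed figures (not from this thread):
#   9800 XT (R360): ~110M transistors, 8 pixel pipes
#   6800 Ultra (NV40): ~222M transistors, 16 pixel pipes
# 'overhead_m' is a pure guess at the transistors that don't scale with
# pipe count (memory controller, 2D/video logic, etc.).

def per_pipe_cost(total_m, pipes, overhead_m):
    """Millions of transistors per pipeline once fixed overhead is removed."""
    return (total_m - overhead_m) / pipes

for overhead in (20, 40, 60):
    r360 = per_pipe_cost(110, 8, overhead)
    nv40 = per_pipe_cost(222, 16, overhead)
    print(f"overhead {overhead}M: ~{r360:.1f}M/pipe (9800XT) vs ~{nv40:.1f}M/pipe (6800U)")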
 
The Far Cry screenshots I've seen show that the 6800 may have been running PS 1.1, which would give it a significant advantage.

I do not think that is an accurate description with respect to the Geforce 6800 Ultra. In fact, I recall seeing some tests where the 6800 was actually faster when running PS 2.0 than when running PS 1.1! Let's also not forget that the 6800 was using very raw drivers. If anything, performance in Far Cry for the 6800 should get even better, especially compared to the current generation of hardware.
 
kemosabe said:
radar1200gs said:
kemosabe said:
Chalnoth said:
I therefore claim that ATI cannot compete with the NV4x line with anything less than cards from the R4xx line.

Well there's a revelation for us all. :oops:


Additionally, if you consider the Radeon 9800 XT vs. GeForce 6800 Ultra, the GeForce 6800 Ultra has almost exactly twice the transistors, but more than doubles the performance, all at a very slightly lower clock speed. I think this bodes very well for the value lineup that will hopefully be released before the end of the year.

Don't like to nitpick, as NV40 is a great performer, but if you go back to the reviews there are extremely few instances (e.g. CoD) where you get more than double the performance. In most instances, NV40 is faster by roughly 50% and often far less.

Oh, and not to forget about features. Featureset is perhaps more important than performance in the parts in the price range I'm talking about here. ATI won't even be able to come to parity there until the R5xx, if current rumors hold true.

Feature set is more important in the mid and low-end segments than in the high end? Since when :?: :?
Oh, since around the time of the TNT Vanta...
Most of nVidia's past success has been predicated on fully featured budget chips. In fact I'd go as far as to say the TNT M64 and GF2 MX between them have almost single-handedly defined the mainstream 3D market as we know it. It's only a pity nVidia dropped the ball with the GF4 MX.

I would contend that that might have been truer a few years ago. Considering the OEM market is where the bulk of the low-end volume is, I don't see the recent focus being on feature set as much as price/performance ratio. It's also why the integrated segment has exploded in terms of market share vs. discrete and everything indicates that trend will continue. The mid-range segment seems to have been neglected somewhat by major OEMs in the last year or two, however. They've seemed content to stuff a low-end card in most standard configurations and preferentially offer high-end solutions for upgrades.
nVidia took a different approach with their DX8 (GF3 & GF4) chips. There really wasn't a proper budget chip for either generation (the GF3 Ti200 & GF4 Ti4200 were the closest). IMO the lack of decent MX cards for that generation was what allowed ATi to catch up to nVidia. With NV3x nVidia went back to the MX concept, and I'd expect them to continue it with NV40.
 
kemosabe said:
But my contention was essentially that unless you restrict yourself to certain circumstances, NV40 does not quite offer twice the performance of a 9800XT overall. Twice that of a 9700 Pro might be closer to the truth, which I believe was also the target for R420.

What certain circumstances are you talking about ?

I see the NV40 offering around twice the performance of the 9800 XT (and sometimes more), and under certain circumstances, mostly when it's CPU limited from what current benchmarks seem to show, less than that. I'm not expecting anything less from the R420 though.
 
kemosabe said:
Chalnoth said:
Additionally, if you consider the Radeon 9800 XT vs. GeForce 6800 Ultra, the GeForce 6800 Ultra has almost exactly twice the transistors, but more than doubles the performance, all at a very slightly lower clock speed. I think this bodes very well for the value lineup that will hopefully be released before the end of the year.

Don't like to nitpick, as NV40 is a great performer, but if you go back to the reviews there are extremely few instances (e.g. CoD) where you get more than double the performance. In most instances, NV40 is faster by roughly 50% and often far less.
Agreed. I just want to point out another "abnormal" result, Spellforce: here.
(no English version yet; note the relatively slow testing platform)
 
Look at the results from MDolenc's fillrate tester, which should probably isolate pixel processing fairly well (whereas 3DMark03 and other "complex" shader-intensive results will be more influenced by bandwidth and special features).

My 9800XT results @ 400MHz/400MHz, on Cat 4.4:

EDIT: Oops, had AA and AF on in D3D for some reason (I usually set those in-game), and I seem to have failed to select the Z-buffer format I wanted. Because the difference for the shader results was a maximum of about 3.5 M (1573.8 for PS 2.0 Simple, the rest all relatively unchanged), and to avoid recalculating for no discernible benefit given the lack of accuracy in any case, the FFP figures are the only ones I updated from the new file.

1024x768 (I'll post this verbatim for easier searchability)
Code:
Fillrate Tester
--------------------------
Display adapter: RADEON 9800 XT
Driver version: 6.14.10.6436
Display mode: 1024x768 A8R8G8B8 60Hz
Z-Buffer format: D24S8
--------------------------

FFP - Pure fillrate - 3024.906250M pixels/sec
FFP - Z pixel rate - 2794.202393M pixels/sec
FFP - Single texture - 2874.847656M pixels/sec
FFP - Dual texture - 1476.179321M pixels/sec
FFP - Triple texture - 837.697327M pixels/sec
FFP - Quad texture - 695.885559M pixels/sec
...
PS 1.1 - Simple - 1570.407715M pixels/sec
PS 1.4 - Simple - 1570.395996M pixels/sec
PS 2.0 - Simple - 1570.425049M pixels/sec
PS 2.0 PP - Simple - 1570.394531M pixels/sec
PS 2.0 - Longer - 790.195129M pixels/sec
PS 2.0 PP - Longer - 790.190674M pixels/sec
PS 2.0 - Longer 4 Registers - 790.206726M pixels/sec
PS 2.0 PP - Longer 4 Registers - 790.198608M pixels/sec
PS 2.0 - Per Pixel Lighting - 198.141907M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 198.139145M pixels/sec



I'll try to format for easy comparison to the 6800Ultra results from the review, same settings, except missing "FFP" tests (issues with the applicability of the test, Wavey, or just extraneous info that confirmed what was shown elsewhere?):
Code:
fillrate tester test              9800XT           6800U
Megapixels                        400MHz           400MHz
-------------------------------------------------------------------
PS 1.1 - Simple  (/pipe)          1570.4 (196.3)   2092.0 (130.8) ?
PS 1.4 - Simple                   1570.4 (196.3)   2092.8 (130.8) ?
PS 2.0 - Simple                   1570.4 (196.3)   3115.5 (194.7)
PS 2.0 PP - Simple                1570.4 (196.3)   2091.9 (130.7) ?
PS 2.0 - Longer                    790.2 ( 98.8)   1573.0 ( 98.3)
PS 2.0 PP - Longer                 790.2 ( 98.8)   1573.2 ( 98.3)
PS 2.0 - Longer 4 Registers        790.2 ( 98.8)   1572.6 ( 98.3)
PS 2.0 PP - Longer 4 Registers     790.2 ( 98.8)   1572.6 ( 98.3)
PS 2.0 - Per Pixel Lighting        198.1 ( 24.8)    420.1 ( 26.3)
PS 2.0 PP - Per Pixel Lighting     198.1 ( 24.8)    626.0 ( 39.1)


For these comparisons, I'm using 400/(instruction count) as what the figure in ( ) should be for an IPC of 1.0. I'm also doing questionable things like counting MAD as 1 op (which makes sense to me because I expect both architectures to be able to do 1 per clock cycle per pipe) and not accounting for the abundant rounding errors introduced.
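To make the arithmetic explicit, here's a rough sketch of how I'm getting the parenthesised figures and the IPC estimates (the assumptions are mine: 8 pipes for the 9800XT, 16 for the 6800U, both at 400MHz, the final mov treated as free, and my own instruction counts for the Simple and Longer shaders):

Code:
# Rough IPC arithmetic used above (my assumptions: 8 pipes @ 400MHz for the
# 9800XT, 16 pipes @ 400MHz for the 6800U, final mov not counted).

CLOCK_MHZ = 400

def per_pipe(mpix_per_sec, pipes):
    """Mpixels/sec handled by a single pipe (the figure in parentheses)."""
    return mpix_per_sec / pipes

def ipc(mpix_per_sec, pipes, instructions):
    """Apparent instructions per clock per pipe:
       clocks per pixel = CLOCK_MHZ / per-pipe Mpix/s,
       so IPC = instructions / clocks-per-pixel."""
    return per_pipe(mpix_per_sec, pipes) * instructions / CLOCK_MHZ

# PS 2.0 Simple: 2 arithmetic ops (two adds, mov assumed free)
print(ipc(1570.4, 8, 2))    # 9800XT -> ~0.98
print(ipc(3115.5, 16, 2))   # 6800U  -> ~0.97

# PS 2.0 Longer: 4 ops counting MAD as 1, 7 ops counting MAD as 2
print(ipc(790.2, 8, 4), ipc(790.2, 8, 7))      # 9800XT -> ~0.99 / ~1.73
print(ipc(1573.0, 16, 4), ipc(1573.0, 16, 7))  # 6800U  -> ~0.98 / ~1.72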

Simple tests are of the form:

Code:
def c0, 0.3f, 0.7f, 0.2f, 0.4f
...
add r0, c0, v1
add r0, r0, -v0
...
mov oC0, r0

For the "PS 2.0" test, per-pipe parity is indicated. IPC looks like about 1 for each, but R3xx description, AFAICS, indicates it should have 2...isn't the final mov supposed to be free for both?
The less than full precision register usage performance issues...is it something not being visible to the driver/optimizer just because of precision? Or could it be that casting to a lesser precision register is a tradeoff of register performance issue concerns versus full utilization of the effective IPC potential?
The first makes sense as what nVidia would want to achieve in ideal and is counter-indicated by the apparent triviality of what causes the IPC drop.
The second makes sense because it is smart tradeoff to make in conjunction with things like free normalization of lower precisions and dealing with register performance issues that seem to result from trying to achieve a branching solution, and is counter-indicated by it not being clear that such a casting penalty tradeoff would even be necessary.
As far as my thinking: perhaps there is still some sort of register management component that implements the dual "Mini-ALU' modifier functionality, and can also handle precision casting and perhaps pack/unpack (not analyzed yet AFAIK?). Looking at the full set of these features for this hypothetical unit, this looks less like a "casting penalty" than a purposeful compromise.


Some investigation of some of these ideas would be interesting.

Longer tests are of the form:

Code:
def c0, 0.3f, 0.7f, 0.2f, 0.4f
def c1, 0.9f, 0.3f, 0.8f, 0.6f
...
add r0, c0, v1
mad r0, c1, r0, -v0
mad r0, v1, r0, c1
mad r0, v0, c0, r0
...
mov oC0, r0

Well, apparent IPC is again about 1 for each (about 1.7 if you count MAD as 2 ops). This again makes sense to me given the info we have about the pipelines. Regarding the R3xx, it is unclear whether, for this shader, the first unit needs the second for MAD or not, and so whether, as an example, a mad/mul pairing would boost IPC.

PerPixelLighting tests are of the form:

Code:
def c0, 0.0f, 0.0f, 2.0f, 0.0f
def c1, 0.4f, 0.5f, 0.9f, 16.0f

dcl t0.xy
dcl t1.xyz
dcl t2.xyz

dcl_2d s0
dcl_2d s1
...
// Normalize light direction
dp3 r1.w, t1, t1
rsq r1.w, r1.w
mul r1.xyz, t1, r1.w

// Calculate halfway vector
add r0.xyz, c0, -t2
dp3 r0.w, r0, r0
rsq r0.w, r0.w
mad r0.xyz, r0, r0.w, r1
dp3 r0.w, r0, r0
rsq r0.w, r0.w
mul r0.xyz, r0, r0.w

// Load and normalize normal
texld r2, t0, s0
dp3 r2.w, r2, r2
rsq r2.w, r2.w
mul r2.xyz, r2, r2.w

// Calculate lighting
dp3 r1.w, r2, r0		// N.H
dp3 r1.xyz, r2, r1		// N.L
pow r1.w, r1.w, c1.w
mad r1.xyz, r1, c1, r1.www

// Add base texture
texld r0, t0, s1
mul r0, r1, r0
...
mov oC0, r0

Well, more opportunity for the "smartness" of the pipelines to manifest, including via co-issue and some swizzle functionality.
IPC for the R3xx is about 1.24 over the entire shader, and for the 6800U about 1.32 at full precision. At partial precision, the 6800U is at about 2 IPC (1.96, to continue the somewhat imaginary accuracy). Counting MAD as 2 instructions, this goes up to about R3xx: 1.36, NV40: 1.45, NV40_pp: 2.15.

When put in relation to how free the fp16 normalization is, and assuming the shader compiler is extracting the normalization opportunities, it looks like about 1.17 IPC for NV40 when counting each normalization (and MAD) as one instruction, and 0.78 when treating normalization as truly "free" (about 1, at 0.98, when counting MAD as 2 instructions again).
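For reference, here's how those figures fall out of the different counting conventions (again, the counts are my own reading of the shader: ~20 arithmetic/texture ops excluding the final mov, two MADs, and four dp3/rsq/mul(mad) normalize-style sequences; the per-pipe Mpix/s values are the parenthesised figures from the table above):

Code:
# How the Per Pixel Lighting IPC figures above fall out of different
# counting conventions (my own counts; the final mov is treated as free).

CLOCK_MHZ = 400
BASE_OPS  = 20   # arithmetic + texld ops, MAD counted as 1
MADS      = 2    # extra ops if MAD is counted as 2
NORM_OPS  = 12   # the four dp3/rsq/mul(mad) normalize-style sequences
NORMS     = 4    # ...counted as one nrm each

def ipc(per_pipe_mpix, instructions):
    return per_pipe_mpix * instructions / CLOCK_MHZ

print(ipc(24.8, BASE_OPS))                      # 9800XT             -> ~1.24
print(ipc(26.3, BASE_OPS))                      # 6800U fp32         -> ~1.32
print(ipc(39.1, BASE_OPS))                      # 6800U _pp          -> ~1.96
print(ipc(24.8, BASE_OPS + MADS))               # MAD as 2: 9800XT   -> ~1.36
print(ipc(26.3, BASE_OPS + MADS))               # MAD as 2: 6800U    -> ~1.45
print(ipc(39.1, BASE_OPS + MADS))               # MAD as 2: 6800U_pp -> ~2.15
print(ipc(39.1, BASE_OPS - NORM_OPS + NORMS))   # nrm as 1 op        -> ~1.17
print(ipc(39.1, BASE_OPS - NORM_OPS))           # nrm free           -> ~0.78
print(ipc(39.1, BASE_OPS - NORM_OPS + MADS))    # nrm free, MAD as 2 -> ~0.98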


About the tests in general: partial precision uses the _pp hint for every op (including texld), and, for the longer PS 2.0 shader tests, the 4-register tests propagate extraneous registers for each op (which, ideally, the hardware can either perform without penalty or the driver can recognize as extraneous).

Hopefully no instruction counting or simple math mistakes in the above...it's late/early here. :-?
 
Chalnoth said:
nVidia is not ATI. The NV4x architecture is quite a bit more complex than the R3xx architecture, and so has much more room to grow with future driver improvements.

And how do you know this? Do you have an intimate knowledge of the 6800 chip, its architecture? Do you have some foresight into its potential from a programming standpoint? Or are you just making an assumption based upon the fact that the 6800 is made by Nvidia, the love of your life?



I was primarily talking about shader performance, where average performance in synthetic benchmarks is over twice that of the Radeon 9800 XT.

Synthetic benchmarks?

That's not a very good example.

For the past year, Nvidia have shown just how inaccurate synthetic benchmarks can be with all their cheating. I don't really rely on or trust synthetic benchmarks when it comes to graphics card performance anymore.
 
One must congratulate nVidia for coming up with the goods with the GF6800 Ultra. It seems to show solid DX9 and legacy performance. One still can't test Shader 3.0, which apparently comes under DX9.0c. Do the PowerVR demos require this to run under the reference rasteriser?

This new architecture, as with any, might require optimisations, but I'd say one must expect only minimal improvements, as it seems to perform to expectation within the theoreticals, given its limited bandwidth for the 16 pipes.

R420 might face the same issues with its (assumed) updated shader core and could also gain (maybe around 5-10%) in shader apps. One has to think that these companies would have drivers capable of at least 90% of the achievable performance upon release to showcase a new card, especially with all the time spent on debugging and simulation in research & development (as well as the competition out there). I don't think ATI have been sitting idle for the last 20 months either, so the R420 might just push DX9.0 shading performance to an entirely new level, even if it isn't fully capable of Shader 3.0.
 
Nick Spolec said:
And how do you know this? Do you have an intimate knowledge of the 6800 chip, its architecture? Do you have some foresight into its potential from a programming standpoint? Or are you just making an assumption based upon the fact that the 6800 is made by Nvidia, the love of your life?
How do I know the NV40 architecture is more complex than the R3xx's? Well, nVidia did publish quite a bit of information about how their architecture works.

For example, the NV40 architecture is maximally capable of ~6-7 instructions per clock per pipeline under ideal circumstances. The R3xx's architecture appears to be maximally capable of closer to 4-5 instructions per pipeline per clock, with an average closer to 1 instruction per clock (since ATI has not released technical information, we cannot know exactly how many instructions per pipeline per clock the R300 is capable of).

What I'm saying is that I expect the current average of ~1.2-1.3 instructions per clock on the NV40 to increase with better compilers that reorder instructions in more optimal ways for the architecture, not to mention proper use of FP16 for certain specific calculations.
 
aZZa said:
R420 might face the same issues with its (assumed) updated shader core and could also gain (maybe around 5-10%) in shader apps. One has to think that these companies would have drivers capable of at least 90% of the achievable performance upon release to showcase a new card, especially with all the time spent on debugging and simulation in research & development (as well as the competition out there). I don't think ATI have been sitting idle for the last 20 months either, so the R420 might just push DX9.0 shading performance to an entirely new level, even if it isn't fully capable of Shader 3.0.
Keep in mind that ATI won't have the benefit of extra special-function units that operate at high speed under FP16 processing (like nrm and rsq on the NV40). I doubt they'll be able to increase shader efficiency by more than ~20%.
 
I wouldn't get my expectations up for a great deal of extra performance through the drivers. I think the new chips will struggle to gain much above 10% except under ideal (benchmark?) conditions. There may be the odd case where they could gain a little more in the shaders, but I'd assume the drivers will pick up as much performance as possible in the next couple of releases, given the competition around and due out soon. I don't think these companies would risk holding back a driver release to counter a new product launch whose capabilities are unknown; they'd rather benchmark as favourably as possible to the public now.

Looking at the benchmarks, the NV40 has a slight performance edge over R3xx tech in PS 2.0 (not including Shader 3.0), but ATI has the advantage of holding back its new release to present whatever part is necessary to match the NV40, whether it be a 12- or 16-pipe part. They also hold all the cards with the (assumed) R500 being well and truly in development and not that far off. Could this meet the 2-year cycle after the R300 core which was described? This tech might bring Shader 3.0 (and maybe 3.x or 4.0) performance to the mainstream by year's end if these rumours hold some truth, as NV4x Shader 3.0 performance is still unknown. Anyway, the more companies that can bring this amazing tech out, no matter who they are, the better for all consumers out there in both price and performance.
 
One still can't take anything away from the performance of the NV40 core, which takes DX9 to a playable level. This new card might be only for the extremist or professional, but it is such an awesome performer that one can only wish the average joe could use one to show off the true performance potential of PC graphics.

I'm sure a few people have been caught by surprise by its much improved shader performance, which is probably about 30% above what one would have assumed for a follow-up to the NV3x cores. Having the extra pipes (in a 16x1 setup over 8x2) probably helps gain the extra efficiency and performance improvements in the base design alone, along with the better (and simpler in theory) overall architecture.
 
Chalnoth said:
aZZa said:
R420 might face the same issues with its (assumed) updated shader core and could also gain (maybe around 5-10%) in shader apps. One has to think that these companies would have drivers capable of at least 90% performance upon release to showcase a new card, especially with all the time spent in with the debugging and simulations used in research & development (as well as the competition out there). I dont think ATI have been sitting idle for the last 20 months either, so the R420 might just push the dx9.0 shading performance to an entire new level altogether, even if it isnt fully capable of shader 3.0.
Keep in mind that ATI won't have the benefit of extra special-function units that operate at high speed under FP16 processing (like nrm and rsq on the NV40). I doubt they'll be able to increase shader efficiency by more than ~20%.

So you know for a fact R420 will not support fp32/fp16??
Also note, and I am pretty confident in this being true: nVidia did a good job with their compiler for NV3x and did make pretty good gains in their SM2.0 output. However, NV3x had problems to begin with that it appears NV40 does not have. I would not expect them to have the same success with NV40, if for no other reason than this alone.
Also note that if NV releases a set of drivers a few months from now that shows 10-20% performance increases for the NV40, I guarantee there will be people screaming "Cheat! Cheat!" all over the place. NV already has a bad reputation due to the crap they pulled these past 18 months. I would think they would do everything in their power to try to walk away from this... well, if they have any brains at all.

I have a personal question for you.
If (and I know it is a big if) ATI releases the X800XT and it smacks NV40 all over the place as far as shaders go: are you going to leave this forum in embarrassment? Admit that you have been mistaken all along? Or continue to say NV40 has so much more to offer than R420, even though every benchmark, every game (with the exception of games like Tiger Woods Golf, where NV strong-armed EA Games into turning shaders off when any ATI card is used) and every reviewer (with the exception of the purely NV-biased ones) says and proves otherwise? (If what I said about R420 smacking the crap out of NV40 is true.)
I am just curious why you would put your reputation out on a limb and say something about a technology and card when you have no idea how it will perform. I know you are an NVidiot; you prove that with every post you make. But I still wonder...
 
Bry, I think you are being a little harsh on Chalnoth. These forums are only a bit of fun and edu-tainment. It doesn't really matter who is the best/greatest/fastest. One is allowed to stick up for their own team. If nVidia doesn't win this round they might get up next time. At least they have raised the bar for all consumers, meaning that the competition (i.e. ATI & co) needs to raise it even higher if they want to be number 1.

No one wants a single company to take over the market altogether and leave us with a slow-developing industry with multiple rehashes of aging technology at high prices (a la GF2 GTS -> GF2 Ultra -> GF2 Pro -> GF2 Ti etc). Companies leap-frogging one another is the best possible scenario for everyone.

ATI should be able to surpass nVidia this time with an optimised 16-pipe R420 based on previous tech, but it will probably be fairly close. Let's hope PowerVR can also pull something out and trump both the aces in the 3D industry, giving consumers a third performance alternative with the Sega-tech part.
 
Bry said:
I have a personal question for you.
If (and I know it is a big if) ATI releases the X800XT and it smacks NV40 all over the place as far as shaders go: are you going to leave this forum in embarrassment?
Haha! You take this way too seriously. I've been wrong in the past, and I'll be wrong in the future. I stay here because it's my hobby. I enjoy learning about new hardware, and I (usually) enjoy the arguments that I'm involved in.

Or continue to say NV40 has so much more to offer than R420,
I don't think I've been saying that yet. I've been stating my expectations on performance (I still think the NV40 will outperform the equivalent R420, that is, 16 pipe vs. 16 pipe, etc., but it looks like the margin will be close), and, I hope, a realistic sense of what to expect from PS 3.0 support on the NV40.
 