What is the chance that the NV36 has improvements?

Yes, I know. But are you saying NV31/6 behaves more like a 2x2 pipeline instead of 4x1, hence needing 2 cycles/quad most of the time? I thought that was the preserve of NV34.
 
NV31/34 (and, I assume, 36) are 2x2 doing multi-texture, and 4x1 with single texturing.

I'm not sure how "shader parity" relates to the fillrate numbers one can glean from 2x2/4x1 pipeline organization. Normally, assuming clock speed equality, one can assume a 4x1/4x2 card can output twice as many shaded pixels per clock as a 2x1/2x2 card, simply because it has twice as many pixel pipelines (and, thus, pixel "shaders" embedded in each pipeline). But the FX and Radeon DX9 lines don't really have comparable shader hardware, if we listen to nV's "sea of shaders" line, so the comparison isn't that clear-cut. As it stands, Radeons appear to be more efficient per clock, but who knows what driver updates and special code paths (that may affect more than just shader brute force) will bring?
 
Texturing isn't at issue. I was curious about semantic configuration of NV37 shading pipeline vs NV35 as it was specifically mentioned. RV360/R360 is a more "traditional" 4x1 vs 8x1 so we can extrapolate, whereas essentially a 2x2 may result in <1quad/cycle. I am aware that the NV3x pipeline includes flexible ALUs & is scalable for each model.
 
Sorry, I thought you were continuing the shader discussion.

(Though I lamentably still don't understand the concept of a quad and how it moves through the hardware pipeline, I'd imagine that if an NV35 can process one quad per clock, then an NV36 (don't know of an NV37) would take two clocks.)
 
I did some tests with an NV34 (non-Ultra) and got some strange results. I hope someone who knows more about NV34 can verify my conclusions (all my tests were done with the 45.23 WHQL driver):

1. When using pixel shader 1.1, with only one texture and one color instruction, I can get a result near 1,000 Mpix/s. This suggests NV34 is a 4x1 architecture when using pixel shader 1.1.

2. When using pixel shader 1.4 or pixel shader 2.0, the maximum fillrate I can get is near 500 Mpix/s. This suggests NV34 behaves as a two-pixel-pipeline architecture when using ps 1.4 and ps 2.0.

3. Under pixel shader 2.0 (I didn't test pixel shader 1.4), two non-dependent texture accesses cost the same as one. This suggests NV34 is a 2x2 architecture when using ps 2.0.

4. Using pixel shader 1.1, NV34 can only run one color instruction per cycle, instead of two. This is a bit strange, since almost all previous GeForces have two register combiners per pixel pipeline.

5. When not limited by registers, there seems to be no difference in performance between FP32 and FP16 operations.

6. Under pixel shader 2.0, most instructions have one cycle throughput. ABS is not free (unlike R300). LRP takes 3 cycles (I'll have to check about that), NRM takes 5 cycles.
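
The pipeline counts inferred in points 1 and 2 come from dividing the measured fillrate by the core clock. A minimal sketch of that arithmetic, assuming a 250 MHz core clock for the non-Ultra NV34 (the clock is not stated above, so treat it as an assumption; `pixels_per_clock` is a hypothetical helper name):

```python
def pixels_per_clock(fillrate_mpix_s: float, core_clock_mhz: float) -> float:
    """Effective pixels written per clock, derived from a measured fillrate."""
    return fillrate_mpix_s / core_clock_mhz


# Assumption: non-Ultra NV34 core clock in MHz (not given in the post).
ASSUMED_NV34_CLOCK_MHZ = 250.0

# Measured fillrates from points 1 and 2 (both are "near" values, so the
# results are approximate).
ps11 = pixels_per_clock(1000.0, ASSUMED_NV34_CLOCK_MHZ)
ps20 = pixels_per_clock(500.0, ASSUMED_NV34_CLOCK_MHZ)

print(f"PS 1.1: about {ps11:.1f} pixels/clock")
print(f"PS 1.4/2.0: about {ps20:.1f} pixels/clock")
```

If the assumed clock is off, the pixels-per-clock figures scale accordingly, but the 2:1 ratio between the PS 1.1 and PS 2.0 paths does not depend on it.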

Any comments?
 
Hm, maybe it's built that way: A unit working on 2 quads that can either do 2 tex lookups or 2 fx12 ops or one fp32 op per clock. That would make it quite similar to NV30, but without the combiners following the shader core.
 
On the first question:
NV36 has CineFX 2 instead of CineFX 1,
so pixel shader performance must be improved.
Furthermore, the framebuffer compression has been improved (Intellisample HCT instead of Intellisample 1.0).

Last but not least, NV36 is produced by IBM.
They use FSG as the electrical insulator, so the power consumption of NV36
should be lower than that of NV31. And NV36 should be able to clock higher...
 
pcchen said:
4. Using pixel shader 1.1, NV34 can only run one color instruction per cycle, instead of two. This is a bit strange, since almost all previous GeForces have two register combiners per pixel pipeline.

:oops:

If that's true...
It could mean:

NV34 = 2x ( ( 1xFP32 OR 2xTEX ) OR ( 1xFX12 AND 1xTEX ) )
NV31 = 2x ( ( 1xFP32 OR 2xTEX ) AND ( 2xFX12 ) )
NV30 = 4x ( ( 1xFP32 OR 2xTEX ) AND ( 2xFX12 ) )

These all seem like likely scenarios for their respective transistor counts, IMO.
And could you please test 2 FX12 instructions + 1 TEX instruction in PS 1.1? If I'm right, that should take 1 cycle on the NV31 and 2 on the NV34. The same goes for 2 FX12 instructions + 2 TEX instructions.
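
The predicted cycle counts can be checked with a toy issue model of the two hypothetical pipe layouts above. This is only a sketch: it ignores instruction dependencies, latency and register pressure, and the function names are made up for illustration.

```python
from math import ceil


def cycles_and_layout(tex: int, fx12: int, fp32: int = 0) -> int:
    """Pipe of the form (1xFP32 OR 2xTEX) AND (2xFX12), as guessed for NV31/NV30."""
    front = fp32 + ceil(tex / 2)  # FP32 ops and TEX pairs share the front unit
    back = ceil(fx12 / 2)         # two FX12 ops can issue in the same cycle
    return max(front, back, 1)


def cycles_or_layout(tex: int, fx12: int, fp32: int = 0) -> int:
    """Pipe of the form (1xFP32 OR 2xTEX) OR (1xFX12 AND 1xTEX), as guessed for NV34."""
    # Each FX12 op occupies a whole cycle and can carry one TEX with it;
    # any remaining TEX ops are issued two per cycle.
    leftover_tex = max(0, tex - fx12)
    return max(fx12 + fp32 + ceil(leftover_tex / 2), 1)


for tex, fx12 in [(1, 2), (2, 2)]:
    print(
        f"{fx12}xFX12 + {tex}xTEX:",
        f"AND layout = {cycles_and_layout(tex, fx12)} cycle(s),",
        f"OR layout = {cycles_or_layout(tex, fx12)} cycle(s)",
    )
```

Under these assumptions both test shaders take one cycle on the AND layout and two on the OR layout, which is the difference the requested test would look for.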


Uttar
 
Robbitop said:
On the first question:
NV36 has CineFX 2 instead of CineFX 1,
so pixel shader performance must be improved.
Furthermore, the framebuffer compression has been improved (Intellisample HCT instead of Intellisample 1.0).

Last but not least, NV36 is produced by IBM.
They use FSG as the electrical insulator, so the power consumption of NV36
should be lower than that of NV31. And NV36 should be able to clock higher...

Well, since NVidia's NDAs seem to be flakier than a leper colony, some reviews of the 5950 and 5700 have already appeared on the web. iXBT's was the first, I think, but that has disappeared now.

Nordichardware have their review up, great for those who can read Swedish ;) but the graphs do speak for themselves.

The 5950 is quite capable of keeping up with the 9800XT but still dies a death when a lot of shaders appear. Looks as though it will be $100 cheaper than the 9800XT at launch, which will make it worthwhile. Is the extra shader performance (considering how many titles are out there) worth an extra $100? Personally I'd say yes...

The 5700 looks like it will be something of a beast, they were looking at a $199 launch price which makes it exceptional value. It beat the 9600XT quite easily and they got the damn thing to overclock to 500/1100 without it breaking a sweat. Looks like they may have actually produced a fine card, at last!!

What I don't get is the PCB layout. It looks identical to the NV30, which begs the question - why was the NV30 so expensive? Either NV made a killing on the few sold or they are taking a big hit with the 5700...

Sadly the only thing that got cached on my 'puter were the pictures of the 5950 and 5700.
 
It seems to me that the 5700 will be a 'loss leader' product for NVidia.

Seems like a good idea for them to put out a high-performing yet cheap card to try and win back some of the mindshare of the enthusiast community. It doesn't matter if they barely make a profit on it - it is more likely to spur the sales of the lower GFFXs through name association.

Of course, I'm not going to get one because of the crappy AA. 8)
 
Uttar said:
And could you please test 2 FX12 instructions + 1 TEX instruction in PS 1.1? If I'm right, that should take 1 cycle on the NV31 and 2 on the NV34. The same goes for 2 FX12 instructions + 2 TEX instructions.
Uttar

Unfortunately I don't have an NV31. IIRC NV34 needs two cycles to run a 1 TEX + 2 FX12 shader. 2 TEX + 2 FX12 also needs two cycles. I'll check them again tomorrow.

btw, I think NV34 should be

NV34 = (2x (1x FP32 or 2x TEX)) OR (4x (1x FX12 and 1x TEX))

Of course, this is just my speculation.
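
To see how this two-mode guess lines up with the measurements, here is a rough sketch that converts each mode into pixels per clock. It is a simplified model with hypothetical function names; it assumes the fixed-point mode is only used by PS 1.1 shaders and the floating-point mode by PS 1.4/2.0 shaders, and it ignores dependencies and latency.

```python
from math import ceil


def fx_mode_pixels_per_clock(tex: int, fx12: int) -> float:
    """4 pipes, each issuing 1xFX12 AND 1xTEX per cycle (assumed PS 1.1 path)."""
    cycles = max(tex, fx12, 1)
    return 4 / cycles


def fp_mode_pixels_per_clock(tex: int, fp32: int = 0) -> float:
    """2 pipes, each issuing 1xFP32 OR 2xTEX per cycle (assumed PS 1.4/2.0 path)."""
    cycles = max(ceil(tex / 2) + fp32, 1)
    return 2 / cycles


# PS 1.1 style shaders: (TEX count, FX12 count)
for tex, fx12 in [(1, 1), (1, 2), (2, 2)]:
    rate = fx_mode_pixels_per_clock(tex, fx12)
    print(f"FX12 mode, {tex} TEX + {fx12} FX12: {rate:.1f} pixels/clock")

# PS 2.0 style shaders with texture lookups only
for tex in [1, 2]:
    rate = fp_mode_pixels_per_clock(tex)
    print(f"FP mode, {tex} TEX: {rate:.1f} pixels/clock")
```

In this model one TEX + one FX12 runs at four pixels per clock, adding a second FX12 instruction halves that, and one or two texture lookups in the floating-point mode cost the same, which is consistent with the results reported earlier in the thread.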
 
Thanks, pcchen. It seems to support previous ideas about NV34 pipeline organization. It will be interesting to see how much the mini FP units of the NV36 affect its configuration. According to Dave's tests, the new drivers expose these better. It makes sense that all of the NV3x series share the same general pipeline, only with scaled ALUs/units resulting in different shading/texturing characteristics.
 
BoardBonobo said:
What I don't get is the PCB layout. It looks identical to the NV30, which begs the question - why was the NV30 so expensive? Either NV made a killing on the few sold or they are taking a big hit with the 5700...

Nvidia apparently had huge yield problems at TSMC back then. And the NV30 also has 50+ million more transistors than the 5700.
 
It seems that NV36 has a few more transistors than NV31.
NV36 has 3 vertex units, NV31 just 1.
So there must be about 10-15 million more transistors than in NV31.
That puts NV36 at approximately 100 million transistors.

But the yields at IBM seem to be excellent, and they use FSG as the electrical insulator.
 
I think 5700U articles pin the NV36 at 82M transistors, about 5M more than the NV31.
 