surfhurleydude said:Stryyder said:surfhurleydude said:Stryyder said:True, most of the article is already common knowledge, but there is some interesting information.
Quote:
NVIDIA UltraShadow II for 4 times the performance in highly shadowed games (e.g. Doom III) comparing to older GPUs
Confirmation of a 32x0 mode
128 pixel shader operations/clock
They mean 4 times the performance of their older GPUs, not ATI's.
AFAIK, the NV3x was actually ahead of ATi in terms of shadow performance, as the line simply dominated in all early Doom III benchmarks, and NV35 contained all sorts of upgrades "recommended" by JC himself.
Drinking the Kool-Aid? Doom III was the only game with shaders that the NV3x didn't choke and die on. Since the NV35 was built to play Doom and Doom was coded to run on the NV35, this shouldn't be surprising. Unfortunately, most people will play more than just Doom III, and JC will have to release a product that is coded to the DX9x spec.
Can I ask what the hell you are rambling on about? The fact of the matter is, UltraShadow was put in the NV35 lineup at JC's request, and it enhances STENCIL op performance, not shader performance.
LeStoffer said:They get to the 128 operations/clock like this:
a) 16 pipelines with...
b) 2 Shader Units each...
c) that can each do 4 instructions (on RGB+Alpha) per cycle (per clock I assume)
Thus: 16 x 2 x 4 = 128
Me said:This isn't QUITE true.
16 pipes, each with 2 shaders.
Each of those shaders can execute 2 instructions.
Those 2 instructions can have a total of 4 ops,
i.e. vec3/scalar, or 2 vec2s.
So there are 128 operations, but only 64 instructions the way I read it.
The difference is that if you have 4 scalar adds in a row, they can't all execute in 1 clock on one shader.
It would be a stupid shader to hit this case though.
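To make the two counts above concrete, here is a rough Python sketch of the arithmetic. It only restates the figures quoted in this thread (16 pipes, 2 shader units per pipe, 4 components per unit, 2 co-issued instructions per unit); the function names are made up for illustration, and the scalar case assumes the ops are independent and that one unit issues at most 2 instructions per clock, which is one reading of the slide rather than confirmed hardware behaviour.

```python
# Figures as quoted in the thread, not measured values.
PIPES = 16
UNITS_PER_PIPE = 2
COMPONENTS_PER_UNIT = 4      # RGBA
INSTRUCTIONS_PER_UNIT = 2    # co-issue: vec3 + scalar, or vec2 + vec2


def peak_component_ops_per_clock():
    """Component-level operations per clock across the whole chip."""
    return PIPES * UNITS_PER_PIPE * COMPONENTS_PER_UNIT


def peak_instructions_per_clock():
    """Instructions per clock across the whole chip."""
    return PIPES * UNITS_PER_PIPE * INSTRUCTIONS_PER_UNIT


def clocks_for_scalar_ops(n_ops, instructions_per_unit=INSTRUCTIONS_PER_UNIT):
    """Clocks one shader unit needs for n independent scalar ops.

    Assumption: each scalar op takes one instruction slot, so the unit
    retires at most `instructions_per_unit` of them per clock.
    """
    return -(-n_ops // instructions_per_unit)  # ceiling division


print(peak_component_ops_per_clock())  # 128
print(peak_instructions_per_clock())   # 64
print(clocks_for_scalar_ops(4))        # 2
```

So under this reading, the 128 figure counts components, the instruction count is 64, and 4 scalar adds on one unit would take 2 clocks rather than 1.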
Riff said:2x2 may or may not be important. Many 2D texture coordinate manipulations only require two components. Plus everyone seems to be missing the fact that the compiler could break some 4x4 instructions into multiple 2x2 instructions. These extra instructions could be used to keep a shader unit doing useful work when it might normally be stalled due to data dependencies. Even if they don't do this now, they could. Flexible is good.
I agree that their little blurb doesn't say anything about the limitation of the second unit but I am under the impression that both units are identical.
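As a toy illustration of the point about splitting a 4-wide instruction into two 2-wide ones, here is a Python sketch. It says nothing about what NVIDIA's compiler actually does; `add_vec` and `add_vec4_as_2x2` are hypothetical helpers that only show the split is result-preserving for a component-wise op.

```python
def add_vec(a, b):
    """Component-wise add of two equal-length vectors."""
    return tuple(x + y for x, y in zip(a, b))


def add_vec4_as_2x2(a, b):
    """Same 4-component add, issued as two independent 2-component adds."""
    rg = add_vec(a[:2], b[:2])   # first half (R, G)
    ba = add_vec(a[2:], b[2:])   # second half (B, A)
    return rg + ba               # tuple concatenation back to 4 components


a = (1.0, 2.0, 3.0, 4.0)
b = (0.5, 0.5, 0.5, 0.5)
assert add_vec4_as_2x2(a, b) == add_vec(a, b)
```

Because the two halves don't depend on each other, a scheduler would be free to place them in different clocks or slots, which is the flexibility being argued for here.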
Zeross said:You're only talking about co-issue 3/1 VS 2/2 here but GF 6800 has two units each capable of co-issue.
Wasn't that already the case with the NV35? I believe the first NV30 had one FP unit (also responsible for texturing) and two FX units, and NV35 had two FP units.
Joe DeFuria said:NV4x is able to process dual instructions at the same time. It's not clear from their diagram, but it sounds like each PS unit can process either 3/1 or 2/2 operations. So, NV4x could execute dual 3/1 operations per pipeline per clock.
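The 3/1 or 2/2 pairing rule described above can be written down as a small check. This is a sketch of the rule as discussed in the thread, not of the real scheduler: `can_co_issue` is a hypothetical name, and it assumes a narrower op is allowed to sit in a wider slot (so two scalars can pair), which the slide does not spell out.

```python
# Allowed component splits within one shader unit, per the slide as read here.
ALLOWED_SPLITS = ((3, 1), (2, 2))


def can_co_issue(width_a, width_b):
    """True if two ops of the given component widths fit one unit in one clock.

    Assumption: an op may use fewer components than its slot provides.
    """
    wide, narrow = max(width_a, width_b), min(width_a, width_b)
    return any(wide <= big and narrow <= small for big, small in ALLOWED_SPLITS)


print(can_co_issue(3, 1))  # True  (vec3 + scalar)
print(can_co_issue(2, 2))  # True  (vec2 + vec2)
print(can_co_issue(1, 1))  # True  (two scalars, under the assumption above)
print(can_co_issue(3, 2))  # False (5 components do not fit in 4)
```

Whether both shader units in a pipe accept both splits is exactly what the rest of this discussion is about, so the check is per unit only.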
Colourless said:I thought the slide was quite clear.
Shader Unit 1 is highlighted with RGB+A
Shader Unit 2 is highlighted with RG+BA
Shader Unit 1 was also shown to be used to do texture addressing in the previous slide. I tend to think this indicates that the 2 units are different.
NV4x can do this as 3/1 or 2/2 pairing of components
That's how I interpreted it, too. Shader1 is clearly linked to tex in the first slide.
dan2097 said:Do you have any idea whether their figure of 128 operations/clock is comparable to ATI's 9800XT value of 40 pixel shader operations/clock from here:
http://www.hardocp.com/image.html?image=MTA2NDg1OTI2NEFrQm9pUVd0ZU5fMV83X2wuanBn