Games and Pixel Shader 2.0

DaveBauman said:
Luminescent, have you seen any evidence that actually backs up that it has two FP shaders?
Here is a post of mine from around June which extrapolates the whole two-shaders-per-clock shebang from the results obtained by MDolnec's benchmark on AnteP's test system (a 5800 and a 5900).

NV35:
Maximum fillrate=1772.702026M pixels/sec
Per-pixel fillrate=105.561607M pixels/sec (at FP16, for the sake of neglecting FP32's register overhead)
Maximum fillrate/per-pixel fillrate=~16.79
Considering there are 21 instructions:
21/16.79=~1.25 instructions/cycle per pipeline

Now, compare this to NV30:
Maximum fillrate=1957.946899M pixels/sec
Per-pixel fillrate=67.032890M pixels/sec (at FP16, for the sake of neglecting FP32's register overhead)
Maximum fillrate/per-pixel fillrate=~29.209
Considering there are 21 instructions:
21/29.209=~.7189 instructions/cycle per pipeline

Clock for clock, the improvement between NV35 and NV30 in the number of instructions executed per clock for the per-pixel shader test is:
1.25*1.10/.7189=~1.91, or an improvement of almost 91%, which is close to a two-fold improvement.
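
For anyone who wants to rerun the numbers, here is a minimal sketch of the same arithmetic in Python. The fillrate figures are the ones quoted above; the 1.10 factor is assumed here to be a rounded NV30:NV35 clock normalisation (500 MHz / 450 MHz ≈ 1.11), which the original post does not spell out.

```python
# Reproduces the arithmetic from the post above. The fillrates are the
# quoted measurements; the 1.10 factor is assumed to be a rounded
# NV30:NV35 clock normalisation (500/450 ~= 1.11) -- an assumption, not
# something stated in the original post.

SHADER_INSTRUCTIONS = 21  # length of the per-pixel shader test

def instructions_per_clock(max_fillrate, per_pixel_fillrate):
    # max/per-pixel fillrate gives cycles spent per shaded pixel, so
    # dividing the instruction count by it gives instructions per cycle
    # per pipeline.
    cycles_per_pixel = max_fillrate / per_pixel_fillrate
    return SHADER_INSTRUCTIONS / cycles_per_pixel

nv35_ipc = instructions_per_clock(1772.702026, 105.561607)  # ~1.25
nv30_ipc = instructions_per_clock(1957.946899, 67.032890)   # ~0.72

CLOCK_CORRECTION = 1.10  # the post's clock-for-clock adjustment factor
print(f"NV35: {nv35_ipc:.2f} instr/clock/pipe")
print(f"NV30: {nv30_ipc:.2f} instr/clock/pipe")
print(f"Clock-for-clock gain: {nv35_ipc * CLOCK_CORRECTION / nv30_ipc:.2f}x")
```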
 
Reverend said:
There is a certain DX9 game that will be benchmarked in my forthcoming Triplex Radeon 9600PRO review. It features floating point for a certain effect.

Either Dave or I will be studying the quality of this effect utilizing float textures soonish, comparing ATI's and NVIDIA's offerings.

NVIDIA doesn't support (at the moment) FP textures in DX9. You don't mean GunMetal?

Thomas
 
ATI didn't "prove" you don't need integer. They simply proved that you could have a simpler 24-bit FP design that could run fast and that the lack of integer wouldn't hurt you on old DX7/DX8 content.

If someone ever delivers a card with the FP24-32 performance of ATI but with 2x the FP16 and integer performance, and that card blazes in OpenGL 2.0 (the only API at the moment that can deal with it in the pixel shaders via HLSL), then in hindsight ATI's decision will look like a conservative design decision, and NVidia's will look like a bungled start but a bold design that panned out.

NVidia and ATI both "prove" you don't need tile-based deferred rendering. But if someone ever delivers a TBDR DX9 design with much higher performance, their brute-force, conservative approach will look bad in retrospect.

The first and third are a matter of opinion for both of us; who is best qualified to judge? I think ATi has proven the case for no INT, and for TBDR there's no comparable data to judge. (In other words, I don't think they have proved it; all they've done is prove you can build a traditional design with fast enough throughput that you don't get hurt on SW which would benefit from TBDR.) Yes, the similarity to your first example is deliberate; ain't perspective great. ;)

As for number 2, well, seeing as both the likelihood and the impact of such an event are extremely low, you'll excuse me if I put it under the 'ignore' category of my worries. :D
 
tb said:
Reverend said:
There is a certain DX9 game that will be benchmarked in my forthcoming Triplex Radeon 9600PRO review. It features floating point for a certain effect.

Either Dave or I will be studying the quality of this effect utilizing float textures soonish, comparing ATI's and NVIDIA's offerings.

NVIDIA doesn't support (at the moment) FP textures in DX9. You don't mean GunMetal?

Thomas
No, not GunMetal.

There are some new NVIDIA drivers (presently NDA'ed) which I'll be checking out with an NV35 to see what happens. Given that the game is a TWIMTBP title and features Cg and FP16 + FP32 for a certain texture format, it may prove to be interesting, especially when comparing with R3x0.
 
Reverend said:
tb said:
Reverend said:
There is a certain DX9 game that will be benchmarked in my forthcoming Triplex Radeon 9600PRO review. It features floating point for a certain effect.

Either Dave or I will be studying the quality of this effect utilizing float textures soonish, comparing ATI's and NVIDIA's offerings.

NVIDIA doesn't support (at the moment) FP textures in DX9. You don't mean GunMetal?

Thomas
No, not GunMetal.

There are some new NVIDIA drivers (presently NDA'ed) which I'll be checking out with an NV35 to see what happens. Given that the game is a TWIMTBP title and features Cg and FP16 + FP32 for a certain texture format, it may prove to be interesting, especially when comparing with R3x0.
I hate secrets...
 
Uttar said:
Whether using integer could be a good thing doesn't matter anymore, since nVidia is no longer using integer in the NV35

IMO, that's not the case. Given the transistor requirements of FP32, they cannot have replaced all the FX12 units with FP32 ones, so it's reasonable to expect some DX8 (int) performance to have been traded off for the extra float units; however, clock for clock, NV35 has exactly the same int performance as NV30.
 
NV30 = 1FP/2TEX + 2 FX

IMO, NV35 = 2FP/2TEX + 2 FX

Probably there can be no dependencies between the 2 FP operations. That's why it's difficult to see a 200% increase.
 
IIRC, the data we've seen was compatible with:

NV30 = 1FP32/2TEX + 2 FX

NV35 = 1FP32/2TEX + 2 FP16

Or maybe

NV35 = 1FP32/2TEX + 1 FP(16? 32?)

Certainly the reason NV35's performance on complex PS 2.0 shaders is not much different than NV30's is that it appears to suffer from the same extreme restrictions on temp register use. But I believe we've seen pretty clearly that dedicated FX12 units are no longer around--otherwise, how to explain the lack of performance difference between FX12 and FP16?
 
Heathen said:
(In other words, I don't think they have proved it; all they've done is prove you can build a traditional design with fast enough throughput that you don't get hurt on SW which would benefit from TBDR.) Yes, the similarity to your first example is deliberate; ain't perspective great. ;)

You're hoisted by your own petard. You say all ATI has done is prove you can do an IR design with "fast enough" throughput that it won't get hurt on SW which would benefit from TBDR. OK, under this criterion, you suggest they have NOT proven that TBDR is not needed.

But all ATI has done with FP24 is prove that you can build a traditional design with "fast enough" throughput that it won't get hurt in SW which would benefit from INT/FP16. OK, under this criterion, you suggest that they HAVE proven INT is not needed.

See the contradiction?

You claim there isn't enough data with respect to TBDR. Well, that's because no one has designed a blazing TBDR and there are no games that push the limits of shading and depth complexity because developers always have IR designs in mind.

Well, with respect to INT/FP16, there isn't enough data, because no one has designed a blazing INT/FP16 card and there are no games that push the limits of multiprecision shading, because developers are concentrating on DX7/8 games mostly, and certainly not many are coding games specifically for a card only a small number of people own.

The fact of the matter is, we don't know how much of a benefit a good multiprecision design can eke out. Even forgetting multiprecision scalars, simply being able to allocate FP units in clusters is good: for example, I can do twice the number of 2D vector operations as 4D vector operations by packing two vectors into one register.
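
To make the packing example concrete, here is a small, purely illustrative sketch in Python/NumPy; the vectors and the scale factor are made-up example data, and a 4-wide multiply stands in for one issue slot of a 4-component FP unit.

```python
import numpy as np

# Illustrative only: a 4-wide FP unit can do the work of two independent
# 2D vector operations per issue slot if the two vectors are packed into
# one 4-component register. The data here is made up.

uv0 = np.array([0.25, 0.75], dtype=np.float32)  # first 2D vector
uv1 = np.array([0.50, 0.10], dtype=np.float32)  # second 2D vector
scale = np.float32(2.0)

# Unpacked: two separate 2D multiplies -> two issue slots.
separate = np.concatenate([uv0 * scale, uv1 * scale])

# Packed: one 4D multiply -> a single issue slot for the same work.
packed = np.concatenate([uv0, uv1]) * scale

assert np.allclose(separate, packed)
print(packed)  # [0.5 1.5 1.  0.2]
```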

It is unknown how much benefit "unit pooling" and multiprecision will bring, since it depends on many factors, such as what the average shader will look like in the future. 3dLabs' Wildcat takes the unit-clustering approach (as does Sun's MAJC). It's an area of research.


Thus, I would claim that neither multiprecision NOR TBDR has been disproven. "Proof" is a very strong word.
 
Ratchet said:
Reverend said:
tb said:
Reverend said:
There is a certain DX9 game that will be benchmarked in my forthcoming Triplex Radeon 9600PRO review. It features floating point for a certain effect.

Either Dave or I will be studying the quality of this effect utilizing float textures soonish, comparing ATI's and NVIDIA's offerings.

NVIDIA doesn't support (at the moment) FP textures in DX9. You don't mean GunMetal?

Thomas
No, not GunMetal.

There are some new NVIDIA drivers (presently NDA'ed) which I'll be checking out with an NV35 to see what happens. Given that the game is a TWIMTBP title and features Cg and FP16 + FP32 for a certain texture format, it may prove to be interesting, especially when comparing with R3x0.
I hate secrets...


Well, I hope it's not a secret, as I can put 2 + 2 together (and hopefully not make 5).

Reverend said:
Because it can be benchmarked (kinda) and it features ps_2_0
http://www.beyond3d.com/forum/viewtopic.php?t=6860

For the past week I have been corresponding with the developer of a hugely popular game franchise to understand the technologies behind the latest game in the series. The game features a hidden benchmarking feature, but it is lacking in terms of options that help automate benchmarking the game, something reviewers truly treasure. So I'm happy to say that the developer listened to my suggestions and has implemented almost all of them (there are two outstanding ones, which I suggested to him only yesterday, one of which is the ability to color MIP levels, much like what we see in UT2003), and I can say that they all work perfectly using a private beta build I was given. My suggestions were the ability to specify various command-line options (such as resolution, AA, and types of effects), all of which were lacking in the first place, as well as having the benchmark output a CSV file showing all the results. The only outstanding problem is a bug that exists on Radeon cards at a certain setting, which the developer informed me is hopefully just a driver bug and which the ATI folks visiting him yesterday should help fix. All of this should result in a patch soonish, and once that patch is official I should be able to record a demo that will be used in perhaps all of B3D's future reviews. A demo which the public can't have, of course. Anyway, I spent a considerable amount of time on this... hopefully it will prove to be very useful insofar as my efforts to continually upgrade B3D's range of software used for reviews and articles.

http://www.beyond3d.com/forum/viewtopic.php?t=7006

and http://www.nvidia.com/object/game_tombraider_aod.html

Hope I haven't let the cat out of the bag, but you (Reverend) have been dropping subtle hints about it for the last couple of weeks. Please feel free to delete this if I've spoken out of turn.

Mark
 
Last time I checked, TWIMTBP game programmers don't recommend 9800 Pros and demo their games on ATI hardware ;)
 
So how are game developers going to do pixel shader 2.0 effects in games in such a way that both NVIDIA's FP32/FP16 mix and ATI's FP24 are fully used?
 
See the contradiction?

Which answer would you prefer? ;)

Where TBDR comes in, I don't really care one way or the other; I was trying to highlight the matter of differing perspectives on the issue. What you see as proof I don't, and vice versa.

"Proof" is a very strong word.

Of course it is, I don't disagree. But as I've tried to point out, proof is a matter of opinion in this sort of case. In my opinion, ATI have proved that INT isn't needed; in your opinion, they haven't.

TBDR was just an example on perspective.
 
Energy said:
So how are game developers going to do pixel shader 2.0 effects in games in such a way that both NVIDIA's FP32/FP16 mix and ATI's FP24 are fully used?

Well, presumably, they'd use an HLSL that supports int, half, and float types. Then, on single-precision architectures like ATI's, everything would be upcast to FP24. On multiprecision architectures like the NV3x (and, presumably, the 3dLabs VPU) it would depend on resources and optimization (e.g. if I can issue one int and one fp32 per cycle, I might upcast one of my expressions so that I can issue two in parallel).

OGL2.0's HLSL supports int and float but not half; however, it has the right architecture to allow the driver to do the optimizations. DX9 HLSL has int, half, float, and double, but it compiles to PS 2.0, and PS 2.0 has no int or double registers. Right now, there is no common HLSL that supports all the types and allows the GPU driver to deal with them.
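
A rough sketch of that mapping, in Python rather than any real shader toolchain; the target names and the precision table below are hypothetical stand-ins for how a driver might lower each declared type, not a description of any actual compiler.

```python
# Hypothetical illustration of how declared HLSL-style precisions might be
# lowered on different hardware. The target names and mappings are made up
# for the sake of the example; real drivers are far more involved.

TARGETS = {
    # Single-precision architecture: every type is upcast to one internal format.
    "single_precision_fp24": {"int": "fp24", "half": "fp24", "float": "fp24"},
    # Multiprecision architecture: each declared type keeps a distinct cost.
    "multiprecision": {"int": "fx12", "half": "fp16", "float": "fp32"},
}

def lower(declared_type: str, target: str) -> str:
    """Return the hardware precision a declared shader type would run at."""
    return TARGETS[target][declared_type]

for t in ("int", "half", "float"):
    print(f"{t:5s} -> {lower(t, 'single_precision_fp24'):5s} "
          f"| {lower(t, 'multiprecision')}")
```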
 