Chalnoth said:
CPUs have always been about picking the right precision for the job. They support multiple precisions for a very good reason: sometimes it is better to sacrifice precision for speed, because that sacrifice will mean nothing for the final output. This is particularly the case in 3D graphics, where the final output will be, at most, 12-bit integer (currently the highest in PC 3D is 10-bit, but most still output at 8-bit).
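As a back-of-the-envelope check on that claim, here's a minimal sketch (my own illustration, not from the post, using NumPy's float16 as a stand-in for FP16 shader precision) comparing the FP16 step size near 1.0 with the step size of an 8-bit output channel:

```python
# Sketch: compare FP16 rounding granularity against the quantisation step
# of an 8-bit-per-channel framebuffer. Assumes NumPy's float16 serves as
# an IEEE half-precision stand-in for FP16 shader math.
import numpy as np

fp16_ulp = float(np.finfo(np.float16).eps)   # step just above 1.0 = 2**-10 ~ 0.000977
step_8bit = 1.0 / 255.0                      # 8-bit output step over [0, 1] ~ 0.003922

print(f"FP16 step near 1.0  : {fp16_ulp:.6f}")
print(f"8-bit output step   : {step_8bit:.6f}")
print(f"ratio (8-bit / FP16): {step_8bit / fp16_ulp:.1f}")
```

A single FP16 rounding error on a value in [0, 1] is roughly a quarter of one 8-bit output step, so in isolation it is indeed invisible in the final image; the interesting cases are the ones discussed below, where errors accumulate or get amplified before reaching the framebuffer.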
I know I've pointed out before that I believe you are looking at the wrong architectural example, so I'll do it again.
My questions here are simple:
- Why do you (and others) regularly insist on picking CPU architectures as the example of what VPUs should or should not do?
- What similarities in design do you see between VPUs and CPUs that lead you to believe that this is an appropriate or valid argument?
- Why do you not pick DSPs as the architectural precedent, or dedicated high-speed SIMD/vector processors such as Cray's? How do you see VPUs as being more similar to CPUs than to these architectures?
Personally, I think that FP16 will probably be enough accuracy for the first generation of DX9 games, as it is doubtful that these will make much use of complex shaders that might require the additional accuracy of FP32/FP24. I can therefore understand why FP16 could have been useful as part of the DX9 spec.
One also has to recognize that not all calculations will exacerbate the errors. Some calculations will tend to hide them, by their very nature. Just because a shader is complex doesn't necessarily mean that it will require much higher accuracy than the final output. It all depends on what calculations are done, and what kind of data those calculations are done on.
And just because a shader is simple doesn't necessarily mean that it automatically requires low accuracy. I can have a program that contains one arithmetic instruction and one dependent texture read (wow - two whole instructions in length), display the results at an 8-bit-per-channel screen depth, and still show serious calculation inaccuracies.
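To make that concrete, here's a rough NumPy sketch (my own construction; the lookup table, sizes and nearest filtering are assumptions, not anything from the post) of exactly such a two-instruction program: one multiply to form a texture coordinate, then one dependent read from a high-frequency lookup table. A sub-texel coordinate error from FP16 can land on a completely different texel, producing errors of many 8-bit steps:

```python
# Hypothetical two-instruction "shader": scale a coordinate, then do a
# dependent read from a 1024-entry high-frequency lookup table.
import numpy as np

lut = (np.arange(1024) % 2).astype(np.float32)     # fine black/white grating (0.0 or 1.0)
x = np.random.default_rng(0).random(100_000).astype(np.float32)

def shade(x, dtype):
    u = x.astype(dtype) * dtype(1023.0)            # instruction 1: arithmetic at chosen precision
    idx = np.clip(np.rint(u.astype(np.float32)), 0, 1023).astype(int)
    return lut[idx]                                # instruction 2: dependent read, nearest filtering

ref, half = shade(x, np.float32), shade(x, np.float16)

# Quantise both results to 8 bits per channel and compare.
diff = np.abs(np.rint(ref * 255) - np.rint(half * 255))
print(f"pixels wrong by more than one 8-bit step: {np.mean(diff > 1) * 100:.1f}%")
print(f"worst error: {diff.max():.0f} / 255")
```

Even though the FP16 coordinate is off by well under one texel, a noticeable fraction of the samples fetch the wrong texel and come out wrong by the full 8-bit range - two instructions, 8-bit output, serious inaccuracy.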
The only question that should be asked is: for most 3D graphics programs, will it be better for hardware to support integer calcs (or any given precision) explicitly? Or will more performance be obtained if those transistors are instead used to improve performance for a higher-precision format?
That would depend on whether you are trying to accelerate a specific performance case (low precision) or the most generally applicable performance case (high precision).
Do the programs you describe need the specific case, or the general one? Legacy apps, designed around low precision, will only require the specific low-precision case. Future applications, designed with higher requirements in mind, may need the general case more than a specific one, so perhaps a forward-looking design should target this case?
That is the sort of design decision that has to be made by ATI, nVidia and anyone else making 3D graphics architectures.
I'll make it simple. If INT12 were supported, nVidia wouldn't be inclined to force the use of lower precisions. The way it is now, DirectX 9 is ensuring lower-quality rendering on the NV30-34 processors, as nVidia must use auto-detection to make use of their significant integer processing power. If INT12 were supported in the API, games could both perform better and look better on these video cards.
So what you're saying is that if everyone kowtowed to nVidia and made things the way they dictate, then amazingly they wouldn't feel pressured in the market by competitors coming up with superior implementations? If they have to creatively interpret the specifications, it's because they operate in an evil market that allows free competition? How inconvenient that must be for them.
Why should everyone else in the market have to kowtow to nVidia's 'vision', whether it's superior or not?
And I'll say it one last time. Stating that INT12 or FP16 is just bad for 3D graphics is an arbitrary judgement. Whether or not they are useful depends on the algorithm. Both formats are still higher in quality than the final output, so obviously there will be a number of calculations that will not benefit from higher precision.
Yes, this is true. There will be a number of calculations that will not benefit from higher precision. It also appears that this 'vision' of freely mixing precisions does not automatically make you faster than a processor designed specifically to accelerate the most general case of high precision.
Surely if freely mixing precisions is such a great feature then an architecture that can do so should be winning on all legacy apps (where it can use whatever precision it likes) by some huge margin? Why is this not the case?
How do you go about botching such a 'superior' vision to such an extent that, even when mixing precisions freely, you still can't necessarily match the performance of another architecture designed simply to accelerate the most general high-precision case? Even worse, perhaps it turns out that even when running at higher clock rates (in some cases much higher) you still cannot make up the deficit?
Maybe it's just not necessarily a superior vision after all.