Why doesn't DX9 support simple Quads?

FUDie said:
Chalnoth said:
Mariner said:
Well, here I'd say that foreseeing any problems which might occur is part of the design process. Therefore not foreseeing some of the problems they encountered made their design decisions inherently flawed.
I don't think so. I don't think that it is possible to foresee all problems that will occur during development. One big factor was that, from what Uttar's been posting, nVidia's original transistor budget was a fair bit higher than the transistor count of the final NV30. That alone could have resulted in a very large obstacle for getting the design implemented properly.
Since when is Uttar a spokesperson from NVIDIA? If you're basing your whole line of thinking on some quite unsubstantiated rumors, then you're very gullible.

-FUDie

ROFL! Although you're certainly right - You shouldn't base everything on what I say, my info, while IMO being always fairly accurate, is far from perfect :)

Something I'd insist on though: I never, ever said that the transistor budget was originally higher than what it is today. I've got NO idea about that. For all I know, they could have been retrieving some stuff in order to fit in some other stuff they realized was necessary or whatever. I don't know all that, sorry.

Also, I'd say this discussion is 100% futile because:
1. The Det50s will, AFAIK, improve FP performance by a substantial amount.
2. The NV40 is probably FP32 from top-to-bottom, really this time. Note the probably - I don't have it written white on black anywhere.

So I, for one, don't care anymore about whether integer would still be a good idea in an architecture. That's just as useful as speculating on Mojo today.


Uttar
 
JohnH said:
Why would you add HW for something that is hardly ever used? They may be simple to implement, but you end up expanding the HW test matrix for no real reason. Anyway, as I said, I would be surprised if that much HW actually directly supported them (3DLabs maybe?), as the gain is so small.
And I would be very surprised if there is any modern chip that doesn't directly support them and needs indices generated by the driver.

Some things are done in hardware even though they are rarely used, because they're incredibly cheap to implement.
And quads aren't that rarely used. Point sprites and lines are quads, too. Generation of vertices is different, but still the hardware has to order these vertices correctly to get 1-2-3 1-3-4.


I did have a question buried in one of my previous posts that no one's answered, so I'll unbury it: What do you need quads for, given that the extra BW requirements for indices are unlikely to have much, if any, impact on performance?
Saving CPU overhead would most likely be the biggest benefit, but when you render lots of separate quads, you usually render particles with only position, and one texcoord. So index size is relevant here.
 
JohnH said:
I suspect you'll find that in OGL the driver is just turning your quads into triangles. You might find that using D3D and indexed triangles, where you supply the indices in a static index buffer, is faster as the driver would no longer have to mess with the data.

Well, that's not my experience. I have seen no performance reduction myself using quads, though I'm usually for the most part fillrate limited. I have never heard anyone else complain about slow quads either, though I've heard plenty of other slowdown factors in geometry processing, so I doubt any reasonably recent hardware lacks the ability to handle quads. It should really be very cheap to implement in hardware.
 
JohnH said:
I did have a question buried in one of my previous posts that no one's answered, so I'll unbury it: What do you need quads for, given that the extra BW requirements for indices are unlikely to have much, if any, impact on performance?

Convenience.
 
Inconvenient? That much? Slower? How so? Really? But I'm not a 3D programmer so...

EDIT: I guess you would lose some fillrate on the larger surface, ok, makes sense, wonder if it's that bad in practice tho... Anyway!
 
Humus said:
so I doubt any reasonably recent hardware lacks the ability to handle quads. It should really be very cheap to implement in hardware.
I've only spent about 10s thinking about this, but how would you tell if the hardware genuinely supported quads or not if the system states that quads must be planar in both geometry and texture/lighting coordinates?

Surely, if you supply a quad strip
Code:
A---C----E
|   |    |
B---D----F
as "ABCDEF" would that not also be valid for the triangle strip equivalent?
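
One way to check this on paper is to decompose both strips into triangles and compare the results. The following is a minimal sketch, assuming the OpenGL vertex ordering for quad strips and assuming each quad is split along the diagonal that joins its second and fourth vertices (B-C for the first quad). The function names are made up for illustration, and nothing here says what any particular chip or driver actually does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Tri = std::array<std::size_t, 3>;

// Triangle strip: triangle i uses vertices i, i+1, i+2. Every odd triangle
// swaps its first two vertices so that all triangles keep the same winding.
std::vector<Tri> triangleStripToTriangles(std::size_t vertexCount) {
    std::vector<Tri> tris;
    for (std::size_t i = 0; i + 2 < vertexCount; ++i) {
        if (i % 2 == 0) tris.push_back(Tri{i, i + 1, i + 2});
        else            tris.push_back(Tri{i + 1, i, i + 2});
    }
    return tris;
}

// Quad strip: quad j uses vertices 2j, 2j+1, 2j+3, 2j+2 (OpenGL ordering).
// Assumption: the quad is split along the diagonal between its second and
// fourth vertices, which is the B-C edge in the diagram above.
std::vector<Tri> quadStripToTriangles(std::size_t vertexCount) {
    assert(vertexCount % 2 == 0);  // this sketch expects complete vertex pairs
    std::vector<Tri> tris;
    for (std::size_t j = 0; 2 * j + 3 < vertexCount; ++j) {
        const std::size_t q0 = 2 * j, q1 = 2 * j + 1, q2 = 2 * j + 3, q3 = 2 * j + 2;
        tris.push_back(Tri{q0, q1, q3});
        tris.push_back(Tri{q3, q1, q2});
    }
    return tris;
}
```

With A..F numbered 0..5, both functions produce the same four triangles, so for filled, planar quads the two strips are indistinguishable from the outside under these assumptions.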
 
Xmas said:
JohnH said:
Why would you add HW for something that is hardly ever used? They may be simple to implement, but you end up expanding the HW test matrix for no real reason. Anyway, as I said, I would be surprised if that much HW actually directly supported them (3DLabs maybe?), as the gain is so small.
And I would be very surprised if there is any modern chip that doesn't directly support them and needs indices generated by the driver.
Proof ? (I don't mind being proven wrong!)
Some things are done in hardware even though they are rarely used, because they're incredibly cheap to implement.
And quads aren't that rarely used. Point sprites and lines are quads, too. Generation of vertices is different, but still the hardware has to order these vertices correctly to get 1-2-3 1-3-4.
Point sprites and lines are examples where getting the CPU to generate them makes it difficult to use the HW's geometry processing unit (as the extra vertices are derived in screen space), so not supporting them would have an extreme performance penalty, this is not the case for quads.
I did have a question buried in one of my previous posts that no one's answered, so I'll unbury it: What do you need quads for, given that the extra BW requirements for indices are unlikely to have much, if any, impact on performance?
Saving CPU overhead would most likely be the biggest benefit, but when you render lots of separate quads, you usually render particles with only position, and one texcoord. So index size is relevant here.

What CPU overhead? Use a static index buffer = zero CPU overhead. If you calc the BW numbers I think you'll probably find that even in your quoted case it probably won't impact perf, i.e. you'll just hit the core's setup limits (if you're not fillrate bound anyway).

John.
 
JohnH said:
I suspect you'll find that in OGL the driver is just turning your quads into triangles.

Completely unrelated to the problem that Humus is having, but decomposing quads into tris doesn't get you the same results in all cases. The one I can think of off the top of my head is if you have glPolygonMode set to GL_LINE (draw the outline of the primitive, rather than a filled version).

Quad:
Code:
----
|  |
|  |
----

Quad decomposed into Tris:
Code:
----
| /|
|/ |
----

A corner case, sure, but it's still wrong.
 
PSarge said:
JohnH said:
I suspect you'll find that in OGL the driver is just turning your quads into triangles.

Completely unrelated to the problem that Humus is having, but decomposing quads into tris doesn't get you the same results in all cases. The one I can think of off the top of my head is if you have glPolygonMode set to GL_LINE (draw the outline of the primitive, rather than a filled version).

A corner case, sure, but it's still wrong.

Tend to get around that one with edge flags on the poly when generating internally.
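
The edge-flag idea can be sketched roughly like this: when a quad is split, the shared diagonal is marked as a non-boundary edge, so a line-mode pass draws only the four outer edges. The struct and function names below are hypothetical, and the split along the first/third-vertex diagonal is an assumption; it only illustrates the bookkeeping, not how any driver stores the flags.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

struct FlaggedTriangle {
    std::array<int, 3> v;      // vertex indices
    std::array<bool, 3> edge;  // edge[k] covers v[k] -> v[(k+1)%3]; true = boundary edge
};

// Split the quad (q0, q1, q2, q3), given in perimeter order, along q0-q2.
std::vector<FlaggedTriangle> splitQuadWithEdgeFlags(int q0, int q1, int q2, int q3) {
    assert(q0 != q2 && q1 != q3);  // diagonal endpoints must be distinct vertices
    FlaggedTriangle first{{q0, q1, q2}, {true, true, false}};   // q2->q0 is the diagonal
    FlaggedTriangle second{{q0, q2, q3}, {false, true, true}};  // q0->q2 is the diagonal
    return {first, second};
}

// Collect the edges a wireframe (GL_LINE style) pass would actually draw.
std::vector<std::pair<int, int>> wireframeEdges(const std::vector<FlaggedTriangle>& tris) {
    std::vector<std::pair<int, int>> edges;
    for (const FlaggedTriangle& t : tris)
        for (int k = 0; k < 3; ++k)
            if (t.edge[k]) edges.emplace_back(t.v[k], t.v[(k + 1) % 3]);
    return edges;
}
```

For the quad 0-1-2-3 this yields the four perimeter edges and skips the diagonal, which matches the first picture above rather than the second.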
 
Humus said:
JohnH said:
I suspect you'll find that in OGL the driver is just turning your quads into triangles. You might find that using D3D and indexed triangles, where you supply the indices in a static index buffer, is faster as the driver would no longer have to mess with the data.

Well, that's not my experience. I have seen no performance reduction myself using quads, though I'm usually for the most part fillrate limited. I have never heard anyone else complain about slow quads either, though I've heard plenty of other slowdown factors in geometry processing, so I doubt any reasonably recent hardware lacks the ability to handle quads. It should really be very cheap to implement in hardware.

Have you ever compared performance? Although if you're fill limited you're not going to see any difference.

Thinking about it, for a non indexed quad list the driver could pull the same trick with a static index buffer as I suggested the app might do, so all things given, perf may be the same anyway.

Humus said:
JohnH said:
I did have a question buried in one of my previous posts that no one's answered, so I'll unbury it: What do you need quads for, given that the extra BW requirements for indices are unlikely to have much, if any, impact on performance?

Convenience.
I was hoping that you would at least say subdivision surfaces! Generally speaking, quads->triangles is a very minor inconvenience, so that's no justification for HW support!

Humus said:
Inconvenient and slower?
But you said yourself that you're fill limited, and if you use a static index buffer what happens?

Enough of this babbling, I'm off for a long weekend.
John.
 
JohnH said:
Proof ? (I don't mind being proven wrong!)
I can't prove it because IHVs rarely publish such details.
Point sprites and lines are examples where getting the CPU to generate them makes it difficult to use the HW's geometry processing unit (as the extra vertices are derived in screen space), so not supporting them would have an extreme performance penalty, this is not the case for quads.
But they are quads, and the only thing you need to support quads is some logic that fetches vertices in the correct order from the transformed vertex cache (same with triangle fan/strip, along with some edge flag and cw/ccw flipping). You do that for lines and point sprites, too, so why not for quads?
 
Xmas said:
JohnH said:
Point sprites and lines are examples where getting the CPU to generate them makes it difficult to use the HW's geometry processing unit (as the extra vertices are derived in screen space), so not supporting them would have an extreme performance penalty, this is not the case for quads.
But they are quads, and the only thing you need to support quads is some logic that fetches vertices in the correct order from the transformed vertex cache (same with triangle fan/strip, along with some edge flag and cw/ccw flipping). You do that for lines and point sprites, too, so why not for quads?
As I said, they are very different to quads as they imply additional vertices derived in screen space. These vertices are generally generated at a completely different point in the pipeline, so they need specific support. This is not the case for quads, which can trivially be converted to triangles by an app, or even by the driver where the app is being awkward. As I think I said before, the main reason for not wanting to support quads is that it just increases the amount of testing needed, e.g. we would have to add quad list, strip and fan support in indexed and non-indexed variants, which would need testing against all the various combinations of vertex stream manipulation, for little or no gain.

If the industry had a longer cycle time then I'd be more than happy with quads and other minor convenience features making their way into the APIs (I know they're already in OGL) and then HW, as we might then have time to test things better and avoid all those nasty little surprises (often described as driver bugs) ISVs find when they try something in a hitherto untested combination/sequence.

John.
 
quads vs. rest

Well, I do know, that when I draw billboards I need six vertices per billboard (two triangles, using D3DPT_TRIANGLELIST).

Since I like trimmed-down VB's, I store only four vertices and use six indices to these four vertices (per particle/billboard).

With quads I could use only four vertices, or four vertices and four indices (however, in this case the indices would be redundant, while now they are very useful, since the IB is always the same and only the VB or its offset changes).
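
The "IB is always the same" point can be sketched as follows: the six indices per quad depend only on the quad's position in the batch, so they can be generated once and reused for every draw. This is a minimal illustration with a made-up function name, assuming 16-bit indices and four consecutive vertices per quad; uploading the result into an actual D3D index buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds triangle-list indices for `quadCount` independent quads whose four
// vertices are stored consecutively. Quad q becomes the triangles
// (4q, 4q+1, 4q+2) and (4q, 4q+2, 4q+3), i.e. the 1-2-3 1-3-4 ordering.
std::vector<std::uint16_t> buildQuadListIndices(std::size_t quadCount) {
    // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
    assert(quadCount <= 16384);
    static const int offsets[6] = {0, 1, 2, 0, 2, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        for (int offset : offsets)
            indices.push_back(static_cast<std::uint16_t>(base + offset));
    }
    return indices;
}
```

Because the contents never change, the buffer can be filled at startup and shared by every billboard batch; only the vertex data (or the offset into it) changes per draw.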

Triangle FANS are out of the question, completely, unless I want to do a separate DrawPrimitive() or DrawIndexedPrimitive() call PER BILLBOARD. This is totally out of the question, as the budget for those calls is limited: they burn a lot of CPU.

It's better to draw 1000's of primitives per Draw*Primitive() call than 10's, or 1 in the case of fans. Remember that fans can't be joined the way strips can, where repeated vertices are put in to stitch strips together.

However, even if we got quads for D3D, it probably would have little effect on the framerate as sending indices down doesn't appear to be the bottleneck.

200,000 particles @ 100 fps on RADEON 9700 PRO, each with unique color, rotation in screenspace, size in world coordinates, position and texture *1).

The bottleneck is fillrate for me, even if the particles are quite small. Heh.
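
For a rough sense of scale, the index traffic can be compared against the vertex traffic with simple arithmetic. The sketch below reads the figures above as 200,000 particles per frame at 100 fps; the 32-byte vertex size and 2-byte index size are assumptions picked for illustration, not numbers from the actual engine, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct BatchTraffic {
    std::uint64_t vertexBytesPerSecond;
    std::uint64_t indexBytesPerSecond;
};

// Each quad is drawn as 4 vertices plus 6 triangle-list indices.
BatchTraffic quadTraffic(std::uint64_t quadsPerFrame, std::uint64_t framesPerSecond,
                         std::uint64_t bytesPerVertex, std::uint64_t bytesPerIndex) {
    assert(bytesPerIndex == 2 || bytesPerIndex == 4);  // 16-bit or 32-bit indices
    const std::uint64_t quadsPerSecond = quadsPerFrame * framesPerSecond;
    return {quadsPerSecond * 4 * bytesPerVertex, quadsPerSecond * 6 * bytesPerIndex};
}
```

With those assumed sizes, `quadTraffic(200000, 100, 32, 2)` gives 240,000,000 bytes/s of indices against 2,560,000,000 bytes/s of vertices, so the indices add a bit under a tenth on top of the vertex fetch. Note that 16-bit indices cannot address 800,000 vertices in one call, so a batch this large would have to be split or use 32-bit indices, which doubles the index figure.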


*1) particles are organized into groups, so that the same texture is used as often as possible. Subtexturing is used to pack multiple different "textures" into a single hardware texture; an index stored in the VB says which "subquad" the texture coordinates are generated for. So in effect you get a wide range of different textures for the particles with low overhead. Of course it takes some care from the application developer, but the results are very good. ;-)
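
The subtexturing scheme in the footnote could look something like the sketch below, written as CPU-side C++ for readability even though the post describes doing it on the GPU. It assumes the sub-images sit in a uniform grid inside the texture and that corners are numbered clockwise from the top-left; both are assumptions for illustration, as are all the names.

```cpp
#include <cassert>

struct UV {
    float u, v;
};

// Returns the texture coordinate of one corner of one sub-image ("subquad").
// corner: 0 = top-left, 1 = top-right, 2 = bottom-right, 3 = bottom-left.
UV subTileTexCoord(int tileIndex, int corner, int tilesPerRow, int tilesPerColumn) {
    assert(tilesPerRow > 0 && tilesPerColumn > 0);
    assert(tileIndex >= 0 && tileIndex < tilesPerRow * tilesPerColumn);
    assert(corner >= 0 && corner < 4);
    const int column = tileIndex % tilesPerRow;
    const int row = tileIndex / tilesPerRow;
    const int dx = (corner == 1 || corner == 2) ? 1 : 0;  // right-hand corners
    const int dy = (corner == 2 || corner == 3) ? 1 : 0;  // bottom corners
    return {static_cast<float>(column + dx) / tilesPerRow,
            static_cast<float>(row + dy) / tilesPerColumn};
}
```

Only the small tile index has to be stored per particle; the four texture coordinates are derived from it, which is where the space saving comes from.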
 
Re: quads vs. rest

SeppoCitymarket said:
Well, I do know, that when I draw billboards I need six vertices per billboard (two triangles, using D3DPT_TRIANGLELIST).
Of course, there are many ways of billboarding. The traditional rectangle way is just the most convenient.

Point sprites are the best general method if you don't need to pack multiple images onto a single texture. Assuming that's not the case, there are three basic options:

1. If the billboards are very small such as particle systems, then using single equilateral triangles is more efficient (half the triangle rate and 75% of the vertex rate, and much more efficient should they need to be clipped). This will require reasonably careful packing of the art assets.

2. If the billboards are very large and contain a lot of transparent areas, it can be worth adjusting the geometry to cut out a lot of the areas that would be transparent, because any transparent pixel is wasted pixel fill rate. E.g. for roughly circular things like explosions, use hexagonal figures to 'cut the corners'. This reduces wasted fill rate, and discourages bad artists from making explosions with square corners ;)

3. Two triangles / one quad is of course the easiest way...
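
To put rough numbers on the fill-rate side of these options, here is a small sketch that assumes the visible part of the sprite is a circle and the billboard is the smallest regular polygon enclosing it. That circular-content assumption and the function names are mine, and the sketch only measures covered area; it says nothing about triangle or vertex rate, which is what option 1 is optimizing.

```cpp
#include <cassert>
#include <cmath>

// Area of a regular polygon with `sides` sides that just encloses a circle
// of the given radius: n * r^2 * tan(pi / n).
double enclosingPolygonArea(int sides, double radius) {
    assert(sides >= 3);
    const double pi = std::acos(-1.0);
    return sides * radius * radius * std::tan(pi / sides);
}

// Fraction of the rasterized area that falls outside the circle,
// i.e. fully transparent pixels that still cost fill rate.
double wastedFillFraction(int sides) {
    const double pi = std::acos(-1.0);
    return 1.0 - pi / enclosingPolygonArea(sides, 1.0);
}
```

Under that assumption a square wastes roughly 21% of its area, a hexagon roughly 9%, and a single triangle roughly 40%, which fits the advice above: single triangles for tiny particles where geometry rate dominates, hexagons for large sprites where fill dominates.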
 
Re: quads vs. rest

SeppoCitymarket said:
However, even if we got quads for D3D, it probably would have little effect on the framerate as sending indices down doesn't appear to be the bottleneck.
Actually it can be significant if the application is CPU limited, which is why for static geometry it's important to use index buffers, but static particle systems aren't very interesting :).
 
Re: quads vs. rest

Dio said:
but static particle systems aren't very interesting :).
They'd be great for the "freeze the action and spin the camera around" effects ... surely that hasn't been done to death yet. :rolleyes:
 
heh

Yeah. Besides, I'd expect the framerate to still be good before and after the freeze-frame is released and time starts to flow at "Normal" speed again.

I don't like point sprites for *billboards*; they are good enough for particles, but billboards I expect to be able to world-scale intuitively.

I do compute the rotation of the particle center from source (model -or- world coordinates) using ModelView transformation, then I generate the four corner vertices applying the rotation vector to each corner.

The rotation vector contains direction *and* scale of the particle, it's essentially:

v.x = sin(angle) * size
v.y = cos(angle) * size
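
A sketch of how the four corners could be derived from that vector, written as plain C++ rather than shader code. The post does not spell out the corner construction, so the "center plus or minus v, plus or minus v rotated by 90 degrees" rule below is an assumption, as are the names and the corner order.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 {
    float x, y;
};

// The rotation vector from the post: direction and scale in one value.
Vec2 rotationVector(float angle, float size) {
    assert(size >= 0.0f);
    return {std::sin(angle) * size, std::cos(angle) * size};
}

// Assumed construction: v is the half extent along the billboard's "up" axis,
// and p is v rotated by 90 degrees, the half extent along its "right" axis.
// Corner order: top-left, top-right, bottom-right, bottom-left.
std::array<Vec2, 4> billboardCorners(Vec2 center, Vec2 v) {
    const Vec2 p{v.y, -v.x};
    return {{
        {center.x - p.x + v.x, center.y - p.y + v.y},
        {center.x + p.x + v.x, center.y + p.y + v.y},
        {center.x + p.x - v.x, center.y + p.y - v.y},
        {center.x - p.x - v.x, center.y - p.y - v.y},
    }};
}
```

With angle 0 the result is an axis-aligned square with side 2 * size around the center; any other angle turns the same square about its center, which is the rotation that point sprites cannot provide.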

I experimented with storing angle and size, and with storing just the rotation half vector. On GFFX the sincos() was very fast, but on ATI it was a bit slower. I decided I can do the sincos() at setup, when uploading particles, and even with animated billboards it performs pretty well. At least I benchmarked both ways and found this best for our needs. I could even implement both paths (it just needs two shaders, obviously) and choose the best for a given situation, but I haven't gone to that yet.

Next, the view coordinate vertices are transformed with Projection transformation and sent forward to the pixel pipeline, being shader or fixedpipe, doesn't matter, both are good for the VS frontend.

Texture coordinates are also generated on the GPU to save space; speedwise it doesn't seem to make that big of a difference, and I expect GPU processing power to increase in the future at a faster rate than memory bandwidth (could be wrong, but I opt to generate the texcoords for now).

So point sprites are non-intuitive to scale, and also cannot rotate. Rotation is very important for smoke, dust, fire, etc.

It's just a way to offload traditional billboarding workload to the vertex shader, since the performance is great on GeForce3 and later class hardware, it does the work it's intended just fine. On older hardware the CPU based pipeline with FPU/SSE/etc. optimizations steps into the picture. The CPU vertex shaders still lose in performance to handwritten code (not assembly, using intrinsics).

Basically the pipeline can sustain an order of magnitude more particles than the hardware has hopes of backing up with fillrate, unless it's really tiny particles, but those aren't as interesting, as we know the 1x1 miplevel's color and can use D3DPT_POINTLIST for particles we know to be smaller than 1 pixel in screenspace. The 200K figure earlier was for "real" particles, though; with this level-of-detail 'reduction' added the figure is, uh, quite a lot bigger.
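
The sub-pixel test could be sketched like this, assuming a standard symmetric perspective projection. How the size is actually estimated in the engine is not stated in the post, so the formula, the 1-pixel threshold placement and all names here are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

enum class ParticlePath { Quad, Point };

// Approximate on-screen height, in pixels, of an object of `worldSize` at
// `viewDepth` units in front of the camera.
float projectedSizeInPixels(float worldSize, float viewDepth,
                            float verticalFovRadians, float viewportHeight) {
    assert(viewDepth > 0.0f);  // particle must be in front of the camera
    return worldSize * viewportHeight /
           (2.0f * viewDepth * std::tan(verticalFovRadians * 0.5f));
}

// Particles smaller than one pixel go down the cheap point-list path.
ParticlePath choosePath(float sizeInPixels) {
    return sizeInPixels < 1.0f ? ParticlePath::Point : ParticlePath::Quad;
}
```

For example, with a 90 degree vertical field of view and a 600 pixel high viewport, a particle of size 1 at depth 100 covers about 3 pixels and stays a quad, while the same particle at depth 1000 covers about 0.3 pixels and can be drawn as a point.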
 