Why doesn't DX9 support simple Quads?

correction

Correction to earlier:

v.x = sin(angle) * halfsize
v.y = cos(angle) * halfsize

halfsize = size * 0.5; this is because the origin of this "v" (rotation half vector) is obviously at the center of the billboard in screen space.
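
For reference, a minimal sketch of the full corner expansion; the struct and function names are illustrative assumptions, not from the earlier post:

#include <cmath>

struct float2 { float x, y; };

// Expand a billboard into its four corners from the rotated half vector.
// v points from the center to one corner; rotating it by 90 degrees
// repeatedly yields the other three.
void corners(float2 center, float size, float angle, float2 out[4])
{
    const float halfsize = size * 0.5f;
    const float2 v = { std::sin(angle) * halfsize, std::cos(angle) * halfsize };
    const float2 p = { -v.y, v.x }; // v rotated 90 degrees

    out[0] = { center.x + v.x, center.y + v.y };
    out[1] = { center.x + p.x, center.y + p.y };
    out[2] = { center.x - v.x, center.y - v.y };
    out[3] = { center.x - p.x, center.y - p.y };
}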
 
Re: heh

SeppoCitymarket said:
I don't like point sprites for *billboards*; they are good enough for particles, but billboards I expect to be able to world-scale intuitively.
My main dislike for point sprites is the lack of a rotation angle - as you mention - there are several types of effects where the rotation angle is useful.

However, they are blindingly fast in terms of vertex rate - if people really want to do hundreds of thousands of 4x4 particles, they are definitely the way to go...

So I don't think there's a one-size-fits-all solution.
 
Re: quads vs. rest

Dio said:
SeppoCitymarket said:
However, even if we got quads for D3D, it would probably have little effect on the framerate, as sending indices down doesn't appear to be the bottleneck.
Actually it can be significant if the application is CPU limited, which is why it's important to use index buffers for static geometry -- but static particle systems aren't very interesting :).

Only the index buffer is static, the vertices change in the same way as they would for quads.

Thinking about it more, what you want is write back to the vertex buffer from the shader. The app then just generates a vertex buffer containing the initial positions of the particles; the HW could then be used to progressively update the contents without CPU intervention.

John.
 
yup

Someone, somewhere, wrote a demo with the GPU doing a simple particle system (animation) -- I haven't checked whether feedback from the VS into a VB is possible? It must be, if someone wrote a demo like this. I can't remember the reference, sorry, but you probably know what I'm talking about anyway. ;)

We still need four vertices, though. Even if the GPU is doing the position and rotation updates, the "sad" fact is that each vertex is processed as its own unique entity, so the physics/animation workload would be 4x higher than it needs to be.
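
A minimal sketch of the duplication being described; the layout and field names are illustrative:

// Each billboard sends this vertex four times; only (cu, cv) differ.
struct ParticleVertex
{
    float px, py, pz;  // particle center: identical in all four vertices
    float size, angle; // also identical in all four vertices
    float cu, cv;      // corner selector, e.g. (0,0), (1,0), (1,1), (0,1)
};
// The vertex shader expands center +/- the rotated half vector from (cu, cv),
// so the same per-particle animation math runs four times per particle.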

Solution: SetStreamSourceFreq() in VS 3.0 and later, so the components which are the same for the whole billboard can be stored at 1/4 the frequency (i.e. 1/4 the storage and bandwidth cost). But that's only a "high-end, in-the-future" solution; we'll still need fallbacks to the older paths for YEARS, but it's nice to see progress.
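
A minimal sketch of how the stream frequency approach looks through the D3D9 interface as it eventually shipped (SetStreamSourceFreq); dev, quadVB, billboardVB, the vertex structs and billboardCount are assumptions for illustration:

// Stream 0: one quad's corner data, drawn once per billboard by the HW.
dev->SetStreamSource(0, quadVB, 0, sizeof(CornerVertex));
dev->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | billboardCount);

// Stream 1: per-billboard data (center, size, angle), advanced once per instance.
dev->SetStreamSource(1, billboardVB, 0, sizeof(BillboardVertex));
dev->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);

// Indices for a single quad; the frequency settings replicate it.
dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4, 0, 2);

// Reset the frequencies so later draws aren't instanced.
dev->SetStreamSourceFreq(0, 1);
dev->SetStreamSourceFreq(1, 1);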

My wishlist for DX10 also includes more semantics.. since we already have COLOR0, POSITION0, blablabla... it would be kind of nice to have, for example, SEMANTIC0..SEMANTIC15 - for starters! ;-) But this isn't necessary.. just something that might happen, or might not.. I definitely don't know (but to me it sounds feasible enough =)

The fp32 precision requirement in PS 3.0 is a nice touch -- this will be very useful when synthesizing data. For now you have to think about fp16 all the time and the range issues that crop up.

I.e. currently shaders feel wonderful and very flexible, but when you really start to use the power to drop the CPU out of the loop completely, you begin to write hacks like four identical vertices with different weight vectors (for particles, just as an example), or to invent new ways to do noise so that you can avoid excessive dependent texture lookups, etc. Then you write your own bi-cubic texture filtering, because the hardware does only linear interpolation (bi-linear filter) and you really, really would love to have a cubic filter, because you could do a Perlin noise octave with a single bi-cubic texture lookup. Of course I would use the noise() intrinsic function from HLSL, but it's not implemented in HLSL's current version (publicly available, anyway); of course I could switch to Cg.. etc.. so there are still things to straighten out and otherwise streamline.

But generally this is an order of magnitude more powerful way to write graphics code than the fixed-function pipeline, that's for sure.

Every day, digging deeper and deeper.. this is a hole one doesn't want to come out of. ;-)
 
Re: yup

SeppoCitymarket said:
Someone, somewhere, wrote a demo with the GPU doing a simple particle system (animation) -- I haven't checked whether feedback from the VS into a VB is possible? It must be, if someone wrote a demo like this. I can't remember the reference, sorry, but you probably know what I'm talking about anyway. ;)

There are no write-back capabilities. But it would certainly be cool to have such capabilities in future hardware.
Anyway, you can create simple particle systems on the GPU if you like. During my time at ATI I did a RenderMonkey project that did just that. Using time and a particle tag value between 0 and 1, I could place a particle at a desired point with some math. It wasn't the best particle system in existence, but it worked fairly nicely for creating a fire effect. With the ability to read a texture, or with a full noise function, you could basically do just as good a particle system on the GPU as you would on the CPU.
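
A minimal sketch of that parametric approach, in plain C++ standing in for the vertex shader math; every constant and name here is illustrative, not taken from the RenderMonkey project:

#include <cmath>

struct float3 { float x, y, z; };

// Position is a pure function of (tag, time), so no feedback is needed;
// respawn falls out of the fmod wraparound.
float3 particlePosition(float tag, float time)
{
    const float lifetime = 2.0f; // seconds per cycle
    const float age = std::fmod(time + tag * lifetime, lifetime);

    // Pseudo-random per-particle launch velocity derived from the tag.
    const float vx = std::sin(tag * 137.0f) * 0.5f;
    const float vz = std::cos(tag * 211.0f) * 0.5f;
    const float vy = 2.0f + tag;

    // Closed-form ballistic path: p = v*t + 0.5*g*t^2.
    return { vx * age,
             vy * age - 0.5f * 9.81f * age * age,
             vz * age };
}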
 
Re: heh

Dio said:
My main dislike for point sprites is the lack of a rotation angle - as you mention - there are several types of effects where the rotation angle is useful.

The reason I don't like point sprites is the lack of influence on texture coordinate generation.
We like to pack multiple particles into a single texture, either to get an effect with randomly varying particles (it looks more natural) or to get animated particles.
Packing allows the entire effect to be drawn in one go.
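
A minimal sketch of the sub-rectangle addressing that quads make possible (the grid layout and names are illustrative); a point sprite always receives 0..1 across the whole texture, which is exactly the influence that's missing:

struct float2 { float x, y; };

// Map a quad-local (u, v) in [0,1] into cell "frame" of a cols x rows atlas.
float2 atlasUV(int frame, int cols, int rows, float u, float v)
{
    const int cx = frame % cols;
    const int cy = frame / cols;
    return { (cx + u) / cols, (cy + v) / rows };
}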
 
Cg..

Aaah, found the "reference":

The Cg Tutorial, Kilgard M & Fernando R.
6.3.3 The Particle System Parameters

It's just a simple integrator -- never mind, now that I've read the complete article (it wasn't a very long chapter). I just had the topic vaguely in mind from browsing through the book.

Yeah, writeback would be cool, but it would probably make the design of these chips a lot more complex -- let's not forget that the stream can be UP (user pointer) data, for instance, and that's something we don't want to write back to.

It would probably mess up parallelism in a big way, but not being a hardware engineer, I wouldn't bet on it. I hope it wouldn't!

Vertex samplers are always on the wishlist. While speaking of samplers, programmable samplers would be cool, but that's just random babble, ignore it. While talking about writeback: generally, being able to use VBs as rendertargets (sort of) would open up new functionality, I'm sure of it. My thoughts are a bit random and disorganized at the moment, as it's 03:00, but the point is there would probably be some great uses for this as well. Don't sue me if the ideas are completely ridiculous; the generic idea here is that resources could be shared better between processing units, and data could be streamed from point A to point B more flexibly inside the GPU. I'm sure something could be done with that...

But then reconfiguring the GPU from the CPU would probably become way too expensive, and the overhead would kill all the gains there are to be made. Maybe PCI Express based GPUs have something along these lines in store for developers? To be honest, I can't wait.. even if there aren't commercial applications immediately, fetishists like me will surely be entertained!
 
Re: Cg..

SeppoCitymarket said:
Vertex samplers are always on the wishlist. While speaking of samplers, programmable samplers would be cool, but that's just random babble, ignore it.
Sorry, I'm refusing to ignore it. What exactly do you mean by "programmable samplers"? Obviously the sample point and LOD are under program control, and I suppose you can choose the filter mode dynamically to some extent. Do you mean having an actual microcoded filtering unit?
 
poop

// A small float4 helper; the original snippet assumed a shader-style type.
struct float4
{
    float x, y, z, w;
    float4(float a, float b, float c, float d) : x(a), y(b), z(c), w(d) {}
};

float noisebuffer[256][256];

// Fetch the 2x2 texel neighbourhood, wrapping at the 256-texel border.
inline float4 sample(int x, int y)
{
    int x0 = (x + 0) & 255;
    int x1 = (x + 1) & 255;
    int y0 = (y + 0) & 255;
    int y1 = (y + 1) & 255;

    return float4(
        noisebuffer[y0][x0],
        noisebuffer[y0][x1],
        noisebuffer[y1][x0],
        noisebuffer[y1][x1]);
}

// Smoothstep weight curve: 3v^2 - 2v^3.
static inline float cubic(float v)
{
    return (3.0f - 2.0f * v) * v * v;
}

static float getv(float x, float y)
{
    int ix = static_cast<int>(x);
    int iy = static_cast<int>(y);
    float fx = x - ix;
    float fy = y - iy;
    float4 v = sample(ix, iy);

    // Blend the four texels with smooth (cubic) weights instead of linear
    // ones; cubic(t) + cubic(1 - t) == 1, so the weights still sum to one.
    float i = cubic(1.0f - fx);
    float j = cubic(fx);
    float s = (v.x * i + v.y * j) * cubic(1.0f - fy) +
              (v.z * i + v.w * j) * cubic(fy);

    return s;
}

This implements one octave of Perlin noise using a 256x256 lookup table. If we replace the "cubic" function with "lerp", it's nothing but a BILINEAR texture lookup!

With a programmable sampler, one could implement a much wider range of things than just bicubic filtering.

But I said: ignore, because:

1. This can already be done with a pixel program: just take four point samples and implement the blending in the pixel program, no big deal. It is a bit more tricky to compute the cubic interpolator in [0.0, 1.0] from fractional texture coordinates than it would be with integer texel coordinates.. in hardware, computing the lerp factors is trivial and fast, so I figured those variables would be accessible in a "more convenient, already computed" format, and massaging them through curve re-programming like linear-to-cubic would be feasible, but not fast... (see below)

2. It would most probably be very inefficient to have programmable samplers anyway.

3. The benefits of programmable samplers in practice are questionable; besides this limited case of implementing noise, I can't think of anything else at the moment. Therefore I said: ignore, because I really meant it to be ignored, the mindless rant that it was.

I.e. I had no intention to imply that doing so would be efficient or desirable in hardware; I was just amusing myself, since I had just coded such a noise function on the GPU for normal map generation-in-GPU.

Speaking of this "normal map generation", the biggest "problem" was the limited precision of pixel shaders.. I couldn't use the "big range" of values I need for the heightfield to generate the noise, so I had to play a lot of numeric value games to get the results I desired.. the prototype in C++ was clean, but when I transferred the idea to the GPU, well,.. let's just say that it wasn't as trivial as the pseudo-code suggested. ;-)
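
A minimal sketch of the sort of numeric value game involved (the range constants are purely illustrative assumptions): fp16 overflows past 65504 and loses precision long before that, so a wide-range height value gets mapped into [0,1] before the shader math and back afterwards.

// Map a wide-range height into [0,1] so fp16 keeps enough precision,
// then back again. heightMin/heightMax are assumed for illustration.
const float heightMin = -4096.0f;
const float heightMax =  4096.0f;

float encodeHeight(float h)
{
    return (h - heightMin) / (heightMax - heightMin);
}

float decodeHeight(float e)
{
    return e * (heightMax - heightMin) + heightMin;
}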

I had this level-of-detail landscape with geomorphing, and the problem with changing vertex density is that the lighting solution changes with it, so the obvious solution was to do lighting in texture space -- the trilinear filter guarantees a seamless lighting level-of-detail solution. I was first worried that the lighting would look "weird" when the normal map samples were filtered, but it turned out to look good. Even if it's not "correct", it looks good; moving over the field is very smooth and there are no artifacts, so I'm pleased with the results.

www.liimatta.org/misc/scape1.mpg

It looks stupid, as it's currently tiling the same texture over and over, but the normal maps are generated on the GPU now.. it used to be very slow, as it takes 3 octaves of noise, and the normal maps are 512x512 for a given texture region (I call them texture regions because the tessellation engine is not really quad based, except where there are texture changes).

The video renders only a 2048x2048 heightfield; that's over the threshold (about 1400x1400) where 128 MB of memory runs out for the different resources used for rendering at that resolution, so the cache and the normal map synthesizer are already at work. Texture synthesis is the next step. I could use splatting, but I'd rather use alpha-mask based texture composition.. the alpha masks are generated on the CPU, as the GPU really runs out of flexibility for that kind of stuff (unless you want something really trivial).

Anyway, I have NO CLUE what the final result will look like; this is just for fun. I'm currently not doing any work that actually requires a landscape, and I don't see any future work where I would be doing anything like that. So it's just for fun, and I don't have all day to work on it, but it might get somewhere eventually. Or not, and that wouldn't be too bad either. ;)

OH, right, so basically the remark about programmable samplers was just a sarcastic comment based on what I've been doing lately with DX9. Emphasis on SARCASTIC. If someone takes it seriously, that's his problem. ;)
 
Re: yup

SeppoCitymarket said:
Someone, somewhere, wrote a demo with the GPU doing a simple particle system (animation) -- I haven't checked whether feedback from the VS into a VB is possible? It must be, if someone wrote a demo like this. I can't remember the reference, sorry, but you probably know what I'm talking about anyway. ;)

We still need four vertices, though. Even if the GPU is doing the position and rotation updates, the "sad" fact is that each vertex is processed as its own unique entity, so the physics/animation workload would be 4x higher than it needs to be.

Solution: SetStreamSourceFreq() in VS 3.0 and later, so the components which are the same for the whole billboard can be stored at 1/4 the frequency (i.e. 1/4 the storage and bandwidth cost).

There's no direct VB write-back in the current DX9; you can emulate similar behaviour by writing out to a texture in the PS and then using the VS sampler to pick up the results (obviously needs 3.0 HW).
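
A minimal sketch of that emulation through the D3D9 interface, assuming dev is an IDirect3DDevice9* and posTex is a D3DFMT_A32B32G32R32F render-target texture holding one particle position per texel; the names and the elided draw calls are illustrative:

// Pass 1: render updated positions into the texture with a pixel shader.
IDirect3DSurface9* surf = 0;
posTex->GetSurfaceLevel(0, &surf);
dev->SetRenderTarget(0, surf);
// ... draw a full-screen quad running the integration pixel shader ...
surf->Release();

// Pass 2: bind the result as a vertex texture (VS 3.0 hardware) so the
// vertex shader can fetch each particle's position with tex2Dlod.
dev->SetTexture(D3DVERTEXTEXTURESAMPLER0, posTex);
// ... draw the particle quads; the VS reads posTex per vertex ...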

With write-back you would run your incremental pass on point positions etc., then do another pass that, as you said, creates the others from low-rate streams.

But the parameterised approach works OK; it's a bit difficult to drop particles when their lifetime expires, though.

John
 