Will this approach be possible?

LuxMentis

Newcomer
I know that no one here knows what VS 3.0-capable hardware will be capable of (and if you do, you're under NDA), but I wanted to know if my approach is just so ridiculously slow that it's impossible for the NV40/R420 to do, unless an unimaginable improvement comes along.

As a hobby project, I would like to design a game that has huge terrains that are extremely detailed. Since I can't possibly store the entire terrain geometry, I have to make it almost completely procedural. That is, you have a texture map that specifies the height every 64 or 128 meters or so, and all points in between are interpolated and have Perlin noise added for variety. I'd like to think we're at the point where I don't have to deal with slow, CPU-intensive, mindlessly complicated algorithms that constantly change the limited geometry; basically I want to push a static flat mesh (with built-in LOD) to the vertex shader, which would perform all the height calculations and extrude the "terrain" to the appropriate height one point at a time, every frame.
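Roughly, the per-vertex height evaluation I have in mind would be something like this (just a CPU-side sketch for now; coarseHeight() and perlinNoise() are placeholders for however the coarse map and the noise end up being stored):

Code:
// Sketch of the height function the vertex shader would evaluate per vertex.
// coarseHeight(ix, iz) and perlinNoise(x, z) are placeholder helpers, and
// noiseAmplitude is a placeholder tuning constant.
float lerp(float a, float b, float t) { return a + t * (b - a); }

float terrainHeight(float x, float z)
{
    const float cell = 64.0f;                 // spacing of the stored height samples
    float gx = x / cell, gz = z / cell;
    int ix = (int)floorf(gx), iz = (int)floorf(gz);
    float fx = gx - ix, fz = gz - iz;

    // Bilinearly interpolate the four surrounding coarse samples.
    float h00 = coarseHeight(ix,     iz);
    float h10 = coarseHeight(ix + 1, iz);
    float h01 = coarseHeight(ix,     iz + 1);
    float h11 = coarseHeight(ix + 1, iz + 1);
    float base = lerp(lerp(h00, h10, fx), lerp(h01, h11, fx), fz);

    // A small amount of Perlin noise on top for variety.
    return base + noiseAmplitude * perlinNoise(x, z);
}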

Would this be impossibly slow? I only own a GF2MX right now :( , so I don't know anything about the speed of vertex shaders. My 2 GHz Pentium 4 can run the algorithm on 64k points in 200 milliseconds. I figure that with their 16 parallel vertex shaders, the cards of the future (NV40) ought to have no problem doing 64k vertices. The problem is, I'll probably have a lot more vertices than that. How many exactly, I don't know. I'm doing lots of testing now to see what looks best while still keeping the number small. Does anyone who has created "realistic" terrain have an idea? I'd also like to be able to see quite far...

Anyway, it seems like it would certainly be possible, based on the fact that it's at least 16 times faster than my test processor (a CPU which doesn't perform terribly), and probably much faster than that since it can do certain math operations very quickly. On the other hand, I've heard people say that shaders that run into the hundreds of instructions will crawl if given more than a few objects. This runs counter to my estimations, and since my vertex shader would be quite long (it might even skirt the 512-instruction limit), I'm thinking maybe my forward-looking hope for a simple, good-looking, near-infinite terrain is simply impossible. I know no one knows exactly how these cards will perform, but since a lot of you have some experience with the hardware, I figured you'd at least know if the plan is ridiculous. And that's obviously something I should know before I spend more time on it.
 
Check out Humus' infinite terrain demo. It can even run on your GF2MX! Of course, this is evaluating the Perlin function on the CPU rather than in a vertex shader, but the general answer is...yes, it can certainly be done (with some clever optimizations).
 
'course, the type of games where you'd want to use infinite terrain with randomly-generated details seems pretty limited. I would imagine most would rather go the route of compression than random generation.

Btw: yes, I know that the procedural algorithm is deterministic. But the point is that it's still pseudo-random, meaning it doesn't lend itself well to hand-tweaking.
 
Texture access for vertex shaders 3.0 is said to be quite fast once you have the hardware (but software dies here).
On the other hand if you want 512 instruction vertex shader for perlin noise, that might be a problem. Though I don't know why you need that many?
 
My own terrain renderer also uses Perlin noise (well, fractal Brownian motion) to generate terrain -- and it is potentially infinite. Actually, I made it so I can vary the terrain for each segment and blend the segments seamlessly based on other data.

It isn't currently done in the vertex shader, of course, but I did have the same thought. Until VS 3.0 comes around, however, it is unlikely to be practical due to the potentially large number of instructions needed for the noise function.

Another potential problem is the large number of texture lookups. My own algorithm (run on the CPU, of course) requires a bare minimum (using linear or cosine interpolation) of 4 noise lookups per octave. One octave would be kinda boring; I use 6 or 7, which means 24 to 28 texture lookups. . . :oops: Of course, the problem may just be my algorithm. I guess one could always do it in the fragment shader, rendering multiple passes (one or two octaves per pass) to a texture and then using that texture as a displacement map. . .
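For what it's worth, the fBm loop itself is tiny -- it's the noise lookups inside it that add up (a rough sketch; noise2D() stands in for whatever interpolated noise lookup you use):

Code:
// Fractal Brownian motion: sum several octaves of noise, each octave at
// twice the frequency and half the amplitude of the one before it.
float fbm(float x, float z, int octaves)       // e.g. octaves = 6 or 7
{
    float sum = 0.0f, amplitude = 1.0f, frequency = 1.0f;
    for (int i = 0; i < octaves; ++i)
    {
        sum += amplitude * noise2D(x * frequency, z * frequency);  // placeholder noise lookup
        frequency *= 2.0f;
        amplitude *= 0.5f;
    }
    return sum;
}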
 
MDolenc said:
Texture access for vertex shaders 3.0 is said to be quite fast once you have the hardware (but software dies here).
On the other hand if you want 512 instruction vertex shader for perlin noise, that might be a problem. Though I don't know why you need that many?

I might not. But I would also like to generate normals using the derivatives of the octave surfaces, as well as some other things. Each one is only a few instructions, but since I'd be adding a feature or two (or three or four) to the basic height generator (and I'd also like to do Rayleigh-Mie scattering calculations for the lighting), I think it'd get bigger than 256. A lot of the 512 figure has to do with the fact that I worry too much before actually doing anything. Also, what do you mean when you say "Texture access for vertex shaders 3.0 is said to be quite fast once you have the hardware (but software dies here)"?
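Getting back to the normals for a second: the idea would be something like this (just a sketch using central differences over the height function as a stand-in for the analytic octave derivatives; Vec3 and normalize() are placeholders for whatever vector types I end up with):

Code:
// Approximate the surface normal of the height field by central differences.
Vec3 terrainNormal(float x, float z)
{
    const float eps = 0.5f;                    // illustrative step size
    float dhdx = (terrainHeight(x + eps, z) - terrainHeight(x - eps, z)) / (2.0f * eps);
    float dhdz = (terrainHeight(x, z + eps) - terrainHeight(x, z - eps)) / (2.0f * eps);
    return normalize(Vec3(-dhdx, 1.0f, -dhdz));
}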

Chalnoth said:
'course, the type of games where you'd want to use infinite terrain with randomly-generated details seems pretty limited. I would imagine most would rather go the route of compression than random generation.

You're right, so I wouldn't be relying completely on noise. I want to do it so that you can control the basic terrain by placing "vectors" that point straight up every 64 or 128 (something like that) meters, and then using a function to interpolate the values of the vertices that fall in between them. I think you could gain a lot of control that way if you weight those vectors as much more influential than the noise. The noise would come in (relatively) small amounts to disturb the surface and make it look more realistic. Since there wouldn't be many vectors (world size^2/64^2), you could encode their data into a giant texture map. The big reason I have to wait for VS 3.0 is so I can read the texture map in the vertex shader.

Ailuros said:

Uh oh. Were the 16 vertex shaders just a rumor I heard?

Dave H said:
Check out Humus' infinite terrain demo

Thanks for the link, that site is great. However, I wanted to be able to see really far and use a multi-octave noise function. If this demo did that, I don't think it would run too well without a lot of fancy optimization. While I'm sure it's possible to do it, I can imagine it's not for the faint of heart. I find a lot of the modern terrain algorithms very complicated as it is. I'm hoping my idea works because it is excessively easy in comparison. That there is zero CPU overhead is kind of a bonus (a necessary bonus, since my once "new" computer means less and less every day). On the other hand, the other tutorials are really helpful for a newbie like me.
 
LuxMentis said:
Also, what do you mean when you say "Texture access for vertex shaders 3.0 is said to be quite fast once you have the hardware (but software dies here)"?
I mean that once the VS 3.0 hardware is available it will be fast, but if you try and play around with software VS 3.0 shaders, don't expect too much from it.
 
Considering the transistor count of one NV3x vertex shader unit...
The vertex shading of the NV40 would take 144M transistors; that's over 95% of the rumored transistor count! :oops:

Although I kinda doubt those figures - it could be a third of that, too, thus 48M - but even then, that's a ridiculous number of transistors for the vertex shaders, hehe.
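(For reference, the arithmetic I'm using: roughly 9M transistors per NV3x-style vertex unit, so 16 x 9M = 144M against a rumored total of around 150M; a third of that per unit is what gives the 48M figure.)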


Uttar
 
16 scalar processors -> 4 vector processors? (~35M transistors)

I think I've heard something like this, but other sources say other things. So who knows? :)
 
LuxMentis said:
You're right, so I wouldn't completely be using noise. I want to do it so that you can control the basic terrain by placing "vectors" that point straight up every 64 or 128 (something like that) meters, and then using a function to interpolate the values of the vertices that fall inside there. I think you could gain a lot of control that way if you weight those vectors as much more influential than the noise. The noise would be in (relatively) small amounts to disturb the surface and make it more realistic looking. Since there wouldn't be many vectors (world size^2/64^2), you could encode their data into a giant texture map. The big reason I have to wait for VS 3.0 is so I can read the texture map in the vertex shader.
Well, if your noise is small enough, you could just use a procedurally generated normal map and bump map in the fragment shader. This might be preferable if you plan on, say, a multiplayer game where you may not want random noise to affect gameplay much. It would also make things easier on collision detection.

Anyway, the primary reason I was thinking this might not be a good idea for most games would be in how the terrain intertwines with structures.
 
Chalnoth said:
Anyway, the primary reason I was thinking this might not be a good idea for most games would be in how the terrain intertwines with structures.

Yeah, that bothered me for a while. Basically I figured I could use the concept of a "sector" vector. A sector is any 64x64 region defined by 4 corner vectors extracted from the large texture. The upper-left vector's additional information is used to modify the original algorithm.

In other words, I would use a 32-bit texture map instead of an 8-bit one. The rough height map would be stored in one channel (say, the R channel). The G and B channels would store information about how to procedurally generate the surface using the other 3 vectors. In particular, the G channel would be the noise strength relative to the height vectors. By that I mean that if it were a really low number, the noise would hardly affect the terrain at all compared to the rough height vectors; if it were large, it would affect it a lot. Whenever there was a building around, the noise strength would go way down for the sector(s) the building was in, so the building would basically be sitting on a flat surface. This would look sensible I think, because usually when you build a building, you try to make it relatively flat in the region around it. Hopefully it won't be too obvious that those regions are always multiples of 64 by 64.
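In code, the per-vertex evaluation would look roughly like this (just a sketch; sampleControlTexel() and interpolateCornerHeights() are placeholders for the texture fetch and the bilinear step I described earlier, and fbm() is the usual multi-octave noise):

Code:
// Per-vertex evaluation using the 32-bit control texture described above:
// R = rough height, G = noise strength, B = other generation parameters.
float sectorHeight(float x, float z)
{
    const float sector = 64.0f;
    int sx = (int)floorf(x / sector), sz = (int)floorf(z / sector);

    // Base height: bilinear interpolation of the four corner heights (R channel).
    float base = interpolateCornerHeights(sx, sz, x, z);   // placeholder helper

    // The G channel of the sector's upper-left texel scales the noise, so a
    // sector containing a building can be kept almost flat by storing ~0 here.
    float noiseStrength = sampleControlTexel(sx, sz).g;    // placeholder fetch
    return base + noiseStrength * fbm(x, z, 6);
}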
 
Is it just the pints of Old Speckled Hen, or is 'sector vector' inherently somewhat amusing?
LuxMentis said:
This would look sensible I think, because usually when you build a building, you try to make it relatively flat in the region around it.
You've clearly never been to Wales.
 
There are exceptions, of course, but I think he's mostly right. When you build a building, one of the first steps is usually to flatten the surrounding land. It doesn't always happen, but it's common.

Anyway, side note. If you want to see unique architecture, take a visit to UC Davis. I heard that there was an Unreal Tournament level built after one of the buildings (never played the level myself).
 
LuxMentis said:
I know that no one here knows what VS 3.0-capable hardware will be capable of (and if you do, you're under NDA), but I wanted to know if my approach is just so ridiculously slow that it's impossible for the NV40/R420 to do, unless an unimaginable improvement comes along.

As a hobby project, I would like to design a game that has huge terrains that are extremely detailed. Since I can't possibly store the entire terrain geometry, I have to make it almost completely procedural. That is, you have a texture map that specifies the height every 64 or 128 meters or so, and all points in between are interpolated and have Perlin noise added for variety. I'd like to think we're at the point where I don't have to deal with slow, CPU-intensive, mindlessly complicated algorithms that constantly change the limited geometry; basically I want to push a static flat mesh (with built-in LOD) to the vertex shader, which would perform all the height calculations and extrude the "terrain" to the appropriate height one point at a time, every frame.

Would this be impossibly slow? I only own a GF2MX right now :( , so I don't know anything about the speed of vertex shaders. My 2 GHz Pentium 4 can run the algorithm on 64k points in 200 milliseconds. I figure that with their 16 parallel vertex shaders, the cards of the future (NV40) ought to have no problem doing 64k vertices. The problem is, I'll probably have a lot more vertices than that. How many exactly, I don't know. I'm doing lots of testing now to see what looks best while still keeping the number small. Does anyone who has created "realistic" terrain have an idea? I'd also like to be able to see quite far...

Performance will probably be bearable. But regenerating all that data every frame on the GPU/VPU is going to be much slower than the amortized cost of generating it once on the CPU, then just treating it as static data. I don't recommend going nuts with vertex shader programs unless your data is inherently dynamic and changes every frame (for example, like a skeletal animated mesh).

Note that Perlin noise is good for "filling in the little details", but not great for generating large-scale datasets. You would probably be better off generating terrain with a resolution in the several-meter range in a dedicated program that models erosion, etc., and then just using procedural effects when magnifying its data up to the several-centimeter range.

Anyway, it seems like it would certainly be possible, based on the fact that it's at least 16 times faster than my test processor (a CPU which doesn't perform terribly), and probably much faster than that since it can do certain math operations very quickly. On the other hand, I've heard people say that shaders that run into the hundreds of instructions will crawl if given more than a few objects. This runs counter to my estimations, and since my vertex shader would be quite long (it might even skirt the 512-instruction limit), I'm thinking maybe my forward-looking hope for a simple, good-looking, near-infinite terrain is simply impossible. I know no one knows exactly how these cards will perform, but since a lot of you have some experience with the hardware, I figured you'd at least know if the plan is ridiculous. And that's obviously something I should know before I spend more time on it.

I'm just guessing here, but I think with the next generation it should be reasonable to expect 4X more GPU/VPU vertex throughput than with an identical CPU-based algorithm, partly because there are lots more independent pipelines, and partly because the data flows directly between the units without being read from and written to intermediate storage 3X over. But if it's a question of generating the data once and rendering it statically, versus generating it every frame, the "generate once" approach seems pretty attractive.

I don't expect to see a 16X GPU/VPU advantage over the CPU for a long time. Here you'll either need a huge number of pipelines (like 32 or 64), or GPU speeds will need to increase relative to CPU speeds. Right now GPU/VPU's are operating at around a 6X clock disadvantage to CPU's (3200 divided by 500), so they need about 3 parallel vertex pipelines just to keep up (we'll award them a 2X scaling factor because they avoid the memory latency problems of CPU's). But I think both will happen over the next few generations. Once the GPU/VPU is designed as a whole lot of simple FPU's operating in parallel, there's no reason you can't make the clock rate more competitive with CPU's. And GPU/VPU's can make more effective use of more parallel pipelines than CPU's, because all their operations are inherently parallel, whereas CPU's have to do something like Hyperthreading which is dependent on major changes to how apps are programmed.
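Spelled out, the rough arithmetic above: 3200 MHz / 500 MHz is about a 6.4X clock deficit; crediting the GPU ~2X for dodging memory latency leaves ~3.2X, so about 3 parallel vertex pipelines just to break even. Sixteen pipelines would then buy roughly 16 / 3.2, or about 5X over the CPU, which is in the same ballpark as my 4X guess; by the same arithmetic you'd need on the order of 50 pipelines (or a much better clock ratio) before a 16X advantage becomes plausible.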

Just my opinion.
 
Rev - I don't think GPUs will be clock-competitive with CPUs anytime soon. Why? They seem to have more transistors, greater utilization of functional units, and are behind CPUs when it comes to target process. AFAICS this means GPUs are more constrained by power and heat issues than CPUs. Also, high-frequency CPUs (e.g. the P4) contain lots of custom logic, which seems like it would be hard to do on the scale of a GPU with such short design cycles...

just my 2c,
Serge
 
:) That's just what I need for my tiny Los Angeles apartment with no AC... it's too hot to use even a toaster.
 