I have to ask: vertex limit for 8800

Frank

Yes, REYES.

Can the 8800 do it? And if not, why? Because I can think of at least three different ways. I just don't know enough about that chip and 3D programming to come up with a reasonable answer. But the needed accumulation buffer is AFAIK no problem anymore.

And, if possible, how do you get the curved polygons uploaded and processed? Or do you need the CPU to do that? That would be a major bummer.

It might be simpler to approximate the curved surfaces first with more or less flat polygons, before tessellating those into triangles that are each at most half a pixel in size, or it might not.

Last but not least, if all that can be done, I don't think a simple lighting model that includes shadows (or rather, a very low ambient lighting gradient when adding to the accumulation buffer), occlusion, transparency and reflection will be much of a problem, although I would draw the line at beam splitting. If you don't do that, a simple B-buffer (like a Z-buffer, but for beams) might work.
 
REYES for terrain is easy. Just raytrace it in the pixel shader with a special-purpose hierarchical scheme :) The speed could be fairly alright, I think. And yes, if it's per-pixel level, it fits the definition of REYES.
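
To give a rough idea of the basic loop (minus the hierarchical skipping, which is the interesting part), here's a scalar C++ stand-in for what the PS would do per pixel. heightAt() and all the names here are mine, not any real API:

    #include <cmath>

    // Stand-in for the heightmap texture fetch; a real version would sample
    // a texture (and a mip chain for the hierarchical skipping).
    static float heightAt(float x, float z) {
        return 2.0f * std::sin(0.1f * x) * std::cos(0.1f * z); // toy terrain
    }

    // March from 'origin' along 'dir' until the ray dips below the terrain.
    // Returns the parametric distance t of the hit, or -1.0f on a miss.
    float traceHeightmap(const float origin[3], const float dir[3],
                         float tMax, float step)
    {
        for (float t = step; t < tMax; t += step) {
            float x = origin[0] + dir[0] * t;
            float y = origin[1] + dir[1] * t;
            float z = origin[2] + dir[2] * t;
            if (y <= heightAt(x, z))
                return t; // hit; refine with a binary search for per-pixel accuracy
        }
        return -1.0f; // no intersection within tMax
    }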

Another possibility is indeed using the GS, and tessellating quads all the way down to >1 pixel big (it's not "true" REYES, but pretty damn close). There's a very efficient way to do this, but the one proposed in the thread is quite horribly bad, of course. In fact, my scheme calculates the screenspace triangle size in the GS and conditionally tessellates based on that... I was going to submit it to GPU Gems 3, but given that I don't have a good idea of whether it'll perform like shit or not, that plain PS raytracing would likely be faster anyway, and that I haven't implemented much yet... heh.
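
Since I'm not giving the implementation away, here's at least the shape of the per-primitive decision, as a scalar C++ sketch (the GS would do this with the projected vertices it receives; targetPx and the level cap are placeholders of mine):

    #include <algorithm>
    #include <cmath>

    struct Vec2 { float x, y; };

    static float edgeLen(Vec2 a, Vec2 b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }

    // a, b, c: triangle vertices already projected to pixel coordinates.
    // Returns how many 1:4 subdivisions to apply so children end up ~targetPx big.
    int subdivisionLevels(Vec2 a, Vec2 b, Vec2 c, float targetPx)
    {
        float longest = std::max({edgeLen(a, b), edgeLen(b, c), edgeLen(c, a)});
        int levels = 0;
        while (longest > targetPx && levels < 8) { // cap: GS output size is limited
            longest *= 0.5f; // each 1:4 split roughly halves the edge length
            ++levels;
        }
        return levels; // 0 means pass the triangle through untouched
    }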

That GS-based scheme actually has another use: you can use it to do terrain tessellation so that all your triangles are, say, ~64 pixels big, in order to maximize overall chip efficiency while maintaining excellent image quality. It's hard to say if that's truly doable or not, because it's obviously useless if it's slower than a more naive approach because the tessellation takes too long.

Anyway, the problem of REYES on modern GPUs isn't that. The problem is compression. You really should expect compression (and MSAA efficiency!) to go right out of the window with triangles that you tessellate down to ~1 pixel. And the raytracing approach would write depth directly, which would also mean bye-bye to depth compression.

So if what you want to do is a REYES pass for terrain and traditional rendering for the rest, imagine doing the REYES pass before rendering everything else. All your pixels start out no longer compressible; while they might become compressible again once they're overwritten by other things, you get at least one pass, and most likely more, without good compression efficiency, and possibly heavy bandwidth bottlenecks.

Now, if you do the REYES terrain pass after everything else, you don't have terrain-related early-z, and tons of things need to be shaded twice. Obviously, unless your shaders are laughably simple, that's not a good compromise. The solution I imagined is quite simple, but it has at least two problems of its own. Basically, I planned to render a conservative approximation of the terrain in the z-pass, and then do the REYES after normal color rendering.

The first problem with that is that there is no such thing as a perfectly "conservative" approximation of a heightmap. The naive way to do it is to minimize height, and that works most of the time, but not if the player can go underground, such as in caves. Of course, going underground has its own set of problems with heightmaps, but you can hack away at it and get something reasonable going; here, though, you're simply NOT going to get a good conservative approximation if you need your camera to be able to go "under" the heightmap.
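
For clarity, the "minimize height" hack is nothing more than a min-filter over blocks of the heightmap before building the coarse z-pass mesh; a toy C++ version (names and layout are mine):

    #include <algorithm>
    #include <vector>

    // Min-filter the heightmap over block x block tiles, so the coarse mesh
    // never pokes above the true terrain and thus never wrongly occludes it.
    std::vector<float> minFilterHeightmap(const std::vector<float>& h,
                                          int width, int height, int block)
    {
        int cw = width / block, ch = height / block;
        std::vector<float> coarse(cw * ch);
        for (int cy = 0; cy < ch; ++cy)
            for (int cx = 0; cx < cw; ++cx) {
                float m = h[(cy * block) * width + cx * block];
                for (int y = 0; y < block; ++y)
                    for (int x = 0; x < block; ++x)
                        m = std::min(m, h[(cy * block + y) * width + (cx * block + x)]);
                coarse[cy * cw + cx] = m;
            }
        return coarse; // breaks, as said, once the camera can go under the terrain
    }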

The second problem is that some post-processes, such as depth of field, would still need to access the post-REYES depth buffer, and the efficiency there won't be any better. On the plus side of things, you're most likely limited by other things there, but there's no guarantee that's the case either.

As for REYES on generic models... there's another problem I didn't mention, because it's not too catastrophic for heightmaps. Z-culling works on triangles, not pixels: the majority of the culling process happens after rasterization. So extending this scheme to generic geometry would leave you fundamentally limited by triangle setup, to the point of being a stupid scheme, because you need to set up every single triangle in the viewport, not just those that might be visible!

So you come back to raytracing for that. And in the end, if you're doing raytracing, you've got no reason to use DirectX/OpenGL IMO; you should just go straight to CUDA and CTM. I'm sure you can get something mildly efficient going on modern GPUs, and I wouldn't be surprised if some people have already managed to do that; just don't expect mind-blowing visuals at mind-blowing performance, because all your performance will already be spent on the primary raytraces, which really won't buy you anything (but minuscule polygons nobody will even notice). In the end, the result will look like shit, but hey, who cares? It's REYES! ;)


Uttar
EDIT: You mention a "vertex" limit in your thread title... yes and no. The G80's VS threads can cover all of the ALU latency, but if you add in texture fetches, you'll sometimes be limited by the number of VS threads it can manage at the same time, I think. The same is even more true for GS, but there you might not even have enough threads to cover your ALU latency at all, depending on your output format; there's a good post from Bob on the subject.
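
Back-of-envelope version of why the thread count matters, with completely made-up numbers (these are NOT real G80 figures):

    #include <cstdio>

    // Toy latency-hiding arithmetic: to keep the ALUs busy, you need roughly
    // (latency in cycles) x (pipes) threads in flight, since each thread can
    // only issue its next instruction once the previous result is back.
    int main() {
        const int aluLatency = 10;  // cycles until an ALU result is usable (assumed)
        const int texLatency = 200; // cycles for a texture fetch (assumed)
        const int pipes      = 16;  // ALU pipes in one cluster (assumed)

        std::printf("ALU-only: %d threads needed\n", aluLatency * pipes); // 160
        std::printf("with tex: %d threads needed\n", texLatency * pipes); // 3200
        // If the chip could only track, say, 1000 VS/GS threads, the texture
        // case stalls: 3200 needed > 1000 available.
        return 0;
    }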
 
Thanks for the excellent reply!

So the main bottleneck is triangle setup and culling. And you want it to run in the PS as much as possible, for texture access. And you have a problem with compression, because the z-buffer(s) used are treated like heightmaps.

But, if you use triangles that are about half a pixel in size, you could get away with expecting them all to be the same size, and skip the triangle setup. If possible, of course.
 
And you have a problem with compression, because the z-buffer(s) used are treated like heightmaps.
I'm not sure what that's supposed to mean :) The problem is that z-buffer compression depends on the idea that you only have X planes (triangles) per YxY tile. I'm not sure what X and Y are on modern chips, but you shouldn't expect anything to be compressed whatsoever if you use micropolygons. BTW, obviously one problem I didn't mention is that your pipeline still works in 2x2 quads, even on G80, so the ~1-pixel-polygon approach is only going to be acceptable when combined with deferred rendering.
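
To make that concrete, here's a toy model of a plane-based scheme in C++; the tile size and plane limit are illustrative, not anything a real chip uses:

    #include <set>

    // A tile stays compressed only while at most maxPlanes distinct triangles
    // (i.e. depth planes) cover it. With ~1-pixel triangles, an 8x8 tile can
    // see up to 64 planes, so it permanently falls back to raw per-pixel z.
    bool tileStaysCompressed(const int* triIdPerPixel, int tilePixels, int maxPlanes)
    {
        std::set<int> planes;
        for (int i = 0; i < tilePixels; ++i) {
            planes.insert(triIdPerPixel[i]);
            if ((int)planes.size() > maxPlanes)
                return false; // decompress: store raw depth for every pixel
        }
        return true;
    }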

But, if you use triangles that are about half a pixel in size, you could get away with expecting them all to be the same size, and skip the triangle setup. If possible, of course.
I think you just reinvented raytracing there ;) Of course, the devil is in the details... If you want terrain REYES, I think raytracing is your best bet anyway though, even on a GPU, because there are a lot of special-case optimizations to be done when raytracing a "mere" heightmap... For everything else, I'm not sure why REYES is even such a good idea (displacement mapping+parallax mapping sounds way smarter to me if you're at all performance-limited, but heh!), but maybe it'll come one day anyway.
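
One example of the kind of special-casing I mean, as a C++ fragment (all names mine): keep a per-tile maximum of the heights, and let the ray jump a whole tile whenever it's provably above everything in it.

    // If the ray is above the tile's tallest point both where it enters and
    // where it would exit, the whole linear segment is above the terrain in
    // that tile, so it's safe to take one coarse tile-sized step.
    float stepSize(float rayY, float rayDirY, float tileMaxHeight, float tileSpan)
    {
        if (rayY > tileMaxHeight && rayY + rayDirY * tileSpan > tileMaxHeight)
            return tileSpan; // coarse step: skip the tile entirely
        return 1.0f;         // fine step: check individual texels
    }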


Uttar
 
Yes, when building a model in my head I noticed it turning into ray tracing. ;)

Ok, I think that ends my fascination with REYES. It looks and sounds so much simpler and cleaner, while bump maps and such are like a hack, but I get the overall idea. And you encounter the same problems anyway.
 
I'm not sure what that's supposed to mean :) The problem is that z-buffer compression depends on the idea that you only have X planes (triangles) per YxY tile.
Uttar

There are z compression algorithms that are not based on planes.
 
There are z compression algorithms that are not based on planes.
Okay, you may feel free to school me on that stuff one of these days! :D Or give me a few links, either here or on IRC, of course.

DiGuru: Don't worry, *my* fascination with REYES hasn't ended yet, hehe. I don't think it makes sense (except perhaps hackily for things like terrains in certain cases) unless raytracing takes over though, but I'm sure it eventually will. Think about it this way: if Moore's Law keeps going and clocks also scale as you would expect them to, we'll have more than 100 times today's power in about 12 years (six or so doublings from density alone is already 64x, and clock gains push it well past 100x).

If resolutions and refresh rates don't increase much, what do you do with all that power? You can't increase the polygon count 1000 times via rasterization, and it's unlikely the perceptual difference between shaders that are 2K instructions long and 10K instructions long will be that dramatic. So you're pretty much forced to change the paradigm. If you think about it, though, the paradigm change will be very gradual, and that's because rasterization is faster by definition in many (but not all) cases.

If you want really good reflection and refraction, you won't get it without raytracing though. If you think about it, the natural evolution of these things is that you'll first have the primary rays computed via rasterization, and a feedback mechanism for secondary rays. So opaque rendering would be done possibly in a deferred rendering pass, while secondary rays would be done through raytracing. How soon is that going to happen? I don't know. How soon is it going to become massively used in games? I don't know. I'd (aggressively?) predict ~2011 for widespread usage (read: several major engines using it), though. If done intelligently, this could already make it possible to reduce polygon counts a fair bit.

REYES would be the logical next step, but obviously it's more expensive, so it'll come gradually, IMO. And I'm sure we'll have several more hacked-up (but possibly incredibly smart!) approaches before we move away from traditional rendering schemes. Obviously, things aren't there yet, and won't be for some time, because (among other things), there is an obvious need to support pre-DX10 architectures. I wouldn't be expecting DX10-exclusive titles until 2008 or so, except perhaps what some might not-so-nicely label as "indie gimmicks".


Uttar
 
I'm a fan of stochastic non-viewport-dependent global illumination ray sampling personally, with multi-area non-time-based recomposition. Oh, and I also like non-linear multi-way approximative rasterization techniques for estimating soft shadows. But you don't see me blindly quoting any predictions from non-existent papers on these subjects, do you? :) j/k

(sorry, I'm just poking some fun at the fact that you're contradicting an entire discussion based on a few paragraphs from a paper I had never read before. While the paper definitely is interesting in a few ways, your post kinda sounds as if anything not predicted in that paper is irrelevant, which is probably not the way you intended it, but I still read it that way...)

Anyway, more seriously, that paper is nice, and definitely gives some good ideas and insight. I disagree with part of it though, and I think some of it just doesn't make sense for the 2010 timeframe at all, IMO. Furthermore, there is one BIG thing I don't like about it: it assumes raytracing is better by definition, and yet it argues that REYES and true global illumination won't be used. So, errr, why would I even want raytracing on *everything* again?

Much of the time, I do not personally believe raytracing is going to give you an advantage in that timeframe, as I detailed in my post above. It makes a lot more sense to use a feedback mechanism IMO; most pixels do not need and do not benefit from raytracing compared to rasterization. Furthermore, it is easy for a rasterization process to determine which pixels in your scene need raytracing and write that out, possibly in an MRT, or ideally via a more efficient mechanism. If you're doing deferred rendering anyway, you need a full-screen pass, so triggering reflection/refraction raycasting mildly efficiently shouldn't be too hard.
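
A crude CPU-side sketch of that feedback idea in C++ (everything here is illustrative; "needsRays" stands in for the mask the deferred pass would write to an MRT):

    #include <cstdint>
    #include <vector>

    struct Ray { float o[3], d[3]; };

    // Compact the pixels the rasterization pass flagged as reflective or
    // refractive into a work list, so only those get secondary rays traced;
    // everything else stays pure rasterization.
    std::vector<Ray> gatherSecondaryRays(const std::vector<uint8_t>& needsRays,
                                         const std::vector<Ray>& perPixelRay)
    {
        std::vector<Ray> work;
        for (size_t i = 0; i < needsRays.size(); ++i)
            if (needsRays[i])
                work.push_back(perPixelRay[i]); // trace these later, e.g. in CUDA
        return work;
    }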

I think the way GPUs are evolving is perfect for this transition, and that if companies like Intel hope to be able to get any of these computations "back", they're kidding themselves. The gradual transition scheme will guarantee GPU vendors keep that shit on their turf. BTW, try comparing some of the requirements listed in the paper's conclusion with some of the things CUDA brings to the table ;) Some of it is in, some of it isn't. You'd expect the remaining non-implemented ideas to be added eventually in next-generation architectures, if they make sense. Once again, it's a gradual process, and nobody is going to shift away from rasterization for all purposes overnight.


Uttar
 
I think you already know about it:

Efficient Depth Buffer Compression
Jon Hasselgren (Lund University)
Tomas Akenine-Möller (Lund University)
(pdf / paper)

Akenine-Möller's group presented at least three really good papers this year at Siggraph/Graphics Hardware. He even has some really interesting slides and notes about graphics hardware on his web page. That's without even mentioning The Book. I expect even more fun stuff next year :).

Okay, you may feel free to school me on that stuff one of these days! :D Or give me a few links, either here or on IRC, of course.
Uttar
 