REYES for terrain is easy: just raytrace the heightmap in the pixel shader with a special-purpose hierarchical scheme.
The speed should be fairly decent, I think. And yes, if it's working at the per-pixel level, it fits the definition of REYES.
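To make that a bit more concrete, here's roughly what the per-pixel heightfield raytrace could look like, written as a CUDA kernel instead of a pixel shader. It's only a sketch: dumb uniform stepping plus a binary-search refinement rather than the hierarchical scheme I have in mind, and every name in it (rayMarchHeightfield, Camera, sampleHeight) is made up for the example.

[code]
// Minimal sketch of per-pixel heightfield raymarching. This is NOT the
// hierarchical scheme, just uniform stepping with a binary-search refinement;
// all names are hypothetical.
#include <cuda_runtime.h>

struct Camera {
    float3 origin;
    float3 forward, right, up;  // orthonormal camera basis
    float  tanHalfFov;
};

__device__ float sampleHeight(const float* hmap, int w, int h, float x, float z)
{
    // Nearest-neighbour lookup; a real version would filter.
    int ix = min(max(int(x), 0), w - 1);
    int iz = min(max(int(z), 0), h - 1);
    return hmap[iz * w + ix];
}

__global__ void rayMarchHeightfield(const float* hmap, int hmapW, int hmapH,
                                    Camera cam, float* outDepth,
                                    int screenW, int screenH)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= screenW || py >= screenH) return;

    // Build the primary ray for this pixel.
    float u = (2.0f * (px + 0.5f) / screenW - 1.0f) * cam.tanHalfFov;
    float v = (1.0f - 2.0f * (py + 0.5f) / screenH) * cam.tanHalfFov;
    float3 dir = make_float3(cam.forward.x + u * cam.right.x + v * cam.up.x,
                             cam.forward.y + u * cam.right.y + v * cam.up.y,
                             cam.forward.z + u * cam.right.z + v * cam.up.z);
    float invLen = rsqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    dir.x *= invLen; dir.y *= invLen; dir.z *= invLen;

    // March until the ray drops below the heightfield, then refine the hit.
    const float tMax = 2048.0f, dt = 1.0f;
    float hitT = -1.0f;  // -1 == miss
    for (float t = dt; t < tMax; t += dt) {
        float3 p = make_float3(cam.origin.x + t * dir.x,
                               cam.origin.y + t * dir.y,
                               cam.origin.z + t * dir.z);
        if (p.y < sampleHeight(hmap, hmapW, hmapH, p.x, p.z)) {
            float lo = t - dt, hi = t;
            for (int i = 0; i < 8; ++i) {  // binary-search refinement
                float mid = 0.5f * (lo + hi);
                float hx = cam.origin.x + mid * dir.x;
                float hy = cam.origin.y + mid * dir.y;
                float hz = cam.origin.z + mid * dir.z;
                if (hy < sampleHeight(hmap, hmapW, hmapH, hx, hz)) hi = mid;
                else lo = mid;
            }
            hitT = hi;
            break;
        }
    }
    // Stores the hit distance; a real version would convert this to whatever
    // depth value the rest of the pipeline expects.
    outDepth[py * screenW + px] = hitT;
}
[/code]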
Another possibility is indeed using the GS, and tessellating quads all the way down to just over 1 pixel in size (it's not "true" REYES, but pretty damn close). There's a very efficient way to do this, but the one proposed in the thread is quite horribly bad, of course. In fact, my scheme calculates the screen-space triangle size in the GS and conditionally tessellates based on that... I was going to submit it to GPU Gems 3, but given that I don't have a good idea of whether it'll perform like shit or not, that plain PS raytracing would likely be faster anyway, and that I haven't implemented much yet... heh.
That GS-based scheme actually has another use: you can use it for terrain tessellation so that all your triangles are, say, ~64 pixels big, in order to maximize overall chip efficiency while maintaining excellent image quality. It's hard to say whether that's truly doable or not, because it's obviously useless if it ends up slower than a more naive approach, i.e. if the tessellation itself takes too long.
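To give an idea of what the test boils down to, here's a quick sketch in plain C++ (real GS code would obviously be HLSL, and every name here is made up): project the triangle, measure its area in pixels, and compare against whatever target you picked, ~1 pixel for the pseudo-REYES case above, ~64 pixels for the efficiency case.

[code]
// Screen-space size test a GS-based scheme would do before deciding whether
// to tessellate a triangle further. Plain C++ sketch with made-up names.
#include <math.h>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };  // clip-space position

// Clip space -> pixel coordinates (assumes w > 0, i.e. ignores clipping).
inline Float2 projectToScreen(Float4 clip, float screenW, float screenH)
{
    Float2 s;
    s.x = (clip.x / clip.w * 0.5f + 0.5f) * screenW;
    s.y = (clip.y / clip.w * 0.5f + 0.5f) * screenH;
    return s;
}

// Triangle area in pixels.
inline float triangleScreenArea(Float2 a, Float2 b, Float2 c)
{
    return 0.5f * fabsf((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y));
}

// The GS emits the triangle as-is when this returns false, and emits a
// subdivided version (to be tested again) when it returns true.
// targetArea ~= 1 for the pseudo-REYES case, ~= 64 for the efficiency case.
inline bool needsSubdivision(Float4 v0, Float4 v1, Float4 v2,
                             float screenW, float screenH, float targetArea)
{
    Float2 a = projectToScreen(v0, screenW, screenH);
    Float2 b = projectToScreen(v1, screenW, screenH);
    Float2 c = projectToScreen(v2, screenW, screenH);
    return triangleScreenArea(a, b, c) > targetArea;
}
[/code]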
Anyway, the real problem with REYES on modern GPUs isn't any of that. The problem is compression. You really should expect compression (and MSAA efficiency!) to go right out the window with triangles tessellated down to ~1 pixel. And the raytracing approach would write depth directly from the pixel shader, which also means bye-bye to depth compression.
So, if what you want is a REYES pass for the terrain and traditional rendering for the rest, imagine doing it before rendering everything else. All your pixels start out uncompressed; while they might become compressible again once other things overwrite them, you get at least one pass, and most likely more, without good compression efficiency, and possibly heavy bandwidth bottlenecks.
Now, if you do the REYES terrain pass after everything else instead, you don't get any terrain-related early-z, and tons of pixels end up being shaded twice. Obviously, unless your shaders are laughably simple, that's not a good compromise. The solution I imagined is quite simple, but it has at least two problems of its own: basically, I planned to render a conservative approximation of the terrain in the z-pass, and then do the REYES pass after the normal color rendering.
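Spelled out, the frame ordering I have in mind looks like this (stubbed-out C++; every function here is an empty placeholder with a made-up name, it's the order that matters):

[code]
// Pass ordering for the hybrid scheme. All functions are empty placeholders.
static void zPrepassSceneGeometry()       { /* normal opaque geometry, depth only */ }
static void zPrepassConservativeTerrain() { /* coarse min-height terrain proxy, depth only */ }
static void colorPassSceneGeometry()      { /* normal shading, full early-z benefit */ }
static void reyesTerrainPass()            { /* per-pixel raytrace / micro-tessellation, real terrain depth + color */ }
static void postProcessing()              { /* DoF etc., reads the post-REYES depth buffer */ }

void renderFrame()
{
    zPrepassSceneGeometry();
    zPrepassConservativeTerrain();  // lets early-z reject pixels the terrain will end up covering
    colorPassSceneGeometry();
    reyesTerrainPass();
    postProcessing();
}
[/code]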
The first problem with that is that there's no such thing as a perfectly "conservative" approximation of a heightmap. The naive way is to minimize the height, and that works most of the time, but not if the player can go underground, such as in caves. Of course, caves have their own set of problems with heightmaps anyway, and you can hack away at that and get something reasonable going; the point is that you're simply NOT going to get a good conservative approximation if the camera needs to be able to go "under" the heightmap.
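For what it's worth, here's what the "minimize height" proxy amounts to as a CUDA sketch (names made up): each proxy vertex takes the minimum of the real heightmap over the whole footprint of its coarse triangles, so the interpolated proxy surface can never poke above the real terrain and therefore never occludes anything it shouldn't. And, as said, none of this helps once the camera can get under the heightmap.

[code]
// Builds a coarse "minimized height" proxy grid from the full heightmap.
// Each proxy vertex takes the min over the footprint of every coarse triangle
// touching it (one block in every direction), so linear interpolation across
// the coarse mesh stays at or below the real terrain. Names are made up.
#include <cuda_runtime.h>
#include <float.h>

__global__ void buildMinHeightProxy(const float* fullHmap, int fullW, int fullH,
                                    float* proxyHeights, int proxyW, int proxyH,
                                    int blockSize)
{
    // One thread per proxy vertex; proxy vertex (px, py) sits at
    // (px * blockSize, py * blockSize) in the full-resolution heightmap.
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= proxyW || py >= proxyH) return;

    int cx = px * blockSize;
    int cy = py * blockSize;

    float minH = FLT_MAX;
    for (int dy = -blockSize; dy <= blockSize; ++dy) {
        for (int dx = -blockSize; dx <= blockSize; ++dx) {
            int x = min(max(cx + dx, 0), fullW - 1);
            int y = min(max(cy + dy, 0), fullH - 1);
            minH = fminf(minH, fullHmap[y * fullW + x]);
        }
    }
    proxyHeights[py * proxyW + px] = minH;
}
[/code]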
The second problem is that some post-processes, such as depth of field, would still need to access the post-REYES depth buffer, and the efficiency there won't be any better. On the plus side, you're most likely limited by other things at that point, but there's no guarantee of that either.
As for REYES on generic models... there's another problem I didn't mention, because it's not too catastrophic for heightmaps: z-culling works on triangles, not pixels, and the majority of the culling only happens after triangle setup and rasterization. So extending this scheme to arbitrary geometry would fundamentally leave you limited by triangle setup to a stupid degree, because you need to set up every single triangle in the viewport, not just those that might actually be visible!
So for that you come back to raytracing. And in the end, if you're doing raytracing, you've got no reason to use DirectX/OpenGL IMO; you should just go straight to CUDA or CTM. I'm sure you can get something mildly efficient going on modern GPUs, and I wouldn't be surprised if some people had already managed to; just don't expect mind-blowing visuals at mind-blowing performance, because all your performance will already be spent on the primary raytraces, which really won't buy you anything (except minuscule polygons nobody will even notice). In the end, the result will look like shit, but hey, who cares? It's REYES!
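And just to show what "straight to CUDA" means in practice: launching something like the heightfield kernel from earlier needs nothing from the graphics API at all, just a couple of mallocs and a kernel launch. Sketch below, with made-up resolutions and camera values; it assumes the Camera struct and rayMarchHeightfield kernel from above sit in the same file, and obviously raytracing generic models would need an acceleration structure on top of this, which is where all the real work is.

[code]
#include <cuda_runtime.h>
#include <stdio.h>

int main()
{
    const int screenW = 1280, screenH = 720;
    const int hmapW = 1024, hmapH = 1024;

    float *dHmap = 0, *dDepth = 0;
    cudaMalloc(&dHmap,  hmapW * hmapH * sizeof(float));
    cudaMalloc(&dDepth, screenW * screenH * sizeof(float));
    // ... cudaMemcpy the actual terrain data into dHmap here ...

    Camera cam;
    cam.origin     = make_float3(512.0f, 200.0f, 512.0f);
    cam.forward    = make_float3(0.0f, -0.3f, 0.95f);  // roughly unit length
    cam.right      = make_float3(1.0f, 0.0f, 0.0f);
    cam.up         = make_float3(0.0f, 0.95f, 0.3f);
    cam.tanHalfFov = 0.7f;

    dim3 block(16, 16);
    dim3 grid((screenW + block.x - 1) / block.x,
              (screenH + block.y - 1) / block.y);
    rayMarchHeightfield<<<grid, block>>>(dHmap, hmapW, hmapH, cam,
                                         dDepth, screenW, screenH);
    cudaDeviceSynchronize();
    printf("raymarch kernel: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dHmap);
    cudaFree(dDepth);
    return 0;
}
[/code]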
Uttar
EDIT: You mention a "vertex" limit in your thread title... yes and no. The G80's VS threads can cover all of the ALU latency, but if you add in texture fetches, you'll sometimes be limited by the number of VS threads it can manage at the same time, I think. The same is even more true for the GS, except there you might not even have enough threads to cover your ALU latency at all, depending on your output format; there's a good post from Bob on the subject.