I wrote a compute shader implementation of the surface nets algorithm, with iterative refinement of vertex positions (moving them along the SDF gradient). It runs in 2 passes: the first generates vertices (shared between connected faces) and the second generates faces. IIRC it takes around 0.02 ms (high-end PC GPU) to generate a mesh for a 64x64x64 SDF (output = around 10K triangles).
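For anyone wondering what "move along SDF gradient" can look like, here is a minimal CPU sketch in Python. It is not the compute shader described above. It assumes a Newton-style projection step (p -= d * g / |g|^2) with a central-difference gradient, and it uses a unit sphere as a stand-in SDF. The names `sphere_sdf`, `gradient` and `refine_vertex` are made up for illustration, and a real surface nets pass would probably also keep each vertex inside its own cell, which this sketch skips.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in test field, an assumption)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def gradient(sdf, p, eps=1e-4):
    """Central-difference estimate of the SDF gradient at p."""
    g = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        g.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return g

def refine_vertex(sdf, p, iterations=4):
    """Pull p toward the zero level set by stepping along the gradient."""
    p = list(p)
    for _ in range(iterations):
        d = sdf(p)
        g = gradient(sdf, p)
        g2 = g[0] * g[0] + g[1] * g[1] + g[2] * g[2]
        if g2 < 1e-12:
            break  # gradient vanished, no usable direction to move in
        # Newton-style step: exact in one iteration for a true distance field
        p = [p[i] - d * g[i] / g2 for i in range(3)]
    return p

start = [0.6, 0.5, 0.4]  # e.g. an initial vertex guess somewhere inside a cell
refined = refine_vertex(sphere_sdf, start)
```

With an exact distance field a single step already lands very close to the surface; the extra iterations matter when the field is only approximate, for example when it is interpolated from a grid.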
We're using an octree to store our SDF - we need the ability to have fine details in some areas, but a regular grid that was fine enough would take way too much memory. At the moment we're doing our meshing incrementally on the CPU rather than on the GPU, and only uploading the differences. I have a GPU implementation of Dual Marching Cubes which is looking promising: last time I measured, it was taking around 100 milliseconds to fully remesh an octree with about 1.4 million nodes on a GTX 1080. That's obviously still too slow for now, but I'm hoping that adding incremental remeshing support will make a big difference. There should also be some gains from improving the way I look up neighbouring nodes, so I'm cautiously optimistic.
But we don't render triangles. Our renderer ray traces SDFs directly. On a high-end PC, the primary ray tracing pass (1080p) takes only 0.3 ms. Secondary rays (soft shadows, AO, etc.) obviously take longer to render. 60 fps is definitely possible on current-gen consoles with an SDF ray tracer.
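As a rough illustration of ray tracing an SDF directly, here is a minimal sphere tracing loop in Python. This is the textbook technique, not the renderer described above, whose internals aren't given here. The unit-sphere SDF, the function names and the step/epsilon constants are all assumptions picked for the example.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in scene, an assumption)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(sdf, origin, direction, max_steps=64, max_dist=100.0, hit_eps=1e-4):
    """Return the distance t along the ray to the surface, or None on a miss.

    direction is expected to be normalised.
    """
    t = 0.0
    for _ in range(max_steps):
        p = [origin[i] + t * direction[i] for i in range(3)]
        d = sdf(p)
        if d < hit_eps:
            return t  # close enough to the surface to count as a hit
        t += d        # the SDF value is a safe step: nothing is closer than d
        if t > max_dist:
            return None
    return None

hit = sphere_trace(sphere_sdf, [0.0, 0.0, -3.0], [0.0, 0.0, 1.0])
miss = sphere_trace(sphere_sdf, [0.0, 2.0, -3.0], [0.0, 0.0, 1.0])
```

The cost per ray is roughly the number of steps times the cost of one SDF sample, so how cheaply the SDF can be sampled matters a lot.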
I'd love to get raytracing working for our SDF - it'd eliminate the entire cost of remeshing! - but so far I haven't been able to make it fast enough. The main bottleneck is locating nodes in the octree. Our octree is often 13 or 14 levels deep (more in some cases), so every sample we take along a ray ends up requiring at least that many texture accesses. I'm sure my implementation was far from optimal, but I ran out of time to optimise it. Sticking to the octree will probably limit how much performance we can get, but it would be difficult for us to change this now. One thing I'd like to try, if I ever get the time, is to cache the 6 neighbour indexes for each node; I could use that to avoid most of the top-down tree traversals, but populating the cache could itself be costly.
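To make the cost of the top-down lookup concrete, here is a small Python sketch that descends a pointer-based octree and counts how many nodes it touches. It is not our actual data layout (that lives in textures); the `Node` class, `build_chain` and `locate` are hypothetical names, and the test tree is deliberately artificial: only the low-corner child is subdivided at each level.

```python
class Node:
    def __init__(self):
        self.children = None  # None for a leaf, otherwise a list of 8 Nodes

def build_chain(depth):
    """Hypothetical test tree: only the low-corner child is subdivided at each level."""
    root = Node()
    node = root
    for _ in range(depth):
        node.children = [Node() for _ in range(8)]
        node = node.children[0]
    return root

def locate(root, p):
    """Descend from the root to the leaf containing p (unit cube).

    Returns (leaf, number of nodes visited).
    """
    node = root
    cx, cy, cz, half = 0.5, 0.5, 0.5, 0.5
    visited = 1
    while node.children is not None:
        # pick the octant of the current cell that contains p
        ix = 1 if p[0] >= cx else 0
        iy = 1 if p[1] >= cy else 0
        iz = 1 if p[2] >= cz else 0
        # move the cell centre to the centre of that octant
        half *= 0.5
        cx += half if ix else -half
        cy += half if iy else -half
        cz += half if iz else -half
        node = node.children[ix | (iy << 1) | (iz << 2)]
        visited += 1
    return node, visited

root = build_chain(14)
_, deep_visits = locate(root, (0.0, 0.0, 0.0))     # lands in the deepest leaf
_, shallow_visits = locate(root, (0.9, 0.9, 0.9))  # lands in a depth-1 leaf
```

Each visited node corresponds to one fetch in a GPU version, so a sample in a finely detailed region pays for the full depth of the tree. A per-node neighbour cache would let consecutive samples along a ray step sideways instead of restarting from the root.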
I recently saw a preprint of a Eurographics paper titled "GPU Ray Tracing Using Irregular Grids", by Arsène Pérard-Gayot, Javor Kalojanov & Philipp Slusallek (not allowed to post links here yet, sorry). I'd like to try out their approach as well at some point.