I don't know what rpg.314 is referring to either, but I can think of at least one thing that can break both the overdraw removal and the submission order independence of TBDRs quite badly: Virtual Texturing.
Especially if the form introduced by AMD (http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/6) and made part of DX11.2 becomes the preferred form: this is a form of virtual texturing where, at the shader level, the texturing operation returns as part of its result whether it hit an unmapped part of the texture, and where the shader itself is supposed to record any such hits in a data structure on the side. Since such a shader necessarily has the ability to write to memory other than just the framebuffer, it is required to run even if it is opaquely-overdrawn later in the same frame, and as such, a TBDR cannot just skip it like it can with more traditional content.
You could use another render target (MRT) and write the missing page ids to the render target (one page id per pixel). This way you don't need UAV output. This is 100% compatible with TBDR.
However, using the hardware PRT in the way you describe is very bad for performance. First, if you try to access an unmapped texture area, you get a full TLB miss (page fault) per pixel = VERY SLOW (actually two faults if trilinear/aniso is used and both mips fail). A single object usually has multiple textures. Now assume you open a door, and suddenly 5 million texture fetches miss the TLB.
When you notice that you access an unmapped part of a texture, you must somehow solve the situation RIGHT NOW, or you will present a corrupted image. You could for example have a dynamic loop that tries to load lower resolution mip data. That data might also page fault (STALL). Also a dynamic loop isn't exactly cheap for each texture read, and UAV writes (with atomics) per pixel aren't cheap either (in cases where hundreds of thousands of pixels page fault). A single 256x256 pixel page can cover 64k pixels on the screen, meaning that even a single missing texture page can cost a lot (64k faults + 64k UAV atomic writes).
Virtual textured games do not access unmapped texture areas. You have an R8 map that has one texel per page (256x256 pixels), and that map tells you the most precise mip level that is available in this virtual UV area. First you point sample that 8-bit map (with max filter), then you clamp your mip level to the most detailed one available, and then you read the real texture. This way you never read unmapped areas, and you don't need dynamic loops to solve the data corruption issue.
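To make the lookup order concrete, here is a rough CPU-side sketch in Python of the "sample the map, clamp the mip, then fetch" idea. It is not shader code and not any engine's actual implementation: the function names, the list-of-lists map, and the convention that mip 0 is the most detailed level are all assumptions for illustration, and the max filter mentioned above is left out.

```python
# Hypothetical CPU-side model of the residency-map lookup; every name here is
# illustrative. Assumption: mip 0 is the most detailed level, and each map entry
# stores the most detailed mip index that is resident for that page.
PAGE_SIZE = 256  # texels per page edge, as in the post


def finest_resident_mip(residency_map, u, v, virtual_size):
    """Point-sample the one-texel-per-page map at virtual UV (u, v) in [0, 1)."""
    pages_per_edge = virtual_size // PAGE_SIZE
    px = min(int(u * pages_per_edge), pages_per_edge - 1)
    py = min(int(v * pages_per_edge), pages_per_edge - 1)
    return residency_map[py][px]


def clamped_mip(desired_mip, residency_map, u, v, virtual_size):
    """Return the mip level that is safe to read at (u, v)."""
    # A larger index means a coarser mip, so clamping to what is resident
    # means never going below the index stored in the map.
    return max(desired_mip, finest_resident_mip(residency_map, u, v, virtual_size))
```

With a 512x512 virtual texture (2x2 pages) and a map of `[[0, 2], [3, 1]]`, a request for mip 0 in the top-right page would be clamped to mip 2, so the real texture fetch never touches an unmapped page.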
Also you never want to load data that is hidden (textures of objects/terrain hidden by walls/hills/etc). You want to write the page ids to an additional render target (32 bit id is more than enough), and read that render target when the scene rendering is complete (= only the visible surfaces remain). Many games render the page id buffer at lower resolution (using a predicted camera to hide the loading latency).
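As a rough illustration of the readback step, the Python sketch below packs a page id into 32 bits and collects the distinct ids left in the buffer after rendering. The bit layout (12 bits each for page x and y, 4 for mip, 4 for a texture id) and all function names are assumptions made up for this example, not something any particular game uses.

```python
# Hypothetical 32-bit page id layout (an assumption for illustration):
# bits 0-11 page x, bits 12-23 page y, bits 24-27 mip, bits 28-31 texture id.
def pack_page_id(page_x, page_y, mip, texture_id=0):
    assert 0 <= page_x < 4096 and 0 <= page_y < 4096
    assert 0 <= mip < 16 and 0 <= texture_id < 16
    return (texture_id << 28) | (mip << 24) | (page_y << 12) | page_x


def unpack_page_id(packed):
    """Return (page_x, page_y, mip, texture_id)."""
    return (packed & 0xFFF, (packed >> 12) & 0xFFF,
            (packed >> 24) & 0xF, (packed >> 28) & 0xF)


def pages_to_load(page_id_buffer, resident):
    """Collect the distinct visible page ids that are not resident yet.

    page_id_buffer is the (possibly low resolution) render target read back
    as rows of packed ids; only surfaces that survived depth testing are in it.
    """
    visible = {pid for row in page_id_buffer for pid in row}
    return sorted(visible - resident)
```

Because the buffer only holds what ended up visible, a wall of pixels that all reference the same page collapses to a single load request, and hidden surfaces never generate requests at all.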
Intentionally reading unmapped areas is not a good idea. The error code is mostly useful for debugging purposes (it allows you to output debug instead of crashing the GPU).
It's UAVs that are the problem because MS specified DirectX 11 without tilers in mind - virtual texturing by itself is perfectly fine.
In GLES3.1 or Vulkan, you would simply use an atomic counter to get a unique index, then write to that location in a buffer without order guarantees. The specification is extremely clear that there are no guarantees that a fragment will be executed if it does not ultimately contribute to the framebuffer, irrespective of side effects.
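The pattern is easier to see in a CPU model. The Python sketch below is only a stand-in, assuming threads in place of fragment invocations and a lock in place of the hardware atomic increment; `AppendBuffer` and `record_missing_pages` are hypothetical names, not GLES or Vulkan API.

```python
import threading


class AppendBuffer:
    """CPU stand-in for a storage buffer plus an atomic counter (hypothetical)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self._next = 0
        self._lock = threading.Lock()  # models the atomic increment

    def append(self, value):
        # Reserve a unique index first, then write to it outside the lock.
        with self._lock:
            index = self._next
            self._next += 1
        if index < len(self.slots):  # writes past the end are dropped
            self.slots[index] = value
        return index

    def count(self):
        return min(self._next, len(self.slots))


def record_missing_pages(page_ids, capacity=64, workers=4):
    """Append every id from several threads; the final order is unspecified."""
    buf = AppendBuffer(capacity)
    chunks = [page_ids[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=lambda c=c: [buf.append(p) for p in c])
               for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf
```

Every id lands in its own slot, but which slot is arbitrary, so the consumer has to treat the result as an unordered set. That matches the point above: nothing here depends on submission order or on every fragment actually running.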
If I understand correctly, DirectX doesn't give you any depth culling ordering guarantees when it does not affect the end result (blending, stencil, etc), unless you add the [earlydepthstencil] attribute to your pixel shader entry function. The documentation is a little bit vague about this, but it states that the GPU is allowed to perform depth culling either before or after the pixel shader execution if this attribute is not present. I am not sure whether this means that a TBDR is allowed to run the pixel shader only for the closest surface. If not, then Microsoft needs to add another attribute that allows the GPU to behave that way. Let's call this [allowdepthrejection]. Problem solved.