1) That analysis is correct, but it would certainly change in view of large amounts of tessellation. You would not want to render geometry twice.
2) Not having to shade quads is a win.
My take on deferred rendering (and tile based techniques)...
I personally try to avoid all techniques that require rendering geometry twice, because geometry transform/rasterization is the step that has by far the most fluctuating running time. Draw call count, vertex/triangle count, quad efficiency, overdraw/fillrate, etc. all change radically depending on the rendered scene. Screen pixel count is always the same (720p = 922k pixels), so any algorithm you run just once per screen pixel incurs a constant cost. That's why I like deferred techniques (= processing after all geometry is rasterized). A constant, stable frame rate is the most important thing for games. Worst case performance is what matters in algorithm selection; average performance is meaningless (unless it's guaranteed to amortize over the frame).
I am not a particular fan of LiDR and its descendants (including Forward+). A depth pre-pass doubles the most fluctuating part of frame rendering (draw calls / geometry processing). It is also a waste of GPU resources. All the 2000+ programmable shader "cores" of modern GPUs are basically idling while the GPU crunches through all the scene draw calls and renders them to the Z-buffer (depth testing, filling, triangle setup and other fixed function work). Memory bandwidth is also underutilized (just vertex fetching and depth writes, no texture reads or color writes at all). For good GPU utilization you need a balanced load at every stage of the graphics rendering pipeline, and a depth pre-pass isn't balanced at all.
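To make "rendering geometry twice" concrete, here is a rough sketch of the frame structure a depth pre-pass implies, in D3D11-flavored C++. Only the device context calls are real API; the SceneDraw helper and function names are hypothetical stand-ins:

```cpp
#include <d3d11.h>
#include <vector>

// Hypothetical per-draw helper: "bind buffers/shaders and Draw". Only the
// device context calls below are real D3D11 API.
struct SceneDraw {
    void Submit(ID3D11DeviceContext* ctx) const;       // depth-only bindings
    void SubmitShaded(ID3D11DeviceContext* ctx) const; // full material bindings
};

// Rough shape of a frame with a depth pre-pass: every draw call in the
// scene is submitted twice.
void RenderWithDepthPrePass(ID3D11DeviceContext* ctx,
                            ID3D11DepthStencilView* dsv,
                            ID3D11RenderTargetView* rtv,
                            ID3D11DepthStencilState* depthEqualNoWrite,
                            const std::vector<SceneDraw>& scene)
{
    // Pass 1: depth only. No color targets, null pixel shader. The shader
    // cores mostly idle while fixed function units fill the Z-buffer.
    ctx->OMSetRenderTargets(0, nullptr, dsv);
    ctx->PSSetShader(nullptr, nullptr, 0);
    for (const SceneDraw& d : scene) d.Submit(ctx);

    // Pass 2: the same geometry again, now shaded. Depth test EQUAL (no
    // depth writes) so only the front-most surface runs the pixel shader.
    ctx->OMSetRenderTargets(1, &rtv, dsv);
    ctx->OMSetDepthStencilState(depthEqualNoWrite, 0);
    for (const SceneDraw& d : scene) d.SubmitShaded(ctx);
}
```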
Various displacement mapping techniques will be used more and more in future games, and these make the extra geometry pass even more expensive. DX11 supports vertex tessellation and conservative depth output. Tessellation will promote vertex based displacement mapping techniques, and conservative depth is very useful for pixel based displacement mapping techniques (it allows early-z and hi-z to stay enabled even with programmable pixel shader depth output). A side note: programmable depth output and pixel discard aren't a good thing for TBDRs (making pixel shader based displacement quite inefficient on them). Vertex tessellation also adds some extra burden (how bad that is remains to be seen).
Brute force deferred rendering with fat g-buffers isn't the best choice in the long run either. Basically all source textures are compressed (DXT variants; DX11 even adds an HDR format). A forward renderer simply reads each DXT texture once per pixel. A deferred renderer reads the compressed texture, outputs it to an uncompressed render target, and later reads the uncompressed data back from that render target. DXT5 is 1 byte per pixel, while an uncompressed target (8888 or 11f-11f-10f) is 4 bytes per pixel. So forward reads 1 byte per texture layer used, while deferred reads 5 bytes and writes 4 bytes (9x more bandwidth used). This isn't a big problem yet, because most games don't have more than two textures per object (8 channels can for example fit: rgb color, xy normal, roughness, specular, opacity). But in the future materials will become more complex and g-buffers will become fatter (as we need to store all the texture data in the g-buffer for later stages).
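The arithmetic spelled out, using the byte counts above (illustrative numbers, not measurements):

```cpp
// Back-of-the-envelope bandwidth per pixel for one DXT5 texture layer.
constexpr int kDxt5BytesPerPixel = 1; // 4x4 DXT5 block = 16 bytes -> 1 byte/pixel
constexpr int kRtBytesPerPixel   = 4; // 8888 or 11f-11f-10f render target

constexpr int kForwardReadBytes   = kDxt5BytesPerPixel;                    // 1 read
constexpr int kDeferredReadBytes  = kDxt5BytesPerPixel + kRtBytesPerPixel; // 1 + 4
constexpr int kDeferredWriteBytes = kRtBytesPerPixel;                      // 4

static_assert(kDeferredReadBytes + kDeferredWriteBytes
              == 9 * kForwardReadBytes, "deferred moves 9x the bytes");
```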
I personally like to keep the geometry rendering pass as cheap as possible. Rendering to three or four render targets while reading three or four textures isn't cheap. Overdraw gets very expensive, and quad efficiency and texture cache efficiency play a big (unpredictable) role in the performance. It's better to just store the (interpolated) texture coordinates in the g-buffer. This way you get a very fast pixel shader (with no texture cache stalls), quad efficiency and overdraw don't matter much, you get full fill rate (no MRTs), low bandwidth requirements, etc. All the heavy lifting is done later, once per pixel, in a compute shader. Compressed textures are read only once, and no uncompressed texture data is written to or read from the g-buffer. This kind of system minimizes the fluctuating cost of geometry processing/rasterization, and it compares very well to a TBDR in scenes that have high overdraw. An IMR still has more overdraw than a TBDR, but the overdraw is dirt cheap. (**)
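A minimal sketch of what such a slim g-buffer texel and resolve could look like. The layout and names are illustrative assumptions, not any shipped format, and sampleAtlas() is a hypothetical helper:

```cpp
#include <cstdint>

// Illustrative slim g-buffer texel: instead of albedo/normal/etc., store
// only what the compute pass needs to fetch them itself. Depth comes from
// the depth buffer.
struct SlimGBufferTexel {
    uint32_t packedUV;  // normalized 16-16 atlas coordinate, see (**) below
    uint32_t packedTbn; // quantized tangent frame for normals (assumption)
};

// Shape of the once-per-pixel resolve that would run in a compute shader,
// written as equivalent CPU logic. Each compressed texture layer is read
// exactly once per screen pixel, and no uncompressed texel data ever
// round-trips through the g-buffer.
void resolvePixel(SlimGBufferTexel t)
{
    // float2 uv   = unpack 16-16 coordinate (see the pack/unpack sketch below)
    // float4 rgbo = sampleAtlas(colorLayer, uv);    // rgb color + opacity
    // float4 mat  = sampleAtlas(materialLayer, uv); // xy normal, roughness, specular
    // ...full lighting runs here, once per pixel.
    (void)t;
}
```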
What matters in the future isn't the geometry rasterization performance. Geometry rasterization is only around 10%-20% of the whole frame rendering cost if you use advanced deferred rendering techniques. TBDR/IMR aren't that different if 80%+ of frame rendering time is spent running compute shaders.
(**) The biggest downside of the technique described above is that the "texture coordinate" (= texture address) must contain enough data to distinguish all the texture pixels that might be visible in the frame (and bilinear combinations of those). With current architectures this basically means you need a big texture atlas, and you need to store all your textures there. This is not a viable strategy for games that have a gigabyte's worth of textures loaded in memory at once. Virtual texturing, however, only tries to keep in memory the texture data that is required to render the current frame. The whole data set fits in a single 8192x8192 atlas (the virtual texture page cache). With this kind of single atlas, the "texture coordinate" problem becomes trivial: just store a 32 bit (normalized int 16-16) texture coordinate in the g-buffer.
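For the single-atlas case the packing really is trivial; a minimal sketch:

```cpp
#include <cstdint>

// Pack a [0,1) atlas coordinate into one 32 bit g-buffer value. 16 bits
// per axis gives 65536 steps across the 8192 texel atlas, i.e. 8 subtexel
// steps, enough to reconstruct bilinear filtering weights.
uint32_t packAtlasUV(float u, float v)
{
    uint32_t iu = (uint32_t)(u * 65536.0f) & 0xFFFFu;
    uint32_t iv = (uint32_t)(v * 65536.0f) & 0xFFFFu;
    return (iv << 16) | iu;
}

void unpackAtlasUV(uint32_t packed, float* u, float* v)
{
    *u = (packed & 0xFFFFu) / 65536.0f;
    *v = (packed >> 16)     / 65536.0f;
}
```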
Basically, don't rasterize shadow maps immediately after binning is complete. Wait until the next render wants to look up the texels. Then rasterize just one tile of the shadow map opportunistically and immediately use it to shade the fragments.
[...]
I don't think this technique can be copied by an IMR.
This technique is very similar to virtual shadow mapping. Virtual shadow mapping works pretty much like virtual texturing, except that you use projected shadow map texture coordinates instead of the mesh texture coordinates. By using the depth buffer and the shadow map matrix you can calculate all the visible pages. Each page (frustum) is rendered separately (we can of course combine neighboring pages into single frustums to speed up processing). Shadow map fetching uses the same indirection texture approach as virtual texturing (cuckoo hashing is also a pretty good fit for the GPU). The best thing about this technique is that it always renders shadow maps at the correct 1:1 screen resolution. Oversampling/undersampling is much reduced compared to techniques such as cascaded shadow mapping.
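A sketch of the page determination step, written as CPU code for clarity (on the GPU this would be a full screen or compute pass). The premultiplied matrix, page size, and shadow map resolution are illustrative assumptions:

```cpp
#include <cstdint>

struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix * column vector (minimal helper for the sketch).
static Vec4 mul(const float m[16], Vec4 v)
{
    return { m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w,
             m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w,
             m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w,
             m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w };
}

constexpr int kPageSize    = 128;  // texels per page (illustrative)
constexpr int kVsmDim      = 8192; // virtual shadow map resolution (illustrative)
constexpr int kPagesPerDim = kVsmDim / kPageSize;

// Mark every shadow map page that a visible screen pixel falls into.
// shadowFromScreen is assumed to be premultiplied (shadow view-projection *
// inverse camera view-projection * viewport-to-NDC), so a (pixel x, pixel y,
// depth, 1) point lands directly in shadow map UV space.
void markVisiblePages(const float* depth, int width, int height,
                      const float shadowFromScreen[16],
                      uint8_t visible[kPagesPerDim][kPagesPerDim])
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            Vec4 s = mul(shadowFromScreen,
                         { x + 0.5f, y + 0.5f, depth[y * width + x], 1.0f });
            float su = s.x / s.w, sv = s.y / s.w; // shadow map UV
            if (su < 0.0f || su >= 1.0f || sv < 0.0f || sv >= 1.0f)
                continue;                         // outside the shadow frustum
            visible[(int)(sv * kPagesPerDim)][(int)(su * kPagesPerDim)] = 1;
        }
}
```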
Page visibility determination (from the depth buffer) of course takes some extra time, but you can combine it with some other full screen pass to minimize its impact. Rendering several smaller shadow frustums (pages) of course increases the draw call count (and vertex overhead), but techniques such as merge-instancing can basically eliminate that problem (a single draw call per page, with sub-object culling for reduced vertex overhead). With some DrawInstancedIndirect/DispatchIndirect trickery that's doable, but dynamic kernel dispatching by other kernels would make things much better (GK110 will be the first GPU to support this).
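For reference, the argument block that DrawInstancedIndirect consumes; its layout mirrors the four parameters of DrawInstanced, so a binning/culling compute shader can write it into a buffer and the visible-page counts never have to round-trip through the CPU:

```cpp
#include <cstdint>

// Layout of the buffer read by ID3D11DeviceContext::DrawInstancedIndirect.
// A culling/binning compute shader writes one of these per shadow page.
struct DrawInstancedIndirectArgs {
    uint32_t vertexCountPerInstance; // e.g. merged geometry size for the page
    uint32_t instanceCount;          // written by the binning shader
    uint32_t startVertexLocation;
    uint32_t startInstanceLocation;
};
```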