I think the NVidia presentation is not arguing what is needed for CG quality, but rather what sample resolution is needed to match the limits of the human visual system, i.e. to be indistinguishable from reality at the sampling level. This is a tractable problem that can be readily calculated in a PowerPoint slide.
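To give a sense of what that kind of slide-level calculation looks like, here's a rough back-of-envelope sketch; the field-of-view and acuity numbers are my own illustrative assumptions, not figures from the slides:

    import math

    # Back-of-envelope estimate of samples needed to saturate human vision.
    # All numbers below are illustrative assumptions, not NVidia's figures.
    fov_h_deg, fov_v_deg = 160.0, 135.0   # assumed per-eye field of view
    acuity_arcmin = 0.5                   # assumed resolvable detail (~0.5 arcmin)

    samples_per_deg = 60.0 / acuity_arcmin            # 120 samples per degree
    pixels = (fov_h_deg * samples_per_deg) * (fov_v_deg * samples_per_deg)
    print(f"{pixels / 1e6:.0f} Mpixels per eye")      # ~311 Mpixels per eye

Change the assumed acuity or FOV and the answer moves by an order of magnitude either way, but the calculation itself is trivial, which is exactly why it fits in a PowerPoint.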
On the flip side, the amount of computing power needed for each sample's lighting to approach reality (let alone the physics of movement) is not easy to calculate. That is, we don't have a "theory of The Matrix" yet on which to base solid estimates.
Now, as I have argued in the past, I can readily discern the "real" from the "rendered" even on NTSC video, so sample resolution alone is not the primary cue we use to detect fake imagery. Lighting is much more important (i.e. better pixels, not just more of them).
On the other hand, if you hooked up two NTSC-resolution displays to my eyes via a VR headset, tied into a camera attached to a robot head (telepresence) that moved as I did, I still would not be fooled. I think it's clear that in order to give me the same experience I get in the real world, I need a wide FOV, a stereoscopic display, high foveal density, and freedom from spatial aliasing, coupled with low latency (if I move my head, there had better not be a perceptible lag in the screen update).
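On the latency point, here is a quick sketch of why lag is so noticeable; the head-rotation speed and latency values are assumptions I'm plugging in purely for illustration:

    # How far the world appears to "slip" during a head turn, for a given
    # motion-to-photon latency. Numbers are illustrative assumptions.
    head_speed_deg_per_s = 300.0      # assumed brisk head rotation
    for latency_ms in (50.0, 20.0, 5.0):
        error_deg = head_speed_deg_per_s * latency_ms / 1000.0
        print(f"{latency_ms:4.0f} ms latency -> {error_deg:4.1f} deg of scene slip")

Even a couple of degrees of slip during a head turn is easily perceptible, which is why the motion-to-photon path matters at least as much as the pixel count.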
The Nvidia PowerPoint is more or less like a Feynman-style nanotech proclamation: the author is saying "look how much farther we have to go on resolution alone" and "if we use brute force, and assume exponential growth in circuit density, here's how many years it would take to achieve it."
I wouldn't take it as any more than an Nvidia employee waxing philosophical about what can be done in 10 years based on certain assumptions. No different than other manifestos on how much room is left in traditional semiconductors, or wishlists for quantum computing, or RSFQ.
As for why they didn't include any estimates for what's needed to do physically correct illumination for each pixel: it's a much harder and more debatable problem (just what is the "correct" lighting equation?). And you can take this further: even with correct lighting and adequate sample resolution/frame rate, humans won't be fooled by inaccurate physical simulation. Just look at the Matrix Reloaded Agent-car-jump highway sequence, which despite a year of postproduction and probably lots of tweaking, still looked "fake".
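For reference, the usual formal answer to "what is correct lighting?" is Kajiya's rendering equation, but writing it down doesn't settle much:

    L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i

You still have to choose the BRDF f_r for every surface, and even this form leaves out participating media, wavelength effects, and time, so "correct" remains a judgment call, which is my point.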
The 36,000X improvement figure is just a lower bound on what's needed. This presentation really doesn't say anything about the company's future plans. It's just an observation of how much further we have to go, and what is possible within the bounds of Moore's-Law-like growth.
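For what it's worth, the arithmetic behind the "how many years" framing is simple; the doubling periods below are my own assumptions, not a forecast:

    import math

    # Years to close a 36,000X gap by doubling performance every N months.
    # Doubling periods are illustrative Moore's-Law-ish assumptions.
    gap = 36000.0
    doublings = math.log2(gap)                        # about 15.1 doublings
    for months_per_doubling in (18.0, 24.0):
        years = doublings * months_per_doubling / 12.0
        print(f"{months_per_doubling:.0f}-month doubling -> {years:.0f} years")

In other words, the headline number of years falls straight out of the size of the gap and whatever doubling rate you assume; there's nothing deeper going on.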