The problem with that is that compression has both final images in hand and just computes the delta between them, whereas a delta renderer would need to predict how much something is going to change and render the changes accordingly. That's akin to building a camera that records only the parts of a scene that are going to change in the next frame.
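To make that asymmetry concrete, here's a minimal sketch (buffer names made up): the codec's diff is trivial once both frames exist, and it's exactly the step before frame B exists that a delta renderer would somehow have to guess.

```cpp
// Sketch of the asymmetry above: a codec diffs two frames it already
// has, while a delta renderer would have to guess the diff before
// frame B exists. Buffer names are hypothetical.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> frameA = {10, 20, 30, 40};
    std::vector<int> frameB = {10, 21, 30, 40};

    // Compression's job: both frames exist, so the delta is a compare.
    for (size_t i = 0; i < frameA.size(); ++i)
        if (frameA[i] != frameB[i])
            std::printf("pixel %zu changed: %d -> %d\n",
                        i, frameA[i], frameB[i]);

    // A delta renderer would start from frameA alone and have to
    // predict which entries of frameB will differ -- the hard part.
    return 0;
}
```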
A rotation on the spot in an FPS changes essentially 100% of the pixels - how do you determine which pixels will be sufficiently unchanged that they can just be translated instead of rerendered?
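For a pure rotation there's arguably an out, though: no depth information is needed, so every old pixel maps to a new screen position from the camera delta alone - essentially what VR timewarp reprojection does - and you'd only have to rerender the sliver of newly revealed screen at the edge. A minimal sketch, assuming a pinhole camera with made-up intrinsics:

```cpp
// Sketch: reprojecting a previous frame's pixel under a pure camera
// rotation (no translation), pinhole model. All values are
// hypothetical; the point is that a rotation-only delta needs no depth.
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

// Unproject a pixel to a view-space ray direction (focal length f,
// principal point at the image centre).
Vec3 unproject(double px, double py, double cx, double cy, double f) {
    return { (px - cx) / f, (py - cy) / f, 1.0 };
}

// Rotate a direction by yaw (radians) about the vertical axis.
Vec3 rotateYaw(Vec3 v, double yaw) {
    double c = std::cos(yaw), s = std::sin(yaw);
    return { c * v.x + s * v.z, v.y, -s * v.x + c * v.z };
}

int main() {
    const double w = 640, h = 480, f = 500; // hypothetical intrinsics
    double yaw = 0.02; // camera turn between frames, radians

    // Where does last frame's pixel (400, 240) land this frame?
    Vec3 ray = unproject(400, 240, w / 2, h / 2, f);
    Vec3 r = rotateYaw(ray, -yaw); // inverse of the camera's rotation
    if (r.z > 0) { // still in front of the camera
        double nx = w / 2 + f * r.x / r.z;
        double ny = h / 2 + f * r.y / r.z;
        std::printf("old (400,240) -> new (%.1f, %.1f)\n", nx, ny);
    }
    return 0;
}
```

Translation is where it breaks down, since parallax makes the shift depth-dependent per pixel.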
Another solution might be partial renders of every frame, rendering some surfaces anew while reusing old frame data for the others. So one frame, render the left rock anew, and the next frame, shift the left rock according to the camera change and render the right rock anew instead. I imagine that might look pretty wonky! But at higher framerates it might work and give the impression of a faster framerate overall.
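A rough sketch of that alternation, with hypothetical object and function names - half the scene gets a real render each frame, the other half is shifted from cached pixels:

```cpp
// Sketch of the alternating scheme: each object is rerendered on every
// other frame and reprojected (shifted by the camera delta) in between.
// Names are made up for illustration.
#include <cstdio>
#include <string>
#include <vector>

struct Object { std::string name; };

void renderFresh(const Object& o)     { std::printf("  render %s anew\n", o.name.c_str()); }
void reprojectCached(const Object& o) { std::printf("  shift cached %s by camera delta\n", o.name.c_str()); }

int main() {
    std::vector<Object> scene = { {"left rock"}, {"right rock"} };

    for (int frame = 0; frame < 4; ++frame) {
        std::printf("frame %d:\n", frame);
        for (size_t i = 0; i < scene.size(); ++i) {
            // Half the scene gets a real render each frame; the rest
            // reuses last frame's pixels, so per-frame cost is roughly
            // halved at the price of half the scene being stale.
            if ((i + frame) % 2 == 0) renderFresh(scene[i]);
            else                      reprojectCached(scene[i]);
        }
    }
    return 0;
}
```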
I also wondered about rendering interlaced, after seeing CON on PS2 again recently. That'd halve the pixel workload every frame, so you could run the game at 60 fps, rendering a half-res field one frame and stretching it to the full frame, then rendering the next field offset by a single pixel and stretching that. The TV would receive a 60 fps progressive image, but the image would be interlaced in the sense of having a one-pixel vertical offset every other frame. There'd be a 30 Hz vertical shimmer. Of course, that wouldn't help with vertex savings, but it sounds like it could work to me.
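Something like this, as a sketch of the field alternation (names and resolution made up): each frame touches only half the scanlines, with the even/odd offset flipping per frame:

```cpp
// Sketch of the field-alternating idea: each frame renders only half
// the scanlines (even rows one frame, odd rows the next), and the
// missing rows get filled by stretching, so pixel fill cost is halved
// while the output stays 60 fps progressive. Names are hypothetical.
#include <cstdio>

const int HEIGHT = 480;

void renderScanline(int y) { /* draw row y of the scene (stub) */ }

void renderField(int frame) {
    int offset = frame % 2;        // 0 = even field, 1 = odd field
    for (int y = offset; y < HEIGHT; y += 2) {
        renderScanline(y);         // only 240 of 480 rows per frame
    }
    // The 240 skipped rows would be filled by duplicating (stretching)
    // their rendered neighbours -- that duplication, plus the one-pixel
    // offset flipping each frame, is what would cause the 30 Hz shimmer.
}

int main() {
    for (int frame = 0; frame < 2; ++frame) {
        std::printf("frame %d renders rows %d, %d, %d, ...\n",
                    frame, frame % 2, frame % 2 + 2, frame % 2 + 4);
        renderField(frame);
    }
    return 0;
}
```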
The main advantage of the current brute-force systems is that they're straightforward and easily scalable. Developing clever delta renderers would be a hell of a lot more work and probably not very portable between game types, as the kinds of changes could be very different from game to game.