Speculative execution, where many outcomes are rendered and sent as frames but only one is used, is by definition extremely wasteful of compute. And while many games already carry ~150 ms of delay, that time is largely spent generating the frame. Unless we expect sub-10 ms turnaround from cloud servers, proximity will always be the limiting factor.
You don't need speculative execution or many outcomes to be processed. That would be terrible. You render remotely, providing cues such as motion vectors, object type IDs etc, then the client uses what's provided to generate the displayed image based on things it has calculated locally, e.g. camera and object positions.
If you know the data will be used for this purpose, you generate it in the cloud accordingly.
A crude implementation might provide a "level" buffer rendered from an oversized viewport to accommodate camera movement (small, highly predictable increments), plus game objects with motion vectors. With the locally calculated data you adjust what the viewport shows and adjust the positions of objects: simple 2D moving and warping. Positions, collisions, and game logic all feel instantaneous, just as they normally do.
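As a rough sketch of that crude implementation (every name here, the coordinate convention, and the fixed-latency extrapolation are illustrative assumptions of mine, not something from the post): the client crops the oversized remote buffer at the locally known camera position, then composites object sprites at positions extrapolated along their motion vectors.

```python
import numpy as np

def reproject(level_buffer, remote_cam, local_cam, view_w, view_h,
              objects, latency_s):
    """Warp a remotely rendered frame to the current local game state.

    level_buffer: oversized 2D image rendered in the cloud, centred on
                  the camera position the server knew about (remote_cam,
                  given as the viewport's top-left in world coordinates).
    local_cam:    where the viewport's top-left actually is *now*.
    objects:      dicts with world "pos", "vel" (px/s) and a "sprite".
    """
    # Shift the crop window by however far the camera has moved since
    # the remote frame was rendered; the margins absorb small moves.
    margin_x = (level_buffer.shape[1] - view_w) // 2
    margin_y = (level_buffer.shape[0] - view_h) // 2
    x0 = int(np.clip(margin_x + (local_cam[0] - remote_cam[0]),
                     0, level_buffer.shape[1] - view_w))
    y0 = int(np.clip(margin_y + (local_cam[1] - remote_cam[1]),
                     0, level_buffer.shape[0] - view_h))
    frame = level_buffer[y0:y0 + view_h, x0:x0 + view_w].copy()

    # Composite each object at its position extrapolated along its
    # motion vector to cover the stream latency.
    for obj in objects:
        ox = obj["pos"][0] + obj["vel"][0] * latency_s - local_cam[0]
        oy = obj["pos"][1] + obj["vel"][1] * latency_s - local_cam[1]
        sprite = obj["sprite"]
        h, w = sprite.shape[:2]
        ix, iy = int(round(ox)), int(round(oy))
        if 0 <= ix <= view_w - w and 0 <= iy <= view_h - h:
            frame[iy:iy + h, ix:ix + w] = sprite
    return frame
```

A real client would of course blend, fill disocclusions, and handle depth; this only shows the "crop plus reposition" idea.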
It's the grunt work of most of the rendering pipeline that's remote. Any latency-sensitive logic stays local, and you do the best job you can of manipulating the graphics data you have from the cloud to fit what's happening in the instant.
A more involved solution would also prioritise streaming latency-critical content (e.g. gunshot effects once the event is triggered locally) over content that can comfortably tolerate a couple of frames of "make do" re-manipulation (e.g. distant backgrounds during camera panning).
You prioritise what you stream based on the urgency of updating, whether from latency (a triggered event) or from image degradation as inaccuracy grows.
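That scheduling rule could be sketched roughly like this (the field names and the staleness-times-motion degradation score are my assumptions, not anything the post specifies): triggered events jump the queue, everything else is ordered by how badly its local "make do" version is degrading.

```python
def stream_order(tiles):
    """Order pending stream updates by urgency.

    Each tile dict carries (illustrative fields, not a real protocol):
      event_triggered:  a latency-critical local event, e.g. a gunshot
      frames_stale:     frames since the cloud last refreshed this tile
      motion_magnitude: how fast the content moves; fast content makes
                        the client's re-manipulated version wrong sooner
    """
    def priority(tile):
        if tile["event_triggered"]:
            return (0, 0.0)            # ship immediately, ahead of all else
        degradation = tile["frames_stale"] * tile["motion_magnitude"]
        return (1, -degradation)       # then most-degraded first
    return sorted(tiles, key=priority)
```

A distant background pans slowly and tolerates staleness, so it naturally sinks to the back of the queue.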
This is an exciting area. For some game types, e.g. Halo online, WoW, etc., it could deliver gameplay results identical to current implementations, with similar graphical quality, on smaller, cheaper, vastly more power-efficient client devices.