E.g. low-res voxelised GI with some higher-fidelity traced lighting. Also maybe separate AO instead of properly sampling the sky light or GI room light.
It will still look like games.
Voxel GI solutions are very inaccurate because they approximate the environment with a single digit number of (leaky) cones. This gives you bounce light, even color bleeding, but your eyes will spot the cheat. Irradiance is low frequency, but we need good accuracy nevertheless.
AO does not help either - it's a completely artificial effect. It can look similar to GI, but it will never look realistic in general (GI adds light, AO can only remove it). We will use it only as long as we need it to add some detail that GI can't handle.
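To make the "GI adds light, AO can only remove it" point concrete, here is a minimal Python sketch. It assumes a scalar toy shading model; the function names and all numbers are hypothetical and only meant for illustration, not taken from any engine.

```python
# Hypothetical scalar shading model, for illustration only.

def shade_with_ao(direct, ambient, ao):
    """Constant ambient term scaled by an occlusion factor ao in [0, 1]."""
    return direct + ambient * ao


def shade_with_gi(direct, indirect):
    """Direct light plus the light that really arrives from the environment."""
    return direct + indirect


# A point in shadow (no direct light) next to a brightly lit wall.
direct = 0.0
ambient = 0.2  # assumed flat ambient level

# AO can only scale the ambient term down, never above direct + ambient.
print(shade_with_ao(direct, ambient, ao=0.5))

# GI can brighten the point with bounce light from the wall.
print(shade_with_gi(direct, indirect=0.6))
```

However the occlusion factor is chosen, the AO result is capped at direct + ambient, while the GI result depends on what the surroundings actually reflect.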
Area shadows can create very realistic images, but only in cases where the contribution of GI is so tiny we do not spot it.
I'm pretty sure what you miss is GI.
Here are two videos that show the difference in a non Minecraft setting:
(I still claim to have similar quality in 3ms on first gen GCN for those diffuse settings, but only for the first video. I have no test case for a scene as large as the one in the second video, and I still have infinite work to do.)
Notice those videos use only one bounce most of the time, which is still too dark. (They change this setting with console inputs for short times.)
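A rough way to see why a single bounce ends up too dark: a sketch under the strong assumption of an idealised closed, uniformly diffuse scene where every bounce keeps a fixed fraction (the albedo) of the energy. Indirect light then follows a geometric series. The function names and the albedo value are hypothetical.

```python
def indirect_after_bounces(albedo, bounces):
    """Indirect light relative to direct light after a number of bounces."""
    return sum(albedo ** k for k in range(1, bounces + 1))


def indirect_converged(albedo):
    """Limit of the series for infinitely many bounces (needs albedo < 1)."""
    return albedo / (1.0 - albedo)


albedo = 0.5  # assumed average diffuse reflectance
for n in (1, 2, 4, 8):
    print(n, indirect_after_bounces(albedo, n))
print("converged", indirect_converged(albedo))
```

In this toy model with an albedo of 0.5, one bounce delivers only half of the converged indirect light. Real scenes are not this uniform, so treat the numbers as an order-of-magnitude hint only.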
But it looks more realistic than games, almost perfect - or not? (Not sure how subjective this is.)
But we still need more.
Having GI means we know about the environment of the shading point and have all the information we need. But then we still need to shade it, which PBS already handles very well in real time. But it's also a never-ending research task if we think of complex materials (layered, coated, subsurface, etc.).
Finally, the geometry matters a lot too. Again CG tends to simplify, to be too perfect, to handle transitions badly (e.g. an abrupt transition from grass to a tree, or rocks intersecting each other due to copy-pasted instances).
CG tending to be too sharp is surely a very major point. Kane and Lynch 2 looked super realistic because it used camera-like filters to blur, but also to sharpen. I also know a tiny indie horror game but forgot the name. (They make the screen look like VHS, and it looks astonishingly real just because of that.)
Tone mapping is something only very few companies seem to do right. Uncharted looks like vivid and colorful eye candy, Tomb Raider looks grey and dead. Tone mapping could fix this with little manual work and zero runtime cost. I wonder why it is so underutilized in games, but it is probably not the key to realism IMO.
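For readers who have not touched tone mapping: a minimal sketch of one simple global operator, the basic Reinhard curve L / (1 + L), with an exposure multiplier in front. This is a swapped-in textbook example, not what Uncharted or Tomb Raider use; the function name and the exposure parameter are assumptions for illustration.

```python
def reinhard(luminance, exposure=1.0):
    """Map HDR luminance in [0, inf) to display range [0, 1)."""
    scaled = luminance * exposure  # artistic control: brighten or darken first
    return scaled / (1.0 + scaled)


# Same HDR values, two exposure settings: only the curve input changes.
hdr_values = [0.0, 0.25, 1.0, 4.0, 16.0]
for exposure in (0.5, 2.0):
    print(exposure, [round(reinhard(v, exposure), 3) for v in hdr_values])
```

The whole look is decided by a couple of parameters applied once per pixel, which is why tuning it is cheap compared to anything on the lighting side.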