The problem, I suspect, is that for every single frame, the engine determines the set of tiles that the camera can see. This has a relatively fixed memory cost, but the tiles have to be constantly replaced as you move around, by streaming in more tiles from the optical disc and decompressing/recompressing them.
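The per-frame "which tiles does the camera see" step can be sketched roughly like this. In a real engine it is typically derived from a low-resolution feedback rendering pass, but a CPU-side version conveys the idea; every constant and name below is my own illustrative assumption, not a value from the actual engine:

```python
import math

TILE_SIZE = 128          # texels per tile edge (assumed)
VT_SIZE = 128 * 1024     # virtual texture edge length in texels (assumed)

def tile_for_pixel(u, v, texel_footprint):
    """Map a sampled UV plus its screen-space texel footprint to a (mip, x, y) tile id."""
    mip = max(0, int(math.log2(max(texel_footprint, 1.0))))
    tiles_per_edge = max(1, (VT_SIZE >> mip) // TILE_SIZE)
    x = min(int(u * tiles_per_edge), tiles_per_edge - 1)
    y = min(int(v * tiles_per_edge), tiles_per_edge - 1)
    return (mip, x, y)

def visible_tiles(samples):
    """The unique tile set needed this frame: many pixels collapse to few tiles."""
    return {tile_for_pixel(u, v, f) for (u, v, f) in samples}
```

The key property is that the output is a *set*: thousands of pixels reduce to a comparatively small, relatively fixed-size list of tiles to keep resident, which is what gives the fixed memory cost.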
There's probably some caching involved as well, to reuse tiles around the player. But since the textures are completely unique, the cache is probably only good for covering cases like the player suddenly doing a 180-degree turn, and maybe for reading ahead in the most likely movement direction.
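Such a cache would behave like a plain LRU over decoded tiles: a sudden 180-degree turn re-requests tiles that were loaded moments ago and are still resident. A minimal sketch (class and method names are mine, not id's):

```python
from collections import OrderedDict

class TileCache:
    """Minimal LRU cache for decoded tiles (illustrative, not any engine's real code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tiles = OrderedDict()          # tile_id -> decoded texel data

    def request(self, tile_id, load_fn):
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id) # hit: e.g. a 180-degree turn re-sees
            return self.tiles[tile_id]      # tiles loaded a moment ago
        data = load_fn(tile_id)             # miss: stream + transcode from disc
        self.tiles[tile_id] = data
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict the least-recently-used tile
        return data
```

With unique textures the hit rate outside of such "look back" cases would be poor, which is why the cache only really papers over turns and short read-ahead, not sustained movement.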
Even if the two players are just a few meters apart, that could already change 60-80% of the required tiles, maybe even more. The biggest memory hogs are probably the high-res tiles right around the player, and split screen would have to duplicate exactly that part. There's not enough memory, optical disc bandwidth, or transcoding processor power to support this on a single system.
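A back-of-the-envelope illustration of that overlap claim, using a toy model where each viewpoint needs a square block of tiles around its position (the model and all numbers are mine, purely illustrative):

```python
def tiles_around(pos, radius=4):
    """Toy model: a viewpoint needs the square block of tiles centred on it."""
    px, py = pos
    return {(px + dx, py + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}

p1 = tiles_around((0, 0))
p2 = tiles_around((6, 0))        # second player a handful of tiles away
shared = p1 & p2
changed = 1 - len(shared) / len(p1)
print(f"{changed:.0%} of tiles differ")   # → 67% of tiles differ
```

Because the textures are unique, none of the non-shared tiles can be substituted or reused; a second viewpoint really does need its own mostly disjoint working set.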
Even if they stopped using the highest MIP level for all tiles to conserve RAM, it would still require two independent streaming processes that would thrash the optical drive, constantly seeking back and forth across the disc to read different pieces of data.
Also, the streaming and transcoding parts are almost completely independent of the actual frame rate; they run in the background anyway and are tied more closely to movement speed. The entire system could be transplanted into Doom4 easily, and cutting the frame rate to 30fps would only affect the lighting and shading parts of the engine (if they follow in the footsteps of Doom3, it'll pretty much require dynamic lighting). So cutting the frame rate wouldn't help at all, either.
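To make that frame-rate independence concrete: the streaming workload scales with how fast the camera crosses tile boundaries, not with how often frames are drawn. Rendering just re-samples whatever tiles are already resident. A toy calculation (all constants and the function are assumptions of mine):

```python
TILE_WORLD_SIZE = 2.0   # metres of world covered per tile edge (assumed)

def tiles_crossed_per_second(speed_mps, view_width_tiles=9):
    """Rough rate at which new tile columns enter the view as the camera moves."""
    return (speed_mps / TILE_WORLD_SIZE) * view_width_tiles

# Halving the frame rate changes neither number; halving movement speed does.
print(tiles_crossed_per_second(6.0))   # → 27.0
print(tiles_crossed_per_second(3.0))   # → 13.5
```

Notice that frame rate appears nowhere in the formula, which is exactly why dropping to 30fps wouldn't free up any streaming or transcoding capacity for a second player.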