1408x792 itself is pretty arbitrary. It's just +10% in each dimension versus 720p. At the very least, both dimensions are divisible by 8, which should come in handy for down-sampled intermediate buffers. It's not great for scaling to either 1080p or 720p, though, since neither is a clean ratio (15/11 up, 10/11 down).
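To make the divisibility and scaling points concrete, here's a quick sketch of the arithmetic. The helper name `scale_factors` is just made up for illustration, and I'm assuming the outputs in question are 1280x720 and 1920x1080.

```python
from fractions import Fraction

def scale_factors(width, height, outputs):
    """Return the exact per-axis scale factor needed to reach each output size."""
    return {
        name: (Fraction(out_w, width), Fraction(out_h, height))
        for name, (out_w, out_h) in outputs.items()
    }

RENDER_W, RENDER_H = 1408, 792
# Assumed output modes: standard 720p and 1080p.
OUTPUTS = {"720p": (1280, 720), "1080p": (1920, 1080)}

# Both dimensions must divide evenly for 1/2, 1/4 and 1/8 size buffers.
divisible_by_8 = RENDER_W % 8 == 0 and RENDER_H % 8 == 0
factors = scale_factors(RENDER_W, RENDER_H, OUTPUTS)

print("divisible by 8:", divisible_by_8)
for name, (sx, sy) in factors.items():
    print(f"{name}: x{sx} horizontally, x{sy} vertically")
```

Using `Fraction` instead of floats keeps the ratios exact, so it's obvious at a glance that neither output is an integer multiple of the render size.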
Beyond that, it's hard to say what other buffers they keep around simultaneously. Post-process buffers can reuse some of the same memory, depending on when each one is needed. A 2k x 2k shadowmap is 16MB already at 32-bit, or 8MB if they choose 16-bit. Then add more for cascades or other shadow filtering methods. They could keep the shadowmaps that need to be updated every frame in ESRAM and the slower-updating ones outside...
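For the shadowmap numbers, a rough sketch of the math, assuming "2k" means 2048 texels per side and that each cascade is a full-size map of its own (both are my assumptions, and `shadowmap_bytes` is a made-up helper):

```python
MIB = 1024 * 1024

def shadowmap_bytes(size, bits_per_texel, cascades=1):
    """Memory for `cascades` square shadowmaps of `size` x `size` texels."""
    return size * size * (bits_per_texel // 8) * cascades

# Assumption: "2k" is 2048, and every cascade gets its own full-size map.
for bits in (32, 16):
    for cascades in (1, 4):
        size_mib = shadowmap_bytes(2048, bits, cascades) / MIB
        print(f"{bits}-bit, {cascades} cascade(s): {size_mib:.0f} MiB")
```

Under those assumptions, four 16-bit cascades already add up to the whole 32MB, which is why splitting shadowmaps between ESRAM and main memory looks attractive.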
At 1408x792, every 32-bpp target would simply be ~4.25MiB (1408 x 792 x 4 bytes), which doesn't really neatly "fit" the 32MB ESRAM in any whole-number combination (including 64-bit formats): seven of them come to about 29.8MiB, eight to about 34MiB. Of course, it remains to be seen how spilling over into DDR3 (partial render targets) affects things.
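Here's the render target math spelled out, as a sketch. It treats the 32MB ESRAM as 32 MiB and ignores any alignment or tiling padding the hardware might add, which I don't know; the function names are hypothetical.

```python
MIB = 1024 * 1024
ESRAM_BYTES = 32 * MIB  # assumption: "32MB" means 32 MiB, no reserved space

def target_bytes(width, height, bits_per_pixel):
    """Raw size of one render target, with no alignment or tiling padding."""
    return width * height * bits_per_pixel // 8

def whole_targets_that_fit(budget, size):
    """How many whole targets fit in `budget`, and how many bytes are left over."""
    count = budget // size
    return count, budget - count * size

for bpp in (32, 64):
    size = target_bytes(1408, 792, bpp)
    count, leftover = whole_targets_that_fit(ESRAM_BYTES, size)
    print(f"{bpp}-bpp: {size / MIB:.2f} MiB each, {count} fit, "
          f"{leftover / MIB:.2f} MiB left over")
```

Either way there's a couple of MiB (or more) left stranded, which is the "doesn't fit neatly" problem.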
If they were going to go for a framebuffer that neatly filled 16MB or 32MB, 1536x888 (with anamorphic scaling to 16:9) would be very close at 12 or 24 bytes per pixel respectively, roughly 15.6MiB and 31.2MiB. Obviously, whether they should fill the entire ESRAM with just the framebuffer at a given time depends on their needs.
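A quick check of how close 1536x888 gets, compared with 1408x792 at the same per-pixel cost. This is only a sketch: `fill_ratio` is a made-up helper, and the budgets are taken as exactly 16 MiB and 32 MiB.

```python
from fractions import Fraction

MIB = 1024 * 1024

def fill_ratio(width, height, bytes_per_pixel, budget_bytes):
    """Fraction of the budget used by a framebuffer of the given layout."""
    return width * height * bytes_per_pixel / budget_bytes

# (bytes per pixel, budget) pairs from the post: 12 B -> 16MB, 24 B -> 32MB.
cases = [(12, 16 * MIB), (24, 32 * MIB)]

for width, height in ((1536, 888), (1408, 792)):
    for bytes_pp, budget in cases:
        ratio = fill_ratio(width, height, bytes_pp, budget)
        print(f"{width}x{height} @ {bytes_pp} B/px: "
              f"{ratio:.1%} of {budget // MIB} MiB")

# 1536x888 is not 16:9 in square pixels, hence the anamorphic scaling.
aspect = Fraction(1536, 888)
print("aspect:", aspect, "vs 16:9 =", Fraction(16, 9))
```

The 1536x888 layout lands within a few percent of either budget, while 1408x792 leaves roughly a fifth of it unused at the same bytes per pixel.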