Why wouldn't it be ideal for mobile? Are there really that many interdependencies between tiles for a rendering engine? Are there any at all? If not, then I don't really see why a cascading fetch with a decently sized SRAM pool couldn't hide the latency.
I'm not sure what pixel formats are used in PVR, but let's say RGBA at 4 bytes/pixel. So 400 bytes for a 100-pixel tile. Let's say we have a 300-cycle wait between issuing a fetch and the first byte coming back from memory (rather high, a few orders of magnitude too high at GPU clocks, but let's say we're talking CPU clocks here).
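So to effectively hide access-to-first-byte latency, assuming you issue a new 400-byte tile fetch every cycle for those 300 cycles, you'd need 400 x 300 = 120,000 bytes in flight, or roughly a 128KB SRAM pool once you round up to a power of two. That doesn't seem too far-fetched.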
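For anyone who wants to plug in their own numbers, here's a rough sketch of that back-of-the-envelope math in Python. The function names and the one-tile-fetch-per-cycle issue rate are my own assumptions for illustration, not anything taken from PVR documentation.

```python
def sram_pool_bytes(tile_pixels=100, bytes_per_pixel=4,
                    latency_cycles=300, tiles_issued_per_cycle=1):
    """Bytes that must be buffered to cover the fetch-to-first-byte latency.

    Assumption (hypothetical): a new tile fetch is issued every cycle, so
    latency_cycles * tiles_issued_per_cycle tiles are in flight at once.
    """
    tile_bytes = tile_pixels * bytes_per_pixel            # 100 * 4 = 400
    tiles_in_flight = latency_cycles * tiles_issued_per_cycle
    return tile_bytes * tiles_in_flight


def next_power_of_two(n):
    """Smallest power of two >= n, since SRAM tends to come in such sizes."""
    return 1 << (n - 1).bit_length()


pool = sram_pool_bytes()
print(pool)                              # 120000
print(next_power_of_two(pool) // 1024)   # 128 (KB)
```

Drop the latency to something more realistic for a GPU, or issue tiles less often than once per cycle, and the pool shrinks proportionally.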