While the eDRAM inclusion could technically limit other things (assuming a total-transistor budget!), I don't believe that including the eDRAM caused any other features to be removed, or shader power to be limited beyond what ATI would have designed regardless.
The choice to go for eDRAM was one that reduced the overall size of the main GPU die. For one, if there were no eDRAM, there'd be no daughter die, and thus the main core would be at least 252-257 million transistors (this assumes the ROPs would function perfectly in that situation). But the inclusion of the eDRAM, imo, is exactly the same kind of win as the new ring bus(/memory controller) in the X1000s (I haven't looked to see if RV515/530 get the same advantage as R520).
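Quick back-of-the-envelope on where that 252-257 million figure comes from (a sketch, taking ~232m for the main die and the 20-25m ROP estimate from further down as givens):

```python
# Rough sketch: folding the daughter die's ROP logic back into the
# main die. Both figures are estimates (main die ~232M transistors,
# ROP logic 20-25M once you leave out the eDRAM array itself).
main_die = 232                  # millions of transistors
rop_low, rop_high = 20, 25      # millions of transistors

print(f"single-die core: {main_die + rop_low}-{main_die + rop_high}M")
# -> single-die core: 252-257M
```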
Looking at this, despite the 7800 having 24 pipes @ 600MHz compared to the X1800XT with 16 pipes @ ~630MHz, the latter still holds a very small lead. And that should be attributable to either the AA or the AF. If it's the AF, then it's probably due simply to the decoupled texture units (in G70, would every 16xAF filtered texture fetch block that pixel pipe's shaders for 16-32 cycles? Or just the first ALU?). But if it's the AA, then it's not because of bandwidth, because the overclocked GTX512 has significantly more BW than the stock Sapphire X1800XT (~20% more!). Which means it's down to the efficiency of the memory accesses between the ROPs and memory. Since the 8 Xenos ROPs sit right on the daughter die, and with a large amount of BW at that, their memory accesses should be extremely efficient, no? Latency would be low or at least extremely predictable (what should the typical latency of the eDRAM be?), etc. So, either way, it was a good decision, IMO. But this is a tangent. What would have happened if the eDRAM didn't exist? For one, what if the memory controller had to increase in size to deal with the read/modify/writes of the ROPs? If it didn't, then the bottleneck of the system might end up being framebuffer ops rather than shading or texturing! So we're looking at 260+ million transistors, and we haven't added any extra shaders! Then, how many million transistors does it cost to add another ALU array? If they're 1.5m per ALU, then we're at 280+ million, and if they're 2.5m each, 300+ million transistors.
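Putting rough numbers on that extra-array cost (a sketch, assuming a fourth 16-ALU array like Xenos's existing three, and taking the 1.5m/2.5m per-ALU figures above as pure guesses):

```python
# Hypothetical cost of a fourth 16-ALU array on a single-die Xenos,
# starting from the ~260M baseline above (main die + ROPs + a
# beefed-up memory controller). Per-ALU costs are guesses.
base = 260           # millions of transistors
alus_per_array = 16  # Xenos's existing arrays are 16 ALUs wide

for cost_per_alu in (1.5, 2.5):   # millions of transistors per ALU
    total = base + alus_per_array * cost_per_alu
    print(f"{cost_per_alu}M/ALU -> ~{total:.0f}M total")
# -> 1.5M/ALU -> ~284M total
# -> 2.5M/ALU -> ~300M total
```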
Then, to deal with the extra array, we'd probably have to make increases elsewhere in the GPU to make sure it's properly fed (even if games do approach a 1:3 filtered tex:shader op ratio! and then push it to 1:4!). On the other hand, we could increase the number of filtered texture fetch units. Of course, then we run into an even larger problem. Now we have, at most, the same RAM on a double-wide bus, probably more than half of which would be consumed by the framebuffer, leaving us with the same bandwidth as now, or less, for 50% or 100% more filtered texture fetch units and the CPU to share. I wouldn't count on it. So now the unit isn't weighted properly for an increasing shader:texture ratio anyway, it's starving for bandwidth, and it's probably in the 300m transistor ballpark... all on one single die. If that wasn't already a yield/heat issue for MS, then I don't think we'd have two dies right now. And doubling the ROPs, I think, is out of the question. 20-25m of the transistors on the daughter die are the 8 ROPs (is there anything else there?), so we'd still have increased the size of the main die by 40-50m transistors, to 270 or 280 million. And what gain would there be? Do we expect to need more than 4Gpixels/s? It handles up to its max of 4xAA w/o a fillrate hit, so... running at 60fps... is 720p going to benefit at all from 8Gpixels/s over increased efficiency, bandwidth, etc.?
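For scale on the 4 vs. 8 Gpixels/s question (assuming Xenos's 500MHz clock and one pixel per ROP per clock; the overdraw figure is purely illustrative):

```python
# How far 4 Gpixels/s goes at 720p/60, assuming a 500MHz core clock
# and 8 ROPs each writing one pixel per clock. Overdraw is
# illustrative only.
rops, clock_hz = 8, 500e6
fillrate = rops * clock_hz          # 4.0 Gpixels/s
needed = 1280 * 720 * 60            # ~55.3 Mpixels/s at 1x overdraw
print(f"headroom at 1x overdraw: ~{fillrate / needed:.0f}x")
# -> headroom at 1x overdraw: ~72x
```

Even with heavy overdraw, 4Gpixels/s looks like plenty for 720p at 60fps, which is why I don't see what doubling it buys.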
So, that leaves cache. But is there some reason ATI would be working on a total-transistor budget, as opposed to a transistors-per-die budget? 332-337m transistors + a 128-bit bus was chosen over fewer transistors (possibly; at least 80 million to spread out between ALUs, ROPs, cache, or other logic) and a 256-bit bus. Cache would probably help, but how much does it already have, and how much does it need? At 6 transistors a bit, 256KB costs a mere ~12m transistors, which I don't think would be that big a deal to add to Xenos, if it was necessary. It might be costly further down the line, when they combine the dies, but for now it almost wouldn't make that much of a difference as far as yields go (of course, concerning all of this, I'm going off extremely limited knowledge). Also, going back to the old Anand article, in the section addressing the "modeling engine," I think there's mention of ATI wanting to use the vertex cache to feed/store results for some heavy math (I believe the claim was related to raytracing/global illumination, working on HOS, and finally physics/GPGPU stuff). Are they going to cut corners on cache if it seems like they're dedicating plenty to this vertex cache?
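The 6T-per-bit math, just to show my work (standard 6T SRAM cell; tag and decoder overhead ignored):

```python
# Transistor cost of 256KB of SRAM at 6 transistors per bit
# (standard 6T cell; tag/decoder overhead not counted).
kb = 256
bits = kb * 1024 * 8        # 2,097,152 bits
transistors = bits * 6      # 12,582,912
print(f"{kb}KB ~= {transistors / 1e6:.1f}M transistors")
# -> 256KB ~= 12.6M transistors
```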
So, at the end of the day, Bill, I don't think all the doom-and-gloomin' is really necessary. If they hadn't had the eDRAM, I honestly don't think that performance would have shot up at all. They'd probably have increased ROP complexity (in addition to moving the ROPs to the main die) and probably the memory controller along with it, and this would have forced a doubling of the bus. That doesn't seem like something MS would see as favorable (especially later down the line?), and it might have prompted cuts elsewhere in the system as a result. And then the system wouldn't have nearly as good framebuffer efficiency/headroom. FP16 HDR would probably be out of the question, along with tons of alpha blending and such. But, eh...
I think that MS gave ATI a pretty clean slate and had them design from the ground up without too many restrictions (mostly along the lines of yield/heat issues and costs later down the line, as with the bus width)... and besides, this chip is probably going to be the biggest help for their R600 later on, in the form of validation for the architecture and a basis for games actually making use of all those features (through 360->PC ports, which MS seems pretty keen on). I think that rules out the cache as a possible shortcoming, given how integral it seems to be to their GPGPU/physics stuff and DX10 stuff. And I think they truly believe that the 3:1 shader:filtered-texture ratio is what games will approach (but not necessarily exceed... by that much) in the next few years. So I kinda doubt that they'd mess with that ratio much.
Hm, I do have a headache, I'm tired, and all of that is based on a mountain of assumptions, "ifs," bad comparisons, and so on and so forth. I'd still stick with shading power not going up, though, especially considering all the gains that would be lost by ditching the eDRAM daughter die. So, let the corrections come.