I'm sure it's technically possible. I'm thinking more about commercial practicality, at least in the next 2 or 3 years or so. It took quite a while to go from 256 to 512 bits, and even that has not proven to be all that necessary. With GDDR5 doubling bandwidth from what it is now, I don't think GPU designers are going to push for wider than 512 bits any time soon.
Overall I agree - but from the point of view of what GDDR5 brings.
It'll be interesting to see if R7xx chips each have a 64-bit or 128-bit memory bus. Chip-to-chip data transfer may turn out to be more important - i.e. the priority may lie in dedicating more die area to moving data between chips. The ring bus effectively posits this, being 2x the bandwidth of the entire VRAM bus.
I think we'll see stripped down stuff for a long time to come. GDDR5@64-bits: 40GB/s? Way faster than anything I've had!
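As a sanity check on that 40GB/s figure, here's the back-of-the-envelope, assuming an effective 5Gbps per pin (the data rate is my assumption - shipping GDDR5 speeds will vary):

```python
# Peak theoretical GDDR5 bandwidth, assuming 5 Gbps effective per pin
# (an assumed data rate -- real parts may clock lower or higher).

def bandwidth_gb_per_s(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8  # 8 bits per byte

for width in (64, 128, 256, 512):
    print(f"{width:4d}-bit bus @ 5 Gbps/pin: {bandwidth_gb_per_s(width, 5.0):6.1f} GB/s")
```

That gives 40GB/s at 64 bits, and 320GB/s if anyone did hang GDDR5 off a full 512-bit bus.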
Yeah, we may also be at 32nm or beyond before we see GDDR5 deployed on $50 discrete graphics cards (if they even exist).
My opinion on this hasn't changed.
8x32-bit memory channels on one side of a crossbar connecting to 4x L2s, 4x ROPs, 4x SIMDs. I guess you can cascade crossbars. Wouldn't that crossbar be quite a hotspot?
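To make the hotspot worry concrete - this is just a toy sketch, with port counts echoing the hypothetical topology above - a full crossbar needs a switch point for every (input, output) pair, so all that wiring and switching concentrates in one spot and grows multiplicatively with port count:

```python
# Illustrative only, not any real GPU's fabric: a full crossbar needs a
# switch point (and wiring) for every (input, output) pair, and all of it
# sits in one place on the die.

def crosspoints(inputs: int, outputs: int) -> int:
    """Number of switch points in a full inputs x outputs crossbar."""
    return inputs * outputs

# 8 memory channels feeding 12 clients (4 L2s + 4 ROPs + 4 SIMDs, as above)
print(crosspoints(8, 12))    # 96
# Double both sides and the fabric quadruples
print(crosspoints(16, 24))   # 384
```

Cascading smaller crossbars spreads that out, at the cost of extra hops.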
Also, even if that's irrelevant, ATI still wants to go to a fully distributed, virtualised memory system - a "ring" looks pretty useful when you're trying to connect 2, 3, 4 or more chips.
The notion that you have to spread stuff around a die for thermal reasons has never registered on my personal radar. Now GPUs have unusually large power consumption, but even then you'd expect their problems to filter through to others as everybody moves up the technology ramp.
I suspect this is where the 6-monthly re-implementation schedule comes in. To keep that schedule they have to cut corners, one of which is using "off the shelf" blocks.
Just google "synopsys hot spots". You'll find a bunch of stuff about lithography hot spots, routing hot spots, and power rail hot spots: all fixable with small localized changes.
I could find one article that's very relevant. But have a look at this: "For one thing, the engineers will consider reducing the impact of hotspots by attaching the die directly to a high thermal-conductivity heat spreader, such as a copper plate." They are talking here about very low cost packages, where heat removal is indeed more of a problem because plastic isn't the best conductor in the world. GPUs have used heat spreaders for years.
I interpret that to mean that GPUs are years ahead in terms of encountering these problems.
Further down the article: "In such cases, it's best to distribute them relatively evenly over the die while still avoiding the corners and/or edges." Maybe the ring bus controllers shouldn't have been (or aren't) located at the edges after all...
Ah, but are the ring bus controllers hotspots? What the ring bus replaces (i.e. some kind of crossbar hierarchy) would have, supposedly, produced a hot-spot constrained GPU.
I still don't see how something pedestrian like a blob with interconnect and a lot of parallel FIFOs (with only one working at a time) is of larger concern wrt thermal behavior than a dense core with hundreds of adders and multipliers that are all glitching like crazy and moving data each and every clock cycle. It just doesn't make any sense.
I'm all out of ideas.
I doubt it. It may be for thermal reasons at the system level, but that's something different entirely.
Well here's some recent stuff:
Semiconductor device with integrated heat spreader
Thermal management device for multiple heat producing devices
ATI seems to spend a fair amount of effort on solving heat problems generally. I think the second of those is for stuff you'd find in a mobile device. Then again, maybe these are just land-grab patent applications.
Unless ATI has found the holy grail of redundant random logic, I think they're very much complementary. Until then, if you have to choose one, block-level redundancy is more efficient.
GPUs are "embarrassingly parallel" - they should be pretty amenable to fine-grained redundancy, especially at the scale we're seeing now. I think ATI's general model of having one die for one or two SKUs per performance category (with 4 or 5 categories altogether) drives them towards fine-grained redundancy. Apart from RV560/570, every other "block level redundant" die in recent years has only existed as some kind of end-of-life part (XL/GTO, X1900GT/1950GT, HD2900GT, that kind of thing).
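Just to put toy numbers on why granularity matters: here's a back-of-the-envelope Poisson-defect yield model (entirely mine - the defect density, area and unit counts below are invented, not anything from ATI). For the same quarter of the area sacrificed, finer-grained disabling salvages far more dice:

```python
# Toy Poisson-defect yield model, purely illustrative -- the defect density
# and area figures are made up, and real redundancy schemes are far subtler.
import math

def block_ok(defect_density, block_area):
    """P(a block of block_area mm^2 is defect-free), simple Poisson model."""
    return math.exp(-defect_density * block_area)

def yield_with_spares(defect_density, total_area, n_blocks, n_disable):
    """P(at most n_disable of n_blocks equal-sized blocks are defective)."""
    p = block_ok(defect_density, total_area / n_blocks)
    return sum(math.comb(n_blocks, k) * (1 - p) ** k * p ** (n_blocks - k)
               for k in range(n_disable + 1))

D, AREA = 0.02, 40.0  # defects/mm^2 and covered area -- invented numbers

print(f"no redundancy:             {block_ok(D, AREA):.3f}")
print(f"block-level, 1 of 4 off:   {yield_with_spares(D, AREA, 4, 1):.3f}")
print(f"fine-grained, 4 of 16 off: {yield_with_spares(D, AREA, 16, 4):.3f}")
```

Of course this ignores the extra muxing and control that finer granularity costs, which is presumably why it's so much harder to pull off in random logic.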
Jawed