R520 to have 300 Million Transistors??

DegustatoR said:
Ailuros said:
I myself wouldn't exclude a possible increase in quads; I just don't have the slightest idea where the bandwidth will come from to feed the resulting fill-rates (more or less 6 GPixels/s trilinear fill-rate).
Almost every shader takes more than one clock today. Tomorrow this trend will be even more visible. So you don't need huge bandwidth, since you'll be doing shader math inside the chip most of the time. And that's what's really important for future titles -- really fast complex shader execution, not just pure single-texturing fill-rate...

If you have, let's say, a 50% increase in single-texturing fill-rate, that percentage won't change with multi-texturing either.

Any trilinear fill-rate inefficiencies will most likely be covered/hidden by tons of filtering-related optimisations, so yes, in that regard my point is rather moot.

For the record, bandwidth won't remain idle, nor will it decrease in future accelerators. Today the NV40 seems to have a very efficient fill-rate-to-bandwidth ratio; R420 to a lesser degree. I wonder whether an X800 PE equipped today with, let's say, ~700MHz GDDR3 wouldn't see some benefit.
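A quick back-of-envelope sketch of the gap being described; all the inputs are illustrative assumptions (32-bit colour plus 32-bit Z read and write per pixel, no texture traffic or compression, and "~700MHz GDDR3" taken as 1.4GHz effective on a 256-bit bus):

```python
# Back-of-envelope sketch of the fill-rate vs. bandwidth gap. All numbers
# are illustrative assumptions, not vendor figures.
fillrate = 6e9                  # pixels/s, the ~6 GPixel/s figure above
bytes_per_pixel = 4 + 4 + 4     # colour write + Z read + Z write (assumed)

needed = fillrate * bytes_per_pixel / 1e9     # GB/s of raw framebuffer traffic
have = 1.4e9 * (256 / 8) / 1e9                # GB/s from 1.4GHz effective GDDR3 on a 256-bit bus

print(f"needed ~{needed:.0f} GB/s (before any texturing)")
print(f"have   ~{have:.1f} GB/s")
```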
 
In certain fill operations R420 is likely to scale better than NV40 with higher bandwidth relative to its core, because of its high blending fill-rate.
 
DaveBaumann said:
In certain fill operations R420 is likely to scale better than NV40 with higher bandwidth relative to its core, because of its high blending fill-rate.

A rough estimate of how much of a performance increase, more or less on average, that could mean?

OT: there's a PM for you about something else, in case you haven't seen it.
 
Sxotty said:
I have a quick question for all you people that actually know something about engineering.

Could/will they ever start making smaller chips, like 8-pipe parts, that they can stack into a socket? Everyone says that the reason they don't make multi-GPU cards is that the PCB gets complex, but would it be possible in the future to make an expandable socket? The way I am thinking is that GPUs are not interchangeable like CPUs, so they could make the socket much smaller and more fragile, so that under the heatsink there would be 1-4 cores all nestled up snug, with the packaging no bigger than the actual cores... The reason of course would be to reduce the percentage of failed chips, and also they wouldn't have to use them for 9500 Pros, X800s, 6800s and so forth, but could instead just meet whatever demand there was...


Two main issues....

The packaging cost would increase in a non-negligible way. Basically you would have to go to a much more advanced substrate design.

You would have to integrate a method for the chips to talk to each other. This will add design complexity and cost.

And you will take a significant hit in yields for the packaged parts. With 4 dies per package, you now have at least a 4x higher chance that the packaged part won't work. For something like a high-end MCM that sells for thousands to tens of thousands of dollars, this isn't an issue, but for a part that has to sell for less than a hundred, it is.

Aaron Spink
speaking for myself inc.
 
aaronspink said:
Two main issues....

The packaging cost would increase in a non-negligible way. Basically you would have to go to a much more advanced substrate design.

You would have to integrate a method for the chips to talk to each other. This will add design complexity and cost.

These I realized (also I thought that the thermal expansion might not be identical so they could crack).

And you will take a significant hit in yields for the packaged parts. With 4 dies per package, you now have at least a 4x higher chance that the packaged part won't work. For something like a high-end MCM that sells for thousands to tens of thousands of dollars, this isn't an issue, but for a part that has to sell for less than a hundred, it is.

This is exactly the opposite of the effect I expected, so now I am confused: can't they test the dies before they put the 4 of them on the card? The whole reason I was suggesting this was to increase yields, because each die would be smaller, and then they could take fully working cores and put them together. In other words, I thought you would have a 4x lower chance of a defective part since each die is 4x smaller. I might be completely off base; that is why I am asking.

I had thought that as die complexity keeps going up, the cost of a more complex socket/package could eventually be outweighed by the benefit of a higher percentage of working dies.
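For what it's worth, here is a minimal sketch of the arithmetic both views rest on, assuming a simple Poisson defect model; the defect density, die areas, and packaging loss are all made-up illustrative numbers:

```python
import math

# Minimal sketch of the yield arithmetic under a simple Poisson defect model,
# yield = exp(-defect_density * die_area). All numbers are made up.
D = 0.5                     # defects per cm^2 (assumed)
A_big = 3.0                 # cm^2, one monolithic die (assumed)
A_small = A_big / 4         # cm^2, one quarter-size die

y_big = math.exp(-D * A_big)          # yield of the big die
y_small = math.exp(-D * A_small)      # yield of one small die

# If the four small dies can NOT be tested before packaging, the module only
# works when all four are good, which is exactly the big-die yield again:
y_module_untested = y_small ** 4      # == y_big

# If they CAN be tested first (known-good die), only assembly losses remain:
p_assembly_ok = 0.98                  # assumed per-die packaging success rate
y_module_kgd = p_assembly_ok ** 4

print(f"monolithic die:             {y_big:.2f}")
print(f"4 untested dies packaged:   {y_module_untested:.2f}")
print(f"4 known-good dies packaged: {y_module_kgd:.2f}")
```

Under this toy model, whether the packaged part ends up worse off or better off comes down almost entirely to whether the dies can be fully tested before they go into the package.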

I am thinking of basically a multi-GPU card, with the GPUs' individual packages shrunk to the point that they all fit under one heatsink and, to the casual user, it would look like one GPU. For example, like this:
[attached image: ex.jpg]
 
Hmmm, interesting idea. This might help out with cooling as well, since the hot spots, so to speak, would be spread out on the heatsink. Maybe a variation where the chips are plugged into a memory controller or something which has an internal cache. Once something becomes so complex, breaking it down into replaceable components may be the better method. I'd hate to see a 300-400 million transistor design fail due to a few flawed transistors in the wrong place. Also, note that ATI and NVIDIA seem to be having a hard time getting their current generation out in numbers. With the same method, how would the next generation fare?
 
Precisely. The number of transistors is getting ridiculous, and the approach ATI started and NVIDIA is following, of locking out broken pipes, will only get you so far.

Plus, look at that picture: doesn't it make you feel warm and fuzzy? 144 pipelines :p

Oh, and if you have seen the new AMD chip that runs at something like 1.6 watts at 1.2GHz, I was thinking along those lines: slower GPUs but more pipes, because it seems that heat does not decrease linearly with speed, i.e. half as fast may be 1/10 the heat. Of course I know it will be expensive; I'm just thinking it may be the way to go in the end.
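That intuition matches the usual first-order model; a small sketch, with the assumption that supply voltage can be scaled down roughly in proportion to frequency doing most of the work (leakage ignored):

```python
# Rough sketch of why power can fall much faster than clock speed, using the
# usual dynamic-power relation P ~ C * V^2 * f. Purely illustrative.
def relative_power(f_ratio, v_ratio=None):
    """Dynamic power relative to baseline for a given clock ratio."""
    if v_ratio is None:
        v_ratio = f_ratio        # crude assumption: voltage tracks frequency
    return (v_ratio ** 2) * f_ratio

print(relative_power(0.5))       # ~0.125: half the clock, roughly 1/8 the power
```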
 
The only problems I can see are packaging costs (which will make high-end products even more expensive) and routing (how are you going to route traces to the interior chip(s)?).
 
Oh, Chalnoth, I agree. I am simply wondering if at some point the increase in packaging cost will be less than the increase in cost of trying to fab a 400-500 million transistor chip. I mean, there has to be a point where it becomes cheaper to try something like this. That is all I am saying. I thought maybe some engineer type here would go, "Yes, we think that when you hit the 473 million transistor mark, pursuing a strategy like this will be more economical." :)
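A toy version of that break-even question, reusing the same simple Poisson yield model as above; every input (defect density, wafer cost and area, the MCM packaging adder) is a guess rather than real fab data:

```python
import math

# Illustrative sketch of the cost comparison: cost per good part is die cost
# divided by yield, plus packaging. Every number here is an assumption.
D = 0.5                      # defects/cm^2 (assumed)
wafer_cost = 5000.0          # dollars per wafer (assumed)
wafer_area = 300.0           # usable cm^2 per wafer (assumed)

def cost_per_good_die(area_cm2):
    dies_per_wafer = wafer_area / area_cm2
    die_yield = math.exp(-D * area_cm2)      # Poisson defect model
    return wafer_cost / (dies_per_wafer * die_yield)

monolithic = cost_per_good_die(3.0)                  # one large die
mcm = 4 * cost_per_good_die(3.0 / 4) + 40.0          # 4 small dies + assumed $40 MCM packaging

print(f"monolithic: ~${monolithic:.0f}   4-die MCM: ~${mcm:.0f}")
```

Where the crossover actually sits depends entirely on the real defect density and packaging adder, which only the fabs and packaging houses know.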
 
Well, the issue is that the costs involved in creating a complex chip go down over time with refined processing. That is, as you get better and better at making such things, you succeed more, and thus it becomes cheaper. By contrast, the cost of a multi-chip board like the above would be more fixed by nature (cost would still go down over time, just not by as much).

I think this is the primary thing that current GPU manufacturers have been banking on. In the short term it may indeed be cheaper to produce a, say, 4-chip NV44 board (assuming NV44 would be, for example, a 4-pipeline NV4x part) than it would be to produce one NV40 board, but as the cost to develop the NV40 chip itself decreases over time through refined engineering, it becomes cheaper to build the single NV40 board. Besides, you don't have to worry about the inefficiencies of distributed processing (it's nontrivial to allow each chip to access external memory optimally).

That said, before long it will be necessary to go for multi-chip boards in order to continue improving performance. We'll just be hitting hard limits in clock speed, transistor density, and heat. But more chips will still increase performance, and can serve to ease those burdens.
 
The routing for a bunch of chips could be done, but you'd need an extremely thick PCB in order to route all the traces. On top of that, noise/interference would be insane trying to accommodate all those traces.


It might be nice, however, if there were an empty socket on the mobo where you would stick just a GPU that you went out and bought. It could have its own little memory slot as well. That way you could upgrade either your GPU or your video RAM separately. It would be a little harder to design for, but more parts could be reused. Signaling would be the only concern that I can think of. On top of that, you'd have clearance for a giant copper block to cool the damn things. Get a mobo that comes with dual DVI outputs etc., and all you'd have to upgrade is the GPU or video RAM. That way, if you wanted, you could have a slow core and a crapload of RAM. Just an idea, however.

Heck, you could even give it a HyperTransport link to one of the CPUs to do some additional processing for it; that would be more than fast enough. And with dual cores coming out, it could be possible to throw the system into a "single CPU" setup and use the second CPU/core for nothing but video processing at certain times. I'm not sure what all it would be good for, but I'm sure there's something the CPU is more efficient at doing than a GPU.
 
Dave,

What are you doing here now? A piece of advice from my field of expertise: turn off the PC, turn on the wife! Honeymoon now!
 
g__day said:
Dave,

What are you doing here now? A piece of advice from my field of expertise: turn off the PC, turn on the wife! Honeymoon now!
What the heck, his comment/post disappeared.

epic
 