Rv770/Rv790/rv740 and beyond - linear frequency scaling FTW? (or why AMD wins and your electric bill loses)
After looking at the recent leaks of 4890, it all seemed to fall in line with what rv770 could do if given the power and a capable process. I broke down the math, and it checks out. If you think this is crazy, I encourage you to read the rationale for why this is important a few paragraphs down, under the "Bye-bye performance-per-watt" header.
Percentage of mhz increase vs. freq/tdp if linear:
100% = 690-700mhz = 110W (4850)
120% = 828-840mhz = 160W (4870)
140% = 966-980mhz = 210W
160% = 1104-1120mhz = 260W
mhz vs % increase/wattage if linear:
850mhz = 121-123% = 164-168W
1000mhz = 143-145% = 217-222W
wattage versus % if linear:
225W = 146% = 1007-1022mhz
300W = 176% = 1214-1232mhz
The first two rows use the tdp of 4850/4870 and their avg max clockspeed at their set voltage; they do fall in line, and are the 'knowns'. 4850 is used as the baseline. Linear scaling also shows how 1000mhz at under 225W (2x6-pin) is possible (which is 'coincidentally' 4890's max oc slider in CCC), and gives a possible tdp @ 850mhz. Rough rule of thumb: take the fractional frequency increase over 700mhz, count 50W for every .2 (20%) of increase, and add the result to 110W. That gives you a TDP estimation.
ex: 1000/700 = 1.428..., so the increase is .428... At .2 = 50W, that's .428/.2 = 2.14 steps. 2.14 x 50 = 107. 107 + 110 = 217W.
I'm way too tired to try to figure out the theoretical algebraic formula.
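For what it's worth, the rule of thumb above collapses to one line: TDP = 110 + 250 x (freq/700 - 1), since 50W per .2 is 250W per 1.0 of increase. Below is a minimal Python sketch of that, plus the inverse (frequency for a given tdp). The function and constant names are made up for illustration, and the whole thing assumes the linear scaling this post speculates about, not measured behavior.

```python
# Linear TDP rule of thumb from the post. All names here are hypothetical.
BASE_FREQ_MHZ = 700.0   # 4850 baseline clock (the post also uses 690)
BASE_TDP_W = 110.0      # 4850 tdp
STEP = 0.2              # every .2 (20%) of extra clock...
WATTS_PER_STEP = 50.0   # ...is assumed to cost 50W


def estimate_tdp(freq_mhz, base_freq=BASE_FREQ_MHZ):
    """Estimated tdp in watts for a clock, assuming linear scaling."""
    increase = freq_mhz / base_freq - 1.0
    return BASE_TDP_W + (increase / STEP) * WATTS_PER_STEP


def estimate_freq(tdp_w, base_freq=BASE_FREQ_MHZ):
    """Inverse of estimate_tdp: clock in mhz that fits a given tdp."""
    steps = (tdp_w - BASE_TDP_W) / WATTS_PER_STEP
    return base_freq * (1.0 + steps * STEP)


for mhz in (1000, 1050):
    print(f"{mhz}mhz -> {estimate_tdp(mhz):.0f}W")
for watts in (225, 300):
    print(f"{watts}W -> {estimate_freq(watts):.0f}mhz")
```

Passing base_freq=690 gives the low end of each range in the tables above; the default of 700 gives the high end.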
IMO, matching a GTX285 at stock would require an rv790 @ ~1100mhz. Such is the world with coincidences: the original GTX280 has a 236W TDP, and a 1050mhz rv790 theoretically would have a tdp of 235W. On the flip side, at GTX285's TDP, you'd end up with an rv790 @ 910mhz, which would compete well with the original 280.
In other words:
GTX280 TDP ~= rv790 tdp @ GTX285 performance
GTX285 TDP ~= rv790 tdp @ GTX280 performance
Is this coincidence a prelude to the future?
Bye-bye performance-per-watt, hello performance-per-mm2 (or How Everything Old is New Again)
This makes sense for several reasons. Obviously, small die sizes help AMD with profit. With such a frequency-scalable architecture, the trickle-down philosophy makes sense: parts with fewer shaders can be clocked higher to compete with the higher-end parts of the previous generation that they replace, because the newer chip is built on a smaller process. And even if it's not...just give it more power. For instance, 800mhz and 950mhz rv740 parts replacing rv770, after the initial launch at lower frequencies to replace rv730. I imagine this leads to smaller architectural changes and greater frequency hikes in the future, even at the high end. Could rv870, for instance, be around 1ghz, and its 32nm counterpart 1200mhz? With as few as 1280 shaders this could be formidable against a 384sp nvidia counterpart on 40nm and 32nm, as smaller die sizes could allow for greater obtainable frequencies in a similar power envelope. If the die is kept small but somewhat similar in size, the option to bump frequencies is always there if competition demands, just like rv790. You have much less performance flexibility with a larger die.
For instance, if nvidia's 40nm performance chip is close to 300mm2, and ATi's is closer to 200mm2, it's not absurd to think that ATi is shooting for a very high clockspeed (1ghz), and nvidia is shooting again for a greater number of units with a lower clockspeed (700c/1750s and/or 800c/2000s). In such a scenario, ATi would likely have a similar performing product using 2/3 the wafer space. Think of it as a massively overclocked 3870 versus a 9800GTX. Rv870 will have a similar die size to rv670. GT212 will have a size similar to G92. 3870 had a TDP of 105W. 9800gtx had a tdp of 156W. What if 3870 had had a second 6-pin connector, a 225W tdp, and a voltage hike? How would they have competed in that case?
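To put a rough number on the "2/3 the wafer space" point, here is a small Python sketch using a common textbook approximation for gross dies per wafer (wafer area over die area, minus an edge-loss term). The 300mm wafer size, the function name, and the use of this particular approximation are assumptions for illustration; it ignores yield, scribe lines, and die aspect ratio, so treat the output as ballpark only.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross (not yielded) die count for a round wafer.

    Hypothetical helper: wafer area / die area, minus a correction for
    partial dies lost around the edge of the wafer.
    """
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


small = gross_dies_per_wafer(200)  # the ~200mm2 ATi chip speculated above
large = gross_dies_per_wafer(300)  # the ~300mm2 nvidia chip speculated above
print(f"200mm2: {small} dies, 300mm2: {large} dies, ratio {small / large:.2f}")
```

Because of the edge-loss term, the smaller die comes out slightly better than the plain 3:2 area ratio would suggest, and defect-limited yield generally favors the smaller die further.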
I think we're about to find out.
I find the die size game between nvidia and ATi amusing; medium, larger, huge, back to , and now smaller with a freq hike. It seems to me that ATi is one step ahead in this game, and it may culminate this next round. I imagine they have some crazy formula that the ghost of Noodle whispered to Dave one night sometime around the time of rv670. Something along the lines of "The chips must be proportional in size...205mm2 (rv870), 137mm2 (rv740), 68mm2 (rv810). The ratio of units must be in proportion to power envelopes for frequencies now and in the future in adjacent performance sectors, build the smallest 256-bit you can, the memory controller on trickle-down 128-bit and 64-bit chips will be compensated by trickling down of faster memory, and in doing this, will allow for higher frequencies to compensate on the high end."...etc.
"Oh yeah, and don't forget to wipe."
That's one smart cat, I tell you.
At any rate, I find it interesting that everything old is new again. Back to smaller chips, only now with a new spin. Now it seems the frequency that can be cranked out from smaller dice actually can compete with the architecture additions that that clockspeed has to be balanced against. That's a big deal, and a large fundamental change. ATi sees this coming...Does Nvidia?
Another question remains: Is the smallest possible 256-bit bus chip that fits the architecture and its fab processes the ideal candidate to test the theory, or is there a better balance of slightly larger dice still able to obtain high speeds in a similar power envelope?
If all inventive competitions are based around building a better mousetrap, does the tdp/architecture balance now heavily tilt towards adding frequency to that equation with a weight much higher than before?
I think so.