RV770/RV790/RV740 and beyond - linear frequency scaling FTW? (or why AMD wins and your electric bill loses)

turtle



After looking at the recent 4890 leaks, it all seemed to fall in line with what RV770 could do if given the power and a capable process. I broke down the math, and it holds up. If you think this is crazy, I encourage you to read the rationale for why this is important a few paragraphs down, under the italic header.

Percentage of MHz increase vs. freq/TDP if linear:

100% = 690-700 MHz = 110 W (4850)
120% = 828-840 MHz = 160 W (4870)
140% = 966-980 MHz = 210 W
160% = 1104-1120 MHz = 260 W

MHz vs. % increase/wattage if linear:

850 MHz = 122-124% = 160-162 W
1000 MHz = 143-145% = 217-222 W

Wattage vs. % if linear:

225 W = 146% = 1007-1022 MHz
300 W = 176% = 1214-1232 MHz

The first and second entries use the TDPs of the 4850/4870 and their average max clockspeeds at stock voltage; they do fall in line, and are 'knowns'. The 4850 is used as the baseline. Linear scaling also shows how 1000 MHz at under 225 W (2x 6-pin) is possible ('coincidentally' also the 4890's max OC slider in CCC), as well as a possible TDP @ 850 MHz. A rough rule of thumb: take the fractional frequency increase over the 700 MHz baseline, count every 0.2 of it as 50 W, and add that to 110 W for a TDP estimate.

ex: 1000/700 = 1.428..., i.e. a 0.428 increase. 0.428 / 0.2 = 2.14. 2.14 x 50 W = 107 W. 107 + 110 = 217 W.

I'm way too tired to try to figure out the theoretical algebraic formula.
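Or, as a quick sketch instead of algebra (a hypothetical little helper, nothing official; it just assumes the 4850's 700 MHz / 110 W as the baseline and 50 W per 20% of clock increase). The loop reproduces the numbers above:

```python
# Linear TDP rule of thumb from above: TDP ~= 110 + (clock/700 - 1) / 0.2 * 50
BASE_MHZ = 700        # HD 4850 core clock used as the baseline
BASE_TDP_W = 110      # HD 4850 TDP in watts
WATTS_PER_20PCT = 50  # assumed: every 20% clock increase adds ~50 W

def estimated_tdp(mhz: float) -> float:
    """Linear TDP estimate for an RV770-class part at a given core clock."""
    increase = mhz / BASE_MHZ - 1  # fractional clock increase over 700 MHz
    return BASE_TDP_W + (increase / 0.2) * WATTS_PER_20PCT

for clk in (700, 840, 980, 1000, 1050, 1120):
    print(f"{clk:4d} MHz -> ~{estimated_tdp(clk):.0f} W")
# 700 -> 110, 840 -> 160, 980 -> 210, 1000 -> 217, 1050 -> 235, 1120 -> 260
```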

IMO, matching a stock GTX285 would require an RV790 @ ~1100 MHz. Such is the world of coincidences: the original GTX280 has a 236 W TDP, and a 1050 MHz RV790 would theoretically have a TDP of 235 W. On the flip side, at the GTX285's TDP you'd end up with an RV790 @ ~910 MHz, which would compete well with the original 280.

In other words:

GTX280 TDP ~= RV790 TDP @ GTX285 performance
GTX285 TDP ~= RV790 TDP @ GTX280 performance


Is this coincidence a prelude to the future?

Bye-bye performance-per-watt, hello performance-per-mm2 (or How Everything Old is New Again)

This makes sense for several reasons. Obviously, small die sizes help AMD's margins. With an architecture this scalable in frequency, the trickle-down philosophy makes sense: parts with fewer shaders can be clocked higher to compensate when they replace former higher-tier parts, whether because the newer chip is built on a smaller process or, even if it's not, simply by giving it more power. For instance, 800 MHz and 950 MHz RV740 parts could replace RV770 after an initial launch at lower frequencies to replace RV730. I imagine this leads to smaller architectural changes and greater frequency hikes in the future, even at the high end. Could RV870, for instance, be around 1 GHz, and its 32nm counterpart 1200 MHz? With as few as 1280 shaders this could be formidable against a 384sp nvidia counterpart on 40nm and 32nm, since smaller die sizes could allow for greater obtainable frequencies in a similar power envelope. If the die is kept small but somewhat similar in size, the option to bump frequencies is always there if competition demands it, just like RV790. You have much less performance flexibility with a larger die.

For instance, if nvidia's 40nm performance chip is close to 300mm2 and ATi's is closer to 200mm2, it's not absurd to think that ATi is shooting for a very high clockspeed (1 GHz) while nvidia is again shooting for a greater number of units at a lower clockspeed (700c/1750s and/or 800c/2000s). In such a scenario, ATi would likely have a similarly performing product using 2/3 the wafer space. Think of it as a massively overclocked 3870 versus a 9800GTX: RV870 will have a similar die size to RV670, and GT212 will have a size similar to G92. The 3870 had a TDP of 105 W; the 9800GTX had a TDP of 156 W. What if the 3870 had had a second 6-pin connector, a 225 W TDP, and a voltage hike? How would they have competed in that case?

I think we're about to find out.
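To put a rough number on "2/3 the wafer space": a back-of-the-envelope sketch using the standard gross-dies-per-wafer approximation, assuming 300 mm wafers and the speculative ~300 mm2 / ~200 mm2 die sizes above (nothing official):

```python
import math

WAFER_DIAMETER_MM = 300  # assumed 300 mm wafers

def gross_dies_per_wafer(die_area_mm2: float) -> int:
    """Standard approximation; ignores yield, scribe lines and reticle limits."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

for area in (300, 200):  # hypothetical nvidia vs. ATi performance dies
    print(f"{area} mm2 -> ~{gross_dies_per_wafer(area)} candidate dies per wafer")
# ~197 vs. ~306 candidates: roughly 1.5x the dies for a similarly performing part.
```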

I find the die size game between nvidia and ATi amusing: medium, larger, huge, back down again, and now smaller with a freq hike. It seems to me that ATi is one step ahead in this game, and it may culminate this next round. I imagine they have some crazy formula that the ghost of Noodle whispered to Dave one night sometime around the time of RV670. Something along the lines of: "The chips must be proportional in size... 205mm2 (RV870), 137mm2 (RV740), 68mm2 (RV810). The ratio of units must be in proportion to power envelopes for frequencies now and in the future in adjacent performance sectors. Build the smallest 256-bit chip you can; the memory controller on trickle-down 128-bit and 64-bit chips will be compensated by trickled-down faster memory, and doing this will allow for higher frequencies to compensate on the high end."...etc.

"Oh yeah, and don't forget to wipe."

That's one smart cat, I tell you.

At any rate, I find it interesting that everything old is new again. Back to smaller chips, only now with a new spin: the frequency that can be cranked out of smaller dice can actually compete with the architectural additions that clockspeed has to be balanced against. That's a big deal, and a large, fundamental change. ATi sees this coming... does Nvidia?

Another question remains: is the smallest possible 256-bit-bus chip that fits the architecture and its fab processes the ideal candidate to test the theory, or is there a better balance with slightly larger dice still able to reach high speeds in a similar power envelope?

If all inventive competition is about building a better mousetrap, does the TDP/architecture balance now tilt heavily towards adding frequency to that equation, with a much higher weight than before?

I think so.
 
To put this into perspective, I'd love to get the IHVs' stance on how content they were/are with recent processes' results.

If both say (and honestly mean it!) that everything went as smooth as silk, then you'd definitely have a point.
 
Turtle, you didn't consider the RAM?

It would appear to make the 4870's case even better, but it screws around with the percentages.
 
With only two data points, both subject to noise (and over a narrow interval), anything will look linear.
For a better investigation, simply take a 4870 and downclock and upclock it in fairly close steps. A plug-level power meter is sufficient to read power at the different frequencies. Subtract the baseline for clarity of curve shape.
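Something like this would do it (a sketch only: the clock/power pairs below are made-up placeholders for whatever the plug meter reads, and the fit is just numpy's polyfit):

```python
import numpy as np

# Made-up example readings (watts at the wall) for a 4870 at several core clocks.
clocks_mhz = np.array([500.0, 600.0, 650.0, 700.0, 750.0, 800.0])
system_w   = np.array([232.0, 245.0, 252.0, 260.0, 269.0, 279.0])
baseline_w = 215.0  # e.g. system power at idle/lowest clock, to be subtracted

card_delta_w = system_w - baseline_w  # subtract the baseline for clarity of curve shape

# Fit linear vs. cubic and compare residuals: the better fit tells you the curve shape.
for degree in (1, 3):
    coeffs = np.polyfit(clocks_mhz, card_delta_w, degree)
    residuals = card_delta_w - np.polyval(coeffs, clocks_mhz)
    print(f"degree {degree} fit: max residual ~{np.abs(residuals).max():.1f} W")
```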

This is a reasonable methodology BUT it misses the most important point - voltage.
If you downclock, you can get away with lower voltages and enjoy drastically lower power consumption. This is a common path among graphics enthusiasts who also value power conservation and/or silence. If you increase clocks, on the other hand, you will need to increase voltage (and cooling) to maintain stability, drastically increasing power draw.

Over the years and under competitive pressure, the PC semiconductor industry has pushed its parts ever higher on the power curve, and closer to the limits of stable operation. The data I've seen show power vs. clock for most parts to be roughly O(n^3) once the voltage effects above are included.
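The back-of-the-envelope reason, for what it's worth (standard CMOS dynamic-power hand-waving on my part, not measurements): P scales roughly with C*V^2*f, and if voltage has to rise more or less in step with frequency to stay stable, you land near f^3:

```python
# Dynamic power ~ C * V^2 * f.  With V held constant, P grows linearly with f;
# if stable operation needs V to scale roughly with f, P grows roughly with f^3.
def relative_power(f_ratio: float, voltage_tracks_clock: bool) -> float:
    v_ratio = f_ratio if voltage_tracks_clock else 1.0
    return (v_ratio ** 2) * f_ratio

for f_ratio in (1.0, 1.2, 1.43):  # roughly the 700 / 840 / 1000 MHz steps above
    lin = relative_power(f_ratio, voltage_tracks_clock=False)
    cub = relative_power(f_ratio, voltage_tracks_clock=True)
    print(f"+{(f_ratio - 1) * 100:3.0f}% clock: ~{lin:.2f}x power (flat V), ~{cub:.2f}x (V tracks clock)")
```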
 
Also, with HD4890X2 unlikely to be "official" (HD4870X2 is pretty much at the limit), it seems AMD's garden is not particularly rosy.

Jawed
 
Also, with HD4890X2 unlikely to be "official" (HD4870X2 is pretty much at the limit), it seems AMD's garden is not particularly rosy.

Jawed

In general, all the vendors are at the limit. Everyone has pretty much reached peak power in each of the graphics market segments.

You can look at it thus:

Max perf without external power connectors
Max perf with 1 power connector
Max perf with 2 power connectors

Both ATI and Nvidia have hit those points, and in reality have actually exceeded them, though not on purpose it seems.

So going forward, unlike in the past, neither has the easy solution of just throwing more watts at the problem. In general this will probably result in greater time between releases (which some will argue has already happened to some extent and will only get longer) and/or slower overall increases in performance. One can in some regard liken it to the slowdown in performance that affected CPUs once they hit the 100+ W point.
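For reference, a quick sketch of the board-power ceilings behind those three tiers, assuming the usual PCIe budgets of 75 W from the slot, 75 W per 6-pin and 150 W per 8-pin connector:

```python
# Assumed PCIe power budgets: 75 W from the slot, 75 W per 6-pin, 150 W per 8-pin.
SLOT_W, SIX_PIN_W, EIGHT_PIN_W = 75, 75, 150

def board_power_limit(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Maximum board power for a given connector configuration."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W

print(board_power_limit())                        # no connector:   75 W
print(board_power_limit(six_pin=1))               # one 6-pin:     150 W
print(board_power_limit(six_pin=2))               # two 6-pin:     225 W
print(board_power_limit(six_pin=1, eight_pin=1))  # 6-pin + 8-pin: 300 W
```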
 
It's going to be interesting to see how 40nm changes the landscape. Nvidia didn't gain much in the way of power savings with the transition from 65-55nm. If the transition to 40nm doesn't save much power either, we're in for a pretty boring DX11 generation from a performance standpoint.
 
Nvidia didn't gain much in the way of power savings with the transition from 65-55nm.
Hmm...

GTX 280 -> 285: 50 W lower TDP with ~10% higher clocks.

On G92, too, 55nm (and time) brought some nice improvements:
http://www.xbitlabs.com/articles/video/display/palit-gf250gts_6.html#sect0
...the GTS 250 at 738/1836MHz (128SPs) has similar consumption to the 65nm 8800 GT at 600/1500MHz (112SPs):
http://www.xbitlabs.com/articles/video/display/gainward-bliss-8800gt_5.html#sect0
 
It's going to be interesting to see how 40nm changes the landscape. Nvidia didn't gain much in the way of power savings with the transition from 65-55nm. If the transition to 40nm doesn't save much power either, we're in for a pretty boring DX11 generation from a performance standpoint.

G80 turned out over 2.5x faster than G71 without increasing power consumption by that margin, while being on the same 90nm manufacturing process. Whatever objections you might have to that example, you may want to consider that GT3x0 has been laid out for 40nm from the get-go and may include architectural changes that would help control power consumption better.

I don't think either of the two IHVs would rely exclusively on 40nm to save their day in that department, especially since the performance-increase targets with each new technology generation are bigger than with anything else.
 
The 55nm GTX 260 did use the same amount of power as the 65nm version, but that's because the G200-B2 revision was used (it's always a B3 revision on the 285 and 295).
 
The 55nm GTX 260 did use the same amount of power as the 65nm version, but that's because the G200-B2 revision was used (it's always a B3 revision on the 285 and 295).

Wrong:
http://www.xbitlabs.com/articles/video/display/evga-geforce-gtx260-216-55nm_5.html#sect0
...Xbit Labs also saw some nice improvements on a 55nm B2 EVGA GTX 260.

I think the problem was more that some sites compared 1.12V 55nm cards with the 1.06V 65nm GTX 260 (which entered the market in late '08), so they got equal numbers.

B3's strengths are higher clocks (GTX 285, B3 GTX 260 OC results) and the possibility of powering the GTX 295 dies @ only 1.05V.
 
So going forward, unlike in the past, neither has the easy solution of just throwing more watts at the problem. In general this will probably result in greater time between releases (which some will argue has already happened to some extent and will only get longer) and/or slower overall increases in performance. One can in some regard liken it to the slowdown in performance that affected CPUs once they hit the 100+ W point.
It'll be interesting to see if there's a decent bump in performance/W from Global Foundries - something that won't become relevant for a while though.

Jawed
 
I think the problem was more that some sites compared 1.12V 55nm cards with the 1.06V 65nm GTX 260 (which entered the market in late '08), so they got equal numbers.
Seconded. There was large variation among the samples, so one would have been better off taking a look at the supplied voltages.
 