AMD RV770 refresh -> RV790


The recent history of graphics processing units offers no example of a company introducing a brand-new GPU with a new internal design that is only 20%-30% faster than its predecessor in the same price range. Therefore, the information about an increased number of stream processors inside RV790 is not correct, according to sources with knowledge of the situation.
RV740 seems to be this.

Jawed
 
Bla, bla, bla. Somebody thinks.....:LOL::LOL::LOL:

I can't and I don't want to believe it..... This story is suspicious. ATi doesn't need so much time (10 months) just to increase the frequencies of its chips.
 
Haha, everyone is as confused as they were at first with RV770. If RV790 is just an overclocked RV770, why would they name it RV790 when it's still an RV770? Think about that.
 
I wonder how in the world a small 10-15% core overclock could lead to +20-30% performance if the SP count remains the same... :smile:
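To put numbers on that: with the SP count fixed, peak ALU throughput scales linearly with core clock, so a 10-15% overclock buys at most 10-15% more compute. A minimal Python sketch, assuming the commonly quoted HD 4870 figures (800 SPs at 750 MHz, one multiply-add = 2 FLOPs per SP per clock); `peak_gflops` is a hypothetical helper:

```python
def peak_gflops(stream_processors: int, core_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak GFLOPS = SPs x clock (MHz) x FLOPs per SP per clock / 1000.

    Assumes every SP issues one multiply-add (2 FLOPs) per clock.
    """
    return stream_processors * core_mhz * flops_per_clock / 1000.0

BASE_SPS, BASE_MHZ = 800, 750  # assumed HD 4870 reference specs
base = peak_gflops(BASE_SPS, BASE_MHZ)  # 1200 GFLOPS

for oc in (0.10, 0.15):  # the 10-15% core overclock range from the post
    faster = peak_gflops(BASE_SPS, BASE_MHZ * (1 + oc))
    print(f"{oc:.0%} core OC: {faster:.0f} GFLOPS, +{faster / base - 1:.0%} peak ALU")
```

Getting past that ceiling on compute-bound work needs either more SPs or gains somewhere other than the ALUs.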

IMHO:

1) +20-30% performance? Then more SPs (and maybe a different production process).

2) Very little performance increase? Then that article could be right.

It's not impossible if memory clocks are increased more than the ALU clocks. RV770 is quite unbalanced in its peak ALU/memory bandwidth ratio: it's 33 for RV770 but only 18 for GT200. Though this ratio is expected to increase over time, RV770 was a massive jump in it.

They would be barking mad to tape out a new chip on the same process with 20% more ALUs and also introduce a brand-new DX11 architecture on a new process for their mainline chips.
 
It's not impossible if memory clocks are increased more than the ALU clocks.
RV770 in HD4870 already has excessive bandwidth. It has 80% more bandwidth than HD4850, yet at best offers around a 50% performance gain with 8xMSAA. Typically it's ~25-35% faster (i.e. with 4xMSAA, as this is what most tests evaluate).
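The ratios behind that can be checked with simple arithmetic. This sketch assumes the commonly quoted reference-board specs (HD 4870: 750 MHz core, 256-bit GDDR5 at 3600 MT/s; HD 4850: 625 MHz core, 256-bit GDDR3 at 1986 MT/s); the helper name is hypothetical:

```python
def bandwidth_gbs(bus_bits: int, data_rate_mtps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x effective transfers per second."""
    return bus_bits / 8 * data_rate_mtps / 1000.0

hd4870_bw = bandwidth_gbs(256, 3600)  # assumed GDDR5 spec, ~115.2 GB/s
hd4850_bw = bandwidth_gbs(256, 1986)  # assumed GDDR3 spec, ~63.6 GB/s

bw_gain = hd4870_bw / hd4850_bw - 1   # ~0.81, i.e. ~80% more bandwidth
core_gain = 750 / 625 - 1             # 0.20, i.e. 20% more core clock

print(f"bandwidth: +{bw_gain:.0%}, core clock: +{core_gain:.0%}")
```

With roughly 80% more bandwidth but only 20% more core clock, a typical 25-35% gain sits much closer to the clock ratio than the bandwidth ratio, which is consistent with bandwidth not being the main limiter in most tests.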

RV770 is quite unbalanced in its peak ALU/memory bandwidth ratio: it's 33 for RV770 but only 18 for GT200. Though this ratio is expected to increase over time, RV770 was a massive jump in it.
ALUs use zero bandwidth. You want to think in terms of texel rates or fillrates.

They would be barking mad to tape out a new chip on the same process with 20% more ALUs and also introduce a brand-new DX11 architecture on a new process for their mainline chips.
My signature's staying unchanged for the time being. Though I suppose it'd be clearer if I actually mentioned 40nm :LOL:

Jawed
 
Haha, everyone is as confused as they were at first with RV770. If RV790 is just an overclocked RV770, why would they name it RV790 when it's still an RV770? Think about that.
Thought about it for two secs: R420 -> R481 would be a precedent. :) No other conclusion on this matter though.
 
RV770 in HD4870 already has excessive bandwidth. It has 80% more bandwidth than HD4850, yet at best offers around a 50% performance gain with 8xMSAA. Typically it's ~25-35% faster (i.e. with 4xMSAA, as this is what most tests evaluate).

In which case NV overshot with GT200: it has ~half the compute throughput but ~25% more memory bandwidth. Then we can be sure that the next chip, GT212 (if it hasn't been cancelled), will see a massive/enormous/huge jump in compute. ~2.5x like RV770, anyone? :devilish:

The AA stats could be because of a migration from memory bound to compute bound while shading pixels, i.e. it is memory bound at 4xMSAA but compute bound at 8xMSAA. It takes only a small push to go over the hill if you are near the threshold. Perhaps somebody could remind me of the test results for 16xAA so I can check my theory for sanity.
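The "near the threshold" idea can be sketched with a roofline-style model (a standard framing swapped in here, not something the post names): attainable throughput is the lower of the peak ALU rate and bandwidth × arithmetic intensity. The HD 4870-like numbers and the two intensities are illustrative assumptions, not measurements:

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

def limiter(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> str:
    """Which roof applies; the ridge is the intensity where the two roofs meet."""
    ridge = peak_gflops / bandwidth_gbs
    return "memory" if flops_per_byte < ridge else "compute"

PEAK, BW = 1200.0, 115.2  # assumed HD 4870-like figures; ridge ~10.4 FLOPs/byte

# Two hypothetical workloads just either side of the ridge point.
for intensity in (9.0, 11.0):
    print(intensity, limiter(PEAK, BW, intensity), round(attainable_gflops(PEAK, BW, intensity), 1))
```

A workload sitting just below the ridge only needs a small change in its FLOPs-per-byte ratio to flip from memory bound to compute bound, which is the "small push" described above.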

ALUs use zero bandwidth. You want to think in terms of texel rates or fillrates.

I know. :eek: I was referring to peak ALU throughput, or peak compute bandwidth. Somewhat unusual term, I admit.
 
Haha, everyone is as confused as they were at first with RV770. If RV790 is just an overclocked RV770, why would they name it RV790 when it's still an RV770? Think about that.

I totally agree with you. ;)

It's not impossible if memory clocks are increased more than the ALU clocks. RV770 is quite unbalanced in its peak ALU/memory bandwidth ratio: it's 33 for RV770 but only 18 for GT200. Though this ratio is expected to increase over time, RV770 was a massive jump in it.

They would be barking mad to tape out a new chip on the same process with 20% more ALUs and also introduce a brand-new DX11 architecture on a new process for their mainline chips.

In which case NV overshot with GT200: it has ~half the compute throughput but ~25% more memory bandwidth. Then we can be sure that the next chip, GT212 (if it hasn't been cancelled), will see a massive/enormous/huge jump in compute. ~2.5x like RV770, anyone? :devilish:

The AA stats could be because of a migration from memory bound to compute bound while shading pixels, i.e. it is memory bound at 4xMSAA but compute bound at 8xMSAA. It takes only a small push to go over the hill if you are near the threshold. Perhaps somebody could remind me of the test results for 16xAA so I can check my theory for sanity.

The raw bandwidth data says very little about bandwidth usage... ;)
RV770 has been shown to need much less bandwidth than GT200. This may also be due to RV770's much higher internal bandwidth compared to GT200, which leaves it less bottlenecked.
If a chip is not bandwidth limited, increasing bandwidth won't result in a big performance increase. Thus it is almost impossible to squeeze +20% or even +30% more performance from RV770 on air with only an 850MHz core... ;)
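The argument can be sketched with an Amdahl's-law-style model (an assumption: frame time splits cleanly into a core-limited part and a bandwidth-limited part). Taking the core from 750 MHz to the 850 MHz mentioned above, plus a hypothetical 20% memory clock bump, the overall gain stays between those two factors whatever the split:

```python
def overall_speedup(core_speedup: float, mem_speedup: float, mem_bound_fraction: float) -> float:
    """Amdahl-style blend: each part of the frame time scales with its own limiter."""
    if not 0.0 <= mem_bound_fraction <= 1.0:
        raise ValueError("mem_bound_fraction must be in [0, 1]")
    core_part = (1.0 - mem_bound_fraction) / core_speedup
    mem_part = mem_bound_fraction / mem_speedup
    return 1.0 / (core_part + mem_part)

CORE = 850 / 750  # ~1.133, core clock ratio from the post (750 MHz HD 4870 baseline assumed)
MEM = 1.20        # hypothetical memory clock increase

# Fraction of frame time that is bandwidth bound (hypothetical values).
for f in (0.0, 0.25, 0.5, 1.0):
    print(f"{f:.2f} memory-bound -> +{overall_speedup(CORE, MEM, f) - 1:.1%}")
```

Under these assumptions even a fully bandwidth-bound frame tops out at +20%, and a chip that is mostly core bound stays near +13%; +30% would need more than clocks.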
 
The AA stats could be because of a migration from memory bound to compute bound while shading pixels, i.e. it is memory bound at 4xMSAA but compute bound at 8xMSAA. It takes only a small push to go over the hill if you are near the threshold. Perhaps somebody could remind me of the test results for 16xAA so I can check my theory for sanity.
Increasing the MSAA level increases the RBE workload, since per shaded fragment the RBEs:
  • without MSAA have to do a single Z test/update and write a single colour (if Z test allows)
  • with 4xMSAA have to test/update 4 samples' Z and write corresponding samples' colour
  • with 8xMSAA have to test/update 8 samples' Z and write corresponding samples' colour
Jawed
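That per-fragment list can be tabulated directly. A minimal sketch, assuming a fully covered fragment; the shader runs once per fragment and only the per-sample Z and colour work in the RBEs scales with the MSAA level. `FragmentWork` and `rbe_work` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentWork:
    shader_runs: int    # pixel shader invocations for this fragment
    z_tests: int        # per-sample Z test/update operations in the RBEs
    colour_writes: int  # per-sample colour writes in the RBEs

def rbe_work(msaa_samples: int, samples_passing_z: int) -> FragmentWork:
    """Work for one fully covered fragment (assumption: every sample is covered)."""
    if msaa_samples < 1:
        raise ValueError("msaa_samples must be >= 1")
    if not 0 <= samples_passing_z <= msaa_samples:
        raise ValueError("samples_passing_z must be within [0, msaa_samples]")
    # Colour is written only for samples the Z test allows.
    return FragmentWork(shader_runs=1, z_tests=msaa_samples, colour_writes=samples_passing_z)

for n in (1, 4, 8):  # no MSAA, 4xMSAA, 8xMSAA
    print(n, rbe_work(n, n))
```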
 
Yeah, but it needs to run the pixel shader 1, 4 and 8 times respectively, doesn't it? So all the pixel shading math increases by that factor. Or is it the case that the pixel shader runs only once per fragment and only these tests are done for all the sub-pixels? :???:
 
Or is it the case that the pixel shader runs only once per fragment and only these tests are done for all the sub-pixels? :???:
Yep, that's it, and the sub-pixels are called samples. MSAA (multi-sample anti-aliasing) only operates at the edges formed by triangle rendering (either the edges of triangles or where triangles overlap).

Jawed
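To see why only edges are affected: the rasteriser computes a per-pixel coverage mask over the sample positions, the shader runs once, and its colour goes to every covered sample. Interior pixels are fully covered and resolve to the shaded colour; only pixels straddling an edge end up mixing colours. A minimal sketch using edge functions, assuming a made-up 4-sample pattern (not ATI's actual one) and ignoring tie-breaking fill rules:

```python
# Hypothetical 4x sample pattern: offsets inside a pixel, in [0, 1).
SAMPLES_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def coverage_mask(tri, pixel_x, pixel_y, samples=SAMPLES_4X):
    """Bit i is set if sample i of the pixel lies inside the triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    mask = 0
    for i, (sx, sy) in enumerate(samples):
        px, py = pixel_x + sx, pixel_y + sy
        w0 = edge(bx, by, cx, cy, px, py)
        w1 = edge(cx, cy, ax, ay, px, py)
        w2 = edge(ax, ay, bx, by, px, py)
        # Inside if all three edge tests agree in sign (either winding order).
        if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
            mask |= 1 << i
    return mask

TRI = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
for pixel in [(1, 1), (3, 4), (6, 6)]:  # interior, edge, outside
    mask = coverage_mask(TRI, *pixel)
    covered = bin(mask).count("1")
    # The shader runs once if any sample is covered; its colour fills only covered samples.
    print(pixel, f"mask={mask:04b}", "shader runs:", 1 if covered else 0, "covered:", covered)
```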
 
Thought about it for two secs: R420 -> R481 would be a precedent. :) No other conclusion on this matter though.
The R420->R480 transition was aimed mainly at yields. There were some issues with the low-k process (wafers were extremely fragile or something like that).

RV790 can target higher clock speeds and possibly lower idle power consumption. That could be enough to make HD48xx more interesting than the GTX260-216. I think these changes would be sufficient to make the chip competitive, but I wouldn't be surprised by 2 more SIMDs. I don't expect any more changes; ATi doesn't need them (at least for now).
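For scale: RV770 is usually described as 10 SIMD cores, each with 16 five-wide VLIW units, giving 800 SPs. Assuming that layout carries over, two more SIMDs would be exactly the +20% ALU increase discussed earlier in the thread. A sketch with hypothetical names:

```python
LANES_PER_SIMD = 16  # VLIW units per SIMD core (RV770 layout, assumed unchanged)
ALUS_PER_LANE = 5    # 5-wide VLIW: each unit counts as 5 stream processors

def stream_processors(simd_cores: int) -> int:
    """Total SPs for a given number of RV770-style SIMD cores."""
    return simd_cores * LANES_PER_SIMD * ALUS_PER_LANE

rv770 = stream_processors(10)
rv770_plus_two = stream_processors(12)  # hypothetical RV790 with 2 extra SIMDs
print(rv770, rv770_plus_two, f"+{rv770_plus_two / rv770 - 1:.0%}")  # 800 960 +20%
```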
 
RV790 can target higher clock speeds and possibly lower idle power consumption. That could be enough to make HD48xx more interesting than the GTX260-216. I think these changes would be sufficient to make the chip competitive, but I wouldn't be surprised by 2 more SIMDs. I don't expect any more changes; ATi doesn't need them (at least for now).

If they'd only tackle that one successfully, it'd make the HD 4950 much more interesting than a clock/perf increase alone would, since IMO it performs quite well for its price point.


The R420->R480 transition was aimed mainly at yields. There were some issues with the low-k process (wafers were extremely fragile or something like that).
And yet, it stayed the same chip on the same process node. Nothing bad about it though.
 
Yep, that's it, and the sub-pixels are called samples. MSAA (multi-sample anti-aliasing) only operates at the edges formed by triangle rendering (either the edges of triangles or where triangles overlap).

Jawed

Thanks for explaining that.

Considering that most people are speculating a modest speed increase for RV790, what do you think the chances are that RV790 is RV770 with Sideport done right? :LOL:
 