AMD: R7xx Speculation

This could be many different things, none of which necessarily has anything to do with the rendering algorithm. Imo AFR will stay the main rendering mode for all multi-GPU cards for the near future.

Right. As it says there, GT200 isn't a proper codename, so you might wanna stop using it altogether -)

This is the board that has long been rumoured to be "almost 1 TFLOP" (and was supposed to launch in November).
Er, no.
As I see it, GT200 is a G100 chip which is at least two times faster than 1TF.
The 1TF board could be some cancelled chip like a 192 SP G90, or it could be the upcoming G92GX2, but I doubt it's a G100-based board. The way I see it, G100 was never their fall '07 product.
 
Well, GT200 Ultra (2 TFLOP) will most likely be a dual-chip solution just like R780, unless Nvidia either magically goes to 45nm or tries to fab a chip the size of a quarter. So the match will be more like CF vs. SLI, right? ;)

Where does the idea come from that their next-generation chip will be dual-chip anyway? Vincent is willing to bet in some of his recent posts that it might be some sort of Pentium D-style dual core on a single die; while it's an interesting idea, he's the only one from whom I've read anything relevant so far, and that kind of config is a couple of miles away from being "dual-chip".
 
This could be many different things, none of which necessarily has anything to do with the rendering algorithm.
So, what things could it be?

How is it AFR-based and not "CrossFire on a card"?

Right. As it says there, GT200 isn't a proper codename, so you might wanna stop using it altogether -)
When I've got a good reason to use something else, I will.

Er, no.
As I see it, GT200 is a G100 chip which is at least two times faster than 1TF.
The 1TF board could be some cancelled chip like a 192 SP G90, or it could be the upcoming G92GX2, but I doubt it's a G100-based board. The way I see it, G100 was never their fall '07 product.
The nearly 1TFLOP board was clearly set up to be released in November, because GPGPU users were primed with this information by NVidia (since changed to 2008H1). This is pretty much incontrovertible because of double-precision: no G92-based GPU is double-precision.

Jawed
 
Where does the idea come from that their next-generation chip will be dual-chip anyway? Vincent is willing to bet in some of his recent posts that it might be some sort of Pentium D-style dual core on a single die; while it's an interesting idea, he's the only one from whom I've read anything relevant so far, and that kind of config is a couple of miles away from being "dual-chip".

Like I said, to make a single 2 TFLOP chip Nvidia has to either use 45nm or try to make a quarter-sized chip, neither of which is realistic, which leads me to believe anything capable of 2T in the coming generation will be a dual-chip solution.
 
Fablemark is a stencil-fillrate-limited synthetic benchmark and tries to mimic the ancient Doom3 engine in a way:

http://www.computerbase.de/artikel/...0_rv670/10/#abschnitt_theoretische_benchmarks

While such a case isn't all that relevant anymore for today's scenarios, it does expose at least one weakness, and it's not at all filtering-related.
This is true; the results of all the R6xx cards are a complete disaster with 4xAA (pretty much approaching what you'd expect with supersampling...). No idea what happens here; maybe some weakness with stencil buffer + AA (though note G80/G92 take a very big hit with AA too, but it's not that drastic).
I wouldn't be in the least surprised if the future RV770 is stronger in terms of both Z/pixel and texel fillrates compared to its predecessor.
Texel fill certainly; I dunno about the ROPs. I guess it would make sense given the other specifications (that is, twice the texture units, 1.5 times the shader units, twice the memory bandwidth). But maybe the ROPs are just slightly tweaked (something like more Z compares per clock), and I wouldn't be surprised if they are unchanged either.
 

Dead horse ain't dead enough yet, eh?
I'll forgive you for mistakenly using the link with different resolutions, but more to the point, this too only shows the combination of enabling AA & AF simultaneously. Everybody just assumes the hit is from AA, when it's likely a large part is due to AF.
I've been singing that song for some time now, but I'll summarize the numbers from the AA & AF scaling link I gave above:
The performance hit from enabling 4xAA on a HD3870 is less than enabling 4xAA on an 8800GT in all 3 tested games (Anno 1701, Fear, Oblivion) (sorry, there are no R580 numbers in there, but so far I haven't heard any complaints that G92 AA is broken...). This doesn't mean there aren't games out there where it's worse (in fact that's probably somewhat likely), but if you want to prove that "AA is broken" (or slow, if you prefer) then you'd better come up with benchmarks which enable AA & AF separately!
The performance hit from enabling 16xAF on a HD3870 is more difficult to evaluate vs. the 8800GT; it depends on the settings used. Suffice it to say that HQ AF definitely performs much worse than the comparable G92 mode (why do you think the G92 has an advantage with "only" 4 times more filtering units...), whereas "normal" AF sometimes (but certainly not always, just look at Stalker) has a similar (or even smaller) performance drop than on G92. But needless to say, the performance drop from enabling AF can be large.
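To make the "separate AA from AF" point concrete, here's a minimal sketch of the bookkeeping; all the fps figures are invented for illustration, not measurements from any of the cards discussed:

```python
# Why combined AA+AF benchmarks mislead: compute the hit of each feature
# relative to a no-AA/no-AF baseline. All fps numbers below are made up.

def hit(base_fps, fps):
    """Performance drop in percent versus the baseline."""
    return 100.0 * (1.0 - fps / base_fps)

base, aa_only, af_only, aa_plus_af = 60.0, 48.0, 42.0, 35.0  # hypothetical

print(f"4xAA alone:    {hit(base, aa_only):.0f}% hit")     # 20%
print(f"16xAF alone:   {hit(base, af_only):.0f}% hit")     # 30%
print(f"4xAA + 16xAF:  {hit(base, aa_plus_af):.0f}% hit")  # ~42%
```

A combined-only benchmark reports just the last number, and readers habitually attribute all of it to AA, even though in this made-up example AF is the bigger contributor.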
 
Thinking back to this picture:

[Image: 20080108a828732bfda38deri4.jpg]


and OpenGL Guy's thoughts:

http://forum.beyond3d.com/showpost.php?p=1114277&postcount=454

Looks more like thermal compound stuck to a chip to me.
Makes me wonder if what we're seeing there is a PCI Express router chip (which might do other things too, e.g. CrossFire, UVD?). The board is a single GPU chip board, presumably (rumoured to be an RV770 board).

So the dual GPU chip board (R780 is what I'm calling it) would look similar, with this router chip placed below the module containing a pair of RV770s. The router chip is responsible for making the correct connections for certain functions and adapts depending on the number of GPU chips it's connected to (1 or 2).

For this to work it would need a bizarrely shaped package to sit atop this router chip. Normally, as far as I can tell, a package is soldered into position so that it is held snugly against the PCB it's mounted upon. Is that right?

If that's really a chip there (router, or whatever) that's sat below the GPU chip, then I guess the GPU chip's package would need to be shaped to fit around it, so that it can fit snugly against the main PCB.

Sounds mad, I know, and there's the questionable thermals. Still, I can't help thinking that OpenGL Guy was being "helpful"...

Jawed
 
:oops: I've just realised that the rumour points to something that's 4x RV630 :!:

[Image: b3da007.gif]

I feel a bit stupid for not realising and I guess I should apologise for trying to find a 4 SIMD configuration that fits the rumour. It's pretty funny that I've been talking about 4xRV630 (or should that be 4xRV635) for months now, but the penny didn't drop :oops: ...

The entertaining thing about this, though, is that it's a 12 SIMD configuration :oops:

Jawed
 
So, what things could it be?
A shared memory pool has nothing to do with the rendering method, for example, and such a card would certainly not be 'CrossFire on a card' since you can't share memory via a CF link.
I can think of other, less extreme possibilities that would make RV770 in some way 'designed for multi-chip' too, PCIe switch integration being one of them.
What I'm saying is that this tidbit doesn't translate to 'not an AFR card'; he's just saying that RV770 will be in some way optimized for multi-chip board configurations. But those multiple chips will most probably use AFR again as the method of multi-GPU rendering.
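As a toy illustration of that distinction (my own numbers, not any leaked spec): under AFR every resource is duplicated in each GPU's local memory, so the usable pool doesn't grow with chip count, whereas a genuinely shared pool would:

```python
# AFR duplicates every texture/buffer in each GPU's local memory, so a
# 2x512MB X2 still behaves like a 512MB card. A shared pool (which a mere
# CF link can't provide) would expose the full amount. Numbers are examples.

def usable_memory_mb(per_gpu_mb: int, gpus: int, shared_pool: bool) -> int:
    if shared_pool:
        return per_gpu_mb * gpus  # one pool addressable by all chips
    return per_gpu_mb             # AFR: full copy per chip

print(usable_memory_mb(512, 2, shared_pool=False))  # 512  -- AFR-style X2
print(usable_memory_mb(512, 2, shared_pool=True))   # 1024 -- hypothetical shared design
```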

When I've got a good reason to use something else, I will.
Whatever. It's your choice to use Fuad's bullshit.

The nearly 1TFLOP board was clearly set up to be released in November, because GPGPU users were primed with this information by NVidia (since changed to 2008H1). This is pretty much incontrovertible because of double-precision: no G92-based GPU is double-precision.
Aren't Teslas and Quadros DP-capable already?
 
A shared memory pool has nothing to do with the rendering method, for example, and such a card would certainly not be 'CrossFire on a card' since you can't share memory via a CF link.
I can think of other, less extreme possibilities that would make RV770 in some way 'designed for multi-chip' too, PCIe switch integration being one of them.
What I'm saying is that this tidbit doesn't translate to 'not an AFR card'; he's just saying that RV770 will be in some way optimized for multi-chip board configurations. But those multiple chips will most probably use AFR again as the method of multi-GPU rendering.
Oh well, we'll just have to wait and see.

Aren't Teslas and Quadros DP-capable already?
No - CUDA supports DP, but there is no hardware support in currently-public cards. Though I wouldn't be surprised if UIUC, for example, has some GT200s (or whatever they're called) already, even if it's not ready for prime time.

Jawed
 
:oops: I've just realised that the rumour points to something that's 4x RV630 :!:

I feel a bit stupid for not realising and I guess I should apologise for trying to find a 4 SIMD configuration that fits the rumour. It's pretty funny that I've been talking about 4xRV630 (or should that be 4xRV635) for months now, but the penny didn't drop :oops: ...

The entertaining thing about this, though, is that it's a 12 SIMD configuration :oops:

Jawed
But 4x RV630 like that would add quite some complexity - this wouldn't be a "simple" 12 SIMD configuration but rather a 4x3 SIMD configuration. I have some doubts about this approach...
 
But 4x RV630 like that would add quite some complexity - this wouldn't be a "simple" 12 SIMD configuration but rather a 4x3 SIMD configuration. I have some doubts about this approach...
Yeah, me too. It seems to have a huge "control overhead" when compared with RV670.

The alternative would be to gang RBEs, e.g. pairs of RBEs per SIMD group, so 6 SIMDs each of 4 quads, each feeding a pair of RBEs, which I suggested earlier.

I came up with 4x RV630 because I was trying to work out something that would be symmetrical in terms of ring stops. But, since I've never worked out how RV630 is symmetrical in terms of ring stops (ultimately, is it?), this stuff has always posed a bit of a problem for me.

Jawed
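For what it's worth, the rumoured figures do line up as straight ALU bookkeeping; here's a quick sanity check assuming the public R6xx-style layouts (RV670: 4 SIMDs of 16 5-wide units; RV630: 3 SIMDs of 8 5-wide units):

```python
# ALU counts for the configurations under discussion, assuming R6xx-style
# 5-wide superscalar units. RV670/RV630 layouts are the public specs; the
# "4x RV630" arrangement is the rumour being debated, not a confirmed part.

def stream_processors(simds, units_per_simd, alu_width=5):
    return simds * units_per_simd * alu_width

rv670  = stream_processors(4, 16)   # 320 SPs
rv630  = stream_processors(3, 8)    # 120 SPs
rumour = stream_processors(12, 8)   # "4x RV630" = 12 SIMDs -> 480 SPs

print(rv670, rv630, rumour)   # 320 120 480
print(rumour / rv670)         # 1.5 -- matches the "1.5x shader units" rumour above
```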
 
This is true; the results of all the R6xx cards are a complete disaster with 4xAA (pretty much approaching what you'd expect with supersampling...). No idea what happens here; maybe some weakness with stencil buffer + AA (though note G80/G92 take a very big hit with AA too, but it's not that drastic).

Well, if you guys want to put a tombstone on the "broken/fixed" crap surrounding R600, I'd propose one could simply say it falls short in terms of fillrates. One can then go deeper and analyze in depth the maximum theoretical Z/stencil/pixel/texel etc. fillrates.

Texel fill certainly; I dunno about the ROPs. I guess it would make sense given the other specifications (that is, twice the texture units, 1.5 times the shader units, twice the memory bandwidth). But maybe the ROPs are just slightly tweaked (something like more Z compares per clock), and I wouldn't be surprised if they are unchanged either.

I'm not so sure I'd call a ROP capable of 4 AA samples/clock vs. one with 2 AA samples/clock "slightly tweaked". If I count Z fill based on the number of samples, I come up with the following theoretical examples:

16 ROPs * 800 MHz * 2 Z-samples/clock = 25600 MSamples/s
16 ROPs * 800 MHz * 4 Z-samples/clock = 51200 MSamples/s

I'm not saying or indicating that AMD will go such a route, but there are some quite funky numbers floating around concerning RV770 that indicate quite healthy increases in terms of fillrates in general.
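Spelling out those examples as a formula (the 800 MHz clock and per-clock sample rates are the hypothetical figures from the post above, not an announced spec):

```python
# Theoretical Z fillrate: ROPs x core clock (MHz) x Z-samples per ROP per
# clock gives MSamples/s. 16 ROPs at a hypothetical 800 MHz, as above.

def z_fill_msamples(rops, clock_mhz, z_samples_per_clock):
    return rops * clock_mhz * z_samples_per_clock

print(z_fill_msamples(16, 800, 2))  # 25600 -- 2 Z-samples/clock per ROP
print(z_fill_msamples(16, 800, 4))  # 51200 -- 4 Z-samples/clock per ROP
```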
 
Like I said, to make a single 2 TFLOP chip Nvidia has to either use 45nm or try to make a quarter-sized chip, neither of which is realistic, which leads me to believe anything capable of 2T in the coming generation will be a dual-chip solution.

G80 was such a "quarter-sized" chip, and that on 90nm. If someone had told you in early 2006, when G71 had just appeared, that they were working on a 480mm2 die with 680M+ transistors (while G71, also on 90nm, was just 196mm2 with 278M transistors), your reaction would have been similar.
 
G80 was such a "quarter-sized" chip, and that on 90nm. If someone had told you in early 2006, when G71 had just appeared, that they were working on a 480mm2 die with 680M+ transistors (while G71, also on 90nm, was just 196mm2 with 278M transistors), your reaction would have been similar.

IIRC G92 is 330mm2 and G96 is 230mm2; I'll be lazy and assume both approach half a TFLOP. This makes me think a 2 TFLOP 65nm chip will be way bigger than G80 even if Nvidia improves per-transistor efficiency.
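A back-of-the-envelope version of that argument, assuming (my assumptions, not established fact) that FLOPs scale roughly linearly with die area on a fixed node and that area scales with the square of the process's linear feature size:

```python
# Naive scaling behind "way bigger than G80": start from G92's rough
# area/throughput and scale up to 2 TFLOPs, then shrink to 55nm.

g92_area_mm2 = 330.0   # 65nm, taken as ~0.5 TFLOP per the post above
g92_tflops   = 0.5
target       = 2.0     # TFLOPs

area_65nm = g92_area_mm2 * (target / g92_tflops)   # ~1320 mm^2
area_55nm = area_65nm * (55.0 / 65.0) ** 2         # ~945 mm^2

print(round(area_65nm), round(area_55nm))  # 1320 945 -- both dwarf G80's ~484 mm^2
```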
 
IIRC G92 is 330mm2 and G96 is 230mm2; I'll be lazy and assume both approach half a TFLOP. This makes me think a 2 TFLOP 65nm chip will be way bigger than G80 even if Nvidia improves per-transistor efficiency.

G92 currently stands at 624 GFLOPs; if there's going to be a higher-end variant than the GTS/512, the FLOP rate might rise even more. Some of the rumours floating around speak of 55nm; if that's true and the target is at least 2 TFLOPs, then of course it will be bigger than G80, but "way bigger" is highly relative to one's POV. G80 is 484mm2 (w/o NVIO) and G71 was 196mm2; that was a jump to nearly 2.5x the die size, and that was truly "way bigger".
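For reference, the 624 GFLOPs figure falls out of the usual peak-rate formula, counting the MADD+MUL as 3 flops per SP per shader clock (the GTS/512's 128 SPs at 1625 MHz):

```python
# Peak programmable FLOPs: SPs x shader clock x flops issued per SP per clock.
# G92 (8800 GTS 512): 128 SPs at 1625 MHz, MADD+MUL counted as 3 flops.

def peak_gflops(sps, shader_clock_mhz, flops_per_sp_per_clock=3):
    return sps * shader_clock_mhz * flops_per_sp_per_clock / 1000.0

print(peak_gflops(128, 1625))  # 624.0
```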
 
On a different note: wouldn't the "X2" configuration of this chip hit the ground running, so to speak? The architecture is quite similar to the 3870 X2's, and they would have had more than 6 months to work on the driver for it.

Perhaps also, some games will start coming out that are optimized for AFR, so they won't specifically need driver tweaks to run better when they are released.
 
On a different note: wouldn't the "X2" configuration of this chip hit the ground running, so to speak? The architecture is quite similar to the 3870 X2's, and they would have had more than 6 months to work on the driver for it.
The trouble with AFR drivers is that you have to work on them all the time.
Most of the RV670 X2's troubles show up in newer games, and even if they fix those troubles in a couple of months, there will always be newer games with X2 troubles again.
So it's an ongoing process of 'fixing' drivers, not something that can be done once and for all.
Another problem is that by the time they fix some X2 problem in a new game, that game might already be quite irrelevant, since everybody will have finished it (in single-chip mode, in the case of an X2 card).
That's the main problem of any AFR card, imo.

As for games being optimized for AFR -- it's interesting to note that most games now are in NV's program (TWIMTBP or whatever), which should make them SLI-compatible (mostly AFR too) right from release day, but they're not. Crysis isn't that SLI/CF-compatible even with the patch that was made for MGPU compatibility. WiC has serious issues on X2. Jericho isn't playable on X2 in MGPU mode. BioShock DX10 gains nothing from CF, etc.
That makes me wonder if there is some fundamental incompatibility between AFR (or MGPU rendering in general) and newer DX10 games -- maybe some methods and algorithms that make AFR/SFR rendering quite ineffective?
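One concrete mechanism that would explain it (a toy model, not based on any profiled game): if frame N samples a render target produced in frame N-1, as reprojection or feedback-style effects do, AFR's two GPUs can no longer work independently:

```python
# Toy AFR model: 2 GPUs, fixed per-frame cost. If each frame reads a render
# target written by the previous frame, the second GPU must wait for the
# first, so AFR degenerates to single-GPU throughput. Costs are invented.

def afr_total_ms(n_frames, frame_ms, interframe_dependency):
    if interframe_dependency:
        # Frame N can't start before frame N-1 finishes: fully serialised.
        return n_frames * frame_ms
    # Independent frames overlap perfectly across the two GPUs (ideal AFR).
    return n_frames / 2.0 * frame_ms

print(afr_total_ms(100, 20, interframe_dependency=False))  # 1000.0 -- ~2x speedup
print(afr_total_ms(100, 20, interframe_dependency=True))   # 2000.0 -- no gain
```

Real titles would fall somewhere between the two extremes, which fits the mixed SLI/CF results listed above.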
 
To be cynical, if NVidia plans to continue competing at the high end with large monolithic GPUs versus AMD's dual-chip cards, then it isn't in their best interest to push development of AFR in TWIMTBP games at the present time.

Such a stance would, in theory at least, benefit NVidia in the short-term as the dual-GPU solutions are still very much in the minority.

Personally, I think NVidia are hard-nosed enough to hold back from assisting developers too much with multi-GPU solutions for this reason. This kind of attitude is why they have such a successful business.

Just speculation.
 