AMD: R7xx Speculation

Status
Not open for further replies.
Errr, no. GPUs are massively parallel in terms of *processing* logic, but not in terms of *control* logic. Here is a list of systems in a DX10 GPU that are *not* amenable to fine-grained redundancy:
- All I/O & analogue display logic.
- Video decoder hardware.
- PCI Express controller.
- Memory controller.
- Input assembly*.
- Clipping/Culling.
- Triangle Setup*.
- Rasterization.
- Global VS/PS arbitration.
- Texture addressing*.
- ALU Scheduler.
*: Depends slightly on implementation or only part of the process may be fine-grained (i.e. not enough)...

That's NOT a small list, and it's NOT a negligible part of the GPU. Good luck implementing fine-grained redundancy for any of those things, unless your definition of fine-grained is to duplicate things (increasing area by 25% to 100%) and not using them!

So, the slave chips could get away with non-functioning display I/O, video decoder and possibly the PCIe controller.
Would that significantly help?
 
It does concern me that R6xx is a costly "base architecture" - 390M transistors for RV630 is sort of ludicrous. Yet a vast number of those transistors are caught up in D3D10-specific functionality that you won't find in, say, RV570 (which has 330M transistors and is often faster - though tends to be slower in newer games). And what are the finer processes for, if not to add features and make newer games run faster?

To make smaller, cheaper, and more power efficient chips?
 
AA resolve is done in the shaders, so it has nothing to do with the ROPs, I guess.
Are you suggesting that the ALUs are simplified instead?

Probably more like someone in marketing saw 8xAA test results and asked to have 8xAA disabled in the drivers to prevent comparisons (it would also relieve the driver team of having to try to optimise 8xAA on such a low-performing part, so they'd hardly be complaining).
 
No, I'm only saying that the ALUs do the AA resolve. And since they're programmable, I guess there is no "hard" limit on that. But I don't know what the reason might be, unless they capped it to keep people from playing at single-digit fps or some such.
 
To generate 8xMSAA the rasteriser needs "extra bits" to go beyond 4xMSAA, since it needs to generate twice as many Zs.

To be honest, I don't think the rasteriser complexity associated with 8xMSAA would be much more than 4xMSAA.

Presumably, also, the compression gubbins in the RBEs (for both colour and Z) needs to be much beefier for 8xMSAA than 4xMSAA. That's a doubling in the size of the tag tables and presumably an extra level of complexity when building the cache read/write hardware.
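To put rough numbers on that Z-doubling (illustrative arithmetic only; the tile size and Z format here are assumptions for the sake of the example, not actual RBE parameters):

```python
# Illustrative arithmetic only -- not actual RBE numbers. Assumes a 32-bit Z
# value per sample and an 8x8-pixel screen tile as the compression granule.
def z_bytes_per_tile(msaa_samples, tile_pixels=64, z_bytes=4):
    # Uncompressed Z storage for one tile: one Z value per sample per pixel.
    return tile_pixels * msaa_samples * z_bytes

for samples in (4, 8):
    print(samples, "xMSAA:", z_bytes_per_tile(samples), "bytes per tile")
# 8xMSAA stores exactly twice the Z data of 4xMSAA, so any tag tables that
# track compressed tiles have to cover twice as many samples too.
```

Whatever the real tile size is, the 2x ratio between the modes holds, which is the point about the tag tables doubling.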

Available bandwidth is a pretty compelling argument (even if R6xx is more efficient with its bandwidth than prior GPUs), and the people this is targeted at may never even know what MSAA is.

Does Aero use MSAA? I presume it does...

Jawed
 
But I don't know what the reason might be, unless they capped it to keep people from playing at single-digit fps or some such.

I think that is the case. I just checked some recent games (CoD4, DiRT and NFS:PS) and most will start with AA applied, be it 2x or 4x.
 
I was under the impression that the concern for hotspots was not the MC itself but rather the "memory bus" wires surrounding it. As I understand it, those buses run at the DRAM frequency, which is way higher than the core frequency.

Personally, I would think that the primary issue motivating this MC architecture is not thermal hot-spots but rather routing hot-spots (ie routing congestion).
 
http://www.bit-tech.net/hardware/2007/11/30/rv670_amd_ati_radeon_hd_3870/18

That’s not to say that RV670 is without its problems, but a lot of the major ones have been worked out. There’s still a complete lack of hardware-based MSAA resolve in RV670’s render back-ends, which hampers performance when you’re using anti-aliasing. However, having spent a lot of time talking with AMD’s GPU product managers, I decided that it wasn’t that the current generation Radeon’s anti-aliasing performance was particularly bad – it just isn’t as good as Nvidia’s efficiency at 4xMSAA.

The reason we say this is because performance on Nvidia’s hardware drops off at 8xMSAA to such an extent that the GeForce 8800 GT is no longer noticeably faster than the Radeon HD 3870 – so much so that you’d be pushed to tell the difference between the two in a blind taste test. It could be that Nvidia just hasn’t tuned its hardware to efficiently use 8xMSAA though, as its 8xCSAA mode is almost as good as 8xMSAA and isn’t quite so severe on the performance stakes.

Regardless, we were pleased to hear that R700 won’t suffer from the same problems and, from what I understand, there will be hardware-based MSAA resolve in there for scenarios when shader-based anti-aliasing is less than optimal for what the developer is trying to achieve. In the past we were spoiled with great anti-aliasing performance and quality on ATI’s hardware, and it just hasn’t been as good as it should be this generation – AMD knows this.
Jawed
 
Does anyone have information on how many transistors are spent on the AA resolve hardware?
In non-R6xx GPUs it's likely to be relatively few. This is because fetching the data is something the ROPs need to do anyway, so for AA resolve you're just looking at "averaging" (blending).

Since ROPs already have to do some blending due to alpha, I presume that AA resolve re-uses the blending capability of the ROPs.

But, as far as I can tell, blending only has two operands, whereas AA resolve has the number of MSAA samples as operands (e.g. up to 6 in R5xx). So then you get into the question of how many "loops" are performed - or whether the GPU tries to do a single-cycle resolve. Somebody else hopefully has a better idea of the trade-offs going on here.
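For what it's worth, a resolve built out of a two-operand blend unit can be sketched as a running average over the samples - a toy sketch of the looping idea, not how any actual ROP is wired:

```python
def resolve_with_two_operand_blends(samples):
    # Fold N samples through a two-operand blend: at step i, blend the
    # accumulator with sample i using weight 1/i. After N-1 blends the
    # accumulator holds the arithmetic mean of all the samples.
    acc = samples[0]
    blends = 0
    for i, s in enumerate(samples[1:], start=2):
        acc = acc + (s - acc) / i   # one two-operand "lerp" blend
        blends += 1
    return acc, blends

# 4xMSAA resolve of one channel: N-1 = 3 sequential blends.
mean, ops = resolve_with_two_operand_blends([0.0, 1.0, 0.5, 0.5])
# mean == 0.5, ops == 3
```

So with only a two-operand blender you're looking at N-1 serial passes per pixel (or log2(N) with a tree of blenders), which is presumably the trade-off being weighed against a wide single-cycle resolve.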

Another expense relates to the format of the pixels. fp16 and fp32 pixels are more complex to blend or to do an AA-resolve. This appears to be one of the major motivations for R6xx's design, making floating point AA resolve purely a function of the shader pipeline.

Which is why I think Tim's got the wrong end of the stick... I sincerely believe resolve isn't the issue at all. I'm hopeful though that AMD's noises about "improved AA performance" do actually amount to something.

Jawed
 
You know, I checked the HD2400XT specs and it has 700MHz 64-bit GDDR3.. whoops, that's pretty great! The crappy cards have come a long way since the FX 5500 and 9200SE.
A friend has an 8500GT and we were pretty impressed (running Warcraft III with everything on high, 16x/16x).
That 2400XT would have the bandwidth to use 8x AA in some older games; otherwise 4x/16x aren't too bad settings.

MSAA is useful on crappy cards, even integrated (where 800x600 2x might look and run better than 1024), and crappy/integrated cards are also what you have on laptops.
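As a quick sanity check on that figure (assuming the quoted 700MHz is the base clock, so 1400MT/s effective for double-data-rate GDDR3):

```python
# Rough peak-bandwidth arithmetic for a 700MHz, 64-bit GDDR3 bus.
# Assumes 700MHz is the base clock and the DDR factor is 2.
def peak_bandwidth_gbps(base_mhz, bus_bits, ddr_factor=2):
    transfers_per_sec = base_mhz * 1e6 * ddr_factor
    return transfers_per_sec * (bus_bits / 8) / 1e9

print(peak_bandwidth_gbps(700, 64))  # ~11.2 GB/s
```

About 11.2GB/s, which is indeed a lot more than the old 64-bit DDR budget cards had to work with.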
 
You know, I checked the HD2400XT specs and it has 700MHz 64-bit GDDR3.. whoops, that's pretty great! The crappy cards have come a long way since the FX 5500 and 9200SE.
A friend has an 8500GT and we were pretty impressed (running Warcraft III with everything on high, 16x/16x).
That 2400XT would have the bandwidth to use 8x AA in some older games; otherwise 4x/16x aren't too bad settings.

My laptop (which was bought for work, not playtime :D) has an 8400M GS with 600MHz (1200MHz effective) GDDR3, too.
It's good enough to play CoD4 in 800x600 at medium settings with 2x AA (incidentally the in-game benchmarking tool also recommends 2x AA according to my system specs), and it's surprisingly fluid at it.

Like I said elsewhere, an HD3850 512MB would probably be a really big sensation in the mobile market, just below the outrageously expensive "bricks" that we have now.
I don't think even the 8800M GT (the cheaper G92, with 64 SPs) or the current 8600M GT/8700M GT would be as good in the price/performance/power-savings departments as the less expensive RV670.
 
Perhaps the structure of R700 will look like this.

R700 high end

GPU0---------GPU1
  |  \     /  |
  |     IO    |
  |  /     \  |
GPU2---------GPU3

First Fusion

      -------
CPU   -------   GPU
  |   -------
  |
Chipset
(IO)

GPUx = GPU core
IO = RAMDAC, UVD, PCI-E, etc.
---- = HT link
 
I know this does not directly relate to R700, but there is an article in Japanese at pcwatch.impress about R680 and AMD's move towards dual-die GPUs.

The original Japanese link is here: R680 AMD which faces to dual die GPU
The Babelfish translation is here: R680 AMD which faces to dual die GPU
(Anyway, I hope silent_budda can deal with it and offer an explanation better than the translation.)

Hope it may be a good point in the discussion of R700's multi-core or multi-chip design here.

Edit: The translation gave me a headache :???: It seems there are too many possible readings for me to understand the article :oops:
 
I sure hope the R680 is a dual die and not some XFire, like a GX2.

Edit: Looking at the slides, it looks like some XFire solution. Not so nice.
 
We've already seen shots of the HD 3870 X2/HD 3890 (or whatever AMD decides to name it). There are two discrete dies visible, and certainly not on the same package. The slides on that site reflect this. The PCIe link is likely just what it says, and the CF is taken care of on the PCB.
 
No.
If R700 uses AMD's multiprocessor technology, such a forecast is very easy.
Moreover, if the Fudzilla rumour is true, R700 will have shaders at twice the clock but half the count of RV670.
However, the number of RBEs and TUs might be small.
R700 would have been in development long before the merger with AMD. Moreover, I seriously doubt that AMD's inter-processor interconnects can meet the needs of a high-end GPU unless you're doing simple XFire/SLI-type connections that waste RAM.
 