AMD: R7xx Speculation

Therefore we must assume that RV770 is more than just a 4x Z per clock RV670, or ATI will have lost the performance segment of the market, after abandoning the high-end last year.
I do not agree. With about 50% more shaders, all of them free to do classic shader work instead of having to downsample multiple AA buffers, plus ROP circuitry that no longer has to take the long detour through VRAM and the shaders to do its job (including less-than-ideal ways of getting data to and from VRAM), performance is bound to increase by more than clocks and unit counts alone would indicate.

Plus AMD will surely have made at least minor tweaks to further help RV770 to what should be - IMO - a very good standing against G92, including the "b" revision.
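A quick back-of-envelope of that argument (only the ~50% unit increase comes from the posts above; the clock ratio and the share of shader time RV670 spends on AA resolve are purely made-up placeholders):

Code:
# Rough scaling estimate (Python). All inputs are assumptions, not specs:
# unit_ratio follows "about 50% more shaders" above, the rest is hypothetical.
unit_ratio = 1.5          # ~50% more shader units than RV670
clock_ratio = 1.0         # assume unchanged clocks, purely for illustration
aa_resolve_share = 0.10   # hypothetical fraction of shader time RV670 burns on AA resolve

raw_scaling = unit_ratio * clock_ratio
# If hardware ROPs take over AA resolve, that shader time is freed for real shading.
effective_scaling = raw_scaling / (1.0 - aa_resolve_share)

print(f"raw unit/clock scaling           : {raw_scaling:.2f}x")
print(f"with AA resolve moved to the ROPs: {effective_scaling:.2f}x")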
 
Code:
P-1
 
[GPU0]     [mem]     [GPU1]
  |          |         |
  |          |         |
  -------[bridge]-------
          like NB
             |
             |
           [PCIe]
 
P-2
 
[mem]----[GPU0]          [GPU1]----[mem]
           |               |
           |               |
           |               |
           |               |
           ---- [bridge]----
           like vsu not PLX
                   |
                   |
                 [PCIe]
 
Code:
                [mem2]
                  ||
                [GPU2]
[mem0]            ||            [mem1]
  ||              ||              ||
[GPU0]=========[Bridge]=========[GPU1]
                |    |
                |    |
             [PCIe][RAMDAC]

Not exactly what I think the part would be but some of the reasoning makes sense. If they want the dice to be as compact as possible then offloading some of the redundant features would make sense, like NVIO. They also would need a lot of bandwidth between chips.

Assuming the same chip is used for both mid-range and high-end parts, a direct link would be problematic if they went for a 3-way configuration. The back side of the bridge (facing the GPUs) could run well outside PCIe specs if they wanted, while the front side still presents standard PCIe to the system.

Also if there are any plans to integrate the chips with a CPU then sticking HTX somewhere in this system would make some sense. Maybe not now but in the future.

As for the actual part, they've shown they like their 16 TUs. My guess is they like working on 4x4 blocks, so to get to 32 TUs they just add another chip. So I'm guessing 48*5 (240) ALUs / 16 TUs per chip, used in pairs.
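Spelled out as arithmetic, that guess works out as follows (all of the inputs are just the speculation above, not known specs):

Code:
# Worked arithmetic for the guess above (Python); every value is speculative.
shader_units_per_chip = 48   # guessed number of 5-wide units per chip
alu_width = 5                # 5-wide VLIW units, as in R6xx
tus_per_chip = 16            # "they like their 16 TUs"
chips = 2                    # high-end part assumed to pair two chips

alus_per_chip = shader_units_per_chip * alu_width   # 48 * 5 = 240
print(f"per chip: {alus_per_chip} ALUs / {tus_per_chip} TUs")
print(f"pair    : {alus_per_chip * chips} ALUs / {tus_per_chip * chips} TUs")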

Ultimately performance will be determined by the effectiveness of the bridge.
 
=>itaru: I was thinking about a "P-1" setup a few years back, when the first R700 info appeared (apparently that R700 was something different from what RV770 actually is). The problem is that such a bridge chip would not have sufficient die area for a memory bus at least 256 bits wide. Add the need for fast connections between the chips and the increased memory latencies, and it simply wouldn't make much sense. It seems you can't just tear the memory controller off the chip; the same goes for the shaders or texturing units, they need to be close together and close to the memory.
AnarchX's idea of offloading NVIO seems more plausible. Basically, you could have two or three chips on a board, but the display logic, the RAMDACs and whatever else NVIO does would not have to be duplicated three times, thus saving some die area on the GPUs. Apart from that, the NVIO chip could also function as a PCIe bridge (either switching to HyperTransport traffic or just distributing the PCIe lanes like the PLX does). The downside, however, is that even a single-chip card would require the IO chip, and that would raise costs. Also, having NVIO and PCIe bridge functionality on one chip would make PCB design more difficult. Considering that single-chip cards are more important to ATi than the dual-chip ones, this approach doesn't seem probable either.
 
El cheapo:

X2 config:
Code:
             [mem0]                          [mem1]
               ||                              ||
[RAMDAC]-----[GPU0]====[0.5*(x16 PCIe2)]=====[GPU1]
                |                               |
                |                               |
         [0.5*(x16 PCIe2)]             [0.5*(x16 PCIe2)]

Normal config:
Code:
             [mem0]
               ||
[RAMDAC]-----[GPU0]
               ||
               ||
         [2*(0.5*(x16 PCIe2))]

No bridge, just splittable x16 lanes, like chipsets that offer a maximum of M lanes splittable across N devices. I have no idea why this would be a bad idea, whether it needs special motherboard support, or whether the bandwidth is just ridiculous.
 
Why 0.5*(x16 PCIe 2.0) everywhere instead of a full x16?

For a single GPU it would be x16. The 0.5 is just a drawing artifact to make it clearer: it's a 16-lane pin-out, configured either as 2x x8 or as a single x16.

You could do optimizations, like upping the clocks on the x8 full-duplex inter-GPU link, just reusing the PHY with a different logical link for internal ring-bus bridging. A ~5 cm link could possibly use a higher-frequency PHY, with less error-coding redundancy and none of the logical PCIe 2.0 baggage. I'm not an EE.

But then you are starting to design an inter-chip protocol and moving away from el-cheapo land. The only benefit then is avoiding the PCIe bridge and the extra chip pins for the inter-GPU config, thus avoiding different SKUs for high-performance X2s and single GPUs all the way down to the HDnn50s.

The drawback is less inter-GPU bandwidth than a full 3x x16 PCIe 2.0 bridge chip would provide. Are they still expensive?
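For a rough sense of scale, using the standard PCIe 2.0 rate of 5 GT/s per lane with 8b/10b coding (500 MB/s per lane per direction), and ignoring any overclocked-PHY tricks:

Code:
# PCIe 2.0 back-of-envelope (Python): 5 GT/s per lane, 8b/10b coding
# -> 500 MB/s per lane per direction.
MB_PER_LANE_PER_DIR = 500

def link_bw_gbs(lanes):
    """Per-direction bandwidth of a PCIe 2.0 link, in GB/s."""
    return lanes * MB_PER_LANE_PER_DIR / 1000

print(f"x8 inter-GPU link (el cheapo)     : {link_bw_gbs(8):.1f} GB/s each way")
print(f"x16 link (per port on a 3x bridge): {link_bw_gbs(16):.1f} GB/s each way")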
 
http://www.nordichardware.com/news,7809.html

Both the GeForce GTX and Radeon HD 4800 series will arrive in about three weeks. Each series will bring two new cards to the market: the GeForce GTX 280 and 260, and the Radeon HD 4870 and 4850. There is a big difference between the cards though, as the GeForce GTX series is enthusiast range while the Radeon HD 4800 series is more mid-range. There has already been talk of what the GeForce GTX 280 can do in Vantage, and the picture has now been completed with figures for the other cards.

These are of course in no way official and we can't say for certain where they come from. The only thing we know is that the numbers are not unreasonable, but some information about the rest of the system would be nice. ATI performance (with all cards) is still subpar due to poor drivers and should improve in Vantage with coming releases. The numbers circulating the web are something like this:


Graphics card        Vantage Xtreme profile*
GeForce GTX 280      41xx
GeForce GTX 260      38xx
GeForce 9800GX2      36xx
GeForce 8800 Ultra   24xx
Radeon HD 4870 XT    26xx
Radeon HD 3870X2     25xx
Radeon HD 4850 Pro   20xx
Radeon HD 3870       14xx
* 1920x1200 4AA/16AF

;)
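Taking those rumoured scores at face value (the trailing "xx" digits are unknown, so the hundreds are used as-is), the relative standings work out roughly like this:

Code:
# Relative comparison of the rumoured Vantage Xtreme scores (Python).
# Trailing "xx" digits are unknown, so only the leading digits are used.
scores = {
    "GeForce GTX 280": 4100, "GeForce GTX 260": 3800,
    "GeForce 9800GX2": 3600, "GeForce 8800 Ultra": 2400,
    "Radeon HD 4870 XT": 2600, "Radeon HD 3870X2": 2500,
    "Radeon HD 4850 Pro": 2000, "Radeon HD 3870": 1400,
}
print(f"HD 4870 XT vs HD 3870   : {scores['Radeon HD 4870 XT'] / scores['Radeon HD 3870']:.2f}x")
print(f"HD 4870 XT vs HD 3870X2 : {scores['Radeon HD 4870 XT'] / scores['Radeon HD 3870X2']:.2f}x")
print(f"HD 4870 XT vs GTX 280   : {scores['Radeon HD 4870 XT'] / scores['GeForce GTX 280']:.2f}x")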
 
The only question I have with regard to RV770 is whether it will come with an EFI ROM.

For the love of god, throw a bone to the Mac community, which currently only has a choice between the Radeon 2600 XT and the GeForce 8800 GT.
 
Karoshi said:
No bridge, just splittable x16 lanes, like chipsets that offer a maximum of M lanes splittable across N devices. I have no idea why this would be a bad idea, whether it needs special motherboard support, or whether the bandwidth is just ridiculous.
I don't think it'll work. Because PCI Express is a point-to-point protocol you can't hang more than one device off a connection.

On older PCI Express mobos the entire 16x slot counts as a single connection. So when you plug the X2 in, both GPUs are contending to use the single connection and all hell breaks loose.

I think the newest mobos split the 16x slot into two 8x slots - though I'm not sure of this. This would support the configuration you suggest, since each GPU on X2 would have a dedicated 8-lane connection, but unfortunately it's a minority solution and so AMD couldn't make X2 depend on mobos of this design.

Jawed
 
For the love of god, throw a bone to the Mac community
You know, it's not ATi's or nVidia's problem that Apple screwed the Mac to make it "something better" than an ordinary PC.
Jawed said:
On older PCI Express mobos the entire 16x slot counts as a single connection. So when you plug the X2 in, both GPUs are contending to use the single connection and all hell breaks loose.

I think the newest mobos split the 16x slot into two 8x slots - though I'm not sure of this. This would support the configuration you suggest, since each GPU on X2 would have a dedicated 8-lane connection, but unfortunately it's a minority solution and so AMD couldn't make X2 depend on mobos of this design.
A majority of chipsets have one 16-lane connection that can't be split, even the newer ones. But never mind, that's what the PLX bridge chip is there for. I think the problem with Karoshi's diagram is that such a solution isn't really cheaper than a monolithic chip integrating the RAMDACs. Why else would nVidia put the NVIO functionality back into G92, even though they were going to make a GX2 card?
 
Code:
                [mem2]
                  ||
                [GPU2]
[mem0]            ||            [mem1]
  ||              ||              ||
[GPU0]=========[Bridge]=========[GPU1]
                |    |
                |    |
             [PCIe][RAMDAC]
If GPU2 were something like a 780G IGP (to bridge PCI Express connections and to provide RAMDAC functionality) it could work I guess :p

Code:
[mem0]          [mem2]          [mem1]
  ||              ||              ||
[GPU0]===[IGP bridge/RAMDAC]====[GPU1]
                  ||    
                  ||    
                [PCIe]
That could imply that the single GPU card would also need this IGP/bridge/RAMDAC chip.

Jawed
 
With Hybrid CrossFire and Hybrid SLI, the necessary 2D and video parts could move onto the mainboard altogether once that technology has matured a bit.
 
If AMD is taking the time to build an expensive modified version of RV770 that doesn't need a bridge chip, as some people here suggest, why wouldn't they just create an MCM? Just connect the buses of the two GPUs and make them work as one. No problems with pin counts or bandwidth, no duplicated memory, no extra-long PCB. I also don't think the thermal characteristics would be a problem, given that NVIDIA's coolers can apparently dissipate 200+ watts.
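As a very rough thermal sanity check (the per-die wattage below is a pure guess, not a known RV770 figure; the 200 W reference is just the cooler capability mentioned above):

Code:
# Hypothetical MCM power estimate (Python). watts_per_die is a guess,
# not a known RV770 number; the 200 W figure is the cooler capability
# mentioned in the post above.
watts_per_die = 110   # hypothetical per-die board power
dies = 2
cooler_limit = 200    # "NVIDIA's coolers can apparently dissipate 200+ watts"

mcm_power = watts_per_die * dies
print(f"estimated MCM board power: {mcm_power} W (cooler reference point: {cooler_limit}+ W)")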

btw, thanks for the answers to my previous post. I wasn't able to visit Beyond3D for a few days so I couldn't respond earlier.
 
You know, it's not ATi's or nVidia's problem that Apple screwed the Mac to make it "something better" than an ordinary PC.

Actually it is.

Mac OS X has support for RV670 (for example, among many others), but there is no Radeon HD 3870 with an EFI ROM. It is up to ATI or NVIDIA to supply an aftermarket card (or a build-to-order option).

ATI even makes the drivers for Apple, whereas NVIDIA just hands Apple the code and makes them write the drivers themselves.
 
I'm not sure why it is assumed that there will be some sort of arbitration logic present on a multi-RV770 SKU, as though it were necessary to allow multiple RV770 chips to communicate.

One of the wonderful side effects of a shared memory pool and DMA is that any arbitration logic becomes redundant and therefore unnecessary.
 
Actually it is.

Mac OS X has support for RV670 (for example, among many others), but there is no Radeon HD 3870 with an EFI ROM. It is up to ATI or NVIDIA to supply an aftermarket card (or a build-to-order option).

ATI even makes the drivers for Apple, whereas NVIDIA just hands Apple the code and makes them write the drivers themselves.


And why is it ATI's and NV's fault that Apple decided to require EFI instead of a regular BIOS chip for video cards? I know: Apple's laziness about supporting more hardware than they want to.
 
=>Pressure: You wanted to feel better than the rest of us, you wanted to feel special, so you bought a Mac. It was your choice. Now please shut up and suffer the consequences.

=>ShaidarHaran: Arbitration logic? Why? If R700 follows the same concept as R680, no other chips will be required - except the PLX PCIe bridge, but its main purpose is not communication *between* the chips.
 