AMD: R7xx Speculation

Yes, possibly. But the HD38x0 are not known for being bandwidth starved.

Under certain conditions. Could you please link to the scenarios where the 3870 CF configuration is significantly faster when compared to a 3870X2 one?
 
Isn't the bandwidth requirement for the inter-frame stuff higher than the framebuffer copy? It seems strange that there would be a dedicated connection just for that.
It depends on the game. Optimally, there are no inter-frame copies at all. The dedicated connection means you can avoid at least some copies and any associated synchronization.
 
Thanks for the explanation, Dave! Can you tell us more about what you mean by "hardwired"? What use is the PCIe switch in the X2 at all?
He just mentioned what traffic goes over PCIe (anything that's not a finished portion of the framebuffer in other words). The switch saves you a trip out to the mainboard for certain traffic, and lets you multicast from host to GPUs.
 
It depends on the game. Optimally, there are no inter-frame copies at all. The dedicated connection means you can avoid at least some copies and any associated synchronization.

Interesting but that then leads to a followup question. Why have dedicated connections at all with the glut of available PCIe 2.0 bandwidth? Latency concerns? This is a real surprise to me...I always thought those connections were a lot more integral to the process than just transferring the slave card's framebuffer.....
 
Interesting but that then leads to a followup question. Why have dedicated connections at all with the glut of available PCIe 2.0 bandwidth? Latency concerns? This is a real surprise to me...I always thought those connections were a lot more integral to the process than just transferring the slave card's framebuffer.....

Because the majority don't have PCIe 2.0 mobos?
 
What are the chances of these cards having a true 1 GB of memory on them? I really believe that at the higher resolutions and FSAA modes, games like Crysis are limited more by the amount of onboard RAM than by the actual speed of the graphics card.

I'm sure that a 3870X2-class card would perform much better with twice the memory. It seems to me that I get a lot of swapping and hard drive thrashing in Crysis when I turn the FSAA modes up high.
 
Under certain conditions. Could you please link to the scenarios where the 3870 CF configuration is significantly faster when compared to a 3870X2 one?
Of course - I'll try to skip settings with high AA modes.
http://www.pcgameshardware.de/screenshots/original/2008/01/1201599232647.PNG
CoD4 doesn't seem to care too much about bandwidth, as the GF8800 GT is almost on par with the 8800 GTX, for example
http://www.pcgameshardware.de/screenshots/original/2008/01/1201599898675.PNG
http://www.pcgameshardware.de/screenshots/original/2008/01/1201600162019.PNG


He just mentioned what traffic goes over PCIe (anything that's not a finished portion of the framebuffer in other words). The switch saves you a trip out to the mainboard for certain traffic, and lets you multicast from host to GPUs.
Right - thanks. I just wanted to make sure I understood him correctly and that there's indeed some traffic left going via the PLX switch.

Again sorry, but I'm not a native speaker.
 
I disagree because the available bandwidth essentially goes to waste, which is why the drastic bandwidth cut for RV670 had very little impact on absolute performance in games (there may be some synthetics that show a big drop).

Jawed

Except in the professional market that makes greater use of the math and bandwidth of the R(v)6xx series...

I believe the recently released professional market Rv670 based product is lower tier and lower performance than the previously released R600 based product. I could be wrong on this as I just remember reading a blurb about it a few days ago.

If true, however, that would suggest that while Rv670 suffers in some scenarios versus R600, none of those scenarios involves gaming. In other words, any sacrifices that were made to R600 in order to reconfigure it into Rv670 don't impact the consumer space, but do impact the professional space.

Again, that's just my wholly unfounded speculation based on a blurb I read about where the professional Rv670 based product was placed in relation to existing R600 based products.

Regards,
SB
 
Silent_Buddha said:
Luckily with Rv670 they were able to take the architecture and turn it into a rather attractive and fairly successful product by reconfiguring for the market segment its performance most closely matched.
Haha that has to be the fanciest way of saying "lowered the price" that I've ever seen :LOL:

Actually no, that was my attempt to be brief in saying that R600 was reconfigured to be cheaper to produce, thus making it more of a match for the price point they were forced to sell the R600 at. Thus they were able to take an expensive-to-produce chip (R600), which left little if any margin for profit, and using the same architecture turn it into a cheaper-to-produce chip (Rv670), which, being cheaper to produce, ends up being a successful product. In other words, "reconfiguring for the market segment its performance most closely matched."

In other words, they didn't lower the price of Rv670. The price of the redone chip is a match for the price point they targeted it at. While R600 was obviously targeted at the high end and missed by quite a bit, Rv670 was squarely targeted at the mid-range/performance mid-range and hit a bullseye there, finally giving Nvidia competition in at least one market segment.

And with Rv770 I'm expecting more of the same. AMD/ATI again focusing on trying to capture the mid-range/performance mid-range with a foray into the high end with an x2 type of card.

So while hardware is going to be key for capturing the mid-range/performance segment, drivers and multi-GPU advancements will be key in determining if they can make a viable stab at the high-end/enthusiast class.

Regards,
SB
 
Of course - I'll try to skip settings with high AA modes.
http://www.pcgameshardware.de/screenshots/original/2008/01/1201599232647.PNG
CoD4 doesn't seem to care too much about bandwidth, as the GF8800 GT is almost on par with the 8800 GTX, for example
http://www.pcgameshardware.de/screenshots/original/2008/01/1201599898675.PNG
http://www.pcgameshardware.de/screenshots/original/2008/01/1201600162019.PNG



Right - thanks. I just wanted to make sure I understood him correctly and that there's indeed some traffic left going via the PLX switch.

Again sorry, but I'm not a native speaker.

Hmm, thanks. The trouble is that newer/other reviews using more recent drivers don't show a similar pattern, and even contradict the above:

http://www.techreport.com/articles.x/13967/6

http://www.techreport.com/articles.x/14284/5

http://www.bit-tech.net/hardware/2008/01/29/amd_ati_radeon_hd_3870_x2/7

http://www.bit-tech.net/hardware/2008/01/29/amd_ati_radeon_hd_3870_x2/8

The gist of my argument initially, and what was added further, is that whilst there is still traffic going through the PCIe switch, it should not be significant in volume (not significant enough to saturate it or make the 1.1 limitation matter). If the type of data that goes through the PLX switch is voluminous enough to bring about such a difference, you're already screwed in a multi-GPU setup, because that means you have a lot of inter-frame dependencies and persistent resources to carry over, which basically nukes your AFR and makes you an overall unhappy camper :)
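To put rough numbers on "not significant in volume", here's a quick back-of-envelope sketch (my own assumed figures, not measurements) of how much of a PCIe 1.1 x16 link even a pessimistic full-framebuffer transfer per frame would consume:

```python
# Back-of-envelope sketch with assumed numbers (resolution, fps, link rate).
pcie11_x16_gbs  = 4.0          # ~4 GB/s theoretical per direction for PCIe 1.1 x16
width, height   = 2560, 1600   # assumed worst-case resolution
bytes_per_pixel = 4            # 32-bit colour
fps             = 60           # assumed target frame rate

frame_mb = width * height * bytes_per_pixel / 1e6   # ~16.4 MB per frame
# With AFR only the second GPU's frames would ever need moving, i.e. half of them.
traffic_gbs = frame_mb * (fps / 2) / 1e3             # GB/s

print(f"one frame          : {frame_mb:.1f} MB")
print(f"worst-case traffic : {traffic_gbs:.2f} GB/s "
      f"({traffic_gbs / pcie11_x16_gbs:.0%} of a PCIe 1.1 x16 link)")
```

Even that pessimistic case uses only around a tenth of the link, which fits the point above: by the time the PLX switch's 1.1 limit becomes the bottleneck, the sheer volume of inter-frame data is already the real problem.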
 
Thanks - maybe a bit depends on the individual testing methods also. :)
But for Crysis at Bit-Tech.net at 19x12/DX10, I find the drop from 20 to 14 minimum fps quite significant.

The gist of my argument initially, and what was added further, is that whilst there is still traffic going through the PCIe switch, it should not be significant in volume (not significant enough to saturate it or make the 1.1 limitation matter). If the type of data that goes through the PLX switch is voluminous enough to bring about such a difference, you're already screwed in a multi-GPU setup, because that means you have a lot of inter-frame dependencies and persistent resources to carry over, which basically nukes your AFR and makes you an overall unhappy camper :)
AFAIK every millisecond counts in rendering, so it is not only a question of whether PCIe 1.1 is a limiting factor, but rather of it being one of the factors consuming a certain amount of time. Whether there's a GB per frame worth of data or only some dozens of MBs - if you could cut that time in half, it would help.

But since Dave explained the workings of the X2, I'd also doubt that PCIe 2.0 via the mainboard would be a dramatic improvement, given the shorter communication path on the X2 PCB.
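On the "every millisecond counts" point, a tiny sketch (again with assumed figures) of what a framebuffer-sized transfer actually costs per frame on PCIe 1.1 versus 2.0:

```python
# Rough sketch with assumed numbers: per-frame transfer time over PCIe 1.1 vs 2.0.
frame_mb   = 16.4   # ~2560x1600 at 32bpp, as in the earlier sketch
pcie11_gbs = 4.0    # PCIe 1.1 x16, theoretical per direction
pcie20_gbs = 8.0    # PCIe 2.0 x16 doubles that

t_11_ms = frame_mb / (pcie11_gbs * 1e3) * 1e3   # milliseconds per transfer
t_20_ms = frame_mb / (pcie20_gbs * 1e3) * 1e3

print(f"PCIe 1.1: {t_11_ms:.1f} ms  PCIe 2.0: {t_20_ms:.1f} ms  "
      f"saving: {t_11_ms - t_20_ms:.1f} ms")
```

Halving that time saves on the order of 2 ms; against a ~50 ms Crysis frame at these settings that's real but hardly dramatic, which is roughly the cost/benefit call discussed below.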
 
I'm not saying it would or wouldn't help, but one has to consider whether the cost is worth the benefit. In this context, given the way things work, ATi probably made the right call in not bothering with a PCIe 2.0 switch; the benefits derived from opting for one would've been too slim IRL.
 
Except in the professional market that makes greater use of the math and bandwidth of the R(v)6xx series...

I believe the recently released professional market Rv670 based product is lower tier and lower performance than the previously released R600 based product. I could be wrong on this as I just remember reading a blurb about it a few days ago.

If true, however, that would suggest that while Rv670 suffers in some scenarios versus R600, none of those scenarios involves gaming. In other words, any sacrifices that were made to R600 in order to reconfigure it into Rv670 don't impact the consumer space, but do impact the professional space.

Again, that's just my wholly unfounded speculation based on a blurb I read about where the professional Rv670 based product was placed in relation to existing R600 based products.

Regards,
SB

According to 3D Professor, the V7700 sometimes even exceeds the 8600s (partly attributed to higher clocks, partly drivers, partly Bone/Skulltrail).

It's definitely not the perf; my guess is cost: if it already reaches 90%+ of the high-end counterpart (which the excess RAM is exclusively a niche for), why add high-capacity RAM or redesign a board for it? Any lost performance would come back with a slight clock bump, and granted, this one uses GDDR4 to lessen the impact too.

The V7600 had lesser performance and was noisy, but nevertheless hit the sweet spot for ATI. The V7700 with DisplayPort is already hard to pass up as a card below a certain price. Not having a direct aim doesn't mean it can't compete - just that most who buy the V8600 wouldn't scoff at its shortcomings, and the margins gained would be insignificant relative to the big price tag, even more so when you add in redesigns.
 
The posting by whocares was eaten twice, it seems. He was pointing to this:

http://www.nordichardware.com/news,7585.html

For me the amusing part is:

We've reported on more than one occasion that the number of unified shaders would be bumped *to* 480 shader processors, whereas our Eastern friends are elaborating that they will instead increase *by* 480.
I've added the bolding. Of course this still gets us nowhere, but it's a funny spin.

Jawed
 
But since Dave explained the workings of the X2, I'd also doubt, that PCIe 2.0 via Mainboard would be a dramatic improvement given the shorter communication on the X2-PCB
Sorry I was a bit short last night by the way, long day :( When thinking about GPU-to-GPU comms via PCIe, it pays to remember that as the GPU vendor, you're also (mostly) the PCIe bus vendor. You still own the link because it's your core logic, so you can do some neat things there. Both AMD and NVIDIA use non-spec transfers over PCIe in order to let the GPUs communicate. That kind of thing can happen with different setups too, to get you over latency hurdles, packet lengths, bursting considerations and all that kind of stuff.
 
www.nordichardware.com said:
The number of transistors is expected to land in the 830 million area, a 25 percent increase over the 666 million transistors RV670 sports.

R580 ATI 90nm 384M
R520 ATI 90nm 321M
-------------------------------------
In R580, ~63 million extra transistors were added for the 3:1 ALU:TEX ratio = 48 pixel shaders (up from 16).

RV770 ATI 55nm 830M
RV670 ATI 55nm 666M
-------------------------------------
In RV770, ~164 million extra transistors:

A. an additional 16 TMUs = 32 TMUs total (16+16)
B. an additional 96 SPs = 160 SPs total (96+64), i.e. 800 stream processors.


==========================================================
R600 ATI 80nm 720M

ATI RV770 should be very close to R600 in die size. Maybe? I don't know.

____________

Edit:
I'm still confused about the extra 164 million transistors.

How much would an extra 16 TMUs increase the transistor count?
How much would an extra 96 SPs increase the transistor count?
 
R600 ATI 80nm 720M

ATI RV770 should be very close to R600 in die size. Maybe? I don't know.

Nowhere close to R600 die size.
If RV770 really does have a ~25% increase in transistors you can expect, roughly, the same increase in die size. Something like 240-250mm2, i.e. still smaller than G92 and possibly pretty close to 55nm G92b.

Edit:
I'm still confused about the extra 164 million transistors.

How much would an extra 16 TMUs increase the transistor count?
How much would an extra 96 SPs increase the transistor count?
The only people that know that aren't at liberty to let us know.
Some here might be able to make rough estimates.
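To show what such a rough estimate might look like, here's a sketch with purely hypothetical per-unit budgets, picked only so the arithmetic is visible - these are not real figures:

```python
# Purely illustrative: hypothetical per-unit transistor costs, chosen only to
# show how the ~164M extra transistors might split between TMUs and SPs.
extra_total_m = 830 - 666   # ~164M extra transistors in the rumoured RV770
extra_tmus    = 16          # rumoured extra TMUs
extra_sps     = 96          # rumoured extra 5-wide SPs

tmu_cost_m = 3.0            # HYPOTHETICAL millions of transistors per TMU
sp_cost_m  = (extra_total_m - extra_tmus * tmu_cost_m) / extra_sps

print(f"if a TMU costs ~{tmu_cost_m:.0f}M, each extra SP has ~{sp_cost_m:.2f}M to fit into")
# ~1.2M per 5-wide SP under these made-up assumptions; the point is only the
# shape of the calculation, not the actual figures.
```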
 
Nowhere close to R600 die size.
If RV770 really does have a ~25% increase in transistors you can expect, roughly, the same increase in die size. Something like 240-250mm2, i.e. still smaller than G92 and possibly pretty close to 55nm G92b.

It might be worthless speculative math, yet assuming the exact same transistor density per mm^2:

666M / 192mm^2 = ~3.47M/mm^2
830M / 3.47M/mm^2 = ~239mm^2
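The same speculative math in a couple of lines, with the rumoured 25% increase shown alongside (assuming RV670's ~192mm^2 die and identical density on the same 55nm process):

```python
# Sketch of the speculative math above: scale RV670's die by transistor count,
# assuming identical transistor density on the same 55nm process.
rv670_transistors_m = 666
rv670_die_mm2       = 192     # RV670 die size
rv770_transistors_m = 830     # rumoured

density_m_per_mm2 = rv670_transistors_m / rv670_die_mm2      # ~3.47M per mm^2
rv770_die_mm2     = rv770_transistors_m / density_m_per_mm2  # ~239 mm^2

print(f"density  : {density_m_per_mm2:.2f}M/mm^2")
print(f"RV770 est: {rv770_die_mm2:.0f}mm^2 "
      f"(+{rv770_transistors_m / rv670_transistors_m - 1:.0%} transistors vs RV670)")
```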
 