AMD confirms R680 is two chips on one board

I'm not saying that Xenos can/can't use 256GB/s bandwidth or that RV670 is/isn't bandwidth limited.

I'm saying it can't be both.

Either 72GB/s isn't enough for RV670 or it is, and by extension it's enough for Xenos as well.

Aside from the compression difference and perhaps some other minor efficiency improvements, there's nothing about the Xenos core (that I'm aware of) that makes it likely to consume more bandwidth than RV670. In fact, the exact opposite is true. The fact that Xenos was designed for 720p/4xAA only strengthens my point, since those are very low settings for RV670, where it's even less likely to be bandwidth limited.

It's nothing to do with it being in a console, because if there is a situation in a console environment where a chip is bandwidth limited then that same situation can easily translate over to a PC, especially given how many games on the 360 turn up on the PC. For example, if Lost Planet used 200GB/s on the 360, why would it only need 50GB/s for the same visual result on the PC?

The bottom line is that if RV670 is incapable of using 72GB/s in most situations then so is Xenos. Conversely, if Xenos finds itself regularly eating up well over 72GB/s then so would RV670 (after all, both are playing mostly the same games with the same visuals).

All this is of course assuming that the compression issue doesn't make up for the large gap.

Did you at least make an attempt at reading the article I linked to? Did you check out what fits and doesn't fit in the EDRAM? Did you check out how it's typically used? Did you check out the bandwidth that Xenos normally has available to it (22.4GB/s to main memory, 32GB/s between parent and daughter die, BTW) if stuff doesn't go to the EDRAM or it isn't used for some reason? Read page 4, paragraph 2 as well.

Again I'm asking: how do you determine what Lost Planet uses or doesn't use, and how do you take that comparison to the desktop realm? I personally have no friggin clue what its BW demands are. You function under the assumption that the EDRAM was some absolutely needed thing and that Xenos as it is uses the entirety of the BW it brings. That's hardly the case. I think it was relatively cheap to add, looked good in terms of providing paper specs, and gave some interesting possibilities (I'm thinking primarily about the prospect of using tiling and thus having a really low performance reduction with 4X MSAA at 720p, but devs don't seem to have swarmed over it, for a number of reasons).

There's nothing showing the 3870 to be horribly BW limited in normal usage scenarios. Other parts like the 8800GT, also a 16 ROP part, seem to do OK with even less bandwidth. Again, don't misunderstand this as some crusade against increasing BW -- it isn't. But in the context of the RV670 a 512-bit bus would've been pointless, as it was in the context of the R600.

Let's get another example: X1900XTX and 2900XT. The 2900XT gobbles more BW with the same number of ROPs, TUs and so on (let's call them RBEs as per ATI's nomenclature for the 2900 line). Did it translate into a whopping defeat for the X1900XTX under supposedly BW limited scenarios? Nope. And let's ignore the "shader-resolve killing performance" argument as that's fairly invalid.

To get my point across: IMHO, the RV670 isn't in great need of BW; it's hardly a limiting factor for it, considering its typical usage scenarios. Simply look at it man, show me at least some indication of BW limits IRL. What Fellix said is probably correct (haven't checked that out... and I also don't have an RV670), but you're not going to be spending your time doing blending only, are you?
 
What Fellix said is probably correct (haven't checked that out... and I also don't have an RV670), but you're not going to be spending your time doing blending only, are you?
Well, don't we? :LOL:

Anyway, my intention was to point out a simple BW-related situation with an eye to ease of comparison.
IMHO, it is evident that for the moment the most cost-effective implementation for the R600 marchitecture is the 8-way 256-bit interface, with some *arguably* GDDR4 hi-speed touch. ;)
 
Precisely. And with 2.25-2.5GHz GDDR4 to back it up, RV670 is anything but bandwidth-deficient.

I have an HD2900PRO with the 512-bit memory bus, and some tests suggest that the HD3870 could be memory bandwidth limited. I clocked the HD2900PRO to a core of 845MHz and the memory to 850MHz. In 3DMark06 I was getting around 9000. I upped the memory speed to 910MHz and left the core the same at 845MHz, and the score went up to 9600, a 600 point jump. This would show that memory bandwidth was limiting the core at 845MHz, and that was on a 512-bit memory bus.
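To put the overclocking result above in proportion, here is a rough sketch of the scaling, using only the clocks and scores quoted in the post. The helper name is made up for illustration, and the scores are the poster's approximate figures ("around 9000"), so treat the ratio as indicative rather than a measurement.

```python
def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Figures quoted in the post: memory 850 -> 910 MHz, 3DMark06 ~9000 -> 9600,
# core held at 845 MHz for both runs.
mem_gain = percent_gain(850, 910)
score_gain = percent_gain(9000, 9600)

# Fraction of the extra memory clock that showed up in the score.
scaling = score_gain / mem_gain

print(f"memory clock: +{mem_gain:.2f}%")
print(f"3DMark06 score: +{score_gain:.2f}%")
print(f"score gain per unit of memory gain: {scaling:.2f}")
```

On those numbers the score rose almost in step with the memory clock, which is what a bandwidth-bound run would look like. A single synthetic benchmark does not say how games behave, though.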
 
The main difference between console and PC is that the console has to perform well under far fewer configurations (resolutions, AA modes, etc.). This makes it possible to target the console very carefully at one or two high priority setups, and lets you design more towards the "worst-case" end of the spectrum than a "typical case".

Xenos was intended to never bottleneck on pixel throughput, and in the timeframe it was being designed the obvious candidate for backend bottlenecks was heavy alpha blending (for particles/smoke/weather/etc.) rather than the current HDR (4xfp16) and deferred rendering candidates.

So do the math. Facts:
500 MHz
32 samples/clk (8 pixels @ 4xAA)
no compression
32 bits/sample for z/stencil
32 bits/sample color at full-speed, 64 bits/sample at half-speed

Therefore, with Z/stencil read+write and alpha blending (color read+write) for each sample, the backend units can consume up to:
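500 MHz * 32 samples * ((2 * 4B) + (2 * 4B)) = 256 * 10^9 B/s, or 238.4 GB/s if you count a GB as 2^30 bytes

Counted in binary units that looks like 93% of the 256GB/s theoretical peak, but the peak is quoted in decimal GB, so the backend can in fact consume all of it -- which is better than you'd get in a more complex setup (like more clients and having to arbitrate between them).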

So yeah, Xenos' EDRAM bandwidth and ability to consume that bandwidth are very well matched and provide exactly what was being aimed at: 4xAA with alpha-blend and z-test without the backend being a bottleneck.
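A quick sketch of the arithmetic above, using only the figures from the list (the function name is made up for illustration). It prints the result in both decimal and binary gigabytes, since which one a given GB/s figure uses changes how close the result looks to the 256GB/s peak.

```python
CLOCK_HZ = 500e6        # 500 MHz
SAMPLES_PER_CLK = 32    # 8 pixels @ 4xAA
Z_BYTES = 4             # 32 bits/sample for z/stencil
COLOR_BYTES = 4         # 32 bits/sample color at full speed

def backend_bytes_per_second(clock_hz, samples_per_clk, z_bytes, color_bytes):
    """Worst-case backend traffic: Z read+write plus color read+write per sample."""
    bytes_per_sample = 2 * z_bytes + 2 * color_bytes
    return clock_hz * samples_per_clk * bytes_per_sample

peak = backend_bytes_per_second(CLOCK_HZ, SAMPLES_PER_CLK, Z_BYTES, COLOR_BYTES)

decimal_gb = peak / 1e9      # 1 GB = 10^9 bytes
binary_gb = peak / 2**30     # 1 GB = 2^30 bytes

print(f"{decimal_gb:.1f} GB/s (decimal), {binary_gb:.1f} GB/s (binary)")
```

Swapping `COLOR_BYTES` to 8 and halving `SAMPLES_PER_CLK` models the 64 bits/sample half-speed case from the list and gives a lower total, because the Z traffic halves while the color traffic stays the same.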

Doing the same analysis for RV670 is much harder: compression has varying effectiveness, and the bandwidth is shared with textures, vertex data, scanout, etc. (which not only compete for bandwidth, but make it harder to get maximum efficiency out of the DRAMs). But it's easy to see that if you've got 4xFP16 rendertargets (RV670 can blend these at full speed) and/or less than perfect compression, RV670's RBEs can be bandwidth limited even though they only do two samples/pixel/clk.
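As a rough illustration of the 4xFP16 case, here is the same kind of estimate for RV670. Several inputs are assumptions and not from the post: a 775MHz core clock (stock HD 3870), 16 blended pixels per clock (going by the "16 ROP" remark earlier in the thread), no AA, no Z traffic, and no compression. It ignores textures, vertex data and scanout entirely, so it is a lower bound on contention, not a model of a real frame.

```python
CORE_CLOCK_HZ = 775e6   # ASSUMPTION: stock HD 3870 core clock, not stated in the post
PIXELS_PER_CLK = 16     # ASSUMPTION: one blended pixel per ROP per clock, 16 ROPs
FP16X4_BYTES = 8        # 4 channels x 16 bits
MEMORY_GB_S = 72.0      # the external bandwidth figure used in this thread

def blend_demand_gb_s(clock_hz, pixels_per_clk, bytes_per_pixel):
    """Color traffic for alpha blending: one read and one write per pixel."""
    return clock_hz * pixels_per_clk * 2 * bytes_per_pixel / 1e9

demand = blend_demand_gb_s(CORE_CLOCK_HZ, PIXELS_PER_CLK, FP16X4_BYTES)
ratio = demand / MEMORY_GB_S

print(f"uncompressed FP16x4 blend demand: {demand:.1f} GB/s")
print(f"that is {ratio:.2f}x the {MEMORY_GB_S:.0f} GB/s available")
```

Under those assumptions the blenders alone could ask for well over what the 256-bit bus delivers, which is the "potential to be limited" point. How often a real frame sits in that state is a separate question.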

Of course this kind of heavy alpha blending w/ Z read/write is only a fraction of current frames, so the RBEs will typically consume much less bandwidth than this -- that's what I meant by designing PC parts more around typical or average case than worst case.

I'm not saying RV670 is often bandwidth-limited in current games -- just that it has the potential to be, unlike Xenos (considering only backend bandwidth, of course).
 
Again I'm asking: how do you determine what Lost Planet uses or doesn't use, and how do you take that comparison to the desktop realm? I personally have no friggin clue what its BW demands are. You function under the assumption that the EDRAM was some absolutely needed thing and that Xenos as it is uses the entirety of the BW it brings.

Again, I'm not trying to make definite assertions either way. I'm not saying Xenos DOES use all that bandwidth or that RV670 IS bandwidth limited. What I'm saying is that two mutually exclusive statements have been made, and I'm asking for that to either be acknowledged or for the conflict to be resolved via some technical reason that I'm not aware of. For clarification, the two statements are:

  • Xenos can use near 256GB/s of bandwidth (doesn't matter that it's only for limited situations and not main memory, it can still use it for some things)
  • RV670 cannot use more than 72GB/s
To me, both of those statements can't be correct; one or the other is wrong.

We can say that most of the time Xenos doesn't use anywhere near that much bandwidth, but the fact of the matter is that situations where it does use a lot of it (if they even exist) would also be able to exist in a PC game, and thus in those situations RV670 would be bandwidth limited.

The question then becomes one of how often those situations occur. Is it so rarely that it doesn't have much noticeable impact on real world performance? And if that's the case, what implications does it have for the usefulness of all that bandwidth in the 360?
 
The 8800GT isn't, but the 8800GTS is very limited by it.

The proof of that being where? Ignoring that we're talking cross-IHV comparisons that don't make that much sense (I picked the 8800GT as a (far-fetched) example because it packs a similar functional unit arrangement; the GTS (classic) handles more pixels (20>16), nV can handle 4 multisamples per cycle, ATI only does two, etc.). Show me this great BW limited scenario.
 
Nothing to get excited about. Ok, it might give the 14 months old 8800 GTX a fight. Yawn...

This is gonna be a crappy year.
 
Again, I'm not trying to make definite assertions either way. I'm not saying Xenos DOES use all that bandwidth or that RV670 IS bandwidth limited. What I'm saying is that two mutually exclusive statements have been made, and I'm asking for that to either be acknowledged or for the conflict to be resolved via some technical reason that I'm not aware of. For clarification, the two statements are:

  • Xenos can use near 256GB/s of bandwidth (doesn't matter that it's only for limited situations and not main memory, it can still use it for some things)
  • RV670 cannot use more than 72GB/s
To me, both of those statements can't be correct; one or the other is wrong.

We can say that most of the time Xenos doesn't use anywhere near that much bandwidth, but the fact of the matter is that situations where it does use a lot of it (if they even exist) would also be able to exist in a PC game, and thus in those situations RV670 would be bandwidth limited.

The question then becomes one of how often those situations occur. Is it so rarely that it doesn't have much noticeable impact on real world performance? And if that's the case, what implications does it have for the usefulness of all that bandwidth in the 360?

I give up:|
 
To be realistic, ATI has been trailing nV ever since the GF6800 appeared.

But back on topic, what brand of crystal ball do you possess, vertex_shader? How can you possibly know ATI will get the crown back? Or even that they'll release the product on time? Or that CF will work properly? Or that nV won't counter immediately if so needed?
 
602, NV has the crown since 5.6.06 - 7950GX2... ;)
And ATi will hold it from 28.1. to 14.2.

Lol, very good point. I would usually discount the 7950GX2 with it being a dual GPU card but that would be a little hypocritical under the circumstances :smile:
 
To be realistic, ATI has been trailing nV ever since the GF6800 appeared.

I don't know about that. NV had the feature advantage but ATI certainly had the speed advantage in NV40 vs R420.

NV didn't retake the speed crown until the 7800GTX and then later ATI re-took it with the X1900XTX (I'm ignoring the minor blip of the X1800XT).
 
what brand of crystal ball do you possess, vertex_shader?

I can't talk about unreleased crystal ball sorry :LOL:

How can you possibly know ATI will get the crown back? Or even that they'll release the product on time? Or that CF will work properly? Or that nV won't counter immediately if so needed?

I have a good feeling about the HD3870X2. Of course it won't meet everyone's expectations 100% and it has weak points in games (CF not working, or not scaling well), but I'm optimistic about overall performance now.

The HD3870X2 is coming on Jan 28, so according to the rumors it has a 17 day lead over the 9800GX2, which is faster but costs $50 more (at least on paper).
 
I don't know about that. NV had the feature advantage but ATI certainly had the speed advantage in NV40 vs R420.

Well that is debatable, but as an overall package the GF6800 was capable of more than ATI's competing products and to me that counts more than a few % more speed in like half of the games.

As for R680, I have no idea but vertex, you claimed that above with certain confidence which is only found in people with only the "feeling" and no hard facts ;) Thus I asked.
 
Okay, are we still on the bandwidth subtopic?

Anyways, I did some cross testing today with my old 2900XT and the 3870 lying around here, to illustrate the puzzlement a bit more.
For the sake of an apples-to-apples comparison, both GPUs were clocked at 800MHz, while the 2900 and 3870 boards were set to 1800MHz and 2700MHz for the memory (115GB/s vs. 86GB/s), respectively.
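For anyone wanting to check the bandwidth figures, a minimal sketch of where they come from. It assumes the quoted memory clocks are effective data rates and that the boards use 512-bit (2900) and 256-bit (3870) buses, as mentioned elsewhere in the thread. The function name is made up for illustration.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Peak DRAM bandwidth in decimal GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

hd2900 = memory_bandwidth_gb_s(512, 1800)   # 2900 as tested here
hd3870 = memory_bandwidth_gb_s(256, 2700)   # 3870 as tested here
gddr4_2250 = memory_bandwidth_gb_s(256, 2250)  # 256-bit with 2.25GHz GDDR4

print(f"HD2900: {hd2900:.1f} GB/s")
print(f"HD3870: {hd3870:.1f} GB/s")
print(f"256-bit @ 2250MHz: {gddr4_2250:.1f} GB/s")
```

The last line reproduces the 72GB/s figure that the earlier posts argue over.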

-=3DMark'06 Single Texture FillRate=-

HD2900: 8449 MPix;
HD3870: 7635 MPix;


-=FillrateBenchmark v0.92=-

HD2900:
Code:
           FrameBuffer Clear : 11392 FPS
                  Color Fill : 11936,15 M-Pixel/s
                      Z Fill : 22943,68 M-Pixel/s
              Color + Z Fill : 11754,96 M-Pixel/s
              Single Texture : 11825,42 M-Pixel/s
  Single Texture Alpha Blend : 7791,339 M-Pixel/s
               Dual Textures : 6220,992 M-Pixel/s
             Triple Textures : 4190,11 M-Pixel/s
               Quad Textures : 3168,377 M-Pixel/s
    1 Floating Poing Texture : 11830,45 M-Pixel/s
              Render to Self : 9139,809 M-Pixel/s

HD3870:
Code:
           FrameBuffer Clear : 12044,8 FPS
                  Color Fill : 11747,41 M-Pixel/s
                      Z Fill : 22407,65 M-Pixel/s
              Color + Z Fill : 9427,118 M-Pixel/s
              Single Texture : 11578,8 M-Pixel/s
  Single Texture Alpha Blend : 6963,384 M-Pixel/s
               Dual Textures : 6140,461 M-Pixel/s
             Triple Textures : 4142,295 M-Pixel/s
               Quad Textures : 3125,595 M-Pixel/s
    1 Floating Poing Texture : 11573,76 M-Pixel/s
              Render to Self : 9104,365 M-Pixel/s
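To make the two result dumps above easier to compare, here is a small sketch that parses a few of the posted rows (decimal commas and all) and prints the HD3870's difference relative to the HD2900. Only rows copied from the posts are used. The parser and variable names are made up for illustration and are not part of FillrateBenchmark.

```python
HD2900 = """
                  Color Fill : 11936,15 M-Pixel/s
                      Z Fill : 22943,68 M-Pixel/s
              Color + Z Fill : 11754,96 M-Pixel/s
  Single Texture Alpha Blend : 7791,339 M-Pixel/s
"""

HD3870 = """
                  Color Fill : 11747,41 M-Pixel/s
                      Z Fill : 22407,65 M-Pixel/s
              Color + Z Fill : 9427,118 M-Pixel/s
  Single Texture Alpha Blend : 6963,384 M-Pixel/s
"""

def parse(block):
    """Turn 'Name : 1234,56 M-Pixel/s' lines into {name: float}."""
    results = {}
    for line in block.strip().splitlines():
        name, value = line.split(":")
        number = value.split()[0].replace(",", ".")  # decimal comma -> point
        results[name.strip()] = float(number)
    return results

old, new = parse(HD2900), parse(HD3870)

# Percent difference of the HD3870 relative to the HD2900, per test.
delta = {name: (new[name] - old[name]) / old[name] * 100.0 for name in old}

for name, pct in delta.items():
    print(f"{name:>28}: {pct:+.1f}%")

worst = min(delta, key=delta.get)
print(f"largest drop: {worst}")
```

In this subset the pure color and Z fills move by a couple of percent, while the combined color + Z test and the alpha blend test drop much more on the narrower bus.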

An additional set of numbers from an 8800GTS-512 board (800-2000/2200 MHz):
Code:
           FrameBuffer Clear : 28019,2 FPS
                  Color Fill : 12205,42 M-Pixel/s
                      Z Fill : 57722,85 M-Pixel/s
              Color + Z Fill : 11369,92 M-Pixel/s
              Single Texture : 12187,81 M-Pixel/s
  Single Texture Alpha Blend : 6163,11 M-Pixel/s
               Dual Textures : 12160,13 M-Pixel/s
             Triple Textures : 12054,43 M-Pixel/s
 
I have an HD2900PRO with the 512-bit memory bus, and some tests suggest that the HD3870 could be memory bandwidth limited. I clocked the HD2900PRO to a core of 845MHz and the memory to 850MHz. In 3DMark06 I was getting around 9000. I upped the memory speed to 910MHz and left the core the same at 845MHz, and the score went up to 9600, a 600 point jump. This would show that memory bandwidth was limiting the core at 845MHz, and that was on a 512-bit memory bus.

Test some real games and then we'll talk ;)
 