AMD confirms R680 is two chips on one board

pjbliverpool:

- R6xx uses 64:1 Z-compression for 4x MSAA. Xenos doesn't.

- R6xx ROPs can process 2 MSAA samples per clock and don't perform hardware resolve, so this architecture needs much less bandwidth per clock: the ROPs wait for the resolve to be performed by the shader core, fewer MSAA samples are generated per clock, and higher compression is applied, so smaller data transfers are spread over longer periods of time. That's why R6xx is far less bandwidth-dependent than Xenos.
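A rough way to see the per-clock argument is to treat each ROP's memory traffic as samples per clock times bytes per sample, divided by the compression ratio. This sketch is illustrative only: the 8 bytes per sample (32-bit colour plus 32-bit Z), the 4 samples per clock on the Xenos-like side and the 4:1 average compression ratio are assumptions, and real compression varies tile by tile (64:1 is a best case for Z, not an average).

```python
def bytes_per_rop_clock(samples_per_clock: int,
                        bytes_per_sample: int = 8,
                        compression_ratio: float = 1.0) -> float:
    """Approximate bytes one ROP moves to memory per clock.

    Assumes 8 bytes per sample (32-bit colour + 32-bit depth) by default;
    compression_ratio is a hypothetical average, not a measured value.
    """
    return samples_per_clock * bytes_per_sample / compression_ratio

# Hypothetical comparison: 2 samples/clock with 4:1 average compression
# versus 4 samples/clock with no framebuffer compression.
r6xx_like = bytes_per_rop_clock(2, compression_ratio=4.0)
xenos_like = bytes_per_rop_clock(4)
print(r6xx_like, xenos_like)  # 4.0 32.0
```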

Just compare performance of these cards:
HD2600XT GDDR3 vs. HD2600XT GDDR4: 57% more bandwidth, 4-7% higher performance
HD2900XT GDDR3 vs. HD2900XT GDDR4: 21% more bandwidth, 0-5% higher performance
HD2900XT vs. HD3870: 32% less bandwidth, 0-12% higher performance
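One way to read these figures is as a scaling ratio: percent of performance gained per percent of bandwidth added (1.0 would mean fully bandwidth-bound). The helper below only divides the numbers quoted above. It assumes nothing else changed between the paired cards, which holds for the GDDR3/GDDR4 pairs but not for HD2900XT vs. HD3870 (different chip and clocks), so that row is left out.

```python
def scaling_ratio(perf_gain_pct: float, bw_gain_pct: float) -> float:
    """Performance gain per unit of bandwidth gain (1.0 = perfectly bandwidth-bound)."""
    return perf_gain_pct / bw_gain_pct

# (card pair, bandwidth gain %, low and high performance gain %) from the list above
pairs = [
    ("HD2600XT GDDR3 -> GDDR4", 57, 4, 7),
    ("HD2900XT GDDR3 -> GDDR4", 21, 0, 5),
]
for name, bw, lo, hi in pairs:
    print(f"{name}: {scaling_ratio(lo, bw):.2f} to {scaling_ratio(hi, bw):.2f}")
```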

Thanks no-X, that's exactly the kind of answer I was driving at. There are in fact technical differences (which I wasn't aware of, aside from the compression differences) that make Xenos need more bandwidth to produce the same visual result.
 
Or you're comparing apples to oranges.



When you stay entirely on-chip, bandwidth to internal RAMs is not a scarce resource. Above a certain size, the area overhead of splitting one RAM into two is minimal.
So once you've decided to go this route, you can design the rest of the architecture while spending recklessly on bandwidth without worrying about efficiency. A direct consequence is that standard methods of comparison become pretty much meaningless, yet you're still pretending that apples are oranges. They are not.
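The bank-splitting point can be put as simple arithmetic: on-die, each extra RAM bank brings its own port, so peak bandwidth scales roughly linearly with the bank count while, above a certain size, area barely grows. The port width and clock below are hypothetical round numbers chosen only to show the scaling, and the sketch assumes no bank conflicts.

```python
def on_die_bandwidth_gbps(banks: int, port_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth (GB/s) of `banks` independent RAMs, each with one port.

    Assumes every bank can be accessed every clock (no bank conflicts).
    """
    bytes_per_clock = banks * port_width_bits / 8
    return bytes_per_clock * clock_mhz * 1e6 / 1e9

# Hypothetical: 1024-bit ports at 500 MHz; doubling banks doubles bandwidth
for banks in (1, 2, 4):
    print(banks, on_die_bandwidth_gbps(banks, 1024, 500))
```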

You only need to read the Xenos article to see that the real bottleneck in the architecture is not in the on-chip bandwidth, but in the interface between the 2 dies.

Let's compare the two GPU architectures for typical cases.

In one case, you're rendering completely covered 2x2 pixel tiles. In a traditional GPU, this results in compression. In Xenos, the data travels compressed to the ROP die and undergoes an 8-fold(!) bandwidth expansion. Sure, you see impressive eDRAM bandwidth usage, but with no additional benefit over a traditional GPU.

In the other case, when no compression is possible, the eDRAM uses only 1/8 of the theoretical maximum, because now you're limited by the link bandwidth.
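Both cases fit in one line: the bandwidth the eDRAM actually sees is the inter-die link bandwidth multiplied by the expansion that happens on the ROP die, capped at the eDRAM's peak. This sketch uses the commonly cited Xenos figures of 32 GB/s for the inter-die link and 256 GB/s for the eDRAM; the 8x expansion for fully covered tiles is taken from the post above, and the link is assumed to be saturated.

```python
LINK_GBPS = 32.0         # commonly cited Xenos parent-to-daughter die link
EDRAM_PEAK_GBPS = 256.0  # commonly cited Xenos eDRAM internal bandwidth

def effective_edram_gbps(expansion: float,
                         link_gbps: float = LINK_GBPS,
                         peak_gbps: float = EDRAM_PEAK_GBPS) -> float:
    """eDRAM bandwidth actually used when the link is saturated.

    expansion = bytes written to eDRAM per byte crossing the link
    (8 for fully covered, compressed tiles; 1 when nothing compresses).
    """
    return min(link_gbps * expansion, peak_gbps)

print(effective_edram_gbps(8))  # fully covered tiles: 256.0 (peak reached)
print(effective_edram_gbps(1))  # uncompressible data: 32.0, i.e. 1/8 of peak
```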

So if you want to compare bandwidth numbers, the meaningful comparison is the bandwidth of a regular GPU's memory interface against (system memory bandwidth + inter-die link bandwidth + read bandwidth for RMW operations - bandwidth to copy from eDRAM to system RAM). By this measure there's no contest: an RV670 will probably beat Xenos by a factor of two, if not more.

The real advantage of the eDRAM arrangement lies not in the bandwidth but in the complexity and area reduction that it allows: no compression logic, much more coherent data streams to system memory (-> a significantly smaller and simpler MC), and drastically lower latency for the eDRAM (-> large area savings due to smaller latency FIFOs).
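The FIFO saving follows from Little's law: to keep a memory client busy, a latency-hiding FIFO must hold roughly latency x request rate entries, so cutting latency shrinks the FIFO proportionally. The latencies and entry size below are hypothetical, picked only to show the proportionality; they are not Xenos or RV670 figures.

```python
import math

def fifo_entries(latency_cycles: int, requests_per_cycle: float) -> int:
    """Minimum FIFO depth to cover `latency_cycles` at full request rate
    (Little's law: items in flight = arrival rate x time in system)."""
    return math.ceil(latency_cycles * requests_per_cycle)

def fifo_bytes(latency_cycles: int, requests_per_cycle: float, entry_bytes: int) -> int:
    """FIFO storage in bytes for the given latency, request rate and entry size."""
    return fifo_entries(latency_cycles, requests_per_cycle) * entry_bytes

# Hypothetical: one 64-byte request per clock, external DRAM vs on-die eDRAM latency
print(fifo_bytes(400, 1.0, 64))  # 25600 bytes
print(fifo_bytes(40, 1.0, 64))   # 2560 bytes
```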

Xenos has a really neat architecture, but it's just too naive to take one number out of context and build a whole argument around it without looking at the whole picture.

Thanks silent_guy, that's another great post which clearly explains why Xenos needs more bandwidth than RV670 to produce the same visual result despite being a weaker chip overall.

For the record though, I wasn't trying to compare apples to oranges. My original aim (which I stated several times) was to identify IF there were indeed differences between the two chips which make the bandwidth requirements different. After much wrangling I think you, along with a couple of others have now answered my original question.

Interestingly though, this newfound information about Xenos' unusually high bandwidth requirements has implications for comparisons with other GPUs where the overall power is much closer and eDRAM is often touted as the deciding factor, but that's a discussion for another thread.
 
What a lot of us have been trying to say, and you've been refusing to see, is this: the comparison is flawed, period. Why, you ask? Because it's ill-constructed. Of course scenarios can be built where you gobble up 500GB/s of bandwidth... scenarios that could run on, let's say, an X1900XTX. Just like you can build a fragment program that makes the 8800 Ultra cry, literally. Do those two things mean that:

a) If something can be made to require 500GB/s on an X1900XT, are all newer chips, being more advanced, automatically always bandwidth-limited, or are those more likely to be corner/extreme cases that will be coded around?
b) If such a shader does exist, is the Ultra ridiculously underpowered in terms of math ability and in need of 1024 SPs at 5GHz, or is that simply something that will exist only in theory/be coded around in real life?

That was the basic point: the RV670 is adequate for what will happen in 99% of scenarios, not for a hypothetical 1% (the percentages aren't necessarily accurate; I've used them merely to illustrate a point). That's the way of life, bitch that she is: you make tradeoffs, and in this case the extra BW wouldn't have justified the increased R&D costs/PCB complexity, since it wouldn't have alleviated a major bottleneck in that architecture.

There have been a number of reasons given in this thread, and in the linked article, that explain why Xenos was engineered as it was, and if you sum them up you'll see that it didn't end up that way because of some incredible demand for bandwidth. And does anybody know if devs are actually using the eDRAM as it was meant/touted to be used initially (meaning tiling in order to enable 4x AA at 720p)? Maybe in that scenario you'd end up hitting the theoretical 256GB/s limit (which, btw, is also limited by inter-die communication rates and by BW to main memory), but when porting to PC you'd probably shave off quite a bit of that hit thanks to the BW-saving techniques present in desktop GPUs and absent in Xenos.
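The tiling requirement is easy to check: multiply out the per-sample framebuffer footprint and divide by the eDRAM size. This sketch assumes 32-bit colour plus 32-bit depth/stencil per sample and the commonly cited 10 MiB of Xenos eDRAM, and it ignores any tile-alignment padding.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10 MiB eDRAM (commonly cited Xenos figure)

def tiles_needed(width: int, height: int, msaa: int,
                 bytes_per_sample: int = 8) -> int:
    """Number of screen tiles so each tile's colour + Z fits in eDRAM.

    Assumes 4 bytes colour + 4 bytes depth/stencil per sample.
    """
    framebuffer = width * height * msaa * bytes_per_sample
    return math.ceil(framebuffer / EDRAM_BYTES)

for aa in (1, 2, 4):
    print(f"720p {aa}xAA: {tiles_needed(1280, 720, aa)} tile(s)")
```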
 
What a lot of us have been trying to say, and you've been refusing to see, is this: the comparison is flawed, period. Why, you ask? Because it's ill-constructed.

You're still assuming I was trying to argue that RV670 is bandwidth-limited. I wasn't. If anything, I was using the fact that RV670 isn't bandwidth-limited to make an observation about the usefulness of all that bandwidth in Xenos.

It's now been well demonstrated that Xenos does need all that bandwidth but isn't able to use it to better effect in rendering due to other limitations of its architecture. This is exactly the question I wanted answered when I started this, and now it has been, so as far as I'm concerned the argument was successful, even if I did have to go round the houses to get to the answer.
 
pjbliverpool said:
Wow, way to miss the point completely there ninelven.

The point is that RV670 MUST be bandwidth limited in some situations if even weaker GPUs can make use of more.
Oh I didn't miss anything.. every GPU ever made MUST be bandwidth limited in some situations. For that matter, every GPU ever made was also CPU bound, fillrate bound, shader bound, etc... in some situation.
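That point can be made concrete with a simple bottleneck model: a frame (or pass) takes as long as its slowest resource, so which unit "limits" a GPU depends on the workload, not just the chip. The resource names, capacities and workloads below are hypothetical, and the model assumes the resources overlap perfectly.

```python
def limiting_resource(work: dict[str, float], rate: dict[str, float]) -> tuple[str, float]:
    """Return the resource with the largest work/rate time, and that time.

    Assumes resources overlap perfectly, so the slowest one sets the pace.
    """
    times = {res: work[res] / rate[res] for res in work}
    slowest = max(times, key=times.get)
    return slowest, times[slowest]

# Hypothetical per-second capacities of one GPU
rate = {"bandwidth_GB": 100.0, "fill_Gpix": 10.0, "shader_GFLOP": 500.0}

# Two hypothetical workloads on the same GPU
heavy_blend = {"bandwidth_GB": 1.2, "fill_Gpix": 0.1, "shader_GFLOP": 2.0}
long_shader = {"bandwidth_GB": 0.3, "fill_Gpix": 0.05, "shader_GFLOP": 15.0}

print(limiting_resource(heavy_blend, rate))  # bandwidth-bound
print(limiting_resource(long_shader, rate))  # shader-bound
```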
 

Yes, but the point was whether those situations are so rare that eDRAM is effectively useless in Xenos in 99% of cases.

The answer is clearly no because architectural peculiarities mean it requires more bandwidth to produce the same result.
 

I don't want to unnecessarily prolong the OT discussion, but I would like to reiterate one of silent_guy's points. Once the decision was made that eDRAM was necessary to get enough bandwidth, it wasn't much more costly to ensure that bandwidth would never bottleneck the ROPs.

Also, I wouldn't say the bottleneck is the bandwidth between dies as it's designed to match the number of ROPs. So if there is a bottleneck there it's the combination of the two, not just the link bandwidth.
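The "never bottleneck the ROPs" sizing can be checked with the commonly cited Xenos figures of 8 ROPs at 500 MHz and 4 samples per ROP per clock. Assuming 4 bytes of colour and 4 bytes of Z per sample, each read once and written once (blending plus depth testing, the worst case), the ROPs' peak demand lands exactly on the 256 GB/s eDRAM figure. Treat this as a back-of-the-envelope reconstruction under those assumptions, not an official derivation.

```python
ROPS = 8
CLOCK_HZ = 500e6
SAMPLES_PER_ROP_CLOCK = 4   # assumed: 4x MSAA at full rate
BYTES_PER_SAMPLE = 4 + 4    # assumed: 32-bit colour + 32-bit depth/stencil
RMW_FACTOR = 2              # read + write for blending and depth testing

def peak_rop_demand_gbps() -> float:
    """Worst-case ROP framebuffer traffic in GB/s under the assumptions above."""
    bytes_per_second = (ROPS * CLOCK_HZ * SAMPLES_PER_ROP_CLOCK
                        * BYTES_PER_SAMPLE * RMW_FACTOR)
    return bytes_per_second / 1e9

print(peak_rop_demand_gbps())  # 256.0
```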
 
pjbliverpool said:
Yes, but the point was whether those situations are so rare that eDRAM is effectively useless in Xenos in 99% of cases.
Did you really expect the engineers would dedicate 1/3 of the transistor budget to something that was useless in 99% of cases? I mean I know everyone has been down on ATI/AMD lately but damn...

[strike]Finally, WTF does all this have to do with R680?[/strike]n/m
 
I think pjbliverpool should just go ahead and create a "PC Defense Force" (if it doesn't already exist). This seems like nothing more than another chapter in his quest to "prove" the console GPUs are crap, which is a prevalent "theme" in most of his posts.
 
I think there was an excellent thread somewhere, where some developers discussed coding for the different platforms. They code for their machine and the strengths and weaknesses of that machine.
 
Did you really expect the engineers would dedicate 1/3 of the transistor budget to something that was useless in 99% of cases? I mean I know everyone has been down on ATI/AMD lately but damn...

Of course not. That's why I asked for a proper explanation of the reasons which drive the different bandwidth requirements in my first post.

Unfortunately some people still don't seem to have grasped exactly what it was that I was asking.
 
I think pjbliverpool should just go ahead and create a "PC Defense Force" (if it doesn't already exist). This seems like nothing more than another chapter in his quest to "prove" the console GPUs are crap, which is a prevalent "theme" in most of his posts.

Thanks for your useful contribution there NRP. Typically insightful as always.

It's actually quite interesting how you seem to have drawn a conclusion from this discussion which points to "console GPUs being crap". Care to share how you arrived at that little nugget of wisdom?
 
pjbliverpool said:
That's why I asked for a proper explanation of the reasons which drive the different bandwidth requirements in my first post.
Yet when people politely explained the situation to you, you ignored what was presented and continued along the line of "either Xenos didn't benefit greatly from having all that extra bandwidth or that RV670 was bandwidth limited on at least a partially regular basis," which would be wrong.

Furthermore, this very topic has been discussed, at length, on this very forum. You could have bothered to read the Xenos article, searched the console forum, taken this topic to PMs, or made your post as a new topic in the console forum or even this forum.

Instead, we have 4+ pages reaching the stunning conclusion that developers make the most of the hardware that is available to them.
 
I have an even better idea:

Any more news on availability of the 3870X2 parts? Any more rumors on price, performance, and the possibility of that "new spin" of the RV670 core?
 
When does the NDA lift? (Quick search didn't yield anything)

Last I heard was 23rd for the HD3400 and HD3600. And 28th for the HD3870X2.

Edit: Don't be surprised if the HD3870X2 launches sooner than the 28th.
 