AMD: R7xx Speculation

Status
Not open for further replies.
- What's up with those high clocks (on RV770 at least)? We saw zero clock increase going from R600 to RV670 (which involved a die shrink), and now suddenly a ~25% clock increase is possible on the same process technology (with a supposedly still similar architecture)?
RV670 in X2 is clocked at 825MHz and it seems that all HD3870s will be moving to 825MHz. The 777MHz clock appears to be the result of the chip "working when it was expected to need another spin" and the new revision seems to clock higher...

OK, so that's only ~11% over R600.
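
For anyone who wants to check the percentages, here is a rough sketch. The 742MHz figure for R600 (HD 2900 XT) is my assumption and is not stated in this thread; the 1050MHz RV770 clock is only the rumour quoted further down.

```python
def pct_increase(old_mhz, new_mhz):
    """Percentage clock increase going from old_mhz to new_mhz."""
    return (new_mhz / old_mhz - 1.0) * 100.0

R600_MHZ = 742            # assumption: HD 2900 XT core clock, not given in the thread
RV670_X2_MHZ = 825        # RV670 clock on the X2, from the post above
RV770_RUMOUR_MHZ = 1050   # rumoured only, unconfirmed

print(f"R600 -> RV670 @ 825MHz: {pct_increase(R600_MHZ, RV670_X2_MHZ):.1f}%")
print(f"RV670 @ 825MHz -> RV770 rumour: {pct_increase(RV670_X2_MHZ, RV770_RUMOUR_MHZ):.1f}%")
```

Under those assumptions the first figure comes out at roughly 11%, matching the estimate above.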

- Why would AMD go back to a 128-bit memory interface for the low-end part? It doesn't really look faster than its predecessor: even if those 8 texture units are true, it's not going to be much faster, since it already had quite a balanced tex/ALU ratio.
If RV7xx is 4x Z/MSAA per clock then the extra bandwidth would come in handy I suppose :LOL:

Jawed
 
I suspect the architecture is already capable of unifying memory across multiple chips, but we aren't going to find out for sure for a long time. The patent applications appear to put everything in place...

Jawed

So is the RV770 a completely new architecture then? Or is it possible to unify the memory through something else, like the PCB design or new drivers?

Sorry if they are dumb questions.
 
So it's clear -- double texturing per cycle/fragment?!

RV770
Code:
 #0    #1    #2    #3    #4    #5   ||  #0    #1
[ALU] [ALU] [ALU] [ALU] [ALU] [ALU] || [TEX] [TEX]
[ALU] [ALU] [ALU] [ALU] [ALU] [ALU] || [TEX] [TEX]
[ALU] [ALU] [ALU] [ALU] [ALU] [ALU] || [TEX] [TEX]
[ALU] [ALU] [ALU] [ALU] [ALU] [ALU] || [TEX] [TEX]

RV670
Code:
 #0    #1    #2    #3   ||  #0   
[ALU] [ALU] [ALU] [ALU] || [TEX]
[ALU] [ALU] [ALU] [ALU] || [TEX]
[ALU] [ALU] [ALU] [ALU] || [TEX]
[ALU] [ALU] [ALU] [ALU] || [TEX]

Legend:
[ALU] -- a quad of 5-way VLIWs;
[TEX] -- a TMU quad;
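
A quick sketch to tally the units implied by the diagrams above, assuming each [ALU] box is a quad of 5-way VLIW units (20 stream processors) and each texture box is a quad of 4 TMUs. The function name and layout arguments are just mine for illustration.

```python
SPS_PER_VLIW = 5     # 5-way VLIW, per the legend
VLIWS_PER_QUAD = 4   # each [ALU] box is a quad of VLIW units
TMUS_PER_QUAD = 4    # each texture box is a quad of TMUs

def unit_counts(rows, alu_cols, tex_cols):
    """Return (stream processors, TMUs) for a rows x columns block diagram."""
    alu_quads = rows * alu_cols
    tex_quads = rows * tex_cols
    stream_processors = alu_quads * VLIWS_PER_QUAD * SPS_PER_VLIW
    tmus = tex_quads * TMUS_PER_QUAD
    return stream_processors, tmus

print("RV770:", unit_counts(rows=4, alu_cols=6, tex_cols=2))  # (480, 32)
print("RV670:", unit_counts(rows=4, alu_cols=4, tex_cols=1))  # (320, 16)
```

That works out to 480 SPs / 32 TMUs for the RV770 layout and 320 SPs / 16 TMUs for RV670, so the diagram is consistent with the rumoured counts.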
 
So is the RV770 a completely new architecture then? Or is it possible to unify the memory through something else, like the PCB design or new drivers?
I suspect R7xx is essentially R6xx with tweaks - a bit like RV670 has tweaks to enable D3D10.1.

What I can gather is that it'll be a die that's physically optimised for at least a paired multi-GPU configuration - though I still have nagging suspicions over this (to do with the fact that a graphics card presents itself as a single PCI Express port). Additionally, the hint from that nearly 1 year old Q&A session is that it'll have a more efficient implementation in terms of the libraries used at the chip level - which is where the relatively large (for ATI) jump in clocks comes from. Whether that'll also benefit die size is hard to know, but that rumour implies power will also benefit.

Jawed
 
Jawed, thanks for summarising. I'm definitely looking forward to seeing the performance of this thing. We might finally be getting Crysis in high detail/res around the 30fps mark by mid-year.

I just hope NV pull an absolute monster out of the bag like they did with the G80.
 
the hint from that nearly 1 year old Q&A session is that it'll have a more efficient implementation in terms of the libraries used at the chip level
ZOMG Fast14 :!:


Heh, sorry guys, someone had to do it :oops:
 
So is the RV770 a completely new architecture then? Or is it possible to unify the memory through something else, like the PCB design or new drivers?

Sorry if they are dumb questions.

RV770: 480 streams (96 pipelines), 1050MHz core, 32 TMUs
RV670: 320 streams (64 pipelines), 825MHz core, 16 TMUs

On paper, a single RV770 should be about as fast as the Radeon HD 3870 X2 1GB (2x RV670).
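
To put rough numbers on that comparison, here is a minimal sketch of theoretical peak shader throughput. It assumes each stream processor retires one MADD (counted as 2 flops) per clock, uses the rumoured clocks above, and ignores Crossfire scaling, bandwidth and everything else that matters in real games.

```python
def peak_gflops(stream_processors, core_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak GFLOPS, assuming one MADD (2 flops) per SP per clock."""
    return stream_processors * flops_per_sp_per_clock * core_mhz / 1000.0

rv770 = peak_gflops(480, 1050)     # rumoured specs
rv670 = peak_gflops(320, 825)
hd3870_x2 = 2 * rv670              # two RV670s, assuming perfect scaling

print(f"RV770:      {rv770:.0f} GFLOPS")
print(f"RV670:      {rv670:.0f} GFLOPS")
print(f"HD 3870 X2: {hd3870_x2:.0f} GFLOPS")
print(f"RV770 / X2: {rv770 / hd3870_x2:.2f}")
```

On these assumptions a single RV770 lands at about 95% of the X2's paper peak, without needing multi-GPU scaling to get there.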

Question: 2200MHz GDDR5 is rumoured for RV770. Is that 1100x2 = 2200MHz effective, or 2200x2 = 4400MHz effective, on a 256-bit bus?
 
RV770: 480 streams (96 pipelines), 1050MHz core, 32 TMUs
RV670: 320 streams (64 pipelines), 825MHz core, 16 TMUs

On paper, a single RV770 should be about as fast as the Radeon HD 3870 X2 1GB (2x RV670).

Given that Crossfire is a bit hit and miss (as is SLI), I would expect its average performance to be quite a bit higher than the X2's. More likely, its average performance will be equal to or better than the X2's best performance (in comparison to a single 3870).

RV770 sounds like a very nice GPU, definitely capable of crushing NV on paper, but we all know how that can turn out. I'm hoping for something that's comfortably faster than the 9800GTX, but I expect that whatever NV launches as the high end after that will crush it (although I hope it doesn't, for the sake of competition).
 
I also wonder how in the heck ATI has RV770 clocked at 1050MHz. It would have to be on 45nm, a shrink from 55nm.
 
I also wonder how in the heck ATI has RV770 clocked at 1050MHz. It would have to be on 45nm, a shrink from 55nm.

That would go a ways towards explaining that Chiphell shot claiming to be an RV770 that looks approximately the size of RV670, if it is indeed 45nm. Maybe it is roughly the same size, but just pumped with more juice to achieve a higher clock, hence the higher TDP?
___________________________________________________________________


At any rate, broken down, we're looking at an X2 product with >2TF.

My question is how this will stack up against an Nvidia 384-shader product with 2000MHz shaders, equaling roughly 2.3TF. Does the MUL reach a point of diminishing returns, so that the computational power actually used lands somewhere between the 2x MADD number (1.536TF) and the fully specced 2.3TF, at a lower utilization percentage than what we see in, say, G92? Is the color interpolation etc. a fairly static usage of the MUL, or will it increase with shader usage? I know this is probably a question for the GT200 thread, but I wonder if the 2+1 approach could bite them in the ass compared to ATi's full-on MADD approach.
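
For reference, the TF figures above can be reproduced with a small sketch. It counts a MADD as 2 flops and the extra MUL as 1 flop per shader per clock; `mul_utilization` is a made-up knob for illustration, not a measured value, and all the chip specs are rumours.

```python
MADD_FLOPS = 2  # multiply-add counted as 2 flops
MUL_FLOPS = 1   # the extra multiply counted as 1 flop

def peak_tflops(shaders, shader_mhz, flops_per_clock):
    """Theoretical peak in TFLOPS; flops_per_clock is per shader."""
    return shaders * flops_per_clock * shader_mhz / 1_000_000.0

def effective_tflops(shaders, shader_mhz, mul_utilization):
    """MADD peak plus whatever fraction (0..1) of the MUL is usable for shading."""
    madd = peak_tflops(shaders, shader_mhz, MADD_FLOPS)
    mul = peak_tflops(shaders, shader_mhz, MUL_FLOPS)
    return madd + mul_utilization * mul

print(effective_tflops(384, 2000, 0.0))          # MADD only: 1.536
print(effective_tflops(384, 2000, 1.0))          # MADD + MUL: 2.304
print(2 * peak_tflops(480, 1050, MADD_FLOPS))    # 2x RV770: 2.016
```

So on paper a dual RV770 sits between the two Nvidia numbers, and where the real comparison falls depends on how much of the MUL is actually available.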

Also, since PhysX will be accessible through CUDA, could the MUL's unused capacity (since it's not used in general shading) handle these computations, allowing for potentially free PhysX performance?

_________________________________________________________________

At any rate, I'm pretty happy to see these specs. It's more or less what I was personally expecting TDP/spec-wise, although I was thinking it would keep directly in line with RV670 and have 24 RBEs/ROPs and 24 TMUs. While the increase in TMUs (however it comes to be, TF or otherwise) is welcome and long overdue, I wonder how 4 quads of RBEs/ROPs is still going to work, going on its umpteenth implementation.

Also, while some may think these parts sound tame in comparison to what Nvidia is bringing to the table, and may question whether the X2 Crossfire-esque implementation will be competitive with Nvidia's single huge chip, remember this speculative thought: we'll probably see RV770s priced at roughly $200-250 or below, much like the 37xx is, similar in price to G92 but, judging by the specs, trumping it. Nvidia will probably have the more powerful chip on the high end, but I don't see how they can cut it down to be price/performance comparable to what we'll see out of RV770, considering how much that die will cost to fab and how many shader/texture/etc. units they would need to cut to salvage a cheaper part and make it feasible. We'll probably see a GTS (384-bit, 288 shaders or something?) priced smack dab in between the RV770 and the GT200, maybe competing with the X2, but again, I don't see how it could feasibly compete with it. I question whether that part will be worth the ching over RV770, or whether it won't be beaten handily by an X2.

To me, it looks like ATi could truly have a great performance-category chip on their hands, with no competition other than the 8800GT, and it could very well replace it in the eyes of the consumer. I know I'm not alone when I say that's the market I look at: just something that will keep Crysis (and such) a decent amount over 30fps at 1920x1200 with half-decent settings. If that's true, and if it is even moderately as successful as the 8800GT, this could be great news for ATi. The possibility seems even greater because these cards can run Crossfire on Intel chipsets like the upcoming P45, unlike the 8800GT.

Not trying to rant, but I think this could be exactly what AMD is shooting for, and they could very well hit the mark...for the first time in quite a while.
 
My question is how this will stack up against an Nvidia 384-shader product with 2000MHz shaders, equaling roughly 2.3TF.
1TFLOP is the rumour.

Does the MUL reach a point of diminishing returns, so that the computational power actually used lands somewhere between the 2x MADD number (1.536TF) and the fully specced 2.3TF, at a lower utilization percentage than what we see in, say, G92? Is the color interpolation etc. a fairly static usage of the MUL, or will it increase with shader usage? I know this is probably a question for the GT200 thread, but I wonder if the 2+1 approach could bite them in the ass compared to ATi's full-on MADD approach.
Arguably the number of attributes per pixel shader instruction should diminish - i.e. pixel shaders will get longer and longer but the number of attributes that need to be interpolated will increase relatively slowly, so the number of instruction slots consumed by attribute interpolation will tend to decline. Well, I'm guessing. So, that'd leave more instruction slots for this ALU to perform SIN/RSQRT etc.

At any rate, I'm pretty happy to see these specs. It's more or less what I was personally expecting TDP/spec-wise, although I was thinking it would keep directly in line with RV670 and have 24 RBEs/ROPs and 24 TMUs. While the increase in TMUs (however it comes to be, TF or otherwise) is welcome and long overdue, I wonder how 4 quads of RBEs/ROPs is still going to work, going on its umpteenth implementation.
Simply doubling the Z per clock will do the job, I reckon. No need for more colour fillrate.

Not trying to rant, but I think this could be exactly what AMD is shooting for, and they could very well hit the mark...for the first time in quite a while.
The way I see it, this coming generation (May/June) R780 (2xRV770) will compete directly against GT200Ultra.

The real question then is whether R780 is architected to compete more effectively than R680 is doing in competing with 8800Ultra.

Jawed
 
The way I see it, this coming generation (May/June) R780 (2xRV770) will compete directly against GT200Ultra.
Well that's kind of a lost competition already isn't it?
RV670x2 / 640 SPs @ 55nm is competing against G80U / 128 SPs @ 90nm.
Let's say that we have RV770x2 / 960 SPs @ 55nm competing with G100 / 384 SPs @ 55nm.
Now that's a 50% increase in SP count for RV770x2 and a 200% increase for G100.
I think it's pretty clear who's going to be the winner in such a competition.
I'd even say that NV will need 'only' 192 SPs @ 55nm to compete with such an RV770x2 card (higher clocks on the same 55nm process will compensate for the performance lag that G80U has right now compared to RV670x2).
(That's assuming no or little architectural changes between 6x0/7x0 and 8x/9x/10x of course.)
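
The SP arithmetic above, as a quick sketch. It only compares raw unit counts within each vendor's own line-up; ATI and NV SPs are not directly comparable units, and the G100 figure is a rumour.

```python
def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0

# Raw SP counts, generation over generation (rumoured for the new parts)
ati_old, ati_new = 640, 960   # RV670x2 -> RV770x2
nv_old, nv_new = 128, 384     # G80U -> G100

print(f"ATI: +{pct_change(ati_old, ati_new):.0f}% SPs")  # +50%
print(f"NV:  +{pct_change(nv_old, nv_new):.0f}% SPs")    # +200%
```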

I'm not so impressed with these specs.
If we're still talking about 6x0 architecture then these specs are pretty weak from my point of view.
It's like we're 2 years into the DX10 era and all we have is some miserable 50% performance improvement over the first generation of DX10 chips? Completely unimpressive if true. And it can backfire badly if G100 is what it's rumoured to be.
 
So far it looks promising.

RV770 core
45nm (maybe)
480 streams (96 pipelines)
32 TMUs
16 ROPs
1050MHz GPU core
Fixed AA problem
GDDR5, 256-bit (maybe), 2200MHz x2 = 4400MHz effective, ~141GB/s bandwidth.
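
The bandwidth figure follows from bus width times effective data rate. A minimal sketch, covering both readings of the rumoured "2200MHz" (treating it as already-effective, or as doubled to 4400MHz effective); neither is confirmed.

```python
def bandwidth_gb_s(bus_width_bits, effective_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s / 1000.0

print(bandwidth_gb_s(256, 4400))  # 140.8 GB/s if 2200MHz x2 = 4400MHz effective
print(bandwidth_gb_s(256, 2200))  # 70.4 GB/s if 2200MHz is already the effective rate
```

Only the 4400MHz-effective reading gives the ~141GB/s quoted in the list.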

If a single RV770 beats the GF9800GTX, we may have another R300.
 
Does anyone think there's a possibility that ATI's next generation of products will have shaders clocked separately from the core? If not, do any of you think that ATI can compete against NV without it?
 
Does anyone think there's a possibility that ATI's next generation of products will have shaders clocked separately from the core? If not, do any of you think that ATI can compete against NV without it?

Most likely not, since RV770 is just a tweak of RV670. "It resembles the R3xx to R4xx transition."

As for competing with Nvidia, a high GPU clock speed should help ATI some.
 
I don't know why you base your argument on SPs since that's the single area of ATI's architecture where there's no problem whatsoever.
Judging by DX10 application benchmarks, I'd say that you're wrong here.
There is a problem with resource utilization which makes 6x0's 320 SPs slower than 8x's 128 SPs most of the time in real-world applications, even after correcting for clocks.
SPs are the area of a chip that interests me the most, because this is where we still don't have nearly enough power for DX10 applications.
Though it is a good point that RV770's improvements in other areas could bring more than a 50% performance improvement overall -- in the general sense of all applications combined. But do you really want more performance in something like FEAR, or would you rather get more performance in something like Crysis? If it's the latter, then imho SP power is more important than anything else, and a 50% increase in that power, even coupled with (doubtful, btw) 1050 MHz core clocks, isn't that impressive -- you still won't get playable framerates on a single RV770 at Very High settings.

If a single RV770 beats the GF9800GTX, we may have another R300.
It certainly won't be another R300, because it's already more of an R420 really.
And I don't think that RV770 will compete with G92... Although if we're talking about the upper mid-range, then NV's decision to cancel a real G9x top-end card can really bite 'em in the ass down the road -- they'll have G100 at the top end, but between that and G92 they'll have... nothing? And that's an area where RV770 can really shine, yeah.
 