ATI to delay 80nm GPU migration?

Kaotik said:
RV570 isn't really "cutdown" R580 with those specs, since R580 is 16-1-3-1, that thing has double the z units


I certainly hope that stays true, ..

From the very launch of the 1x00 series, three things have always stuck in my mind as I thought, "My ultimate design video card would have these ________ features."

namely:

1: 512bit ring bus memory architecture
2: 3:1 Rv530 shader ratio
and lastly
3: Rv515 double Z

From the very outset one could see that the rv560 (1600)'s 3:1 ratio was the future, and it didn't take long to theorize (rightly so) that the 580 would basically be 4 RV530 quads, each with a 3:1 ratio of shader processors. So that gets us to where we are today .. however, that leaves the double Z offered by the Rv515 seemingly abandoned. IMHO the 515 is great in its own right (small package, low power, Dbl-z, 8*32 memory possible).

Correct me if I am wrong, but iirc the Rv530 (1300) did exceptionally well when 2x FSAA was applied. I believe it was said by some to be "free fsaa".

so add this "free fsaa" via the dbl-z (correct?) with the 3:1 ratio and ring bus memory controller and it would seem that you have a very forward thinking product that uses tech that is already in place.

Again purely speculative on my part however I really HOPE to see a higher performance part (250-350 USD- 256bit, 12-16 pipe GDDR3) in a X-X-3-2 config.


Then again, I'm still waiting for dongleless 1800 Xfire and x800 series cards.
 
Couple corrections:

1. It's not so much that the ring bus is 512b (as, AFAIK, all 256b DDR external bus cards have two 256b SDR internal paths) as that it seems to offer an instruction window to reorder memory access requests.

2. RV530 = 1600, and it has double-Z (4-1-3-2) and the ring bus (albeit just 256-bit, as it's got just a 128-bit external bus). It can't have 8*32b memory crossbar as it's only got 128 external bits (4*32b) to work with.

3. RV515 = 1300, and I don't think it's got dbl z (4-1-1-1). I know it doesn't have the ring bus. So, it's basically a 9600 but with an SM3 featureset.

4. I don't think double-Z/-stencil helps with AA, just with stencil shadows and the like. I may be wrong about AA, though, as I don't fully understand MSAA.
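For anyone trying to keep the x-x-x-x shorthand straight, here is a rough Python sketch. It assumes the four fields are pixel pipes, texture units per pipe, shader ALUs per pipe, and Z samples per pipe per clock. That reading is an assumption based on how the numbers are used in this thread, not an official ATI definition, and the names `PipeConfig`, `parse_config`, and `totals` are made up for illustration.

```python
from typing import NamedTuple


class PipeConfig(NamedTuple):
    # Field meanings are an assumption inferred from the thread.
    pipes: int         # pixel pipes
    tex_per_pipe: int  # texture units per pipe
    alu_per_pipe: int  # shader ALUs per pipe (the "3" in the 3:1 ratio)
    z_per_pipe: int    # Z samples per pipe per clock (2 = "double Z")


def parse_config(text: str) -> PipeConfig:
    """Parse shorthand such as '4-1-3-2' into its four fields."""
    parts = text.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {text!r}")
    return PipeConfig(*(int(p) for p in parts))


def totals(cfg: PipeConfig) -> dict:
    """Chip-wide totals implied by the per-pipe figures."""
    return {
        "shader_alus": cfg.pipes * cfg.alu_per_pipe,
        "z_per_clock": cfg.pipes * cfg.z_per_pipe,
    }


# Configurations as quoted in this thread.
for name, text in [("R580", "16-1-3-1"), ("RV530", "4-1-3-2"), ("RV515", "4-1-1-1")]:
    print(name, text, totals(parse_config(text)))
```

Under that reading, the hoped-for 12-1-3-2 RV570 would work out to 36 shader ALUs and 24 Z samples per clock.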

Dunno about "free" 2xAA with either RV515 or RV530. I'm hoping with you that RV570 will be 12-1-3-2 and so kick ass at $300. A $200 8-1-3-2 RV560 sounds pretty good, too. What'll that do to RV530 and RV515, though? Will we lose the XT from the X1600 series to make room for X1700 Pros and the like, or are we going to see 256MB X1300s at $79, X1600s range from $99 to $179, X1700s range from $199 to $279, and X1900GTOs stay at $299?

As for dongleless X1800 and X800 Xfire, don't hold your breath too long. In the comments section of FiringSquad's Oblivion benchmarks, Brandon mentioned he asked ATI for this, and they said--and I quote--"no." Maybe they're just throwing him off the scent, though. :)
 
maybe the RV570 is first intended for laptops and OEM, like NV41 and NV42.
I thought NV41 would bring the 256bit bus to the "high midrange" segment, then I thought it could be the NV42, but I was wrong. The exception is the recent PCIe 6800GS, but that's just NV clearing stock.
 
Pete said:
Couple corrections:

1. It's not so much that the ring bus is 512b (as, AFAIK, all 256b DDR external bus cards have two 256b SDR internal paths) as that it seems to offer an instruction window to reorder memory access requests.

2. RV530 = 1600, and it has double-Z (4-1-3-2) and the ring bus (albeit just 256-bit, as it's got just a 128-bit external bus). It can't have 8*32b memory crossbar as it's only got 128 external bits (4*32b) to work with.

3. RV515 = 1300, and I don't think it's got dbl z (4-1-1-1). I know it doesn't have the ring bus. So, it's basically a 9600 but with an SM3 featureset.

4. I don't think double-Z/-stencil helps with AA, just with stencil shadows and the like. I may be wrong about AA, though, as I don't fully understand MSAA.

Dunno about "free" 2xAA with either RV515 or RV530. I'm hoping with you that RV570 will be 12-1-3-2 and so kick ass at $300. A $200 8-1-3-2 RV560 sounds pretty good, too. What'll that do to RV530 and RV515, though? Will we lose the XT from the X1600 series to make room for X1700 Pros and the like, or are we going to see 256MB X1300s at $79, X1600s range from $99 to $179, X1700s range from $199 to $279, and X1900GTOs stay at $299?

As for dongleless X1800 and X800 Xfire, don't hold your breath too long. In the comments section of FiringSquad's Oblivion benchmarks, Brandon mentioned he asked ATI for this, and they said--and I quote--"no." Maybe they're just throwing him off the scent, though. :)

thanks much for the clarification .. R--Rv--515, 580, 520, 530, 526, 570.. 1300, 1600, 1800, 1900.. Pro, XT, XTX, XL, GTO ... too many codenames and acronyms to remember them all, never mind keeping them straight .. lol

edit: ok, this appears where I got confused: http://www.beyond3d.com/reviews/ati/rv5xx/index.php?p=07

ATI's Radeon series since R300 have doubled their Z sample rate when multi-sample anti-aliasing (MSAA) is enabled, such that 2x MSAA is achievable in a single clock cycle, 4x in two and 6x in three. With the X1300 performance we can see this is still the case as the Z fillrate is very similar with 2x FSAA or without. X1600 already features a double Z rate and we can see that this is actually still maintained with 2x FSAA enabled, effectively quadrupling the Z fillrate with MSAA.
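The cycle counts in that quote follow from simple arithmetic. A minimal Python sketch, assuming the hardware resolves a fixed two MSAA samples per clock (the figure the review gives for R300 onward); the function name `msaa_cycles` is made up for illustration:

```python
import math


def msaa_cycles(samples: int, samples_per_clock: int = 2) -> int:
    """Clock cycles needed to produce `samples` MSAA samples for one pixel."""
    if samples < 1 or samples_per_clock < 1:
        raise ValueError("sample counts must be positive")
    return math.ceil(samples / samples_per_clock)


# 2x, 4x and 6x MSAA at two samples per clock.
for level in (2, 4, 6):
    print(f"{level}x MSAA: {msaa_cycles(level)} cycle(s)")
```

Whether the extra Z rate on X1600 changes `samples_per_clock` for MSAA itself is exactly the open question, so the default is left at two.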

I guess what I desire to know is the relation (if any) between Z sampling and MSAA, in particular the effect of dbl-Z and its effect on msaa rates.
 
geo said:
Well, the impetuousness of youth has its advantages too. They make us old farts more patient so we don't just throttle you. Sometimes works, others not. :LOL: But then we're also usually fatter and slower too, so we can't chase you down anyway. ;)

:LOL:
 
Jawed said:
Out making pix :D

I'm highly disappointed by this rumour.

Jawed

And yet, while I'm not willing to buy off on this unreservedly yet, that xbit report very much had the flavor of reading it right off ATI roadmap docs. And we know those (from both IHVs) don't tend to understate such things.
 
Pete said:
It'd be interesting if ATI is going for more redundancy per die to assure yields of full chips but require more ASICs to cover all market segments, whereas NV stays with fewer ASICs and bins partially defective chips to lower segments.
Yep, that's the way I read it. ATI has, in my opinion, a yield model where "every die" turns out fully functional with only clock-binning required - which requires multiple designs to cover every price point.

Expensive to design all these dies. Risky to have so many dies competing with each other for space in the fabs' production lines. Complicated to maintain inventory (though hardly rocket science). All that hassle for good yields? Dare I say it, but if it's worth doing for good yields, then it could well mean that ATI's getting really good yields. Otherwise, well...

Jawed
 
[armchair quarterback mode] I would have to agree that there are going to be too many ASICs out there come summer's end. RV515 and R520 would seem like the expendable ones to me.

The OEMs are increasingly going for the integrated stuff in their systems, and the 4-pipe RV410 core of RS600/690 should satisfy the post-Xpress 200 demand. RV515 as a discrete low-end core could continue to feed the notebook market after migration to 80nm. By the time RV560/570 arrive, RV530 should have also shrunk to 80nm and could get a second life as a great sub-$100 low-end desktop offering. A few RV560 configurations for the mainstream and RV570 takes over the $250-$300 performance segment. Phase out R520 and drop R580 SKUs into $350-450 segment. High-end R580+ refresh for the enthusiast. [/armchair quarterback mode]. :smile:
 
FrameBuffer said:
thanks much for the clarification .. R--Rv--515, 580, 520, 530, 526, 570.. 1300, 1600, 1800, 1900.. Pro, XT, XTX, XL, GTO ... too many codenames and acronyms to remember them all, never mind keeping them straight .. lol
I thought I had this stuff down, but I had to correct myself on the x-x-x-x designations. Friggin' complexity. This is why ppl like saying, "I get 5k 3DMarks." :)

I guess what I desire to know is the relation (if any) between Z sampling and MSAA, in particular the effect of dbl-Z and its effect on msaa rates.
OK, don't quote me on this (seriously, no quote functions--burn this reply when you read it! ;)), but here's my guess based on re-remembering that MSAA uses the same color value but rejiggers the geometry samples. MSAA uses x times the geometry samples, thus requires x times the Z computations per pass. Both NV's and ATI's current MSAA HW implementations allow for up to 2x MSAA per clock, so higher levels (like 4x MSAA for both IHVs or 6x for ATI) require two or three cycles, respectively, to aggregate enough MSAA samples. Doubling the geometry samples apparently implies doubling the Z samples, so 2x MSAA per clock would seem to mean 2x Z per clock. NV, since the FX series (well, technically since NV2A?) could do double Z without as well as with MSAA. RV530, aka X1600, is the first ATI part (dunno about Xenos) to allow double Z without MSAA. But, according to Dave's fillrate testing, that double Z capability seems to carry over to MSAA rather than be superseded by it, in essence doubling the inherently double-Z 2x MSAA per clock. So X1600 seems to be the first part to yield 4x Z samples per 2x MSAA pass.

If any of that makes sense to you, then maybe you're thinking this would translate to super D3 performance. I checked the AF/AA % hit breakdown I did for Rage3D's 6800GS review and indeed X1600 takes a roughly 20% performance hit from 4xAA while all the other cards (6800s and X800s) take a 40% hit. That's a very naive interpretation, tho, as I'm not sure if that's due to X1600's apparently AA-independent dbl-Z, its "ring bus" memory controller, or some other factor (like fillrate:bandwidth ratios). I suppose checking X1800 or X1900 Doom 3 AA perf hits from another R3D review (as they're relatively unique in benching AA separately from AF) may help. If they take a 40% hit like the other architectures, then it's likely X1600's dbl-Z that's responsible. If, on the other hand, they drop just 20%, then I'll be revealed as the clueless nincompoop I am.
 
Originally posted by Pete The Wise: But, according to Dave's fillrate testing, that double Z capability seems to carry over to MSAA rather than be superseded by it, in essence doubling the inherently double-Z 2x MSAA per clock. So X1600 seems to be the first part to yield 4x Z samples per 2x MSAA pass.

So in essence, does this mean that the Rv530 (x1600) delivers 4x fsaa at the performance hit of what would normally be 2x fsaa, or is it a quality issue where 2x FSAA yields higher (4x) AA? By the sound of it, Dbl-Z is a performance issue rather than a way of delivering higher quality. Sorry if I misinterpreted your post.

Looks like I need to do some homework as well..
 
FrameBuffer said:
Originally posted by Pete The Wise: But, according to Dave's fillrate testing, that double Z capability seems to carry over to MSAA rather than be superseded by it, in essence doubling the inherently double-Z 2x MSAA per clock. So X1600 seems to be the first part to yield 4x Z samples per 2x MSAA pass.

So in essence, does this mean that the Rv530 (x1600) delivers 4x fsaa at the performance hit of what would normally be 2x fsaa, or is it a quality issue where 2x FSAA yields higher (4x) AA? By the sound of it, Dbl-Z is a performance issue rather than a way of delivering higher quality. Sorry if I misinterpreted your post.

Looks like I need to do some homework as well..

The way I understand it, X1600 takes a smaller hit from 2xAA in situations where an X1800, for example, would be Z-limited.
 
Pete said:
*snip* NV, since the FX series (well, technically since NV2A?) could do double Z without as well as with MSAA. RV530, aka X1600, is the first ATI part (dunno about Xenos) to allow double Z without MSAA. But, according to Dave's fillrate testing, that double Z capability seems to carry over to MSAA rather than be superseded by it, in essence doubling the inherently double-Z 2x MSAA per clock. So X1600 seems to be the first part to yield 4x Z samples per 2x MSAA pass.

Not sure if NV2A does it, NV2x does not.
R300+ does double-z when doing MSAA, but not without. NV3x+ does double-z with or without MSAA, but not double-double-z when doing MSAA. So, really, NV3x can utilise its extra Z (stencil) hardware when MSAA is not enabled.

:)
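Put as a small lookup table, the summary above looks something like the following Python sketch. The rates are relative (1 = single rate). The R300 and NV3x rows come from the description above, and the RV530 row is only what Dave's fillrate testing appears to show as relayed earlier in the thread, so treat it as an assumption. The names `Z_RATE`, `z_rate`, and `msaa_z_gain` are made up for illustration.

```python
# Relative Z samples per clock, stored as (without MSAA, with MSAA).
Z_RATE = {
    "R300": (1, 2),   # double-z only when doing MSAA (R300 up to, not including, RV530)
    "NV3x": (2, 2),   # double-z either way, no "double-double" with MSAA
    "RV530": (2, 4),  # assumption: double-z stacks with the MSAA doubling
}


def z_rate(arch: str, msaa: bool) -> int:
    """Relative Z rate for an architecture with MSAA on or off."""
    without_msaa, with_msaa = Z_RATE[arch]
    return with_msaa if msaa else without_msaa


def msaa_z_gain(arch: str) -> float:
    """How much the Z rate grows when MSAA is switched on."""
    return z_rate(arch, True) / z_rate(arch, False)


for arch in Z_RATE:
    print(arch, "no MSAA:", z_rate(arch, False), "MSAA:", z_rate(arch, True),
          "gain:", msaa_z_gain(arch))
```

This only models peak Z throughput; it says nothing about bandwidth or where the bottleneck actually sits in a real game.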
 
FrameBuffer said:
Originally posted by Pete The Wise
Way to set me up for a fall! :LOL: Really, it's more like Pete the Sponge.

So in essence, does this mean that the Rv530 (x1600) delivers 4x fsaa at the performance hit of what would normally be 2x fsaa, or is it a quality issue where 2x FSAA yields higher (4x) AA? By the sound of it, Dbl-Z is a performance issue rather than a way of delivering higher quality. Sorry if I misinterpreted your post.
Don't be sorry, be suspicious! :) Remember, I'm not sure what I told you was entirely correct. I'm just letting you in on my thought process. I do so with good intentions more than good understanding.

Edit: Be suspicious of the entire paragraph below. See Xmas', 3dcgi's, and Dave's corrections on the next page.

I doubt RV530 gives us 4xAA at 2xAA speeds. Remember, 4x MSAA by definition requires one more clock cycle than 2x, another pass through the ROPs (or wherever it is those two samples per clock are hiding), so you're losing some performance right there. I'm sure bandwidth factors in, too. Oh, and to be clear, we're talking about MSAA, not FSAA. MSAA doesn't touch (or at least alter) every pixel in the scene, so I'm not sure I'd call it "full scene/screen" (like straightforward SSAA or the V5's AA).

Thanks, Kaotik. That makes sense. It's a question of what's the limiting factor, and RV530's double Z can shift the bottleneck to another part of the GPU.

Well, sure, Andrew, if you want to put it in readable form. Heh. I recall Deano saying that NV2A had double Z, so I thought I'd throw it out there. (I wonder if all MSAA implementations have double Z, starting with GF3/NV20, in which case NV2A would have it by default.)
 
Apart from stencil-fillrate-limited situations, 2xMSAA has been, in a relative sense, "for free" on GPUs for years now.

The only other design I'm aware of that is capable of single cycle 4xMSAA is Falanx' Mali for the PDA/mobile market.
 
http://www.beyond3d.com/forum/showpost.php?p=663192&postcount=14

Based on fillrate tests:

X1600XT

b3d41.gif


X1800XL

b3d42.gif


7800GTX

b3d43.gif


I'm in too much of a rush to gather together the X1900XTX, 7900GTX results - I'm sure someone can manage.

Also, we should have Colour+Z measured fillrate results for the new GPUs from B3D's reviews.

Jawed
 
Pete said:
I doubt RV530 gives us 4xAA at 2xAA speeds. Remember, 4x MSAA by definition requires one more clock cycle than 2x, another pass through the ROPs (or wherever it is those two samples per clock are hiding), so you're losing some performance right there.
Why would it need an extra clock cycle "by definition"?
 
3dcgi said:
Correct, it is a performance optimization.

Well, it occurs to me that RV560/RV570 could be a 4xMSAA-capable GPU with a dinky performance hit. Damn, I'm eagerly awaiting all three of these SKUs.
 