AMD: R9xx Speculation

Do you have a source for measured Z-rate between GF100 and Cypress? Don't think I've ever seen that covered in a review.

Edit: there's this but it's nowhere close to 2.5x.

The best I can offer is +97% for the GTX 480 compared to the HD 5870. But that's Z-only at 2560x1600, so it's probably not achievable in games.
 
Let me add some prices to my previous assumptions:

HD 67xx series
HD6750: Turks XT = 16 ROPs, 40 TMUs, 160 4D shaders, 128-bit memory bus (@ 900 MHz ~ 1,152 GFLOPS; real-world gaming perf. between HD5770 and HD5830; TDP 110W) / Oct. 2010 ~ $179
HD6770: Barts Pro = 32 ROPs, 72 TMUs, 280 4D shaders, 256-bit memory bus (@ 700 MHz ~ 1,568 GFLOPS; real-world gaming perf. between HD5830 and HD5850; TDP 135W) / Nov. 2010 ~ $229

HD 68xx series
HD6850: Barts XT = 32 ROPs, 80 TMUs, 320 4D shaders, 256-bit memory bus (@ 850 MHz ~ 2,176 GFLOPS; real-world gaming perf. slightly below HD5870; TDP 175W) / Oct. 2010 ~ $299
HD6870: Cayman Pro = 32 ROPs (revised down from 48), 112 TMUs, 440 4D shaders, 256-bit memory bus (@ 725 MHz ~ 2,552 GFLOPS; real-world gaming perf. HD5870 +20%; TDP 210W) / Nov. 2010 ~ $399

HD 69xx series
HD6950: Cayman XT = 32 ROPs (revised down from 48), 120 TMUs, 480 4D shaders, 256-bit memory bus (@ 850 MHz ~ 3,264 GFLOPS; real-world gaming perf. on par with HD5970; TDP 260W) / Oct. 2010 ~ $499
HD6970: 2x Barts XT = 64 ROPs, 160 TMUs, 640 4D shaders, 2x 256-bit memory bus (@ 825 MHz ~ 4,224 GFLOPS; real-world gaming perf. HD5970 +20%; TDP 280W) / Nov. 2010 ~ $599
HD6990: 2x Cayman XT = 64 ROPs (revised down from 96), 240 TMUs, 960 4D shaders, 2x 256-bit memory bus (@ 675 MHz ~ 5,184 GFLOPS; real-world gaming perf. HD5970 +40%; TDP 295W) / Dec. 2010 ~ $699
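The peak-throughput figures in the list above can be reproduced with simple arithmetic. A minimal Python sketch, assuming each "4D shader" is a 4-lane VLIW unit and that every lane retires one multiply-add (2 FLOPs) per clock; the helper names are illustrative only:

```python
# Peak single-precision throughput implied by the speculative SKU list.
# Assumption: a "4D shader" = 4 VLIW lanes, each doing one MAD (2 FLOPs) per clock.

LANES_PER_4D_SHADER = 4
FLOPS_PER_LANE_PER_CLOCK = 2  # one multiply-add = multiply + add


def peak_gflops(shaders_4d: int, clock_mhz: int) -> float:
    """Peak GFLOPS for a given count of 4D shaders at a given core clock."""
    lanes = shaders_4d * LANES_PER_4D_SHADER
    return lanes * FLOPS_PER_LANE_PER_CLOCK * clock_mhz / 1000.0


# (name, 4D shaders, core clock in MHz), copied from the speculation
SKUS = [
    ("HD6750 (Turks XT)", 160, 900),
    ("HD6770 (Barts Pro)", 280, 700),
    ("HD6850 (Barts XT)", 320, 850),
    ("HD6870 (Cayman Pro)", 440, 725),
    ("HD6950 (Cayman XT)", 480, 850),
    ("HD6970 (2x Barts XT)", 640, 825),
    ("HD6990 (2x Cayman XT)", 960, 675),
]

if __name__ == "__main__":
    for name, shaders, clock in SKUS:
        print(f"{name}: {peak_gflops(shaders, clock):,.0f} GFLOPS")
```

For example, 160 × 4 lanes × 2 FLOPs × 900 MHz gives 1,152 GFLOPS, i.e. roughly 1.15 TFLOPS.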

Possible?

Is there more information to the HD6990 available?
 
From what I can tell, ATI/AMD now defines its chips by power use, so perhaps we can assume that the current design rules will apply going forward as well.

HD x6xx = no PCI-E power adapter. = <75W
HD x7xx = 1 PCI-E 6 pin power adapter = <150W
HD x8xx = 2 PCI-E 6 pin power adapters = <225W
HD x9xx = 1 PCI-E 6 pin, 1 PCI-E 8 pin power adapters = <300W
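These tiers line up with the PCI Express power limits: up to 75 W from the x16 slot, 75 W per 6-pin and 150 W per 8-pin auxiliary connector. A small Python sketch of that arithmetic; the tier-to-connector mapping is the speculation above, not a confirmed AMD rule, and the function name is hypothetical:

```python
# In-spec board power ceilings from PCI Express power delivery limits.
SLOT_W = 75        # x16 slot
SIX_PIN_W = 75     # each 6-pin auxiliary connector
EIGHT_PIN_W = 150  # each 8-pin auxiliary connector


def board_power_ceiling(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Maximum in-spec board power for a given auxiliary connector layout."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


# Connector layout per SKU tier, as speculated in the post (hypothetical mapping)
TIERS = {
    "HD x6xx": {},
    "HD x7xx": {"six_pin": 1},
    "HD x8xx": {"six_pin": 2},
    "HD x9xx": {"six_pin": 1, "eight_pin": 1},
}

if __name__ == "__main__":
    for tier, connectors in TIERS.items():
        print(f"{tier}: up to {board_power_ceiling(**connectors)} W")
```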

A good way to prevent people from taking a lower end SKU and overclocking it to much higher performance is to limit the available power.

There is no such thing as "available power" when using a PCIe adapter.
 
Then again, with little to no competition, perhaps AMD will turn into the new Nvidia and start pushing prices across the board into ridiculous territory again, rather than remaining the refreshing ATI of a couple of years ago that set a pricing precedent by drastically lowering prices across the board with the RV7xx series, forcing Nvidia to abandon its ridiculous pricing policies of the time.

Companies are going to make choices based on the competitive environment, available resources and their long term strategies. They don't have personalities that sway them in any one particular direction.

AMD has to relish the opportunity to increase their ASPs. The "sweet spot" strategy was right for that time since they were still reeling from their last big chip which caused them all sorts of grief. Wouldn't surprise me in the least if they see an opening to attack Nvidia aggressively on the gaming performance front given the power consumption headroom available to them. I suspect ASPs scale up faster than cost and are a bigger profit driver.

The fixed-function "attribute interpolation" unit was removed from RV870.
But interpolation can sometimes steal a large percentage of ALU cycles when you pass many attributes from the vertex shader.
I observed 20% of cycles lost.

I think AMD should bring that enhanced fixed-function block for "attribute interpolation"
back on HD6x00.
A hardwired attribute interpolation unit is cheap (in terms of die size).
It's much easier to gain performance by bringing it back than by adding hundreds of SPs.
 
Adding attribute interpolation back to the front end probably isn't an option. Pre-calculating and storing pixel attributes for all the threads in flight doesn't seem like the best way to go about things, especially when tessellating. They could add interpolation instructions to the shader core à la Nvidia, but I'm not sure how that would work on a VLIW. Can different lanes in the VLIW issue instructions of varying latency? I imagine you could decompose a long-latency instruction into multiple shorter instructions that could be co-issued alongside the regular ALU ops.

The fixed-function "attribute interpolation" unit was removed from RV870.
But interpolation can sometimes steal a large percentage of ALU cycles when you pass many attributes from the vertex shader.
I observed 20% of cycles lost.
How much slower was the frame rate?

Months ago when I looked at the interpolations being compiled, the compilation quality was awful. Do you have an opinion on the compilation quality for your code?

Bear in mind that fixed function interpolation adds latency to the time it takes to create a hardware thread of fragments. You can't see how long it takes because you can't see the cycle count for the instructions issued to the interpolator.

GPUSA reports the number of fixed function interpolation instructions it executes, per fragment, for GPUs that are older than R600. If your shader is SM3 or older then you should be able to see how many interpolation instructions would execute on the older GPUs.

For some reason AMD did not implement this statistic for R600 onwards.

I think AMD should bring that enhanced fixed-function block for "attribute interpolation"
back on HD6x00.
A hardwired attribute interpolation unit is cheap (in terms of die size).
It's much easier to gain performance by bringing it back than by adding hundreds of SPs.
Fixed-function interpolation can also be a bottleneck. It's not quite as clear-cut as it appears. I remember a discussion around here where some games ran slower than expected on HD5770, and the conclusion was attribute interpolation. But I think there have also been games that ran faster than expected on HD5770, and that too was reckoned to be due to attribute interpolation. Proving the latter is fairly tricky, though.
 
They could add interpolation instructions to the shader core à la Nvidia, but I'm not sure how that would work on a VLIW.

IMHO it should work better with VLIW, as I see no reason why they couldn't pack the interpolation instructions in with the regular shader code (as long as the interpolants aren't used right from the start, of course), helping ILP and thereby utilization.
 
It should work quite well. 1D interpolation requires two ALU slots; 2D interpolation can be done with 2 instructions requiring 2+2 slots. So, in contrast to Nvidia (at least the pre-Fermi designs, but I think they kept that?), this doesn't use the "special" ALU unit.
I'm not sure it can really make HD5770 that much slower (don't forget that HD48xx couldn't actually do enough interpolations to feed all units), but I guess you could always come up with some examples. The performance loss of Cedar vs. RV710 (at the same clocks), however, was indeed said to be caused by interpolation "stealing" ALU cycles (confirmed by Dave), since with Cedar's lower ALU/tex ratio, interpolation potentially makes twice the difference there.
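To make the "interpolation steals ALU cycles" point concrete, here is a minimal Python sketch that interpolates a vertex attribute from barycentric coordinates in plane-equation form, attr = P0 + i·(P1 − P0) + j·(P2 − P0). The formulation and function names are assumptions for illustration, not the actual Evergreen instructions; the point is that each component costs two multiply-adds, consistent with the "two ALU slots" figure quoted above, so every extra attribute passed from the vertex shader consumes ALU slots before any real shading work starts:

```python
from typing import List, Tuple

Vec = List[float]
Plane = Tuple[Vec, Vec, Vec]


def setup_plane(p0: Vec, p1: Vec, p2: Vec) -> Plane:
    """Per-triangle setup: base value plus the two edge deltas."""
    d1 = [b - a for a, b in zip(p0, p1)]
    d2 = [c - a for a, c in zip(p0, p2)]
    return p0, d1, d2


def interpolate(plane: Plane, i: float, j: float) -> Tuple[Vec, int]:
    """Evaluate attr = P0 + i*d1 + j*d2 per component; count the MADs issued."""
    base, d1, d2 = plane
    out: Vec = []
    mads = 0
    for b, a1, a2 in zip(base, d1, d2):
        t = a1 * i + b  # multiply-add #1
        t = a2 * j + t  # multiply-add #2
        mads += 2
        out.append(t)
    return out, mads


if __name__ == "__main__":
    # Hypothetical triangle with a float4 color attribute at each vertex.
    plane = setup_plane([0.0, 0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 1.0])
    color, mads = interpolate(plane, 0.25, 0.5)
    print(color, mads)  # [0.25, 0.5, 0.0, 1.0] 8
```

A float4 attribute therefore costs 8 multiply-adds per fragment in this model; a shader reading eight such attributes would spend 64 before its first "useful" instruction.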
 
We have a very very nice kind of speculation here:

Antilles: 699 US$

Cayman XT: 429 US$, 35% faster than HD 5870 on average
Cayman Pro: 339 US$, 25% faster than HD 5870 on average, matching GTX 480 512 SP performance wise
Cayman LE: 269 US$, 15% faster than HD 5870 on average

Barts XT: 199 US$, ~HD 5850 performance, good OC ability
Barts Pro: 149 US$, GTX 460 1 GB performance

Turks XT: 99 US$, Juniper XT performance
Turks Pro: 79 US$, Juniper Pro performance
Turks LE: 69 US$, less than Turks Pro for sure, hehehe

Caicos: 49 US$ and lower, a placeholder until Fusion takes over the low-end segment.

Now, tell me how nVidia will bathe in blood facing these adversaries until 28 nm arrives?

http://semiaccurate.com/forums/showthread.php?p=67549#post67549

http://semiaccurate.com/forums/showpost.php?p=67541&postcount=655


Very nice, indeed. That's what I expect too. :oops:
 
My speculation:
Cayman XT: 1920SP(30 SIMD)/120TMU/32ROP/256bit
Barts XT: 1280SP(20 SIMD)/80TMU/16ROP/256bit
Turks XT: 512SP(8 SIMD)/32TMU/8ROP/128bit
Caicos: 128SP(2 SIMD)/8TMU/4ROP/64bit

I think Turks would be slower than Juniper. If Turks matched Juniper's performance, it would require a 6-pin connector, which is highly unlikely for a Redwood replacement.
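The SP and TMU counts in this speculation all follow from the SIMD counts. A Python sketch, assuming (as the numbers imply, not as AMD has stated) 16 VLIW units per SIMD, each 4 lanes wide, plus 4 TMUs per SIMD. Setting the width to 5 instead gives Evergreen-style SIMDs, e.g. Cypress's 20 SIMDs yielding 1600 SPs and 80 TMUs:

```python
SIMD_LANES = 16    # VLIW units per SIMD (assumed)
TMUS_PER_SIMD = 4  # texture units per SIMD (assumed)


def derive(simds: int, vliw_width: int = 4) -> tuple:
    """Return (stream processors, TMUs) for a chip with `simds` SIMDs."""
    sps = simds * SIMD_LANES * vliw_width
    tmus = simds * TMUS_PER_SIMD
    return sps, tmus


SPECULATION = {"Cayman XT": 30, "Barts XT": 20, "Turks XT": 8, "Caicos": 2}

if __name__ == "__main__":
    for chip, simds in SPECULATION.items():
        sps, tmus = derive(simds)
        print(f"{chip}: {simds} SIMDs -> {sps} SPs / {tmus} TMUs")
    print("Cypress (5-wide):", derive(20, vliw_width=5))
```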
 
IIRC, ATi's RBEs are tied to the memory controller so you wouldn't have two chips with the same memory interface and different number of RBEs as you have listed here with Barts and Cayman.
 
RV770: 16 ROPs / 256-bit
Cypress: 32 ROPs / 256-bit

If Barts had 32 ROPs, it would be too big for a mid-range product.
 
IIRC, ATi's RBEs are tied to the memory controller so you wouldn't have two chips with the same memory interface and different number of RBEs as you have listed here with Barts and Cayman.
You can't have random ratios like 3:2, but multiples of 2 are possible... 4/8/16/32/64 ROPs per 64/128/256/512bit interface are all possible combinations (in theory, of course, many of them wouldn't be practical)
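The ratio rule above can be written as a small check. This sketch assumes one RBE = 4 ROPs and one memory channel = 64 bits, and treats any power-of-two RBE-to-channel ratio as allowed ("in theory", as the post says); the function names are hypothetical:

```python
from fractions import Fraction

ROPS_PER_RBE = 4       # assumed: one render back-end = 4 ROPs
BITS_PER_CHANNEL = 64  # assumed: one memory channel = 64 bits


def is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def rbe_per_channel(rops: int, bus_bits: int) -> Fraction:
    """RBEs per 64-bit memory channel, reduced to lowest terms."""
    return Fraction(rops // ROPS_PER_RBE, bus_bits // BITS_PER_CHANNEL)


def valid_pairing(rops: int, bus_bits: int) -> bool:
    """True if the RBE:channel ratio is a power of two (1:1, 2:1, 1:2, ...)."""
    if rops <= 0 or bus_bits <= 0:
        return False
    if rops % ROPS_PER_RBE or bus_bits % BITS_PER_CHANNEL:
        return False
    ratio = rbe_per_channel(rops, bus_bits)
    # Fraction keeps numerator and denominator coprime, so a power-of-two
    # ratio means both parts are powers of two (one of them will be 1).
    return is_pow2(ratio.numerator) and is_pow2(ratio.denominator)


if __name__ == "__main__":
    for name, rops, bus in [("RV770", 16, 256), ("Cypress", 32, 256),
                            ("3:2 case", 24, 256)]:
        print(name, rbe_per_channel(rops, bus), valid_pairing(rops, bus))
```

RV770 (16 ROPs, 256-bit) gives 1 RBE per channel and Cypress (32 ROPs, 256-bit) gives 2, both valid; 24 ROPs on a 256-bit bus is the 3:2 case and fails.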
 