AMD: R9xx Speculation

So by implication the next round of desktop ATI cards should hit in or before October. It would be a bit odd if the mobility parts launched before desktop parts.

Unless those are just rebranding, in which case I'll be very disappointed by ATI. Same goes for Nvidia if 425 isn't a Fermi derivative.

Regards,
SB
 
Unless those are just rebranding, in which case I'll be very disappointed by ATI. Same goes for Nvidia if 425 isn't a Fermi derivative.
I have a hunch that you'll be very disappointed ;)

And no, I'm not implying that I know anything, but there was that supposedly leaked AMD roadmap that showed a Vancouver mobile refresh that is still UVD2 (+Eyefinity), whereas all SI-based chips seem to be UVD3.

I also read a rumor that AMD is/was working on respins/new steppings of their 5xxx mobile chips to improve yields or power consumption (or both).

So my guess is that Vancouver is just a refresh of the Evergreen-based mobile lineup to reduce manufacturing cost and/or improve performance-per-watt, and for marketing reasons it will introduce Eyefinity to the mobile space and use the 6xxx naming scheme, to have "new" products in time for the back-to-school season.

SI-based "real" 6xxx mobile chips probably won't come before Q1/Q2 2011. That's what the leaked roadmap claimed, and mobile chips usually appear a few months after their desktop equivalents, and since SI desktop models seem to be scheduled for Q4/2010 - Q1/2011 time-frame...
 
A Radeon 5850 at a much more realistic price would be worth more than any new 6xxx card ;).
The price/performance is the same as (or worse than?) the 4k cards. The new, faster cards just cost much more than the older cards, at least for people who don't care about Eyefinity (over 90% of consumers) and other "features" (including DX11, which seems to be the DX with the most benchmarks and tech demos :D).


If the 6k cards are on the same leaky TSMC 40nm process as the 5k cards, then don't expect a revolution. If they increase die size or bus width, the new cards will cost more. The price/performance will probably stay the same.
 
A Radeon 5850 at a much more realistic price would be worth more than any new 6xxx card ;).
The price/performance is the same as (or worse than?) the 4k cards. The new, faster cards just cost much more than the older cards, at least for people who don't care about Eyefinity (over 90% of consumers) and other "features" (including DX11, which seems to be the DX with the most benchmarks and tech demos :D).

There's of course power consumption and thus noise, which makes me want to swap my 4850 for a 5750 or even something lower when the prices come down.
 
If the 6k cards are on the same leaky TSMC 40nm process as the 5k cards, then don't expect a revolution. If they increase die size or bus width, the new cards will cost more. The price/performance will probably stay the same.

Obviously we wouldn't expect any price/performance revolutions if they stay on the same node. They could perhaps dramatically increase tessellation performance, but this hardly matters for most games.
 
They can increase efficiency. The Evergreen architecture scales quite poorly with the number of SIMDs (put simply, an HD5850 overclocked to HD5870 clocks performs the same despite having 160 fewer SPs and 8 fewer TMUs).

I expect higher efficiency (per-flop performance), higher tessellation and geometry performance, and better texture filtering.
 
That "efficiency" and tessellation performance is only really an issue on the 58x0, which has sort of outgrown the design. So I don't see much reason to update the lower end (until a new node).
 
That's right for tessellation, but scaling isn't great even on the HD5700 (though I must admit it's much better than on the HD5800):

http://en.inpai.com.cn/doc/enshowcont.asp?id=7688

On average:
400 -> 640 SPs (+60%) = +7.8% gaming performance
720 -> 800 SPs (+11%) = +6% gaming performance
1440 -> 1600 SPs (+11%) = +1.8% gaming performance

The test isn't optimal, and the HD5800 may be CPU limited in some cases, but the results are very similar even for those games that are GPU bottlenecked...
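To put the three data points on one scale, here is a minimal sketch that divides each measured gaming gain by the corresponding SP gain. The helper name `scaling_efficiency` and the idea of calling the ratio "efficiency" are my own framing, not something from the linked test; the inputs are just the averages quoted above.

```python
def scaling_efficiency(units_before, units_after, perf_gain_pct):
    """Return (unit gain in %, fraction of that gain seen as performance)."""
    unit_gain_pct = (units_after / units_before - 1) * 100
    return unit_gain_pct, perf_gain_pct / unit_gain_pct


# (SPs before, SPs after, average gaming gain in %) as quoted in the post above
cases = [(400, 640, 7.8), (720, 800, 6.0), (1440, 1600, 1.8)]

for before, after, perf in cases:
    unit_gain, eff = scaling_efficiency(before, after, perf)
    print(f"{before} -> {after} SPs: +{unit_gain:.1f}% units, "
          f"+{perf:.1f}% performance, efficiency {eff:.2f}")
```

A ratio of 1.0 would mean performance grows in step with the SP count. With these figures the ratios come out at roughly 0.13, 0.54 and 0.16, so the middle step converts about half of its extra SPs into frames while the other two convert far less.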
 
400 -> 640 SPs (+60%) = +7.8% gaming performance
720 -> 800 SPs (+11%) = +6% gaming performance
1440 -> 1600 SPs (+11%) = +1.8% gaming performance

The test isn't optimal, and the HD5800 may be CPU limited in some cases, but the results are very similar even for those games that are GPU bottlenecked...
I think a 6% increase for 11% more SPs, all else being the same, is quite OK for a "balanced" architecture. Clearly, though, the 1.8% gain of the HD58xx for the same 11% increase is not.
(And yes, both the HD5830 and the Juniper HD5670 don't scale normally, but that has nothing to do with bad scaling of the architecture and everything to do with the strange effects of disabling half the ROPs on these chips.)
 
[Image: r8s-avzp1.png]
[Image: r8s-bh93p.png]


These situations evidently aren't CPU limited. The HD5800 barely hits 60 FPS, so a higher resolution would reduce framerates to barely playable levels.

+11% SPs and TMUs translates to a 1-3% performance advantage.

Another example, at more demanding settings:

[Image: r8s-d0uvv.png]
[Image: r8s-ca9sf.png]


HD5970 and 2x HD5850: both have the same core and memory clocks. The HD5970 has an advantage of 320 SPs and 16 TMUs (+11%), but real performance is only 1-3% better, again.
 
That's right for tessellation, but scaling isn't great even on the HD5700 (though I must admit it's much better than on the HD5800):

http://en.inpai.com.cn/doc/enshowcont.asp?id=7688

On average:
400 -> 640 SPs (+60%) = +7.8% gaming performance
720 -> 800 SPs (+11%) = +6% gaming performance
1440 -> 1600 SPs (+11%) = +1.8% gaming performance

The test isn't optimal, and the HD5800 may be CPU limited in some cases, but the results are very similar even for those games that are GPU bottlenecked...

I brought this up some time ago and was almost crucified for it. I speculated that this still R600-based architecture was hitting its limits, with increasingly diminishing returns for each additional SP.
 
I brought this up some time ago and was almost crucified for it. I speculated that this still R600-based architecture was hitting its limits, with increasingly diminishing returns for each additional SP.
We could say the same about Nvidia's SPs: just look at the GTX 285 and GTX 470. The latter is barely 20% faster despite having 208 (about 87%) more SPs.

Now, I know there are many differences between the two, like bandwidth, texture mapping and clocks, but still...
 
These situations evidently aren't CPU limited. The HD5800 barely hits 60 FPS, so a higher resolution would reduce framerates to barely playable levels.
I can't see those graphs, but when talking about architecture scaling, playability is a different matter. However, the higher the performance of the graphics card, the more the CPU is going to be a factor. I have the full suite of data for Cypress 18 vs 20 SIMDs because we analyzed this, and there are many cases where the difference is much greater than seen here; there are also specific cases where it scales to its peak (shader and compute tests).

HD5970 and 2x HD5850: both have the same core and memory clocks. The HD5970 has an advantage of 320 SPs and 16 TMUs (+11%), but real performance is only 1-3% better, again.

That's not an apples-to-apples comparison. One is passing two GPUs' worth of data through one PCI Express link while the other is passing two GPUs' worth of data through two links.
 
There's of course power consumption and thus noise, which makes me want to swap my 4850 for a 5750 or even something lower when the prices come down.

Yep, but the strange thing is that the 57xx cards are just 128-bit, have a smaller die area and lower power consumption, and yet they sell for the same as or more than the old 48xx cards. (Maybe it's just TSMC's 40nm combined with Nvidia's lineup :rolleyes:)
Let's hope new cards won't cost 800 dollars for a similar 4800-->5800 performance increase in the future.
 
Yep, but the strange thing is that the 57xx cards are just 128-bit, have a smaller die area and lower power consumption, and yet they sell for the same as or more than the old 48xx cards. (Maybe it's just TSMC's 40nm combined with Nvidia's lineup :rolleyes:)
Let's hope new cards won't cost 800 dollars for a similar 4800-->5800 performance increase in the future.

And you should know you are thinking about this wrong. Production costs are barely relevant in a "free market", and a product's actual value depends on competition. ATI is just playing by "free market" rules here. Without capable competition from Nvidia or another player, the price is high, of course.
 
I have the full suite of data for Cypress 18 vs 20 SIMDs because we analyzed this, and there are many cases where the difference is much greater than seen here
I'm trying to find weak points that could be improved, in order to extrapolate the likely character of "R9xx" GPUs.

You know the weak points and you know which of them will be solved by R9xx... That makes any discussion quite difficult :)

there are also specific cases where it scales to its peak (shader and compute tests).
Yes. SPs can be fully utilized, but only by synthetic tests, not in games. This implies there's likely something suboptimal in the GPU's front end. Maybe LDS conflicts, maybe something else...
 
Yes. SPs can be fully utilized, but only by synthetic tests, not in games. This implies there's likely something suboptimal in the GPU's front end. Maybe LDS conflicts, maybe something else...

This is a rather significant stretch. Merely looking at some traces from modern and semi-modern games shows that A LOT of stuff goes on to render a single frame, and there are a number of potential sticking points that extra ALUs in particular, or hardware in general, can't help with. There are quite a few things that can be done wrong and are done wrong. Examples: oh wow, I'm going wild with state switching (this will hurt); my vertices are described by a 49-bit (hypothetical) structure (great, I'm messing up the vertex cache). I've also decided that I need [maxvertexcount(lol)] for my geometry shader (this will hurt too). To top it off, I use a lot of clip/discard, branch like I'm on a CPU, and draw the skybox first. And those are only a few examples. There are more nefarious things, like stalling due to improper use of Map/Unmap, etc.

More ALUs allow you to do more math. Great, but that doesn't help with any of the above examples. There are other architectural traits that aren't necessarily upscaled between the SKUs you're comparing (for example, equal export rate from differing stages of the pipe, equal setup rate, equal Hier-Z rate, etc.). What I'm getting at is that ultimately, these are reasonably complex machines running reasonably complex code, and a complex system should not be approached deterministically.
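One rough way to picture the "extra ALUs only help the math" point is an Amdahl's-law style estimate. This is my own simplification, not something the posters used: it assumes a frame splits cleanly into a part that speeds up in proportion to the SP count and a part that doesn't, which is exactly the kind of deterministic shortcut warned about above, so treat the output as an illustration only. The function names are hypothetical; the inputs are the average gains quoted earlier in the thread.

```python
def predicted_speedup(scaling_fraction, unit_ratio):
    """Amdahl's law: only `scaling_fraction` of frame time shrinks with more units."""
    return 1 / ((1 - scaling_fraction) + scaling_fraction / unit_ratio)


def implied_scaling_fraction(unit_ratio, observed_speedup):
    """Invert Amdahl's law to get the fraction of frame time that scaled."""
    return (1 - 1 / observed_speedup) / (1 - 1 / unit_ratio)


# 1440 -> 1600 SPs gave about +1.8% on average; 720 -> 800 SPs gave about +6%
for before, after, gain_pct in [(1440, 1600, 1.8), (720, 800, 6.0)]:
    ratio = after / before
    fraction = implied_scaling_fraction(ratio, 1 + gain_pct / 100)
    print(f"{before} -> {after} SPs: about {fraction:.0%} of frame time "
          f"scaled with SP count under this model")
```

Under that assumption, the quoted averages would correspond to under a fifth of frame time scaling with SP count on the larger chip, against a bit more than half on the smaller one. The rest would be whatever the extra SIMDs can't touch: setup, export, state changes, the CPU, and so on.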
 