AMD: R9xx Speculation

If it is the VLIW5->VLIW4 drivers causing the confusing results... will AMD continue to support/patch Cayman until they work properly, or will AMD drop them the moment 28nm cards are out?

Well, that will be a year or more down the road, so AMD have plenty of time to work on the drivers in the meantime, I think.
 
If it is the VLIW5->VLIW4 drivers causing the confusing results... will AMD continue to support/patch Cayman until they work properly, or will AMD drop them the moment 28nm cards are out?

Cayman just came out. Why wouldn't they support it? That wouldn't make sense.
 
That is not going to happen, as AMD just released their next-generation (originally 32nm) part, the 6970, on 40nm.

I am also under the impression that it takes 2 years to go from design of a new GPU to production, so unless AMD started this next-generation 40nm design a year and a half ago (and they didn't), there will be no 6970+ coming in 2011.

A "next generation" part is whatever you make of it. Barts/Cayman is next gen, and whatever succeeds them will also be next gen, as it will undoubtedly include changes of some sort. Whether a total overhaul or just piling on more functional units, or in between.

My point is that in late 2011, if 28nm is not ready, ATI has enough spare die area that they could launch a ~500 mm² chip on 40nm, with enough extra transistors to make a next-gen chip there, as Cayman is 389 mm² while GF110 is 530 mm². Nvidia, conversely, really seems to have no option but to wait for 28nm to do anything, anything at all.

I am not speaking of a 6970+, but rather a 7XXX.
 
My point is that in late 2011, if 28nm is not ready, ATI has enough spare die area that they could launch a ~500 mm² chip on 40nm, with enough extra transistors to make a next-gen chip there, as Cayman is 389 mm² while GF110 is 530 mm².

I am not speaking of a 6970+, but rather a 7XXX.
My point is AMD would have had to be working on that 7XXX part over a year ago already for it to be ready to enter the production pipeline, which takes 4 months to produce parts.

That didn't happen, so there will be no 7XXX on 40nm.
 
My point is AMD would have had to be working on that 7XXX part over a year ago already for it to be ready to enter the production pipeline, which takes 4 months to produce parts.

That didn't happen, so there will be no 7XXX on 40nm.

And I bet they have a much better idea of the availability of upcoming nodes than you do, and have for a long time. I very much doubt their engineers are sitting on their hands.
 
And I bet they have a much better idea of the availability of upcoming nodes than you do, and have for a long time. I very much doubt their engineers are sitting on their hands.

Of course they aren't sitting on their hands. How would you eat cake with no hands?
 
I'm not impressed with ATI's tessellation technology at all. The whole point of DX11 tessellation is that you don't need to tessellate a whole patch before you start rasterizing, so buffer requirements for a good algorithm are minimal. Still, even if you have to buffer the tessellator output in RAM, the bandwidth cost should be very low, so as kludgy as this solution is, it should work. Carsten's results, however, still show the horrible scaling from before.
In many cases you will tessellate an entire patch before rasterizing it because the DS runs between these stages and it can take a while.
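To put a rough number on the "bandwidth cost should be very low" claim quoted above, here's a back-of-envelope sketch in Python. Cayman's ~880 MHz clock and ~176 GB/s bandwidth are public specs, but the per-vertex size, amortization and write+read-back pattern are my assumptions about a hypothetical spill path, not AMD's actual implementation:

Code:
# Rough cost of spilling tessellator output through memory (all
# assumptions, not AMD's documented behaviour).
tris_per_sec   = 880e6  # assume ~1 triangle/clock at Cayman's 880 MHz
verts_per_tri  = 1.0    # amortized: adjacent triangles share vertices
bytes_per_vert = 8      # a (u, v) domain-location pair, two 32-bit floats

# Written once by the tessellator, read back once for the domain shader
spill = tris_per_sec * verts_per_tri * bytes_per_vert * 2 / 1e9
print(f"~{spill:.1f} GB/s of a ~176 GB/s budget (~{spill / 176:.0%})")

Even at full tessellator rate that's on the order of 14 GB/s, under a tenth of the card's bandwidth.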

That is not going to happen, as AMD just released their next-generation (originally 32nm) part, the 6970, on 40nm.

I am also under the impression that it takes 2 years to go from design of a new GPU to production, so unless AMD started this next-generation 40nm design a year and a half ago (and they didn't), there will be no 6970+ coming in 2011.
While it does take a long time to design a new architecture, some things can change along the way. Process technology can't change on a whim, but it doesn't require 2 years of work either.

I wonder if they would still be using VLIW5 if DX10 had adopted ATI's earlier version of tessellation. That would amplify the triangles before the vertex shader stage, presumably leading to better utilization of the 5 slots. Does anyone know if the DX11 tessellation stages (domain, hull) use all five slots?
It wouldn't have made a difference. ATI's and DX11's tessellation differ in pattern and data flow. The pattern is irrelevant to the shading, and the data-flow changes wouldn't significantly affect the core shader code. The HS would likely be the most different, though.
 
I think the 6950 seems like a very good deal, if games are all you care about.

Unfortunately for me, it isn't. I also want compute value in the form of impressive speed gains in video editing, like with Adobe Premiere Pro CS5, and right now only nVidia offers that.

Yeah, same here (although I use Vegas Pro 10, so CUDA for encoding), hence I just ordered a GTX 580 for $509 with the Mafia II game free. My new business has me doing lots of video editing and I use a GTX 470 right now. I'll put the 580 in the gaming PC downstairs and then it can migrate to my office machine as a CUDA card once something better comes along for games. Now someone needs to start a 300-page 6990 speculation thread :)
 
While it does take a long time to design a new architecture, some things can change along the way. Process technology can't change on a whim, but it doesn't require 2 years of work either.

IIRC it was said somewhere that AMD "lost" ~6 months of work due to the process change.
 
There really aren't that many apps that are setup bound. You can see from any straight geometry test that, in terms of setup, the dual geometry engines are working fine. There are more improvements to come from the drivers with regard to the tessellation changes, though.
Okay, I meant "geometry bound."

Resolution scaling when the CPU isn't an issue tells you a lot about how geometry bound you are. In Dirt2, compare the HD 6000 series:
http://www.guru3d.com/article/radeon-6950-6970-review/21
to the GeForces:
http://www.guru3d.com/article/geforce-gtx-570-review/14

Using a regression with pixel count, at 1920x1200 - the resolution that Guru3D chose for its full comparison - the 6970 is spending 62% of its time on resolution-independent workload, while the GTX 570 is only spending 47% of its time there.

Per extra pixel, the 6970 is 15% faster than the GTX 580. Yet the latter is still 11% faster even at the highest resolution (2560x1600, 8xAA, 16xAF) because the 6970 spends 60% more time on the resolution-independent stuff.

You can also see that the 6870 doesn't scale any differently from the 6970, and this shows in the similar resolution-independent time of ~8 ms per frame. Cayman's doubled geometry throughput isn't working very well in this game and others.

------------------------------------

The regression model is not perfect, since quad/tile count scales a bit differently from pixel count, but the results are decent. The CPU doesn't seem to be a factor, as multi-GPU setups get 150+ fps.
Code:
Measured FPS:
        12x10   16x12   19x12   25x16
6870    87      76      70      52
6970    94      83      77      61
570     106     90      80      58
580     123     104     96      68

Regression results:
                6870    6970    570     580
ns per pixel    2.78    2.04    2.80    2.36
ms per frame    7.8     8.1     5.8     5.0

Fitted FPS (1 / (ms_per_frame * 1e-3 + ns_per_pixel * 1e-9 * screen_pixel_count)):
        12x10   16x12   19x12   25x16
6870    87.0    75.8    70.2    52.0
6970    92.8    83.2    78.1    60.7
570     105.3   89.2    81.4    57.8
580     123.0   104.5   95.5    68.0
ATI's shader architecture is as efficient as ever at shading pixels from cars, the environment, and post-processing. Unfortunately, it's much slower at resolution-independent stuff, like drawing reflection/shadow maps and cranking out triangles for the main scene, which points to far slower geometry processing.
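For anyone who wants to reproduce those numbers, here's a minimal sketch of the fit in Python/NumPy. It assumes a plain least-squares line of frame time against pixel count, which is my reading of the regression described above rather than necessarily the exact method used:

Code:
import numpy as np

# Guru3D's test resolutions and the measured FPS from the table above
pixels = np.array([1280*1024, 1600*1200, 1920*1200, 2560*1600], dtype=float)
fps = {
    "6870": [87, 76, 70, 52],
    "6970": [94, 83, 77, 61],
    "570":  [106, 90, 80, 58],
    "580":  [123, 104, 96, 68],
}

# Model: frame_time = fixed_time + per_pixel_time * pixel_count.
# The intercept is the resolution-independent time per frame,
# the slope is the cost per pixel.
for card, rates in fps.items():
    frame_time = 1.0 / np.asarray(rates, dtype=float)  # seconds per frame
    slope, intercept = np.polyfit(pixels, frame_time, 1)
    print(f"{card}: {intercept * 1e3:.1f} ms/frame fixed, "
          f"{slope * 1e9:.2f} ns per pixel")

Running it lands within rounding distance of the table above (e.g. ~7.8 ms fixed and ~2.78 ns/pixel for the 6870).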
 
I think Nvidia are missing a trick by not having a 3GB card, but I guess it's a pretty difficult proposition getting a GF110 and twelve 2Gbit GDDR5 chips onto a single board with adequate cooling and low enough power requirements. Maybe Nvidia should work on their own power limiter that isn't app-based.
Memory isn't hurting NVidia at all. Look at 2560x1600 with no AA. The GTX 580 is only 5.2% faster than the 6970.
http://www.computerbase.de/artikel/grafikkarten/2010/test-amd-radeon-hd-6970-und-hd-6950/26/
 
On the other hand, the performance delta between the 5870 and 5970 is bigger at 4× AA (16% at 1920×1200/4× vs. 10% at 1920×1200/8×).
 
"The 6970; its definately NOT another R300! - R300King!"


...by the way, does anyone have a real photo of the chip's insides, maybe an X-ray or something? I'd like to see if there are any disabled features or anything still floating around... you know, extra stream processors... more TMUs... a SIMD... an ROP... maybe another MAD, MUL or ADD... a Z/Stencil... a side port... a MHz or two we missed, anything!
 
ATI's shader architecture is as efficient as ever at shading pixels from cars, the environment, and post-processing. Unfortunately, it's much slower at resolution-independent stuff, like drawing reflection/shadow maps and cranking out triangles for the main scene, which points to far slower geometry processing.

I think it's mainly things like shadow maps, render targets and all the other stuff that needs to be written and then post-processed that is the main fps killer. And that's all about RAM bandwidth and caches.
I mean, how much difference is there between the 5800's memory and cache architecture and the 6900's, if you don't count 4 more SIMDs and a few GB/s more RAM bandwidth?
 
In many cases you will tessellate an entire patch before rasterizing it because the DS runs between these stages and it can take a while.
The domain shader is basically a vertex shader with a compact input stream. If you can buffer enough vertices to run a VS, then you can buffer more than enough barycentric pairs to run a DS.
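To make that concrete, a quick size comparison; the byte counts and the buffer size here are illustrative assumptions, not figures from any GPU's documentation:

Code:
# How many inputs fit in the same fixed-size buffer for each stage.
VS_VERTEX_BYTES = 32         # e.g. float3 position + float3 normal + float2 uv
DS_INPUT_BYTES  = 8          # a (u, v) barycentric pair, two 32-bit floats
BUFFER_BYTES    = 16 * 1024  # hypothetical on-chip input buffer

print("VS vertices buffered:", BUFFER_BYTES // VS_VERTEX_BYTES)  # 512
print("DS inputs buffered:  ", BUFFER_BYTES // DS_INPUT_BYTES)   # 2048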
 
I think it's mainly things like shadow maps, render targets and all the other stuff that needs to be written and then post-processed that is the main fps killer. And that's all about RAM bandwidth and caches.
No, it isn't. RAM-heavy stuff and post-processing are in the per-pixel category, where the 6970 is faster than the GTX 580 here. I don't think you understood my post. The 6970's disadvantage comes from workload that is unrelated to pixel count.

Shadow maps and reflection maps are geometry-limited because the former has very simple pixels (40 GPix/s is 0.025 ns/pix, as opposed to the 2 ns/pix extracted from the data), while the latter has simple pixels and reduced resolution (e.g. 6x512x512 for a cube map).

Bandwidth alone could give the GTX 580 a 2-3% advantage over the 6970, but not more than that. Games are generally under 30% BW limited.
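Both figures check out on the back of an envelope; the bandwidths below are the public specs, while the 30% share is the assumption stated above:

Code:
# Sanity check of the two claims above.
gtx580_bw = 192.4  # GB/s (384-bit GDDR5 at 4008 MT/s effective)
hd6970_bw = 176.0  # GB/s (256-bit GDDR5 at 5500 MT/s effective)

bw_advantage = gtx580_bw / hd6970_bw - 1  # ~9.3% more raw bandwidth
print(f"overall gain if 30% BW-limited: {0.30 * bw_advantage:.1%}")  # ~2.8%

# Shadow-map fill: 40 GPix/s -> 1/40e9 s = 0.025 ns per pixel, ~80x
# cheaper than the ~2 ns/pixel fitted from the Dirt 2 data, so those
# passes are bound by geometry, not fill.
print(f"fill cost at 40 GPix/s: {1e9 / 40e9:.3f} ns per pixel")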
 
"The 6970; its definately NOT another R300! - R300King!"


...by the way, does anyone have a real photo of the chip's insides, maybe an X-ray or something? I'd like to see if there are any disabled features or anything still floating around... you know, extra stream processors... more TMUs... a SIMD... an ROP... maybe another MAD, MUL or ADD... a Z/Stencil... a side port... a MHz or two we missed, anything!
Every launch since R520 we've seen rumours about something being disabled. R520 had 32 pipelines with 16 of them disabled, R580 was in fact R600 with unification disabled, R600 had... almost everything disabled, Cypress had 400 SPs disabled, Barts has 2 SIMDs disabled and Cayman even 4 of them...
 