AMD: R9xx Speculation

Actually, I think you're confusing this with reviews where the card was included

No, he's probably thinking of the supposed memory controller tweaks that brought big wins in Doom 3 and a few other OGL titles, IIRC.
 
Actually, I think you're confusing this with reviews where the card was included later on and where games like NfS Carbon or Gothic 3 were benchmarked. In those games, for example, Nvidia's tightly integrated texturing showed its dark side, so that made the X1800 look way better in comparison.
I think Silent_Buddha is right on this point. It wasn't only the OpenGL driver, which boosted 4x MSAA performance by tens of percent; even non-MSAA performance improved slightly. I also remember AoE3 and one other game which scored desperately at first, but whose performance was boosted really significantly. Looking at my notes, every second Catalyst release at that time brought some (measurable) performance improvement for the X1 series.

Anyway, some of those improvements can't be re-run, e.g. the shader replacement in 5.11, the dual-core/HT optimisations in 5.12, the new OpenGL shader compiler in 6.8, etc.
 
I think serious reviewers will retest Cayman when Antilles comes out, presumably with Cat 11.1… That may help. They'll probably do the same when the GTX 560 is released, though if that happens at the same time as Antilles, it won't have any effect.
 
No, he's probably thinking of the supposed memory controller tweaks that brought big wins in Doom 3 and a few other OGL titles, IIRC.

Yes, the gains for these titles were very significant.
I had an X1950 Pro and remember this as the single biggest driver gain in ATi history.
Sadly, that was thanks to the relatively weak GL drivers prior to that update.

I bet though that in a year's time the HD6970 will be a lot closer to the GTX580 in new games than the HD5870 is to the GTX480 now.
 
Ah, ok. You're talking about individual titles. Then that may be true.
I was thinking more of a broader increase of performance, and that didn't happen unfortunately.
 
The per-frame time really is the killer here. We're waiting for Dave to confirm whether the command processor is what he meant by front end, but that would be a real shame.
Perhaps he means a combination of command processing and juggling the two geometry engines (called "graphics engines" for some reason I can't fathom) is the root of a lot of inefficiency.

http://www.techreport.com/articles.x/20126/2

By contrast, Cayman has the ability to setup and rasterize two triangles per clock cycle. I'm not sure it quite tracks with what you're seeing in the simplified diagram above, but Cayman has two copies of the logic block that does triangle setup, backface culling, and geometry subdivision for tessellation. Load-balancing logic distributes DirectX tiles between these two vertex engines, and the processed tiles are then fed into one of Cayman's two 12-SIMD shader blocks. Interestingly, neither vertex engine is tied to a single shader block, nor vice-versa. Future variants of this architecture could have a single vertex engine and dual shader blocks—or the reverse.
So the overheads in scheduling for these two geometry engines lead to ~90% throughput on "standard untessellated" geometry?

I suppose a fruitful comparison could be made with GTX460 running at 880MHz. Or with Cayman down-clocked to GTX460's speed.
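Just to put rough numbers on that comparison, here's a trivial back-of-envelope sketch (Python; the triangles/clock figures are the nominal block-diagram numbers from above, the 90% is only the hypothetical scheduling efficiency being discussed, and none of this accounts for raster, shader or memory limits):

```python
# Back-of-envelope peak setup rates: nominal unit counts x clock.
# All numbers are assumptions for illustration, not measurements.

def gtris_per_s(tris_per_clock, clock_mhz, efficiency=1.0):
    """Nominal triangle throughput in Gtris/s, derated by an efficiency factor."""
    return tris_per_clock * clock_mhz * 1e6 * efficiency / 1e9

print("Cayman, 2 tris/clk @ 880 MHz     :", gtris_per_s(2, 880), "Gtris/s")
print("Cayman at ~90% scheduling eff.   :", gtris_per_s(2, 880, 0.9), "Gtris/s")
print("GTX 460 (2 GPCs) OC'd to 880 MHz :", gtris_per_s(2, 880), "Gtris/s")
print("HD 5870, 1 tri/clk @ 850 MHz     :", gtris_per_s(1, 850), "Gtris/s")
```

On paper an 880MHz GTX460 and Cayman would then have the same nominal rate, so any per-clock efficiency difference should show up fairly directly in such a test.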

To be clear, I expect that some of the per frame time is raster related, as shadow/reflection maps aren't 100% geometry limited, but if much of the rest is command processor limited then that would be quite baffling.
How good is Z compression for shadow buffers?
 
If history is anything to go by, then broad and significant performance improvements are often hinted at/promised, but very seldom delivered. I remember seeing 2-3 articles comparing over 5 driver versions on cards, and generally the improvements have been in the 1-5% range. It could be argued that now, in the DX11 era, things are all different and the potential for gains is so much higher.
What I am sure of is that there'll be increases like "15% in Stalker at 2560x1600 with 8xAA" etc.
I'm also sure there should be some big gains to be had where shader compiler or AMD provided libraries are the bottleneck.
As for games - what we've seen is what we've got. And it's not bad IMHO.
 
Perhaps he means a combination of command processing and juggling the two geometry engines (called "graphics engines" for some reason I can't fathom) is the root of a lot of inefficiency.

http://www.techreport.com/articles.x/20126/2


So the overheads in scheduling for these two geometry engines lead to ~90% throughput on "standard untessellated" geometry?

I suppose a fruitful comparison could be made with GTX460 running at 880MHz. Or with Cayman down-clocked to GTX460's speed.


How good is Z compression for shadow buffers?

The GTX460 is much more castrated than just half a GF100 in geometry. It's more like 1/4.
Check out normal, moderate and extreme in Heaven 2.1 in the second picture: http://www.pcgameshardware.de/aid,8...klasse-Grafikkarten/Grafikkarte/Test/?page=12.

Also, Charlie wrote this in the 6970 article http://www.semiaccurate.com/2010/12/14/look-amds-new-cayman6900-architecture/

The big advance of Nvidia's Fermi architecture was that they split the geometry processing up into multiple chunks, 8 in the GF100/GF110, two in all others.


Maybe the overhead of handling the parallel geometry is the main reason why GeForce cards don't use it in normal rendering, and not just to create differentiation for the Quadro products. (They have completely different driver optimizations anyway.)
 
The GTX460 is much more castrated than just half a GF100 in geometry. It's more like 1/4.
Check out normal, moderate and extreme in Heaven 2.1 in the second picture: http://www.pcgameshardware.de/aid,8...klasse-Grafikkarten/Grafikkarte/Test/?page=12.
So? All I see is the GTX460 being pretty much exactly half as fast. Factor in the lower clock and it would actually do a tiny bit better per clock (and it also has one of its SMs, hence a PolyMorph engine too, disabled).

No idea where Charlie got the idea that GF100/GF110 have 8 GPCs, but they don't. The magic number is 4 (for setup/raster), while GF104 has 2 GPCs... If you're talking about the PolyMorph engines, that's 16 vs 8 for GF100/GF110 vs GF104. Looks very much like half to me...
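For what it's worth, a quick ratio check on those unit counts (Python sketch; the GTX 460 PolyMorph count assumes the one disabled SM mentioned above, the clocks are stock reference clocks, and which clock domain the geometry hardware actually runs in is an assumption, so treat the per-second figure as indicative only):

```python
# Simple ratio check of the public geometry-unit counts, GTX 580 vs GTX 460.

gtx580 = {"gpcs": 4, "polymorph": 16, "clock_mhz": 772}  # full GF110
gtx460 = {"gpcs": 2, "polymorph": 7,  "clock_mhz": 675}  # GF104 with 1 SM disabled

print("GPC (setup/raster) ratio:", gtx460["gpcs"] / gtx580["gpcs"])            # 0.50
print("PolyMorph ratio         :", gtx460["polymorph"] / gtx580["polymorph"])  # ~0.44
print("Clock-scaled GPC ratio  :",
      gtx460["gpcs"] * gtx460["clock_mhz"]
      / (gtx580["gpcs"] * gtx580["clock_mhz"]))                                 # ~0.44
```

So a GTX460 landing at roughly half the GTX580's tessellated frame rate would, if anything, suggest slightly better per-clock behaviour, which matches the point above.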
 
Do you guys think the 500-700MB framebuffer advantage will come into play for the future lineup of "true" DX11 games? At least before Cayman's VLIW4 shaders, ROPs and tessellator engines run out of steam? And how did AMD price the 2GB Eyefinity 5870 at so much more than the vanilla 5870, when here both of these 5870-class GPUs come with 2GB? That is good, right?
 
Do you guys think the 500-700MB framebuffer advantage will come into play for the future lineup of "true" DX11 games? At least before Cayman's VLIW4 shaders, ROPs and tessellator engines run out of steam? And how did AMD price the 2GB Eyefinity 5870 at so much more than the vanilla 5870, when here both of these 5870-class GPUs come with 2GB? That is good, right?
The Eyefinity 5870 was a totally niche product for a tiny niche market. You don't expect that at the same price as the mainstream cards... Plus it needed a different PCB.
As for really needing 2GB, hard to tell. So far the 1.5GB of the GTX 580 seems to be enough for everything, even at 2560x1600 and 8xAA, but 1GB would not be. For triple monitor setups, more than 1GB might also be quite needed - for CF even more so since you can crank resolution / AA up more and still get reasonable frame rates. So I'd say the 2GB are definitely not useless (but still, a lower cost 1GB version of at least the HD 6950 might be a better deal for some).
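For a feel of where the memory goes, here is a rough and definitely simplified estimate (Python; it assumes plain RGBA8 colour and a 32-bit depth/stencil surface, ignores compression, textures, geometry and driver overhead, so it's only a lower-bound illustration):

```python
# Very rough render-target footprint at 2560x1600 with 8x MSAA.
# Lower-bound illustration only: no compression, no textures, no overhead.

w, h, samples = 2560, 1600, 8
bpp_color, bpp_depth = 4, 4              # RGBA8 colour, D24S8 depth (bytes/pixel)

msaa_color = w * h * samples * bpp_color  # multisampled colour surface
msaa_depth = w * h * samples * bpp_depth  # multisampled depth/stencil
resolve    = w * h * bpp_color            # resolved back buffer
front      = w * h * bpp_color            # front buffer

total_mb = (msaa_color + msaa_depth + resolve + front) / 2**20
print(f"~{total_mb:.0f} MB for one 8xAA colour+depth target chain")
```

That comes to roughly 280MB for a single target chain; multiply by a deferred renderer's extra G-buffers, shadow maps and high-res textures and it's easy to see why 1GB gets tight at those settings, even if this sketch alone proves nothing.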
 
Ah, ok. You're talking about individual titles. Then that may be true.
I was thinking more of a broader increase of performance, and that didn't happen unfortunately.


Some are not individual titles; some are engine-related, like Epic with their bias towards one IHV, their Unreal Engine and AA. The hotfix was rolled into a driver revision, so the improvements are across the board on titles that use that engine, like Bioshock, Fallout 3 and Batman: AA, and I saw a huge increase in performance with FSAA enabled on these titles. It simply amazes me that software companies today continue to release games optimized for certain hardware; the purpose of DirectX was to make it easier for companies to ensure ALL consumers get a great experience no matter what hardware they use. And we are not talking about a small player here, we are talking about the two big hitters in the business, AMD and Nvidia.

http://www.hexus.net/content/item.php?item=20991

Another example of across-the-board performance is that CrossFire has been steadily improving across most titles.

http://hardocp.com/article/2010/08/26/ati_crossfirex_application_profile_108a_performance/
 
Why do you say it's like 1/4?

Somehow the raw tessellation numbers from hardware.fr http://www.hardware.fr/articles/813-7/dossier-amd-radeon-hd-6970-6950-seules-crossfire-x.html and the PCGamesHardware Unigine Heaven 2.1 results http://www.pcgameshardware.de/aid,8...klasse-Grafikkarten/Grafikkarte/Test/?page=12 don't look right to me as you increase the tessellation settings.
It seems to me that the 460 staying at nearly half the FPS of the 580 through the whole normal->extreme tessellation range in Heaven is actually quite strange if you compare it with the Radeon lineup (mainly the 6900 cards, which end up the same as the 460 at extreme but have quite a lead in the rest. And tessellation rate clearly counts there, as the 5870 (and even the 6870) is already a bit slower than the 460 at the normal setting).
 
I did a quick test of PowerTune in 3 apps so far. Sorry for the lack of power measurements, but my Kill-a-Watt died.
All tests done on Win7 x64 with Cat. 10.12a hotfix RC3 and Phenom X6 4GHz Turbo OFF (890FX)

Unigine Heaven 2.1 1280x720 default settings, tessellation extreme:
-20% = 39FPS
-10% = 39.8FPS
0% = 39.8FPS
+20% =39.8FPS
Only the PowerTune -20% setting affects this test and causes throttling of the GPU.

Unigine Heaven 2.1 1920x1200 default settings, tessellation normal:
-20% = 40.4FPS (72C GPU)
-10% = 49.3FPS (76C GPU)
0% = 49.9FPS (76C GPU)
+20% = 49.9FPS (76C GPU)
Again, only -20% makes a visible difference here, but -10% also has a tiny effect on performance! (Temps as reported at the end of each benchmark.)

Unigine Heaven 2.1 1920x1200 default settings, tessellation disabled:
-20% = 55.4FPS (71C GPU)
-10% = 68.4FPS (76C GPU)
0% = 72.6FPS (78C GPU)
+10% = 73.4FPS (78C GPU)
+20% = 73.4FPS (78C GPU)
Here we can see that even 0% caps Cayman a bit.

3DMark Vantage Perlin Noise:
-20% = 101FPS
-10% = 108FPS
0% = 147FPS
+10% = 173FPS
+20% = 173FPS
Heavily affected by PowerTune.

3DMark Vantage Parallax:
-20% = 52FPS
-10% = 74FPS
0% = 74FPS
+10% = 74FPS
+20% = 74FPS

Dirt 2 Ultra settings 1920x1200 noAA:
+20% = 87.8FPS (min. 64.2FPS)
0% = 87.4FPS (min. 64.5FPS)
-10% = 88.0FPS (min. 75.0FPS)
-20% = 73.3FPS (min. 57.3FPS)
Slight variances can be ignored because the built-in benchmark is not consistent and I've done only 1 run per setting. Only -20% PowerTune really affects the results. Not a very demanding game for Cayman in that regard. As speculated by Mintmaster, the bottleneck is not shading for sure :)

Finally, I wanted to check whether the +20% setting has enough margin to accommodate the Perlin Noise test at the max. CCC GPU clock of 950MHz:
-20% = 102FPS
-10% = 108FPS
0% = 145FPS
+10% = 166FPS
+15% = 187FPS
+20% = 187FPS
The good news is, no throttling at 950MHz even with 'only' the +15% setting. What is a bit perplexing, though, is the +10% result. If you look at the stock-clock counterpart, it performs faster! 173FPS (GPU 880MHz) vs 166FPS (GPU 950MHz)! Now you might say that @950MHz the GPU is throttling below 880MHz, but at least according to the GPU-Z monitor that's not the case. It dips to 911MHz, but the average clock is still 946MHz!? To add to the confusion, my second run after restarting the computer didn't throttle at all, showed a constant 950MHz with +10% and gave exactly the same 166FPS. I can think of two possible explanations:
- PowerTune changes clocks so quickly that GPU-Z can't detect it (but I still think it should show at least a few spikes in the log file), or
- PowerTune is switching off some of the units / clocking them slower while maintaining a high clock for the rest of the GPU
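One back-of-envelope check, under the (big) assumption that the Perlin Noise score scales roughly linearly with engine clock when nothing is being shut off:

```python
# If Perlin Noise scaled ~linearly with engine clock, what average clock
# would the anomalous 166 FPS run imply? (Linear scaling is an assumption.)

fps_unthrottled_950 = 187.0   # +15% / +20% results at 950 MHz
fps_anomalous_950   = 166.0   # +10% result at 950 MHz

implied_clock = 950 * fps_anomalous_950 / fps_unthrottled_950
print(f"Implied average clock: ~{implied_clock:.0f} MHz")   # ~843 MHz

# GPU-Z logged ~946 MHz average, so either its sampling misses very short
# dips or something other than the reported engine clock is being reduced.
```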

Any ideas?
Under construction ... :smile:
 
I am confused... how much will updated drivers help Cayman's "new" architecture?

On one hand, you have people saying they will, while other benchers don't think so...
On another hand, AMD driver release notes usually claim 10-40% improvements with every new Catalyst... yet the driver-comparison tests I've read yield virtually no big FPS gains in games... but Cayman is like really "new", bro.
On yet another hand, I remember reading that new GPUs went from totally average at launch to pretty good gains over older GPUs... aging well, or just new games optimizing for these new GPUs. As an example, I thought I was pretty happy with a 4870 1GB after reading the 5850 launch reviews, and behold, at present I've found out again that the 5850 has been performing at a higher level than the 4870 1GB in so many new games.

Should I place my faith in the Cayman architecture and anticipate that new games will make use of it better than current ones, more than any driver updates ever will?

I wish sites would come back and do retrospective reviews of old GPUs... ATM, with GPU tech slowing, it makes even more sense... I no longer see the 4870 1GB in so many Cayman reviews.

On the final hand... how much longer until DX12? Will 28nm GPUs come with DX12?

I don't know how much faster the 69x0 series will become. I just have to say that if you look back at benchmarks from Oct 2009 when Cypress hit, it has improved by leaps and bounds. I would think performance is up at least 10-30% in every game. I remember a while back Widescreen Gaming showed that, with the newest driver of the time, the 5850 was as powerful as the 5870 was using launch-day drivers, and the 5830 was close to the 5850 in performance with launch-day drivers.

I would expect this new GPU to see similar gains across its life.
 
http://www.widescreengamingforum.com/wiki/AMD_Radeon_6970_6950_-_Featured_Review


Really good review here, although it covers only ATI products, but you can really see the performance increase.


In BattleForge at 1920x1080 the 6950 is faster than the 6870 and 5870, but not by much. Kick it up to Eyefinity though and the gains become much greater.



Looking at the reviews, the 6950 would be a worthwhile upgrade from my 5850 with my 3 monitors. But I think I'm going to use some Christmas money and get the 6970, as in some of the games I can double my frame rate.
 
Blame 3DMark, they want money to test at higher resolutions.

Vantage subtests are fixed resolution ;). I believe UniversalTruth was complaining about the Heaven 1280x720 res I was running, but I did that to see how much tessellation stresses the GPU power-wise. Pushing more pixels shifts the workload more towards the TMUs/shaders, which are the biggest power hogs in a GPU. My results from 1920x1200 with normal tessellation show exactly that: throttling a bit already at the -10% setting.

Now I'm going to see how this will look with no tessellation at all :p.
 