Why isn't the R580 a lot faster in shader-intensive games?

Tim said:
I think they are wrong - they seem to think that the X1000 cards are not capable of MSAA+FP16 and for that reason only run at INT10, when in fact they are fully capable of MSAA+FP16.

The HDR in the screenshots looks, IMHO, identical for both cards, so if the X1900 really is running at INT10 it does not seem to have any influence on IQ.

No, they are right. The X1800/X1900 use the FX10 format for AoE3's HDR, while the GeForce cards use FP16. The IQ produced is similar, though. Furthermore, the GeForce cards are indeed capable of using AA in conjunction with HDR in that game, thanks to supersampling.
 
Jawed said:
2. I think that GPGPU-based synthetics may prove helpful. Rys's efforts, and Tridam's at www.hardware.fr (though I'm waiting for the English version - my French just ain't up to it), are definitely steps in the right direction.

Rys seems to have based his work (or been inspired by) on the Stanford GPU-bench suite. Tridam has been at this somewhat longer and really seems to have the bit between his teeth.

The GPUBench guys, by the way, seem to express some reservations about their own tests - finding that they don't correspond particularly well with the actual performance of GPGPU apps.
Jawed

I wouldn't quite put it that way. I'm assuming you are referring to comments in Jeremy's blog. What he meant there is that no single GPUBench test can tell you how an app will perform. If you look at the ClawHMMer paper, the matrix multiply paper, and several others, you can see that we can do pretty accurate modeling of application performance. For example, using instruction issue and fetch costs, we can build a very accurate model on ATI hardware for when an application will be compute or bandwidth bound.

The danger in producing a single benchmark score is that 1) companies try to optimize for it, and 2) it doesn't tell you how all apps will perform. GPUBench was designed to help understand how GPUs work for the applications we build. I fully agree that just because one board has a faster instruction issue rate doesn't mean it's faster in a certain app. If your app does lots of readback, the readback performance will be very important to you. If you do few texture fetches and lots of math, the instruction issue rate is paramount. But, for non-trivial apps, you'll need to look at a combination of tests. If you are dependent on registers, that will matter. Etc.
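(To make that kind of modeling concrete for readers who haven't seen GPUBench-style analysis: below is a minimal, hypothetical sketch of a "which resource limits this pass" estimate. The function, its parameters, and all of the rates are invented placeholders for illustration, not GPUBench code or measured data.)

```python
def bound_estimate(alu_instructions, texture_fetches, readback_bytes,
                   issue_rate=10e9,       # ALU instructions issued per second (made up)
                   fetch_rate=5e9,        # texture fetches serviced per second (made up)
                   readback_bw=1e9):      # readback bandwidth in bytes/s (made up)
    """Return the resource whose estimated time dominates one rendering pass."""
    times = {
        "compute":  alu_instructions / issue_rate,
        "texture":  texture_fetches / fetch_rate,
        "readback": readback_bytes / readback_bw,
    }
    limiter = max(times, key=times.get)
    return limiter, times

# Example: a math-heavy pass with few fetches and little readback.
limiter, times = bound_estimate(alu_instructions=2e9,
                                texture_fetches=1e8,
                                readback_bytes=4e6)
print(limiter)   # -> "compute" with these made-up rates
```

The real models in the ClawHMMer and matrix-multiply papers are of course far more detailed, but the shape of the argument is the same: combine several measured rates rather than reading off a single score.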
 
Ailuros said:
I just read a review by the German site ComputerBase (thanks to the guys at 3DCenter who linked me to it). According to it, the Radeons use an INT10 HDR format + 4x MSAA, while the GeForces use FP16 HDR + a 1.5x SSAA mode.

http://www.computerbase.de/artikel/...900_cf-edition/15/#abschnitt_age_of_empires_3

I analyzed what a couple of games were doing with HDR when working on the GTX 512 review, and I've already discussed it in previous articles, so yes, the X1K cards use INT10 in AoE3 when HDR is on; GeForce uses FP16. The antialiasing with HDR on GeForce is 2.25x (1.5x in both directions) supersampling.
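(To make the 2.25x figure concrete, here is a trivial arithmetic sketch; the display resolution is just an assumed example, not something from the review.)

```python
# 1.5x supersampling per axis: render at 1.5x the width and 1.5x the height
# of the display resolution, then filter back down to the display size.
display_w, display_h = 1280, 1024        # assumed example output resolution
ss = 1.5

render_w, render_h = int(display_w * ss), int(display_h * ss)
samples_per_pixel = (render_w * render_h) / (display_w * display_h)

print(render_w, render_h)       # 1920 1536
print(samples_per_pixel)        # 2.25 samples per displayed pixel
```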
 
I've been using pure synthetic test shaders for years and, at the end of the day, I'm not sure it's useful for readers to know about most of these results. The reason is simply that each of these tests (including the GPUBench ones) is just a single dot, and one needs a huge number of these dots to draw a full picture of an architecture, or at least an understandable picture of it. With only a couple of dots, users could imagine a wrong picture of the architecture. The advanced compilers used in the drivers make everything harder to test, so those few dots could also distort the picture.

If you know RCP, TEX, MAD and MUL peak throughput, you don't know MAD+MUL throughput, TEX+MAD throughput, etc. Register usage (not only temps) and constant accesses are also important. Vec2+vec2 ability could also matter. Dependencies are also important: say there are two math units, can both units write a result to a register? All of that needs a huge number of different tests.
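(As a toy illustration of why the combined rates can't be derived from the separate peaks, here is a hypothetical sketch; the co-issue behaviour and instruction counts are invented for the example and don't describe any specific GPU.)

```python
# Why MAD and MUL peak rates alone don't give you the MAD+MUL rate:
# it depends on whether the two math units can be used in the same cycle.

def shader_cycles(n_mad, n_mul, can_coissue_mad_mul):
    if can_coissue_mad_mul:
        # A MAD and a MUL can be paired in the same issue slot when both are available.
        paired = min(n_mad, n_mul)
        leftover = abs(n_mad - n_mul)
        return paired + leftover
    # Otherwise every instruction occupies its own issue slot.
    return n_mad + n_mul

# Same instruction mix, very different cycle counts:
print(shader_cycles(100, 100, can_coissue_mad_mul=True))   # 100 cycles
print(shader_cycles(100, 100, can_coissue_mad_mul=False))  # 200 cycles
```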

It doesn't mean that I don't run a lot of these tests, because I do. I've used them to understand the architectures since NV30, and then I can explain the architecture. Once advanced readers know a bit about the architecture, they can work out the peak throughputs for themselves.

When I publish a pure synthetic shader result of mine in an article, it's usually to illustrate a specific point I'm explaining, and I actually try not to be too synthetic (not always possible, of course). For example, to measure branching granularity performance, I could have used the classic fillrate setup (2 triangles covering the full screen) or even a single triangle with clipping. That would have shown purely theoretical results: full performance gain as soon as the granularity used matches the GPU's. Instead, I chose to render the shader on a lot of moving triangles so that the test is closer to reality. I think it gives a more interesting figure.
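(A rough way to picture that granularity point, as a hedged sketch: the batch sizes, costs and fractions below are illustrative placeholders only, and the model ignores everything except divergence.)

```python
# Toy model of dynamic-branching cost vs. branch granularity.
# A GPU evaluates pixels in batches; if any pixel in a batch takes the
# expensive path, the whole batch pays for it. All numbers are placeholders.

def relative_cost(coherent_region_px, batch_px,
                  cheap_cost=1.0, expensive_cost=10.0, expensive_fraction=0.25):
    """Crude estimate of average per-pixel cost given branch granularity."""
    if coherent_region_px >= batch_px:
        # Regions taking the same branch are at least as big as a batch:
        # only the genuinely expensive fraction of batches pays the full cost.
        forced_fraction = expensive_fraction
    else:
        # Regions are smaller than a batch: almost every batch contains at
        # least one expensive pixel, so almost every batch pays the full cost.
        forced_fraction = 1.0
    return forced_fraction * expensive_cost + (1 - forced_fraction) * cheap_cost

fine_grained   = relative_cost(coherent_region_px=64, batch_px=16)    # small batches
coarse_grained = relative_cost(coherent_region_px=64, batch_px=1024)  # huge batches
print(fine_grained, coarse_grained)   # the coarse-batch case loses most of the gain
```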

In the latest review I wanted to show the difference between the GeForce 7 and Radeon X1K architectures when dealing with dynamic control flow. I thought two points were interesting there, so I set up two tests around branching, each showing one of these points. The first is granularity; the second is the way the GPUs handle the branching instructions themselves. It also gives an interesting figure, as it shows an algorithm optimized with branches and loops could be faster on the X1K architecture while being way slower on GeForce 7.
 
Demirug said:
Reverend, you say that games and synthetics don't link well in current reviews, and you are right. But how should we link them together if we don't understand what the games are doing? Game developers normally don't open their little "magic books" for you. And even worse, they sometimes don't even tell you the truth.
Well, I doubt they would flat-out lie about things... more like "avoiding the question". And that's understandable, given that their livelihoods depend on selling games. They really couldn't care too much about purist/altruistic hardware-website reporters.

As for the problem you rightfully brought up, I refuse to believe it cannot be solved. It takes a lot of effort (and it also means you need to have the knowledge and know-how, as per your reverse-engineer games comment below) but I believe such an effort will prove far more useful and beneficial to the public.

Knowing and reporting how and why games end up the way they appear on the shelves, given the amount of tie-ups between IHVs and ISVs, as well as the actual hw architectures and their behaviours, can be much more useful to the hw-buying public (we need to agree that this is the ultimate aim of a website) than knowing the intricacies of the ADD, MUL, TEX, etc. operations of an architecture.

Otherwise the latest and greatest hw remains "the latest and greatest hw with the latest and greatest unfulfilled potential".

I remember a list of games that allegedly use SM3 and don’t do it at all.
That list comes from IHVs, right? :)

This will force anybody who wants to link synthetics and games together to reverse-engineer the games that are used in the review. This is a hard and time-consuming job.
Yes it is. It needs to be done, however. You spend a huge amount of time investigating the architectures of a piece of hw. That time becomes useless if you can't prove, with games, what it all means. Which is more important?

But does it matter? I am still seeing many people "reading" only the bars and numbers in a review. How many people really want to know the dirty secrets that are hidden inside a game? A reviewer who has to pay the bills from this work needs to write what the readers want, or they will go away.
And the bars and numbers really are all that matter, truthfully. But they need to be explained. Hence, what I said above.

Remember I suggested that there can be a "games only review" as well as an "architecture only investigation" type of article, and, as circumstances demand, a link between the two.

The "dirty secrets" behind a game is just as important as the "dirty secrets behind a piece of hw and/or its drivers". Both have their audience, especially at this site.

In the end, it comes down to what a site is about. I don't think Dave needs to have "games benchmark reviews" out at the same time as all the other privileged NDA'ed sites. His "no games benchmark here, you can get them elsewhere" R520 article is proof. So maybe this means he really isn't all that concerned about the heavy-traffic=money games-benchmarks audience. Hence my suggestion.
 
Megadrive1988 said:
Hopefully we will see R580 and R580-derivative cards stretch their legs through 2006 and into 2007 as optimizations happen and newer SM3.0-intensive games get released.

Even if R600 launches in late 2006, I don't see it or DirectX10 / SM4.0 / WGF2.0 being very big until late 2007.

Ditto. What he said.
 
Re: AoE3 HDR and FX10

Thanks to Hanners, Ail, Tim, ZioniX, and Tridam for the info. (GER->ENG translation of the Compbase AoE3 page.)

Eeenteresting! I gather NV can't do FX10, and apparently it doesn't show an obvious IQ deficit compared to FP16, so I guess Ensemble went with the former on ATI parts for speed reasons. So why is an X1800 still so much slower than a GTX? And has anyone noticed a difference in IQ b/w the two HDR+AA modes? More texture aliasing on ATI, more edge aliasing on NV? Can ATI be forced to run FP16 HDR, say with 3DAnalyze IDing an ATI card as NV?
 
Tridam said:
I analyzed what a couple of games were doing with HDR when working on the GTX 512 review, and I've already discussed it in previous articles, so yes, the X1K cards use INT10 in AoE3 when HDR is on; GeForce uses FP16. The antialiasing with HDR on GeForce is 2.25x (1.5x in both directions) supersampling.

Thank you for the clarification, Damien. Since AoE3 isn't even my type of genre, is that 1.5*1.5 OGSS(?) blurring anything? I've read something to the effect that "half samples" introduce some weird blurring; I've never seen half samples in action, hence the curiosity.
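(For what it's worth, the "half sample" worry boils down to the downsample footprints no longer lining up with whole source texels. A minimal sketch of the 1.5x box-filter weights along one axis, purely illustrative:)

```python
# Box-filter footprints for a 1.5x -> 1x downscale along one axis.
# Destination pixel i covers the source interval [1.5*i, 1.5*(i+1)), so
# alternate source texels get split between two neighbouring output pixels.

scale = 1.5
for i in range(4):                               # first few destination pixels
    lo, hi = scale * i, scale * (i + 1)
    weights = {}
    t = int(lo)
    while t < hi:
        overlap = min(hi, t + 1) - max(lo, t)
        weights[t] = round(overlap / scale, 3)   # normalised contribution
        t += 1
    print(i, weights)
# -> texels 1, 4, 7, ... each contribute to two output pixels,
#    which is where the slight-blurring concern comes from.
```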
 
Pete said:
Thanks to Hanners, Ail, Tim, ZioniX, and Tridam for the info. (GER->ENG translation of the Compbase AoE3 page.)

Eeenteresting! I gather NV can't do FX10, and apparently it doesn't show an obvious IQ deficit compared to FP16, so I guess Ensemble went with the former on ATI parts for speed reasons. So why is an X1800 still so much slower than a GTX? And has anyone noticed a difference in IQ b/w the two HDR+AA modes? More texture aliasing on ATI, more edge aliasing on NV? Can ATI be forced to run FP16 HDR, say with 3DAnalyze IDing an ATI card as NV?


Look at SS2; do you see a noticeable difference between the default bloom effects and float HDR? It obviously depends on the implementation and might be the case here also. This gets somewhat nonsensical if you consider what type of game we're talking about and how much polygon edge aliasing you can actually notice at a high resolution. If the half samples noted above introduce any blur, then it's an obvious downside; if not, the improvement on textures is, on the other hand, again small. If there were a LOD offset involved, I wouldn't expect it to be more than -0.25 at a guess.

All pure speculation, since I don't have the game.
 
Rev,

Remember I suggested that there can be a "games only review" as well as an "architecture only investigation" type of article, and, as circumstances demand, a link between the two.

The "dirty secrets" behind a game is just as important as the "dirty secrets behind a piece of hw and/or its drivers". Both have their audience, especially at this site.

In the end, it comes down to what a site is about. I don't think Dave needs to have "games benchmark reviews" out at the same time as all the other privileged NDA'ed sites. His "no games benchmark here, you can get them elsewhere" R520 article is proof. So maybe this means he really isn't all that concerned about the heavy-traffic=money games-benchmarks audience. Hence my suggestion.

I absolutely love "games only reviews", as you call them, but I could easily imagine writing one that concentrates exclusively on games without even mentioning what hardware I've used. I know that's somewhat awkward logic, but it would be a rather extreme approach that truly does concentrate on games alone.

B3D doesn't use hardware from different IHVs to compare them against each other. If that policy isn't going to change, are we talking about a pure analysis of a set of games, or a performance comparison between different GPUs across a set of games?

IMHO, as a reader, what I'd like to see from B3D are extensive articles analyzing image quality; definitely more time- and bandwidth-consuming, and more "dangerous", than anything else, but it might be worth a try. It's not the use of synthetic applications that annoys me most personally; it's the rather sterile approaches concentrated mostly on performance aspects.
 
It strikes me that the beta article on "how we benchmark" kinda starts to bridge those two worlds. It'd be cool to have an article --or more likely section of an article-- per game discussing why it is in the suite, what its performance considerations are (with a few handy graphs showing those considerations coming into play in particularly indicative ways), technology it leans on, etc. Then you link that games section of the article every time you use it in the benches of a new review.

Edit: Another cool thot --a "Games /Tech Table" to go with the chips/board tables. Wanna know where a particular tech is used? Well, check the table.

Of course "cool thot" actually = "more work for Dave and Neeyik". :LOL:
 
Hanners said:
Using the 'Very High' pixel shader setting in Age of Empires 3 enables HDR rendering - that's the only change that setting makes that I'm aware of.
Hmmm, that might not be the only change. According to THG('s screenshots), High is PS2 and V. High is 3.0 (with soft shadows and still-nicer water). Do various sites' AoE3 demos feature water? Damien, what say you?

Ail, heh, the only differences I saw b/w CompBase's AoE3 sshots were shadow angles and slightly more "3D" roof tiles on the bottom-left house on the G70. :)

geo, iXBT/Digit-Life does that, to an extent, with its benchmark headings. Then again, Xbit and B3D and other sites do that when explaining benchmark results. A separate article describing the benchmark suite would make for good reading, though.
 
ZioniX said:
No, they are right. The X1800/X1900 use the FX10 format for AoE3's HDR, while the GeForce cards use FP16. The IQ produced is similar, though. Furthermore, the GeForce cards are indeed capable of using AA in conjunction with HDR in that game, thanks to supersampling.
I'm confused. Can or can't the X1x00 perform AA with FP16-blending HDR?
 