Razor1 said:
Well, since you work at ATi, and I already noted that ATi has spent a good deal of resources to improve their memory controller, it seems that went over your head when you read my post. How old is nV's current memory controller, btw? Has it changed much in the last 3 or 4 years?
I'm not privy to inside information about nVidia's memory controller design - I work for ATI, not nVidia, as apparently you have noticed.
Also, to what extent is the x1900 shader design forward-looking? It can't compete in today's games, and the shaders that are going to be used in the next year or so will arrive when the x1900 is no longer around.
Seems to compete pretty well as far as I can see.
You tell me how doing a full screen of occlusion parallax mapping affects the x1900, and then tell me the frame rates that are achieved when doing this, with every single pixel covered, going from low res to high res. And tell me if the x1900 is capable of pushing this kind of high-level shader in real time, in games that will use a full screen of this shader and then have overdraw on it from particle effects, shadows, etc.
Well, let's try for a more interesting case shall we?
Our Toyshop demo makes extensive use of OPM along with other highly complex shaders, as I'm sure you're aware. In addition to this it has particle effects, transparencies, shadows, overdraw, physical simulation on the GPU, and renders it all with extended dynamic range - all things that I would regard as relevant. (It also manages to be just about the most impressive real-time rendering demonstration of which I am aware, but I'm naturally biased towards the great work of our demo team).
On my single X1900XTX here, using a FRAPS runthrough I get an average of 36 fps at 1600x1200 resolution. With 4xAA at the same resolution this drops to 33.5 fps; the minimum frame rate is 25.
I would say that that is eminently playable for many game genres, although perhaps not a twitch-shooter. If you want to include Crossfire in the mix then I would expect we could scale to 60fps+, which would seem enough for twitchy FPS play as well.
I believe that it would be intriguing and enlightening to see the frame rate of a 7800GTX-512 running this scene with the same level of quality, with a version of the demo as optimised as possible for that architecture. It might be less enlightening to see it run our version of the shaders - they do use dynamic branching after all, so...
Are you saying ATi's new shader arrays are weaker in older games but will show their power in newer games? That doesn't really make sense, does it? A shader is a shader; if one is being used and certain hardware is more powerful at pushing that shader, it shows. Well, if that's the case, I would think we would have seen hints of it in FEAR and SCCT. Didn't I mention that?
It makes perfect sense as far as I can see.
"A shader is a shader" - what a comment - two different architectures will have very different characteristics running two different programs - as I recall from the way-back machine if you took a Pentium and ran it against a Pentium Pro(/2) on pure hand-tuned floating-point math code, guess what - the Pentium was often faster than the newer CPU at the same clock. Why? Because the latencies of FP instructions increased on the Pentium Pro, but the throughput remained the same. But then you run them on a more typical code mix, or code not specifically hand-tuned for either and the Pentium was often heavily beaten. Why? Because the Pentium Pro had out-of-order execution, and better branch prediction etc. so on general code it won convincingly.
A shader is not just a shader. Context and instruction mix are very important, and as shaders get longer you may get to see more elements of pure shader performance coming through in final benchmarks.
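To make the latency-versus-throughput point concrete, here is a minimal sketch - the cycle counts below are made-up illustrative numbers, not Pentium or Pentium Pro specifications - showing how a dependent chain of FP operations is bound by latency while an independent instruction mix is bound by throughput:

[code]
# Illustrative model: cycles to execute n floating-point operations.
# Latency = cycles before a result can feed a dependent instruction;
# throughput = independent operations the pipeline can start per cycle.

def dependent_chain_cycles(n_ops, latency):
    # Each op waits on the previous result, so latency dominates.
    return n_ops * latency

def independent_mix_cycles(n_ops, throughput_per_cycle):
    # Independent ops can be pipelined back to back.
    return n_ops / throughput_per_cycle

# Assumed numbers purely for illustration (not real CPU specs):
n = 1000
print(dependent_chain_cycles(n, latency=3))                  # 3000 cycles, latency-bound
print(independent_mix_cycles(n, throughput_per_cycle=1.0))   # 1000 cycles, throughput-bound
[/code]

Same hardware, same peak rate, very different results depending on the code mix - which is the whole point.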
That being said, I don't think that I really overlooked that much - I mentioned that shader performance is only one of many factors dictating performance in current benchmarks, and then made a statement of my beliefs about how the balance will change in the future (beliefs, not a statement of fact - I think I was reasonably clear on this point).
Of course you overlooked that. But then, you see, you have to factor in the clock deficiency, which won't be there with the 7900 GTX, will it?
In the example that I used above I scaled the performance by the differences in clock-rate for the purpose of the comparison. Maybe you overlooked that?
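For clarity, the clock-for-clock comparison is just a normalisation. A small sketch, where the benchmark scores are placeholders and the clocks are assumed to be roughly the stock values (X1900XTX ~650 MHz, 7800 GTX-512 ~550 MHz):

[code]
# Normalise a shader benchmark result by core clock to compare
# per-clock shader throughput between two parts.

def per_clock_score(score, core_clock_mhz):
    return score / core_clock_mhz

# Placeholder scores, not real results; clocks are assumed stock values.
x1900_score, g70_score = 100.0, 80.0
print(per_clock_score(x1900_score, 650))
print(per_clock_score(g70_score, 550))
[/code]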
Also, the examples you gave in the other thread, steep parallax and fur (I think that one has it too) - well, they use dynamic branching. Let's leave that out for now, since we already know ATi also spent a good deal of resources on it, and nV hasn't. So it all comes down to this: ATi spent a good deal of effort on increasing dynamic branching performance, but didn't pay much attention to anything else, when dynamic branching shaders won't be used in the short term.
How convenient to just leave it out - after all, "a shader is just a shader". How convenient to suddenly choose to think only in the short term, and ignore the future at the same time as implying that we are not being forward looking with X1900.
Non sequitur.
The Cook-Torrance example is pure ALU - no branching whatsoever, so I would like to believe that we paid plenty of attention to arithmetic performance as well. And you yourself indicated that we do well with anisotropic filtering, which would seem to mean that we paid reasonable attention to that. Remind me what's left in terms of shading performance again, and where didn't we pay attention?
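For reference, here is a common Cook-Torrance formulation sketched in straight Python - the demo's actual shader may differ, and the helper names are mine - which shows why this kind of lighting model is straight-line arithmetic rather than branchy code:

[code]
import math

# A common Cook-Torrance specular term: D * F * G / (4 * (N.L) * (N.V)).
# Everything below is dot products, exp, pow - pure ALU work. The min()
# in the geometry term maps to a hardware min, not a divergent branch.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cook_torrance_spec(n, l, v, h, roughness, f0):
    ndl, ndv = dot(n, l), dot(n, v)
    ndh, vdh = dot(n, h), dot(v, h)
    m2 = roughness * roughness
    # Beckmann microfacet distribution
    d = math.exp((ndh * ndh - 1.0) / (m2 * ndh * ndh)) / (math.pi * m2 * ndh ** 4)
    # Schlick approximation to Fresnel
    f = f0 + (1.0 - f0) * (1.0 - vdh) ** 5
    # Geometric attenuation
    g = min(1.0, 2.0 * ndh * ndv / vdh, 2.0 * ndh * ndl / vdh)
    return d * f * g / (4.0 * ndl * ndv)
[/code]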
Also, how can the xbitlabs results be CPU-limited when they scale depending on resolution?
If the benchmarks in question are scaling, then they are obviously not completely CPU-limited (I didn't quote any specific benchmark in my previous post). However, if a benchmark scales, is it necessarily highly shader-limited? Doom3 scales at very high res, but it is not really heavily shader-limited; it is heavily shadow-volume-rendering limited.
There can be, and are, many other potential limitations that come into play. We believe, based on our research, that shading will become more important over the lifetime of the X1900 architecture. Time will tell if we were right.
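One way to see why "it scales with resolution" does not pin down which stage is the limit - a toy frame-time model, where all the timings are made-up illustrative numbers:

[code]
# Toy model: per-frame cost split between CPU work (resolution-independent)
# and GPU stages whose cost scales with pixel count.

def frame_time_ms(pixels, cpu_ms, shader_ns_per_pixel, fill_ns_per_pixel):
    gpu_ms = pixels * (shader_ns_per_pixel + fill_ns_per_pixel) * 1e-6
    # CPU and GPU work largely overlap, so the slower side sets the frame time.
    return max(cpu_ms, gpu_ms)

# Made-up numbers: this benchmark scales with resolution (so it is GPU-bound),
# yet the dominant GPU cost here is fill/shadow-volume work, not shader math.
for w, h in [(1024, 768), (1600, 1200), (2048, 1536)]:
    print((w, h), frame_time_ms(w * h, cpu_ms=8.0,
                                shader_ns_per_pixel=2.0,
                                fill_ns_per_pixel=6.0))
[/code]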
[edit] I see you later added a whole load of the numbers from that thread. I said in my original post that I was ignoring the cases from the original thread where we were discussing "What counts as an ALU" or "How many pipelines do they have" due to the aforementioned difficulties in reaching a common consensus on those points. I believe that the Cook-Torrance example that I used from that thread was simply X1900 versus G70, scaling for clock rate alone - X1900 versus G70 on shader execution, clock for clock.[/edit]