AMD: R9xx Speculation

I gave Civ 5 as a generic example of HD5870 being too slow in absolute terms, tessellation on or off.

Honestly, Jawed, with all due respect, using a turn-based strategy game as justification to proclaim a graphics card too slow is bizarre. You could make a better argument from Civ5 for CPUs being too slow, since in the latter parts of a game you can spend considerable time sitting around waiting for the AI to stop playing with itself. (Although this may have as much or more to do with memory management issues than CPU speed per se.)

To some degree it's symptomatic of the evolution of graphics and gameplay, and correspondingly of these forums. The graphics don't really affect the gameplay of Civ5 at all, and much of it isn't even visible unless you zoom in to a degree that few people actually playing the game are interested in doing. What player is ever interested in seeing the treads on the wheels of the vehicles of individual troops? The fact that Civ5 is used in graphical benchmarking, not because it makes any sense but because it is possible, beautifully demonstrates the disconnect of the hardware sites from the actual use of the products.

Frame rates, from a gameplay point of view, are a solved problem as far as graphics cards are concerned. Those looking for optimum control or gameplay no longer come around here (as they actually did a decade ago), and B3D forums are now pretty much the exclusive domain of technology enthusiasts. Nothing wrong with that, necessarily, but sometimes the weird or completely lacking application perspective is jarring.
 
Honestly, Jawed, with all due respect
Let me stop you there, so you can come back with a decent argument.

http://www.techreport.com/articles.x/19844/11

To keep frame rates playable on these cards, we had to compromise on image quality a little bit, mainly by dropping antialiasing. We also held texture quality at "High" and stuck to 4X anisotropic filtering [note this contradicts the accompanying pictures which indicate 8xAF]. We did leave most of the DX11 options enabled, including "High" shadow quality with advanced shadow sampling, ambient occlusion, and tessellation.
22fps at 2560x1600. Are you suggesting that HD5870 is fast enough?

Civ 5 is obviously CPU-limited in certain respects, yet GTX480 performance is substantially higher, plus:

http://www.anandtech.com/show/3987/...enewing-competition-in-the-midrange-market/13

Civ 5 has given us benchmark results that quite honestly we have yet to fully appreciate. A tight clustering of results would normally indicate that we’re CPU bound, but the multi-GPU results – particularly for the AMD cards – turns this concept on its head by improving performance by 47% anyhow.
 
The fact that Civ5 is used in graphical benchmarking, not because it makes any sense but because it is possible, beautifully demonstrates the disconnect of the hardware sites from the actual use of the products.

Required fps for Civ5 aside, are you suggesting we only benchmark games that are unplayable? What's wrong with using Civ5 as a benchmark given that it supports features that very few other games currently do?

Frame rates, from a gameplay point of view, are a solved problem as far as graphics cards are concerned.

Are you saying that if we strip off all the fancy graphics and leave "gameplay" only then modern graphics cards are fast enough? Why exactly is that a relevant or even practical argument? Personally, I find graphics to be a huge part of the gaming experience but I still don't get your point. Should we no longer care about graphics card performance because it only affects graphics?
 
Let me stop you there, so you can come back with a decent argument.
I never said that GPUs don't affect the frame rate you can measure with FRAPS. I questioned the value of using Civ5 for graphics card benchmarking at all because changes in graphics settings don't impact the gameplay in any way. Hell, to some degree they don't even affect the visual impression other than in non-gameplay situations.

Perhaps I should have expanded on my point.

In this thread, I've read hours of bickering about tessellation. Now, tessellation is one of several methods of faking detail so as to reduce the system load of actual geometric data, joining methods such as LOD schemes, various detail/bump mapping and so on.
Now, a good discussion would have covered what the limitations of the technology are, to what degree and how fast it is likely to be adopted by the industry, what advantages and disadvantages it has compared to other techniques already in use, what the cost is in terms of hardware resources and whether this is actually a good way to spend those resources as opposed to spending them on other aspects of the GPU, at what level the capabilities are sufficient for the application space, et cetera.

Potentially some interesting stuff to discuss, along with placing the technique into the overall context.

Instead, I see "nVidia beats AMD in the Heaven Extreme benchmark, nyah nyah nyah", and "Oh noes, AMD has to do something drastic not to fall behind!"
This is actually below technology enthusiast level, and descends solidly into my team vs. your team.

Now, in my frustration I made a rather sweeping statement about the lack of application perspective here, and generally I think that the level of discussion on these forums would gain a lot by having actual usefulness as its guide. However, I recognize that this is a personal opinion.
 
Required fps for Civ5 aside, are you suggesting we only benchmark games that are unplayable? What's wrong with using Civ5 as a benchmark given that it supports features that very few other games currently do?

What is wrong with using Civ5 as a graphics benchmark is that it is not a graphics application - by and large, graphics quality or framerate does not affect what you actually do inside the game. Thus you are arguably benchmarking outside your application space - similar to using transactions-per-second benchmarking to evaluate gaming systems. An extreme example, but I hope it gets the point across.

What I meant by the broader statement was that people who are interested in graphics performance for the sake of control and gameplay are largely satisfied today. This was not the case, for instance, back when people were looking for a solid 100 fps in quake3 for motion physics reasons, and when you wanted to use something better than bilinear filtering because the moving lines were distracting in gameplay. Back then, those combinations of features were difficult to attain with the hardware of the day. Today, there is basically no problem getting good gameplay. (Indeed, turning off DOF and some other effects typically improves your ability to discriminate.) Those for whom control is critical can typically get it by lowering settings that affect gameplay not at all, and visuals only marginally. And predictably these people are no longer present, as far as I can see, here on Beyond3D forums.
 
Civ 5 is obviously CPU-limited in certain respects, yet GTX480 performance is substantially higher, plus:

http://www.anandtech.com/show/3987/...enewing-competition-in-the-midrange-market/13
Just as a note, those benchmark results look extremely fishy to me. Have you ever seen a GPU getting faster when the resolution and/or AA settings are increased? Exactly that happens with the GTX480 results. It scores:
39.0 fps at 1680x1050 without AA
41.8 fps at 1920x1200 with 4xAA
43.7 fps at 2560x1600 with 4xAA

Maybe Anand should also have benched it with 8xAA and it would have run at 50fps? I have real problems understanding what this benchmark actually measures. :rolleyes:
 
Interesting what happens with the 460 and 470. They follow a similar pattern until 2560x1600, at which point they perhaps run out of memory and FPS drops. The 460 768MB version, however, starts dropping dramatically as soon as you go up in resolution.

Regards,
SB
 
Just as a note, those benchmark results look extremely fishy to me. Have you ever seen a GPU getting faster when the resolution and/or AA settings are increased? Exactly that happens with the GTX480 results. It scores:
39.0 fps at 1680x1050 without AA
41.8 fps at 1920x1200 with 4xAA
43.7 fps at 2560x1600 with 4xAA

Maybe Anand should also have benched it with 8xAA and it would have run at 50fps? I have real problems understanding what this benchmark actually measures. :rolleyes:

At higher resolution the tessellation could be faster as you end up with triangles covering more pixels. The rest of the GTX cards just don't have enough memory.
 
Just as a note, those benchmark results look extremely fishy to me. Have you ever seen a GPU getting faster when the resolution and/or AA settings are increased?

It's not impossible. Some stuff works better as you increase resolution (excluding minor driver tweaks here and there). Think about Hier-Z/Z-CULL, which are more efficient at a higher resolution (as long as you don't overflow, which should be a thing of the past anyhow). Not saying that's the case there, only that it's not impossible.
 
Interesting what happens with the 460 and 470. They follow a similar pattern until 2560x1600, at which point they perhaps run out of memory and FPS drops. The 460 768MB version, however, starts dropping dramatically as soon as you go up in resolution.

Regards,
SB
I'm wondering whether it's possible that the compute shader used to decompress textures on the fly tries to be clever by using some adaptive scheme depending on the size and speed of the GPU, and messes everything up.
 
At higher resolution the tessellation could be faster as you end up with triangles covering more pixels
But you will never have fewer triangles or pixels to rasterize, and never fewer quads to shade ;)
It simply can't get faster by increasing resolution, even if efficiency rises.

Edit: Same goes for the points raised by AlexV. The most an efficiency improvement can deliver is that the framerate stays the same. You just fill capacity that was wasted at lower resolution with useful work, but that can at best offset the extra work you add.
 
Now, a good discussion would have covered what the limitations of the technology are, to what degree and how fast it is likely to be adopted by the industry, what advantages and disadvantages it has compared to other techniques already in use, what the cost is in terms of hardware resources and whether this is actually a good way to spend those resources as opposed to spending them on other aspects of the GPU, at what level the capabilities are sufficient for the application space, et cetera.
http://forum.beyond3d.com/showpost.php?p=1485328&postcount=3903

and a feature with little to no current relevance (and judging from people on these very boards involved with art creation, rather murky prospects going forward as well).
Maybe you should contribute something slightly more concrete. How about quoting some artists, or explaining how it's irrelevant when it's being used in numerous games?

Call of Juarez was a great example of a game that had D3D10 goodness, only to be totally undermined by the succeeding game that looked better in DX9. That problem with this kind of pre-emptive featurism appears to be the gist of your point of view on tessellation. Feel free to expand...
 
Maybe Anand should also have benched it with 8xAA and it would have run at 50fps? I have real problems understanding what this benchmark actually measures. :rolleyes:
I noticed those problems. I just assumed Anandtech's incompetence, e.g. using out-of-date results.
 
What is wrong with using Civ5 as a graphics benchmark is that it is not a graphics application

Of course it's a graphics application. I think you mean due to its turn based nature it is less dependent on a high framerate. However, that doesn't disqualify it from being a useful graphics benchmark.

And predictably these people are no longer present, as far as I can see, here on Beyond3D forums.

The world has moved on and the forum moved on with it. Technology is far more advanced now than it was back then and having a good enough CPU for responsive mouse clicks and key presses is no longer an issue.

I do agree with the lack of useful debate though. There is a whole lot of talk about who's faster at tessellation and very little, if any, discussion about how tessellation works or how it's implemented. It's still a murky concept and I can't point to a proper layman's analysis of it either here or anywhere on the net.
 
Where do you get 4 bytes per vertex from? I'm seeing TS output in examples as float2 or float3.
That's how you write the shader, but internally each of the two coordinates is a 16-bit fixed-point number from 0 to 1.

If an HS hardware thread of 16 patches (4 control points per patch for terrain tessellation using quad patches = 64 control points sharing a hardware thread) generates 337 triangles per patch, then that's ~5.4K triangles/vertices, 42KB assuming 8 bytes per vertex. Obviously, DS will drain those triangles as TS produces them, in batches of 64 vertices (that's ~84 batches).
You're assuming that the tessellated triangles from many patches are generated in parallel. That would be the methodology of tessellation emulated by the GS, but when you have fixed function hardware that's not how it works.

The tessellator will have a stream of input patches (edge/face tessellation factors and nothing else), read one, generate coordinate pairs one at a time to create a triangle list until the patch is complete, and then repeat. I would think that it wouldn't take many transistors to generate one barycentric coordinate pair per clock this way.
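To make that concrete, here is a deliberately over-simplified toy model of such a serial tessellator in C++ (uniform integer factors only, no edge factors or fractional partitioning, no reuse of shared vertices; every name here is hypothetical, not anyone's actual hardware or API):

```cpp
#include <cstdint>
#include <vector>

// Toy model of a fixed-function tessellator walking one quad patch at a time:
// the only per-patch input is a tessellation factor, the only output is a
// serial stream of (u,v) domain coordinates forming a triangle list.
struct DomainCoord { uint16_t u, v; };   // 16-bit fixed point, 0..65535 maps to 0..1

static DomainCoord Coord(int i, int j, int n) {
    return { uint16_t(i * 65535 / n), uint16_t(j * 65535 / n) };
}

// Emit the triangle list for one patch; conceptually one coordinate per "clock".
void TessellateQuadPatch(int tessFactor, std::vector<DomainCoord>& stream) {
    for (int j = 0; j < tessFactor; ++j) {
        for (int i = 0; i < tessFactor; ++i) {
            // two triangles per grid cell, six domain coordinates in total
            stream.push_back(Coord(i,     j,     tessFactor));
            stream.push_back(Coord(i + 1, j,     tessFactor));
            stream.push_back(Coord(i,     j + 1, tessFactor));
            stream.push_back(Coord(i + 1, j,     tessFactor));
            stream.push_back(Coord(i + 1, j + 1, tessFactor));
            stream.push_back(Coord(i,     j + 1, tessFactor));
        }
    }
}
```

In this toy model a factor of 16 yields 512 triangles per patch from nothing but that one input number, which is why the tessellator's own output stream stays so small.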

Like I said, the TS output stream is then very small. Whether it goes to GDS or an off chip ring buffer before being read into LDS shouldn't matter. You still need the control point data of the hull shader, but that should be able to remain in the LDS.
NVidia uses L2 to smooth these coarse-grained lumps of data, and it uses load-balancing across the entire GPU twixt stages to maximise overall throughput. Neither of these options seem to be available in Cypress.
For tessellation this shouldn't be an issue. The throughput needed on Cypress (one tri per clock) is very tiny. If it can balance vertex shaders, then it can balance domain shaders.
 
I noticed those problems. I just assumed Anandtech's incompetence, e.g. using out-of-date results.

Maybe Firaxis are using something like what is proposed on page 19:
http://developer.download.nvidia.com/presentations/2010/gdc/Tessellation_Performance.pdf

Screen Space Adaptive Tessellation
• Triangles under 8 pixels are not efficient
• Consider limiting the global maximum TessFactor by screen resolution
• Consider the screen space patch edge length as a scaling factor
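For what it's worth, the slide's suggestion boils down to something like the following sketch in C++ (the function name, the 8-pixel target and the clamping range are my assumptions, not anything Firaxis has confirmed):

```cpp
#include <algorithm>

// Screen-space adaptive tessellation heuristic, as sketched on the slide:
// scale the per-edge tessellation factor with the projected edge length so
// generated triangles stay above roughly 8 pixels, and clamp to a global cap.
float ScreenSpaceTessFactor(float edgeLengthPixels,           // projected patch edge length on screen
                            float targetTrianglePixels = 8.0f,
                            float maxTessFactor = 64.0f)
{
    // One tessellated segment per targetTrianglePixels of screen-space edge,
    // so a longer on-screen edge (i.e. a higher resolution) gets a higher factor.
    float factor = edgeLengthPixels / targetTrianglePixels;
    return std::clamp(factor, 1.0f, maxTessFactor);
}
```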

But then, it's unlikely that a GTX 470 is faster than a GTX 480, or that the overclocked Evga GTX 460 is slower than a stock GTX 460 with 1GB. *sigh*
 
The tessellator will have a stream of input patches (edge/face tessellation factors and nothing else), read one, generate coordinate pairs one at a time to create a triangle list until the patch is complete, and then repeat. I would think that it wouldn't take many transistors to generate one barycentric coordinate pair per clock this way.

So coordinate amplification doesn't cost much space-wise. How about all the connected attributes, like 8 or 16 texture coordinates etc.? Is that all generated algorithmically, or does it also need to be buffered? How does that work? If I feed 8 texture coordinates to my regular polygon, that's a lot to read, much more than the vertex information, but only once per triangle. You say the tessellator produces a serial stream of patch fragments; are they also processed serially afterwards? (I imagine not, and if not there must be a buffer to distribute the data?) Would we see an attribute-access explosion on the back end of the tessellator? If there were a global parallel cache we would just get cache hits, but I imagine the cache doesn't have a bus to every shader, only to clusters of shaders? Don't we get too many customers for too much data on the read ports or the switch?

The Beyond3D analysis shows Cypress does have data-fetch problems, at least with FP attributes. I'd be curious to see a test case with a halfway real-world 10-15 attributes, instead of just one.
 
I noticed those problems. I just assumed Anandtech's incompetence, e.g. using out-of-date results.

It can also mean that Civ5 is initialization-limited, which would be the first such game, I guess. And ATI cards would get faster too.

Or that we have superscalarity: at low resolutions data is processed in tiny erratic bursts, reading amounts of data which are thrown away immediately, while at high resolutions it becomes more predictable and cache-friendly because it takes more time per element and more of the elements are actually used instead of thrown away.
But ... that would be a very Fermi-specific exploit? Namely the cache hierarchy.
 
So coordinate amplification doesn't cost much space-wise. How about all the connected attributes, like 8 or 16 texture coordinates etc.? Is that all generated algorithmically, or does it also need to be buffered?
That's all generated in the domain shader using data from the hull shader. The tessellator only takes tessellation factors as inputs to produce a sequence of barycentric coordinates.

The domain shader is pretty much identical to a vertex shader. Both will produce vertex data to feed the pixel shader, and thus will have the same size output per wavefront.
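As a rough illustration of that split (plain C++ rather than HLSL; the quad-patch attribute layout and the bilinear evaluation are my assumptions for a flat terrain patch, not anyone's actual shader), the tessellator only hands the domain shader a (u,v) pair, and everything else is interpolated from the hull shader's control points:

```cpp
// Per-control-point data produced by the hull shader; resident in LDS.
struct ControlPoint { float position[3]; float texcoord[2]; };

// Bilinear evaluation of one scalar attribute over a quad patch.
static float Bilerp(float c00, float c10, float c01, float c11, float u, float v) {
    float bottom = c00 + (c10 - c00) * u;
    float top    = c01 + (c11 - c01) * u;
    return bottom + (top - bottom) * v;
}

// Domain-shader equivalent: takes the 4 control points plus the tessellator's
// (u,v) pair and produces one vertex, just as a vertex shader would.
ControlPoint EvaluateQuadPatch(const ControlPoint cp[4], float u, float v) {
    ControlPoint out{};
    for (int i = 0; i < 3; ++i)
        out.position[i] = Bilerp(cp[0].position[i], cp[1].position[i],
                                 cp[2].position[i], cp[3].position[i], u, v);
    for (int i = 0; i < 2; ++i)
        out.texcoord[i] = Bilerp(cp[0].texcoord[i], cp[1].texcoord[i],
                                 cp[2].texcoord[i], cp[3].texcoord[i], u, v);
    // A real terrain domain shader would also sample a displacement map here.
    return out;
}
```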
 