AMD/ATI Evergreen: Architecture Discussion


Doh, that's what I get for going straight to the forum and skipping the front page...

There's no forum announcement either. Since there's no thread for discussion of the review yet, I guess I'll post this here: the labels for 4890 and 5870 for the graphs on page 4 are flipped. Also the Stalker COP figures on the same page under "RV790 vs. Cypress 0xAA/0xAF" have some non-numerical characters.
 
the labels for 4890 and 5870 for the graphs on page 4 are flipped. Also the Stalker COP figures on the same page under "RV790 vs. Cypress 0xAA/0xAF" have some non-numerical characters.

Ahh, traditional B3D style. I like it! :)
 
One of the things I'm curious about is the LDS bank conflicts. These are essentially zero in the non-tessellated case but significant when tessellation is active.

This implies to me that HS/DS (or HS only or DS only) are also using LDS and it's the three shaders: HS/DS/PS in combination that are causing conflicts.

Another reason that bank conflicts might arise in the tessellation case is that there are more attributes per vertex. This seems unlikely to me.

On the other hand is the pixel shader the same in both cases?

EDIT: Hmm, that doesn't make sense. LDS bank conflicts can only be inflicted by a shader upon itself, I believe, i.e. a single ALU instruction has bank-conflicting LDS addresses. So this seems to imply that HS and/or DS is generating its own bank conflicts. I presume pixel shaders don't normally generate conflicts. And I can't think why pixel shading of tessellated triangles would go from 0 bank conflicts to a substantial number.
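To make "a single ALU instruction has bank-conflicting LDS addresses" concrete, here is a minimal sketch of how one could count conflicts for the addresses issued by one instruction. The bank count, the bank width, and the rule that reads of the same word don't conflict are assumptions for illustration, not figures taken from the article or the profiler.

```python
from collections import defaultdict

NUM_BANKS = 32        # assumption: number of LDS banks
BANK_WIDTH_BYTES = 4  # assumption: each bank serves one 32-bit word per cycle


def conflict_degree(byte_addresses):
    """Return the largest number of distinct words that map to a single bank.

    A result of 1 means the access is conflict-free; a result of N means the
    access would need roughly N passes under this simplified model.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


# Stride of one word: every work-item lands in a different bank.
unit_stride = [i * BANK_WIDTH_BYTES for i in range(32)]

# Stride of NUM_BANKS words: every work-item lands in bank 0, different words.
bad_stride = [i * BANK_WIDTH_BYTES * NUM_BANKS for i in range(32)]

# Every work-item reads the same word (treated here as a broadcast).
same_word = [64] * 32

print(conflict_degree(unit_stride))
print(conflict_degree(bad_stride))
print(conflict_degree(same_word))
```

In this model the conflict count depends only on the address pattern within one instruction, which is why a shader can only inflict conflicts upon itself.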

Jawed
 
Ahh, traditional B3D style. I like it! :)

Yeah, you're a meanie:p

Jawed: the pixel shader appears to be the same...at least based on what PerfStudio says/grabs. With no tessellation, LDS bank conflicts, and thus LDS-caused stalls, exist only in the case of one state, which also has a Compute Shader active (the heavy tessellation ones, however, don't). My guess is that it's the DS that's the culprit, since it probably gets parametrization coordinates for the vertices from the LDS, and almost certainly writes complete vertex data (attributes et al.) to the LDS in order for interpolation to be performed (I think interpolation itself has its role here too, as noted in the article). Whilst there aren't/shouldn't be more attributes per vertex, there's a huge number of extra vertices in the tessellated case.

All in all, the bit included here was more of a first step, and we're definitely looking at refining things further, so that we get better insight into what makes them tick - so please, suggestions/corrections/whatnot are more than welcome.
 
Regarding the initial benchmarks, it would have been nice to see the 4890 @850/600 for a truly "half Cypress", instead of that extra bandwidth-limited Cypress, which doesn't really do anything in the comparison to the 4890.

And for the impressive 60-80% cull rate - could it make sense to do at least the backface culling (I guess you have to be too intimate with the rasterization scheme to do zero-fragment culling) in a geometry shader, or is the setup rate for the geometry shader also limited in the same way?
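For reference, the backface test being asked about boils down to the sign of the triangle's screen-space area. The sketch below is plain Python rather than shader code, and the function names, the counter-clockwise front-face convention, and the y-up coordinate system are all assumptions for illustration. It only rejects back-facing and exactly degenerate triangles; it does not attempt zero-fragment culling, which depends on where the rasterizer's sample points fall.

```python
def signed_area_2d(p0, p1, p2):
    """Signed area of a 2D triangle; positive when the winding is CCW (y up)."""
    return 0.5 * ((p1[0] - p0[0]) * (p2[1] - p0[1])
                  - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def is_culled(p0, p1, p2, front_is_ccw=True):
    """Return True if the triangle is back-facing or has exactly zero area."""
    area = signed_area_2d(p0, p1, p2)
    if area == 0:
        return True  # degenerate triangle, nothing to draw
    return area < 0 if front_is_ccw else area > 0


front = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # counter-clockwise
back = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))   # same triangle, clockwise
flat = ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))   # collinear points

print(is_culled(*front), is_culled(*back), is_culled(*flat))
```

Whether running this per primitive in a geometry shader would actually beat the fixed-function setup rate is exactly the open question above; the sketch only shows how cheap the test itself is.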
 
Are the charts supposed to be clickable? They are tiny and don't enlarge at all on IE 8.
They render fine for me in IE8 and IE8 64-bit? Do you have something that fiddles with Flash content installed? As an aside, the current graph renderer that I wrote way back when will go at some point, and be replaced by something non-Flash.
 
They render fine for me in IE8 and IE8 64-bit? Do you have something that fiddles with Flash content installed? As an aside, the current graph renderer that I wrote way back when will go at some point, and be replaced by something non-Flash.

Hmm, not that I'm aware of. It won't render at all on IE8 64-bit; it keeps trying to install Flash. So weird. Well, if it's only me, it's all good. I will go load it up on the laptop.
 
I think DX11 will need some tessellation revamp if they want to use it at Heaven benchmark level in future :rolleyes:. At least solve the culling somehow.
 
My guess is that it's the DS that's the culprit, since it probably gets parametrization coordinates for the vertices from the LDS, and almost certainly writes complete vertex data (attributes et al.) to the LDS in order for interpolation to be performed (I think interpolation itself has its role here too, as noted in the article).
Interpolation, i.e. the math, doesn't make use of LDS - it's merely that data shared by multiple work-items is held in LDS.

I can't think why increasing the count of triangles causes pixel shader interpolation to start to generate LDS bank conflicts. If this was the case then a normal non-tessellated scene would do the same.

The only caveat I'd add there is that if, when tessellation is active, the way triangles are assigned to hardware threads is different, then there might be a reason for interpolation to cause conflicts. I'm thinking specifically of packing multiple triangles into each hardware thread, which doesn't normally happen when tessellation is off. Except I don't know if this packing is possible. Nor why it would only be turned on during tessellation. Though it could be argued (I don't agree) that this kind of packing would only be a performance gain, in spite of LDS bank conflicts, when tessellation is on.

Whilst there aren't/shouldn't be more attributes per vertex, there's a huge number of extra vertices in the tessellated case.
But a pixel can only be on a single triangle per work item.

All in all, the bit included here was more of a first step, and we're definitely looking at refining things further, so that we get better insight into what makes them tick - so please, suggestions/corrections/whatnot are more than welcome.
First thing I'd suggest is raw data. e.g. the statistics that come out of the profiler for that frame, at least for those 5 key states.

Jawed
 
I think DX11 will need some tessellation revamp if they want to use it at Heaven benchmark level in future :rolleyes:. At least solve the culling somehow.
From the article:
It's a clear indication that performance must be traded for final image quality with tessellation, as with any other facet of real-time graphics.
Which was a hint that it's actually the engine and art team's problem to solve, rather than ATI's. There's certainly work for ATI to do here, driver wise, but the hardware is what it is, and the tessellation level in the benchmark could clearly be dialled back without a huge drop in IQ. That's what'd have to happen in a game.
 
Which was a hint that it's actually the engine and art team's problem to solve, rather than ATI's. There's certainly work for ATI to do here, driver wise, but the hardware is what it is, and the tessellation level in the benchmark could clearly be dialled back without a huge drop in IQ. That's what'd have to happen in a game.

nutella-big.jpg


Yep, it seems the dragon's legs and tail could use some adaptive tessellation ;).
 
I also found this detailed DX11 flow: http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=316777&reply_id=3424549

And the final note on efficiency there:
The diagram presented here takes the naive approach of executing as many times as there are outputs or inputs. It is expected that hardware can take advantage of commonality (via pre- or post-transform caches for example) and reduce the number of invocations. The two best candidates are the vertex and domain shaders; in this example there are 6 VS invocations and 36 DS invocations yet there are only 4 unique control points and 14 domain points. Specifically, the example used here would do 50% more vertex shading and 157% more domain shading – in a field where performance is crucial it’s easy to see why the hardware would want to be cleverer!
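As a quick sanity check of the quoted figures, the 50% and 157% overheads follow directly from the invocation and unique-point counts given in the quote. The helper name below is made up for illustration.

```python
def extra_work_percent(invocations, unique_points):
    """Percentage of extra shader invocations relative to the unique points."""
    return (invocations / unique_points - 1.0) * 100.0


# Counts taken from the quoted journal entry.
vs_extra = extra_work_percent(6, 4)    # 6 VS invocations, 4 unique control points
ds_extra = extra_work_percent(36, 14)  # 36 DS invocations, 14 unique domain points

print(f"VS: {vs_extra:.0f}% more, DS: {ds_extra:.0f}% more")
```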

Those 60-80% culled primitives seem quite high. Couldn't the extra DS and VS invocations be the cause of those numbers? Or is it actually solved in hardware somehow, via those caches, in the 5K Radeons?
 
Just pointing out, this is why flash is evil. And for a chart, flash is so totally not even needed!
Agreed, it was just cheap for me to do when I was bringing the new site up a few years ago.
 