AMD: Speculation, Rumors, and Discussion (Archive)

Cracks on objects like that mean the LOD isn't being calculated correctly for the patches; it's a problem with the code, which I stated ;).

Btw, they are talking about subpixel aliasing, which happens when a geometry edge partially intersects a pixel, and sub-pixel triangles do increase that. But the trade-off with sub-pixel triangles is that you don't get the strong aliasing of many pixels along one edge of a poly, so the aliasing is harder to notice than on the original, untessellated model.

Good read on god rays, but you are using VRAM and shader performance, so there is a trade-off between tessellation and horsepower plus VRAM.

And it also seems to only work with one strong light source, whereas with tessellation it can be done with many light sources without extra hits on VRAM and the pixel shader.

Sure, but current software solutions to crack filling are expensive, which is the point. It's why both consoles have full DX11 tessellation support but you don't see displacement-mapped tessellation everywhere. A more programmable pipeline could make it easier to do so, but until that time, or until someone shows the current somewhat fixed pipeline can work well in realtime, I don't see displacement-mapped tessellation being used a whole lot.

And while you certainly need to incur a performance cost one way or another, considering Hollywood does raymarching and raytracing for volumetrics, and uses it for photoreal results, I'm pretty sure it's going to stay that way. The tessellation method was just an old hack from, heck, back in the geometry shader days and the second STALKER title if I recall correctly. The volumetric raymarching solution, which can re-use shadow maps, supports multiple light sources (did you even read it?), supports inhomogeneous media and scattering from indirect lighting, and is already much, MUCH faster even on Nvidia's cards than Nvidia's own tessellation solution, is the clear winner.
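For reference, the core of the raymarching approach can be sketched in a few lines; this is a minimal single-scattering version (C++, CPU-side, with a placeholder shadow-map lookup and no phase function, so treat all names and constants as made up, not the paper's actual code):

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

struct Light {
    Vec3 color;
    // Placeholder: returns true if point p is lit according to this light's
    // existing shadow map (a depth compare in a real renderer).
    bool (*visible)(Vec3 p);
};

// Single-scattering march from the camera towards the surface hit point.
// Each sample is tested against every light's shadow map, so adding lights
// costs shader time but reuses the shadow maps that regular shadowing
// already rendered.
Vec3 raymarchScattering(Vec3 camPos, Vec3 rayDir, float maxDist,
                        const std::vector<Light>& lights,
                        float sigmaS,  // scattering coefficient of the medium
                        float sigmaT,  // extinction coefficient
                        int steps)
{
    Vec3  inscatter     = {0.0f, 0.0f, 0.0f};
    float stepLen       = maxDist / steps;
    float transmittance = 1.0f;

    for (int i = 0; i < steps; ++i) {
        Vec3 p = add(camPos, scale(rayDir, (i + 0.5f) * stepLen));
        for (const Light& l : lights) {
            if (l.visible(p))  // shadow-map visibility test for this sample
                inscatter = add(inscatter,
                                scale(l.color, sigmaS * transmittance * stepLen));
        }
        transmittance *= std::exp(-sigmaT * stepLen);  // extinction along the ray
    }
    return inscatter;
}
```

The point being: extra lights only add another iteration of the inner loop against shadow maps you already have.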

And mass geometry aliasing is still a problem: depending on the delta of change between centroid samples you still get flashes, even super bad sparkly results like you get with non-anti-aliased normal maps and specular. And since you can only sample a single unfiltered triangle at a time, there's no reason to do sub-pixel triangles even if you could. To get better image quality you'd need a way to sample and filter all the triangles affecting the pixel at once, which is exactly what REYES does but rasterization does not. AMD's support for tessellation and geometry is decent enough at the moment. They just need to cut their ALU-to-geometry ratio to something more like the Fury rather than the Fury X, which might mean going about six wide for the front end on their new high end (Greenland?).
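To make the "filter all the triangles at once" point concrete, here's a toy resolve comparison in C++ (purely illustrative; not how REYES or any real rasterizer is implemented, and the Fragment struct is made up):

```cpp
#include <vector>

struct Fragment {
    float color;        // shade of one sub-pixel triangle (grayscale for brevity)
    float coverage;     // fraction of the pixel this triangle covers, 0..1
    bool  coversCenter; // does it happen to cover the pixel center?
};

// Ordinary single-sample rasterization: the pixel takes the color of whichever
// triangle covers the pixel center. Tiny triangles pop in and out of coverage
// between frames, which is the sparkling/flashing described above.
float resolveCentroid(const std::vector<Fragment>& frags, float background)
{
    for (const Fragment& f : frags)
        if (f.coversCenter)
            return f.color;
    return background;
}

// REYES-style resolve: every triangle touching the pixel contributes in
// proportion to its coverage, so sub-pixel geometry fades smoothly instead
// of flickering.
float resolveFiltered(const std::vector<Fragment>& frags, float background)
{
    float color = 0.0f, covered = 0.0f;
    for (const Fragment& f : frags) {
        color   += f.color * f.coverage;
        covered += f.coverage;
    }
    if (covered > 1.0f) { color /= covered; covered = 1.0f; } // crude overlap clamp
    return color + background * (1.0f - covered);
}
```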
 
Sure, but current software solutions to crack filling are expensive, which is the point. It's why both consoles have full DX11 tessellation support but you don't see displacement-mapped tessellation everywhere. A more programmable pipeline could make it easier to do so, but until that time, or until someone shows the current somewhat fixed pipeline can work well in realtime, I don't see displacement-mapped tessellation being used a whole lot.

Don't agree with that at all. Both consoles are based on GCN hardware; don't need to go any further than that.

And while you certainly need to incur a performance cost one way or another, considering Hollywood does raymarching and raytracing for volumetrics, and uses it for photoreal results, I'm pretty sure it's going to stay that way. The tessellation method was just an old hack from, heck, back in the geometry shader days and the second STALKER title if I recall correctly. The volumetric raymarching solution, which can re-use shadow maps, supports multiple light sources (did you even read it?), supports inhomogeneous media and scattering from indirect lighting, and is already much, MUCH faster even on Nvidia's cards than Nvidia's own tessellation solution, is the clear winner.

Only one light source produces god rays; if you have multiple light sources of the same intensity at similar distances, you will need multiple shadow maps, hence a multiple of the VRAM consumption. Yes, I read the paper. I think you didn't understand what I meant when I stated multiple lights: not just multiple lights, but multiple lights casting god rays.
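Rough numbers on the VRAM side of that, assuming each god-ray-casting light needs its own map, and picking an arbitrary resolution and depth format for illustration:

```cpp
#include <cstdio>

int main()
{
    const int    res           = 2048;            // shadow map resolution (assumed)
    const int    bytesPerTexel = 4;                // e.g. 32-bit depth (assumed)
    const double mb            = 1024.0 * 1024.0;

    const double oneMap = double(res) * res * bytesPerTexel / mb;  // ~16 MB
    for (int lights = 1; lights <= 4; ++lights)
        std::printf("%d god-ray light(s): %.0f MB of shadow maps\n",
                    lights, lights * oneMap);
    // Each additional ray-casting light also means another shadow-map render
    // pass, so the cost is not just memory.
}
```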

Hollywood doesn't matter much to gaming. I work in special effects in TV and movies; getting that stuff to work in realtime is the hack :). It's always a compromise between speed and accuracy until hardware is fast enough, or has memory large enough, to do the real thing in real time.

And mass geometry aliasing is still a problem: depending on the delta of change between centroid samples you still get flashes, even super bad sparkly results like you get with non-anti-aliased normal maps and specular. And since you can only sample a single unfiltered triangle at a time, there's no reason to do sub-pixel triangles even if you could. To get better image quality you'd need a way to sample and filter all the triangles affecting the pixel at once, which is exactly what REYES does but rasterization does not. AMD's support for tessellation and geometry is decent enough at the moment. They just need to cut their ALU-to-geometry ratio to something more like the Fury rather than the Fury X, which might mean going about six wide for the front end on their new high end (Greenland?).

Did you see the screenshots of the Batman models I posted? What is the factor-of-4 tessellation on those models? Just one model is 8 million polys, maybe even more, and that's with adaptive tessellation. Now if you think AMD cards can handle that, good luck. I've been talking to other devs that use UE4 and other engines to make games; base polycounts on characters and scene assets are going to go up, and they will go up a lot. The reason for this is that mesh detail is just as important as texture detail, and by adding in more mesh detail, the models for games coming out now are pretty much like the models we use for TV and movies: individual pieces which are movable, more complex animations, etc. Originally, last year, we were targeting around what I stated before (75k for main characters, 30k for secondary, and 1-2 million in the FOV), but that has changed, especially since Pascal and AI will be out; we have upped it to 3 times that. The other devs I've talked to are doing the same, so if AMD doesn't fix their tessellation bottleneck, guess what, you are going to see the same issue in many future games. So I do hope they fix it :)
 
Well, with 3 times the base polygon count for upcoming games, they will need to fix that bottleneck is what I'm saying. When I first started talking to other teams about what the polygon counts should be for next-generation games, around a year and a half ago, I was surprised they were upping them so much, especially for characters. They are pretty much at movie-level polygon detail. When doing these types of objects, edge loops are really important to keep the individual pieces' vertex normals looking right. It's crazy how detailed these meshes are; here is a character I have been working on, and there are no normal maps on it yet, not to mention I still have to add one more level of detail to the mesh.

http://i.imgur.com/1Q1JmNJ.jpg

When tessellated at a factor of 4 in UE4, the chest armor and back armor alone are around 2 million polys, from a base poly count of around 75k. The whole model, which I haven't tested out yet, will probably end up around 8 million (350k base). We are expecting to have 10-15 of these types of characters on screen in multiplayer.
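A rough sanity check on those counts (uniform integer factors only, so it ignores the adaptive part; real numbers land above or below this):

```cpp
#include <cstdio>

// Rough amplification estimate: with an integer edge factor of N, a triangle
// patch tessellates into roughly N*N triangles. Adaptive (per-edge) factors
// push individual patches above or below this, so treat it as order-of-
// magnitude only.
long long estimateTris(long long baseTris, int factor)
{
    return baseTris * factor * factor;
}

int main()
{
    std::printf("75k base  @ x4: ~%lld tris\n", estimateTris(75'000, 4));   // ~1.2M
    std::printf("350k base @ x4: ~%lld tris\n", estimateTris(350'000, 4));  // ~5.6M
    std::printf("350k base @ x8: ~%lld tris\n", estimateTris(350'000, 8));  // ~22M
}
```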
 
In my experience developers who have something meaningful to say about GPUs present runtime data to prove their point.
 
I already have done that; that's where I got those tessellation numbers for the front and back armor of the character I'm making. When tested on Fiji, about a week before the Fiji launch, even at a x4 tessellation factor it took a bigger hit than Maxwell, which wasn't too bad, around 5%; but when going to a factor of 8, where we start looking at 30 million polys for those pieces, it's more like 20% greater.
 
So you've now presented some runtime data that indicates a 5% comparative hit on GCN versus Maxwell at 4x tessellation. I don't see a problem at 4x. Does anyone else?
 
That's just one character (partial in this case), no environment. Does the total amount of tessellated geometry have the same effect as one object? No, it all adds up. If I add the 2 full characters with nothing else in the visible area, at x4 the hit is greater on Fiji than Maxwell, and greater than 5%. It increases fairly linearly with the total amount of procedurally created geometry until you get to around 20 million polys, after which Fiji starts taking a larger hit; pretty much the geometry bottleneck overwhelms all other bottlenecks. I'm sure Maxwell will exhibit the same when its tessellation limits are hit, but we haven't really gotten to that point yet.

Not really concerned about this, as we are going to have multiple LODs for the assets, but at the highest settings we will be using LOD level 0, which is what we are testing on right now. Not concerned with the lower-level LODs until we get to a certain point in game development, pretty much when we reach an alpha stage. We are doing this a bit backwards; I would like to do things more traditionally, with the LODs being done right off the bat, but we need to get some things done first so the team doesn't lose interest.
 
So you're saying at 20 million polys (triangles?) created with 4x tessellation that the performance hit compared with Maxwell is much larger than 5%?

(That's 1.2 billion polys per second, at 60fps.)
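Putting that 1.2 billion/s figure next to a triangle-setup budget; the peak rate below is a placeholder, not any particular card's spec:

```cpp
#include <cstdio>

int main()
{
    const double trisPerFrame = 20e6;   // the figure quoted above
    const double fps          = 60.0;
    const double demand       = trisPerFrame * fps;   // 1.2e9 tris/s

    // Hypothetical front-end peak: setupRate tris/clock at clockHz. These are
    // placeholder numbers; real sustained rates with tessellation enabled are
    // well below the paper peak.
    const double setupRate = 4.0;
    const double clockHz   = 1.0e9;
    const double peak      = setupRate * clockHz;      // 4.0e9 tris/s

    std::printf("demand: %.2g tris/s, hypothetical peak: %.2g tris/s (%.0f%%)\n",
                demand, peak, 100.0 * demand / peak);
}
```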
 
I already have done that; that's where I got those tessellation numbers for the front and back armor of the character I'm making. When tested on Fiji, about a week before the Fiji launch, even at a x4 tessellation factor it took a bigger hit than Maxwell, which wasn't too bad, around 5%; but when going to a factor of 8, where we start looking at 30 million polys for those pieces, it's more like 20% greater.
If your base mesh is already high poly count, and you tessellate it further, you are not bottlenecked by the tessellation performance. You will be bottlenecked by the raw triangle/vertex throughput of the GPU. This is the same bottleneck that limits high poly rendering in general (and is not tied to tessellation).

Enabling the tessellation pipeline slows down the vertex throughput of a GPU, even when the tessellation factors are 1.0 (= no extra triangles are added). This is because each triangle patch will generate at least 3 domain shader calls, and the results of those must be separately calculated and stored (to the parameter cache). This results in a similar performance hit as using non-indexed geometry (not being able to share vertices with neighbor triangles). You can try to reduce some of this hit from the domain shader ALU by moving most of the calculation to the vertex shader. However, this means that the GPU must temporarily store the vertex shader output somewhere and possibly move it from one CU to another (this was a problem on GCN 1.0 and 1.1). Also on Nvidia, it seems that the best performance is achieved by reducing the number of shader stages (VS, HS, HS patch func, DS) and keeping the data transfer between them as small as possible. Skipping both VS and HS stages is a good idea if you don't need them.
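As a back-of-the-envelope illustration of that factor-1.0 overhead (the reuse ratios are illustrative, not measured on any specific GPU):

```cpp
#include <cstdio>

// Rough invocation-count comparison behind the "factor 1.0 still costs you"
// point. Real reuse depends on the post-transform / parameter cache behaviour
// of the specific GPU.
int main()
{
    const long long tris = 1'000'000;

    // Indexed mesh: a typical closed mesh has roughly half as many vertices
    // as triangles, and good index ordering lets most of them be shaded once
    // and reused.
    const long long indexedVsInvocations = tris / 2;

    // Tessellation at factor 1.0: each patch still emits its 3 corner domain
    // points, and those domain-shader results are not shared across patches,
    // much like rendering non-indexed geometry.
    const long long domainShaderInvocations = tris * 3;

    std::printf("indexed VS invocations    : ~%lld\n", indexedVsInvocations);
    std::printf("tessellated DS invocations: ~%lld (factor 1.0)\n",
                domainShaderInvocations);
}
```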

It is hard to make a tessellation pipeline that beats good baked LODs in performance. And if tessellation can't beat baked high poly meshes in performance, it has no good use case. The point of tessellation is to generate geometry on the fly, in order to reduce memory bandwidth (compared to an equally high polygon baked mesh) and thus make it faster to render. In my experience static meshes always beat static tessellation (such as the above stated 4x amplification) in performance (assuming tightly packed vertices), leaving dynamic tessellation (smoothly changing per-edge factors) as the only usable tessellation style. With a good dynamic tessellation algorithm you can get some gains, but it's not going to be easy.
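One common way to drive those smoothly changing per-edge factors is to target a roughly constant number of pixels per generated edge segment; a minimal sketch (the target constant and names are made up):

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

// Because the factor depends only on the edge's own two endpoints projected
// to screen space, the two patches sharing an edge compute the same value,
// which keeps the tessellated mesh crack-free. Fractional partitioning then
// avoids popping as the factor changes smoothly.
float edgeTessFactor(Vec2 screenA, Vec2 screenB,
                     float targetPixelsPerSegment = 12.0f,
                     float maxFactor = 16.0f)
{
    const float dx = screenB.x - screenA.x;
    const float dy = screenB.y - screenA.y;
    const float edgePixels = std::sqrt(dx * dx + dy * dy);
    return std::clamp(edgePixels / targetPixelsPerSegment, 1.0f, maxFactor);
}
```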
 
If your base mesh is already high poly count, and you tessellate it further, you are not bottlenecked by the tessellation performance. You will be bottlenecked by the raw triangle/vertex throughput of the GPU. This is the same bottleneck that limits high poly rendering in general (and is not tied to tessellation).

Enabling the tessellation pipeline slows down the vertex throughput of a GPU, even when the tessellation factors are 1.0 (= no extra triangles are added). This is because each triangle patch will generate at least 3 domain shader calls, and the results of those must be separately calculated and stored (to the parameter cache). This results in a similar performance hit as using non-indexed geometry (not being able to share vertices with neighbor triangles). You can try to reduce some of this hit from the domain shader ALU by moving most of the calculation to the vertex shader. However, this means that the GPU must temporarily store the vertex shader output somewhere and possibly move it from one CU to another (this was a problem on GCN 1.0 and 1.1). Also on Nvidia, it seems that the best performance is achieved by reducing the number of shader stages (VS, HS, HS patch func, DS) and keeping the data transfer between them as small as possible. Skipping both VS and HS stages is a good idea if you don't need them.

It is hard to make a tessellation pipeline that beats good baked LODs in performance. And if tessellation can't beat baked high poly meshes in performance, it has no good use case. The point of tessellation is to generate geometry on the fly, in order to reduce memory bandwidth (compared to an equally high polygon baked mesh) and thus make it faster to render. In my experience static meshes always beat static tessellation (such as the above stated 4x amplification) in performance (assuming tightly packed vertices), leaving dynamic tessellation (smoothly changing per-edge factors) as the only usable tessellation style. With a good dynamic tessellation algorithm you can get some gains, but it's not going to be easy.

Well, there's an argument for it in the case of automatically producing continuous LOD instead of having to bake multiple LODs, which would be nice. Similarly there's a good argument for image quality, e.g. LEADR mapping (a combination of LEAN mapping with tessellation) and a recent attempt at a kind of REYES in realtime. No LOD popping or baking would be really nice.

But considering there's still a large concentration on culling etc., pure performance still seems the order of the day.
 
According to this AMD slide they are going to change many aspects of their GPUs; good to see the geometry processor in there.

 
More slides are on Videocardz.

Videocardz said:
Tomorrow AMD will unveil its new architecture, called Polaris. This is the biggest upgrade to the Radeon architecture since the first 28nm GPU.

On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)
 
Well, without any markers you can't really make heads or tails of them; you just have to assume the graph is saying Fmax/performance vs. power usage is going to be better, which, well, it should be, lol. To what degree, though, the graph is kind of useless.
 
According to this AMD slide they are going to change many aspects of their GPUs; good to see the geometry processor in there.
Hmm, does that also mean the L2 is now unified even wrt the render backends? That would be something I'd have expected for GCN 1.2 already :).
 
More slides are on Videocardz.



On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)

Unmarked graph axes with vague labels are unmarked and vague :p

Assuming those are the real slides, probably; they're just standard PR PowerPoint stuff to make AMD look as good as possible. What the actual performance-per-watt, or performance, numbers are will have to wait until benchmarks are done, as usual.
 
AMD-Polaris-Architecture-2.jpg
the 40nm die is ~66x61 pixels
the 28nm die is ~46x43 pixels
the FinFET die is ~29x29 pixels

If this picture is correct and to scale, then we are getting about a ~2.3x improvement in density from the new process. It would mean that 600mm^2 GPUs on 28nm would be only about 260mm^2 on FinFET.
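The arithmetic behind that estimate, for anyone who wants to plug in their own pixel measurements:

```cpp
#include <cstdio>

int main()
{
    // Approximate pixel measurements of the dies in the slide.
    const double area28 = 46.0 * 43.0;        // ~1978 px^2
    const double areaFF = 29.0 * 29.0;        // ~841 px^2
    const double shrink = area28 / areaFF;    // ~2.35x

    // Consistent with the ~2.3x / ~260 mm^2 estimate above.
    std::printf("density gain 28nm -> FinFET: ~%.2fx\n", shrink);
    std::printf("a 600 mm^2 28nm die would map to ~%.0f mm^2\n", 600.0 / shrink);
}
```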
 
More slides are on Videocardz.

On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)

Don't know, but Fmax is higher for FinFET, even if you take the value at the end of the curve for 28nm, so I'd say that's a good sign.
 