AMD: Speculation, Rumors, and Discussion (Archive)

Cracks on objects like that mean the LOD isn't being calculated correctly for the patches; it's a problem with the code, which I stated ;).

Btw, they are talking about subpixel aliasing, which happens when a geometry edge partially intersects a pixel, and sub-pixel triangles do increase that. But the trade-off with sub-pixel triangles is that you don't get the strong aliasing of many pixels along one edge of a poly, so the aliasing is harder to notice than on the original, untessellated model.

Good read on god rays, but you are using VRAM and shader performance, so there is a trade-off between tessellation and horsepower plus VRAM.

And it also seems to only work with one strong light source, whereas with tessellation it can be done with many light sources without extra hits on VRAM and the pixel shader.

Sure, but current software solutions to crack filling are expensive, which is the point. It's why both consoles have full DX11 tessellation support but you don't see displacement-mapped tessellation everywhere. A more programmable pipeline could make it easier to do so, but until that time, or until someone shows the current somewhat fixed pipeline can work well in realtime, I don't see displacement-mapped tessellation being used a whole lot.

And while you certainly need to incur a performance cost one way or another, considering Hollywood does raymarching and raytracing for volumetrics, and uses it for photoreal results, I'm pretty sure it's going to stay that way. The tessellation method was just an old hack from, heck, back in the geometry shader days and the second STALKER title if I recall correctly. The volumetric raymarching solution, which can re-use shadow maps, supports multiple light sources (did you even read it?), supports inhomogeneous media and scattering from indirect lighting, and is already much, MUCH faster even on Nvidia's cards than Nvidia's own tessellation solution, is the clear winner.
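For reference, the core of the raymarching approach can be sketched in a few lines; this is a minimal single-scattering version (C++, CPU-side, with a placeholder shadow-map lookup and no phase function, so treat all names and constants as made up, not the paper's actual code):

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

struct Light {
    Vec3 color;
    // Placeholder: returns true if point p is lit according to this light's
    // existing shadow map (a depth compare in a real renderer).
    bool (*visible)(Vec3 p);
};

// Single-scattering march from the camera towards the surface hit point.
// Each sample is tested against every light's shadow map, so adding lights
// costs shader time but reuses the shadow maps that regular shadowing
// already rendered.
Vec3 raymarchScattering(Vec3 camPos, Vec3 rayDir, float maxDist,
                        const std::vector<Light>& lights,
                        float sigmaS,  // scattering coefficient of the medium
                        float sigmaT,  // extinction coefficient
                        int steps)
{
    Vec3  inscatter     = {0.0f, 0.0f, 0.0f};
    float stepLen       = maxDist / steps;
    float transmittance = 1.0f;

    for (int i = 0; i < steps; ++i) {
        Vec3 p = add(camPos, scale(rayDir, (i + 0.5f) * stepLen));
        for (const Light& l : lights) {
            if (l.visible(p))  // shadow-map visibility test for this sample
                inscatter = add(inscatter,
                                scale(l.color, sigmaS * transmittance * stepLen));
        }
        transmittance *= std::exp(-sigmaT * stepLen);  // extinction along the ray
    }
    return inscatter;
}
```

The point being: extra lights only add another iteration of the inner loop against shadow maps you already have.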

And mass geometry aliasing is still a problem: depending on the delta of change between centroid samples you still get flashes, even super bad sparkly results like you get with non-anti-aliased normal maps and specular. And since you can only sample a single unfiltered triangle at a time, there's no reason to do sub-pixel triangles even if you could. To get better image quality you'd need a way to sample and filter all the triangles affecting the pixel at once, which is exactly what REYES does but rasterization does not. AMD's support for tessellation and geometry is decent enough at the moment. They just need to cut their ALU-to-geometry ratio to something more like the Fury rather than the Fury X, which might mean going about six wide for the front end on their new high end (Greenland?).
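To make the "filter all the triangles at once" point concrete, here's a toy resolve comparison in C++ (purely illustrative; not how REYES or any real rasterizer is implemented, and the Fragment struct is made up):

```cpp
#include <vector>

struct Fragment {
    float color;        // shade of one sub-pixel triangle (grayscale for brevity)
    float coverage;     // fraction of the pixel this triangle covers, 0..1
    bool  coversCenter; // does it happen to cover the pixel center?
};

// Ordinary single-sample rasterization: the pixel takes the color of whichever
// triangle covers the pixel center. Tiny triangles pop in and out of coverage
// between frames, which is the sparkling/flashing described above.
float resolveCentroid(const std::vector<Fragment>& frags, float background)
{
    for (const Fragment& f : frags)
        if (f.coversCenter)
            return f.color;
    return background;
}

// REYES-style resolve: every triangle touching the pixel contributes in
// proportion to its coverage, so sub-pixel geometry fades smoothly instead
// of flickering.
float resolveFiltered(const std::vector<Fragment>& frags, float background)
{
    float color = 0.0f, covered = 0.0f;
    for (const Fragment& f : frags) {
        color   += f.color * f.coverage;
        covered += f.coverage;
    }
    if (covered > 1.0f) { color /= covered; covered = 1.0f; } // crude overlap clamp
    return color + background * (1.0f - covered);
}
```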
 
Sure, but current software solutions to crack filling are expensive, which is the point. It's why both consoles have full DX11 tessellation support but you don't see displacement-mapped tessellation everywhere. A more programmable pipeline could make it easier to do so, but until that time, or until someone shows the current somewhat fixed pipeline can work well in realtime, I don't see displacement-mapped tessellation being used a whole lot.

Don't agree with that at all. Both consoles are based on GCN hardware; don't need to go any further than that.

And while you certainly need to incur a performance cost one way or another, considering Hollywood does raymarching and raytracing for volumetrics, and uses it for photoreal results, I'm pretty sure it's going to stay that way. The tessellation method was just an old hack from, heck, back in the geometry shader days and the second STALKER title if I recall correctly. The volumetric raymarching solution, which can re-use shadow maps, supports multiple light sources (did you even read it?), supports inhomogeneous media and scattering from indirect lighting, and is already much, MUCH faster even on Nvidia's cards than Nvidia's own tessellation solution, is the clear winner.

Only one light source produces god rays; if you have multiple light sources of the same intensity at similar distances, you will need multiple shadow maps, hence a multiple of the VRAM consumption. Yes, I read the paper. I think you didn't understand what I meant when I stated multiple lights: not just multiple lights, but multiple lights casting god rays.
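Rough numbers on the VRAM side of that, assuming each god-ray-casting light needs its own map, and picking an arbitrary resolution and depth format for illustration:

```cpp
#include <cstdio>

int main()
{
    const int    res           = 2048;            // shadow map resolution (assumed)
    const int    bytesPerTexel = 4;                // e.g. 32-bit depth (assumed)
    const double mb            = 1024.0 * 1024.0;

    const double oneMap = double(res) * res * bytesPerTexel / mb;  // ~16 MB
    for (int lights = 1; lights <= 4; ++lights)
        std::printf("%d god-ray light(s): %.0f MB of shadow maps\n",
                    lights, lights * oneMap);
    // Each additional ray-casting light also means another shadow-map render
    // pass, so the cost is not just memory.
}
```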

Hollywood doesn't matter much to gaming. I work in special effects in TV and movies; getting that stuff to work in realtime is the hack :). It's always a compromise between speed and accuracy until hardware is fast enough, or has memory large enough, to do the real thing in real time.

And mass geometry aliasing is still a problem: depending on the delta of change between centroid samples you still get flashes, even super bad sparkly results like you get with non-anti-aliased normal maps and specular. And since you can only sample a single unfiltered triangle at a time, there's no reason to do sub-pixel triangles even if you could. To get better image quality you'd need a way to sample and filter all the triangles affecting the pixel at once, which is exactly what REYES does but rasterization does not. AMD's support for tessellation and geometry is decent enough at the moment. They just need to cut their ALU-to-geometry ratio to something more like the Fury rather than the Fury X, which might mean going about six wide for the front end on their new high end (Greenland?).

Did you see the screenshots of the Batman models I posted? What is the factor-of-4 tessellation on those models? Just one model is 8 million polys, maybe even more, and that's with adaptive tessellation. Now if you think AMD cards can handle that, good luck. I've been talking to other devs that use UE4 and other engines to make games; base polycounts on characters and scene assets are going to go up, and they will go up a lot. The reason for this is that mesh detail is just as important as texture detail, and by adding in more mesh detail, the models for games coming out now are pretty much like the models we use for TV and movies: individual pieces which are movable, more complex animations, etc. Originally, last year, we were targeting around what I stated before (75k for main characters, 30k for secondary, and 1-2 million in the FOV), but that has changed, especially since Pascal and AI will be out; we have upped it to 3 times that. The other devs I've talked to are doing the same, so if AMD doesn't fix their tessellation bottleneck, guess what, you are going to see the same issue in many future games. So I do hope they fix it :)
 
Well, with 3 times the base polygon count for upcoming games, they will need to fix that bottleneck is what I'm saying. When I first started talking to other teams about what the polygon counts should be for next-generation games, around a year and a half ago, I was surprised they were upping them so much, especially for characters. They are pretty much at movie-level polygon detail. When doing these types of objects, edge loops are really important to keep the individual pieces' vertex normals looking right. It's crazy how detailed these meshes are; here is a character I have been working on, and there are no normal maps on it yet, not to mention I still have to add one more level of detail to the mesh.

http://i.imgur.com/1Q1JmNJ.jpg

When tessellated at a factor of 4 in UE4, the chest armor and back armor alone are around 2 million polys, from a base poly count of around 75k. The whole model, which I haven't tested out yet, will probably end up around 8 million (350k base). We are expecting to have 10-15 of these types of characters on screen in multiplayer.
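A rough sanity check on those counts (uniform integer factors only, so it ignores the adaptive part; real numbers land above or below this):

```cpp
#include <cstdio>

// Rough amplification estimate: with an integer edge factor of N, a triangle
// patch tessellates into roughly N*N triangles. Adaptive (per-edge) factors
// push individual patches above or below this, so treat it as order-of-
// magnitude only.
long long estimateTris(long long baseTris, int factor)
{
    return baseTris * factor * factor;
}

int main()
{
    std::printf("75k base  @ x4: ~%lld tris\n", estimateTris(75'000, 4));   // ~1.2M
    std::printf("350k base @ x4: ~%lld tris\n", estimateTris(350'000, 4));  // ~5.6M
    std::printf("350k base @ x8: ~%lld tris\n", estimateTris(350'000, 8));  // ~22M
}
```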
 
In my experience developers who have something meaningful to say about GPUs present runtime data to prove their point.
 
I already have done that; that's where I got those tessellation numbers for the front and back armor of the character I'm making. When tested on Fiji, about a week before the Fiji launch, even at a x4 tessellation factor it took a bigger hit than Maxwell, which wasn't too bad, around 5%; but when going to a factor of 8, where we start looking at 30 million polys for those pieces, it's more like 20% greater.
 
So you've now presented some runtime data that indicates a 5% comparative hit on GCN versus Maxwell at 4x tessellation. I don't see a problem at 4x. Does anyone else?
 
That's just one character (partial in this case), no environment. Does the total amount of tessellated geometry have the same effect as one object? No, it all adds up. If I add the 2 full characters with nothing else in the visible area, at x4 the hit is greater on Fiji than Maxwell, and greater than 5%. It increases fairly linearly with the total amount of procedurally created geometry until you get to around 20 million polys, after which Fiji starts taking a larger hit; pretty much the geometry bottleneck overwhelms all other bottlenecks. I'm sure Maxwell will exhibit the same when its tessellation limits are hit, but we haven't really gotten to that point yet.

Not really concerned about this, as we are going to have multiple LODs for the assets, but at the highest settings we will be using LOD level 0, which is what we are testing on right now. Not concerned with the lower-level LODs until we get to a certain point in game development, pretty much when we reach an alpha stage. We are doing this a bit backwards; I would like to do things more traditionally, with the LODs being done right off the bat, but we need to get some things done first so the team doesn't lose interest.
 
So you're saying at 20 million polys (triangles?) created with 4x tessellation that the performance hit compared with Maxwell is much larger than 5%?

(That's 1.2 billion polys per second, at 60fps.)
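Putting that 1.2 billion/s figure next to a triangle-setup budget; the peak rate below is a placeholder, not any particular card's spec:

```cpp
#include <cstdio>

int main()
{
    const double trisPerFrame = 20e6;   // the figure quoted above
    const double fps          = 60.0;
    const double demand       = trisPerFrame * fps;   // 1.2e9 tris/s

    // Hypothetical front-end peak: setupRate tris/clock at clockHz. These are
    // placeholder numbers; real sustained rates with tessellation enabled are
    // well below the paper peak.
    const double setupRate = 4.0;
    const double clockHz   = 1.0e9;
    const double peak      = setupRate * clockHz;      // 4.0e9 tris/s

    std::printf("demand: %.2g tris/s, hypothetical peak: %.2g tris/s (%.0f%%)\n",
                demand, peak, 100.0 * demand / peak);
}
```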
 
I already have done that; that's where I got those tessellation numbers for the front and back armor of the character I'm making. When tested on Fiji, about a week before the Fiji launch, even at a x4 tessellation factor it took a bigger hit than Maxwell, which wasn't too bad, around 5%; but when going to a factor of 8, where we start looking at 30 million polys for those pieces, it's more like 20% greater.
If your base mesh is already high poly count, and you tessellate it further, you are not bottlenecked by the tessellation performance. You will be bottlenecked by the raw triangle/vertex throughput of the GPU. This is the same bottleneck that limits high poly rendering in general (and is not tied to tessellation).

Enabling the tessellation pipeline slows down the vertex throughput of a GPU, even when the tessellation factors are 1.0 (= no extra triangles are added). This is because each triangle patch will generate at least 3 domain shader calls, and the results of those must be separately calculated and stored (to the parameter cache). This results in a similar performance hit as using non-indexed geometry (not being able to share vertices with neighbor triangles). You can try to reduce some of this hit from the domain shader ALU by moving most of the calculation to the vertex shader. However, this means that the GPU must temporarily store the vertex shader output somewhere and possibly move it from one CU to another (this was a problem on GCN 1.0 and 1.1). Also on Nvidia, it seems that the best performance is achieved by reducing the number of shader stages (VS, HS, HS patch func, DS) and keeping the data transfer between them as small as possible. Skipping both VS and HS stages is a good idea if you don't need them.
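As a back-of-the-envelope illustration of that factor-1.0 overhead (the reuse ratios are illustrative, not measured on any specific GPU):

```cpp
#include <cstdio>

// Rough invocation-count comparison behind the "factor 1.0 still costs you"
// point. Real reuse depends on the post-transform / parameter cache behaviour
// of the specific GPU.
int main()
{
    const long long tris = 1'000'000;

    // Indexed mesh: a typical closed mesh has roughly half as many vertices
    // as triangles, and good index ordering lets most of them be shaded once
    // and reused.
    const long long indexedVsInvocations = tris / 2;

    // Tessellation at factor 1.0: each patch still emits its 3 corner domain
    // points, and those domain-shader results are not shared across patches,
    // much like rendering non-indexed geometry.
    const long long domainShaderInvocations = tris * 3;

    std::printf("indexed VS invocations    : ~%lld\n", indexedVsInvocations);
    std::printf("tessellated DS invocations: ~%lld (factor 1.0)\n",
                domainShaderInvocations);
}
```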

It is hard to make a tessellation pipeline that beats good baked LODs in performance. And if tessellation can't beat baked high poly meshes in performance, it has no good use case. The point of tessellation is to generate geometry on the fly, in order to reduce memory bandwidth (compared to an equally high polygon baked mesh) and thus make it faster to render. In my experience static meshes always beat static tessellation (such as the above stated 4x amplification) in performance (assuming tightly packed vertices), leaving dynamic tessellation (smoothly changing per-edge factors) as the only usable tessellation style. With a good dynamic tessellation algorithm you can get some gains, but it's not going to be easy.
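One common way to drive those smoothly changing per-edge factors is to target a roughly constant number of pixels per generated edge segment; a minimal sketch (the target constant and names are made up):

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

// Because the factor depends only on the edge's own two endpoints projected
// to screen space, the two patches sharing an edge compute the same value,
// which keeps the tessellated mesh crack-free. Fractional partitioning then
// avoids popping as the factor changes smoothly.
float edgeTessFactor(Vec2 screenA, Vec2 screenB,
                     float targetPixelsPerSegment = 12.0f,
                     float maxFactor = 16.0f)
{
    const float dx = screenB.x - screenA.x;
    const float dy = screenB.y - screenA.y;
    const float edgePixels = std::sqrt(dx * dx + dy * dy);
    return std::clamp(edgePixels / targetPixelsPerSegment, 1.0f, maxFactor);
}
```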
 
If your base mesh is already high poly count, and you tessellate it further, you are not bottlenecked by the tessellation performance. You will be bottlenecked by the raw triangle/vertex throughput of the GPU. This is the same bottleneck that limits high poly rendering in general (and is not tied to tessellation).

Enabling the tessellation pipeline slows down the vertex throughput of a GPU, even when the tessellation factors are 1.0 (= no extra triangles are added). This is because each triangle patch will generate at least 3 domain shader calls, and the results of those must be separately calculated and stored (to the parameter cache). This results in a similar performance hit as using non-indexed geometry (not being able to share vertices with neighbor triangles). You can try to reduce some of this hit from the domain shader ALU by moving most of the calculation to the vertex shader. However, this means that the GPU must temporarily store the vertex shader output somewhere and possibly move it from one CU to another (this was a problem on GCN 1.0 and 1.1). Also on Nvidia, it seems that the best performance is achieved by reducing the number of shader stages (VS, HS, HS patch func, DS) and keeping the data transfer between them as small as possible. Skipping both VS and HS stages is a good idea if you don't need them.

It is hard to make a tessellation pipeline that beats good baked LODs in performance. And if tessellation can't beat baked high poly meshes in performance, it has no good use case. The point of tessellation is to generate geometry on the fly, in order to reduce memory bandwidth (compared to an equally high polygon baked mesh) and thus make it faster to render. In my experience static meshes always beat static tessellation (such as the above stated 4x amplification) in performance (assuming tightly packed vertices), leaving dynamic tessellation (smoothly changing per-edge factors) as the only usable tessellation style. With a good dynamic tessellation algorithm you can get some gains, but it's not going to be easy.

Well, there's an argument for it in the case of automatically producing continuous LOD instead of having to bake multiple LODs, which would be nice. Similarly there's a good argument for image quality, e.g. LEADR mapping (a combination of LEAN mapping with tessellation) and a recent attempt at a kind of REYES in realtime. No LOD popping or baking would be really nice.

But considering there's still a large concentration on culling etc., pure performance still seems the order of the day.
 
According to this AMD slide they are going to change many aspects of their GPUs; good to see the geometry processor in there.

 
More slides are on Videocardz.

Videocardz said:
Tomorrow AMD will unveil its new architecture, called Polaris. This is the biggest upgrade to the Radeon architecture since the first 28nm GPU.

On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)
 
Well, without any markers you can't really make heads or tails of them; you just have to assume the graph is saying Fmax/performance vs. power usage is going to be better, which, well, it should be, lol. To what degree, though, the graph is kind of useless.
 
According to this AMD slide they are going to change many aspects of their GPUs; good to see the geometry processor in there.
Hmm, does that also mean the L2 is now unified even wrt the render backends? That would be something I'd have expected for GCN 1.2 already :).
 
More slides are on Videocardz.



On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)

Unmarked graph axes with vague labels are unmarked and vague :p

Assuming those are the real slides, probably; they're just standard PR PowerPoint stuff to make AMD look as good as possible. What the actual performance-per-watt, or performance, numbers are will have to wait until benchmarks are done, as usual.
 
AMD-Polaris-Architecture-2.jpg
the 40nm die is ~66x61 pixels
the 28nm die is ~46x43 pixels
the FinFET die is ~29x29 pixels

If this picture is correct and to scale, then we are getting about a ~2.3x improvement in density from the new process. It would mean that 600mm^2 GPUs on 28nm would be only about 260mm^2 on FinFET.
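The arithmetic behind that estimate, for anyone who wants to plug in their own pixel measurements:

```cpp
#include <cstdio>

int main()
{
    // Approximate pixel measurements of the dies in the slide.
    const double area28 = 46.0 * 43.0;        // ~1978 px^2
    const double areaFF = 29.0 * 29.0;        // ~841 px^2
    const double shrink = area28 / areaFF;    // ~2.35x

    // Consistent with the ~2.3x / ~260 mm^2 estimate above.
    std::printf("density gain 28nm -> FinFET: ~%.2fx\n", shrink);
    std::printf("a 600 mm^2 28nm die would map to ~%.0f mm^2\n", 600.0 / shrink);
}
```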
 
More slides are on Videocardz.

On the last slide, is the fact that the FinFET curve ends at around the middle of the graph significant? (The axes don't have numbers, so I'm not sure how many conclusions we can draw here.)

Don't know, but Fmax is higher for FinFET, even if you take the value at the end of the curve for 28nm, so I'd say that's a good sign.
 