Xenos hardware tesselator

nonamer · Nov 29, 2006

blakjedi said:
Is that the holy grail of visual acuity? Wouldn't you want to eventually aim for film grain, whatever resolution that effectively is?

Believe it or not, pre-rendered CGI can reach upwards of 256xAA.

StefanS · Nov 29, 2006

blakjedi said:
Is that the holy grail of visual acuity? Wouldn't you want to eventually aim for film grain, whatever resolution that effectively is?

Digital cinematographers currently use 4K progressive.

http://en.wikipedia.org/wiki/Digital_cinematography

pipo · Nov 29, 2006

hupfinsgack said:
Digital cinematographers currently use 4K progressive.

It's a good thing we don't have those huge screens then...

Shifty Geezer · Nov 29, 2006

blakjedi said:
Is that the holy grail of visual acuity?

I imagine that this'll be something of a holy grail. Higher resolutions have no benefit at the screen sizes these 1080p sets get to. 8xAA would eliminate much of the jagginess. You could probably stop at 1080p 8xMSAA and put all efforts into shading etc., rather than upping the resolution and AA to little obvious effect (if any noticeable effect).

Graham · Nov 29, 2006

Shifty Geezer said:
I imagine that this'll be something of a holy grail. Higher resolutions have no benefit at the screen sizes these 1080p sets get to. 8xAA would eliminate much of the jagginess. You could probably stop at 1080p 8xMSAA and put all efforts into shading etc., rather than upping the resolution and AA to little obvious effect (if any noticeable effect).

Well, technically, is there anything stopping you from doing 1920x1080 with 4xAA on Xenos right now? Sure, it'll be 7 tiles though.

4 tiles for 2xAA. Not having to store the front buffers in main memory is nice.

I'm quite interested in the tessellator from a deferred rendering point of view, i.e. throw gobs and gobs of geometry at the scene quickly (no complex shading), then do all your lighting and crazy work afterwards.

Can you mix tiling and MRT? (not that I'd use MRT in this case though - just a technical question).

I imagine it would be quite hard not to waste geometry across tile lines when using the tessellator. You wouldn't be able to clip the geometry in software, as that would muck up the tessellation (I guess). Unless you clip the geometry exactly the same on both sides, but I'd expect you'd still see a line...

Am I right in thinking that 720p + 4xMSAA + FP10 HDR requires about 30 MB?

1280 * 720 * 4 samples * 8 bytes = 29,491,200 bytes, and 29,491,200 / 1024 / 1024 = 28.125 MB, so it fits in 3 tiles.

I also wonder if we will see many games render at, say, 1400x935 and then scale down to 1280x720, so supersampling instead of multisampling, but still in one tile.
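For reference, the tile counts quoted in this post can be reproduced with a back-of-envelope calculation. The sketch below assumes 10 MiB of eDRAM and 8 bytes per sample (32-bit colour plus 32-bit depth/stencil); the function names are made up for illustration, and real tile counts can be higher because tiles have to respect hardware alignment rules that are ignored here.

```python
import math

# Assumption: Xenos eDRAM capacity of 10 MiB.
EDRAM_BYTES = 10 * 1024 * 1024


def framebuffer_bytes(width, height, samples=1, bytes_per_sample=8):
    """Colour + depth footprint in bytes.

    bytes_per_sample=8 assumes 4 bytes of colour and 4 bytes of
    depth/stencil per sample (an assumption, not a documented figure).
    """
    return width * height * samples * bytes_per_sample


def tiles_needed(width, height, samples=1, bytes_per_sample=8):
    """Minimum number of eDRAM-sized tiles, ignoring alignment rules."""
    total = framebuffer_bytes(width, height, samples, bytes_per_sample)
    return math.ceil(total / EDRAM_BYTES)


if __name__ == "__main__":
    cases = [
        ("1920x1080, 4xAA", 1920, 1080, 4),
        ("1920x1080, 2xAA", 1920, 1080, 2),
        ("1280x720, 4xAA", 1280, 720, 4),
        ("1400x935, no AA", 1400, 935, 1),
    ]
    for label, w, h, s in cases:
        mib = framebuffer_bytes(w, h, s) / (1024 * 1024)
        print(f"{label}: {mib:.2f} MiB -> {tiles_needed(w, h, s)} tile(s)")
```

Under these assumptions the four cases come out to 7, 4, 3 and 1 tiles respectively, which matches the figures in the post.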

Jawed · Nov 29, 2006

Graham said:
Can you mix tiling and MRT?

Yes.

Jawed

Fran · Nov 29, 2006

LightHeaven said:
Other than displacement mapping and some particles, what would more flexibility allow you to do? Treat the geometry (e.g. deform a car after an accident) before tessellating?

You can do very fast instancing of particles for example.

If so, can you use Xenon or even memexport to achieve the level of flexibility you want and then send the data to the tessellator? I mean, does it accept that kind of input, or would you have to tessellate by other means in that case?

Yes, you can do adaptive tessellation with memexport and then feed the tessellator. It's pretty flexible.

And about performance... being a fixed hardware function, one would expect that it's kind of fast. Could you give any figure on how this tessellator compares in performance, when doing displacement mapping, to, let's say, a full-featured GS-powered DX10 GPU?

It's very hard to compare at this stage. I would expect that current DX10 GPUs are not optimised for a large fan-out from the geometry shader (when you produce more primitives than the primitives coming in), which is exactly where you want to use the hardware tessellator. But I can also expect that the sheer raw power of a DX10 GPU can more than make up for it.

Thanks for the time

No need to thank me

Jawed · Nov 29, 2006

This patent application has some nice diagrams that give an overview of the methods:

Unified Tessellation Circuit and Method Therefor

and there's this too:

Method and Apparatus for Dual Pass Adaptive Tessellation

Jawed

AlNom · Nov 29, 2006

Graham said:
I also wonder if we will see many games render at, say, 1400x935 and then scale down to 1280x720, so supersampling instead of multisampling, but still in one tile.

Well actually, at least one dev (MotoGP3) has gone with 1280x1024. With the formula from the Xenos article that'll give you...

Question is... Is scaling from a 5:4 pixel to a 16:9 pixel better or worse than scaling a 4:3 pixel to 16:9? Keep in mind that the former needs only vertical scaling and the latter requires scaling on two dimensions... (assuming 720p output)
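To make the comparison concrete, here is a small sketch of the per-axis scale factors involved. The 1280x1024 source comes from the post; the 4:3 source resolution is not given, so 1024x768 is used purely as a hypothetical example, and `scale_factors` is an illustrative name.

```python
def scale_factors(src, dst=(1280, 720)):
    """Return (horizontal, vertical) scale factors from src to dst.

    A factor of 1.0 means that axis needs no resampling at all.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    return dst_w / src_w, dst_h / src_h


if __name__ == "__main__":
    # 5:4 render target mentioned in the post.
    print("1280x1024 ->", scale_factors((1280, 1024)))
    # Hypothetical 4:3 render target, chosen only for illustration.
    print("1024x768  ->", scale_factors((1024, 768)))
```

With these inputs the 5:4 buffer is only resampled vertically (factor 1.0 horizontally), while the assumed 4:3 buffer is stretched horizontally and squeezed vertically. Which one looks better still depends on the scaler's filter, which this sketch does not model.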

Cal · Nov 29, 2006

Ugh, why does every thread about the 360 stray into predicated tiling? And then someone directly connects deferred shadowing with deferred shading to mess things up further, even though the two are totally unrelated.

Back to the topic: I haven't tested the N-Patch performance on my 360 kit. But one thing I can assure you is that displacement mapping via tessellation + vertex texture is far better, quality-wise, than relief/occlusion/parallax/horizon/whatever mapping. Those micro-surface raytracing techniques will just give you a full screen of jaggies. On the performance side, think about how many passes a mesh may have to be rendered in: a pre-Z pass, a shadow pass, the actual rendering pass, and potentially additional passes for things like subsurface scattering. In each of them you have to use the tessellator to get the correct result. If the engine does not take care of things like avoiding producing too many vertices, and matching the shadow pass's tessellation exactly with the rendering pass's to get correct self-shadowing, then you're screwed. So I think using the tessellator is a big performance risk.

Cal · Nov 29, 2006

blakjedi said:
I thought tiling takes place regardless at resolutions of 720p and above. What I don't understand is why post processing can't be done to the frame regardless of what was done in EDRAM.

What kind of post processing are you referring to? The tiles in EDRAM are resolved back to the system memory after rendering and the post processing can be done with the whole framebuffer (in system memory) as a fullscreen texture (no tiling during the post-processing rendering).

blakjedi · Nov 29, 2006

Cal said:
What kind of post processing are you referring to? The tiles in EDRAM are resolved back to the system memory after rendering and the post processing can be done with the whole framebuffer (in system memory) as a fullscreen texture (no tiling during the post-processing rendering).

Thanks Cal. That is exactly what I thought.

Gubbi · Nov 29, 2006

Shifty Geezer said:
I imagine that this'll be something of a holy grail. Higher resolutions have no benefit at the screen sizes these 1080p sets get to. 8xAA would eliminate much of the jagginess. You could probably stop at 1080p 8xMSAA and put all efforts into shading etc., rather than upping the resolution and AA to little obvious effect (if any noticeable effect).

Depending on the definition, human acuity is limited to ½ or 1 arc minute of angular resolution. That means that you won't even detect jaggies if you're sitting 3-5 meters from a 42" 1080p panel.
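A rough way to check that viewing-distance claim is to compute the distance at which a single pixel subtends the acuity limit. The sketch below assumes a 16:9 panel with square pixels and treats acuity simply as the smallest angle one pixel may subtend, which is a simplification (edge aliasing can be visible at finer angles). The function names are illustrative.

```python
import math


def pixel_pitch_m(diagonal_inches, horizontal_pixels, aspect=(16, 9)):
    """Width of one pixel in metres, assuming square pixels."""
    aw, ah = aspect
    width_inches = diagonal_inches * aw / math.hypot(aw, ah)
    return width_inches * 0.0254 / horizontal_pixels


def blend_distance_m(diagonal_inches, horizontal_pixels, acuity_arcmin):
    """Distance beyond which one pixel subtends less than acuity_arcmin."""
    angle_rad = math.radians(acuity_arcmin / 60.0)
    return pixel_pitch_m(diagonal_inches, horizontal_pixels) / math.tan(angle_rad)


if __name__ == "__main__":
    for acuity in (1.0, 0.5):
        d = blend_distance_m(42, 1920, acuity)
        print(f'42" 1080p at {acuity} arcmin: about {d:.2f} m')
```

Under these assumptions the threshold is roughly 1.7 m for 1 arc minute and roughly 3.3 m for ½ arc minute, so the 3-5 meter figure sits at or beyond the stricter limit.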

What is very easily detectable, however, is moire and other artifacts caused by a lack of filtering.

Of course, doing 8x supersampling of the picture would resolve much of that.

But effort is probably better spent on improving filtering (AF and in-shader filtering).

Cheers

AlNom · Nov 29, 2006

I don't suppose one of you 360 devs could run an AF testing program à la the one used to compare the G80 to previous GPUs...

Just curious as to how good it is with angles.

Mintmaster · Nov 30, 2006

Graham said:
I also wonder if we will see many games render at, say, 1400x935 and then scale down to 1280x720, so supersampling instead of multisampling, but still in one tile.

I can't imagine that looking better than 2xMSAA, because the diagonal sample placement helps a lot. It would be a lot more expensive too.

Mintmaster · Nov 30, 2006

Lazy8s said:
eDRAM takes away significant amounts of silicon for potential execution logic regardless of whether the amount of eDRAM is "enough".

On NEC's 45 nm EDRAM process (2009 is their plan), 64 MB of EDRAM is only 45 mm². That's only going to be maybe 10% of the total silicon in a console. If we can get 100 GPix/s alpha blending, that's good enough reason for me. "Execution logic" can only do so much in certain tasks.

Regarding the holy grail, I'd say 4xAA is plenty for a console at 1080p, and 32bpp is plenty for HDR, so that's why I used 64MB.

Graham · Nov 30, 2006

Mintmaster said:
I can't imagine that looking better than 2xMSAA, because the diagonal sample placement helps a lot. It would be a lot more expensive too.

Oh, I realise the performance would suck, no question there. I'm just thinking aloud, wondering how it would compare to using 2xAA upscaled (à la PGR3).

Cal, about the depth pass: I have a vague memory of reading that a depth-fill-only pass was required (or at least strongly recommended) when using tiling. Is this correct? It helps work out which tile the geometry gets binned into, IIRC.

When I speak of deferred rendering I mean deferred shading/lighting etc., in case you were talking about my post. If a depth pass isn't required, I believe it would be possible to do deferred rendering in a single 'geometry pass' without using MRT. It wouldn't be 100% accurate, but who cares if it isn't accurate but still looks good (which I have yet to discover).

3dcgi · Nov 30, 2006

Fran said:
You can't memexport inside a begin/end tiling bracket.

This is off topic from the tessellator, but I looked this up and the caveat was that it might be slow because you might perform the same memexport twice. The docs didn't say it can't be done.

Fran · Nov 30, 2006

3dcgi said:
This is off topic from the tessellator, but I looked this up and the caveat was that it might be slow because you might perform the same memexport twice. The docs didn't say it can't be done.

D3D will fire an assert if you try.

Lazy8s · Nov 30, 2006

A wide external bus and die area for embedded DRAM are not the only resources that can be traded to solve the bandwidth issue. The rendering pipeline could afford some deepening to offer a more optimal approach.
