Xenos and its special features: Educated expectations needed READ: post #126!

Is any 360 dev willing to tell us about Xenos performance when it comes to self-shadowing in the game you are currently working on? Is 3Dc helping you 360 devs out at all now?
 
It's Fran :D

Yup, I know of devs using memexport. It has some bugs, though, and it's not always straightforward to use, but it can potentially be a win in certain scenarios.

You can use memexport when a title is doing PTR (predicated tiled rendering). Just make sure you don't issue a memexport primitive inside a PTR bracket; issue it just before or after, depending on what you are trying to do.

In fact, a typical frame with MSAA and memexport would be rendered something like this on 360:

- Submit all memexport primitives
- Render all shadow buffers
- Render all offscreen textures
- Begin predicated tiling
- Submit the main scene
- End predicated tiling (actual tiled rendering happens here)
- Render particles (?)
- Post processing (tonemapping, bloom, dof, whatever)
- Swap
- Pub

It's important to notice that "begin predicated tiling" is a kind of barrier that logically divides the engine flow; that is why, for example, all the shadow buffers are prepared before the scene is rendered. This introduces several architectural issues to deal with when using PTR. On another platform you would probably render a shadow buffer, render the part of the scene using it, then render another shadow buffer and so on, in order to reuse shadow buffer memory, or use any other combination that suits the engine. On the 360 you are pretty much forced into that scheme if you want to use PTR. The main benefit is that you get MSAA more or less for free, provided you solve the problems related to resubmitting the same primitive for each tile.
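To make the ordering concrete, here is a minimal Python sketch of that frame. It is a toy model, not the 360 API: `FrameRecorder`, `build_frame` and the command names are hypothetical, and the only behaviour modelled is that commands submitted inside the tiling bracket are replayed once per tile, and that a memexport primitive inside the bracket is rejected.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameRecorder:
    """Toy command recorder (hypothetical, not a real 360 interface)."""
    num_tiles: int
    executed: list = field(default_factory=list)
    _bracket: Optional[list] = None  # commands recorded between begin/end tiling

    def submit(self, name, memexport=False):
        if self._bracket is not None:
            # Inside the PTR bracket: commands are only recorded here.
            if memexport:
                raise RuntimeError("memexport primitive issued inside a PTR bracket")
            self._bracket.append(name)
        else:
            # Outside the bracket: commands run once, immediately.
            self.executed.append(name)

    def begin_tiling(self):
        self._bracket = []

    def end_tiling(self):
        # The actual tiled rendering happens here: replay everything per tile.
        for tile in range(self.num_tiles):
            for name in self._bracket:
                self.executed.append(f"{name} [tile {tile}]")
        self._bracket = None


def build_frame(num_tiles):
    rec = FrameRecorder(num_tiles)
    rec.submit("memexport primitives", memexport=True)
    rec.submit("shadow buffers")
    rec.submit("offscreen textures")
    rec.begin_tiling()
    rec.submit("main scene")
    rec.end_tiling()
    rec.submit("particles")
    rec.submit("post processing")
    rec.submit("swap")
    return rec.executed
```

With three tiles, the main scene shows up three times in the executed list while everything outside the bracket shows up once, which is the resubmission cost mentioned above.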

Fran

Sweeeet! Fran, you don't have to tell us, but exactly how much ass is your game kicking?

Have your guys toyed with doing physics on the GPU?

Any experience with the tessellation unit?

How much has MS encouraged or helped your team to make the most of the machine's abilities?

Thanks Fran! Great info here.
 
Thanx Fran (Viva Italia!)

Posts like this make this forum a special place to learn ;)
 
It's Fran :D

Yup, I know of devs using memexport. It has some bugs, though, and it's not always straightforward to use, but it can potentially be a win in certain scenarios.

If you can talk about it without risking spoiling stuff from your work, it would be greatly appreciated if you could give some form of example where memexport can be of great use. Although I can grasp what it does, I haven't really seen any "concrete" examples of where it can be of use.

You can use memexport when a title is doing PTR (predicated tiled rendering). Just make sure you don't issue a memexport primitive inside a PTR bracket; issue it just before or after, depending on what you are trying to do.

In fact, a typical frame with MSAA and memexport would be rendered something like this on 360:

- Submit all memexport primitives
- Render all shadow buffers
- Render all offscreen textures
- Begin predicated tiling
- Submit the main scene
- End predicated tiling (actual tiled rendering happens here)
- Render particles (?)
- Post processing (tonemapping, bloom, dof, whatever)
- Swap
- Pub

It's important to notice that "begin predicated tiling" is a kind of barrier that logically divides the engine flow; that is why, for example, all the shadow buffers are prepared before the scene is rendered. This introduces several architectural issues to deal with when using PTR. On another platform you would probably render a shadow buffer, render the part of the scene using it, then render another shadow buffer and so on, in order to reuse shadow buffer memory, or use any other combination that suits the engine. On the 360 you are pretty much forced into that scheme if you want to use PTR. The main benefit is that you get MSAA more or less for free, provided you solve the problems related to resubmitting the same primitive for each tile.

Fran

Is this how your engine is working?:devilish:...
 
If you can talk about it without risking spoiling stuff from your work, it would be greatly appreciated if you could give some form of example where memexport can be of great use. Although I can grasp what it does, I haven't really seen any "concrete" examples of where it can be of use.

Look for the presentation "Seven ways to skin a mesh" from a relatively recent MS event, it shows how MEMEXPORT is the best way to do skinning on the 360.
 
A little bit o' synchronicity ...

This topic is a question that has been rolling around in my skull for a while, and I saw these links recently:

http://arstechnica.com/news.ars/post/20061115-8230.html
http://www.serpentine.com/blog/2007/02/22/a-quick-programmers-look-at-nvidias-cuda/

I am hoping to steer the thread back to the original topic (as I understand it), specifically: what other added value does Xenos offer besides its graphical prowess? Will developers be able to leverage some of its resources to add Havok-ish physics or some such, to give the console legs so that it can compete with the PS3 once the Cell starts being exploited to a fuller extent?

I have read somewhere that using the GPU logic for crunching, say, physics code in conjunction with other, more CPU-like hardware would be difficult, since it doesn't normally do much serially speaking (pardon my ham-handed, non-technical vocabulary; I might have a point, but it could get lost in translation, so to speak).
 
Look for the presentation "Seven ways to skin a mesh" from a relatively recent MS event, it shows how MEMEXPORT is the best way to do skinning on the 360.

That is interesting. I wonder if anybody uses it that way.
 
If you can talk about it without risking spoiling stuff from your work, it would be greatly appreciated if you could give some form of example where memexport can be of great use. Although I can grasp what it does, I haven't really seen any "concrete" examples of where it can be of use.

Memexport can potentially play pretty well with tiling. It's not "the best way to skin a mesh", in my opinion, but it's a tool that you can use in certain scenarios. The best way to skin the mesh is the simplest one to implement that does what you need as fast as you need it :)

But imagine you have a complex animated character with some interesting and non-trivial skinning, which possibly takes some time to render. With tiling and a big character on screen spanning multiple tiles, you probably don't want to run the complex skinning vertex shader for every vertex of the mesh, potentially once per tile. You also might not want to skin the mesh on the XCPU before rendering, "a la" Doom 3 (while you would want to do it with Cell), because the GPU is much more efficient for this type of job.
In this situation you can use memexport to skin the mesh on the GPU and save the transformed vertices to a buffer before starting tiling (somewhere before the shadow map passes in the scheme from my earlier post), then use this vertex buffer to feed the GPU for all subsequent passes (shadow maps, lighting passes, tiling). Notice that no CPU is involved in this: it all happens on the GPU, which writes out to main memory, bypassing the EDRAM, and reads the data back from there.

It can be a potential win and memexport is surely a tool you can leverage to overcome some of the problems related to tiling. It does come at the cost of added complexity to your rendering pipeline (two passes are more difficult to maintain than one) and there are some bugs and quirks to work around when using it.
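As a rough illustration of the trade-off, the following Python sketch counts how many times the skinning shader would run with and without a memexport pre-pass. The function names and the numbers in the usage note are made up for illustration, and it assumes the worst case where the character is resubmitted in full for every tile and every other pass.

```python
def skinning_shader_runs(num_vertices, num_tiles, other_passes, preskinned):
    """Count skinning shader invocations for one character in one frame.

    Assumes (worst case) that the whole mesh is submitted for every tile
    and for every other pass (shadow maps, lighting passes).
    """
    passes = other_passes + num_tiles
    if preskinned:
        # Skinned once in a memexport pre-pass; later passes only read the
        # exported vertex buffer with a trivial vertex shader.
        return num_vertices
    # Otherwise the skinning shader runs again in every pass.
    return num_vertices * passes


def export_buffer_bytes(num_vertices, bytes_per_vertex):
    """Main-memory cost of keeping the skinned vertices around."""
    return num_vertices * bytes_per_vertex
```

For a hypothetical 20,000-vertex character, 3 tiles and 2 shadow passes, that is 100,000 skinning runs without the pre-pass against 20,000 with it, paid for with a 640,000-byte export buffer at 32 bytes per vertex and one extra pass to maintain.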

Fran
 
...genius...

Fran, you failed to answer exactly how much ass your game is kicking.:???:

:p

Seriously though, can you talk about GPU physics and the tessellation unit?

Thanks Fran! Much appreciated to hear from a 360 dev of your calibre!

:D
 
Look for the presentation "Seven ways to skin a mesh" from a relatively recent MS event, it shows how MEMEXPORT is the best way to do skinning on the 360.

Thanks, I googled and downloaded some 500MB+ worth of PowerPoint presentations, and there was a lot of interesting stuff in there. Thanks for the heads-up...
 
Fran, you failed to answer exactly how much ass your game is kicking.

There must be a reason for my failure :D

Seriously though, can you talk about GPU physics and the tessellation unit?

I'm not too good with GPU physics (read: I've never tried much GPGPU myself), so, well, any comment wouldn't be too useful. And about tessellation, I only gave that stuff a superficial look, so, again, I wouldn't add much information. I know it's a pretty fast unit and it does its job, but it's not too flexible.

Thanks Fran! Much appreciated to hear from a 360 dev of your calibre!

:D

You are too kind!
 
Cheers, that definitely added quite a bit of meat on the bones. I guess that, although it may be a good tool if you get it to work, it's not really the holy grail kind of thing.

Another thing, and sorry for getting more specific about Fable 2 as such, or at least its engine. I understand if you cannot answer (and sorry to the other guys for derailing the thread), but with all Sam's hints and stuff, curiosity is definitely starting to crawl under the skin. Anyway, in at least a couple of Sam's milestone updates he has mentioned and even stressed the fact that you had "height fields" working in the engine now.

The question is, are those height fields the same as height maps, and if so, is there anything particularly different about your height maps? I thought that height maps were something very common in graphics engines and didn't really get why they were so important this time and why he stressed it. Or is the incorporation of height maps in general a big step in engine development?

Finally, will there be a cheat to have the crotch cam in game?:devilish: ...
 
There must be a reason for my failure :D

:???: hmm ... I think I know... It's physically impossible to measure a level of kickass as high as your team is generating currently, thus to answer would not do your team's work justice.:p

I'm not too good with GPU physics (read: I've never tried much GPGPU myself), so, well, any comment wouldn't be too useful. And about tessellation, I only gave that stuff a superficial look, so, again, I wouldn't add much information. I know it's a pretty fast unit and it does its job, but it's not too flexible.

Interesting. Thanks Fran! :cool: Truly appreciated. Do you know where I can find out more behind the scenes info?

You are too kind!

I call it like I see it. ;)
 
Thanks Fran, some nice comments.

Anyway, I hope we will learn some more from GDC, especially from Havok, i.e. do they intend to use Xenos for some calculations in the Havok 4.5 SKU (a better implementation of Havok FX for Xenos)?
Where is MS in regard to automatic tiling + memexport?
Do they intend to bring some friendly ATI CTM (Close to Metal) like implementation to allow some GPGPU use of Xenos?
If the Havok 4.5 SKU works as well as advertised, MS won't have to keep up with Sony in the graphics department (I would almost say that they have the advantage there) but in physics and some kind of "interactivity".
MS has to find a way to offload to Xenos some calculations that could be really GPU friendly, even if games sometimes have to render at a slightly lower resolution than 720p.
After all, PGR3 still looks very good; it's all a trade-off depending on what goals the devs are aiming at.

Can somebody sum up the 500MB of PowerPoint presentations?
Or quote some interesting slides?
 
Where is MS in regard to automatic tiling + memexport?

How many times do you need to hear that there is no problem with automatic tiling + memexport, outside the need to do the memexport calculations before the scene render?
 
How many times do you need to hear that there is only the problem of requiring persistent storage for the generated geometry to feed it into the separate tile render passes, and it can never be fixed?
Fixed ;)
 

About how much additional storage space are we talking here? (sorry if it's already been asked)

Thanks Fran, some nice comments.

Anyway, I hope we will learn some more from GDC, especially from Havok, i.e. do they intend to use Xenos for some calculations in the Havok 4.5 SKU (a better implementation of Havok FX for Xenos)?

There is an official press release by Havok on version 4.5, and it mainly states large improvements for the Cell architecture (10x something?). No word on Xenos/360-specific improvements (however, there are "general" improvements on other platforms). I wonder why that should change at GDC, since Havok 4.5 is out already.
 
About how much additional storage space are we talking here? (sorry if it's already been asked)
It depends on the amount of geometry one generates per frame. It could be just a few hundred kB or a few MB. As a rough figure, both of the big consoles should easily be able to handle 1 million vertices per frame in a 30 fps game. Depending on what materials you use, we're looking at somewhere between 12 and 64 megabytes for the geometry data per frame, i.e. roughly 12 to 64 bytes per vertex. Some games will go above, some will stay below, and what percentage of the per-frame geometry is procedural or generated really depends on the game.
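The arithmetic behind those figures can be sketched in a few lines of Python. The 12 and 64 bytes-per-vertex values are inferred from the 12 to 64 MB range at 1 million vertices, and the 5% procedural share in the usage note is purely a hypothetical example, not a measured number.

```python
def geometry_bytes(vertices_per_frame, bytes_per_vertex):
    """Total size of the geometry data touched in one frame."""
    return vertices_per_frame * bytes_per_vertex


def retained_bytes(total_bytes, procedural_percent):
    """Portion that is procedural or generated and must be kept for tiling.

    procedural_percent is an integer percentage (hypothetical input).
    """
    return total_bytes * procedural_percent // 100


def to_megabytes(num_bytes):
    """Decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000
```

For example, `to_megabytes(geometry_bytes(1_000_000, 12))` gives 12.0 and the same call with 64 bytes per vertex gives 64.0. If 5% of a 32-bytes-per-vertex frame were generated, `retained_bytes` would come to 1,600,000 bytes, which lands in the "few hundred kB to a few MB" range.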

Whatever the exact amount, in a tiled render you need to keep procedural or generated geometry around to be able to render the second, third, nth tile correctly ... or you will have to recompute it n times per frame. If you keep it around in main memory, it will also consume bandwidth along the way, once per tile, once per frame.

Whereas if you have just a single "tile" (i.e. no tiles at all actually), you can "fire and forget" batches of geometry. You don't even need temporary buffers, as you can push them over the caches through the GPU directly into the final frame buffer. Even if you don't do that (shadow volume extrusion, fine-grained culling, yadda yadda), you can break the "we generate our procedurals here" sub-period of the frame into as many small chunks as you want, each time just working on small amounts of data (extrude one model, render shadow, then discard the extrusion to make room for the next -- that sort of thing). You cannot do any of that with data you want to feed into a tiled render. It needs to be saved for later.
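A small Python sketch of the difference, under simplifying assumptions: each generated batch has a known size in bytes, the untiled path discards a batch before generating the next, and in the tiled path every batch is read back once per tile (worst case). The function names and batch sizes are hypothetical.

```python
def peak_generated_bytes(batch_sizes, tiled):
    """Peak memory needed for generated geometry within one frame."""
    if tiled:
        # Everything must survive until the last tile has been rendered.
        return sum(batch_sizes)
    # "Fire and forget": only one batch is alive at any time.
    return max(batch_sizes, default=0)


def tile_read_bytes(batch_sizes, num_tiles):
    """Bytes read back from main memory if every batch feeds every tile."""
    return sum(batch_sizes) * num_tiles
```

With three batches of 400,000, 1,200,000 and 800,000 bytes, the untiled peak is 1,200,000 bytes while the tiled peak is 2,400,000 bytes, plus up to 7,200,000 bytes of reads per frame across three tiles.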

As always, the devil is in the details. I just find it pretty dishonest to claim that there are no problems coming from tiled rendering or even rebadging every solution to a problem as a "different way of doing things". Memory footprint for keeping procedurals is a cost, as resources are taken away from other tasks. That's just about the definition of "problem" no matter how you slice it.
 