Are there any devs using tile rendering in 360 games?

Gholbine said:
You mean AA on the Xenos is not possible at all without using the daughter die?

The ROPs are on the daughter die - it can't output any pixels (AA or not) at all without using it. If the output is native 720p with AA then tiling is used, but it is possible to run at a lower resolution with AA without tiling (and then scale up to 720p).
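The arithmetic behind this is easy to sketch. Assuming 8 bytes per sample (32-bit colour plus 32-bit depth/stencil - an assumption; actual formats vary per title), 720p without AA fits in the 10 MB of eDRAM, but any multisampling pushes it over:

```python
# Back-of-envelope eDRAM footprint check. Assumes 8 bytes per sample
# (32-bit colour + 32-bit depth/stencil); real formats vary per title.
EDRAM = 10 * 1024 * 1024  # Xenos daughter die: 10 MB

def footprint(width, height, samples, bytes_per_sample=8):
    """Bytes needed for colour + Z at the given MSAA sample count."""
    return width * height * samples * bytes_per_sample

def tiles_needed(width, height, samples):
    """Minimum number of tiles to fit the frame in eDRAM."""
    return -(-footprint(width, height, samples) // EDRAM)  # ceiling div

print(tiles_needed(1280, 720, 1))  # 1 tile:  720p no AA fits (~7.0 MB)
print(tiles_needed(1280, 720, 2))  # 2 tiles: 720p 2xAA (~14.1 MB)
print(tiles_needed(1280, 720, 4))  # 3 tiles: 720p 4xAA (~28.1 MB)
```

A lower resolution shrinks the footprint linearly, which is why sub-720p with AA can stay in a single tile.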
 
Dr Evil said:
Yes now that I took closer look at the game, it really is almost jaggie free...


Something tells me they didn't build the FEAR engine with predicated tiling in mind either... I wonder what the performance trade-off was. In any case, is it a solid 30fps or more like Halo's 30fps?
 
Dave Baumann said:
Number of tiles required is easy to find. Anything that is rendering natively at 720p (or above) with AA is running in tiled mode; standard-definition output can run with 4x AA without tiling (this looks to be what backwards-compatible titles are using - rendering at standard definition with 4x FSAA auto-enabled and then upscaling to HD).
I am not 100% sure, but to me, the backward-compatible titles seem to be in a higher resolution than 480p.
 
[maven] said:
I am not 100% sure, but to me, the backward-compatible titles seem to be in a higher resolution than 480p.

The internal scaler chip scales them from the normal rendering resolution (be it 480i, 480p, or whatever) to 720p or whatever other res you choose. I believe you can fit 480p + 4x AA in a single tile, so it all works out happily -- I'd be willing to bet that the few Xbox games that run in 720p natively won't have AA when run on a 360 (and will probably end up looking better run at 480p on the Xbox 360).
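The "fits in a single tile" claim checks out on the same back-of-envelope assumption of 8 bytes per sample (colour plus depth/stencil; exact formats vary):

```python
# Rough check that 480p with 4x AA fits in one 10 MB eDRAM tile
# (assumes 8 bytes per sample for colour + depth/stencil).
EDRAM = 10 * 1024 * 1024        # 10,485,760 bytes
bytes_needed = 640 * 480 * 4 * 8  # 4x multisampled 480p
print(bytes_needed, bytes_needed <= EDRAM)  # 9830400 True
```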
 
The GameMaster said:
Without tiling, the performance penalty is more akin to what a 128-bit memory interface PC video card with similar clock rates and bandwidth would see (30-40%) using 4xFSAA at 720p. They could use 2xFSAA and fit it in the eDRAM without tiling and get the free performance, though...

If you're not tiling, you can't use FSAA at 720p, IIRC. Any game that is not tiling has its frame only in eDRAM, so any framebuffer ops would only be using its bandwidth - but AA couldn't be one of them, because it requires a greater footprint than the eDRAM can accommodate in one go. Either way the framebuffer is not being operated on through the main memory bus, if that's what you're saying.

The GameMaster said:
I do believe the UE3 engine can be modified to include predicated tiling, but how hard that is I don't know at the moment. Current-generation graphics engines may not like XENOS very much, but XENOS was designed more for future (DirectX 10) graphics engines, so likely it won't be until 2nd or 3rd generation games that we will start seeing effective use of the tiling.

DX10 isn't likely to require eDram, so I'm not sure how relevant a transition to DX10 would be to Xenos's situation in this regard.

Dave Baumann said:
It's already used if it's using AA.

Not if you're rendering at "600p" ;)
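There's real arithmetic behind the smiley: if "600p" means a PGR3-style 1024x600 buffer (a guess on my part) and we assume 8 bytes per sample for colour plus depth/stencil, 2x AA just squeezes into the 10 MB without tiling:

```python
# Footprint check for 1024x600 with 2x AA. Assumptions: "600p" means
# 1024x600, and each sample costs 8 bytes (colour + depth/stencil).
EDRAM = 10 * 1024 * 1024       # 10,485,760 bytes
needed = 1024 * 600 * 2 * 8    # 2x multisampled 1024x600
print(needed, needed <= EDRAM)  # 9830400 True
```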
 
Alstrong said:
Something tells me they didn't build the FEAR engine with predicated tiling in mind either... I wonder what the performance trade-off was. In any case, is it a solid 30fps or more like Halo's 30fps?

It's mostly a solid 30fps, but there are a few areas that struggle a bit.
 
Mintmaster said:
ERP, isn't it feasible to just send the entire scene down the pipe three times with different clip planes? I can't imagine that any of these first gen games are coming even close to pushing Xenos' geometry limits.

Sure, that's one way of doing it.

The issue with authoring for predicated tiling is related to how to best submit geometry.

You need to submit geometry multiple times, and if you do a Z prepass, the first pass potentially has to have different pixel and vertex shaders. You can do this in a very brute-force fashion or you can try and optimise for the case.
The difference on the GPU between the two is likely dependent on batch sizes and screen coverage; also, you could unnecessarily run expensive shaders in the first pass. The difference in CPU time and memory footprint, however, can also be significant.

Xenos has support for a number of things that can reduce this overhead, but you need to organise your data in a fashion so that you can actually use them effectively.

The other issue is that none of this is black and white - there is no single best way to do this; you really need to experiment and test inside your application.

For example I could imagine edge cases where 3 tiles would be faster than 2 just because of the way your draw primitives are distributed.
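The submit-per-tile idea ERP describes can be sketched in miniature. This is a toy model, not the real Xenos API: the frame is split into horizontal tile strips and the command list is replayed once per tile, with a predicate skipping any draw whose screen-space bounding box never touches the current tile (all names and structures here are hypothetical):

```python
# Toy predicated-tiling sketch. The frame is split into horizontal
# tile strips; each draw carries a screen-space Y extent, and a draw
# is replayed for a tile only if its bounding box overlaps that tile.

def tiles(height, n_tiles):
    """Yield (y0, y1) scissor ranges for n horizontal tile strips."""
    step = height // n_tiles
    for i in range(n_tiles):
        yield i * step, height if i == n_tiles - 1 else (i + 1) * step

def render_frame(draws, height=720, n_tiles=2):
    """draws: list of (name, y_min, y_max) screen-space extents.
    Returns the (draw, tile_y0) pairs actually executed."""
    executed = []
    for y0, y1 in tiles(height, n_tiles):
        for name, d0, d1 in draws:
            if d1 > y0 and d0 < y1:  # predicate: bbox overlaps tile?
                executed.append((name, y0))
    return executed

# A draw spanning the screen is submitted once per tile; a small one
# is (ideally) only submitted for the tile it actually lands in.
log = render_frame([("sky", 0, 720), ("hud", 650, 700)])
print(log)  # [('sky', 0), ('sky', 360), ('hud', 360)]
```

This also illustrates ERP's cost point: the full-screen draw is paid for once per tile, so the geometry overhead depends on how your draw primitives are distributed across tiles.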
 
Questions:

Is the fillrate additive among multiple tiles or do you get 4GPixel/s theoretical max per tile? (Was wondering if it might be feasible to render to say... 2560x1440 using a bunch of tiles and then the output scaler downscaling to 720p for some 4x SSAA action ;) )

Just a random thought, sorry if it's dumb. :p
 
Alstrong said:
Is the fillrate additive among multiple tiles or do you get 4GPixel/s theoretical max per tile?
The GPU only works on one tile at a time (naturally). :) If it could work on multiple tiles at a time there'd be no reason to tile in the first place - presumably you'd be able to fit the entire screen buffer into the eDRAM... It would still be possible to do supersampling by drawing multiple tiles and then filtering them down, but it'd likely be pretty memory-expensive. Remember, the machine only has 512MB of system memory, and that's where the buffer would have to reside. Perhaps it'd be better to filter down each tile BEFORE storing it in main memory; that way it wouldn't require any extra space. Also, if it was done on the CPU, it'd put that monster FSB to good use too. :D
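The "filter each tile down before storing it" idea is just a box filter over the supersampled tile. A minimal sketch (the function name and the 2x factor are illustrative assumptions):

```python
import numpy as np

# Downsampling a 2x-supersampled tile to target resolution, so only
# the filtered pixels ever need to hit main memory.
def downsample_2x(tile):
    """Average each 2x2 block of samples into one output pixel."""
    h, w = tile.shape[:2]
    return tile.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

hi = np.ones((8, 8, 3), dtype=np.float32)  # toy supersampled tile
lo = downsample_2x(hi)
print(lo.shape)  # (4, 4, 3)
```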
 
If you're not tiling, you can't use FSAA IIRC, at 720p.
I'm pretty sure most all games out right now are using a shader-ized multisampling to fake their own AA or some alternative that gets them similar results. Yeah, it's wasteful to be copying around rendertargets like that, but it saves time and effort as opposed to tiling.

At least a shader implementation can be used beyond just faked FSAA -- you can AA other texture read ops such as shadowmap tests.
 
ShootMyMonkey said:
I'm pretty sure most all games out right now are using a shader-ized multisampling to fake their own AA or some alternative that gets them similar results. Yeah, it's wasteful to be copying around rendertargets like that, but it saves time and effort as opposed to tiling.
Sooo ... where are they storing those multiple samples exactly?
 
Sooo ... where are they storing those multiple samples exactly?
Texture copy of the back buffer (or a rendertarget that would otherwise have been the back buffer). Sample a texel multiple times to get a fractured estimate of what the super-resolution image would look like, and do a weighted average when writing out. Is it in any way correct? No. Do we care? Hell no. Does it pass TRCs? Yep (if done well)... MSAA is a hack to begin with, so why not hack the hack?

Similarly, you can pull tricks like depth-of-field, which involves blurring the backbuffer anyway, or motion blur, which will hide a lot of jaggies for you without any extra AA faking. The point is that anything that can look as good on average as 2xMSAA is good enough to pass. I'm not saying everybody is doing exactly as I described, but most everybody is applying something in that realm of thinking at this stage.
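The flavour of post-filter being described - a full-screen pass that samples a copy of the back buffer around each pixel and writes out a weighted average - can be mimicked on the CPU. This is purely illustrative (the kernel weights are my own pick, not anything a real title used):

```python
import numpy as np

# Toy "hack the hack" post-filter: weighted average of each pixel and
# its eight neighbours, the way a full-screen shader pass would sample
# a texture copy of the back buffer. Kernel is an assumed example.
def fake_aa(img):
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], np.float32) / 16
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # clamp at screen edges
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

img = np.zeros((4, 4), dtype=np.float32)
img[:, 2:] = 1.0                  # a hard vertical edge
soft = fake_aa(img)
print(soft[1, 1])  # 0.25 - the edge now has an intermediate value
```

As the thread notes, this isn't correct antialiasing - it just smears the edge - but it only needs texture reads and a resolve, no tiling.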
 
I think that tile based rendering started here...

GigaPixel takes on 3dfx, S3, Nvidia with… tiles
Tile-based rendering faster, better looking than polygons claims company
By Tony Smith in San Jose
Published Wednesday 6th October 1999 20:24 GMT
US 3D graphics specialist GigaPixel this week issued a challenge to the likes of 3dfx, Nvidia, S3 and ATI -- the company claims its GP-1 chip, based on its Giga3D architecture, has rival products well and truly licked on both image quality and performance. What makes GP-1 interesting is its use of a tile-based rendering scheme instead of the traditional polygon approach used by every other mainstream graphics accelerator.

According to GigaPixel CEO George Haber, GP-1 breaks a scene into a series of small tiles and renders each individually. That, he said, allows the chip to render a fully anti-aliased image -- which is where GP-1 gets its superior image quality from -- without the massive processing overhead it usually requires. Tiling the image also reduces the card's bandwidth requirements by a factor of ten, he added, which further increases performance. GP-1 is fully compatible with the Direct3D and OpenGL APIs, Haber said.

The chip takes the polygon-based description of the scene and converts it into the sequence of tiles. Each tile is rendered and shaded as required then sent to the frame buffer. Haber demonstrated Quake II running on a GP-1 reference board and a Matrox 3D card, and while the frame rate appeared comparable, with its anti-aliasing, the GP-1's output certainly looked better. Of course, 3dfx's upcoming Napalm board will offer anti-aliasing -- though not full-scene anti-aliasing -- when it ships in Q1 2000. Haber dismissed the Napalm's T-buffer technology on a technical level since the board has to render each scene four times to make anti-aliasing work, but what does that matter if 3dfx can make it work cost-effectively and retain high frame rates? 3dfx's system will also offer other effects, such as motion blur and smooth shadows, which as yet the GP-1 does not. That said, the company's approach should give it the processing headroom to add more advanced features to later versions of the chip.

GigaPixel plans to license its technology to third-party graphics card vendors, much as Nvidia does. Haber said the company was already talking to at least one board manufacturer, but he would not name them or say how close the two firms are to a licensing deal. ®

ATI multi-card rendering details emerge
Turns to tiles - but where's the grouting?

One particularly interesting aspect of Hexus' revelation is ATI's use of a tile-based rendering scheme. Instead of doing the whole scene as one, the image is partitioned into squares, the better to minimise the bandwidth needed to bat a rendered tile from one card to the other.

It's an interesting trick that goes back to the early 1990s when Imagination Technologies was developing its PowerVR line of graphics chips. PowerVR technology was used in Sega's ill-fated Dreamcast, which used a tile-based rendering scheme, as did other PowerVR-based products such as the Kyro range of graphics cards.


http://www.theregister.co.uk/2005/05/06/ati_mvp_details/
 
I'm pretty sure most all games out right now are using a shader-ized multisampling to fake their own AA or some alternative that gets them similar results. Yeah, it's wasteful to be copying around rendertargets like that, but it saves time and effort as opposed to tiling.

So tiling with the EDRAM is still not being used. Wonderful.

Sure, that's one way of doing it.

The issue with authoring for predicated tiling is related to how to best submit geometry.

You need to submit geometry multiple times, and if you do a Z prepass, the first pass potentially has to have different pixel and vertex shaders. You can do this in a very brute-force fashion or you can try and optimise for the case.
The difference on the GPU between the two is likely dependent on batch sizes and screen coverage; also, you could unnecessarily run expensive shaders in the first pass. The difference in CPU time and memory footprint, however, can also be significant.

Xenos has support for a number of things that can reduce this overhead, but you need to organise your data in a fashion so that you can actually use them effectively.

The other issue is that none of this is black and white - there is no single best way to do this; you really need to experiment and test inside your application.

For example I could imagine edge cases where 3 tiles would be faster than 2 just because of the way your draw primitives are distributed.

I get the feeling that if you really want to optimize your game, you don't want to use tiling at all, because it's going to knock 10-20% off performance. And that's really the opposite of coding to the metal, which is what PS3 devs are going to be doing all day long.

I wonder if devs could just skip the EDRAM altogether.

I'm beginning to hope 576P ala PGR3 upscaled becomes the norm. Maybe salvage something from the EDRAM, or at least not be taking a hit from it.
 
Bill said:
So tiling with the EDRAM is still not being used. Wonderful.



I get the feeling that if you really want to optimize your game, you don't want to use tiling at all, because it's going to knock 10-20% off performance. And that's really the opposite of coding to the metal, which is what PS3 devs are going to be doing all day long.

I wonder if devs could just skip the EDRAM altogether.

I'm beginning to hope 576P ala PGR3 upscaled becomes the norm. Maybe salvage something from the EDRAM, or at least not be taking a hit from it.

I've never heard ANYONE say the performance hit due to the extra geometry work being done for tiling is 10-20%. I've heard a few make the point that with a US model, Xenos can really power through it with all the pipes doing geometry.

I know you've been skeptical, Bill, but let's see the next wave of games that come out before we judge. For the most part, every developer with a launch title mentions a compressed timeframe as part of their development cycle.

From what's starting to surface now with "gen 1.25" games, I think we're going to end up impressed. I really don't think MS would have made this decision if they didn't think it would end up being a benefit in the end.

My question is how 'hard' is it for developers to implement tiling, how do the dev tools assist with getting it right and optimized?
 
I agree with Bill. Free AA my ass :mad:
I think that MS screwed them over with the eDRAM.
If you want to give free AA to devs, give 'em 30MB of eDRAM and nobody can accuse you of being a BShitter.
But if you give them 10MB because you don't want to spend more money, then you don't have the right to speak about free AA.
 
groper said:
I agree with Bill. Free AA my ass :mad:
I think that MS screwed them over with the eDRAM.
If you want to give free AA to devs, give 'em 30MB of eDRAM and nobody can accuse you of being a BShitter.
But if you give them 10MB because you don't want to spend more money, then you don't have the right to speak about free AA.
I don't think 30MB of eDRAM is what you'd call free.
10MB is enough; just let developers do their work - it takes time
 
Bill said:
So tiling with the EDRAM is still not being used. Wonderful.



I get the feeling that if you really want to optimize your game, you don't want to use tiling at all, because it's going to knock 10-20% off performance. And that's really the opposite of coding to the metal, which is what PS3 devs are going to be doing all day long.

I wonder if devs could just skip the EDRAM altogether.

I'm beginning to hope 576P ala PGR3 upscaled becomes the norm. Maybe salvage something from the EDRAM, or at least not be taking a hit from it.
If 576p with a "whopping" 2xAA becomes the norm, I'm selling my X360. I can still see jaggies badly at that res. 1280x720 with "fake" AA would actually look better.
 