What's the deal with HDR?

Titanio said:
Even using FP10, though, with any AA you'll be tiling anyway, so it's not the breaking point in that regard. Sure, you'll likely be tiling more with FP16, but I wonder if using it will be as easy regardless of that.
If each extra tile adds something like a 3-5% performance hit, doubling the memory requirements from FP10 to FP16 could add as much as a 10% performance hit, I'd guess.
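
For what it's worth, here is a rough back-of-envelope sketch of how many tiles each format would need at 720p. Everything in it is an assumption rather than a confirmed spec: 10 MB of eDRAM, 4 bytes of depth/stencil per sample, FP10 packed into 32 bits of colour and FP16 into 64 bits. The helper name `tiles_needed` is made up for illustration. Under those assumptions only the colour half of each sample doubles, so the footprint grows by about 1.5x rather than 2x.

```python
import math

# Assumption: 10 MB of eDRAM available for colour + depth samples.
EDRAM_BYTES = 10 * 1024 * 1024


def tiles_needed(width, height, aa_samples, colour_bytes, depth_bytes=4):
    """Tiles required so that each tile's colour + depth samples fit in eDRAM."""
    frame_bytes = width * height * aa_samples * (colour_bytes + depth_bytes)
    return math.ceil(frame_bytes / EDRAM_BYTES)


if __name__ == "__main__":
    # Assumed sizes: FP10 = 4 bytes of colour per sample, FP16 = 8 bytes.
    for label, colour_bytes in (("FP10", 4), ("FP16", 8)):
        for aa in (1, 2, 4):
            tiles = tiles_needed(1280, 720, aa, colour_bytes)
            print(f"{label} {aa}xAA: {tiles} tile(s)")
```

With these numbers, FP10 at 4xAA lands on three tiles, which agrees with the three-tile figure quoted for 720p with 4x FSAA, and FP16 at 4xAA lands on five.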
 
Blur has nothing to do with AA. Blur does not reduce edge aliasing. It makes it look worse.

Non-AA looks TERRIBLE on HDTVs. Xenos is DESIGNED to do AA. The whole daughter chip has millions of transistors dedicated to it. It would be crazy not to use it; you'd effectively throw away 90% of the benefit of the eDRAM daughter die.
 
DemoCoder said:
Blur has nothing to do with AA. Blur does not reduce edge aliasing. It makes it look worse.

Non-AA looks TERRIBLE on HDTVs. Xenos is DESIGNED to do AA. The whole daughter chip has millions of transistors dedicated to it. It would be crazy not to use it; you'd effectively throw away 90% of the benefit of the eDRAM daughter die.
Not if you didn't plan the render engine around what is needed to do AA effectively on Xenos. Since none of the devs had hardware like Xenos when creating their next-gen engines, they couldn't have taken it into account.
 
Shifty Geezer said:
If each extra tile adds something like a 3-5% performance hit, doubling the memory requirements from FP10 to FP16 could add as much as a 10% performance hit, I'd guess.
Each extra tile doesn't add a 3-5% performance hit.

According to thegamemaster:

"2xFSAA should only have a 1-3% performance penalty as you only need to render 2 tiles while a 4xFSAA should have about 3-5% in rendering 4 tiles."
http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71

Going from 2 to 4 tiles incurs a ~2% performance hit, so about a 1% hit per tile at 4 tiles. This probably doesn't scale linearly though; I wouldn't know.
 
Dr. Nick said:
Not if you didn't plan the render engine around what is needed to do AA effectively on Xenos. Since none of the devs had hardware like Xenos when creating their next-gen engines, they couldn't have taken it into account.

According to the dev in the link above, tiling can be added to a rendering engine after it has already been built, but it will not be as efficient as if the engine had been built to use tiling from day 1. So instead of a 1-3% performance hit it would be more like 1-10%.
 
I'd just like to interject a couple of questions:

What would pseudocode for the tiling look like?

Why is it trivial to code for from day one, but not as an addition to an existing engine?
 
scooby_dooby said:
According to thegamemaster:

"2xFSAA should only have a 1-3% performance penalty as you only need to render 2 tiles while a 4xFSAA should have about 3-5% in rendering 4 tiles."
http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71

Going from 2 to 4 tiles incurs a ~2% performance hit, so about a 1% hit per tile at 4 tiles. This probably doesn't scale linearly though; I wouldn't know.
Well, my figure may well be off, but there's no definite figure for how much one tile adds. It's a case of %age performance cost = %age of polys crossing tile boundaries, and that can vary considerably from game to game and frame to frame. Think, for example, of Kameo and an army of 2000 orcs coming over a hill. Assume the hill's crest is halfway down the screen and some 1000 orcs are just coming over it, so there are 1000 orcs aligned near this split. If the screen is tiled in vertical columns, rendering the left and right sides of the screen, the only orcs straddling the boundary will be the few in the centre between the halves. Whereas if the tiles divide the frame into a top half and a bottom half across the middle of the screen, where the crest of our hill is, the number of orcs crossing tiles could be nearer 1000.

Perhaps that's where management of tile-based rendering becomes most important: making sure that some scenes don't have a much higher tile crossover than others. Is the tile structure set by the developer or by the hardware?

Though the average per tile probably isn't as high as I first suggested, I wouldn't be too surprised if there were instances when a 4 tile rendering could feature as much as a 10% hit in complex scenes with lots of small triangles.
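
To put some rough numbers on the orc example, here is a toy sketch that scatters bounding boxes along a crest at mid-screen and counts how many cross a vertical split versus a horizontal one. The box size, the width of the band the orcs are scattered in, the random seed, and the function names `straddlers` and `orcs_on_crest` are all made up for illustration. It says nothing about how the real hardware or any real game behaves.

```python
import random

WIDTH, HEIGHT = 1280, 720


def straddlers(boxes, axis, split):
    """Count boxes (x0, y0, x1, y1) that cross the split line.

    axis 0 tests a vertical line at x = split,
    axis 1 tests a horizontal line at y = split.
    """
    return sum(1 for b in boxes if b[axis] < split < b[axis + 2])


def orcs_on_crest(count, size=20, seed=1):
    """Scatter square boxes across the screen width, clustered near mid-height."""
    rng = random.Random(seed)
    crest = HEIGHT / 2
    boxes = []
    for _ in range(count):
        x0 = rng.uniform(0, WIDTH - size)
        # Assumed band: top edge within 1.5 box heights above the crest
        # to 0.5 box heights below it, so only some boxes cross the line.
        y0 = rng.uniform(crest - 1.5 * size, crest + 0.5 * size)
        boxes.append((x0, y0, x0 + size, y0 + size))
    return boxes


if __name__ == "__main__":
    boxes = orcs_on_crest(1000)
    print("left/right split:", straddlers(boxes, 0, WIDTH / 2), "boxes cross")
    print("top/bottom split:", straddlers(boxes, 1, HEIGHT / 2), "boxes cross")
```

The same 1000 boxes give very different crossover counts depending on which way the screen is cut, which is the point of the hill example.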
 
This is the best info we have on ATI's method for tiling, from Dave's article:


ATI and Microsoft decided to take advantage of the Z only rendering pass which is the expected performance path independent of tiling. They found a way to use this Z only pass to assist with tiling the screen to optimise the eDRAM utilisation. During the Z only rendering pass the max extents within the screen space of each object is calculated and saved in order to alleviate the necessity for calculation of the geometry multiple times. Each command is tagged with a header of which screen tile(s) it will affect. After the Z only rendering pass the Hierarchical Z Buffer is fully populated for the entire screen which results in the render order not being an issue. When rendering a particular tile the command fetching processor looks at the header that was applied in the Z only rendering pass to see whether its resultant data will fall into the tile it is currently processing and if so it will queue it, if not it will discard it until the next tile is ready to render. This process is repeated for each tile that requires rendering. Once the first tile has been fully rendered the tile can be resolved (FSAA down-sample) and that tile of the back-buffer data can be written to system RAM; the next tile can begin rendering whilst the first is still being resolved. In essence this process has similarities with tile based deferred rendering, except that it is not deferring for a frame and that the "tile" it is operating on is orders of magnitude larger than most other tilers have utilised before.

There is going to be an increase in cost here as the resultant data of some objects in the command queue may intersect multiple tiles, in which case the geometry will be processed for each tile (note that once it is transformed and setup the pixels that fall outside of the current rendering tile can be clipped and no further processing is required), however with the very large size of the tiles this will, for the most part, reduce the number of commands that span multiple tiles and need to be processed more than once. Bear in mind that going from one FSAA depth to the next one up in the same resolution shouldn't affect Xenos too much in terms of sample processing as the ROPs and bandwidth are designed to operate with 4x FSAA all the time, so there is no extra cost in terms of sub sample read / write / blends, although there is a small cost in the shaders where extra colour samples will need to be calculated for pixels that cover geometry edges. So in terms of supporting FSAA the developers really only need to care about whether they wish to utilise this tiling solution or not when deciding what depth of FSAA to use (with consideration to the depth of the buffers they require as well). ATI have been quoted as suggesting that 720p resolutions with 4x FSAA, which would require three tiles, has about 95% of the performance of 2x FSAA.
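
On the earlier question about pseudocode for the tiling: going only by the description quoted above, the flow could be sketched roughly like this. It is a loose illustration, not ATI's actual implementation. The `Command` class, the horizontal-band tile layout and every function name are invented for the example, and the actual drawing, clipping and resolve steps are reduced to a log entry.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    extent: tuple  # (x0, y0, x1, y1): screen-space max extents of the object
    tiles: set = field(default_factory=set)  # header: tiles this command affects


def make_tiles(width, height, count):
    """Split the screen into `count` horizontal bands (an assumed layout)."""
    band = -(-height // count)  # ceiling division
    return [(0, i * band, width, min((i + 1) * band, height)) for i in range(count)]


def overlaps(a, b):
    """True if rectangles a and b share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def z_only_pass(commands, tiles):
    """Tag each command with the set of tiles its extents touch."""
    for cmd in commands:
        cmd.tiles = {i for i, t in enumerate(tiles) if overlaps(cmd.extent, t)}


def render_frame(commands, tiles):
    """Return, per tile, the names of the commands that get processed for it."""
    z_only_pass(commands, tiles)
    log = []
    for i, _tile in enumerate(tiles):
        # Commands whose header does not include this tile are skipped.
        queued = [c.name for c in commands if i in c.tiles]
        # Stand-in for: draw queued commands clipped to the tile, then
        # resolve (FSAA down-sample) the tile out to system RAM.
        log.append((i, queued))
    return log


if __name__ == "__main__":
    tiles = make_tiles(1280, 720, 3)
    scene = [
        Command("sky", (0, 0, 1280, 200)),
        Command("hero", (600, 200, 700, 500)),
        Command("ground", (0, 500, 1280, 720)),
    ]
    for tile_index, names in render_frame(scene, tiles):
        print(tile_index, names)
    extra = sum(len(c.tiles) for c in scene) - len(scene)
    print("extra geometry passes:", extra)
```

The "extra geometry passes" count is the cost the article describes: an object spanning several tiles has its geometry processed once per tile it touches.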
 