HDR+AA...Possible with the R520?

I assumed the "slightly transparent" edges of blades of grass etc are simply the result of common-or-garden (bilinear, etc.) filtering.

The edges aren't blending with what's behind them, they're merely the result of a filtering of a transparent texel next to a "green" texel.

But anyway, I'm not qualified to talk about the intricacies of transparent texels and rendering order sadly...

Jawed
 
Chalnoth said:
Jawed said:
But a texel in a grass texture is either transparent (nothing to blend) or opaque (nothing to blend) isn't it?
Actually, I don't think UT2004 uses a plain alpha test for any of these surfaces, be they grass or fencing. If you look closely, you will notice that the edges are slightly transparent. The game uses a combination of an alpha blend (to reduce aliasing) and an alpha test (so that it may go unnoticed if the surface is rendered in the wrong order).

So no, the grass isn't simply 100% transparent or 100% opaque.
True, but you aren't going to get many blends on top of each other here. We're talking about the intersection of grass blade outlines now (the blended areas), and such intersections will not have a big enough area or similar enough colour to see banding.

A bigger problem may be when rendering smoke etc., which has many overlapped layers with similar colours. But if you're doing HDR and tone mapping with FP10, you'll eventually map all 1024 (essentially logarithmically spaced) levels of intensity to an 8-bit integer range anyway. If you choose the right scale for blending into the FP10 framebuffer, I'm pretty sure the final result will be better than the 32-bit rendering we're doing now. It'll be even better if there's dithering, as I think is found in NV40's FP16 blending.
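Tone mapping here just means a curve that compresses HDR intensities into the displayable 8-bit range. As a rough illustration only (this is a generic Reinhard-style operator, not what any particular game or FP10 hardware is known to use):

```python
def tone_map(x, exposure=1.0):
    # Reinhard-style curve: compresses [0, inf) into [0, 1),
    # then quantises to the display's 8-bit integer range
    y = (x * exposure) / (1.0 + x * exposure)
    return round(y * 255)
```

Because the curve flattens out at high intensities, wide swings near the top of the HDR range land on neighbouring 8-bit codes anyway, which is why a roughly logarithmically spaced storage format loses little that would survive display.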
 
Bob said:
Developers would likely hate it if their games had weird single-channel banding or shimmering issues that weren't caught during QA. Banding, although ugly, isn't distracting. Channel smearing typically is. If that smearing is unpredictable (due to the dynamics of the game, for instance), then not all cases may be testable / fixable.
I was trying to think of such a scenario when someone suggested an SRGB framebuffer before, but failed. Could you give an example of what this smearing might look like? I have a hard time seeing how 9-bits per channel plus a 5-bit shared exponent would look any worse than current blending. The additional hardware required seems to be a few bit-shifters and comparators, which is probably very small compared to the blending logic.
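To illustrate why the extra hardware seems small: aligning three mantissas to one shared exponent is just a comparison plus right shifts. A minimal sketch (the 9-bit-mantissa layout and the value convention `value = mantissa * 2**(e - 9)` are my assumptions for illustration, not a real format spec):

```python
def shared_exp_pack(r_m, g_m, b_m, e_r, e_g, e_b):
    # Inputs: per-channel 9-bit mantissas with individual exponents,
    # where channel value = mantissa * 2**(e - 9) (assumed convention).
    # Pick the largest exponent (one 3-way comparison), then right-shift
    # the dimmer channels' mantissas to align them to it.
    e = max(e_r, e_g, e_b)
    return (r_m >> (e - e_r), g_m >> (e - e_g), b_m >> (e - e_b), e)

# red = 500 * 2**(10-9) = 1000, green = 256 * 2**(1-9) = 1:
# green's mantissa gets shifted right by 9 bits and vanishes entirely
packed = shared_exp_pack(500, 256, 0, 10, 1, 0)
```

Everything here is shifters and a comparator, which does look tiny next to the blending datapath itself.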
 
I suppose the prospect of R520 being capable of HDR rendering and AA concurrently at sensible frame rates goes some way towards making sense of the idea that it's a large (300 million+ transistor) chip while sticking with 16 fragment pipelines (although I grant the whole issue of pipelines and their importance in quantifying performance is in flux). If this is the case then hats off to ATI for being brave and forward looking. Though you might say NV was forward looking with NV30 and look where that got them.

I suppose that my only doubt concerning R520 remaining a 16 pipe chip is that if this is the case then it follows from that snippet of info that was discussed a while back that both RV530 and RV515 are 4 pipe chips, and presumably benefit from the same HDR-friendly architecture. Anyone have any thoughts on this, even if it is a little off-topic?
 
Could you give an example of what this smearing might look like?
I can't think of a practical example off-hand, but here's a rather contrived one. Note that this example would work just fine with non-shared exponent formats.

Assume the framebuffer is initially cleared to 0.

Set the blend mode to subtract: new dest = source - dest

In the first pass, render the color (+1000.0, +1.0, 0).

You compute (1000, 1, 0) - (0, 0, 0) = (1000, 1, 0), but that ends up being stored as (1000, 0, 0) because RGBE doesn't have enough mantissa bits to store the green channel.

Now, render the same color on top of the first one.

You compute (1000, 1, 0) - (1000, 0, 0) = (0, 1, 0), stored as (0, 1, 0) (green).

With an fp11/fp11/fp10 render target, the final value would be (0, 0, 0) (black).

(Depending on the exact number of exponent and mantissa bits, you may need to scale the numbers I used.)
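The contrived example above can be checked numerically. Below is a sketch that quantises colours either to a shared-exponent format (9-bit mantissas assumed, RGBE/RGB9E5-style) or to a per-channel float (7 bits of mantissa precision, roughly fp11), using truncation as a minimal encoder might, and runs the two subtractive passes:

```python
import math

def store_shared_exp(rgb, mant_bits=9):
    # one exponent, chosen from the largest channel, covers all three mantissas
    m = max(rgb)
    if m <= 0:
        return (0.0, 0.0, 0.0)
    e = math.frexp(m)[1]                      # m = f * 2**e, f in [0.5, 1)
    scale = 2.0 ** (mant_bits - e)
    return tuple(math.floor(c * scale) / scale for c in rgb)

def store_per_channel(rgb, mant_bits=7):
    # each channel keeps its own exponent (fp11/fp11/fp10-style)
    return tuple(store_shared_exp((c,), mant_bits)[0] for c in rgb)

def subtract_blend(src, dest, store):
    return store(tuple(s - d for s, d in zip(src, dest)))

src = (1000.0, 1.0, 0.0)

d = subtract_blend(src, (0.0, 0.0, 0.0), store_shared_exp)
# d == (1000.0, 0.0, 0.0): green truncated away by the shared exponent
d = subtract_blend(src, d, store_shared_exp)
# d == (0.0, 1.0, 0.0): spurious green instead of black

p = subtract_blend(src, (0.0, 0.0, 0.0), store_per_channel)
p = subtract_blend(src, p, store_per_channel)
# p == (0.0, 0.0, 0.0): black, as expected
```

The green channel survives the first pass only when it carries its own exponent; with a shared exponent it is quantised against the 1000.0 in red and rounds to zero, which the subtraction then exposes.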
 
Jawed said:
I assumed the "slightly transparent" edges of blades of grass etc are simply the result of common-or-garden (bilinear, etc.) filtering.

The edges aren't blending with what's behind them, they're merely the result of a filtering of a transparent texel next to a "green" texel.
If they weren't blending with what's behind them, they'd be aliased.
 
Bob said:
I can't think of a practical example off-hand, but here's a rather contrived one.
Hmm, a practical example is what I'm looking for, actually. I was able to envision scenarios similar to what you just provided, but the way I thought it through, I figured tiny numbers that lose precision through the shared exponent won't contribute much to the final colour anyway.

The only thing I could think of is something stupid like putting on green transparent goggles on a very red planet, and subsequently using autoexposure so that it's not overly dark.
 
subtraction is always tricky, as it can cancel the non-artefact part of the numbers and leave only the artefacts behind. this is a nice example of it.

then again, how often does one use subtractive blending? for typical game scenes, people use additive blending or linear interpolation most of the time, and for those two, RGBE would work very well imho. i can't really imagine an example where it wouldn't work (in a way that would be visible in the final image).

but for GPGPU tasks, of course, subtractive blending and many other modes could come up. then again, a GPGPU task should NEVER run on low-precision or compressed formats anyway. that's why i hope there is support for fp16 (AND fp32) blending too, just in case one needs that precision.

but for current games, RGBE should not produce really visible artefacts, only possibly measurable ones.

and please don't prove me wrong, i don't want my bubble burst :D
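For what it's worth, the additive case can be simulated the same way as the subtractive one, with the same assumed shared-exponent quantiser (9-bit mantissas, truncating). Accumulating a bright red/green source with a very dim blue component for 100 passes:

```python
import math

def store_shared_exp(rgb, mant_bits=9):
    # one exponent from the largest channel; dimmer channels get truncated
    m = max(rgb)
    if m <= 0:
        return (0.0, 0.0, 0.0)
    e = math.frexp(m)[1]                      # m = f * 2**e, f in [0.5, 1)
    scale = 2.0 ** (mant_bits - e)
    return tuple(math.floor(c * scale) / scale for c in rgb)

src = (0.5, 0.5, 0.002)       # bright red/green, very dim blue
dest = (0.0, 0.0, 0.0)
for _ in range(100):          # additive blend: dest = store(src + dest)
    dest = store_shared_exp(tuple(s + d for s, d in zip(src, dest)))

# exact result would be (50.0, 50.0, 0.2): the bright channels come out
# exact, while the dim blue is truncated away entirely -- an error of 0.2
# against a peak of 50, i.e. under half a percent of the brightest channel
```

So the dim channel's error is real but bounded by the quantisation step of the brightest channel, on the order of a single 8-bit step after display mapping: measurable, not really visible, consistent with the claim above.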
 
Xmas said:
Jawed said:
I assumed the "slightly transparent" edges of blades of grass etc are simply the result of common-or-garden (bilinear, etc.) filtering.

The edges aren't blending with what's behind them, they're merely the result of a filtering of a transparent texel next to a "green" texel.
If they weren't blending with what's behind them, they'd be aliased.

I had a quick play. I see the grass is blended, but tree leaves are not. I thought the grass was the same as the leaves - my mistake.

Jawed
 
caboosemoose said:
I suppose the prospect of R520 being capable of HDR rendering and AA concurrently at sensible frame rates goes some way towards making sense of the idea that it's a large (300 million+ tranny) chip while sticking with 16 fragment pipelines (although I grant the whole issue of pipelines and their importance in qualifying performance is in flux). If this is the case then hats off to ATI for being brave and forward looking. Though you might say NV was forward looking with NV30 and look where that got them.

I suppose that my only doubt concerning R520 remaining a 16 pipe chip is that if this is the case then it follows from that snippet of info that was discussed a while back that both RV530 and RV515 are 4 pipe chips, and presumably benefit from the same HDR friendly architecture. Anyone have any thoughts on all, even if it is a little off-topic?

I think the only way ATI would have totally avoided pure pipeline increases on this core is if they created the vaunted "super pipeline" that was mentioned briefly in the wave of rumours, with each pipeline doing X more work than the ones we have today. There are really so many possibilities, though.
 
I'm expecting ATI to make R520's fragment pipelines capable of 2x MAD per clock (similar to G70).

Jawed
 
Jawed said:
I'm expecting ATI to make R520's fragment pipelines capable of 2x MAD per clock (similar to G70).

Jawed

I want the same: that the pre-ALUs (sometimes called mini-ALUs) are upgraded to full MADD ALUs (i.e. like the existing full ALU in R4X0).

Then 4 quads clocked at 650MHz should be a good match for G70.

But maybe R520's ALUs were finalised back when ATI had a real leap over NV's shader performance, and therefore it's more likely that R580 will get the upgrade to two 'full MADD ALUs'?
 
SugarCoat said:
I think the only way ATI would have totally avoided pure pipeline increases on this core is if they created the vaunted "super pipeline" that was mentioned briefly in the wave of rumours, with each pipeline doing X more work than the ones we have today. There are really so many possibilities, though.
Given what we know of Xenos, it is highly unlikely that ATI is going in this direction. ATI seems to be moving towards more, simpler pipelines rather than fewer, more complex ones.

nVidia's the one that's going the "extreme pipeline" route.
 
Chalnoth said:
From what I recall, the R420 is more of a TEX + ALU + mini ALU design.

Yeah. Texture sampler + vec3 full/mini ALUs + scalar full/mini ALUs.
 
My understanding of R420's "mini-ALU" (vec+scalar) is that it supports the PS1.4 instruction set.

That presumably means it's limited to integer, and a reduced set of instructions.

I'm not saying that it's trivial to convert a PS1.4 ALU into a PS3 ALU, but I do suspect (hope, hahahaha) that the rumoured 1.3x increase in capability per "pipe" comes from implementing a pair of PS3 ALUs (vec3+scalar).

Jawed
 
The entire PS pipeline of R300+ is FP24 float; it's only converted to integer at the backend. The "mini ALU" has the PS1.4 modifiers in there, but ATI maintain that's not all, without specifying what else there is (IIRC).
 
Are there any specific PS1.4 benchmarks for 8500 versus 9700Pro (etc.) that give a clue as to whether R300 can, for example, dual-issue PS1.4 operations?

I kinda suspect not, rummaging back through old articles...

Jawed
 