#876
Senior Member

That's probably an effect from the Snipping Tool (ignores the desktop gamma setting?) - the rest of the shots were captured by FRAPS.
Temporal stability is OK-ish to me, but I have only tested with this old OGL demo for now.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.

#877
Senior Member
Join Date: Nov 2004
Location: Ohio
Posts: 1,209

Yeah, it's probably better to capture all screens using the same method for consistency.

#878
Crazy coder

Quote:

#879
Member
Join Date: Nov 2007
Posts: 947

Quote:
And then there is the tonemapping (and gamma) issue as well. You have to keep the edge blocks at sample frequency until you do tone mapping and gamma. So basically you also need to do your post-processing at stencil-masked sample frequency (bloom combine, low-res particle buffer combine, etc. should be done before tone mapping). Doing this properly adds extra cost. So basically you cannot do 2xMSAA in under 3 ms (3 ms is 18% of your frame time if you aim at 60 fps), and 2xMSAA antialiasing quality isn't worth that big a sacrifice. And it doesn't do anything for transparency edges and specular aliasing.

I don't like how PC drivers apply post-process AA after UI rendering. It makes text look blurry. Post AA should be applied before UI rendering.

Last edited by sebbbi; 20-May-2011 at 08:46.
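The 18% figure is plain frame-budget arithmetic: a pass's cost divided by the frame time (1000 ms / fps). A minimal Python sketch that reproduces it; the helper names are hypothetical and nothing here comes from sebbbi's engine.

```python
def frame_budget_ms(fps: float) -> float:
    """Total time available per frame at the given frame rate, in milliseconds."""
    return 1000.0 / fps


def budget_fraction(cost_ms: float, fps: float) -> float:
    """Fraction of the per-frame budget consumed by a pass costing cost_ms."""
    return cost_ms / frame_budget_ms(fps)


if __name__ == "__main__":
    # 3 ms of extra 2xMSAA cost at a 60 fps target (about 16.7 ms per frame)
    print(f"{budget_fraction(3.0, 60):.0%}")  # 18%
    # 1 ms post-AA pass at a 30 fps target (about 33.3 ms per frame)
    print(f"{budget_fraction(1.0, 30):.0%}")  # 3%
```

The same helper shows why a roughly 1 ms post-AA pass is much easier to justify in a 30 fps game than a 3 ms MSAA path in a 60 fps one.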

#880
Senior Member
Join Date: Mar 2010
Posts: 1,283

Being the stalker that I am...
Quote:
Could this mean that BF3 will use MLAA?
Quote:
Last edited by Ruskie; 21-May-2011 at 01:14.

#881
penguins
Join Date: Feb 2004
Posts: 13,978

#882
Senior Member
Join Date: Mar 2010
Posts: 1,283

Update on FXAA: it's now at v3.
Quote:
Quote:

#883
Member
Join Date: Nov 2007
Posts: 947

Did some dynamic branch optimizations for FXAA2 today. Now it runs at 0.9 ms on Xbox with an ifAll branch.
I was searching the DX11 documentation for a way to branch depending on the result of all threads in the same branching unit (ifAll, ifAny). CUDA has __all and __any, but I can't find equivalents in DirectCompute/DX11... DirectCompute doesn't have any way to query the size of a warp/wavefront, so maybe this feature was considered too low-level as well.
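For readers unfamiliar with the CUDA vote intrinsics mentioned here: __any and __all evaluate a predicate across every thread of a warp and hand each thread the same combined result. The sketch below only simulates that behavior in Python; the warp_vote helper is hypothetical, and running it with two different widths shows that the outcome depends on the assumed warp size.

```python
from typing import List


def warp_vote(predicates: List[bool], warp_size: int, mode: str) -> List[bool]:
    """Simulate a warp-wide vote: every lane in a warp receives the same result.

    mode="any" models CUDA's __any, mode="all" models __all. Threads are split
    into consecutive groups of warp_size lanes (the last group may be short).
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    vote_fn = any if mode == "any" else all
    result: List[bool] = []
    for start in range(0, len(predicates), warp_size):
        warp = predicates[start:start + warp_size]
        vote = vote_fn(warp)
        result.extend([vote] * len(warp))  # broadcast the vote to every lane
    return result


if __name__ == "__main__":
    # 64 threads; only thread 40 wants the slow path.
    needs_slow_path = [i == 40 for i in range(64)]
    print(sum(warp_vote(needs_slow_path, 32, "any")))  # 32 lanes take the slow path
    print(sum(warp_vote(needs_slow_path, 64, "any")))  # all 64 lanes take it
```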

#884
Senior Member
Join Date: Mar 2010
Posts: 1,283

Well, that's really fast. I wonder if the FXAA v3 code is available now? It's supposed to have better quality and performance.
Here is DF's comparison in Enslaved (no AA vs. console FXAA): http://img705.imageshack.us/img705/8053/fxaa001.png

#885
Senior Member
Join Date: Feb 2002
Posts: 2,021

Quote:

#886
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,841

It's non-portable: it lets you write code that runs differently depending on the SIMD size of the underlying implementation, which is not allowed for obvious reasons. As mentioned, of course, typical dynamic branching should perform well enough on modern PC cards; there's fairly little overhead to adding a branch.
__________________
The content of this message is my personal opinion only.

#887
Member
Join Date: May 2010
Location: California
Posts: 110

Quote:

#888
Member
Join Date: Feb 2010
Posts: 170

Quote:

#889
Member
Join Date: Nov 2007
Posts: 947

It doesn't use any extra memory at all.
G-buffers are not used after post-processing, so we have two full-screen buffers unused at that point of rendering. We resolve the post-processed back buffer to one of our g-buffer textures. The AA samples texels from the g-buffer and outputs pixels to EDRAM. The UI is drawn on top of the antialiased result. MLAA would also cost no memory, as two g-buffers are enough to hold its temp results (MLAA needs two temp buffers). Basically any reasonable post-AA filter would have zero memory overhead when used in a deferred renderer.

With forward rendering you would likely take a memory hit, unless, for example, you reuse the shadow map memory area to store the AA temp results. On consoles you can overlap multiple textures in the same memory area, so it doesn't matter that the shadow map uses a different format than RGBA8. And with a forward renderer you could also use hardware MSAA. Post-process AA filters are most useful for deferred renderers.
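Reusing the idle g-buffers is an instance of lifetime-based aliasing: render targets whose live ranges within the frame never overlap can share the same memory. Below is a minimal Python sketch of a greedy slot assignment under stated assumptions (every target has the same byte size; the pass numbering and resource names are hypothetical). It illustrates the general idea, not RedLynx's renderer.

```python
from typing import Dict, List, Tuple

# (name, first pass that writes it, last pass that reads it)
Resource = Tuple[str, int, int]


def assign_slots(resources: List[Resource]) -> Dict[str, int]:
    """Greedy lifetime-based aliasing: two render targets may share one memory
    slot if their [first, last] pass ranges do not overlap. Assumes every
    target has the same byte size (e.g. 32 bits per pixel at one resolution)."""
    busy_until: List[int] = []  # last pass index at which each slot is still in use
    assignment: Dict[str, int] = {}
    for name, first, last in sorted(resources, key=lambda r: r[1]):
        for slot, end in enumerate(busy_until):
            if end < first:  # slot is free before this resource is first written
                busy_until[slot] = last
                assignment[name] = slot
                break
        else:  # no free slot: allocate a new one
            busy_until.append(last)
            assignment[name] = len(busy_until) - 1
    return assignment


if __name__ == "__main__":
    # Hypothetical pass order: 0 g-buffer fill, 1 lighting, 2 post-processing,
    # 3 MLAA edge detection, 4 MLAA blend weights, 5 MLAA final blend.
    frame = [
        ("gbuffer0", 0, 1),
        ("gbuffer1", 0, 1),
        ("mlaa_edges", 3, 4),
        ("mlaa_weights", 4, 5),
    ]
    slots = assign_slots(frame)
    print(slots)
    print("slots needed:", max(slots.values()) + 1)  # 2, same as the g-buffers alone
```

Both MLAA temporaries land in slots the g-buffers already occupied, so the AA pass adds no new allocations in this toy frame.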

#890
Member
Join Date: May 2010
Location: California
Posts: 110

Quote:

#891
Member
Join Date: Nov 2007
Posts: 947

Quote:
For example, a simple if-else branch: if (needSlowPath) execute40Instructions() else execute20Instructions();

With standard branching you execute 60 instructions if both sides of the branch occur inside a single SIMD (and predicate off the instructions for the threads they don't apply to). With an ifAny branch, if any of the threads inside a SIMD evaluates needSlowPath = true, all pixels in the SIMD pass the if-statement. So you only execute 40 instructions: all threads jump over the 20 instructions (including those that would evaluate needSlowPath = false). It also guarantees that all threads inside the SIMD run exactly the same instructions (no predicates etc. needed).

Standard if:
Slow path for all pixels: 40 instructions
Mixed slow and fast path: 60 instructions
Fast path for all pixels: 20 instructions

ifAny/__any:
Slow path for all pixels: 40 instructions
Mixed slow and fast path: 40 instructions (saves 20 instructions here)
Fast path for all pixels: 20 instructions

Of course, you can only use this kind of branching in cases where one of the execution paths gives correct results for both outcomes of the comparison. You are correct that the hardware SIMD size affects how this kind of branch gets executed. But if branch A provides correct results for all threads, then it's safe to execute any number of branch B threads using branch A. The smaller the SIMD size, the more branch B threads will actually execute branch B. The programmer has to be really careful, however, since both sides of the branch can be optimized slightly differently, and there can be differences in float rounding etc., making the calculation possibly slightly nondeterministic.
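The instruction counts in this post can be checked with a small Python model of one SIMD's lanes. It assumes, as in the example, a 40-instruction slow path and a 20-instruction fast path, and that a divergent standard branch issues both sides; the function names are illustrative only.

```python
from typing import List

# Instruction counts from the example: slow branch 40, fast branch 20.
SLOW_PATH = 40
FAST_PATH = 20


def standard_if_cost(lanes: List[bool]) -> int:
    """Predicated if/else: the SIMD issues every side that at least one lane takes."""
    cost = 0
    if any(lanes):       # some lane needs the slow path
        cost += SLOW_PATH
    if not all(lanes):   # some lane needs the fast path
        cost += FAST_PATH
    return cost


def if_any_cost(lanes: List[bool]) -> int:
    """ifAny-style branch: if any lane needs the slow path, all lanes take it."""
    return SLOW_PATH if any(lanes) else FAST_PATH


if __name__ == "__main__":
    cases = {
        "all slow": [True] * 32,
        "mixed": [True] + [False] * 31,
        "all fast": [False] * 32,
    }
    for name, lanes in cases.items():
        print(name, standard_if_cost(lanes), if_any_cost(lanes))
    # all slow 40 40
    # mixed 60 40
    # all fast 20 20
```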

#892
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,841

Quote:
This is a more important optimization on CPUs anyway, as predication currently costs instructions there. On GPUs predication is mostly "free", so all you're potentially saving is the other side of the branch when you have a general case and a specific case (for warps that diverge).

#893
Member
Join Date: Nov 2007
Posts: 947

Quote:
Many of the most efficient CUDA algorithms for calculating things like prefix sums depend on intra-warp optimizations. The SIMD width can be used as a powerful synchronization tool, since you basically get a free warp-wide sync barrier after each instruction. I wonder, however, what will happen if NVIDIA chooses to use a warp size other than 32 in their future GPUs. Many highly optimized CUDA algorithms (including ones in popular CUDA libraries) would break completely. That would be a mess for sure.
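As an illustration of the warp-synchronous style described here, this Python sketch simulates a Hillis-Steele inclusive scan across one 32-lane warp. Each loop iteration models all lanes reading and then writing in lockstep, which is the implicit per-instruction barrier such CUDA code relies on instead of an explicit __syncthreads(). WARP_SIZE = 32 is hard-coded on purpose: that baked-in constant is exactly the assumption that would break if the hardware width changed. The function name is hypothetical.

```python
from typing import List

WARP_SIZE = 32  # assumed hardware SIMD width; warp-synchronous code bakes this in


def warp_inclusive_scan(values: List[int]) -> List[int]:
    """Simulate a Hillis-Steele inclusive scan across one warp.

    Each step models every lane reading its neighbor and then every lane
    writing, in lockstep, so no explicit barrier is needed between steps.
    """
    if len(values) != WARP_SIZE:
        raise ValueError("expected exactly one warp of values")
    lanes = list(values)
    offset = 1
    while offset < WARP_SIZE:
        # All lanes read before any lane writes (the lockstep assumption).
        shifted = [lanes[i - offset] if i >= offset else 0 for i in range(WARP_SIZE)]
        lanes = [lanes[i] + shifted[i] for i in range(WARP_SIZE)]
        offset *= 2
    return lanes


if __name__ == "__main__":
    print(warp_inclusive_scan([1] * WARP_SIZE)[-1])  # 32
```

The scan finishes in log2(32) = 5 lockstep steps; on real hardware each step is a shared-memory read/add/write per lane.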

#894
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,841

Quote:

#895
Member
Join Date: Jan 2010
Posts: 117

#896
Member
Join Date: Aug 2009
Posts: 836

Quote:
Every console game released in 2012 without any kind of AA will be a tech disappointment for me.

#897
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,841

Quote:
Last edited by Andrew Lauritzen; 06-Jun-2011 at 20:13.

#898
Member
Join Date: Nov 2007
Posts: 947

Some screenies of the 0.9 ms optimized FXAA2 on Xbox 360 that I mentioned earlier in this thread. These are unfortunately slightly upscaled. We will release better screenshots later (hopefully with FXAA3).
http://www.redlynx.com/media/files/R...ress%20Kit.zip

#899
Member
Join Date: Aug 2009
Posts: 836

Quote:
And about consoles: yes, that was valid earlier, but now I would prioritize FXAA/MLAA/DLAA over any other post-processing effect. 1 ms isn't really that much, especially since most games are optimized for 33 ms.
@sebbbi How can we bribe you for a PC release of Trials Revolution? :>

#900
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,841

Quote:
AA looks pretty good close up, but some of the far stuff doesn't seem to be AA'd at all. I can understand the fence, the wheel spokes and other stuff that needs subsampling, but what about the pipe in the background: is it just too close to horizontal to be picked up by the filter? How does it look in motion?
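Besides edge orientation, a common reason a post-AA filter leaves an edge untouched is its local-contrast early exit: FXAA-style filters skip pixels whose luma range across the neighborhood falls below a threshold, which distant, low-contrast detail often does. The Python sketch below models that test; the threshold values and the function name are assumptions for illustration, not taken from the FXAA2 build discussed in this thread.

```python
from typing import Sequence


def below_contrast_threshold(
    center: float,
    neighbors: Sequence[float],
    edge_threshold: float = 0.166,      # assumed relative threshold
    edge_threshold_min: float = 0.0833,  # assumed absolute floor
) -> bool:
    """FXAA-style early exit: True if the local luma range is too small for the
    pixel to be treated as an edge, so the filter leaves it untouched.
    Luma values are assumed to be in [0, 1]."""
    lumas = [center, *neighbors]
    luma_max, luma_min = max(lumas), min(lumas)
    return (luma_max - luma_min) < max(edge_threshold_min, luma_max * edge_threshold)


if __name__ == "__main__":
    # Low-contrast distant detail: range 0.05 is under the 0.0833 floor.
    print(below_contrast_threshold(0.30, [0.25, 0.25, 0.27, 0.26]))  # True
    # High-contrast edge: range 0.8 is well above 0.9 * 0.166.
    print(below_contrast_threshold(0.90, [0.10, 0.90, 0.90, 0.10]))  # False
```

Raising the thresholds makes the filter cheaper (more pixels exit early) at the cost of leaving more faint edges aliased.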
Last edited by Andrew Lauritzen; 07-Jun-2011 at 20:41.