Old 17-Feb-2012, 17:02   #101
cjo
Member
 
Join Date: Mar 2010
Posts: 133
Default

Looks very good. It would be interesting to see the interior scene at a lower resolution and spending the processing time on more samples/pixel to see how much it improves noise levels / reduces the motion blur, which is a little distracting at its current level.
cjo is offline   Reply With Quote
Old 19-Feb-2012, 19:40   #102
Xenus
Senior Member
 
Join Date: Nov 2004
Location: Ohio
Posts: 1,248
Default

Yeah, for now, if you let it sit still, both are very impressive; the issue shows up as soon as there is movement. Seems like we are still a few gens from being able to do clean movement in real time, at least with the algorithms used there.
Xenus is offline   Reply With Quote
Old 20-Feb-2012, 13:53   #103
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

@ cjo, Xenus: I've uploaded a video rendered at 60 fps toggling motion blur on and off:

http://www.youtube.com/watch?v=udWNc_YeN20 (16 samples per pixel per frame, 60 fps)


straaljager is offline   Reply With Quote
Old 20-Feb-2012, 14:16   #104
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,832
Default

You can set FRAPS to record asynchronously to the screen output, so it won't cap render frame-rate.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
fellix is offline   Reply With Quote
Old 20-Feb-2012, 15:31   #105
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Thanks, will try that next time.
straaljager is offline   Reply With Quote
Old 20-Feb-2012, 15:51   #106
cjo
Member
 
Join Date: Mar 2010
Posts: 133
Default

Thanks for doing that. How many frames does it average with the blur on? And is it a weighted average? I wonder if some sort of noise filtering would improve the image without adding the smearing.
cjo is offline   Reply With Quote
Old 20-Feb-2012, 16:14   #107
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

I've set the blur to 0.8 using the accumulation buffer, which means that it averages the samples of 5 frames = 1/(1 - 0.8). There's no weighting, it's just simple averaging, but the real-time image quality is equivalent to an image with 5x as many samples. It's not the best method, but certainly not the worst either when the framerate is high enough. More sophisticated noise filters could potentially solve the smearing.
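
For anyone following the arithmetic, here is a rough sketch of the numbers in this post. The function names are made up for illustration and this is not Brigade code. It assumes the 16 samples per pixel per frame from the video above, and that the averaged frames carry independent noise, in which case the noise standard deviation drops with the square root of the number of frames.

```python
import math


def frames_averaged(blur: float) -> float:
    """Number of frames averaged for a given blur factor: 1 / (1 - blur)."""
    if not 0.0 <= blur < 1.0:
        raise ValueError("blur must be in [0, 1)")
    return 1.0 / (1.0 - blur)


def effective_spp(spp_per_frame: int, blur: float) -> float:
    """Samples per pixel contributing to the displayed image."""
    return spp_per_frame * frames_averaged(blur)


def noise_reduction(blur: float) -> float:
    """Factor by which the noise standard deviation shrinks,
    assuming equal weights and independent noise in each frame."""
    return math.sqrt(frames_averaged(blur))


if __name__ == "__main__":
    print(frames_averaged(0.8))    # about 5 frames
    print(effective_spp(16, 0.8))  # about 80 samples per pixel
    print(noise_reduction(0.8))    # about 2.24x less noise
```

So a blur of 0.8 at 16 spp per frame corresponds to roughly 80 spp in the displayed image, which matches the "5x as many samples" figure.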
straaljager is offline   Reply With Quote
Old 20-Feb-2012, 16:18   #108
cjo
Member
 
Join Date: Mar 2010
Posts: 133
Default

I suspect that weighting the average so that older frames contribute less may help (at the cost of some increased noise). How many frames would you need to draw to get a perceived noise-free image? (say, on the 16 spp indoor scene)
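
A minimal sketch of what I mean, with hypothetical names and a plain Python list standing in for the frame buffer. Each older frame gets its weight multiplied by a decay factor, and the weights are normalized to sum to 1. For independent, equally noisy frames, the noise variance of the blend relative to a single frame is the sum of the squared weights, which is how the extra noise shows up.

```python
from typing import List, Sequence


def decayed_weights(n_frames: int, decay: float) -> List[float]:
    """Normalized weights, newest frame first. decay=1.0 gives a plain average."""
    if n_frames < 1 or not 0.0 < decay <= 1.0:
        raise ValueError("need n_frames >= 1 and 0 < decay <= 1")
    raw = [decay ** age for age in range(n_frames)]  # age 0 is the newest frame
    total = sum(raw)
    return [w / total for w in raw]


def blend(frames: Sequence[Sequence[float]], decay: float) -> List[float]:
    """Blend per-pixel values; frames[0] is the newest frame."""
    weights = decayed_weights(len(frames), decay)
    n_pixels = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n_pixels)]


def relative_noise_variance(weights: Sequence[float]) -> float:
    """Variance of the blend relative to one frame, for independent equal-variance noise."""
    return sum(w * w for w in weights)


if __name__ == "__main__":
    flat = decayed_weights(5, 1.0)   # simple average of 5 frames
    decayed = decayed_weights(5, 0.5)
    print(relative_noise_variance(flat))     # about 0.2
    print(relative_noise_variance(decayed))  # larger, so noisier but less smearing
```

With a decay of 0.5 over 5 frames the newest frame carries about half the total weight, so motion should smear less, while the relative noise variance rises from about 0.2 to about 0.35.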
cjo is offline   Reply With Quote
Old 20-Feb-2012, 16:38   #109
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Just tested it and found that 10 frames at 16 spp per frame are sufficient to get noise-free results in the indoor scene (outdoor scenes converge faster, so I guess 4 frames would be enough). I'll try the trick where older frames contribute less next time.
straaljager is offline   Reply With Quote
Old 23-Feb-2012, 20:28   #110
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

New path tracing test with Brigade2 rendered at 720p on 2 GTX 580 GPUs:


http://www.youtube.com/watch?v=L51hHcbZNhg


More info at http://raytracey.blogspot.com


straaljager is offline   Reply With Quote
Old 25-Feb-2012, 07:03   #111
Acert93
Artist formerly known as Acert93
 
Join Date: Dec 2004
Location: Seattle
Posts: 7,714
Default

Thanks straaljager. Maybe you can extrapolate for me: Assuming a 50% increase in GPU performance every 2 years for top end models how much longer will it be before we begin seeing technology like this fast enough to function in higher-end game complexity (e.g. games with hundreds of thousands of polys, maybe a couple million, on screen at a time and fast enough to perform @ a locked 30Hz)?
__________________
"In games I don't like, there is no such thing as "tradeoffs," only "downgrades" or "lazy devs" or "bugs" or "design failures." Neither do tradeoffs exist in games I'm a rabid fan of, and just shut up if you're going to point them out." -- fearsomepirate
Acert93 is offline   Reply With Quote
Old 25-Feb-2012, 10:57   #112
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Quote:
Originally Posted by Acert93 View Post
Thanks straaljager. Maybe you can extrapolate for me: Assuming a 50% increase in GPU performance every 2 years for top end models how much longer will it be before we begin seeing technology like this fast enough to function in higher-end game complexity (e.g. games with hundreds of thousands of polys, maybe a couple million, on screen at a time and fast enough to perform @ a locked 30Hz)?
There is actually a 200% increase in performance between generations, i.e. moving from a GTX 280 to a GTX 580 tripled the ray tracing performance in my experiments, due to the almost perfect scaling of the path tracing algorithm with the number of shaders and the use of caches in the Fermi architecture. I expect Kepler to triple performance again compared to Fermi cards. So path tracing performance will increase much faster (about 4-6x) than rasterization performance.

Another advantage is that the geometric complexity of the scene doesn't matter that much, e.g. I don't see much difference in performance between a 50k and a 500k poly scene, it's maybe 1.5x slower. I'm running 300k poly scenes at 30 fps with 16 samples per pixel (sufficient for outdoor scenes) and 640x360 resolution on 2 GPUs.

When implementing biased optimizations to calculate diffuse GI, I can see path tracing hit 30 fps at 720p for complex scenes within one or two GPU generations (I'm talking about photorealistic rendering quality with real-time diffuse GI, glossy reflections and refractions, soft shadows, ambient occlusion, ... everything). Maxwell GPUs should be able to reach this goal easily.
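
As a back-of-the-envelope check on the numbers above, here is a small sketch (hypothetical helper names). It assumes path tracing cost scales linearly with pixel count and that the samples per pixel and the number of GPUs stay the same. It ignores any gain from biased optimizations, so treat it as a rough bound rather than a prediction.

```python
import math


def pixel_ratio(w0: int, h0: int, w1: int, h1: int) -> float:
    """How many times more pixels the target resolution has."""
    return (w1 * h1) / (w0 * h0)


def periods_needed(speedup: float, growth_per_period: float) -> float:
    """Number of periods until compounded growth reaches the required speedup."""
    return math.log(speedup) / math.log(growth_per_period)


# 640x360 at 30 fps today, 1280x720 at 30 fps as the target
needed = pixel_ratio(640, 360, 1280, 720)

# 3x per GPU generation, as measured going from GTX 280 to GTX 580
generations = periods_needed(needed, 3.0)

# 50% every 2 years, as assumed in the question
years = 2.0 * periods_needed(needed, 1.5)

if __name__ == "__main__":
    print(needed)       # 4.0
    print(generations)  # about 1.26 generations
    print(years)        # about 6.8 years
```

The gap between the two growth assumptions is large: about 1.3 generations at 3x per generation versus close to 7 years at 50% every 2 years.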

The area that currently needs the most improvement is ray tracing of highly dynamic scenes, where the acceleration structures need to be updated or rebuilt every frame if you want multiple highly detailed animated characters. Some huge advances were made recently, e.g. the HLBVH2 technique from Nvidia, which should be able to handle a scene with 1 million completely independent dynamic triangles in real time. I have yet to see it in action, but I'm convinced that complex photorealistic games will be possible within two years.
straaljager is offline   Reply With Quote
Old 25-Feb-2012, 11:42   #113
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,066
Default

Quote:
Originally Posted by straaljager View Post
There is actually a 200% increase in performance between generations, i.e. moving from a GTX 280 to a GTX 580 tripled the ray tracing performance in my experiments, due to the almost perfect scaling of the path tracing algorithm with the number of shaders and the use of caches in the Fermi architecture. I expect Kepler to triple performance again compared to Fermi cards. So path tracing performance will increase much faster (about 4-6x) than rasterization performance.
Where do you expect this performance increase to come from?
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog
Alexko is offline   Reply With Quote
Old 25-Feb-2012, 12:31   #114
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,832
Default

Data structure traversal is probably the biggest bottleneck, and it will become even more of one with the desire to use ray tracing in more "liberal" ways. ALU throughput is also a factor, but that's cheap in today's architectures and doesn't relate to the data access issues. Caches and interfaces are the problem, and without some fundamental redesign it won't scale well with the computing rate. Think of vertical die stacking: you get plenty of embedded RAM and an ultra-wide interface.
fellix is offline   Reply With Quote
Old 25-Feb-2012, 13:35   #115
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Quote:
Originally Posted by Alexko View Post
Where do you expect this performance increase to come from?
Increased shader count + other architectural improvements + ray tracing scales much better than rasterization (almost perfectly), because it's extremely parallelizable. It's explained in more detail in this video presentation by Nvidia about GPU ray tracing: http://www.youtube.com/watch?v=0IC2NIogWR4 (from 8 minutes in). G80 to GT200: 2x speed bump, GT200 to Fermi: another 4x speed bump due to caches.

Last edited by straaljager; 25-Feb-2012 at 13:40.
straaljager is offline   Reply With Quote
Old 25-Feb-2012, 14:26   #116
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,066
Default

Quote:
Originally Posted by straaljager View Post
Increased shader count + other architectural improvements + ray tracing scales much better than rasterization (almost perfectly), because it's extremely parallelizable. It's explained in more detail in this video presentation by Nvidia about GPU ray tracing: http://www.youtube.com/watch?v=0IC2NIogWR4 (from 8 minutes in). G80 to GT200: 2x speed bump, GT200 to Fermi: another 4x speed bump due to caches.
Thanks. I suppose 2× scaling might be possible for Kepler then, but Fermi's caches are already there, and apparently work fairly well. Is there some kind of clear bottleneck in Fermi that you hope will be removed in future GPUs?
Alexko is offline   Reply With Quote
Old 25-Feb-2012, 14:36   #117
Acert93
Artist formerly known as Acert93
 
Join Date: Dec 2004
Location: Seattle
Posts: 7,714
Default

I know this is a CUDA project, but have there been any tests of this approach on AMD's GCN (7970) product yet? Is it an on-par architecture for this or are there certain advantages/disadvantages?
Acert93 is offline   Reply With Quote
Old 25-Feb-2012, 15:20   #118
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Quote:
Originally Posted by Alexko View Post
Thanks. I suppose 2× scaling might be possible for Kepler then, but Fermi's caches are already there, and apparently work fairly well. Is there some kind of clear bottleneck in Fermi that you hope will be removed in future GPUs?
The biggest bottleneck right now is transferring the BVH (acceleration structure) of dynamic objects, which is currently built/updated on the CPU, from CPU to GPU. Rebuilding and refitting the BVH on the GPU could mitigate this. Also, future GPUs (from Maxwell onward) will have CPU cores on the same die, which will greatly help as well. Another thing is warp divergence in path tracing, for which a move to more MIMD like architectures would increase efficiency. And bigger caches of course.
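
To make "refitting" concrete, here is a minimal CPU-side sketch in Python. It is not Brigade's code, and the Node layout and function names are made up for illustration. Refitting keeps the tree topology fixed and only recomputes the bounding boxes bottom-up after the triangles have moved, which is much cheaper than a full rebuild but can degrade tree quality when objects deform a lot. The sketch assumes every internal node has exactly two children.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]           # (min corner, max corner)
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    triangle: Optional[int] = None  # set on leaves: index into the triangle list
    box: Optional[AABB] = None


def triangle_box(tri: Triangle) -> AABB:
    """Axis-aligned bounding box of one triangle."""
    lo = tuple(min(v[axis] for v in tri) for axis in range(3))
    hi = tuple(max(v[axis] for v in tri) for axis in range(3))
    return lo, hi


def union(a: AABB, b: AABB) -> AABB:
    """Smallest box enclosing both boxes."""
    lo = tuple(min(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(max(a[1][i], b[1][i]) for i in range(3))
    return lo, hi


def refit(node: Node, triangles: List[Triangle]) -> AABB:
    """Recompute boxes bottom-up without changing the tree topology."""
    if node.triangle is not None:
        node.box = triangle_box(triangles[node.triangle])
    else:
        node.box = union(refit(node.left, triangles), refit(node.right, triangles))
    return node.box
```

Calling refit(root, triangles) once per frame after animating the vertices updates every box in a single pass over the tree.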


Quote:
Originally Posted by Acert93 View Post
I know this is a CUDA project, but have there been any tests of this approach on AMD's GCN (7970) product yet? Is it an on-par architecture for this or are there certain advantages/disadvantages?
Good question. I've read that Dade's SmallLuxGPU runs very well on the 7970, about twice as fast as the HD 5870. I haven't done any tests with AMD cards though, so I can't say for sure.
straaljager is offline   Reply With Quote
Old 25-Feb-2012, 20:22   #119
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,066
Default

Quote:
Originally Posted by straaljager View Post
The biggest bottleneck right now is transferring the BVH (acceleration structure) of dynamic objects, which is currently built/updated on the CPU, from CPU to GPU. Rebuilding and refitting the BVH on the GPU could mitigate this. Also, future GPUs (from Maxwell onward) will have CPU cores on the same die, which will greatly help as well. Another thing is warp divergence in path tracing, for which a move to more MIMD like architectures would increase efficiency. And bigger caches of course.
Makes sense, thanks.
Alexko is offline   Reply With Quote
Old 28-Mar-2012, 13:11   #120
Voxilla
Member
 
Join Date: Jun 2007
Posts: 263
Default

Has anybody tested this CUDA path tracing on the new Kepler yet?
Voxilla is offline   Reply With Quote
Old 29-Mar-2012, 19:18   #121
straaljager
Junior Member
 
Join Date: Sep 2008
Posts: 77
Default

Quote:
Originally Posted by Voxilla View Post
Has anybody tested this CUDA path tracing on the new Kepler yet?
From all the reviews it appeared that the path tracing performance of the GTX 680 would be abysmal and far below expectations (worse than GTX 580), but I just found this CUDA path tracing benchmark (thanks to toxie from ompf forum) comparing GTX 480 and GTX 680, which looks a bit more promising: http://www.tml.tkk.fi/~timo/HPG2009/index.html

Last edited by straaljager; 29-Mar-2012 at 19:33.
straaljager is offline   Reply With Quote
Old 29-Mar-2012, 20:07   #122
CNCAddict
Member
 
Join Date: Aug 2005
Posts: 278
Default

So almost 2x faster than the 480? That is pretty impressive if you ask me. The 680 seems to be a bit of a puzzle: it's supposed to suck at path tracing and things like CFD, yet it excels at them in certain tests and does poorly in others. Seems like the code just needs to be optimized for Kepler, then it's off to the races!!
CNCAddict is offline   Reply With Quote
Old 29-Mar-2012, 20:09   #123
hoho
Senior Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 1,218
Default

From what I understand, it sucks as long as you try to use doubles. As long as you stay with floats (and the compiler actually spits out something usable), it's awesome.
hoho is offline   Reply With Quote
Old 29-Mar-2012, 20:33   #124
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,832
Default

Quote:
Originally Posted by straaljager View Post
So, they are using the tex cache for the fetches on Kepler? One must wonder why. If NV was to follow their architectural plan from Fermi, a single SMX in Kepler now should boast at least 48KB of tex streaming cache with corresponding increase of the access ports, that better track the increased compute throughput than the L1d cache.
fellix is offline   Reply With Quote
Old 30-Mar-2012, 08:11   #125
Voxilla
Member
 
Join Date: Jun 2007
Posts: 263
Default

From what I understand, this current Kepler has been optimized for fast texturing, sacrificing L1/L2 cache and shared memory in the process. For graphics this is a win, but for compute algorithms relying on CPU-like caches it hurts.
I'm still curious how current path tracing optimized for Fermi runs on Kepler.
Voxilla is offline   Reply With Quote
