ATI's "Toy Shop" Demo is jaw-dropping

I like the Toy Shop demo best - it had the biggest visual impact on me - but the Parthenon one looks better than the "artist's impression" scenes from recent documentaries I've seen. COOL!!
 
Joe DeFuria said:
Bump...there's a new demo: Parthenon
Yeah, and it didn't really seem to have the same quality as the other demos to me. I mean, they showcased some excellent parallax mapping in conjunction with a whole lot of texture data, but other than that it just didn't seem all that impressive.
 
Mintmaster said:
I think it will only be in ATI's best interests if you make it work on NV hardware.

Can you just use alternate formats? I assume you're talking about single/double channel FP formats, but I'm not sure. Maybe floating point depth maps to be used as a shadow buffer?
That is a darn good point. If it did run, but dog slow, it might make people think a bit.
 
Thank you Natasha, and the other guys at ATi, for all your answers. I do have a couple more questions if I may? :)

Can you explain the "SM 3.0 done right" thing in more detail? Some are saying ATi is not SM 3.0 compliant, some say it's a technicality. What is your opinion on this?

Will the X1800XT definitely be available on Nov. 5th?

HKEPC had a slide that said the R580 was "in house and working". Do you have a good guess on which month ATi will release the R580s? Thanks so much for all your answers. :)

Moderator - post edited for inappropriate questions
 
Aw, come on. Don't scare off techies with a PR interview!

Well, other than the female thing --which is liable to scare her off on a whole different level. . .:LOL: Or, who knows, maybe not. We want pictures of the first B3D wedding tho! "Hi, my name's R300King!" might just work for her. . .:p
 
Thanks a lot Chris, Thorsten, and Natasha, this demo rocks ! I hope we see more of such quality in future demos.
 
Natasha, I hope you and your team get a fat raise out of this. :smile:
 
natashaATI said:
Also, the parallax occlusion mapping technique really takes advantage of the excellent dynamic branching that X1K cards have.
That's good news! I think the good control flow implementation is what interests me the most in the new ATI hardware. Well, that and AA with fp16 render targets (although I'm still concerned that, as I understand it, we don't get aniso/mip-mapped fp16 filtering...). It should certainly be a fun card to play around with if I can get my hands on one :)

natashaATI said:
Of course, one could think up alternative ways to implement some of the algorithms that we have used in this demo. For example, you could use relief mapping instead of parallax occlusion mapping.
I agree that relief mapping probably isn't going to give you as good results, but did you look into Per-Pixel Displacement Mapping with Distance Functions (Donnelly, GPU Gems 2)? It seems to produce some very nice results, run extremely fast, and support complex geometry with undercuts, etc. I'd be interested to know how it compares to your modified parallax mapping solution, as it is heavy on dependent texture reads but much lighter on control flow.
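Roughly, the core of that technique is just a march along the view ray through a precomputed 3D distance map, with an early-out once the ray gets close enough to the surface. Here's a small CPU-side sketch of the idea (my own illustration, not Donnelly's actual shader; the map size, epsilon, and step count are made up):

Code:
#include <math.h>
#include <stdio.h>

/* CPU-side sketch of per-pixel displacement mapping with distance
 * functions (sphere tracing). On the GPU this loop lives in the pixel
 * shader and the distance map is a 3D texture; here it is a plain
 * array filled with the distance to a flat surface at z = 0.5. */

#define DIM 17                          /* toy distance-map resolution */
static float dist_map[DIM][DIM][DIM];

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Nearest-neighbour lookup standing in for a trilinear texture fetch. */
static float sample_distance(float x, float y, float z)
{
    int i = clampi((int)(x * (DIM - 1) + 0.5f), 0, DIM - 1);
    int j = clampi((int)(y * (DIM - 1) + 0.5f), 0, DIM - 1);
    int k = clampi((int)(z * (DIM - 1) + 0.5f), 0, DIM - 1);
    return dist_map[i][j][k];
}

/* March from point p along unit direction d (texture space). Each step
 * advances by the stored distance to the nearest surface, so the ray
 * can never overshoot; dynamic branching provides the early-out. */
static float sphere_trace(const float p[3], const float d[3])
{
    const int   max_steps = 32;
    const float epsilon   = 0.01f;
    float t = 0.0f;

    for (int i = 0; i < max_steps; ++i) {
        float dist = sample_distance(p[0] + t * d[0],
                                     p[1] + t * d[1],
                                     p[2] + t * d[2]);
        if (dist < epsilon)             /* converged: early-out        */
            break;
        t += dist;                      /* safe step, never overshoots */
    }
    return t;                           /* parameter of the hit point  */
}

int main(void)
{
    /* Distance map for a flat "heightfield" at z = 0.5. */
    for (int i = 0; i < DIM; ++i)
        for (int j = 0; j < DIM; ++j)
            for (int k = 0; k < DIM; ++k)
                dist_map[i][j][k] = fabsf((float)k / (DIM - 1) - 0.5f);

    const float eye[3] = { 0.5f, 0.5f, 1.0f };
    const float dir[3] = { 0.0f, 0.0f, -1.0f };
    printf("hit at t = %.3f (expected ~0.5)\n", sphere_trace(eye, dir));
    return 0;
}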

Thanks for your time, and a great demo!
Andrew Lauritzen
University of Waterloo / Sh GPU Metaprogramming Language (libsh)
 
AndyTX said:
That's good news! I think the good control flow implementation is what interests me the most in the new ATI hardware.
I would tend to agree. If the NV50 doesn't have a good flow control implementation as well, it's going to be a very hard sell (same with HDR FSAA, of course), since by that time we'll have games that will have begun to use dynamic flow control, such as UT2007.
 
AndyTX - is that the method using sphere tracing? If so, its major drawback appears to be storage space... (a 3D texture versus a 2D heightfield). Also, IIRC the authors mention that it is amenable to optimization with dynamic branching (early exit) as well, even if they haven't implemented it in the example shaders provided.
 
psurge - yes that's the one, and yes the major disadvantage is memory usage, although that can be handled in a variety of manners (compression, lower-res distance map, etc). And yes, good dynamic branching support would allow one an "early-out" on pixels that converge quickly. This would be a moderate win at most for high-frequency distance maps due to thread coherency problems, but I suspect a huge win for smooth data sets.

In any case the method is fairly robust and has few artifacts (I think NVIDIA mentioned a few things that they ran into using it in the Luna demo)... plus it's super-fast! The only big disadvantage is that it's practically unusable on current-generation ATI hardware due to the dependent texture read limitations. I suspect it would work quite well on the new hardware though, which is why I'd be interested to hear if it was considered.
 
Wow!

=D My post (and the entries in it) lured the ATI demo team into the forums. Thanks so much Chris, Thorsten, and Natasha for your time and information. XD Out of curiosity, does Humus hang out with you guys a lot?

The real reason I posted a reply goes back to my original question about animation production. Is ATI pushing for the use of their hardware solutions in the development of commercial 3D animation? Considering what it was able to render in real time, it seems like an array of current-generation cores from ATI could render television-series-quality animation orders of magnitude faster than is done through multiple CPUs... if the rendering software was able to take advantage of the 3D hardware. Even given the limitations Dio mentioned, the Toy Shop demo really made me wonder at the possibilities. =D Have you guys ever thought of doing a scene from ReBoot as a tech demo? Unless my memory is rather fuzzy, it seems that Toy Shop has considerably surpassed the quality of that show's animation.

:Edit: I noticed you guys were in Boston... ^^; I guess that prevents much chance of daily Humus visits. It also explains the "Rainy night in Boston" picture in the demo presentation slides.
 
OICAspork said:
The real reason I posted a reply goes back to my original question about animation production. Is ATI pushing for the use of their hardware solutions in the development of commercial 3D animation? Considering what it was able to render in real time, it seems like an array of current-generation cores from ATI could render television-series-quality animation orders of magnitude faster than is done through multiple CPUs...
Maybe, but remember that it's not CPU time that costs, it's man hours. Making a scene realtime requires a lot more work than doing an offline render. In fact, the latter is often the first step in the former.

I think their best bet is to continue to do what they're doing, like supporting GPGPU-esque stuff.
 
psurge said:
AndyTX - is that the method using sphere tracing? If so, its major drawback appears to be storage space... (a 3D texture versus a 2D heightfield). Also, IIRC the authors mention that it is amenable to optimization with dynamic branching (early exit) as well, even if they haven't implemented it in the example shaders provided.

Actually, it's not so bad. I implemented this method in the parallax mapping sample (both OpenGL and D3D) that's in the latest SDK. The storage space isn't bad at all. For instance, for a 512x512 bumpmap, I converted it to a 128x128x16 distance map in L8 format, which is just 256KB, thus smaller than the bumpmap itself. The quality is excellent. I used dynamic branching in this demo, taking as many loops as required to get an acceptable error (max 128 loops). The max error can be adjusted in real time with a slider so you can compare quality vs. performance. I tried this sample on a 7800GTX recently, and found that the R520 outperforms it by more than 2x.
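For anyone checking the numbers, the arithmetic is simple enough; a quick sketch (the bumpmap size assumes a typical 4-byte-per-texel format, which is just an assumption here):

Code:
#include <stdio.h>

/* Storage sanity check for the figures above.
 * Assumes 1 byte/texel for L8 and 4 bytes/texel for the bumpmap
 * (the bumpmap format is an assumption, not stated above). */
int main(void)
{
    long dist_map = 128L * 128L * 16L;      /* 128x128x16 in L8          */
    long bumpmap  = 512L * 512L * 4L;       /* 512x512 at 4 bytes/texel  */

    printf("distance map: %ld KB\n", dist_map / 1024);   /* 256 KB   */
    printf("bumpmap     : %ld KB\n", bumpmap  / 1024);   /* 1024 KB  */
    return 0;
}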

I haven't implemented POM myself though (I should probably do that), so I don't know which method is better. POM does have the advantage of solving the self-shadowing problem too, though for other techniques that can also be implemented efficiently with horizon mapping.
 
AndyTX said:
psurge - yes that's the one, and yes the major disadvantage is memory usage, although that can be handled in a variety of manners (compression, lower-res distance map, etc). And yes, good dynamic branching support would allow one an "early-out" on pixels that converge quickly. This would be a moderate win at most for high-frequency distance maps due to thread coherency problems, but I suspect a huge win for smooth data sets.

I tried using ATI1N compression for it, but saw nearly no performance gain. Lowering the resolution, though, improved it a bit. I think lowering the resolution is probably the better option for this technique, as the compressed version had some minor but visible quality issues.
The performance win is of course very dependent on how high your max loop count is otherwise. But even for low values I see a significant performance increase using dynamic branching.
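For scale, here's roughly how the two options compare in footprint (a sketch assuming ATI1N stores single-channel data at 4 bits per texel, and picking 96x96x16 as an arbitrary example of a lower-resolution map):

Code:
#include <stdio.h>

/* Rough footprint comparison of the two options above.
 * Assumes ATI1N packs single-channel data at 4 bits/texel (2:1 vs L8);
 * the 96x96x16 lower-resolution size is just an illustrative choice. */
int main(void)
{
    long full_l8   = 128L * 128L * 16L;     /* 256 KB, uncompressed L8   */
    long ati1n     = full_l8 / 2;           /* 128 KB, 2:1 compression   */
    long lower_res =  96L *  96L * 16L;     /* 144 KB, plain L8          */

    printf("128x128x16 L8    : %ld KB\n", full_l8   / 1024);
    printf("128x128x16 ATI1N : %ld KB\n", ati1n     / 1024);
    printf(" 96x96x16  L8    : %ld KB\n", lower_res / 1024);
    return 0;
}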
 
OICAspork said:
=D My post (and the entries in it) lured the ATI demo team into the forums. Thanks so much Chris, Thorsten, and Natasha for your time and information. XD Out of curiosity, does Humus hang out with you guys a lot?

I'm in Markham and they are in Marlboro. I have met Natasha though (and had a lot of email conversations with her during my internship in 2003, when I worked on RenderMonkey workspaces and she worked on the RenderMonkey application).
 
Mintmaster said:
Maybe, but remember that it's not CPU time that costs, it's man hours. Making a scene realtime requires a lot more work than doing an offline render. In fact, the latter is often the first step in the former.

I'm sorry I'm not making myself clear. What I'm trying to say is this: given the flexibility of current GPU hardware, would it be possible to map existing professional 3D animation software to the GPU rather than the CPU, so that rendering is accelerated by the GPU (at least for any operation that could be accelerated through the GPU pipeline) rather than sent to the CPU, in order to massively increase rendering speed? It doesn't need to be real time, only faster than what could be done by a CPU (or CPUs), with a speed gain big enough to make the software development worth the investment. I hope that is clearer.

Erm... in case it isn't... one more example...

Pixar creates a scene in RenderMan and simultaneously tries to render the scene on two separate machines.

One machine uses a version of RenderMan compiled to render on the CPU(s).

The other uses a version of RenderMan compiled to render on the GPU(s), with assistance from the CPU for any effects that cannot be rendered by the GPU.

Wouldn't the second finish first? I realize that Pixar uses massive, massive render farms with thousands of CPUs, and currently it would be impossible, or at least impractical, to build a similar array of GPUs (what is the current theoretical max for ATI GPUs?), but say a station wanted to create a new CG television series and didn't have the budget for Pixar-sized render farms... couldn't software designed as I suggested drop the cost of producing such a show drastically?

I hope I'm finally clear.
 