AMD's GDC07 presentations are online.
http://ati.amd.com/developer/techpapers.html
Andrew Lauritzen
30-Mar-2007, 16:22
So to summarize, AMD likes noise functions and NVIDIA likes isosurface extraction and skin rendering ;)
Just kidding. Actually those articles were kind of interesting, and you always wonder when companies warn away from using certain API features right before they release a new product.
That said, Cascades is still my favorite presentation of this year's GDC (so far).
and you always wonder when companies warn away from using certain API features right before they release a new product.
What did they warn on?
Jawed
Sound_Card
30-Mar-2007, 17:13
Posted this in the R600 thread. However, I feel it needs to be said here as well. I think "Whiteout" was coded on the Frostbite engine.
Andrew Lauritzen
30-Mar-2007, 17:36
In particular they made a very strong and somewhat out-of-place point about "optional API features", specifically noting that you need to check for MSAA and filtering support for fp32 surfaces... that would seem a bit irrelevant to mention if R600 supported both of these, as G80 does. I really hope they prove me wrong on this though :(
They also warn a bit about the speed of some integer ops, but that might be the case all around (e.g. integer divide *is* expensive usually).
It's also interesting that they mention that you should let the HLSL compiler do the work of pulling out ddx/ddy for texture reads where necessary. That implies that ddx/ddy=>SampleGrad is slower than Sample with implicit derivatives. I guess that could be true due to register pressure, but otherwise it should be pretty similar I think. In any case, I can't see that being particularly significant...
In particular they made a very strong and somewhat out-of-place point about "optional API features", specifically noting that you need to check for MSAA and filtering support for fp32 surfaces... that would seem a bit irrelevant to mention if R600 supported both of these, as G80 does. I really hope they prove me wrong on this though :(
Hmm, that optionality in D3D10 has always been there - I've been expecting R600 to demur since G80's launch. Of course it sucks if you've already coded using optional features of D3D10. It's partly why I asked you why you didn't use int32 in your SAT-VSM, not just because int32 has more precision...
They also warn a bit about the speed of some integer ops, but that might be the case all around (e.g. integer divide *is* expensive usually).
I think the CUDA documentation warns on this.
Jawed
Andrew Lauritzen
30-Mar-2007, 19:23
It's partly why I asked you why you didn't use int32 in your SAT-VSM, not just because int32 has more precision...
Yes of course it has always been optional, but bringing that fact up specifically when the competition is already known to support it seems to confirm that R600 won't, which is disappointing no matter how one looks at it.
int32 makes sense for SAT - I just haven't gotten to it yet :) However it's interesting to note that G80 does *not* support filtering and multisampling on int32, so standard VSMs (which are arguably more useful for most applications) need to use fp32 on G80.
So much for standardizing the API ;)
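To put a rough number on the int32 vs fp32 precision point for SATs, here is a minimal Python sketch. It is an illustration and not the SAT-VSM code discussed above: the 1D table, the 4096-entry size, the constant input of 0.1 and the 16.16 fixed-point scale are all assumptions chosen for the example, and fp32 is emulated on the CPU with `struct`.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def prefix_sums_fp32(values):
    """1D summed-area table accumulated in emulated fp32."""
    total, out = 0.0, []
    for v in values:
        total = f32(total + f32(v))   # every add rounds to fp32
        out.append(total)
    return out

def prefix_sums_int(values, scale):
    """Same table in fixed point: each input is quantised once, sums are exact.

    Python ints never overflow; a real int32 table needs a scale chosen so
    that the largest running sum stays below 2**31.
    """
    total, out = 0, []
    for v in values:
        total += round(v * scale)
        out.append(total)
    return out

def range_sum(table, lo, hi):
    """Sum of elements lo..hi inclusive, by differencing two table entries."""
    return table[hi] - (table[lo - 1] if lo > 0 else 0)

# Hypothetical input: 4096 identical values, scale = 2**16 (assumed 16.16 format).
values = [0.1] * 4096
fp_table = prefix_sums_fp32(values)
fx_table = prefix_sums_int(values, 1 << 16)

last = len(values) - 1
fp_error = abs(range_sum(fp_table, last, last) - f32(0.1))
fx_error = abs(range_sum(fx_table, last, last) - round(0.1 * (1 << 16)))
print(fp_error, fx_error)
```

The fixed-point table returns the quantised input exactly when two entries are differenced, while the fp32 table has already rounded the large running sums, so a single-element lookup at the far end of the table no longer matches the value that was stored.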
Yes of course it has always been optional, but bringing that fact up specifically when the competition is already known to support it seems to confirm that R600 won't, which is disappointing no matter how one looks at it.
Ah well, I've had my suspicions about R600 ever since this optionality in D3D10 became clear.
int32 makes sense for SAT - I just haven't gotten to it yet :) However it's interesting to note that G80 does *not* support filtering and multisampling on int32, so standard VSMs (which are arguably more useful for most applications) need to use fp32 on G80.
So much for standardizing the API ;)
Groan - OK, now I'm confused :oops: does D3D10 mandate int32 filtering or is it optional? Sigh. I really thought it was mandatory.
Jawed
Posted this in the R600 thread. However, I feel it needs to be said here as well. I think "Whiteout" was coded on the Frostbite engine.
ATI doesn't use Frostbite, they have their own pretty cool demo engine that they use for their demos.
So what do you guys think of my Frostbite presentation? It wasn't very detailed but wanted to give more of an overview of how a couple of the key rendering systems work.
Andrew Lauritzen
30-Mar-2007, 20:09
Groan - OK, now I'm confused :oops: does D3D10 mandate int32 filtering or is it optional? Sigh. I really thought it was mandatory.
Yeah what's mandatory and optional is a bit unclear to me... that G80 doesn't support int32 filtering came from a guy that I know at NVIDIA, although I haven't tested it yet myself.
trinibwoy
30-Mar-2007, 20:22
ATI doesn't use Frostbite, they have their own pretty cool demo engine that they use for their demos.
Hmmm some of the shots in your presentation look eerily similar to some scenes in the Ruby demo though.
So what do you guys think of my Frostbite presentation? It wasn't very detailed but wanted to give more of an overview of how a couple of the key rendering systems work.
Looks great, can't wait to see it in action again. Looks like CryEngine just might have some competition :) Any word on when we'll see it on the PC?
ATI doesn't use Frostbite, they have their own pretty cool demo engine that they use for their demos.
So what do you guys think of my Frostbite presentation? It wasn't very detailed but wanted to give more of an overview of how a couple of the key rendering systems work.
Pretty good overview. Are you guys using mostly procedural textures for terrain only, or for objects as well? I take it the textures are pregenerated at load time, or are they made on the fly?
When you say shader permutations, are you guys using something like what 3DMark05 and 06 do, with only one huge shader?
Andrew Lauritzen
30-Mar-2007, 20:34
So what do you guys think of my Frostbite presentation? It wasn't very detailed but wanted to give more of an overview of how a couple of the key rendering systems work.
It was definitely interesting, although of course more detail would have been nice :)
I'm interested in the shader generation system as well. What you describe seems to be a model pretty similar to something like Sh or RapidMind, that support dynamic code generation and a "shader algebra". Are you generating code by "copy-pasting" small segments of HLSL/GLSL/whatever and trying to match variable names, or is it a real JIT (or preprocess) compile - i.e. you have a low-level representation of the program that gets manipulated, and then the final shader is compiled into HLSL?
I've used a surface shader/light shader design for many of my projects, to the point that it's possible to flick a switch and change between deferred lighting or forward lighting. That's probably overkill for most projects, but I'd be interested to know how far Frostbite goes in this direction.
Too bad Bad Company isn't coming for PC (according to the presentation) :(
repi - it made me want to read detailed presentations :). The BFBC shots look amazing BTW. Is there any chance of something a bit more detailed on Frostbite's approach to parallelism (especially since it seems like it targets such a wide range of architectures), and a walk-through of how the preprocessing of shader node graphs/high level states works to produce vertex/pixel shaders, constants, etc...? (I know you are probably way too busy and/or not allowed to say too much, but I figure asking can't hurt :)
Jawed - I think int32 would be more expensive to filter/blend than fp32, so it would be a bit strange to make it mandatory but then say that fp32 is optional.
Sound_Card
30-Mar-2007, 20:56
Too bad Bad Company isn't coming for PC (according to the presentation) :(
Nope, they said it is coming to PC. :wink:
By repi
ATI doesn't use Frostbite, they have their own pretty cool demo engine that they use for their demos.
Ah, thanks for clearing that up. The reason I thought so was that 80% of the slides were talking about Frostbite, then out of nowhere you see the new Ruby demo.
It's also interesting that they mention that you should let the HLSL compiler do the work of pulling out ddx/ddy for texture reads where necessary. That implies that ddx/ddy=>SampleGrad is slower than Sample with implicit derivatives. I guess that could be true due to register pressure, but otherwise it should be pretty similar I think. In any case, I can't see that being particularly significant...
I guess that is because SampleGrad can take different derivatives per pixel, while normal sampling will use the same derivatives for a 2x2 pixel quad. In any case it's better to use a "combined" function than to use smaller functions and let the compiler work out how they fit together.
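The quad vs per-pixel distinction can be sketched on the CPU. This is a rough Python illustration and not how any particular GPU implements it: it assumes coarse derivatives (one finite difference per 2x2 quad, as described above), a scalar texture coordinate, and a simplified isotropic LOD formula. The function names `coarse_derivatives` and `mip_lod`, the coordinate function `u` and the 256-texel texture are all made up for the example.

```python
import math

def coarse_derivatives(u, x, y):
    """Finite-difference ddx/ddy shared by the whole 2x2 quad containing (x, y)."""
    qx, qy = x & ~1, y & ~1              # top-left pixel of the quad
    ddx = u(qx + 1, qy) - u(qx, qy)
    ddy = u(qx, qy + 1) - u(qx, qy)
    return ddx, ddy

def mip_lod(ddx, ddy, tex_size):
    """Simplified isotropic LOD estimate from scalar texcoord derivatives."""
    rho = max(abs(ddx), abs(ddy)) * tex_size
    return math.log2(rho) if rho > 0 else float("-inf")

# Hypothetical texture coordinate that varies non-linearly across the screen.
def u(x, y):
    return (x * x) / 64.0

# Implicit path: all four pixels of the quad at (2..3, 0..1) get one derivative pair.
for px, py in [(2, 0), (3, 0), (2, 1), (3, 1)]:
    ddx, ddy = coarse_derivatives(u, px, py)
    print((px, py), "implicit LOD:", mip_lod(ddx, ddy, 256))

# Explicit path: gradients supplied per pixel (here the analytic du/dx = 2x/64),
# which is the kind of value a SampleGrad-style call can accept.
for px in (2, 3):
    print(px, "explicit LOD:", mip_lod(2 * px / 64.0, 0.0, 256))
```

In the implicit case the four pixels share one LOD, while the explicit gradients differ between x=2 and x=3, so the sampler has to handle a separate LOD per pixel.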
It's also interesting that they mention that you should let the HLSL compiler do the work of pulling out ddx/ddy for texture reads where necessary. That implies that ddx/ddy=>SampleGrad is slower than Sample with implicit derivatives. I guess that could be true due to register pressure, but otherwise it should be pretty similar I think. In any case, I can't see that being particularly significant...
[Speaking strictly about X1K series]
ddx/ddy adds two ALU instructions, so skipping those is a gain right there. And yes, samplegrad is slightly slower than sample. Not nearly as badly as on the G70 and G80 though (which is important when talking about dynamic branching performance since you'll often have to use samplegrad inside branches).
Andrew Lauritzen
07-Apr-2007, 01:24
ddx/ddy adds two ALU instructions, so skipping those is a gain right there.
Sure, but an extremely minor one in my experience... modern GPUs are monsters at math.
And yes, samplegrad is slightly slower than sample. Not nearly as badly as on the G70 and G80 though (which is important when talking about dynamic branching performance since you'll often have to use samplegrad inside branches).
Hmm, interesting. I haven't noticed a terrible hit for using SampleGrad on the G80, but to be honest I haven't profiled. Indeed I agree that it is an important thing to consider. Seems like one of those features that ought to be only a bit slower than implicit derivatives though, particularly for already-dependent texture reads. Then again, what do I know about the hardware ;)