GPUs Will Process Physics, ATI Says
Raja Koduri, a senior architect at ATI Technologies, told X-bit labs that the company’s graphics processors, including the RADEON 9700, which is the world’s first DirectX 9.0 graphics processing unit (GPU), are capable of processing arrays of vertices, a similar type of operation to what AGEIA’s physics processing units (PPUs) can perform. The arrays of data should be processed using pixel shader processors within a chip, the architect said.
More: http://www.xbitlabs.com/news/video/display/20051005202331.html
Goes well with Dave's suggestion (http://www.beyond3d.com/forum/showpost.php?p=586966&postcount=84).
Acert93
06-Oct-2005, 19:44
Yep.
Got a couple threads going on this right now. One in the console forum (http://www.beyond3d.com/forum/showthread.php?t=24328) and a poll/discussion (http://www.beyond3d.com/forum/showthread.php?t=23948) here from last week, with some news pieces from around the industry from yesterday.
Karma Police
06-Oct-2005, 19:44
I still don't think ATi's idea will be as effective as a dedicated PPU.
Dave Baumann
06-Oct-2005, 19:47
< Cough (http://www.beyond3d.com/reviews/ati/r520/index.php?p=08) >
Bottom of the page (he says, wondering if people read these articles) ;)
Karma Police
06-Oct-2005, 19:52
< Cough (http://www.beyond3d.com/reviews/ati/r520/index.php?p=08) >
Bottom of the page (he says, wondering if people read these articles) ;)
Yes, I read that. However, having a dedicated PPU with its own RAM seems to have a lot more "oomph" to it than having part of the GPU doing the physics while the rest of it is working on graphics, too.
How many transistors does Ageia have in their top of the line chip, compared to how much ATi could have working on physics? I don't doubt ATi can do physics, it just seems it's done a lot better w/ Ageia's chips.
Cartoon Corpse
06-Oct-2005, 19:55
you mean at the same time it's doing graphics? i have a 9700pro. is it already doing that now? or can be forced to do that?
are they saying i could use it as an AGP physics chip? even though that's not going to be in future boards? (i heard initial PPU's would be old PCI)
sorry i'll read the article. nice find.
< Cough (http://www.beyond3d.com/reviews/ati/r520/index.php?p=08) >
Bottom of the page (he says, wondering if people read these articles) ;)
I didn't read your article yet, saving the best for last. :oops:
Yes, I read that. However, having a dedicated PPU with its own RAM seems to have a lot more "oomph" to it than having part of the GPU doing the physics while the rest of it is working on graphics, too.
How many transistors does Ageia have in their top of the line chip, compared to how much ATi could have working on physics? I don't doubt ATi can do physics, it just seems it's done a lot better w/ Ageia's chips.
125 million transistors in Ageia's PhysX chip.
DemoCoder
06-Oct-2005, 19:58
I don't think we will see devs using this capability of desktop GPUs.. GPGPU algorithms eat up performance and certain physics features would be very inefficient (collision, creation of new constraints/reaction forces, adding/deleting vertices), plus now you're sharing physics data with video memory and bandwidth. It's better to host this stuff elsewhere. I say slap a CELL processor on a PCB and be done with it.
It's a neat trick, but I don't think it co-exists nicely with graphics, unless you are prepared to buy a second GPU and dedicate it just for physics. But rather than buy a second $400 GPU and waste it on physics, I'd rather have a dedicated PPU that isn't carrying baggage from needing to support graphics ops around.
In other words, trying to repackage a GPU as a PPU via a software layer and sell it as that is, I think, a kludge extraordinaire and just GPU vendors scrambling to catch up to Ageia due to the enormous positive PR response they got.
Cartoon Corpse
06-Oct-2005, 20:13
well i'd prefer to reserve my GPU power for G's and have a separate PPU chip for P's. more more more.
Mariner
06-Oct-2005, 20:23
The question is, would RV530 or other R520 derivatives also be able to support, say AGEIA's physics engine?
IMO it's likely that the take up of the physics engine itself will be slow with only a few games using it initially. The proposed AGEIA card isn't expected to be too cheap either from what I remember and for many games I'm sure it will be sitting there unused. On the other hand, if for example, you could use two RV530 cards and have one process physics in supported games, then you could still get use out of the second card if a game didn't support the physics engine. Does this sound feasible?
I remember Dave mentioning a couple of weeks ago that the most interesting part of R520 probably wouldn't be mentioned on the launch day and the possibility of using one card as a physics co-processor certainly sounds "interesting" to me.
wireframe
06-Oct-2005, 20:34
The question is, would RV530 or other R520 derivatives also be able to support, say AGEIA's physics engine?
I think this is key. It will come down to physics APIs. With the launch of Ageia's PPU I am not so sure they will be willing to trade a possible hardware market for easy acceptance by licensing their API to GPU manufacturers. Then again, maybe they will be happy to have broader support and make money off the software license...but you'd think they would have already talked that one over and not made any noise about a specialized PPU.
I think the ATI way sounds very cool. One moment you have two video cards driving multiple displays and then, when you game, you let your secondary card do the physics. Certainly a nicer hardware function if the GPU can keep up.
Hmmm...I think I just saw something clearly now...and it doesn't look good for Ageia. ATi seems very cozy with Microsoft and I would be willing to bet that the next "DirectX" will have a DirectPhysics component.
Acert93
06-Oct-2005, 20:39
It's a neat trick, but I don't think it co-exists nicely with graphics, unless you are prepared to buy a second GPU and dedicate it just for physics. But rather than buy a second $400 GPU and waste it on physics, I'd rather have a dedicated PPU that isn't carrying baggage from needing to support graphics ops around.
Of course the benefit is that if a game does not use intense physics the 2nd GPU can be used to accelerate graphical performance.
In such situations a PPU is a waste of silicon.
GPUs also offer a nice transition period. Waiting every 6 months for an application to use a PPU would make me question such an investment. GPUs are already a high selling product with excellent market penetration--basically a 100% overlap in game products that would need advanced physics--so the support would be swift if it offered real world speed increases.
And finally, the option of upgrading your GPU and using your old GPU as a dedicated physics card is also an interesting suggestion.
Even if GPUs lag behind in performance they have a significant edge in market positioning and utilization. I have yet to be sold on the idea of a $200 PPU that will sit idle in 95% of games. This is worse than my Audigy 2 ZS, which in hindsight was a wasted investment because so few games use it in a way that stands out compared to integrated audio solutions.
Ageia due to the enormous positive PR response they got.
The question is whether the PR they have received (which IMO was met with a lot of skepticism in the press) will translate into developer support and actual consumer sales.
Ageia is being confronted by both dual-core CPUs and now GPUs in the PC sector, and in general by the consoles. On the PC front both competitors may not perform as well, but they have much broader use and excellent market penetration.
Joe DeFuria
06-Oct-2005, 20:43
It's a neat trick, but I don't think it co-exists nicely with graphics, unless you are prepared to buy a second GPU and dedicate it just for physics....In other words, trying to repackage a GPU as a PPU via a software layer and sell it as that is, I think, a kludge extraordinaire and just GPU vendors scrambling to catch up to Ageia due to the enormous positive PR response they got.
As others have said, the key for me would be the SLI / X-Fire possibilities. I have always been lukewarm on all multi-board solutions from a value standpoint...and also lukewarm on a PPU board at the rumored $200+ price point.
However, having a second GPU that can be dedicated toward either more graphics power OR physics acceleration when supported by the game sounds very good to me. Now, I don't think this will really be relevant for this generation of products given the birthing stage of physics acceleration in general...but perhaps for next gen...
How would an X1800XT do at Folding? How many hours per unit?
How would an X1800XT do at Folding? How many hours per unit?
I reckon it could be in the region of 20x faster than a high end P4...
In theory teams such as B3D and R3D will gradually get one hell of a serious boost...
Jawed
I reckon it could be in the region of 20x faster than a high end P4...
In theory teams such as B3D and R3D will gradually get one hell of a serious boost...
Jawed
Oooh, see now you've got my attention. :lol:
Dave Baumann
06-Oct-2005, 23:38
It's not that much:
http://www.beyond3d.com/news/images/20051006_folding.gif
Aha, I suspect you're in the middle of writing a news item in respect of this juicy trinket.
I'm really surprised by how small the difference is :shock:
Jawed
Oh and it looks like R3D is gonna love this news - I didn't see that piddly pair of GTX bars at first, they're so short :twisted:
Jawed
"I'm going to Best Buy to save humanity, dear." I like it. :lol:
You are forgetting that this is not only physics, but sound or almost anything else (people have even been talking about AI), so I think it would be a great investment compared to a PPU. There are so many more possibilities in GPGPU, which I think will really be worth it.
Skrying
07-Oct-2005, 00:18
So, what in the R520 architecture makes it so great at this type of stuff? I find this all so extremely interesting, it's like a GPU is becoming this weird version of the CPU.... yet better!
mhouston
07-Oct-2005, 00:28
Excellent latency hiding. Basically, the memory controller on the R520 rocks. With enough math, you can hide ALL memory latency and branch penalties.
I'm sure Dave will tell you much more about this, so I won't steal his thunder.
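To put rough numbers on "with enough math, you can hide ALL memory latency", here is a back-of-the-envelope sketch in Python. It is not a model of R520: it assumes a hypothetical shader core that switches to another thread whenever one stalls on a memory fetch, and the cycle counts are invented for illustration. The function names are made up for this example.

```python
def alu_utilization(threads_in_flight, math_cycles, fetch_latency):
    """Fraction of cycles the ALU is busy in a toy multithreading model.

    Each thread alternates between `math_cycles` of arithmetic and one
    memory fetch taking `fetch_latency` cycles. While one thread waits,
    the core runs arithmetic from the other threads in flight.
    All numbers are illustrative assumptions, not R520 measurements.
    """
    busy = threads_in_flight * math_cycles
    period = math_cycles + fetch_latency
    return min(1.0, busy / period)


def threads_to_hide(math_cycles, fetch_latency):
    """Smallest thread count at which the toy ALU never idles."""
    period = math_cycles + fetch_latency
    # Ceiling division without importing math.
    return -(-period // math_cycles)


if __name__ == "__main__":
    for math_cycles in (4, 16, 64):
        n = threads_to_hide(math_cycles, fetch_latency=200)
        print(math_cycles, "math cycles per fetch ->", n, "threads needed")
```

In this toy model, the more arithmetic a shader does per fetch, the fewer threads have to be kept in flight before the fetch latency disappears behind useful work.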
Skrying
07-Oct-2005, 00:31
Excellent latency hiding. Basically, the memory controller on the R520 rocks. With enough math, you can hide ALL memory latency and branch penalties.
I'm sure Dave will tell you much more about this, so I won't steal his thunder.
Thanks! I'll be looking forward to future explanations on this and maybe even possible general uses!
Excellent latency hiding. Basically, the memory controller on the R520 rocks. With enough math, you can hide ALL memory latency and branch penalties.
I'm sure Dave will tell you much more about this, so I won't steal his thunder.
But the thing is, you're the man! We wanna hear it from you!
http://graphics.stanford.edu/~mhouston/
Jawed
Mike certainly knows his stuff, was a pleasure to chat to him in Ibiza. Any early hints on latency for scatters since they're uncached? Same cycles as an uncached read or more penalty for going through the write crossbar?
caboosemoose
07-Oct-2005, 01:27
I may be many miles from having a full technical grip on the issues at stake here, but I remain highly dubious of the idea of using a GPU to perform physics calculations. For starters, conventional image rendering performance will be at a premium for the foreseeable future, and I really don't think sharing available processing power with physics calculations is a goer.
Moreover, in the context of the advent of multi-core CPUs, I am also not crazy about the idea of a dedicated physics chip. For sure such a dedicated chip would be orders of magnitude more efficient (and powerful) for physics processing than a general purpose CPU. But that probably isn't what counts. What I think counts is this: is it really likely that (for instance) a quad-core CPU of two years from now won't have enough poke to deliver the sort of physics that most game developers want to include in their engines? I very much doubt it.
mhouston
07-Oct-2005, 01:40
I don't like to comment on things I haven't tested. Lots of us GPGPU folks are really curious about this one. Since it's uncached, it's probably going to be expensive, 1000s of cycles maybe. The question is, how much faster is it than emulating (which REALLY sucks). Since this is the first GPU to support it, I don't think it's going to be everything we want the first time around, but sure has the potential to solve many sticky problems.
There are many apps we've held off on because we couldn't do them without scatter. Even the gromacs stuff was hard without scatter. The big ones for scatter that we'd love to see are tessellation, marching cubes (both variable output), and stream filtering applications. There is also some amazing work from Aaron Lefohn et al at UC Davis on data structures and adaptive shadow maps (stunning work!) that I think could really benefit from scatter.
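A rough illustration of why emulating scatter hurts, as a Python sketch. It shows one naive strategy only (turning the scatter into a gather), not necessarily what any GPGPU project actually did; other approaches exist. The function names are made up for this example.

```python
def scatter(out_size, addresses, values):
    """Native scatter: each input writes to an address it chooses."""
    out = [0.0] * out_size
    for addr, val in zip(addresses, values):
        out[addr] = val          # a[i] = x
    return out


def scatter_via_gather(out_size, addresses, values):
    """Emulated scatter when a kernel can only write its own output slot.

    Each output element i runs its own "kernel" and reads (gathers) the
    whole address list looking for writes aimed at i. The last match
    wins, matching the sequential order of scatter() above.
    """
    out = []
    for i in range(out_size):                      # one kernel per output
        result = 0.0
        for addr, val in zip(addresses, values):   # read-only search
            if addr == i:
                result = val
        out.append(result)
    return out
```

Both functions produce the same array, but the emulated version performs out_size × len(addresses) reads where the native one performs len(addresses) writes.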
DemoCoder
07-Oct-2005, 01:50
Excellent latency hiding. Basically, the memory controller on the R520 rocks. With enough math, you can hide ALL memory latency and branch penalties.
How much of the gromacs performance is related to branching performance?
see colon
07-Oct-2005, 03:04
what would be truly slick is to use older GPU's for physics work. for example, say you buy a x1800 today, in a year ATi get some support behind their "physics on the GPU" initiative, and their x2k line is out. you could pick up an x2k for graphics and use your x1k for physics.
alternately if you could use, say, an x1300 or something low end for physics (assuming it's faster than doing it in software), it could be a nice, cheap PPU alternative.
mhouston
07-Oct-2005, 03:05
How much of the gromacs performance is related to branching performance?
None. The code was ps2b since the ps30 performance was REALLY bad on the G70. Now that they have a X1800XL to play with (when they can pry it out of my hands ;-) ), we'll see how much more they can get out of it. The main advantage with ps30 is that they can loop over a larger neighbor pool and save overall bandwidth and use the registers as a small cache.
SugarCoat
07-Oct-2005, 04:01
I think this is key. It will come down to physics APIs. With the launch of Ageia's PPU I am not so sure they will be willing to trade a possible hardware market for easy acceptance by licensing their API to GPU manufacturers. Then again, maybe they will be happy to have broader support and make money off the software license...but you'd think they would have already talked that one over and not made any noise about a specialized PPU.
I think the ATI way sounds very cool. One moment you have two video cards driving multiple displays and then, when you game, you let your secondary card do the physics. Certainly a nicer hardware function if the GPU can keep up.
Hmmm...I think I just saw something clearly now...and it doesn't look good for Ageia. ATi seems very cozy with Microsoft and I would be willing to bet that the next "DirectX" will have a DirectPhysics component.
Ageia certainly has its hands full. Let's face it, the first PPU is not going to be overly impressive. Developers will need to get used to it, figure out its tricks, and then the cool stuff happens. I think it's simple folly to assume that you're going to pop a PPU in and be blown away. And at a hefty $300.00 price tag for their high-end model, it's even less appealing in a world where enthusiasts already pay quite a bit for hardware that's proven itself.
And you can be sure Microsoft is hard at work trying to get multiple forms of physics support into DirectX. They openly welcome Ageia, and so did Sony for the PS3, so that's good. But Microsoft will standardize an API in computer gaming if they can. I know them too well as a company out to dominate the interactive entertainment market. I don't think they'll keep the advantages of PPUs only for those with PPUs if they have the option not to. The problem for Ageia in this is that developers will be more likely to adopt something more standard, such as a feature of or in DirectX, than Ageia's own API, which will only benefit users who have their cards. So we may see multiple forms initially supported through DirectX, ones which graphics cards can do or which a user who has a dedicated PPU can do. I'm happy they brought physics to light, I really am. But the idea of burning any more money on yet another piece of hardware simply isn't appealing. Plenty of innovation is left in CPU and GPU design to shut down the need for something like a PPU even before it begins.
That's my 2 cents.
I believe the UE 3 engine is using the Ageia physics but to what level? Is it so you can hard code directly to say an optional physics PPU or is it at the API level so you can use multicore options (maybe both)? With all the developers jumping on the UE 3 engine for games it might give Ageia some legs but I think if it does gain ground it will be more enthusiast at least for say the first year. For mainstream gamers I think $300 is a bit much and the "average gamer" would rather invest that money in other areas of their gaming rigs.
I've heard "rumors" that Havok has been expanding their physics middleware, that it is directed more at the software/API level, and that it includes using dual-core CPUs for PC and multicore CPU/PPU for next gen consoles (in conjunction with their respective GPUs). (Not sure on the dedicated PPU card.) I sure would like to hear more about Havok's latest offerings, since it's been overshadowed by Ageia lately.
Anyone have a little more in-depth information on UE 3's use of Ageia physics and Havok's latest offerings? (Or direct me somewhere.) I have found some information on the boards here but would like to dig a little deeper :)
Hmmm...I think I just saw something clearly now...and it doesn't look good for Ageia. ATi seems very cozy with Microsoft and I would be willing to bet that the next "DirectX" will have a DirectPhysics component.
Microsoft have already advertised for jobs to write DirectPhysics, so it's already being made.
And the job description specifically mentioned GPU acceleration of physics.
I've heard "rumors" that Havok has been expanding their physics middleware, that it is directed more at the software/API level, and that it includes using dual-core CPUs for PC and multicore CPU/PPU for next gen consoles (in conjunction with their respective GPUs). (Not sure on the dedicated PPU card.) I sure would like to hear more about Havok's latest offerings, since it's been overshadowed by Ageia lately.
Havok's Hydra core is the new multi-threaded engine; it's the basis of the PC and console offerings.
I don't like to comment on things I haven't tested. Lots of us GPGPU folks are really curious about this one. Since it's uncached, it's probably going to be expensive, 1000s of cycles maybe. The question is, how much faster is it than emulating (which REALLY sucks). Since this is the first GPU to support it, I don't think it's going to be everything we want the first time around, but sure has the potential to solve many sticky problems.
There are many apps we've held off on because we couldn't do them without scatter. Even the gromacs stuff was hard without scatter. The big ones for scatter that we'd love to see are tessellation, marching cubes (both variable output), and stream filtering applications. There is also some amazing work from Aaron Lefohn et al at UC Davis on data structures and adaptive shadow maps (stunning work!) that I think could really benefit from scatter.
How is the support you are getting from ATI and Nvidia for your research? Are there practical applications we will be seeing in the near future?
I don't like to comment on things I haven't tested. Lots of us GPGPU folks are really curious about this one. Since it's uncached, it's probably going to be expensive, 1000s of cycles maybe. The question is, how much faster is it than emulating (which REALLY sucks). Since this is the first GPU to support it, I don't think it's going to be everything we want the first time around, but sure has the potential to solve many sticky problems.
There are many apps we've held off on because we couldn't do them without scatter. Even the gromacs stuff was hard without scatter. The big ones for scatter that we'd love to see are tessellation, marching cubes (both variable output), and stream filtering applications. There is also some amazing work from Aaron Lefohn et al at UC Davis on data structures and adaptive shadow maps (stunning work!) that I think could really benefit from scatter.
And it's pretty obvious why scatter is (and is likely to remain for a while) uncached in current GPU architectures, even more on ATI where the framebuffer caches always store disjoint memory regions.
Someone asked a few days ago whether our simulator would be released because he was interested in GPGPU research, and it's pretty obvious that it's a large open field with a lot of people interested. If ATI releases a non-graphics stream API (are they going to support Brook?) for their hardware it's going to be quite interesting for all those people (and it would be interesting to me as well; I find OpenGL quite limiting if I ever try to go beyond current GPUs and games :wink: ). Even more so if new hardware includes new and more general-purpose functionality. The simulator is pretty dead for GPGPU right now, with no floating-point textures, buffers or render-to-texture implemented yet (not that they are really a problem or very complex to implement; it's just that, while trying to make real but semi-legacy games like Doom3 work, I don't have time to add functionality that I can't test and that the OpenGL framework will take weeks to support).
In the end, scatter just adds to the already very large list of features to implement in the simulator in the future ... Even then, if there were an API I could play with and use as a reference, and applications to test it, I doubt it would take that long to implement uncached scatter in the simulator (a cached scatter with a coherency protocol would be really fun though ... if I ever find a few months without anything to do :lol:).
What I think counts is this: is it really likely that (for instance) a quad-core CPU of two years from now won't have enough poke to deliver the sort of physics that most game developers want to include in their engines? I very much doubt it.
I sort of agree and disagree. Deano's already confirmed that DirectPhysics is on the way, opening it up for supported hardware acceleration by discrete ASICs like graphics is now. I can foresee a CPU accelerated DP implementation by the runtime, but given that you still want to throw a highly parallel stream processor at a bunch of physics (and GPGPU) problems, CPUs (even if there's four or more cores) still will suck (relatively) at that in two years time.
I think the PhysX chip will do well enough to kickstart that little market and I can see how GPGPU folks might like a more generalised FP processor (that's not a GPU) to play with as well.
wireframe
07-Oct-2005, 11:17
Microsoft have already advertised for jobs to write DirectPhysics, so it's already being made.
And the job description specifically mentioned GPU acceleration of physics.
Ahh, well, that means my brief moment of clarity was not completely mad then. Thanks for the confirmation.
How is this going to work out for Ageia? Are they a member of DirectX or is MS looking to kill these guys before they even get started?
And it's pretty obvious why scatter is (and is likely to remain for a while) uncached in current GPU architectures, even more on ATI where the framebuffer caches always store disjoint memory regions.
If you use an LRU replacement strategy you could try inserting them lower in the list (perhaps have a few steps developers can select).
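The idea of inserting scattered lines lower in an LRU list can be sketched in a few lines of Python. This is a toy software model, not a description of any real GPU cache; `LruCache` and the `insert_depth` parameter are hypothetical names invented for this example.

```python
class LruCache:
    """Toy LRU cache: index 0 is most recently used, the tail is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = []          # cached addresses, MRU first

    def access(self, addr, insert_depth=0):
        """Touch `addr`; return True on a hit.

        On a miss the new line is inserted `insert_depth` places below the
        MRU position (0 = classic LRU). `insert_depth` is a hypothetical
        knob standing in for the "few steps developers can select".
        """
        if addr in self.lines:
            self.lines.remove(addr)
            self.lines.insert(0, addr)   # a reused line is promoted to MRU
            return True
        if len(self.lines) >= self.capacity:
            self.lines.pop()             # evict the LRU line
        self.lines.insert(min(insert_depth, len(self.lines)), addr)
        return False
```

With a four-line cache already holding D, C, B, A, two scattered writes inserted at depth 3 only ever displace the bottom slot, so D, C and B survive. With classic insertion at depth 0 the same two writes push out both A and B.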
Dave Baumann
07-Oct-2005, 11:55
How is this going to work out for Ageia? Are they a member of DirectX or is MS looking to kill these guys before they even get started?
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Karma Police
07-Oct-2005, 11:57
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Is nVidia making physics-enabled GPU's, too?
The DX10 pipeline, with its requirement for "MEMEXPORT" pretty much means that by hook or by crook NVidia DX10 GPUs will support a physics API that utilises GPUs.
Jawed
Dave Baumann
07-Oct-2005, 13:06
Last I heard, MEMEXPORT was not a requirement for DX10, but something ATI were hoping to get in.
can anyone enlighten me how scatter is handled by R520?
Joe DeFuria
07-Oct-2005, 14:06
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Yeah, I think Ageia's best "hope" is ultimately to have their software bought out by Microsoft (for eventual integration with DirectX), and ultimately get out of the hardware business.
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Yeah, seems to me we were predicting this around here not so long ago. Tho possibly not this quickly.
Luminescent
07-Oct-2005, 14:18
Somebody care to explain what scatter is?
For the GPU, an arbitrary number of writeable outputs into card memory from a shader program/interface. The shader program/interface can select a memory location and write into it with whatever it likes. Gather is the reverse.
mhouston
07-Oct-2005, 16:20
Somebody care to explain what scatter is?
a[i] = x;
Basically, an indirect write in which you specify the address to output to.
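For anyone who wants the two operations side by side, a minimal Python sketch (plain lists standing in for card memory; the function names are just for this example):

```python
def gather(src, index):
    """x = a[i]: each output reads from an address chosen at run time."""
    return [src[i] for i in index]


def scatter(dst_size, index, values, fill=None):
    """a[i] = x: each input writes to an address chosen at run time."""
    dst = [fill] * dst_size
    for i, x in zip(index, values):
        dst[i] = x
    return dst


if __name__ == "__main__":
    index = [2, 0, 1]
    scattered = scatter(3, index, ["a", "b", "c"])
    print(scattered)                  # values land where index says
    print(gather(scattered, index))   # the same index reads them back
```

When the index list is a permutation, gathering with it undoes the scatter. When two inputs name the same address, the result depends on write order, which is part of what makes scatter harder for hardware than gather.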
mhouston
07-Oct-2005, 16:25
Is nVidia making physics-enabled GPU's, too?
We use both ATI and Nvidia hardware for GPGPU. For example, Brook supports both as does Sh. Basically anything that is DX9+ can do at least some amount of GPGPU. Heck, we started all of this on ATI 9700's (a ray tracer shown at SIGGRAPH) and NV30's. However, it's really been the current boards, NV4X+ and R4XX/R5XX that are fast enough to make things interesting and worth the current difficulty running on the GPU.
See gpgpu.org for LOTS more info
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Not to mention XGI and S3... :lol:
wireframe
07-Oct-2005, 16:54
We use both ATI and Nvidia hardware for GPGPU. For example, Brook supports both as does Sh. Basically anything that is DX9+ can do at least some amount of GPGPU. Heck, we started all of this on ATI 9700's (a ray tracer shown at SIGGRAPH) and NV30's. However, it's really been the current boards, NV4X+ and R4XX/R5XX that are fast enough to make things interesting and worth the current difficulty running on the GPU.
See gpgpu.org for LOTS more info
This is all interesting stuff. What would be really cool is a small demo app that can be downloaded to give a general idea about speed differences. Something extremely tuned for SSE/SSE2/SSE3 (so as not to give an inflated perception of GPGPU benefits) and whatever architecture and then GPGPU processing. Just something to give one an idea of the difference in capability at a certain task. Is any such app available or planned?
mhouston
07-Oct-2005, 16:56
This is all interesting stuff. What would be really cool is a small demo app that can be downloaded to give a general idea about speed differences. Something extremely tuned for SSE/SSE2/SSE3 (so as not to give an inflated perception of GPGPU benefits) and whatever architecture and then GPGPU processing. Just something to give one an idea of the difference in capability at a certain task. Is any such app available or planned?
Actually, there are several matrix multiply routines available for the GPU. One is included in GPUBench. It's easy to compare this against MKL/ATLAS.
Once again, there's much more info at gpgpu.org including sample apps and papers describing full applications and comparing against CPU performance.
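For a comparison like that, the usual yardstick is achieved floating-point rate: an n x n matrix multiply costs about 2n^3 floating-point operations, so GFLOP/s follows from the wall-clock time. The Python sketch below only shows the bookkeeping. It does not call GPUBench, MKL or ATLAS; `naive_matmul` is a stand-in for whichever routine is being timed, and all function names are invented for this example.

```python
import time


def matmul_flops(n):
    """Floating-point operations in an n x n multiply (n^3 multiplies + n^3 adds)."""
    return 2 * n ** 3


def gflops(n, seconds):
    """Achieved rate in GFLOP/s for one n x n multiply that took `seconds`."""
    return matmul_flops(n) / seconds / 1e9


def naive_matmul(a, b):
    """Reference triple loop; a stand-in for the routine under test."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    n = 64
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    print(f"{n}x{n}: {gflops(n, elapsed):.4f} GFLOP/s")
```

Timing the CPU library and the GPU routine on the same matrix size and feeding both times through `gflops` gives numbers that can be compared directly.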
wireframe
07-Oct-2005, 17:08
Actually, there are several matrix multiply routines available for the GPU. One is included in GPUBench. It's easy to compare this against MKL/ATLAS.
Once again, there's much more info at gpgpu.org including sample apps and papers describing full applications and comparing against CPU performance.
I'm looking at GPGPU.org and I am only seeing a few Nvidia demos. I have also read some papers regarding performance. What I meant is whether there is some specific demo app planned (or available that went under the radar) for those who want to see this from a non-technical perspective. I am sure you are familiar with benchmarking applications like Sandra. It has a multi-media module, for example, that computes a Mandelbrot. Something like this that anyone can download, run on their CPU and then run on their GPU to see the difference, gawk, and go "whoa!!"
I am not seeing anything of this nature on GPGPU.org.
Joe DeFuria
07-Oct-2005, 18:37
I think the first "killer app" / benchmark you're looking for (for the non techies) will be Gromacs / folding. Especially if they rework the code to take advantage of X1800's new capabilities (and it makes even more of a difference).
Get folding@home to release a GPU version....
http://www.beyond3d.com/forum/showpost.php?p=589213&postcount=17
For some reason, Dave didn't quote the source of this:
http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf
Jawed
Acert93
07-Oct-2005, 18:45
Yeah, I think Ageia's best "hope" is ultimately to have their software bought out by Microsoft (for eventual integration with DirectX), and ultimately get out of the hardware business.
I wonder how Havok would feel about this?
So... we are getting a physics API... it will be interesting to know if it is more general, a software layer meant to be accelerated by GPUs, Dual Core CPUs, PPUs, etc... or a more feature-specific API like Havok/Novodex?
I always thought a physics API was a good idea... now let's see if they can make a basic AI one. ;)
mhouston
07-Oct-2005, 19:26
http://www.beyond3d.com/forum/showpost.php?p=589213&postcount=17
For some reason, Dave didn't quote the source of this:
http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf
Jawed
Dave was being nice and protecting me, which is much appreciated. However, since others leaked the full slide set, I just went ahead and put them up out in the open.
Chalnoth
07-Oct-2005, 19:28
Excellent latency hiding. Basically, the memory controller on the R520 rocks.
Er, latency hiding has absolutely nothing to do with the memory controller. The structure of the memory controller is what sets the latency. It has nothing to do with hiding it.
I'm pretty well convinced that the structure of the R520's memory controller serves a dual-purpose:
1. It allows for higher clockspeeds at the cost of a significant number of transistors.
2. It allows for easy communication between the vertex units and the texture units.
The latency hiding all comes down to the decoupling of the texture units and the pixel units. This has nothing to do with the memory controller, but rather the pipeline structure. This is the primary difference between the X1x00 parts and the Xx00 parts, and therefore responsible for the majority of the efficiency difference.
Dave Baumann
07-Oct-2005, 20:49
Chalnoth, I'd say there has been quite a bit of work put into the logic of the memory controller alone (well, the die shots bear evidence to it) in order to maximise accesses. There's also the fact that there is an increase in efficiency by virtue of the fact that there are more blocks and more (smaller) channels.
In actuality the structure of the pixel and texture pipelines is not different between R520 and R300-R420, they have just been represented differently - they were always decoupled (and I believe if you look at some of Mike's tests for X850 you can probably derive that). The primary changes are the instruction scheduler to increase utilisation and the large memory changes.
Point, Demirug. Someone flag me when he's wrong someday.
Chalnoth, I'd say there has been quite a bit of work put into the logic of the memory controller alone (well, the die shots bear evidence to it) in order to maximise accesses. There's also the fact that there is an increase in efficiency by virtue of the fact that there are more blocks and more (smaller) channels.
Totally, but that's still not hiding latency. It's reducing latency, which is arguably better.
The primary changes are the instruction scheduler to increase utilisation and the large memory changes.
I'm still curious to know if the R520 scheduler can proactively schedule texture operations even when a batch in context (actually executing in a shader quad) doesn't require a texture operation.
In other words, are the texture pipes kept busy - or are they simply responding to the demands of batches as the texture operations arise?
It seems to me that R520 can't proactively schedule texture operations. I've always presumed that Xenos can and used that model as the basis for R520. But the fairly tight binding of the texture pipes with the shader pipes seems to imply not :cry:
Jawed
Dave Baumann
07-Oct-2005, 22:11
In Xenos each of the units has arbiters to determine if work can be carried out on those units - each shader array has two arbiters (as there are two interleaved threads) while the texture pipelines both have their own arbiters. The texture units are just seen as another resource and threads are passed down and work on them - moreover, if the unit doesn't have any work (it's finished a thread or waiting on dependent data) then another thread is passed to it, so essentially it tries to keep all the units running in parallel - I'd imagine that R520 is probably trying to do something similar (although I know more about Xenos's scheduling than I do about R520's, curiously).
With Jeremy Sugerman saying this:
Hooray for ATI. We have an X1800 / R5-something or another to play with now that it's officially launched. Mike has some impressive looking slides (http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf). At the same time the gpubench results (http://graphics.stanford.edu/projects/gpubench/results/X1800XL-5340/) don't look all that much more impressive than the 7800 GTX (http://graphics.stanford.edu/projects/gpubench/results/7800GTX-7772/). The thing is, that really seems to underscore shortcomings in gpubench's methodology. Side by side, the 6800 and 7800 look at least as good, if not better, than their X800 and X1800 counterparts. In testing though, both ClawHMMer and our ray tracing stuff came very firmly down in favour of the ATI cards and Mike's GROMACS data is the same way. It seems like we need to augment or adjust gpubench so that we could get a better sense of the factors that have been key for us.
http://graphics.stanford.edu/~yoel/notes/
I guess we'll eventually get some answers on the texture-scheduling question, as GPUBench is tweaked.
Jawed
I wonder how Havok would feel about this?
So... we are getting a physics API... it will be interesting to know if it is more general, a software layer meant to be accelerated by GPUs, Dual Core CPUs, PPUs, etc... or a more feature-specific API like Havok/Novodex?
I always thought a physics API was a good idea... now let's see if they can make a basic AI one. ;)
Amen Acert93. The only problem I've been running into lately with my A.I. work is that physics is playing a bigger role in conjunction with A.I. than I previously thought it would. I have to broaden my outlook but keep myself in check, not forgetting the need to allow enough room for other resources as well. (can never have enough resources :) ) Multicore CPUs are definitely giving me more freedom. But I want more, always more :)
mhouston
08-Oct-2005, 17:31
Er, latency hiding has absolutely nothing to do with the memory controller. The structure of the memory controller is what sets the latency. It has nothing to do with hiding it.
I'm pretty well convinced that the structure of the R520's memory controller serves a dual-purpose:
1. It allows for higher clockspeeds at the cost of a significant number of transistors.
2. It allows for easy communication between the vertex units and the texture units.
The latency hiding all comes down to the decoupling of the texture units and the pixel units. This has nothing to do with the memory controller, but rather the pipeline structure. This is the primary difference between the X1x00 parts and the Xx00 parts, and therefore responsible for the majority of the efficiency difference.
You are right that much of the latency hiding comes from decoupled texture/ALU/branch. However, the decoupled texture/ALU was already available on the R4XX chips. What really differentiates the R5XX is the memory controller. It has the ability to have MANY more references in flight and better track which clients are nearing starvation. A good example of the improved latency hiding/tolerance allowed by the controller is the random read performance of the new board. Also, the new memory controller vastly improves the cache hit rates which further helps latency.
Actually, this new chip is very similar conceptually to how the Tera MTA memory subsystem was designed to hide latency.
In the end, this is mainly an argument about semantics, but I'll agree with your statement above.
Thinking about it, what are those obvious reasons not to use the cache for memexport? I don't see them.
Dedicated hardware for write combining doesn't make much sense given how little it will be used in the near future ... but with the cache being fully associative the necessary hardware needed to let you subvert the cache for that purpose seems to me minimal. A bitmask to indicate validity for each byte in a cacheline I assume it already has, and giving the memexport writes lower priority in cache replacement so they don't trash the cache should be trivial too.
Hell, providing the extra path for uncached writes almost seems more work.
mhouston
08-Oct-2005, 18:36
Thinking about it, what are those obvious reasons not to use the cache for memexport? I don't see them.
Dedicated hardware for write combining doesn't make much sense given how little it will be used in the near future ... but with the cache being fully associative the necessary hardware needed to let you subvert the cache for that purpose seems to me minimal. A bitmask to indicate validity for each byte in a cacheline I assume it already has, and giving the memexport writes lower priority in cache replacement so they don't trash the cache should be trivial too.
Hell, providing the extra path for uncached writes almost seems more work.
Actually, to do cached writes, you have to deal with cache coherency which is non-trivial. Trust me on that one or read H&P edition 3 about MESI/MOESI/DRAGON protocols. You also need to deal with invalidating pending requests, and, if you reorder (which can and does happen in a threaded chip) you now have to proactively invalidate and restart threads which got a read you just invalidated. You can look at the Singh book on parallel architecture for more information here. In short, it's much easier to do uncached writes if you can't predict the addresses (like you can with framebuffer writes).
I might not be doing a great job explaining this. In some ways, we would have preferred cached writes since many of our scatters will be pretty coherent. But, at least we finally have a chip that can do it.
There might be multiple color buffer caches on the chip, but I assume each buffer location can only be present in a single one, so I dont see how cache coherency is an issue. Writes can already invalidate pending reads and threads even without caching, so consistency doesn't seem an issue either (in fact I would assume weak consistency is the best you are going to get even now).
CELL 'caches' (in fact just local SRAM) don't implement coherency, but I don't know if CELL implements any technique to protect memory regions when using DMA.
Offering something as dangerous as incoherent cached random writes would require big WARNING banners all around the API description to dissuade most programmers from using such a feature. Or ATI's bug department would become overwhelmed with so many 'why doesn't this work' complaints :lol:
I don't know yet how limited or unlimited MEMEXPORT is. Is it writing to a bounded buffer like a texture or a linear array? Or can it write to anywhere in the GPU address space? I guess it's the first one.
Lets try that again instead of editing my post as I go along.
There might be multiple color buffer caches on the chip, but I assume each buffer location can only be present in a single one, so cache coherency is not an issue. Writes can already invalidate pending reads and threads even without caching, so consistency doesn't seem an issue either.
Each color cache is associated with a single quad pipeline. And each quad pipeline is associated with its own disjoint framebuffer regions (precisely to remove the requirement for a coherency protocol). There are no buses interconnecting those caches. When using random writes (for example if the fragment position could be modified in the shader) into a buffer the problem is that the position accessed could be in a region owned by a different quad pipeline. If two quad pipelines write, through their caches, to the same region, one of the changed cache lines will disappear from existence when both cache lines are evicted to memory. The quad pipeline that is the last to evict the cache line will remove any evidence that any other quad pipeline ever wrote in that region of memory. In a CPU multiprocessor, even with a debugger, a problem derived from that would be very hard to debug. In a GPU it would be practically impossible.
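The lost write described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `WriteBackCache`, `LINE_SIZE` and `lost_update_demo` are hypothetical names, each cache holds a private copy of a whole line, there is no coherency protocol and no per-element write mask, and nothing here reflects the actual R520 or Xenos hardware.

```python
LINE_SIZE = 4  # elements per cache line (illustrative value, not a hardware figure)


class WriteBackCache:
    """Toy write-back cache: no coherency protocol, no per-element write mask."""

    def __init__(self, memory):
        self.memory = memory  # shared backing store (a plain list)
        self.lines = {}       # line index -> private copy of the whole line

    def write(self, addr, value):
        line = addr // LINE_SIZE
        if line not in self.lines:
            start = line * LINE_SIZE
            # Fetch a private copy of the line as memory looks right now.
            self.lines[line] = self.memory[start:start + LINE_SIZE]
        self.lines[line][addr % LINE_SIZE] = value

    def evict_all(self):
        for line, data in self.lines.items():
            start = line * LINE_SIZE
            # The whole line is written back, including untouched elements.
            self.memory[start:start + LINE_SIZE] = data
        self.lines.clear()


def lost_update_demo():
    memory = [0] * LINE_SIZE
    quad_a = WriteBackCache(memory)
    quad_b = WriteBackCache(memory)
    quad_a.write(0, 11)  # pipeline A touches element 0
    quad_b.write(1, 22)  # pipeline B touches element 1 of the same line
    quad_a.evict_all()
    quad_b.evict_all()   # B evicts last and overwrites A's element with a stale 0
    return memory
```

Calling `lost_update_demo()` leaves element 0 at its old value: the write from the first pipeline is gone, and nothing in memory shows that it ever happened.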
The invalidations you talk about are precisely the coherency protocol and it isn't as easy as it may look at first glance.
There are no buses interconnecting those caches.
Ya, forgot about that for a moment. You can route the data through the memory subsystem to the other cache without actually going through memory though, you don't need any extra buses.
The invalidations you talk about are precisely the coherency protocol and it isn't as easy as it may look at first glance.
Threads (potentially reordered) can be working on data from, or have reads pending for, locations which are being written to even if you don't cache anything at all.
Ya, forgot about that for a moment. You can route the data through the memory subsystem to the other cache without actually going through memory though, you don't need any extra buses.
Threads (potentially reordered) can be working on data from, or have reads pending for, locations which are being written to even if you don't cache anything at all.
But you have to change the memory controller effectively implementing a coherence protocol.
The other problem you mention is slightly different and is related to access atomicity and the expected execution order. I doubt they implement access atomicity nor a fixed execution order for fragments and instructions. Reading in the same pass (primitive batch) a buffer being written with MEMEXPORT is likely to be forbidden just as GPUs don't support reading the current framebuffer from the shader (something that IHVs have already stated that they don't want to support even if developers would love it).
The problem with caches is that cache lines can survive for thousands of cycles and across primitive batches (if an explicit flush isn't implemented). If the cache doesn't implement a per line element write mask even elements accessed by a single pipeline are affected as the whole cache line is written back to memory, even elements not touched by the pipeline evicting the line.
But you have to change the memory controller effectively implementing a coherency protocol.
No, you simply write to the other cache instead of to memory ... without copies in multiple caches you don't need cache coherency.
If the cache doesn't implement a per line element write mask even elements accessed by a single pipeline are affected as the whole cache line is written back to memory, even elements not touched by the pipeline evicting the line.
They already need hardware in place to deal with partial writes to cachelines anyway, and I assume they already do it with masks.
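The write-mask argument can be sketched the same way. This is a toy model, not a description of any real chip: `MaskedWriteCache` and `masked_demo` are hypothetical names, and the "mask" is modelled by storing only the elements a pipeline actually wrote, so that eviction touches nothing else.

```python
LINE_SIZE = 4  # elements per cache line (illustrative value)


class MaskedWriteCache:
    """Toy write cache that remembers which elements of a line are dirty."""

    def __init__(self, memory):
        self.memory = memory  # shared backing store (a plain list)
        self.lines = {}       # line index -> {offset: value}, dirty elements only

    def write(self, addr, value):
        dirty = self.lines.setdefault(addr // LINE_SIZE, {})
        dirty[addr % LINE_SIZE] = value

    def evict_all(self):
        for line, dirty in self.lines.items():
            for offset, value in dirty.items():
                # Only masked (dirty) elements are written back.
                self.memory[line * LINE_SIZE + offset] = value
        self.lines.clear()


def masked_demo():
    memory = [0] * LINE_SIZE
    quad_a = MaskedWriteCache(memory)
    quad_b = MaskedWriteCache(memory)
    quad_a.write(0, 11)
    quad_b.write(1, 22)
    quad_a.evict_all()
    quad_b.evict_all()
    return memory
```

With masks, writes to different elements of the same line both survive. Two pipelines writing the same element still end up with whichever eviction happens last, so the ordering question raised in this thread is not solved by masks alone.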
Well, in one way it becomes a boon for them because they will have a stable, widespread, well documented API for developers, probably with very good tools for their hardware. The flipside of that is all of a sudden they become competitors to ATI and NVIDIA.
Just saw this: http://www.overclockers.cl/news/ageia/ageia_answer.htm including:
1.- ABU_METAL: Would you consider in the future integrating your PPUs on video cards to minimize the investment needed?
Manju Hegde: There has been discussion around integration, but at this time we are offering the PPU as a stand alone add in card.
Wonder who the conversations were with?
Simon F
10-Oct-2005, 13:39
a[i] = x;
Basically, an indirect write in which you specify the address to output to.
How do you guarantee temporal dependencies? It's not too difficult in the GPU if the write is restricted to a particular location for "a thread", but if those are completely arbitrary...:shock:
Switching render targets probably forces a flush, so that should work :)
hcpizzi
10-Oct-2005, 18:19
Mike, I've read your presentation.
How is scattering exposed in the PSs?
Is it scattering to the frame buffer or to textures?
If so, can you use a texture as a render target and read from it and coherence is assured so you can accumulate results, for example?
Very interesting read, by the way.
mhouston
10-Oct-2005, 18:28
Mike, I've read your presentation.
How is scattering exposed in the PSs?
Is it scattering to the frame buffer or to textures?
If so, can you use a texture as a render target and read from it and coherence is assured so you can accumulate results, for example?
Very interesting read, by the way.
It's not exposed currently. I don't know of any public DX shader model that supports this functionality. Under GL, this can just be made an extension, and I hope that's one way we see it. Under DX, I don't know how this could be exposed until a new shader model is released. Maybe something ultra hacky like a gather from a certain sampler really becomes a scatter? (yuck).
Scattering is to textures, but in the current world, framebuffers are textures. (Yes this is a simplistic generalization, but FBOs and RTT make things appear this way).
Read-modify-write, i.e outputting to a target and reading from it in the same shader pass, will not work. So, accumulation as you are suggesting doesn't really work. You'd have to take a multipass approach and use scatter/fbuffer to get this to work. The X1000 series scatter is the most useful for us currently as a way to (finally) build data structures, handle register spilling, and more than 16 float outputs (current MRT limit is 4 float4's).
Many of us in the GPGPU community are waiting, somewhat impatiently ;-), for scatter to be exposed.
Demigod
10-Oct-2005, 18:40
Don't know if this has been posted, but has anyone seen this?
http://www.nvnews.net/vbulletin/showpost.php?p=714242&postcount=10
HDR with tone mapping, FP16 blending and filtering, Bump mapping
Physics running purely on the GPU
hcpizzi
10-Oct-2005, 18:46
Thank you very much.
I thought it was already exposed in some beta driver. Well, at least the feature is there, so it's just waiting.
About exposing it in DX, it would need a new shader revision, because as you say, exposing it using the current shader language can be scary. That's the main reason that I prefer the extension-driven approach of OGL to the immutable DX.
mhouston
10-Oct-2005, 18:46
Don't know if this has been posted, but has anyone seen this?
http://www.nvnews.net/vbulletin/showpost.php?p=714242&postcount=10
HDR with tone mapping, FP16 blending and filtering, Bump mapping
Physics running purely on the GPU
That sure is a pretty demo. Most of my stuff never draws anything and just spits out the results. ;-)
DemoCoder
10-Oct-2005, 18:55
Of course a particle system is vastly different than a full physics system since many don't model collisions, and therefore, the streaming approach of integrating forces works very well.
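The "streaming approach of integrating forces" can be sketched as one independent update per particle. This is a minimal sketch, assuming constant gravity as the only force and a semi-implicit Euler step; `step_particles` and its parameters are hypothetical names, not taken from the demo being discussed.

```python
def step_particles(positions, velocities, gravity=-9.8, dt=1.0 / 60.0):
    """Advance every particle by one time step with semi-implicit Euler.

    Each output depends only on the same particle's input, which is why this
    maps cleanly onto a one-in, one-out streaming pass. No particle ever looks
    at another one, so there is no collision handling at all.
    """
    new_positions = []
    new_velocities = []
    for (x, y, z), (vx, vy, vz) in zip(positions, velocities):
        vy_next = vy + gravity * dt  # integrate the force into velocity first
        new_velocities.append((vx, vy_next, vz))
        new_positions.append((x + vx * dt, y + vy_next * dt, z + vz * dt))
    return new_positions, new_velocities
```

A full physics system breaks this pattern as soon as one body's update depends on another body's state, for example when a contact has to be detected and resolved.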
mhouston
10-Oct-2005, 19:02
Of course a particle system is vastly different than a full physics system since many don't model collisions, and therefore, the streaming approach of integrating forces works very well.
But it's still pretty. It's difficult to make HMMer look pretty, or GROMACS. Our GPU raytracer (Foley and Sugerman) looks good, but its purpose IS to draw something. ;-)
DemoCoder
10-Oct-2005, 19:09
Yeah, but most 3D demos will never have the mathematical elegance of HMMer :) Beauty is in the eye of the beholder. :)
DemoCoder
10-Oct-2005, 19:14
Switching render targets probably forces a flush, so that should work :)
Yah, but what if N pipelines are reading and writing the same position? There's no serialization primitive in shaders. Seems to me that a scatter write must be independent (can't be read or written by any other fragment thread during a draw call) In theory, since the actual tiling/drawing pattern of GPUs is known, you might be able to get away with it in some scenarios, but it seems the more complex the scheduler, the less guarantees you have on sequential ordering.
This to me says that a scatter write in a GPU should not be a true scatter, but more or less a permutation of a stream. You are allowed to write to arbitrary locations, but that location may only be processed once.
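The "permutation of a stream" restriction can be sketched as a scatter that refuses to write any destination twice. This is a CPU-side illustration only; `scatter_once` is a hypothetical helper, and real hardware would have no such check.

```python
def scatter_once(values, indices, size):
    """Write values[i] to out[indices[i]], rejecting repeated destinations.

    Because every destination is written at most once, the result does not
    depend on the order in which the writes are issued.
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")
    out = [None] * size
    written = set()
    for value, index in zip(values, indices):
        if not 0 <= index < size:
            raise IndexError(f"destination {index} is out of range")
        if index in written:
            raise ValueError(f"destination {index} written more than once")
        written.add(index)
        out[index] = value
    return out
```

For example, `scatter_once([10, 20, 30], [2, 0, 1], 3)` reorders the stream, while passing `[0, 0, 1]` as the indices raises an error instead of silently depending on write order.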
Simon F
11-Oct-2005, 11:02
Yah, but what if N pipelines are reading and writing the same position? There's no serialization primitive in shaders. Seems to me that a scatter write must be independent (can't be read or written by any other fragment thread during a draw call)
Thanks Demo. That is my concern...
In theory, since the actual tiling/drawing pattern of GPUs is known, you might be able to get away with it in some scenarios, but it seems the more complex the scheduler, the less guarantees you have on sequential ordering.
Unless you know exactly how the HW functions - but I don't think that is likely - I don't think you can make those assumptions. What if the system is clever enough to swap threads based on the activity of the memory bus? A refresh of the screen might be enough to change the timings of the various writes.
This to me says that a scatter write in a GPU should not be a true scatter, but more or less a permutation of a stream. You are allowed to write to arbitrary locations, but that location may only be processed once.
I'd agree with that but it sounds just as tricky to program as simply writing to the "fixed location" destination.
If you just specify the write order as undefined and use buffered writes, reordering adjacent writes to optimize for memory paging, the performance hit shouldn't be that huge. Or am I missing something?
Slightly OT though but if this scatter stuff works like I think/hope it does, then a good application would be blurring by recursive filtering (IIR).
The idea is to iterate all pixels in a row from a single pixel-shader, this way the recursive state can be kept in a register during the execution. Like this pseudocode:
t = 0; // filter state
c = 0.1; // filter coefficient
for (x = 0; x < w; x++)
{
    t = (1 - c) * t + fb[x] * c; // read and filter, not necessarily from the framebuffer though
    fb[x] = t; // write back
}
Doing this 4 times (horizontal and vertical, both directions) could be quite an efficient blur filter. Note that it should only be run once per row/column as each 'pixel' writes the entire row/column. Varying the coefficient by the z-value could be a nice way to hack DOF. Only going in one direction will give a 'wind' filter instead.
Is there anything that would prevent this kind of usage? It does support an arbitrary number of writes, right? And each pixel is only written once.
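The pseudocode above can be tried on the CPU to see what the filter does. This is a sketch only: `iir_pass`, `iir_blur_row` and `iir_blur_image` are hypothetical names, rows are plain Python lists, and it says nothing about whether the scatter hardware would actually allow this usage.

```python
def iir_pass(row, c):
    """One left-to-right recursive pass: t = (1 - c) * t + value * c."""
    t = 0.0  # filter state, the value that would live in a register
    out = []
    for value in row:
        t = (1 - c) * t + value * c
        out.append(t)
    return out


def iir_blur_row(row, c=0.1):
    """Run the filter forward and then backward over one row."""
    forward = iir_pass(row, c)
    backward = iir_pass(forward[::-1], c)
    return backward[::-1]


def iir_blur_image(image, c=0.1):
    """Four passes in total: both directions along rows, then along columns."""
    rows = [iir_blur_row(list(r), c) for r in image]
    cols = [iir_blur_row(list(col), c) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Using only `iir_pass` in one direction gives the one-sided smear described above as the 'wind' filter. Note that the state starts at zero, so pixels near the starting edge are darkened unless `t` is seeded with the first pixel value.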
Dave Baumann
12-Oct-2005, 01:08
Mmmm, looks like the GROMACS numbers posted earlier may still have room to grow (http://forum.folding-community.org/viewtopic.php?p=113040#113040) (hopefully, all round on the graphics).
http://www.theinquirer.net/?article=26868
Soooo, B3D planning to get into the PPU reviewing biz? Ground floor and all that, plus the trends would seem to suggest it would be a forward-looking thing to do. . .
I don't think we will see devs using this capability of desktop GPUs.. GPGPU algorithms eat up performance and certain physics features would be very inefficient (collision, creation of new constraints/reaction forces, adding/deleting vertices), plus now you're sharing physics data with video memory and bandwidth. It's better to host this stuff elsewhere. I say slap a CELL processor on a PCB and be done with it.
It's a neat trick, but I don't think it co-exists nicely with graphics, unless you are prepared to buy a second GPU and dedicate it just for physics. But rather than buy a second $400 GPU and waste it on physics, I'd rather have a dedicated PPU that isn't carrying baggage from needing to support graphics ops around.
In other words, trying to repackage a GPU as a PPU via a software layer and sell it as that is, I think, a kludge extraordinaire and just GPU vendors scrambling to catch up to Ageia due to the enormous positive PR response they got.
Think of the geometry that you have to copy over the bus to the PPU card and then copy to GPU. With the GPU you just send commands telling it how to manipulate the geometry. You could do clipping or bounds checking without having to move objects out of the GPU memory.
thatdude90210
14-Oct-2005, 03:38
Would be cool if the GPGPU guys could use the xbox 360 to fold. All those Xenos will be sitting around doing nothing most of the day anyway.
DemoCoder
14-Oct-2005, 08:47
Think of the geometry that you have to copy over the bus to the PPU card and then copy to GPU. With the GPU you just send commands telling it how to manipulate the geometry. You could do clipping or bounds checking without having to move objects out of the GPU memory.
Generally speaking, the physics data consists of forces, inertial tensors, bounding boxes, and collision structures. The position of vertices is a subset. So you'll be sending extra data to the GPU as well, data which will eat up extra RAM and cut into bandwidth, which will slow down rendering. A single GPU solution is IMHO a bad one to the problem of Physics + Rendering.
Moreover, the only bandwidth wasted by having a second GPU card is the bandwidth to download the vertices from the CPU and upload them to the GPU. This is not likely to be a huge concern, and is certainly better than today's situation which is operating on slow system memory with the CPU, and then uploading the vertices from system memory to GPU.
As the amount of memory on the GPU increases and the 3D scenes become larger and larger it becomes less practical to move all the data around.
DemoCoder
14-Oct-2005, 09:06
Bandwidth is bandwidth, whether it is GPU bandwidth or bus bandwidth. If you're manipulating 100 million vertices, you're chewing up 4Gbyte/s on the bus either way, and if you're touching all the vertices via GPU physics, you're chewing up a multiple of that. With GPUs already constrained for bandwidth and fillrate, running huge physics datasets on the GPU is likely to kill your rendering performance far more than copying data between a PPU and GPU via two PCIx8/x16 slots, or an "SLI" like connector. A separate PPU is additional bandwidth, not sharing GPU bandwidth.
The single-GPU with uber-ram "physics solution" IMHO is a non-starter.
Bandwidth is bandwidth, whether it is GPU bandwidth or bus bandwidth. If you're manipulating 100 million vertices, you're chewing up 4Gbyte/s on the bus either way, and if you're touching all the vertices via GPU physics, you're chewing up a multiple of that. With GPUs already constrained for bandwidth and fillrate, running huge physics datasets on the GPU is likely to kill your rendering performance far more than copying data between a PPU and GPU via two PCIx8/x16 slots, or an "SLI" like connector. A separate PPU is additional bandwidth, not sharing GPU bandwidth.
The single-GPU with uber-ram "physics solution" IMHO is a non-starter.
I don't agree. If that was true we would see more graphics objects in memory on the PC being copied over the PCI bus.
DemoCoder
14-Oct-2005, 09:33
That in fact, used to be the case. When GPU bandwidth was maxed out, AGP would be used, to balance the load. This in fact, was one of the first major Detonator optimizations.
You're arguing that intra-card bandwidth will be the limiting factor because the dataset is so large. I'd argue that if the geometry dataset starts to approximate the texture and framebuffer/rendertarget dataset, your GPU will be starved, severely impacting rendering performance. GPUs should be optimized for RENDERING PERFORMANCE/QUALITY first and foremost. That's their whole reason for existence.
In reality, even if your scene consisted of 100 million vertices, the output of the physics engine won't be anywhere near that large. The physics engine will output new transforms for whole objects, and new bones or perturbations for those objects, but it won't systematically generate and upload new vertices every frame, that would be ridiculous.
What you'd do is calculate a new position and orientation for all of the rigid bodies, and let the vertex shading hardware deal with it. # of bodies <<< # of vertices.
Today, physics is calculated by the CPU, so we are in exactly the dire situation you predict, yet, the PCIE bus is not the limiting factor in physics performance today. Ergo, the data doesn't support your hypothesis. Physics engines are not AGP/PCIE limited today, not even close.
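The "# of bodies <<< # of vertices" point can be put into rough numbers. Everything below is an assumption made for illustration: 4-byte floats, 3 floats per vertex position, 7 floats per rigid body (a position plus an orientation quaternion), and a body count of 10,000. Only the 100 million vertex figure comes from the thread.

```python
FLOAT_BYTES = 4  # assuming 32-bit floats


def vertex_upload_bytes(num_vertices, floats_per_vertex=3):
    """Bytes per frame if every vertex position were re-uploaded."""
    return num_vertices * floats_per_vertex * FLOAT_BYTES


def transform_upload_bytes(num_bodies, floats_per_body=7):
    """Bytes per frame if only a position and quaternion per body is uploaded."""
    return num_bodies * floats_per_body * FLOAT_BYTES


def upload_ratio(num_vertices, num_bodies):
    """How many times larger the per-vertex upload is than the per-body one."""
    return vertex_upload_bytes(num_vertices) / transform_upload_bytes(num_bodies)
```

Under these assumptions, 100 million vertices come to 1.2 GB per frame, while 10,000 rigid-body transforms come to 280 KB per frame, which is the gap the post is pointing at.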
Maintank
14-Oct-2005, 17:56
Even if GPUs lag behind in performance they have a significant edge in market positioning and utilization. I have yet to be sold on the idea of a $200 PPU that will sit idle in 95% of games. This is worse than my Audigy 2 ZS which in hindsight was a wasted investment due to the fact so few games use it in a way that stands out compared to integrated audio solutions
Heh you know they probably said the same thing about 3dfx's Voodoo 10 years ago.
Even if CPUs lag behind in performance they have a significant edge in market positioning and utilization. I have yet to be sold on the idea of a $200 GPU that will sit idle in 95% of games. This is worse than my Creative 16 which in hindsight was a wasted investment due to the fact so few games use it in a way that stands out compared to integrated audio solutions
;)
There is a market for it, the key is getting a big time game developer behind it to show off its potential.
Chalnoth
14-Oct-2005, 18:26
The original 3D graphics accelerators had a huge advantage, though: there was a night-and-day difference between using and not using the 3D accelerators. Plus, once a game developer made use of a 3D accelerator, they got that night-and-day difference just for implementing the accelerated path.
With a physics processor, just implementing the accelerated path nets you nothing. The game developer must also then implement a whole lot of additional content to make a reasonable difference. So the number of games that support a PPU will remain small, and the number of games that support a PPU and show a notable difference will be even smaller.
trinibwoy
14-Oct-2005, 19:11
With a physics processor, just implementing the accelerated path nets you nothing. The game developer must also then implement a whole lot of additional content to make a reasonable difference. So the number of games that support a PPU will remain small, and the number of games that support a PPU and show a notable difference will be even smaller.
And the number of games that show a notable difference that is integrated into gameplay will be even smaller than that :) But I'd much rather have the hardware and have developers code for it in the future than nothing at all.
Chalnoth
14-Oct-2005, 20:26
I don't see the point, since within five years we'll be moving towards massively-parallel CPU's with Cell-like architectures that will themselves be very good at physics processing anyway.
Hellbinder
14-Oct-2005, 20:44
The original 3D graphics accelerators had a huge advantage, though: there was a night-and-day difference between using and not using the 3D accelerators. Plus, once a game developer made use of a 3D accelerator, they got that night-and-day difference just for implementing the accelerated path.
With a physics processor, just implementing the accelerated path nets you nothing. The game developer must also then implement a whole lot of additional content to make a reasonable difference. So the number of games that support a PPU will remain small, and the number of games that support a PPU and show a notable difference will be even smaller.
I don't think so..
Look at the trend in most FPS games from last year looking into next year. They are all pushing physics as a major selling point. So are games like Empire Earth III etc..
With the big licensed engines all supporting major physics components, a much greater number of developers will obviously use a PPU automatically if the engine is designed for it from the ground up.
Chalnoth
14-Oct-2005, 21:08
I don't think so..
Look at the trend in most FPS games from last year looking into next year. They are all pushing physics as a major selling point. So are games like Empire Earth III etc..
Yes, but they're pushing physics that can be run on today's CPU's just fine. Assuming that the PPU can, in a realistic gaming scenario, accelerate physics enough to dramatically increase the amount of physics processing that can be done, the developer would then need to create two separate sets of content, one for people with a PPU, one without (and, realistically, they'd have to do all physics through the same API, which would be potentially slower on CPU's than competing CPU-only physics API's, making the implementation even less likely).
I think I can recall one situation in the history of games where a developer has developed custom content that is only usable for people with specific hardware (UT's S3TC textures). So I don't think that consumers will ever see the "killer app" that really says to people, "Damn, I have to get a PPU so I can play the game like that."
But you need much more additional content to get a quality difference. Just using the PPU is not enough.
John Reynolds
14-Oct-2005, 21:09
I think I can recall one situation in the history of games where a developer has developed custom content that is only usable for people with specific hardware (UT's S3TC textures). So I don't think that consumers will ever see the "killer app" that really says to people, "Damn, I have to get a PPU so I can play the game like that."
I have a Q&A in with AGEIA and suggested something along those lines. It'll be interesting to see how, and if, they respond to that.
DemoCoder
14-Oct-2005, 21:28
GLQuake was the killer app for 3D. It was the software that sold Voodoo1's and made people say "Damn, I have to have that!"
(no, VQuake didn't have anywhere near as big of an impact for some reason)
silhouette
14-Oct-2005, 21:30
Yes, but they're pushing physics that can be run on today's CPU's just fine. Assuming that the PPU can, in a realistic gaming scenario, accelerate physics enough to dramatically increase the amount of physics processing that can be done, the developer would then need to create two separate sets of content, one for people with a PPU, one without (and, realistically, they'd have to do all physics through the same API, which would be potentially slower on CPU's than competing CPU-only physics API's, making the implementation even less likely).
I think I can recall one situation in the history of games where a developer has developed custom content that is only usable for people with specific hardware (UT's S3TC textures). So I don't think that consumers will ever see the "killer app" that really says to people, "Damn, I have to get a PPU so I can play the game like that."
There is one more scenario where a PPU can be useful. If the game is CPU-bound because of heavy use of physics and a slow CPU, a PPU can reduce the computational burden on the CPU and therefore boost the overall frame rate. But, of course, you can ask why you would spend money on a PPU board rather than upgrading the CPU itself (which is in fact more useful, as it is going to be used in all other kinds of applications as well).
I think the immediate use of a PPU is only for boosting frame rate in CPU-bound games (without adding any content), and for adding small details to the scene which the player and NPCs cannot interact with (they make the scene livelier but do not add anything to gameplay at all). Flying pieces of paper in the streets in the MGS4 demo, better-looking smoke, or each blade of grass moving independently are all examples of the latter use.
Maintank
14-Oct-2005, 22:03
The original 3D graphics accelerators had a huge advantage, though: there was a night-and-day difference between using and not using the 3D accelerators. Plus, once a game developer made use of a 3D accelerator, they got that night-and-day difference just for implementing the accelerated path.
With a physics processor, just implementing the accelerated path nets you nothing. The game developer must also then implement a whole lot of additional content to make a reasonable difference. So the number of games that support a PPU will remain small, and the number of games that support a PPU and show a notable difference will be even smaller.
Sure, they showed a huge advantage, but didn't each game have to use the Glide or OpenGL API in order to take advantage of the 3D accelerator? At first, how many games took full advantage of such a thing? I still remember the comments that software rendering on the CPU could be better than these new clunky 3D accelerators.
Here is the thing with the PPU: if a developer can create a killer game that uses the full potential of the product and shows realistic physics, you know people will want more. Right now physics is one of the main issues with games; they lack a realistic feeling, and physics is sparsely used.
Maintank
14-Oct-2005, 22:04
I don't see the point, since within five years we'll be moving towards massively-parallel CPU's with Cell-like architectures that will themselves be very good at physics processing anyway.
Huh? Are Intel and AMD changing their roadmaps or something? x86 will be with us for a very long time. Certainly more than 5 years from now.
Maintank
14-Oct-2005, 22:06
Yes, but they're pushing physics that can be run on today's CPU's just fine. Assuming that the PPU can, in a realistic gaming scenario, accelerate physics enough to dramatically increase the amount of physics processing that can be done, the developer would then need to create two separate sets of content, one for people with a PPU, one without (and, realistically, they'd have to do all physics through the same API, which would be potentially slower on CPU's than competing CPU-only physics API's, making the implementation even less likely).
I think I can recall one situation in the history of games where a developer has developed custom content that is only usable for people with specific hardware (UT's S3TC textures). So I don't think that consumers will ever see the "killer app" that really says to people, "Damn, I have to get a PPU so I can play the game like that."
This is true, and it is one of the reasons why the PPU manufacturer should be onsite helping the game developer make this content. But the content you mention, isn't that like having SM2, DX7, and DX8 render paths?
Chalnoth
14-Oct-2005, 22:29
Sure, they showed a huge advantage, but didn't each game have to use the Glide or OpenGL API in order to take advantage of the 3D accelerator? At first, how many games took full advantage of such a thing? I still remember the comments that software rendering on the CPU could be better than these new clunky 3D accelerators.
Er, taking full advantage wasn't the point. Once the game was programmed with 3D acceleration in mind, typically games immediately jumped from 320x200 resolution with no texture filtering to 640x480 with bilinear texture filtering, and at higher framerate. It was the same game, the same content. Just a little extra programming for the API and you get this incredible improvement in how the game looks.
With physics you have to add new content for the game to look or play any different with the PPU. Otherwise it'll just be a slight performance improvement (on slower CPU's) that could also be realized with a slightly faster CPU.
Chalnoth
14-Oct-2005, 22:35
Huh? Are Intel and AMD changing their roadmaps or something? x86 will be with us for a very long time. Certainly more than 5 years from now.
Yes, x86 will be with us probably for the lifetime of silicon-based processors. But you can build a Cell-like architecture within the x86 structure. It'd basically be like having n x86 processors and m processors that use a different, simplified instruction set. It would be akin to the SSE instructions that have been added to x86 processors.
I actually got the idea from an Intel slide talking about processors over the next few years. Their plan is to start dual-core, then move to quad-core, then start adding many smaller cores around the larger single-threaded cores (i.e. cores that are efficient with single-threaded code...like today's CPU's), effectively turning the CPU into a combination of multiple single-threaded processors and a massively parallel stream processor.
Chalnoth
14-Oct-2005, 22:37
This is true, and it is one of the reasons why the PPU manufacturer should be onsite helping the game developer make this content. But the content you mention, isn't that like having SM2, DX7, and DX8 render paths?
Yes. But in this case your DX9-level hardware already does a better job at every 3D accelerated game that uses older interfaces than the DX8-level hardware. So you don't really need game developers to code for your hardware to gain a benefit.
You do with a PPU.
Hellbinder
14-Oct-2005, 23:39
What I am saying is that the engines will be designed to use PPUs automatically in many cases. Whether the content is increased or not is a different question. CPU utilization will decrease, which will likely speed games up by at least a few percent.
ondaedg
14-Oct-2005, 23:43
Well, here is my question. Are CPUs over the next two years going to be capable of the level of physics that a dedicated PPU could do?
Joe DeFuria
14-Oct-2005, 23:55
I actually got the idea from an Intel slide talking about processors over the next few years. Their plan is to start dual-core, then move to quad-core, then start adding many smaller cores around the larger single-threaded cores (i.e. cores that are efficient with single-threaded code...like today's CPU's), effectively turning the CPU into a combination of multiple single-threaded processors and a massively parallel stream processor.
Speaking of which, a "general" AMD road map:
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2565
Note in the "future goal" column in the table: "On-chip coprocessors."
Chalnoth
15-Oct-2005, 00:18
Well, here is my question. Are CPUs over the next two years going to be capable of the level of physics that a dedicated PPU could do?
It wouldn't even matter if they could or not, as games in two years won't be programmed to make great use of PPU's, because PPU's won't have the userbase to motivate such usage.
FrameBuffer
15-Oct-2005, 00:55
It wouldn't even matter if they could or not, as games in two years won't be programmed to make great use of PPU's, because PPU's won't have the userbase to motivate such usage.
Wow !! damn if I had known you had a time machine I would have asked about the WhiteSox/Angels game ... so what exactly do you dictateerrr I mean what does the future hold for us Oh Wise One who knows what the future holds ?
/ ;p
Chalnoth
15-Oct-2005, 01:21
Dude, just look at the adoption of new 3D graphics features. PPU adoption can be nothing but slower than that. It's called simple logic.
DemoCoder
15-Oct-2005, 01:40
I actually wish there was an APU (Audio Physics Unit). Somebody, anybody, to give Creative some competition. Aureal was heading that way, but Creative killed them.
I actually think that a Microsoft DirectPhysics API will go a long way towards stimulating the market.
FrameBuffer
15-Oct-2005, 01:53
I actually wish there was an APU (Audio Physics Unit). Somebody, anybody, to give Creative some competition. Aureal was heading that way, but Creative killed them.
I actually think that a Microsoft DirectPhysics API will go a long way towards stimulating the market.
IIRC, nVidia had a helping hand in that deal. nV called upon Aureal to help design the MCP audio in the nForce series motherboards and, once done, was sure to slam the door as Aureal headed out.
ondaedg
15-Oct-2005, 05:31
It wouldn't even matter if they could or not, as games in two years won't be programmed to make great use of PPU's, because PPU's won't have the userbase to motivate such usage.
If demand is created for it, then there will be a potential userbase. Unless of course you feel that increased physics is not going to benefit tomorrow's games/apps. I personally do.
Which leads me to my second question, would tomorrow's cpu's be able to handle the high level of physics that we are talking about?
Chalnoth
15-Oct-2005, 06:08
Which leads me to my second question, would tomorrow's cpu's be able to handle the high level of physics that we are talking about?
That's a question best-answered by games themselves. We've had a tremendous increase in the amount of physics calculations in games over the last couple of years, and this is only going to continue.
Wunderchu
06-Jun-2006, 06:06
http://enthusiast.hardocp.com/article.html?art=MTA3OSwxLCxoZW50aHVzaWFzdA
R300King!
06-Jun-2006, 07:43
Welcome to last year! Look at the date, 2005. :D
Chalnoth
06-Jun-2006, 07:46
Where?
Demirug
06-Jun-2006, 07:55
Where?
This thread.
But it shows that ATI (and nVidia too) started talking about physics on GPUs a long time ago. At last ATI now has some examples of effects physics in the SDK, but there is still no publicly available SDK or detailed white paper for game physics.
Personally, I find it very courageous to claim 9 times the performance of a PPU. IMHO we need a general physics API and a benchmark as soon as possible.
dizietsma
06-Jun-2006, 07:59
Is there a thread here talking about the RD600 chipset and its 3 PCIe slots for 3 video cards, one being used as a PPU?
Chalnoth
06-Jun-2006, 08:22
This thread.
But it shows that ATI (and nVidia too) started talking about physics on GPUs a long time ago. At last ATI now has some examples of effects physics in the SDK, but there is still no publicly available SDK or detailed white paper for game physics.
Personally, I find it very courageous to claim 9 times the performance of a PPU. IMHO we need a general physics API and a benchmark as soon as possible.
Ah, I see what you're saying. Yes, Wunderchu should have started a new thread...this one's well-aged.
Anyway, I think this sort of thing is only really interesting if they can get the performance there for running the physics on the same GPU as the graphics. If the claimed 9x performance is true, then you could get physics performance equal to Ageia while only decreasing rendering performance by about 11%. Running physics in a separate video card will be far too much of a niche thing for it to ever get utilized very much by game developers.
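As a quick check of the 11% figure, here is a minimal sketch. It assumes physics throughput scales linearly with the share of GPU time given to it, which is an idealisation (real time-slicing between rendering and physics would add overhead); the function name is made up for illustration.

```python
def gpu_share_for_ppu_parity(speedup_over_ppu: float) -> float:
    """Fraction of GPU time needed to match one PPU, assuming linear scaling."""
    return 1.0 / speedup_over_ppu


share = gpu_share_for_ppu_parity(9.0)  # the claimed 9x advantage
print(f"{share:.1%} of the GPU")       # 11.1% of the GPU
```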
All that said, as I've been saying over and over again, I think that we're going towards Cell-like architectures for CPU's in the coming years anyway, so that it won't make much sense to bother offloading physics before too long.
Demirug
06-Jun-2006, 09:06
Anyway, I think this sort of thing is only really interesting if they can get the performance there for running the physics on the same GPU as the graphics. If the claimed 9x performance is true, then you could get physics performance equal to Ageia while only decreasing rendering performance by about 11%. Running physics in a separate video card will be far too much of a niche thing for it to ever get utilized very much by game developers.
Yes, it would be better if it can work at the lowest level with only one card. But before WDDM2 I don’t expect superior multitasking performance from any vendor. The second problem will be the memory. With only one card your physics will steal your texture memory.
I believe there is a higher chance that people will buy a second GPU, which can sometimes improve graphics performance and sometimes give you more physics, than a PPU.
The second model is that people who buy a new GPU don’t throw away the old one and use it for physics.
The third model is using the IGP for physics.
Overall, the future for GPU physics looks IMHO much brighter than the future for the PPU.
All that said, as I've been saying over and over again, I think that we're going towards Cell-like architectures for CPU's in the coming years anyway, so that it won't make much sense to bother offloading physics before too long.
Maybe. I am more of a multi-core person. But a general API for accelerated physics can solve this problem of an unknown future.
Chalnoth
06-Jun-2006, 10:05
Maybe. I am more of a multi-core person. But a general API for accelerated physics can solve this problem of an unknown future.
Cell-like is a very nice branch from current multi-core technologies. It's just a move from symmetric multi-core to asymmetric multi-core.
EasyRaider
06-Jun-2006, 10:12
The second problem will be the memory. With only one card your physics will steal your texture memory.
That won't amount to much. Maybe 1/4 at worst, but probably much less. If you can spare the processing cycles, you can most likely spare the memory.
I believe there is a higher chance that people will buy a second GPU, which can sometimes improve graphics performance and sometimes give you more physics, than a PPU.
I doubt it will ever be worth sacrificing half the rendering performance for effects physics in a real game. Maybe it would be attractive if you could pair two cards of very different performance in Crossfire/SLI.
The second model is that people who buy a new GPU don’t throw away the old one and use it for physics.
Many people won't have a suitable one ready for replacement. In the meantime, Ageia PhysX may seem like the better choice.
The third model is using the IGP for physics.
That would be a very low-performance option. They all have modest shader throughput and poor or non-existent dynamic branching. I think you might as well go for CPU physics.
skilzygw
06-Jun-2006, 13:55
ATI did say anything from an X1300 on up could be used for physics. That's more options, and cheaper than the DOA Ageia PPU.
Demirug
06-Jun-2006, 14:15
That won't amount to much. Maybe 1/4 at worst, but probably much less. If you can spare the processing cycles, you can most likely spare the memory.
Physics could take up a large amount of space, too. For comparison, PhysX reserves 64 MB for every rigid body scene.
I doubt it will ever be worth sacrificing half the rendering performance for effects physics in a real game. Maybe it would be attractive if you could pair two cards of very different performance in Crossfire/SLI.
100% from SLI/Crossfire is unlikely, but anyway a second GPU gives you a freedom of choice that a PPU does not.
Many people won't have a suitable one ready for replacement. In the meantime, Ageia PhysX may seem like the better choice.
It will depend on the price/performance ratio. If the numbers that ATI claims are true, an additional GPU will be the better choice. The big problem is that everybody uses a different API for this stuff.
That would be a very low-performance option. They all have modest shader throughput and poor or non-existent dynamic branching. I think you might as well go for CPU physics.
This was more a thought about the future.
http://enthusiast.hardocp.com/article.html?art=MTA3OSwxLCxoZW50aHVzaWFzdA
Godfrey Cheng owns physics at ATI now?
ATI left us saying, “Expect new Crossfire revisions early next year to overtake SLI. We are not here to play second fiddle.”
Bold words. Must also have been Cheng. :lol:
It will depend on the price/performance ratio. If the numbers that ATI claims are true, an additional GPU will be the better choice. The big problem is that everybody uses a different API for this stuff.
ah yes.. the Api.. Havok.
Wouldn't it be interesting to see current Havok titles getting a "free" upgrade? Or what about ATI squeezing more out of their funded titles, like HL2: The Physics Episode?
Edit: Just remembered that Havok announced physics support for the Wii.
Dave Baumann
06-Jun-2006, 15:00
Godfrey is the marketing guy for Crossfire in general now. I assume, though, that ATI's equivalent of SLI Zone neither had its domain name chosen by Godfrey nor had design input from him - www.aticrossfire.com
Of course the problem with saying you're going to overtake something in the future is it is also an admission that you're behind *now*. But that's okay, because they still are and that self-awareness is necessary to move forward.
But they better take care of profiles as part of that "overtaking"!
EasyRaider
06-Jun-2006, 15:09
Physics could take up a large amount of space, too. For comparison, PhysX reserves 64 MB for every rigid body scene.
PhysX cards have 128 MiB. Assuming that's sufficient, there's no real problem for a 512 MiB video card. It's cheaper and wastes less memory than a Crossfire/SLI setup.
100% from SLI/Crossfire is unlikely, but anyway a second GPU gives you a freedom of choice that a PPU does not.
Well, I don't think either will become mainstream enough.
Doing gameplay physics on the CPU and effects physics on the (single) VPU seems like the only way for the majority. The question is how long before it becomes viable.
Nice article:
http://www.tweaktown.com/articles/908/
video of physics demonstration - looks decent, in other words better than anything I've seen on Ageia
the machine this was running on wasn't using the two X1900s in CrossFire according to a snapshot of the back of the rig
the article suggests that an adaptor was used to put the X1600 into a x4 or x1 (erm, whatever) slot
Jawed
Acert93
06-Jun-2006, 18:28
Nice find Jawed. I am VERY impressed an X1600 can do that! More than I expected from the X1600 or "effects" physics.
Be funny if they sold more CF mobos for physics than dual-gpu. But I could see it happening. . .you're doing an upgrade, you've got an old card hanging around. . .
Nice find Jawed. I am VERY impressed an X1600 can do that! More than I expected from the X1600 or "effects" physics.
It's easy to forget the "fragment rate" of X1600 is rather high, 590MHz x12 = 7080 M fragments/s. Compared to X800XT, say, which is 500MHz x16 = 8000 M fragments/s.
What'll be interesting is to see how Havok markets this - which IHV will Havok "partner" with? Which hardware will Havok be using in its demonstrations? If the comparative performance of ATI and NV hardware is as unbalanced as ATI says then it puts Havok in a bit of a tricky situation.
Jawed
Skrying
06-Jun-2006, 19:25
The only way a PPU or physics on a spare graphics card could ever work is if there is a unified API. Havok, IMO, has a much better chance of doing that than anything Ageia could do.
Acert93
06-Jun-2006, 19:38
I know ATI is positioning this as "effects physics". I wonder, e.g., how limited user interaction is in the demo ATI is demonstrating in the link. Let's say you have a car: are you unable to interact with the junk vortex at all? If you can, what will the impact be on performance?
Chalnoth
06-Jun-2006, 19:42
Godfrey Cheng owns physics at ATI now?
ATI left us saying, “Expect new Crossfire revisions early next year to overtake SLI. We are not here to play second fiddle.”
Bold words. Must also have been Cheng. :lol:
Considering it looks like nVidia will probably be moving on to quad SLI by that time for the ultra high-end market, I'm not sure nVidia has much to fear.
Cell-like is a very nice branch from current multi-core technologies. It's just a move from symmetric multi-core to asymmetric multi-core.
To me Cell isn't about asymmetric cores, but the memory architecture. The local store is what makes it Cell. I don't expect PC CPUs to go in this direction.
Chalnoth
07-Jun-2006, 02:34
To me Cell isn't about asymmetric cores, but the memory architecture. The local store is what makes it Cell. I don't expect PC CPUs to go in this direction.
Perhaps not, but I do expect them to go in the direction of asymmetric cores, which is what I mean by Cell-like. I'd consider the local store to be a relatively minor implementation detail (which should be maskable by a good compiler and/or library).
What'll be interesting is to see how Havok markets this - which IHV will Havok "partner" with? Which hardware will Havok be using in its demonstrations? If the comparative performance of ATI and NV hardware is as unbalanced as ATI says then it puts Havok in a bit of a tricky situation.
Jawed
Well.. basically it looks like Havok will be supporting the Ageia and Ati implementation.
It will be two architectures from there on right? I think Ageia will see an implementation on Geforce boards and Wii and 360 will run their ati paths from the ati chips....
Demirug
07-Jun-2006, 13:15
Well.. basically it looks like Havok will be supporting the Ageia and Ati implementation.
It will be two architectures from there on right? I think Ageia will see an implementation on Geforce boards and Wii and 360 will run their ati paths from the ati chips....
HavokFX is based on Shader Model 3 or a comparable OpenGL extension. This means that if PhysX supports Shader Model 3, Havok FX will run on it. ;)
There are at least 3 different architectures at the moment:
The PhysX API works with PhysX
HavokFX works with Shader Model 3 hardware
An ATI API that works with ATI GPUs (IIRC any with SM3 support).
Maybe we will see more in the future, and hopefully we will see one that will work with every chip.
IgnorancePersonified
07-Jun-2006, 14:06
With the way Ageia has done its thing, does it get any $ from selling its API or just from chip sales? I am confused. I thought that was the plan, but then I read somewhere they are basically giving it away.
This XBit article is more informative:
http://www.xbitlabs.com/news/multimedia/display/20060606235605.html
According to ATI’s internal benchmarks, Ageia PhysX PPU (366MHz) can perform about half a million sphere-to-sphere collisions per second, whereas the Radeon X1600 XT (590MHz, 12 pixel shader processors) delivers over a million, meanwhile performance of Radeon X1900 XTX (650MHz, 48 pixel shader processors) reaches five million sphere-to-sphere collisions per second.
I wonder what sort of collision frame rates are considered acceptable for effects physics? 30fps? 15?
How much more complex are non-sphere-to-sphere collisions? An order of magnitude?
These numbers imply that S-S collisions are pretty much compute-bound rather than bandwidth bound. 5M collisions per second on X1900XTX is about 6240 cycles per collision. Blimey that's a hell of a lot.
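The 6240 figure can be reproduced with a back-of-the-envelope sketch, assuming every pixel shader processor issues one cycle of work per clock and all of them are busy with collisions (both are simplifications; the function name is just for illustration). The clock speeds, shader counts and collision rates are the ones quoted from the X-bit article above.

```python
def shader_cycles_per_collision(clock_hz, pixel_shaders, collisions_per_sec):
    """Aggregate shader-processor cycles available per reported collision."""
    return clock_hz * pixel_shaders / collisions_per_sec


# Radeon X1900 XTX: 650 MHz, 48 pixel shader processors, 5M collisions/s
print(shader_cycles_per_collision(650e6, 48, 5e6))  # 6240.0

# Radeon X1600 XT: 590 MHz, 12 processors, "over a million" collisions/s,
# so 7080 is an upper bound on the cycles spent per collision
print(shader_cycles_per_collision(590e6, 12, 1e6))  # 7080.0
```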
Havok FX supports a new type of rigid-body object called a Debris Primitive. [...] Debris Primitives may also be generated on the fly [...]
Havok FX Debris Primitives can even interact with game-play critical objects, through an innovative approach that will provide the GPU with a one-way transfer of critical information [...]
So, very much geared towards effects though able to take account of the world through which they move (otherwise why bother, eh?).
Jawed
An ATI API that works with ATI GPUs (IIRC any with SM3 support).
Maybe we will see more in the future, and hopefully we will see one that will work with every chip.
So, is the version of Havok running on Hollywood, Xenos and Fudo the same? or are these definite different versions?
skilzygw
07-Jun-2006, 16:35
What about microsofts direct physics?
Demirug
07-Jun-2006, 16:44
So, is the version of Havok running on Hollywood, Xenos and Fudo the same? or are these definite different versions?
As these chips are used in 3 different platforms, there are differences for sure. But they should all be based on the same basic system.
Demirug
07-Jun-2006, 16:48
What about microsofts direct physics?
Never publicly announced and maybe already dead.
Never publicly announced and maybe already dead.
There's been a disturbance in the Force, Demi-wan? Or you have something more concrete to point at?
Demirug
07-Jun-2006, 17:20
There's been a disturbance in the Force, Demi-wan? Or you have something more concrete to point at?
After this job offer there was nothing more to be heard about this project. That is very unusual.
NocturnDragon
07-Jun-2006, 18:36
An ATI API that works with ATI GPUs (IIRC any with SM3 support).
Maybe we will see more in the future, and hopefully we will see one that will work with every chip.
Iirc I read on some site (I cannot find the link anymore) that the ATI API had a backend on SM3 and GLSL for all the other GPUs, while it ran on the metal with ATI ones.
RobertR1
08-Jun-2006, 01:26
Will ATI actually do some PR work and get some devs on board, unlike with HDR+AA?
MistaPi
08-Jun-2006, 03:24
If the numbers that ATI claims are true, an additional GPU will be the better choice.
But isn't it a little strange that a mid-range GPU would be faster at physics than a dedicated solution?
Demirug
08-Jun-2006, 06:29
But isn't it a little strange that a mid-range GPU would be faster at physics than a dedicated solution?
How dedicated is PhysX really? It’s dedicated to doing fast floating-point calculations, but modern GPUs can do this too. The main advantage of PhysX is the additional MIPS core, which can help save some additional CPU power by offloading some of the management.
But isn't it a little strange that a mid-range GPU would be faster at physics than a dedicated solution?
a mid-range GPU, with lower retail price but higher clock speeds...
It's already brilliant considering the PCIe x1 1300 that was launched at Computex; you shouldn't need 4 PCIe lanes for physics, right? (Or am I underestimating the amount of data that travels the bus?)
Mintmaster
08-Jun-2006, 09:44
How much more complex are non-sphere-to-sphere collisions? An order of magnitude?
These numbers imply that S-S collisions are pretty much compute-bound rather than bandwidth bound. 5M collisions per second on X1900XTX is about 6240 cycles per collision. Blimey that's a hell of a lot.
I can't imagine it's spending 6000+ cycles simply on a sphere-sphere intersection, which is really simple. The benchmark probably includes a lot of data traversal. Maybe they have a bounding sphere hierarchy or something. This is the sort of thing that DB will benefit greatly, so I can imagine ATI coding for this.
10,000 sphere objects would need a hundred million S-S collisions (i.e. 20 sec/frame on R580) if done the dumb way. A hierarchy could reduce that to maybe a hundred thousand or so. However, such a structure is not easy to implement in pixel shaders with a stream processing model.
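To make the "dumb way" versus a culling structure concrete, here is a CPU-side sketch. Note that it swaps in a uniform grid rather than the bounding sphere hierarchy mentioned above, purely because a grid is shorter to write; it assumes all spheres share one radius, and the function names are made up for illustration. It says nothing about how this would map onto pixel shaders.

```python
from collections import defaultdict
from itertools import combinations, product


def _overlap(a, b, radius):
    """True if two equal-radius spheres centred at a and b touch or overlap."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return d2 <= (2 * radius) ** 2


def brute_force_pairs(centers, radius):
    """The dumb way: test every unordered pair, n*(n-1)/2 checks."""
    return {(i, j) for i, j in combinations(range(len(centers)), 2)
            if _overlap(centers[i], centers[j], radius)}


def grid_pairs(centers, radius):
    """Uniform grid with cell size equal to the sphere diameter.

    Overlapping spheres can be at most one cell apart on each axis,
    so only the 27 neighbouring cells need to be tested.
    """
    cell = 2 * radius
    grid = defaultdict(list)
    for i, c in enumerate(centers):
        grid[tuple(int(x // cell) for x in c)].append(i)
    hits = set()
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(other, ()):
                    if i < j and _overlap(centers[i], centers[j], radius):
                        hits.add((i, j))
    return hits


# The post's figure: n*n checks at the quoted 5M checks/s for the R580.
n = 10_000
print(n * n / 5e6)  # 20.0 seconds per frame
```

Counting unordered pairs instead of n*n would halve the brute-force figure, but it stays far from real time either way.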
So any word on whether ATI will release some demos on this stuff?
But isn't it a little strange that a mid-range GPU would be faster at physics than a dedicated solution?
Economies of scale.
Demirug
08-Jun-2006, 09:56
I can't imagine it's spending 6000+ cycles simply on a sphere-sphere intersection, which is really simple. The benchmark probably includes a lot of data traversal. Maybe they have a bounding sphere hierarchy or something. This is the sort of thing that DB will benefit greatly, so I can imagine ATI coding for this.
As there is no tool like ShaderPerf for ATI, I ran a quick check for the G7X chips. An S-S-C can be done in one pipe in 4 cycles per check. We are sure to get some additional overhead for the memory access (32-bit floats), but even in this case I can’t see why it should burn 6000+ cycles per check. Maybe 20, but this is already high.
It sounds more like they ran a whole demo that does more than only the checks.
Demirug
08-Jun-2006, 10:00
Economies of scale.
= Development costs for the hardware are already paid by people who buy this chip for graphics?
Or do the foundries today offer such great quantity discounts?
Chalnoth
08-Jun-2006, 10:06
I believe it's mostly the first. From what I understand, the majority of the cost of most hardware is in R&D.
Dave Baumann
08-Jun-2006, 10:38
= Development costs for the hardware are already paid by people who buy this chip for graphics?
This chip and the whole line of chips using the same architecture (development costs have so far been spread across 4 chips using the same fundamental architecture).
Or do the foundries today offer such great quantity discounts?
There is quite a bit of that as well.
Demirug
08-Jun-2006, 10:50
This chip and the whole line of chips using the same architecture (development costs have so far been spread across 4 chips using the same fundamental architecture).
I know this.
My question was just a try to get more than 2 words from Dio. :)
I can't imagine it's spending 6000+ cycles simply on a sphere-sphere intersection, which is really simple. The benchmark probably includes a lot of data traversal. Maybe they have a bounding sphere hierarchy or something. This is the sort of thing that DB will benefit greatly, so I can imagine ATI coding for this.
10,000 sphere objects would need a hundred million S-S collisions (i.e. 20 sec/frame on R580) if done the dumb way. A hierarchy could reduce that to maybe a hundred thousand or so. However, such a structure is not easy to implement in pixel shaders with a stream processing model.
It seems like it's latency bound to me, rather than bandwidth bound, since the scaling from RV530 to R580 (~5x) is in line with the relative fragment rates (~4.4x) rather than bandwidth (~2x).
So that would seem to imply that the batch size of 48 is still far too large. Is that reasonable?
Or maybe the 5M number is simply the worst case, where every sphere is colliding with 12 others in the tightest packing. I really don't understand this stuff :oops:
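For reference, the two scaling ratios compared above can be worked out as follows. This is only a sketch of the arithmetic: it uses the clocks and pixel shader counts quoted from the X-bit article, and treats "over a million" as exactly one million, so the collision ratio is an upper bound.

```python
def fragment_rate(clock_mhz, pixel_shaders):
    """Peak fragment rate in M fragments/s (clock times shader processors)."""
    return clock_mhz * pixel_shaders


rv530 = fragment_rate(590, 12)  # X1600 XT
r580 = fragment_rate(650, 48)   # X1900 XTX

print(round(r580 / rv530, 1))   # 4.4, the fragment-rate ratio
print(5e6 / 1e6)                # 5.0, the claimed collision-rate ratio
```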
So any word on whether ATI will release some demos on this stuff?
There is an R2VB Collision demo in the March 06 SDK. Needs SM3.
The Data Parallel Processing architecture needs to get out there, too. It's supposed to be aimed at the GPGPU guys as well.
But it's also down to Havok I guess.
Jawed
IgnorancePersonified
08-Jun-2006, 12:34
I hope someone comes up with an Empire Strikes Back asteroid field demo. Fly the Falcon through the field chased by TIE fighters whilst using the vehicle's advanced vectored thrust :?: and tractor beam to not only navigate the obstacle course safely but also swing smaller roids into the TIE fighters' path.
Chalnoth
08-Jun-2006, 13:51
Well, vectored thrust is trivial to program, basically. It's collision detection with the large number of objects that would be tough to do in such a situation.
It's collision detection with the large number of objects that would be tough to do in such a situation.
Why? I expect an object's density, mass, weight etc. to be variables in an object's properties, and it would be like applying a texture to an object. Your physics card... should be able to do the calculations..
I doubt we'll see something like 20,000 asteroids on screen (everything is still poly sparse), and even then there should be more than enough memory on a 1600 to hold (a limited amount of data) for such objects.
Mate Kovacs
08-Jun-2006, 15:36
I can't imagine it's spending 6000+ cycles simply on a sphere-sphere intersection, which is really simple.
Well, the sphere-sphere collision test isn't just a sphere-sphere intersection test (linky), but it's still possible in a handful of cycles, so you're right, 6K+ cycles is nonsense.
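The page behind the "linky" is not preserved here, so the following is only one common reading of the distinction: a static overlap test compares centre distance with the sum of radii, while a collision test for moving spheres solves for the first time of contact within the timestep. A minimal sketch, assuming constant velocities over the step; the function names are hypothetical.

```python
import math

def spheres_overlap(c1, r1, c2, r2):
    """Static test: do two spheres overlap right now?"""
    dist2 = sum((b - a) ** 2 for a, b in zip(c1, c2))
    return dist2 <= (r1 + r2) ** 2

def time_of_impact(c1, v1, r1, c2, v2, r2, dt):
    """Moving test: first contact time in [0, dt], or None if no contact.

    Assumes both spheres move with constant velocity during the step.
    """
    p = tuple(b - a for a, b in zip(c1, c2))   # relative position
    v = tuple(b - a for a, b in zip(v1, v2))   # relative velocity
    reach = r1 + r2
    c = sum(x * x for x in p) - reach * reach
    if c <= 0.0:
        return 0.0                             # already touching or overlapping
    a = sum(x * x for x in v)
    b = 2.0 * sum(x * y for x, y in zip(p, v))
    if a == 0.0 or b >= 0.0:
        return None                            # not moving, or moving apart
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                            # paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)     # smaller root = first contact
    return t if t <= dt else None

# Sphere A moves toward a stationary sphere B four units away.
print(time_of_impact((0, 0, 0), (1, 0, 0), 1.0, (4, 0, 0), (0, 0, 0), 1.0, dt=3.0))  # 2.0
```

Either way it is a dot product or two plus a square root, which supports the point that thousands of cycles per test cannot be the intersection maths alone.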
10,000 sphere objects would need a hundred million S-S collisions (i.e. 20 sec/frame on R580) if done the dumb way. A hierarchy could reduce that to maybe a hundred thousand or so. However, such a structure is not easy to implement in pixel shaders with a stream processing model.
Or maybe AABB sweep? I guess implementing the insertion sort would be a bit problematic. :)
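For readers who have not met it, a one-axis AABB sweep (often called sweep and prune) can be sketched as below. This is a plain CPU illustration, not anything ATI or Havok is known to use; the tuple layout and function names are assumptions.

```python
def insertion_sort_by_min(boxes):
    """Sort (id, min_x, max_x) tuples in place by min_x.

    Insertion sort is close to O(n) when the list is already nearly sorted,
    which is the usual case from one frame to the next.
    """
    for i in range(1, len(boxes)):
        item = boxes[i]
        j = i - 1
        while j >= 0 and boxes[j][1] > item[1]:
            boxes[j + 1] = boxes[j]   # shift larger entries right
            j -= 1
        boxes[j + 1] = item

def sweep_pairs(boxes):
    """Return id pairs whose x-intervals overlap; boxes must be sorted by min_x."""
    pairs = []
    for i, (id_a, _min_a, max_a) in enumerate(boxes):
        for id_b, min_b, _max_b in boxes[i + 1:]:
            if min_b > max_a:
                break                 # later boxes start even further right
            pairs.append((id_a, id_b))
    return pairs

boxes = [("c", 5.0, 6.0), ("a", 0.0, 2.0), ("b", 1.5, 3.0), ("d", 5.5, 7.0)]
insertion_sort_by_min(boxes)
print(sweep_pairs(boxes))  # [('a', 'b'), ('c', 'd')]
```

The inner `while` loop of the sort is the awkward part for a stream processor: each shift depends on the previous one, so it does not map onto independent per-pixel work.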
Chalnoth
08-Jun-2006, 18:17
Why? I expect an object's density, mass, weight etc. to be variables in an object's properties and it would be like applying a texture to an object. Your physics card... should be able to do the calculations.
Most of the objects won't be acted upon by any force, and therefore won't need any more calculation than collision detection and a simple progression of their movement.
Additionally, even when being acted upon, the most complex rigid body's reaction to forces can be fully described by ten numbers (its mass, six more for the symmetric 3x3 moment of inertia tensor, and three for the location of the center of mass...you can reduce this all to four numbers by choosing the zero for the model to be the center of mass, and the model's x, y, and z axes to be those which diagonalize the moment of inertia tensor). The calculations for a force acting upon a rigid body are pretty simple.
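As a rough illustration of the reduced four-number description (mass plus three principal moments), here is a sketch of the response to a single force. It assumes all vectors are expressed in the body's own principal-axis frame with the origin at the center of mass; the class and method names are hypothetical.

```python
from dataclasses import dataclass

def cross(a, b):
    """3D cross product of two tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

@dataclass
class RigidBody:
    mass: float
    principal_inertia: tuple  # (Ix, Iy, Iz) about the principal axes

    def accelerations(self, force, point, omega=(0.0, 0.0, 0.0)):
        """Linear and angular acceleration for a force applied at `point`.

        Uses a = F/m and Euler's equations, I*alpha = torque - omega x (I*omega),
        which reduce to per-axis divisions because the inertia tensor is diagonal.
        """
        linear = tuple(f / self.mass for f in force)
        torque = cross(point, force)
        i_omega = tuple(i * w for i, w in zip(self.principal_inertia, omega))
        gyro = cross(omega, i_omega)
        angular = tuple((t - g) / i
                        for t, g, i in zip(torque, gyro, self.principal_inertia))
        return linear, angular

body = RigidBody(mass=2.0, principal_inertia=(1.0, 2.0, 4.0))
print(body.accelerations(force=(0.0, 4.0, 0.0), point=(1.0, 0.0, 0.0)))
# ((0.0, 2.0, 0.0), (0.0, 0.0, 1.0))
```

In a scene with many bodies each one has its own principal-axis frame, so world-space forces have to be rotated into that frame first.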
The only difficult part would be in dealing with collisions that result in breakaway pieces, but since collisions are going to only happen among a small fraction of the asteroids at any given time, you can spend a significant amount of time with each one and not impact performance.
bdotobdot2
08-Jun-2006, 18:27
Most of the objects won't be acted upon by any force, and therefore won't need any more calculation than collision detection and a simple progression of their movement.
Gravity should be acting on all of them and forces acting will go up from there.
you can reduce this all to four numbers by choosing the zero for the model to be the center of mass, and the model's x, y, and z axes to be those which diagonalize the moment of inertia tensor
But of course you can't do this with multiple bodies in the system, e.g. asteroid field. (or if you do, it doesn't save you anything, since then you have to do the axes transformations.)
Chalnoth
08-Jun-2006, 18:37
Gravity should be acting on all of them and forces acting will go up from there.
The effect of gravity would be so minimal in the local area of an asteroid field that it can easily be ignored when performing a simulation of flying through one.
Chalnoth
08-Jun-2006, 18:39
But of course you can't do this with multiple bodies in the system, e.g. asteroid field. (or if you do, it doesn't save you anything, since then you have to do the axes transformations.)
Yeah, that's true. Easier, I guess, just to go with the full description.
The only difficult part would be in dealing with collisions that result in breakaway pieces, but since collisions are going to only happen among a small fraction of the asteroids at any given time, you can spend a significant amount of time with each one and not impact performance.
The problem is, I haven't seen "breaking" in a physics demo yet :)
So far, everyone involved in physics seems to restrict themselves to object X with Y velocity and Z mass colliding with object A with B rigidity.
Or maybe AABB sweep? I guess implementing the insertion sort would be a bit problematic. :)
For a demo you might be able to get away with a fixed space subdivision by wasting a lot of memory. If you use a grid with a grid distance the same as the particle diameter you might even be able to use rasterization to update the structure (map each particle to the closest grid point, you can't ever get more than one particle per grid point without intersections). Not very practical outside a demo though.
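A CPU-side sketch of that fixed-grid idea, for equal-sized spheres with the cell size set to the particle diameter. It keeps a list per cell rather than relying on one particle per grid point, since two centres in the same cubic cell can be up to sqrt(3) diameters apart. The function name and data layout are assumptions, and nothing here uses rasterization.

```python
import math
from collections import defaultdict
from itertools import product

def grid_collisions(centers, diameter):
    """Return sorted index pairs of equal-sized spheres that overlap."""
    def cell_of(c):
        return tuple(math.floor(x / diameter) for x in c)

    # Build step: bin every particle into its grid cell.
    grid = defaultdict(list)
    for idx, c in enumerate(centers):
        grid[cell_of(c)].append(idx)

    # Query step: only the 27 surrounding cells can hold a colliding partner.
    hits = set()
    d2 = diameter * diameter
    for idx, c in enumerate(centers):
        key = cell_of(c)
        for off in product((-1, 0, 1), repeat=3):
            neighbour = (key[0] + off[0], key[1] + off[1], key[2] + off[2])
            for other in grid.get(neighbour, ()):
                if other > idx:  # report each pair once
                    o = centers[other]
                    if sum((a - b) ** 2 for a, b in zip(c, o)) < d2:
                        hits.add((idx, other))
    return sorted(hits)

centers = [(0.9, 0.0, 0.0), (1.2, 0.0, 0.0), (5.0, 5.0, 5.0), (9.0, 9.0, 9.0)]
print(grid_collisions(centers, diameter=1.0))  # [(0, 1)]
```

The memory cost mentioned in the post shows up if the dictionary is swapped for a dense 3D array covering the whole play area, which is what a texture-based version would need.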
My question was just a try to get more than 2 words from Dio. :)
I know, I don't post enough nowadays... I just don't have much left to say anymore. I haven't worked out if it's because you guys worked it all out and don't need me <sniff> or if I am just more circumspect :).
I did consider just replying "Both." :D
Chalnoth
08-Jun-2006, 19:52
The problem is, I haven't seen "breaking" in a physics demo yet :)
So far, everyone involved in physics seems to restrict themselves to object X with Y velocity and Z mass colliding with object A with B rigidity.
Well, right, because it's a hard problem to solve. You've got two solutions, basically:
1. Have each rigid body actually be a collection of bodies. Every time a force acts on the object, calculate the stress/strain at every connection, and break if it's too great. This solution has the problem that it makes forces acting on the object take much more processing power than the simple rigid body dynamics case.
2. Have a set of algorithms that automatically generate new bodies from old ones in the event of violent collisions. This algorithm has the drawback that performance becomes potentially unbounded if there are many collisions.
Well, right, because it's a hard problem to solve. You've got two solutions, basically:
1. Have each rigid body actually be a collection of bodies. Every time a force acts on the object, calculate the stress/strain at every connection, and break if it's too great. This solution has the problem that it makes forces acting on the object take much more processing power than the simple rigid body dynamics case.
2. Have a set of algorithms that automatically generate new bodies from old ones in the event of violent collisions. This algorithm has the drawback that performance becomes potentially unbounded if there are many collisions.
I wouldn't consider the first option (considering something like Red Faction.)
Number two seems much more plausible.
Sure, it causes an "easy" object with say, 500 poly's to generate a lot of smaller objects causing the amount of objects and amount of polygons in one scene to increase by an insane magnitude.. but the calculations are simple.
Force Vectors; surely, calculating a few vectors upon impact: one main vector for conveying the energy of impact and a limited amount of reflecting vectors deciding the amount of "debris" caused by such an impact.
I hope someone comes up with an Empire Strikes Back Asteroid Field Demo. Fly the Falcon through the field chased by TIE fighters whilst using the vehicle's advanced vectored thrust :?: and tractor beam to not only navigate the obstacle course safely but swing smaller roids into the TIE fighters' path.
Or even just Atari 'Asteroids'... now in 3D!
Humus, how about a demo? :)
IgnorancePersonified
09-Jun-2006, 02:53
Yeh didn't think vectored thrust was hard, just indicating that the Empire Strikes Back scenario doesn't use real physics for all objects and any demo wouldn't need to. Exploding asteroids and TIE fighters leaving no debris, "flying" through space, walking on the "floor" of the slug lair etc. However the asteroids themselves provide a good case to demonstrate the difference between the 2 solutions with a toggle switch.
I didn't think the ATI demo was really that compelling. A vortex of weird shapes spinning at high speed?? I slowed the vid that was provided down and it still didn't really blow me away. I know it's a lot of hard work but I found some of the AGEIA demos a lot more interesting (the Hangar ones) since they could be transportable into a game scenario. In fact I think they were using the Unreal 3 engine.
I'm talking about recreating the movie action scene - not simulating a "real" asteroid field since the movie was far from real and gives a lot of latitude to show effects without being nailed to the realism cross.
Acert93
09-Jun-2006, 03:27
Not to change directions too much...
But will the changes in the D3D10 pipeline be helpful in running physics on a GPU? If so, what changes will be beneficial and in what ways?
I ask because one of the 2 primary goals set at the outset of D3D10 development was to continue moving the API and GPUs toward architectures that were more general-purpose friendly.
Mintmaster
09-Jun-2006, 05:08
Or maybe AABB sweep? I guess implementing the insertion sort would be a bit problematic. :)
Yeah, this is why I'm skeptical about how good GPUs will be at physics. All the tricks to reduce pair-wise interactions would be really tough to implement in a massively parallel algorithm. I could see Cell and Xenon doing fine because they don't need to decouple calculations as much to run well, but it looks tough for GPUs to do general physics.
When NVidia and AGEIA talked about 10,000 objects/boulders/particles, I was thinking oh crap, is this just n^2 brute force? I'd be surprised if a more intelligent CPU method didn't perform faster.
It would make more sense to me if GPUs only checked for collisions with a few objects external to the system, or if the CPU narrowed down the search somehow (which would require the GPU to send info back to the CPU).
I wonder if all the N-body particle interaction algorithms used for astronomy (galaxies simulations) are going to be dusted off to see if there are techniques for reducing complexity.
Think through worst-case complexity; I guess you start with collision detection, and on collision you not only have to conserve momentum and kinetic energy, you have the added complexity of checking each body for elastic vs plastic vs destructive deformations during interactions and checking for further collateral releases of potential energy.
A simple example of this would be say a molotov cocktail (glass petrol bomb) hitting a pane of glass. The bomb could bounce off unhurt (at low velocity), but if this happened and the bomb shattered on the ground it might explode and break the glass (or not), and / or the heat of its fire might melt or shatter the window pane (or not). Alternatively the bottle might pass through the pane of glass (shattering the bottle or not), or both might shatter becoming travelling shrapnel. Finally you must also check for the release (and timing) of stored potential energy (e.g. when the petrol bomb ruptures you'd have flying glass and a spray of igniting petrol rapidly vaporising, possibly starting a fuel air explosion) all of which gives you new objects and imparts new kinetic energy and momentum vectors to everything in the blast radius which could trigger further latent potential energy releases from other affected bodies.
That's the ultimate modelling I guess.
Tricky to do for many bodies in real time, possibly N-body interaction simplification techniques could help...
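The simplest piece of that chain, the momentum and kinetic energy bookkeeping for one collision between two spheres, can be sketched as an impulse along the contact normal. The restitution parameter `e` is an assumption used to stand in for the elastic (e = 1) versus plastic (e < 1) distinction; deformation, fracture and energy release are not modelled.

```python
def resolve_collision(m1, v1, m2, v2, normal, e=1.0):
    """Return post-collision velocities of two spheres.

    `normal` is the unit vector pointing from body 1 to body 2 at contact.
    e = 1 keeps kinetic energy (elastic); e < 1 loses some (plastic-like).
    Momentum is conserved for any e.
    """
    rel = tuple(b - a for a, b in zip(v1, v2))
    closing = sum(r * n for r, n in zip(rel, normal))
    if closing >= 0.0:
        return v1, v2  # already separating, nothing to do
    j = -(1.0 + e) * closing / (1.0 / m1 + 1.0 / m2)  # impulse magnitude
    new_v1 = tuple(a - (j / m1) * n for a, n in zip(v1, normal))
    new_v2 = tuple(b + (j / m2) * n for b, n in zip(v2, normal))
    return new_v1, new_v2

# A 2 kg sphere at 3 m/s hits a stationary 1 kg sphere head on.
print(resolve_collision(2.0, (3.0, 0.0, 0.0), 1.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
# ((1.0, 0.0, 0.0), (4.0, 0.0, 0.0))
```

Momentum is 6 kg m/s before and after, and kinetic energy is 9 J before and after, so this case is fully elastic. Everything else in the molotov example is extra state and extra tests layered on top of this step.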
http://img95.imageshack.us/img95/8278/crossfirephysics0su.jpg (http://imageshack.us)
....
I really like the concept, though.. I have 2 spare cards right now and they're just sitting pretty.
Then again, this might as well be our future..
http://img80.imageshack.us/img80/5717/sliocta1kj.jpg (http://imageshack.us)
:roll:
nutball
10-Jun-2006, 10:08
I wonder if all the N-body particle interaction algorithms used for astronomy (galaxies simulations) are going to be dusted off to see if there are techniques for reducing complexity.
Those algorithms rely heavily on the sorts of data-structures which are quite hard to build on GPUs in their current form (trees, linked-lists, etc.).
According to ATI’s internal benchmarks, Ageia PhysX PPU (366MHz) can perform about half a million sphere-to-sphere collisions per second, whereas the Radeon X1600 XT (590MHz, 12 pixel shader processors) delivers over a million, meanwhile performance of Radeon X1900 XTX (650MHz, 48 pixel shader processors) reaches five million sphere-to-sphere collisions per second.
5 million???
just quickly (half drunk) done a benchmark on my athlon2.0 ghz, + i get over 40million/sec + even ~30million with high collision rates, thus either
A/ they suck (so why the noise) so that cant be right
B/ theyre doing a lot more than sphere-to-sphere collisions
+ WRT flying through asteroid belt see one of these (sorry i should have labelled them better + excuse the singing as well)
http://rapidshare.de/files/7375543/testF.avi.html
http://rapidshare.de/files/7281010/testE.avi.html
http://rapidshare.de/files/7236368/testB.avi.html
Mate Kovacs
10-Jun-2006, 11:49
Those algorithms rely heavily on the sorts of data-structures which are quite hard to build on GPUs in their current form (trees, linked-lists, etc.).
Yeah, anything that needs random access would suffer from the stream model. Even an AABB sweep (that only needs arrays, so nothing fancy) would suck.
IgnorancePersonified
10-Jun-2006, 12:07
zed:
Says the files have been deleted excepting the middle one.
thats the one flying over an island i take it?
another was in a desert + another was flying through an asteroid belt.
i can upload those if ppl wish
thats the one flying over an island i take it?
another was in a desert + another was flying through an asteroid belt.
i can upload those if ppl wish
It would be very nice if you do it. Thanks.
scuse the singing + soundquality (ill be getting a sound card soon, so will record some more crap)
http://rapidshare.de/files/22790157/video_B.avi.html
http://www.filefactory.com/?b564b1
the asteroids are in the second one.
these vids are from 7 months ago, ive changed the gameplay somewhats its more of a typical shoot-em-up, the main problem with having a thirdperson viewpoint is u cant aim that well (necessary in a fast paced game)
im curious how gears of war solves it, im 99% theyre shooting from the first person view, which may work with fast bullets but what about missiles etc?
Firingsquad have posted an interview (http://www.firingsquad.com/news/newsarticle.asp?searchid=10649) with Havok regarding the ATI announcement:
[...]
FiringSquad: Are there any plans to have something similar to the ATI Crossfire method for NVIDIA's SLI set up, especially since NVIDIA already has support for four graphics cards in SLI mode?
Jeff Yates: To be fair, this is something that works already for both setups that are being tested in house. There are no barriers we know of that will prevent games using Havok FX from leveraging dual GPU configurations in either camp – including uses of older graphics cards.
[...]
IgnorancePersonified
12-Jun-2006, 13:17
scuse the singing + soundquality (ill be getting a sound card soon, so will record some more crap)
http://rapidshare.de/files/22790157/video_B.avi.html
http://www.filefactory.com/?b564b1
the asteroids are in the second one.
these vids are from 7 months ago, ive changed the gameplay somewhats its more of a typical shoot-em-up, the main problem with having a thirdperson viewpoint is u cant aim that well (necessary in a fast paced game)
im curious how gears of war solves it, im 99% theyre shooting from the first person view, which may work with fast bullets but what about missiles etc?
Very cool zed. Thanks for the links. A couple of questions if you don't mind. You mentioned somewhere that the asteroids would blow up from point of impact. Did you get that to happen?
Is it right to say that each asteroid is unique? and a gameplay question: What is happening when the ship parks near a roid and the green light comes on... mining?
GraphixViolence
12-Jun-2006, 17:23
FYI, here's a few notes I jotted down from Nvidia's GPGPU talk in the Advanced Visual Effects with OpenGL tutorial at GDC, which I thought might be relevant to this discussion. They're talking about the Havok FX implementation in this case. Unfortunately I didn't get a copy of the actual presentation.
- Each collision requires >1500 pixel shader cycles, and ~100 texture fetches
- Uses CPU for collision detection, GPU for integrating positions/velocities and resolving collisions
- Said collisions tend to be a sparse data set; need to either pixel gather (use dynamic branching) or vertex scatter (render single pixel primitives at particular co-ordinates); which one is better depends on matrix size and distribution of collisions
- Showed a graph of performance vs. collision matrix size, which stair-stepped with every 1k increase in matrix size (appeared to illustrate the effect of thread size for dynamic branching)
And a couple more notes from Havok's GDC presentation on Havok FX:
- Makes collision physics a data parallel task by grouping collisions into sets of unrelated pairs that can be processed in parallel (usually 1000s of pairs per batch), then iterating until all pairs have been processed
- Position/velocity integration is 100% data parallel, collision detection is 70% data parallel, and collision solving is 99% data parallel
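The presentation notes do not say how Havok FX forms its batches, so the following is only a guess at the general idea: greedily place each colliding pair into the first batch in which neither body already appears, so that every pair within a batch touches different bodies and can be solved in parallel. The function name is hypothetical.

```python
def batch_independent_pairs(pairs):
    """Group (body_a, body_b) pairs into batches with no shared bodies."""
    batches = []  # each entry: (list of pairs, set of bodies already used)
    for a, b in pairs:
        for batch_pairs, used in batches:
            if a not in used and b not in used:
                batch_pairs.append((a, b))
                used.update((a, b))
                break
        else:
            # No existing batch is free of both bodies, so start a new one.
            batches.append(([(a, b)], {a, b}))
    return [batch_pairs for batch_pairs, _ in batches]

pairs = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5)]
print(batch_independent_pairs(pairs))
# [[(0, 1), (2, 3), (4, 5)], [(1, 2), (0, 3)]]
```

Each batch could then be issued as one data-parallel pass, with the passes run one after another until all pairs are processed, which matches the "iterating until all pairs have been processed" note above.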
And finally, a link to ATI's white paper on asymmetric physics processing which gives some insight into their approach:
http://www.ati.com/technology/crossfire/promotions/physics/Asymmetric_Physics_Processing_with_ATI_CrossFire.pdf
Richard
12-Jun-2006, 19:37
100 texture fetches? Is that correct?
This doesn't contain as much detail as we might hope:
http://download.nvidia.com/developer/presentations/2006/gdc/2006-GDC-NVIDIA-Havok_FX.pdf
More struggles with collision capabilities:
http://www.continuousphysics.com/Bullet/phpBB2/viewtopic.php?t=321&sid=daea1fabd4eaaebfb605d18f8f162a60
Jawed
Very cool zed. Thanks for the links. A couple of questions if you don't mind. You mentioned somewhere that the asteroids would blow up from point of impact. Did you get that to happen?
Is it right to say that each asteroid is unique? and a gameplay question: What is happening when the ship parks near a roid and the green light comes on... mining?
cheers
well the videos dont have much relation to what the game is like now, see gameplay footage here
http://www.filefactory.com/?35b6ab (no music, btw im rendering about 200k polygons/frame )
i've had to change it from a 3d shooter where u can move up + down, to a more traditional 2d one where everything happens on the xz plane. the reason for this change was it was too difficult to play, to aim with a 3rd person view is practically impossible to do fast + accurately, the only solution was to make a firstperson camera viewpoint (which didnt appeal to me, as i wanted more of an oldskool feel) or implement some autoaim (which i did have in the videos u may notice) personally though i hated autoaim, as it dumbed down the game too much and aint really in the spirit of a shoot-em-up.
asteroid - i dont know about blowing apart from the point of impact, the explosions aint realistic, im just doing the game 'asteroids' approach: replace a larger asteroid with 3 smaller ones at the place of impact
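A guess at what that 'asteroids' style replacement could look like in code, not zed's actual implementation. The volume-preserving child radius, the outward `kick` speed and the dictionary layout are all assumptions; the pieces spread out on the xz plane as described above.

```python
import math

def split_asteroid(impact_point, velocity, radius, pieces=3, kick=1.0):
    """Replace one asteroid with `pieces` smaller ones around the impact point."""
    # Assumption: the children together keep the parent's volume.
    child_radius = radius / pieces ** (1.0 / 3.0)
    children = []
    for k in range(pieces):
        angle = 2.0 * math.pi * k / pieces
        dx, dz = math.cos(angle), math.sin(angle)  # evenly spaced on the xz plane
        children.append({
            "position": (impact_point[0] + dx * child_radius,
                         impact_point[1],
                         impact_point[2] + dz * child_radius),
            # Kicks point outward and cancel out, so total momentum is unchanged.
            "velocity": (velocity[0] + dx * kick,
                         velocity[1],
                         velocity[2] + dz * kick),
            "radius": child_radius,
        })
    return children

for child in split_asteroid((10.0, 0.0, 5.0), (2.0, 0.0, 0.0), radius=3.0):
    print(child)
```

Because nothing here depends on where or how hard the asteroid was hit, the cost per split is fixed, which is presumably part of the appeal over a "realistic" fracture.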
also the asteroids are procedurally generated so they can be unique but im just creating a pool of 16 different types, for speed reasons
After this job offer there was nothing more to hear about this project. This is very unlikely.
ExtremeTech, [H], Xbit, and Techpowerup have headlines about this, with none of them bothering to notice the job listing is nearly a year old now. :lol:
http://www.extremetech.com/article2/0,1558,1979051,00.asp
http://www.hardocp.com/news.html?news=MTk2NTEsLCxobmV3cywsLDE=
http://www.xbitlabs.com/news/multimedia/display/20060620235215.html
http://www.techpowerup.com/index.php?13408
Demirug
21-Jun-2006, 14:32
Looks like everybody is searching for something to fill the summer hole.
In this case I should tell you something new, but I need to refer to the job listing too. You may remember that they want somebody who has experience with HLSL. HLSL is the one and only shader language for D3D10. Maybe you have heard about the GPGPU sample in the D3D10 part of the SDK. One interesting part of this sample is that it creates a headless device, a new feature that allows using a GPU without outputting anything to a window or screen. So far nothing new at all, but after a short question to Microsoft I now know that the spec allows writing D3D10 drivers for non-GPU devices. Any company can take a strong math coprocessor, put it on a card, and write a D3D10 driver for it to make its power usable through the D3D10 API. This could make DirectPhysics a D3D10 extension.
Demi, did you just suggest that PhysX might be compatible with a new MS DirectPhysics? Sometimes I think you and Ail could sit next to each other and rule the world without anyone else catching 10% of your meaning. :wink:
DailyTech has a piece too this morning, but they went up a notch in my (not too high) estimation, as at least they pointed at the fact that the listing is nearly a year old.
Having said that. . .if I was the suspicious type (who, me?), I might wonder if there's a hand in the background somewhere that thought it might be useful to get this some more visibility right now even though they didn't want to show any new facts --but that doesn't mean there aren't some new facts somewhere. . .
Demirug
21-Jun-2006, 15:47
Demi, did you just suggest that PhysX might be compatible with a new MS DirectPhysics?
This is something that is not impossible at all.
At least Microsoft will allow something like this. But I have to add that the person at Microsoft also told me that he is not aware of anybody outside the GPU business currently working on a D3D10 driver for their hardware. This means that the D3D10 driver path for non-GPU devices is not tested and may not work correctly in the current version of Vista/D3D10.
Otto Dafe
22-Jun-2006, 21:34
Anyone see much chance this'll work with IGPs someday? It would be nice to get some bonus from otherwise wasted silicon, I don't know, perhaps that could even motivate people using IGPs to get AIBs.
Otto Dafe
22-Jun-2006, 21:35
Oh btw, hi everyone, long time listener, first time caller :smile:
Anyone see much chance this'll work with IGPs someday? It would be nice to get some bonus from otherwise wasted silicon, I don't know, perhaps that could even motivate people using IGPs to get AIBs.
"Someday" is so open-ended. ;) At the moment, ATI isn't going below X1600 for physics support, and their current IGPs are all below that.
With next year's DX10 IGP part? Well, could be I suppose. . . I could see that being a "decelerator" tho, depending on the viddy card it was paired with. . .
Otto Dafe
22-Jun-2006, 23:35
With next year's DX10 IGP part?
Yes, with Havokfx only requiring SM 3 it seems any DX10 IGP (or Aero capable, maybe?) would be more than sufficient...
. . . I could see that being a "decelerator" tho, depending on the viddy card it was paired with. . .
notwithstanding. :wink: I would expect some acceleration out of a 4 pixel pipe part from one of the big 2, but I guess only if it had batch sizes/ dyn branching/ threading reasonably in line with its bigger cousins, which probably puts someday quite a few years back. Would anyone hazard a guess as to when we'd see something X1600ish in an IGP?
Would anyone hazard a guess as to when we'd see something X1600ish in an IGP?
Speed-wise, next year for sure. Maybe even end of this year.
Speed-wise, next year for sure. Maybe even end of this year.
From who?
From who?
All of them. We already have X1300-level stuff, hell even the S3 integrated 4-piper is on that level with much better video to boot (EDIT: sucks in 3D as just measured). So just by logical extrapolation, the additional 30-50% speed till next year sounds realistic, don't you think?
Well, I haven't noticed that IGP scales at the same rate/frequency as the top, alas. Of course, I'd prefer that you be right --I've been generally concerned that the gap between the bottom and the top is not a good thing. And Vista would certainly be an opportunity to address it.
ATI's current IGP is a two-piper, not a 4, and that's before you even get into the 3-1 shader thingy with X1600. I notice their next IGP is still described as "X700" core, which still doesn't give a lot of comfort for getting to X1600-level. Will their DX10 part, scheduled for next year, get to that level? I'd like to think so, but aren't prepared to put it in my pocket yet. But that one is certainly "next year", so I was curious who you had in mind for end of this year for that level.
Has anyone benched the S3 one yet?
I was curious who you had in mind for end of this year for that level.
Has anyone benched the S3 one yet?
Actually, I (completely blue-eyed, I admit) expect the upcoming Intel part to be at about X1600 level in general, speed-wise.
I never bothered installing any games on my HTPC besides Rayman and MDK, but if I get around to it I'll post the results.
I did compare IQ and (perceived) speed when playing videos. IQ-wise, it beats the shite out of both GF5xxx and ATI X8xx. It seemed to run a bit smoother as well, but I haven't measured it.
EDIT: in 3D, it runs slower than GF2, it seems. UT2k3 in 800x600 default runs anywhere between <10 and teens. 3DMark 2001SE says ~1350, which is absolutely awful. So it's a no-go for games, correcting my blabber above :oops:
Otto Dafe
23-Jun-2006, 22:41
I figure ATI is a bit reluctant to trickle their higher tech down right now, as their gradual progress towards unified shaders in the high-end comes with considerable overhead--overhead that doesn't really justify itself in the low-end where there aren't hungry pipes to feed. Come Vista though, everyone's gonna have to have top-to-bottom feature compliance, or at least a hacked-in attempt (whatever MS lets them slide with). I don't think the moogles will be too happy if they get their new Vista-ready Dell home to be greeted with a dialog box, "your desktop could look prettier, cheapskate. Please wait while windows reverts to win2k mode."
The interesting possibility I see here is that, in the anarchic wasteland that is HW physics support, a dev could at least assume that any GPU that isn't pushing VGA could pick up some physics slack, and I bet there are a large number of dormant IGPs out there. I wonder then what the low end requirements are for a GPU to be more than a physics decelerator, a la geo's comment. The price of admission is what--one frame of latency (possibly 2?), the cpu time to set up the data, and a round trip on the bus (probably nominal?).
http://www.beyond3d.com/forum/showthread.php?t=31698
In the CC today, Orton opined that physics starts to move significant CrossFire volumes in "9-12 months".
What do we suppose he had in mind there as a driver event 9-12 months from now? Something API related?
I guess within 9-12 months we'll actually see some games with Havok FX support