Old 26-Mar-2009, 20:28   #51
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 8,432
Default

Quote:
Originally Posted by aaronspink View Post
OpenCL is an open standard defined and maintained out in the open by a group of industry representatives, one of which is AMD?
Wooo, the point is in the other direction. But I'm sure you missed it on purpose. DegustatoR asked why Nvidia should be responsible for developing the CAL backend... not whether it makes sense for AMD to support a competitor's standard.

Quote:
Originally Posted by Jawed View Post
If reading 32-bits was optimal then there'd be no need to offer larger than 32-bit fetches, per thread...
In the general case it should be.

Quote:
Whether those vector fetches are for multiple data points or for multiple work items (or both) is just a detail, something the developer needs to futz with.
That's actually the basis of my admiration for Nvidia's approach. If you are forced to futz with multiple work items per thread because of hardware configuration and not because the algorithm calls for it then it's a lot more than "just a detail". It's a major PITA. As you pointed out there will be specific cases where you need to manage this stuff more closely but I don't think those corner cases are sufficient as a basis for general best practices.
Old 26-Mar-2009, 21:01   #52
mhouston
System Architect, AMD
 
Join Date: Oct 2005
Location: Santa Clara
Posts: 317
Default

Quote:
Originally Posted by Davros View Post
Mike, I just wondered how you see the future of GPU physics. Do you think there will be a vendor war, or will vendor-specific physics give way to a neutral system, DX11 compute shaders maybe?
I speak only for myself on this one, not AMD. I think that GPU physics cannot succeed unless there is a neutral way to run on multiple platforms. (All of the physics engines run on the CPU, but we need a way to target GPUs and other architectures as well.) Basing physics, and other middleware, on OpenCL, DX11, or another vendor-neutral standard seems like the best way forward IMO. OpenCL has the potential advantage that multi-core CPUs, Cell, and other architectures can be supported under the same system and code. (Tuning will be different for each architecture, but getting something up and running should work if we got the conformance tests right.)

Coming up with a "standard" physics package is tricky because there is a lot of religion in how the solvers are implemented, i.e. there is no "one solver to rule them all". Also, getting things to run well on a GPU or a large multi-core will need exploration into algorithms that map well to massively parallel systems, and the APIs need to be designed to batch smaller primitives together to "bulk" up the submission to a parallel system. (If any grad students/developers are reading this and are interested in researching this, drop me a note.)
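
To make the batching point concrete, here is a minimal Python sketch. It is not the Havok API or any real OpenCL runtime call: `BatchingQueue`, `min_batch` and the `submissions` list are hypothetical names, and appending to `submissions` merely stands in for one bulk launch on a parallel device.

```python
from dataclasses import dataclass, field


@dataclass
class BatchingQueue:
    """Hypothetical helper: collect small work items, submit them in bulk."""
    min_batch: int                                    # items needed before a launch
    pending: list = field(default_factory=list)       # items waiting for a launch
    submissions: list = field(default_factory=list)   # stand-in for device launches

    def add(self, item):
        """Queue one small primitive; launch once enough have accumulated."""
        self.pending.append(item)
        if len(self.pending) >= self.min_batch:
            self.flush()

    def flush(self):
        """Submit whatever is pending as a single bulk launch."""
        if self.pending:
            self.submissions.append(list(self.pending))
            self.pending.clear()


queue = BatchingQueue(min_batch=4)
for cloth_patch in range(10):   # ten small primitives arrive one at a time
    queue.add(cloth_patch)
queue.flush()                   # push out the remainder at the end of the frame

batch_sizes = [len(batch) for batch in queue.submissions]
```

Ten small primitives end up as three submissions rather than ten, which is the kind of "bulking up" an API for a massively parallel system would want to encourage.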
Old 26-Mar-2009, 21:30   #53
Arnold Beckenbauer
Member
 
Join Date: Oct 2006
Location: Germany
Posts: 791
Default

Mike, who wrote the OpenCL code for Havok Cloth: you (AMD) or Havok?
__________________
Hail Brothers and Sisters! Coranon Silaria, Ozoo Mahoke
Eta Kooram Nah Smech!

Find Chuck Norris.
Old 26-Mar-2009, 22:53   #54
mhouston
System Architect, AMD
 
Join Date: Oct 2005
Location: Santa Clara
Posts: 317
Default

AMD did the porting for the initial demos for GDC. We took the C functions that underpin the Havok API and ported them to OpenCL, i.e. some runtime OCL code and then the compute loops turned into OCL kernels.
Old 27-Mar-2009, 00:12   #55
Arnold Beckenbauer
Member
 
Join Date: Oct 2006
Location: Germany
Posts: 791
Default

Quote:
Originally Posted by trinibwoy View Post
...
Are they outsourcing their CAL backend for OpenCL too?
I presume, yes?
Quote:
How is that any different (technically) from making one for CUDA?
What is CUDA? A "thing" that allows you to get low-level access to Nvidia's hardware?
Quote:
Does Microsoft write AMD's DirectX driver?
Bad joke.
I'll try to put the same thoughts in other words:
What's holding Nvidia back from making PhysX run on Radeons? AMD? No, everyone can download the ATi Stream SDK. OK, its documentation is not as good as the CUDA documentation.
What's next? The Stream SDK itself? Possible. Or the red hardware?

But, who knows: Maybe Nvidia has ported PhysX to OpenCL, too, so it's merely a matter of time.

----------------------------
Thanks, Mike.
__________________
Hail Brothers and Sisters! Coranon Silaria, Ozoo Mahoke
Eta Kooram Nah Smech!

Find Chuck Norris.
Old 27-Mar-2009, 00:30   #56
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 8,432
Default

Quote:
Originally Posted by Arnold Beckenbauer View Post
I presume, yes?
You got me there. To whom?

Quote:
What is CUDA? A "thing" that allows you to get low-level access to Nvidia's hardware?
If that's your definition then OpenCL is no different.

Quote:
What's holding Nvidia back from making PhysX run on Radeons? AMD? No, everyone can download the ATi Stream SDK. OK, its documentation is not as good as the CUDA documentation.
Not following you. Does Microsoft make DirectX run on Radeons? Why should Nvidia make PhysX run on Radeons? It's up to AMD to support whichever APIs they deem necessary to support. Obviously their unwillingness to support CUDA is driven by competitive and strategic factors and not any inherent technical limitation. That should be pretty obvious.
Old 27-Mar-2009, 00:42   #57
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,135
Default

Quote:
Originally Posted by trinibwoy View Post
In the general case it should be.
Really, why?

In the example given in the programming guide, 2 fetches of 128 bits are more efficient than 5 fetches of 32 bits, even though the former case wastes 96 bits. Even though the 32-bit fetches are coalesced.
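
The bit counts in that example can be checked with a few lines of Python. This is only bookkeeping, assuming each thread needs five 32-bit values (inferred from the "5 fetches of 32 bits"); it says nothing about why the hardware prefers the wider fetches, and the constant names are made up for illustration.

```python
USEFUL_WORDS = 5     # assumed: five 32-bit values needed per thread
WORD_BITS = 32
VECTOR_BITS = 128    # width of one vector fetch

useful_bits = USEFUL_WORDS * WORD_BITS             # data the thread actually wants
vector_fetches = -(-useful_bits // VECTOR_BITS)    # ceiling division
fetched_bits = vector_fetches * VECTOR_BITS        # data moved by the vector fetches
wasted_bits = fetched_bits - useful_bits           # padding that is fetched but unused
```

So the two 128-bit fetches move 256 bits for 160 bits of payload, which is where the 96 wasted bits come from.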

Quote:
That's actually the basis of my admiration for Nvidia's approach.
Did you read the paper I linked in the other thread? That is an embarrassingly parallel kernel that performs embarrassingly badly until carefully vectorised and unrolled. ~5 GFLOPs.

Quote:
If you are forced to futz with multiple work items per thread because of hardware configuration and not because the algorithm calls for it then it's a lot more than "just a detail". It's a major PITA. As you pointed out there will be specific cases where you need to manage this stuff more closely but I don't think those corner cases are sufficient as a basis for general best practices.
You're always forced to futz according to hardware configuration (apart from vectorisation, little questions like "registers per thread") - then take a view on the variety of hardware you're targeting (like, d'oh, test it!). The difference in performance is normally unignorable.

Kernels that auto-tune to the hardware they find themselves on, by evaluating these various optimisation dimensions, are pretty neat.
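
A minimal sketch of that auto-tuning idea, in Python for brevity: try every combination of a few tuning parameters and keep the cheapest. The parameter names (`vector_width`, `unroll`) and `toy_cost` are assumptions made up for illustration; a real tuner would replace `toy_cost` with timing the actual kernel on the device it finds itself on.

```python
from itertools import product


def autotune(search_space, measure):
    """Exhaustively evaluate every configuration and return the cheapest one."""
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        cost = measure(config)          # e.g. measured kernel run time
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


def toy_cost(config):
    """Made-up cost model standing in for a real kernel timing."""
    return abs(config["vector_width"] - 4) + abs(config["unroll"] - 2) + 1.0


space = {"vector_width": [1, 2, 4], "unroll": [1, 2, 4, 8]}
best_config, best_cost = autotune(space, toy_cost)
```

Exhaustive search is only reasonable for a handful of dimensions; with more of them you would sample or prune the space instead.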

---

It'll be interesting to see if Havok takes ownership of the OpenCL code that's been implemented. It seems to me it's in their interest to ensure there's some modicum of evenness in the playing field.

Jawed
__________________
Sweet-spot + tick-tock = monster
Old 27-Mar-2009, 00:53   #58
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,135
Default

Quote:
Originally Posted by trinibwoy View Post
Why should Nvidia make PhysX run on Radeons?
Like Ageia before it, NVidia's the physics API underdog. Why did NVidia implement it on Cell, Wii and XBox360 (or if you prefer, continues to provide support)? I'm curious to see if NVidia will produce an OpenCL version.

Jawed
__________________
Sweet-spot + tick-tock = monster
Old 27-Mar-2009, 01:32   #59
Silent_Buddha
Regular
 
Join Date: Mar 2007
Posts: 5,115
Default

Quote:
Originally Posted by mhouston View Post
AMD did the porting for the initial demos for GDC. We took the C functions that underpin the Havok API and ported them to OpenCL, i.e. some runtime OCL code and then the compute loops turned into OCL kernels.
Ah so from the sounds of it, Havok hasn't done much specifically with regards to acceleration on GPU...

But rather, AMD has taken the initiative and ported some Havok functions to be usable with GPU acceleration through OpenCL?

Or is it a bit more involved than that?

Regards,
SB
Old 27-Mar-2009, 01:38   #60
Silent_Buddha
Regular
 
Join Date: Mar 2007
Posts: 5,115
Default

Quote:
Originally Posted by DegustatoR View Post
So it's OpenCL and will run on any hardware. So now NV have Havok in addition to PhysX. And I still don't understand why it's better for AMD to not have PhysX support when NV will support both Havok and PhysX...
Or am I missing something?
Considering PhysX has made virtually no effort at all to optimise in any way for CPUs, it's a wonder any vendors would even consider using it over Havok. Of course, it is accelerated on the GPU, and that actually gives devs a choice, currently: Havok, which is well optimized to take advantage of CPUs but has nothing with regards to GPUs, or PhysX, which is optimized to take advantage of Nvidia GPUs but offers virtually nothing for CPUs and other vendors' GPUs.

If Havok also supports GPUs in the future, that removes any relevance PhysX would have for developers, as they would get not only superb CPU support but also GPU support.

Or they could stick with PhysX which has superb GPU support for one vendor and CPU support isn't even as good as an afterthought.

The closest historical analogy I can think of for PhysX would be 3dfx's Glide API. Except PhysX isn't nearly as well supported or widely used.

As such I would expect it to die a quick death if a vendor agnostic solution is made available.

Regards,
SB
Old 27-Mar-2009, 01:52   #61
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 8,432
Default

Quote:
Originally Posted by Jawed View Post
Like Ageia before it, NVidia's the physics API underdog. Why did NVidia implement it on Cell, Wii and XBox360 (or if you prefer, continues to provide support)? I'm curious to see if NVidia will produce an OpenCL version.

Jawed
Competitive advantage on stable platforms. Supporting AMD GPUs doesn't help with that. Sure it would accelerate adoption on the PC but for what? The tech is in a nascent stage so adoption rate would be slow regardless. And what happens when AMD rolls out new hardware and deprecates CAL as we know it?

SB, I'm pretty sure Nvidia will only push CUDA PhysX as long as they have the advantage. The moment there is real competition they will port to OpenCL or die. Unless their stuff is faster on the new API, in which case they won't care. Bottom line is moving hardware after all.
Old 27-Mar-2009, 02:07   #62
aaronspink
Senior Member
 
Join Date: Jun 2003
Posts: 2,030
Default

Quote:
Originally Posted by trinibwoy View Post
If that's your definition then OpenCL is no different.
Oh, really, there is a big difference. What is CUDA? It's just a proprietary driver interface.

What is OpenCL? It's an industry-standard programming language/API with all major CPU, GPU, and computational hardware and software vendors behind it.



Quote:
Not following you. Does Microsoft make DirectX run on Radeons? Why should Nvidia make PhysX run on Radeons? It's up to AMD to support whichever APIs they deem necessary to support. Obviously their unwillingness to support CUDA is driven by competitive and strategic factors and not any inherent technical limitation. That should be pretty obvious.
If they want industry support, Nvidia should want PhysX running on all hardware available. PhysX <> D3D. D3D has a clearly defined API and requirements handled by a combination of consortium and consultation with all the software and hardware developers.

And so what API would ATI support? PhysX is a moving target, and its interfaces to the hardware are a moving target. It makes no sense for anyone to support PhysX until such a time that the hardware interfaces are open and standard.

As far as technical limitations go, there are numerous ones. Unless the interfaces are all standardized there are no interfaces. For all intents and purposes, PhysX might as well just be using the register interfaces of the Nvidia hardware. Who's to say that PhysX doesn't use proprietary interfaces to run on GPUs? Who's to say that the next revision won't?

The other question is financial: why support a proprietary product when there is another product, being implemented to open industry standards, with a larger market share and quite honestly better capabilities?
__________________
Aaron Spink
speaking for myself inc.
Old 27-Mar-2009, 02:12   #63
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 8,432
Default

Quote:
Originally Posted by Jawed View Post
In the example given in the programming guide, 2 fetches of 128 bits are more efficient than 5 fetches of 32 bits, even though the former case wastes 96 bits. Even though the 32-bit fetches are coalesced.
It's only more efficient if you're dead set on using that data structure. That approach would take 32 coalesced 128-bit reads per half-warp. On the other hand scalar reads would only require 20 reads if read from 5 scalar arrays (assuming they're nicely bank aligned). Granted it may make sense to use the struct to define a particular data structure but then you're not really forcing the issue.
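
To illustrate the two layouts being compared, here is a small Python sketch of the byte offsets each thread would touch. It assumes a half-warp of 16 threads, five 32-bit fields per element, and a struct padded to two 128-bit fetches (32 bytes); the function names are hypothetical. It does not model how the hardware groups accesses into memory transactions, so it does not try to reproduce the 32 and 20 read counts above.

```python
HALF_WARP = 16             # threads in a half-warp
FIELDS = 5                 # 32-bit values per element
WORD_BYTES = 4
PADDED_STRUCT_BYTES = 32   # struct padded out to 2 x 128 bits


def aos_offset(thread, field_index):
    """Byte offset of one field in an array of padded structs."""
    return thread * PADDED_STRUCT_BYTES + field_index * WORD_BYTES


def soa_offset(thread, field_index, n_elements):
    """Byte offset of one field when each field lives in its own scalar array."""
    return (field_index * n_elements + thread) * WORD_BYTES


# Distance in memory between what neighbouring threads read for the same field.
aos_stride = aos_offset(1, 0) - aos_offset(0, 0)
soa_stride = soa_offset(1, 0, HALF_WARP) - soa_offset(0, 0, HALF_WARP)

# Bytes touched by one half-warp reading every field of its elements.
aos_bytes = HALF_WARP * PADDED_STRUCT_BYTES
soa_bytes = HALF_WARP * FIELDS * WORD_BYTES
```

With separate scalar arrays, neighbouring threads read neighbouring 4-byte words and no padding is moved; with the padded struct, each thread's data sits 32 bytes apart and the half-warp touches 512 bytes instead of 320.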

Quote:
Did you read the paper I linked in the other thread? That is an embarrassingly parallel kernel that performs embarrassingly badly until carefully vectorised and unrolled. ~5 GFLOPs.
Not yet but I will. What was the problem that vectorization solved?

Quote:
You're always forced to futz according to hardware configuration (apart from vectorisation, little questions like "registers per thread") - then take a view on the variety of hardware you're targeting (like, d'oh, test it!). The difference in performance is normally unignorable.
Well yeah, but the less futzing about the better, no? Hopefully devs will need to spend less time mucking around with memory management as the architectures mature.
Old 27-Mar-2009, 02:30   #64
mhouston
System Architect, AMD
 
Join Date: Oct 2005
Location: Santa Clara
Posts: 317
Default

Quote:
Originally Posted by Silent_Buddha View Post

Or is it a bit more involved than that?
It's always more complicated than that.
Old 27-Mar-2009, 03:10   #65
mao5
Member
 
Join Date: Apr 2004
Location: Nanjing, CHINA
Posts: 249
Default

mhouston, I just wonder when AMD will release the Red Woman Cloth demo for 48xx users. What OS will it run on? Windows Vista and a downloaded OpenCL API? Or something else?
Old 27-Mar-2009, 03:18   #66
green.pixel
Member
 
Join Date: Dec 2008
Location: Europe
Posts: 536
Default

Quote:
Originally Posted by mao5 View Post
mhouston, I just wonder when will AMD release the Red Woman Cloth demo for 48xx users?
I wonder the same about the Cinema 2.0 demos.
Old 27-Mar-2009, 03:42   #67
mhouston
System Architect, AMD
 
Join Date: Oct 2005
Location: Santa Clara
Posts: 317
Default

To release the OpenCL demos, we first need to get through the Khronos OpenCL conformance tests (it's a pretty beefy test suite) and ship a driver and OpenCL runtime. What was shown at GDC was running on an alpha implementation that is not yet fully compliant.

Last edited by mhouston; 27-Mar-2009 at 04:13.
Old 27-Mar-2009, 04:11   #68
mao5
Member
 
Join Date: Apr 2004
Location: Nanjing, CHINA
Posts: 249
Icon Rolleyes

Quote:
Originally Posted by mhouston View Post
To release the OpenCL demos, we first need to get through the Khronos OpenCL conformance tests (it's a pretty beefy test suite) and ship a driver and OpenCL runtime. What was shown at GDC was running on an alpha implementation that is not yet fully compliant.
How long will it take? I haven't downloaded the RV770 Ruby demo yet.
Old 27-Mar-2009, 06:50   #69
CarstenS
Just wondering
 
Join Date: May 2002
Location: Germany
Posts: 1,682
Default

Quote:
Originally Posted by m.fox View Post
I wonder the same about the Cinema 2.0 demos.
That was already answered over here:
http://forum.beyond3d.com/showpost.p...27&postcount=3

Seems to be a conflict with third-party IP (read: the demo belongs to OTOY).
__________________
English is not my native tongue. Before being too nitpicky about my choice of words please consider the possibility that I did not mean to say what you might have read into them and inquire before flaming.
Old 27-Mar-2009, 07:31   #70
DegustatoR
Senior Member
 
Join Date: Mar 2002
Location: msk.ru/spb.ru
Posts: 1,255
Default

Quote:
Originally Posted by Arnold Beckenbauer View Post
Because it's not AMD's job to make a CAL solver for PhysX.
Whose job is it to make something work on AMD's GPUs? NVIDIA's? AMD is shutting itself out of PhysX and CUDA, not NVIDIA. Last time this question was raised, NV looked ready to help AMD implement a CUDA backend on their GPUs.

Quote:
Originally Posted by Arnold Beckenbauer View Post
But, who knows: Maybe Nvidia has ported PhysX to OpenCL, too, so it's merely a matter of time.
NV doesn't have any reason to do this. They support GPU acceleration in PhysX, and they'll support it in Havok. Why would they spend time and money converting PhysX to OpenCL?
Plus it's not only PhysX that uses CUDA. And most CUDA programs aren't made by NV. Sure, they'll all be ported to OpenCL or DXCS sooner or later, but until that happens maybe AMD should think again about supporting CUDA?

Quote:
Originally Posted by Silent_Buddha View Post
If Havok also supports GPUs in the future, that removes any relevance PhysX would have for developers, as they would get not only superb CPU support but also GPU support.
What you shouldn't forget is that Havok is exactly the same as PhysX now: it's wholly owned Intel technology. I for one wouldn't be surprised if Havok, at the time of the LRB release, drops OpenCL (or just stops developing this solver any further) and switches to LRB Native. I think it's a given that Intel will optimize Havok GPGPU acceleration for LRB first, everything else later. It's exactly the same as with PhysX. So why does AMD support Havok but not want to support PhysX? Intel is a much bigger threat to them than NV.
Old 27-Mar-2009, 07:42   #71
CarstenS
Just wondering
 
Join Date: May 2002
Location: Germany
Posts: 1,682
Default

Quote:
Originally Posted by DegustatoR View Post
Whose job is it to make something work on AMD's GPUs? NVIDIA's? AMD is shutting itself out of PhysX and CUDA, not NVIDIA. Last time this question was raised, NV looked ready to help AMD implement a CUDA backend on their GPUs.
Maybe the CPU-part of Havok is more optimized for multicore-CPUs than Physx'?
__________________
English is not my native tongue. Before being too nitpicky about my choice of words please consider the possibility that I did not mean to say what you might have read into them and inquire before flaming.

Last edited by CarstenS; 27-Mar-2009 at 08:50. Reason: added quote
Old 27-Mar-2009, 08:00   #72
AlexV
Administrator
 
Join Date: Mar 2005
Posts: 1,897
Default

Quote:
Originally Posted by CarstenS View Post
Maybe the CPU-part of Havok is more optimized for multicore-CPUs than Physx'?
That's not saying much, really:P
__________________
A wise man commenting about a popular hero of the peoplez: that dude is so fucking ignorant, he wouldn't know if he was getting assraped by a baboon
Old 27-Mar-2009, 08:09   #73
neliz
Senile Member
 
Join Date: Mar 2005
Location: In the know
Posts: 3,860
Default

Quote:
Originally Posted by CarstenS View Post
Maybe the CPU-part of Havok is more optimized for multicore-CPUs than Physx'?
PhysX was never meant to run properly on a CPU, and even PhysX on a GPU is not a fully functional replacement for the original PPU.
__________________
My views and opinions are my own and do not necessarily represent those of my employer!
Old 27-Mar-2009, 09:46   #74
doob
Member
 
Join Date: May 2005
Posts: 282
Default

Quote:
Originally Posted by trinibwoy View Post
Nvidia fights the red dress with the blue coat - http://www.youtube.com/watch?v=B3BA4...e=channel_page

Red dress wins IMO.

This particle demo is pretty cool too though: http://www.youtube.com/watch?v=RuZQp...e=channel_page
The blue coat demo is quite unimpressive. I think it comes down mainly to artistic talent/implementation rather than CUDA/OpenCL/CAL API or HW limitations.
I came across this one-year-old Nurien software demo running on PhysX (skip to 5:20 to watch/hear more details). It simulates not only the dress physics but also the hair, although neither the video nor the presentation itself focuses much on it, and despite being a year old it looks and seems to be doing more than what the blue coat demo achieved.

But I agree AMD's red dress demo looks better and more accurate, even compared with this old find.

*Nurien seemed to be targeted as a social interactive game of sorts, available only in S. Korea.

Edit: D'oh, it's even available among Nvidia's PhysX demos.
Old 27-Mar-2009, 10:05   #75
MDolenc
Member
 
Join Date: May 2002
Location: Slovenia
Posts: 381
Default

Has anyone here who is talking about how PhysX is unoptimized for the CPU actually developed anything using that library and, as a bonus, compared that to Havok?
What standard GPGPU platform was available to implement PhysX on when NV bought Ageia? Do you really think that a D3D implementation would be more portable than CUDA is (hint: "non standard use of API")?
OpenCL is coming out now. Both NV and ATI are yet to ship their OpenCL drivers and runtimes. Which puts OpenCL at about the stage where CUDA was three years ago.