Old 27-May-2013, 22:48   #26
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Apparently AMD graphics drivers require the objects that are to be shared to be created after both the GL and CL contexts have been made.
I've done this in v1.02 but cannot test it.
moozoo is offline   Reply With Quote
Old 27-May-2013, 23:45   #27
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,906
Default

Sorry, no difference with my HD 6950.
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog
Alexko is offline   Reply With Quote
Old 28-May-2013, 02:14   #28
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default


Thanks Alexko
moozoo is offline   Reply With Quote
Old 28-May-2013, 19:47   #29
Davros
Senior Member
 
Join Date: Jun 2004
Posts: 11,075
Default

"Apparently amd graphics drivers require the objects to be shared to be created after both the gl and cl contexts have been made"

But other OpenCL drivers don't?
What's the correct behaviour?
Have you submitted a bug report?
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Davros is offline   Reply With Quote
Old 29-May-2013, 07:22   #30
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default AMD APP SDK Release Notes

It is in the AMD APP SDK Release Notes for 2.8

From those notes
-------------
For OpenGL interoperability with OpenCL, there currently is a requirement on when the
OpenCL context is created and when texture/buffer shared allocations can be made. To use
shared resources, the OpenGL application must create an OpenGL context and then an
OpenCL context. All resources (GL buffers and textures) created after creation of the OpenCL
context can be shared between OpenGL and OpenCL. If resources are allocated before the
OpenCL context creation, they cannot be shared between OpenGL and OpenCL.
--------------------

Until now I've only been reading the OpenCL spec, looking at other example code, and debugging against the platforms I have access to.

I believe the 1.02 version of my program now does this as per AMD's requirement, so there must be something else going wrong.

The OpenCL compile error related to
#pragma OPENCL EXTENSION cl_khr_gl_sharing : enable
is a driver bug, see:
http://devgurus.amd.com/thread/155539
moozoo is offline   Reply With Quote
Old 31-May-2013, 15:22   #31
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Uploaded v1.03.

More stumbling in the dark to get AMD cards working.
moozoo is offline   Reply With Quote
Old 31-May-2013, 15:33   #32
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,160
Default

Quote:
Originally Posted by moozoo View Post
uploaded v1.03

More stumbling in the dark to get AMD cards working.
It's better, as it shows all the asteroids straight after opening, but it is still not updating the screen when running the simulation.
Lightman is offline   Reply With Quote
Old 06-Jun-2013, 16:43   #33
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default 1.04

version 1.04

Lots of changes
1) converts double4 to glfloat4 for display (calculations are still all in double)
2) rewrote device selection logic
3) cleaned up error handling
4) added frames per second
5) turned off vsync and runs during idle as well as per interval time
6) changed relevant kernels to use fma instruction
7) might work with cl_amd_fp64-only devices (again, I have no access to one)
It is much faster now.
I'm getting 220 fps on the default options.
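
A rough sketch of what item 1 amounts to, assuming each body's position is stored as four doubles and only the copy handed to OpenGL is narrowed to 32-bit floats. The function name `to_display_floats` and the sample position are made up for illustration; the real program presumably does this in an OpenCL kernel rather than on the host.

```python
import struct

def to_display_floats(pos_double4):
    """Narrow one double4 (x, y, z, w) to four 32-bit floats for rendering."""
    packed = struct.pack("<4f", *pos_double4)   # rounds each value to float32
    return struct.unpack("<4f", packed)

# The simulation state stays in double precision (metres, illustrative value only).
earth = (149_597_870_700.0, 0.0, 0.0, 1.0)
shown = to_display_floats(earth)

# float32 has a 24-bit significand, so near 1.5e11 m the representable
# values are 16384 m apart. Fine for drawing, not for integrating orbits.
print(abs(shown[0] - earth[0]))
```

The point of the split is that the rounding only ever touches what gets drawn, so it cannot feed back into the orbit calculation.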

Lightman, should you try again, could you check the device being used in the About box?
On APU systems with a discrete GPU I don't have a way of selecting between the GPUs at the moment.
It does save the last device vendor ID to the registry and read it back, so if you have the device vendor ID you can try updating it directly under HKEY_CURRENT_USER\Software\OpenCLSolarSystem
moozoo is offline   Reply With Quote
Old 06-Jun-2013, 18:46   #34
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,028
Default

Quote:
Originally Posted by moozoo View Post
GTX580 @ 825MHz GPU clock: ~250 FPS with the default settings, except for the # of bodies with mass set at maximum.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
fellix is offline   Reply With Quote
Old 06-Jun-2013, 21:19   #35
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,906
Default

Same settings as Fellix, about 160FPS* with my HD 6950: works like a charm this time.

I'm going to have some very nerdy fun with this.


*With a whole bunch of stuff running in the background.
Alexko is offline   Reply With Quote
Old 06-Jun-2013, 22:03   #36
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,160
Default

Works!

If it wasn't clear, my HD 7970 is in i5 2500K rig without active HD2000, so only CPU CL and Tahiti CL devices are present.

Anyway here are my findings using HD 7970 1050/1425:

1. Default setting produces 652FPS and only 31% GPU utilization
2. Maximum body setting produces 68FPS and 80% GPU utilization
3. Maximum body and maximum Number with Mass produces 8FPS and 97% GPU utilization
4. Default settings except Number with Mass set to maximum produces 199FPS and 59% GPU utilization

Note: all other options were not changed and default view used.

Tahiti is a bit quicker in DP math than my CPU ...

PS. AMD CPU option doesn't work, still uses GPU.

Last edited by Lightman; 06-Jun-2013 at 22:09.
Lightman is offline   Reply With Quote
Old 07-Jun-2013, 08:29   #37
Davros
Senior Member
 
Join Date: Jun 2004
Posts: 11,075
Default

so what was the problem ?
Davros is offline   Reply With Quote
Old 07-Jun-2013, 13:21   #38
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Thank you, everyone for helping me.

It would be best not to benchmark with maximum Number with Mass since bodies with mass are put into __constant memory and the amount of constant memory varies between graphics cards.
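
As a back-of-envelope illustration of why that limit bites: the OpenCL full profile only guarantees 64 KB for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, and cards may report more. The 32 bytes per body below is an assumption (one double4 holding x, y, z and mass); the actual layout in the program may differ.

```python
# OpenCL full-profile minimum for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.
MIN_CONSTANT_BUFFER_BYTES = 64 * 1024
# Assumption: one double4 (x, y, z, mass) per body with mass.
BYTES_PER_BODY = 4 * 8

def max_bodies_in_constant(buffer_bytes, bytes_per_body=BYTES_PER_BODY):
    """How many bodies fit in a constant buffer of the given size."""
    return buffer_bytes // bytes_per_body

print(max_bodies_in_constant(MIN_CONSTANT_BUFFER_BYTES))  # 2048
```

A card that reports a larger constant buffer gets a larger maximum, which is why "maximum Number with Mass" is not the same workload on every card.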

The maximum number of bodies in the initial.bin file might change, i.e. as more asteroids are found.

So I'd recommend benchmarking with defaults + number of bodies 309760 and number with mass 384. On my GTX 260 I get ~5.25 fps.
I will look at adding a Benchmark option to the menu that uses these settings.

Only 350 or so of the asteroids have realistic masses. Most are educated guesses based on their type and estimated size (as per JPL's DE405). I assigned a small mass to the rest.
The total mass of the asteroid belt is only about 4% of the mass of the Moon. Think of all those asteroids as the talcum powder of the solar system compared to the Sun and planets.

>so what was the problem ?
No way of working that out. I made lots of changes and couldn't test as I went.

I realised that 1.04 still won't attempt to run on devices that only have cl_amd_fp64. I'll fix that in the next version.
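
Roughly, the check could look like the sketch below. It assumes the device's CL_DEVICE_EXTENSIONS string (a space-separated list) has already been fetched; `fp64_pragma` is a hypothetical helper name, not something from the program.

```python
def fp64_pragma(device_extensions):
    """Return the pragma for the first double-precision extension found, or None."""
    available = set(device_extensions.split())
    # Prefer the Khronos extension, fall back to AMD's older one.
    for name in ("cl_khr_fp64", "cl_amd_fp64"):
        if name in available:
            return f"#pragma OPENCL EXTENSION {name} : enable"
    return None  # no double support advertised: skip this device

print(fp64_pragma("cl_khr_gl_sharing cl_amd_fp64"))
```

The returned line would then be prepended to the kernel source before building it for that device.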

>PS. AMD CPU option doesn't work, still uses GPU.
Thanks, I'll look through the code for that again.

>I'm going to have some very nerdy fun with this.

Export to an SLF file, edit it with a text editor and import it (takes ages).
If you are happy with it, use Saved Inital to save it to a .bin file.
It is much faster to load it from a .bin file.
The file format is documented in the Solex readme.
I only implemented importing the format Solex outputs.

You can set the Time Delta to 1 day and the Integrator to "Adams Bashforth Moulton 8" and time will pass faster, but the accuracy of the simulation will be lower.
Note that some combinations are unstable (bodies bounce about and then fly off). This isn't a bug as such, but a limitation of the integration algorithm.
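
To give a feel for the step-size trade-off, here is a minimal predictor-corrector sketch from the same family. It is NOT the 8th-order scheme in the program: it is a two-step Adams-Bashforth predictor with a trapezoidal Adams-Moulton corrector, run on a simple oscillator (x'' = -x) standing in for an orbit. All names are made up for the example.

```python
import math

def deriv(state):
    x, v = state
    return (v, -x)  # simple harmonic oscillator: x'' = -x

def abm2(state, dt, steps):
    """Two-step Adams-Bashforth predictor + trapezoidal Adams-Moulton corrector."""
    f_prev = deriv(state)
    # Bootstrap the first step with one Heun (improved Euler) step.
    guess = tuple(s + dt * f for s, f in zip(state, f_prev))
    f_guess = deriv(guess)
    state = tuple(s + 0.5 * dt * (a + b) for s, a, b in zip(state, f_prev, f_guess))
    for _ in range(steps - 1):
        f_now = deriv(state)
        # Predict: y[n+1] = y[n] + dt * (3/2 f[n] - 1/2 f[n-1])
        pred = tuple(s + dt * (1.5 * a - 0.5 * b)
                     for s, a, b in zip(state, f_now, f_prev))
        # Correct: y[n+1] = y[n] + dt/2 * (f(pred) + f[n])
        f_pred = deriv(pred)
        state = tuple(s + 0.5 * dt * (a + b)
                      for s, a, b in zip(state, f_pred, f_now))
        f_prev = f_now
    return state

def error_after_one_period(steps):
    """Distance from the exact answer (1, 0) after one full period."""
    dt = 2 * math.pi / steps
    x, v = abm2((1.0, 0.0), dt, steps)
    return math.hypot(x - 1.0, v)

# Same elapsed time, fine steps versus coarse steps.
print(error_after_one_period(1000), error_after_one_period(20))
```

Covering the same period with 20 steps instead of 1000 is the "time passes faster" case, and the error after one period is correspondingly larger.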
moozoo is offline   Reply With Quote
Old 07-Jun-2013, 14:19   #39
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,160
Default

With your benchmark settings of 384 bodies with mass and 309760 bodies I'm getting 50FPS with 84% GPU utilization. There are moments when it jumps to 52-53FPS and GPU utilization to 87%, but I can't pinpoint why this is happening.
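
For a sense of scale, the FPS figures can be turned into rough throughput. This assumes one simulation step per frame and that every body is acted on by every body with mass, which may not match exactly what the kernels do.

```python
def interactions_per_second(bodies, with_mass, fps):
    """Body-on-body force evaluations per second, assuming one step per frame."""
    return bodies * with_mass * fps

BODIES, WITH_MASS = 309760, 384   # the suggested benchmark settings

# FPS figures as reported in this thread.
for card, fps in [("GTX 260", 5.25), ("HD 7970", 50)]:
    rate = interactions_per_second(BODIES, WITH_MASS, fps)
    print(f"{card}: {rate / 1e9:.2f} G interactions/s")
```

That is about 119 million evaluations per frame, so 50 FPS works out to just under 6 billion per second.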
Lightman is offline   Reply With Quote
Old 07-Jun-2013, 20:24   #40
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,906
Default

Quote:
Originally Posted by moozoo View Post
Thank you, everyone for helping me.

It would be best not to benchmark with maximum Number with Mass since bodies with mass are put into __constant memory and the amount of constant memory varies between graphics cards.

The maximum number of bodies in the initial.bin file might change. i.e. more asteroids are being found.

So.. I'd recommend benchmarking with defaults + number of bodies 309760 and number with mass 384. on my GTX 260 I get ~5.25 fps
24~25 FPS on my HD 6950.

Quote:
Originally Posted by moozoo View Post
Export to a SLF, edit it with a text editor and Import it (takes ages).
If you are happy with it , use Saved Inital to save it to a .bin file.
It is much faster to load it from a .bin file.
The file format is documented in the Solex readme.
I only implemented importing the format Solex outputs.

You can set the Time Delta to 1 day and the Integrator to "Adams Bashforth Moulton 8" and time will pass faster, but the accuracy of the simulation will be lower.
Note that some combinations are unstable (bodies bounce about and then fly off). This isn't a bug as such, but a limitation of the integration algorithm.
I'll try that, thanks.
Alexko is offline   Reply With Quote
Old 07-Jun-2013, 21:13   #41
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,028
Default

Quote:
Originally Posted by moozoo View Post
So.. I'd recommend benchmarking with defaults + number of bodies 309760 and number with mass 384. on my GTX 260 I get ~5.25 fps

Quote:
Originally Posted by moozoo View Post
I will look at adding a Benchmark option to the Menu that use these settings.
That would be very useful. A bunch of dedicated CLI options would suffice, too.
fellix is offline   Reply With Quote
Old 23-Jun-2013, 01:26   #42
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Quote:
Originally Posted by moozoo View Post
I will need to use a amd graphics card to debug this.
I plan to buy a new system once Haswell is released early next month and I will include a AMD HD7970Ghz edition graphics card.

............

My development so far has been on just getting it to work.
I don't have access to anything that can do kernel profiling until I buy new hardware. So the kernels aren't optimised.
I have my new system, and I now have an HD 7770. Anyway, I believe this will let me do everything I need. I have done some initial profiling but it will take me a while to get up to speed on it. On the flip side, I'm going to lose the system with the Nvidia card (it's my ex's), so I'm going to have the reverse problem lol.
moozoo is offline   Reply With Quote
Old 23-Jun-2013, 14:15   #43
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,160
Default

Quote:
Originally Posted by moozoo View Post
I have my new system, and I now have an HD 7770. Anyway, I believe this will let me do everything I need. I have done some initial profiling but it will take me a while to get up to speed on it. On the flip side, I'm going to lose the system with the Nvidia card (it's my ex's), so I'm going to have the reverse problem lol.
Good to hear you've got GCN now, and a bit of a shame you're losing the green camp card. Still, I'm curious how much extra performance you will find in GCN.
Lightman is offline   Reply With Quote
Old 16-Jul-2013, 14:38   #44
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Released V1.05
Added an acceleration kernel that uses local memory to cache gravitational bodies
Removed a lot of unnecessary synchronization.
Added Stereoscopic support (AMD + HDMI monitor). The i and I keys adjust the eye separation.
Added 2560 and 65536 bodies options.
Uploaded the roguestar.bin file
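
For anyone wondering what the local-memory kernel is doing, here is a host-side sketch of the tiling idea in plain Python. It is not the OpenCL kernel itself: the `tile` list stands in for the chunk a work-group would stage into __local memory, and the function names, tile size and test data are all assumptions for illustration.

```python
def pull(p, body):
    """Acceleration contribution on point p from one (x, y, z, gm) body."""
    dx, dy, dz = body[0] - p[0], body[1] - p[1], body[2] - p[2]
    inv_r3 = (dx * dx + dy * dy + dz * dz) ** -1.5
    return (body[3] * dx * inv_r3, body[3] * dy * inv_r3, body[3] * dz * inv_r3)

def accel_direct(p, bodies):
    """Sum over every body with mass, reading each one straight from the big array."""
    ax = ay = az = 0.0
    for body in bodies:
        gx, gy, gz = pull(p, body)
        ax, ay, az = ax + gx, ay + gy, az + gz
    return (ax, ay, az)

def accel_tiled(p, bodies, tile_size=4):
    """Same sum, but the bodies are staged tile by tile before being used."""
    ax = ay = az = 0.0
    for start in range(0, len(bodies), tile_size):
        # Stands in for the cooperative copy into __local memory.
        tile = list(bodies[start:start + tile_size])
        for body in tile:
            gx, gy, gz = pull(p, body)
            ax, ay, az = ax + gx, ay + gy, az + gz
    return (ax, ay, az)

bodies = [(float(i), 0.5 * i, -1.0, 1.0 + i) for i in range(1, 11)]
probe = (0.0, 0.0, 0.0)
print(accel_direct(probe, bodies) == accel_tiled(probe, bodies))
```

Both versions add the contributions in the same order, so the result is identical. Whether the staged version is actually faster depends on the hardware.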

Cross-vendor GL sharing seems broken now, i.e. AMD GL with AMD OpenCL and Intel GL with Intel OpenCL (CPU) work, but AMD GL with Intel OpenCL (CPU) doesn't.

I can no longer test Nvidia OpenCL. If anyone could test that it's not broken, I would appreciate it.
moozoo is offline   Reply With Quote
Old 16-Jul-2013, 16:48   #45
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,028
Default





* GTX580
* 314.22 WHQL
* Win8 64-bit
fellix is offline   Reply With Quote
Old 16-Jul-2013, 22:03   #46
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,160
Default

Great job!
It's about 10% faster than previously on GCN
Lightman is offline   Reply With Quote
Old 16-Jul-2013, 22:23   #47
lanek
Senior Member
 
Join Date: Mar 2012
Location: Switzerland
Posts: 1,186
Default

I haven't really had much time to test, but there seems to be no problem with 7970s. CAT 13.6 b2.

(Only usage stays low, but I had CFX enabled and, of course, windowed mode only loads one GPU.)
lanek is offline   Reply With Quote
Old 17-Jul-2013, 02:39   #48
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Thanks fellix, sorry about that. Could you try the Release binary at
http://moozoo.dyndns.org/OpenCLSolarSystem/

Lightman, from what I can tell the acceleration kernel is the bottleneck and it's at 100% ALU Busy, so I don't think I can optimise the kernels without sacrificing accuracy, which I don't want to do. The version that caches positions in local memory runs slower than the one that doesn't.

At the moment I'm copying old values to new. I can avoid that by just switching pointers. That would cut out the buffer copies.
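
In sketch form, with Python lists standing in for the OpenCL buffers and a trivial "+1" update standing in for the real kernel (both function names are made up):

```python
def step_with_copy(old, new):
    """Write results into `new`, then copy them back so `old` is current again."""
    for i, value in enumerate(old):
        new[i] = value + 1.0      # stand-in for the real update kernel
    old[:] = new                  # the extra pass that swapping avoids
    return old, new

def step_with_swap(old, new):
    """Write results into `new`, then let the two buffers trade roles."""
    for i, value in enumerate(old):
        new[i] = value + 1.0
    return new, old               # no copy, only the references change

a, b = [0.0, 10.0], [0.0, 0.0]
current, scratch = a, b
for _ in range(3):
    current, scratch = step_with_swap(current, scratch)
print(current)
```

With real buffers this would mean swapping which buffer is passed as the input and output kernel argument on each step, so anything that displays the result has to follow the swap too.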

When AMD release their next cards (October?) I'm going to implement multiple card support.

About Stereoscopic support, the driver switches the monitor to stereoscopic mode when it sees me requesting stereo in the window's pixel format. I don't explicitly control it. When the program exits, the driver doesn't switch the monitor out of stereoscopic mode. Hence you end up with your monitor stuck in stereoscopic mode.
One workaround is to run a program that does know how to switch the monitor (via DirectX) and then exit it. I use Tridef's Multimedia player.
Note I've only tested stereoscopic mode under Windows 8. Under Windows 7 the application might need to go into full-screen mode, which my program currently does not do.
moozoo is offline   Reply With Quote
Old 21-Jul-2013, 12:56   #49
moozoo
Junior Member
 
Join Date: Jul 2010
Posts: 94
Default

Well, I released 1.051, which should work with Nvidia.
moozoo is offline   Reply With Quote
Old 21-Jul-2013, 13:42   #50
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,028
Default

Quote:
Originally Posted by moozoo View Post
Well I released 1.051 that should work with nvidia
fellix is offline   Reply With Quote
