The one and only Folding @ Home thread

Broken Hope · Apr 8, 2010

Dave Baumann said:
Indeed, and we're all behind their movement to OpenCL.

Which is nice and all, but Radeon users still don't have any OpenCL support, despite it being advertised on the box since release last September. The SDK doesn't count because users aren't supposed to be installing that; developers are. If it's not included in Catalyst, it may as well not exist from a user's perspective.

I actually don't understand what is taking so long to get OpenCL support to users. As has been mentioned in other threads, Nvidia has had OpenCL support in their drivers for months, including full ICD support fairly recently.

The Catalyst 10.4 drivers are due in the next few weeks, and it's unlikely they will include OpenCL support. Since ATI tends to work months ahead on new drivers and there has been no mention of OpenCL support in future releases, it isn't looking likely that users will have access to OpenCL in the near future. I'd bet on it being at least another 6 months.

CarstenS · Apr 8, 2010

I disagree. While it'd be nicer to include the OpenCL ICD in the regular WHQL Catalysts, downloading the SDK is actually no longer a problem for end users, and installing it is no more complicated than installing normal drivers.

Broken Hope · Apr 8, 2010

CarstenS said:
I disagree. While it'd be nicer to include the OpenCL ICD in the regular WHQL Catalysts, downloading the SDK is actually no longer a problem for end users, and installing it is no more complicated than installing normal drivers.

Maybe if the SDK were stripped down to install just the components required for OpenCL and were linked on the Catalyst driver page, I'd agree with you. But currently, if you don't know where to find the SDK, you don't get OpenCL support as an end user. And even if you extract the SDK and install only the OpenCL part, without the profiler or samples, it still takes up something like 80MB, which seems like too much for just the OpenCL DLLs, so I'm guessing there is still development stuff even in the bare OpenCL installer.

OpenGL guy · Apr 8, 2010

Broken Hope said:
Maybe if the SDK were stripped down to install just the components required for OpenCL and were linked on the Catalyst driver page, I'd agree with you. But currently, if you don't know where to find the SDK, you don't get OpenCL support as an end user. And even if you extract the SDK and install only the OpenCL part, without the profiler or samples, it still takes up something like 80MB, which seems like too much for just the OpenCL DLLs, so I'm guessing there is still development stuff even in the bare OpenCL installer.

Are you looking at a 32-bit or 64-bit system? The OpenCL components are about 42 MB for each of 32-bit and 64-bit so if you're on a 64-bit system, you will need about 84 MB of drivers to support 32- and 64-bit apps. That's just for the required runtime components and no extra development stuff.

Broken Hope · Apr 8, 2010

OpenGL guy said:
Are you looking at a 32-bit or 64-bit system? The OpenCL components are about 42 MB for each of 32-bit and 64-bit so if you're on a 64-bit system, you will need about 84 MB of drivers to support 32- and 64-bit apps. That's just for the required runtime components and no extra development stuff.

I'm on a 64-bit system. Good to know, larger than I expected though. So the ATIStreamSDK_dev.exe file that's within the SDK when extracted just installs the bare runtime without extra dev stuff then?

willardjuice · Apr 8, 2010

mhouston said:
No, the client isn't artificially limited to a certain number of stream processors.

I of course didn't mean to imply there was a literal cap.

My point was that the workload was (more or less) capped (for whatever reason), which in turn allowed only about 320 stream processors to be fully utilized.

OpenGL guy · Apr 8, 2010

Broken Hope said:
I'm on a 64-bit system. Good to know, larger than I expected though. So the ATIStreamSDK_dev.exe file that's within the SDK when extracted just installs the bare runtime without extra dev stuff then?

I don't know exactly what's in there, but I'd expect the executables, libs and headers in a file called "dev". The headers don't take up much space (400k or so). You need most of the libs, even if you're not building OpenCL programs yourself, so ~84MB is about as small as you can get right now.

ShaidarHaran · Apr 9, 2010

Well, that pretty much clears the air. Thanks, Mike. I was wondering if you could go into a bit more detail on what about the current GPU v2 client limits performance on Radeons to a seemingly fixed number of ALUs?

swaaye · Apr 10, 2010

Shaidar, I saw this in his post:

mhouston said:
No, the client isn't artificially limited to a certain number of stream processors. However, smaller work units will not fully utilize newer chips.

Not exactly sure what that means, but it sounds like there is some sort of bottleneck that prevents benefitting from more ALUs.

Overall, OpenCL needs to work out well and the old clients need to go away.

mhouston · Apr 10, 2010

Smaller proteins don't generate enough parallelism to fill the chip.

ShaidarHaran · Apr 10, 2010

mhouston said:
Smaller proteins don't generate enough parallelism to fill the chip.

Is it a compiler issue then? I ask because NV doesn't appear to have the same problem.

mhouston · Apr 10, 2010

Different algorithms are being used on each GPU. Moreover, Nvidia is a narrower architecture with faster ALUs so they can do a little better with less parallelism. For example, I'll bet that the ultra small proteins won't scale all that great from a GT200 to a Fermi because Fermi is a wider chip.

And no, we are not talking about VLIW vs scalar. This is the vector width of each core and the number of cores, i.e. how many work-items you need to be able to put in flight.
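mhouston's point can be put as rough arithmetic: the number of work-items needed to fill a GPU is about (number of SIMD cores) × (SIMD width, i.e. wavefront/warp size) × (wavefronts kept in flight per core). The sketch below is a hypothetical illustration, not a model of any real chip: the two GPU shapes and the choice of 4 wavefronts in flight per core are made-up assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuShape:
    """Hypothetical GPU described only by the quantities mhouston lists."""
    name: str
    simd_cores: int      # number of SIMD cores on the chip
    wave_width: int      # work-items per wavefront/warp
    waves_per_core: int  # wavefronts/warps kept in flight per core (assumed)

    def work_items_to_fill(self) -> int:
        # Work-items that must be in flight before every slot is busy.
        return self.simd_cores * self.wave_width * self.waves_per_core


def utilization(work_items: int, gpu: GpuShape) -> float:
    """Fraction of the chip's in-flight slots a workload can occupy, capped at 1.0."""
    return min(1.0, work_items / gpu.work_items_to_fill())


# Made-up chips for illustration; these are not real product specifications.
NARROW = GpuShape("narrow chip", simd_cores=16, wave_width=32, waves_per_core=4)
WIDE = GpuShape("wide chip", simd_cores=20, wave_width=64, waves_per_core=4)

if __name__ == "__main__":
    for items in (2048, 20000):  # a "small protein" and a "large protein" workload
        for gpu in (NARROW, WIDE):
            print(f"{items:>6} work-items on {gpu.name}: {utilization(items, gpu):.0%}")
```

With 2,048 work-items the narrow chip is full while the wide one sits at 40%; at 20,000 both are saturated. That is the same shape as small work units scaling poorly onto a wider chip.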

ShaidarHaran · Apr 10, 2010

You're talking about the width of the SIMDs themselves, then? i.e. 80 ALUs per SIMD for Radeons and 16/24/32 for Geforces.

Arnold Beckenbauer · Apr 10, 2010

ShaidarHaran said:
You're talking about the width of the SIMDs themselves, then? i.e. 80 ALUs per SIMD for Radeons and 16/24/32 for Geforces.

IIRC there are cases where two or three SIMDs are utilized.

CarstenS · Apr 10, 2010

mhouston said:
For example, I'll bet that the ultra small proteins won't scale all that great from a GT200 to a Fermi because Fermi is a wider chip.

The current preview version of F@H, which supports the GF100, seems to prove your point:
http://www.pcgameshardware.com/aid,...Fermi-performance-benchmarks/Reviews/?page=18

Roughly 55 to 69% more performance than a GTX 285 is less than one could have hoped for, given all the fancy GPU-computing features inside Fermi.

mhouston · Apr 10, 2010

ShaidarHaran said:
You're talking about the width of the SIMDs themselves, then? i.e. 80 ALUs per SIMD for Radeons and 16/24/32 for Geforces.

It's not the ALUs, but the width of the SIMDs and the number of wavefronts/warps in flight per SIMD, and the number of SIMDs. Moreover, Nvidia runs an N^2/2 algorithm and ATI is running an N^2 algorithm. The N^2 algorithm also aligned well with the Brook programming model (streaming). Part of the algorithm choice was to try to scale out better, but restrictions on earlier hardware and the programming model also drove that choice.
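The N^2 vs N^2/2 distinction can be illustrated with a toy pairwise sum. This is a hypothetical sketch, not the Folding@home kernels: `pair_force` is a made-up 1-D spring force, and reading "N^2/2" as evaluating each pair once and applying it to both atoms (Newton's third law) is an assumption about what the terms mean here. The gather-only full version, where each output is written by exactly one independent computation, is the kind of shape that plausibly suits a streaming model like Brook.

```python
def pair_force(xi: float, xj: float) -> float:
    # Toy 1-D spring force on i due to j (hypothetical).
    # Antisymmetric: pair_force(xj, xi) == -pair_force(xi, xj).
    return xj - xi


def forces_full(x):
    """'N^2' style: every atom i gathers from every j != i.
    Each pair is evaluated twice, but each output f[i] depends only on reads,
    so each i could be an independent work-item with no write conflicts."""
    n = len(x)
    f = [0.0] * n
    evals = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                f[i] += pair_force(x[i], x[j])
                evals += 1
    return f, evals


def forces_half(x):
    """'N^2/2' style: each unordered pair is evaluated once and applied to
    both atoms. Half the pair evaluations, but the f[j] update is a scatter
    write that parallel work-items would have to coordinate."""
    n = len(x)
    f = [0.0] * n
    evals = 0
    for i in range(n):
        for j in range(i + 1, n):
            fij = pair_force(x[i], x[j])
            f[i] += fij
            f[j] -= fij
            evals += 1
    return f, evals


if __name__ == "__main__":
    positions = [0.0, 1.0, 3.0]
    print(forces_full(positions))
    print(forces_half(positions))
```

For positions [0.0, 1.0, 3.0] both functions return forces [4.0, 1.0, -5.0], with 6 pair evaluations for the full version versus 3 for the half version, i.e. n(n-1) versus n(n-1)/2.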

ShaidarHaran · Apr 10, 2010

Got it. Thanks Mike.

larrabee · Apr 11, 2010

mhouston said:
Different algorithms are being used on each GPU. Moreover, Nvidia is a narrower architecture with faster ALUs so they can do a little better with less parallelism. For example, I'll bet that the ultra small proteins won't scale all that great from a GT200 to a Fermi because Fermi is a wider chip.

And no, we are not talking about VLIW vs scalar. This is the vector width of each core and the number of cores, i.e. how many work-items you need to be able to put in flight.

This is true even today. On some of the new smaller WUs you might see more PPD than on average-sized WUs on a G92-based card, but you lose PPD on GT200 cards. The inverse is true for large WUs. On the software side, running smaller WUs is important too: they recently proved that their algorithm can accurately simulate proteins on a millisecond timescale, a pretty big feat.

Here it is:
http://www.youtube.com/watch?v=gFcp2Xpd29I

Arnold Beckenbauer · May 26, 2010

F@H GPU3 client beta is out: http://folding.typepad.com/news/2010/05/open-beta-release-of-the-gpu3-clientcore.html

While this release is for NVIDIA only to start, we are actively pushing ATI support (with the help of AMD/ATI), although we have no ETA at the moment.

Why is it so hard to say that this client uses CUDA? And are they pushing OpenCL support for everyone, not only for ATi?

Mintmaster · Dec 22, 2010

OpenCL update

I stumbled upon this recently:
http://foldingforum.org/viewtopic.php?f=51&t=17055
