The one and only Folding @ Home thread


Buttons on your underwear?

Maybe you should stop worrying about epeening with PPD and just actually do something useful with your computer.

According to whose standards? Computers are tools, to be used as seen fit. I use mine to surf the internet mostly, play games occasionally, watch movies every so often, and fold during the winter when it's cold enough to handle the extra heat from my PCs. You should be thankful I'm also using Intel CPUs for running the CPU client ;)
 
Indeed, and we're all behind their movement to OpenCL.
That's good to hear. :) You don't have any ETA estimate that you're allowed, and would like, to share?

I hope the new client won't get delayed like some other ambitious project that was just recently introduced officially. ;)
 
I imagine the GPU v2 client was written and optimized for both IHVs' hardware at approximately the same time. The installation executable for the GPU V2 client supports both ATi & NV hardware.
The GPU2 client was introduced with Brook+ support for AMD some months prior to NVIDIA support.

And I agree - mods, can you please split this part of the discussion from this thread?
 
Do you blame the TV manufacturer when a channel's quality isn't up to the other channel's quality? Or do you (should you) blame your channel provider? Now, if none of your channels have good quality then you might have a case against the TV manufacturer.

This conversation has mostly nothing to do with AMD hardware, only that Shaider is upset his F@H doesn't run better on his AMD GPUs. I think the answer to this frustration has been thoroughly acknowledged: the ATI client is old, clunky and broken and is therefore not a good direct representation of the ability of AMD's newer generation GPUs.

Someone said something about an OpenCL client. It's great to have a standardized specification, but that doesn't mean the code you write will be automatically optimized for all architectures. Each architecture has tradeoffs and design decisions, which have to be accounted for when trying to write optimized code.
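To illustrate what I mean with a rough sketch of my own (not anything from an actual client): even with a single OpenCL code base, the host side still ends up querying each device and adjusting launch parameters per architecture instead of hard-coding one configuration. Something along these lines, assuming a cl_device_id and cl_kernel already exist:

Code:
/* Minimal sketch: the same portable kernel still wants per-device tuning. */
#include <CL/cl.h>
#include <stdio.h>

static size_t pick_work_group_size(cl_device_id dev, cl_kernel kern)
{
    size_t dev_max = 0, kern_max = 0;

    /* Hard limit imposed by the device itself. */
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(dev_max), &dev_max, NULL);

    /* Limit for this particular kernel (register/local-memory pressure). */
    clGetKernelWorkGroupInfo(kern, dev, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kern_max), &kern_max, NULL);

    size_t wg = (kern_max < dev_max) ? kern_max : dev_max;
    printf("using a work-group size of %zu on this device\n", wg);
    return wg;
}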

Lastly, I still honestly do think Brook+ has its place. There are apps that still run much better using Brook+ than they do in OpenCL. Now I'm sure that mostly has to do with the infancy of the AMD OpenCL compiler, but it's still true. Also, the dev time is much lower in Brook+, MUCH lower.
 
So?




Maybe you should stop worrying about epeening with PPD and just actually do something useful with your computer.

Searching for cures to cancer and other stuff isn't useful? My grandmother with Alzheimer's would love to bitch slap you if she could. May you NEVER have a family member who suffers from an ailment that might have a cure found with its roots going back to F@H.

If the above offends, I care not. Saying F@H is useless when its purpose is to help find cures for cancer and other major ailments that can affect millions of people worldwide is just plain hateful.
 
Well then, honestly, that makes this conversation even stranger. So what reference do you have?

Which games have broken AA? I admit I only play casually and a very small number of games, hence the question (4870 here and no issues with AA in any game).

Search AMD's own forums @ game.amd.com for numerous examples. Mostly games with deferred rendering (lighting) engines have this issue. Generally speaking you can at least force AA in these games through Nvidia's control panel, but there is no such option with ATi cards.
 
Yes, that is my understanding of the current situation WRT the GPU client V2. So we're back at the beginning again. NV hardware is faster than ATi hardware for F@H.
No, the current CUDA client is faster than the Brook-based client. It doesn't say a lot about the hardware capabilities.
The topic of this thread is actually the "ATI graphics architecture" and its possible advantages. In my book that concerns mostly the hardware capabilities, not how well a certain client for a certain DC project is suited to the proteins currently being folded there. Besides that, F@H is an isolated example (someone linked some performance numbers here from three other DC projects, where ATI GPUs simply destroy anything Nvidia has to offer).
If you look it up, the thread starter wanted to discuss how design decisions like the VLIW structure affects the performance and the efficiency of the GPU. You are miles away from the topic of the thread.
 
I don't think you can view Fermi in a vacuum for this reason (CUDA vs Brook, etc). The developer story does matter, and you can't view HW in isolation; you have to look at the tools, drivers, middleware, and HW. We do exactly this when comparing cards based on *current* game workloads, so despite what AMD HW may ultimately be capable of, you can't brush aside issues of tooling and ease of development, which may have effects on what kinds of future workloads are written. Vis-a-vis F@H, for example, the fact that CUDA isn't available for AMD and there's no OpenCL version is not Fermi's fault. You can only compare what exists.
 
They're getting the same work units and outputting the same results. The only difference I'm aware of is the fact that ATi needs to redo some calculations.
So we're back at the beginning again. NV hardware is faster than ATi hardware for F@H.
Ugh. Are you purposely being dense? Did you not read my sorting example?

This is a SOFTWARE issue. They happened to write GPU2 when ATI had limitations in their HW and NVidia didn't. It's just like when they happened to write GPU1 when NVidia's hardware couldn't do anything useful. No further work has gone public. NVidia just got lucky with its timing.
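To make that hardware/software interplay concrete, here's a generic OpenCL-style sketch of my own (nothing from the actual F@H code): when fast on-chip storage is exposed to the programmer, a work-group can stage data once and reuse it, while code written for hardware without that ability ends up re-reading or recomputing the same values, which sounds like the kind of limitation being described here.

Code:
/* Illustrative only; assumes n is a multiple of the work-group size.
 * Each work-group stages a tile of positions in __local memory and
 * reuses it; hardware without enough on-chip storage would instead
 * re-read (or recompute) these values per work item. */
__kernel void pairwise_tiled(__global const float4 *pos,
                             __global float *out,
                             const int n,
                             __local float4 *tile)
{
    int i   = get_global_id(0);
    int lid = get_local_id(0);
    int lsz = get_local_size(0);
    float acc = 0.0f;

    for (int base = 0; base < n; base += lsz) {
        tile[lid] = pos[base + lid];          /* one global read per group */
        barrier(CLK_LOCAL_MEM_FENCE);
        for (int j = 0; j < lsz; ++j) {
            float4 d = pos[i] - tile[j];
            acc += native_rsqrt(d.x * d.x + d.y * d.y + d.z * d.z + 1e-6f);
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    out[i] = acc;
}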

This is a non-profit organization. They don't care that ATI GPUs aren't working at full tilt with GPU2, because they have lots of WUs that only work on CPUs anyways and have plenty of GPUs and PS3s flying through the rest. As aaronspinks aptly said, it's just an epeen contest. Their efforts are best spent in writing new code for GPUs that will do as much work as possible.
XMAN26 said:
Searching for cures to cancer and other stuff isn't useful? My grandmother with Alzheimer's would love to bitch slap you if she could.
If someone at F@H comes out and says "Geez, if only AMD GPUs ran faster, our research wouldn't be stalled", then you can make this claim. Since they can't even be bothered to take advantage of Cypress' extra shader units, let alone its local memory, and don't give GPUs or PS3s as many points as a CPU for the same work due to their limitations (see here), I'm pretty sure they have more than enough GPU horsepower available.
 
I don't think you can view Fermi in a vacuum for this reason (CUDA vs Brook, etc). The developer story does matter, and you can't view HW in isolation; you have to look at the tools, drivers, middleware, and HW.
This really looks more like a timing issue than a tools issue to me. Nothing has gone public since GPU2, and they haven't used CUDA to make NVidia's client able to work on more types of WUs.
 
Ugh. Are you purposely being dense? Did you not read my sorting example?

Yes, and I don't doubt you have more experience in this field than I do; I am not a programmer, I'm just a PC tech. I stand by my previous statement, however. If the client is the same, the work units are the same, the output is the same, and the only *known* differences are that ATi GPUs have to redo some calculations (due to cache limitations) and are apparently arbitrarily limited to a rather low ALU utilization, how is that a hugely different algorithm?

This is a SOFTWARE issue.

So the lack of cache which forces the recalculation of certain operations is a software issue?

They happened to write GPU2 when ATI had limitations in their HW and NVidia didn't. It's just like when they happened to write GPU1 when NVidia's hardware couldn't do anything useful. No further work has gone public. NVidia just got lucky with its timing.

Each IHV contributed to the project though. It's not as though Nvidia stepped in and said "we'll take it from here" and locked ATi out of the process.

This is a non-profit organization. They don't care that ATI GPUs aren't working at full tilt with GPU2, because they have lots of WUs that only work on CPUs anyways and have plenty of GPUs and PS3s flying through the rest.

They want as many *useful* FLOPS as they can get. A re-write of the client would benefit both parties, more so ATi.

As aaronspinks aptly said, it's just an epeen contest.

Ostensibly there is some useful work being done here, but I suppose that's open to interpretation.

Their efforts are best spent in writing new code for GPUs that will do as much work as possible.

I agree.

If someone at F@H comes out and says "Geez, if only AMD GPUs ran faster, our research wouldn't be stalled", then you can make this claim.

I don't think anyone's making this claim, including myself. However, in this case there's no such thing as too many FLOPS: the more FLOPS available, the faster a problem can be solved, the greater the precision, and the greater the variety of individual problems that can be tackled.

Since they can't even be bothered to take advantage of Cypress' extra shader units, let alone its local memory, and don't give GPUs or PS3s as many points as a CPU for the same work due to their limitations (see here), I'm pretty sure they have more than enough GPU horsepower available.

I don't think it's that the Pande Group "can't be bothered"; they're a very small outfit. IIRC it's literally one guy writing the code. They really do need all the help they can get from the IHVs.
 
This is a non-profit organization. They don't care that ATI GPUs aren't working at full tilt with GPU2, because they have lots of WUs that only work on CPUs anyways and have plenty of GPUs and PS3s flying through the rest. As aaronspinks aptly said, it's just an epeen contest. Their efforts are best spent in writing new code for GPUs that will do as much work as possible.
If someone at F@H comes out and says "Geez, if only AMD GPUs ran faster, our research wouldn't be stalled", then you can make this claim. Since they can't even be bothered to take advantage of Cypress' extra shader units, let alone its local memory, and don't give GPUs or PS3s as many points as a CPU for the same work due to their limitations (see here), I'm pretty sure they have more than enough GPU horsepower available.

You are aware that a lot of people who do F@H do it because they have a family member, or know someone, who has cancer or an illness that F@H might help find a cure for, right? Now I'm not gonna speak for them, but I personally couldn't give a rat's ass about the points; I'm doing it because of family. Hell, until I knew what F@H was about I never even bothered with it, as I had my machine doing SETI@home.

Also, my comment was to aaronspink and his "do something more useful with your computer" comment in regards to F@H. Helping to possibly find a cure for something that I know will one day be affecting my mother (I believe it has already started to) is an extremely useful way to use my rig. I'm not sure about Shaider, but he may also feel the same way while also caring about the points.
 
I just wonder though how beneficial FAH really is. Has it accomplished anything really significant in the years that it's been burning through megawatts of power? It's actually rather inefficient in some ways, because they run each work unit many times for data precision. Their testing environment is anything but ideal.

Donations to a fund might be more beneficial to curing illnesses than FAH. Who knows.


Shaidar, my impression is that the AMD client is very unoptimized for current AMD GPUs. It's designed for R600/RV670, I think. The NV client is just better at working with the hardware probably because when it was written NV had their GPGPU architecture more figured out than when the AMD client was written. I think that's what Mintmaster meant.
 
I just wonder though how beneficial FAH really is. Has it accomplished anything really significant in the years that it's been burning through megawatts of power? It's actually rather inefficient in some ways, because they run each work unit many times for data precision. Their testing environment is anything but ideal.

Just a random check on their site shows that the Folding@home project has generated 72 scientific papers, and their group 158, since 2000 or so. Checking references for those papers shows that several have been referenced over 200 times and many over 100, so the papers are obviously getting used in the scientific community.

As of last year, they were reporting on some possible discovered drug pathways for Alzheimer's that they were going to be publishing soon. They have also made discoveries regarding Huntington's disease and antibiotics.

I would count them as a very productive experiment from a scientific viewpoint. Sure, they haven't cured cancer yet, but they have greatly increased knowledge about the way many of these diseases work.
 
Also, my comment was to aaronspink and his "do something more useful with your computer" comment in regards to F@H. Helping to possibly find a cure for something that I know will one day be affecting my mother (I believe it has already started to) is an extremely useful way to use my rig. I'm not sure about Shaider, but he may also feel the same way while also caring about the points.

The GPU client is extremely limited, and the things that are going to actually give the breakthroughs are the dedicated protein folding machines that can calculate things not only at higher detail but at higher speeds and more accurately. The amount of work being done by F@H is certainly nice, but it isn't going to lead to some major breakthrough. It's like everyone running miniature colliders in their back yards: sure it's cool, and allows some science to be done, but even with 10 billion of them, it's not going to give more real science than CERN/LHC or the SSC (if they had built it).
 
NVidia just got lucky with its timing.
I could be remembering it wrong, but I seem to recall that there were two NV guys helping out at Stanford, maybe in a more directly product-related way than AMD's Mike Houston, who basically did - to the best of my knowledge - help break ground in this area.
 
Everyone take a deep breath. I know this can touch a nerve with people and I'm surprised at the venom that always seems to crop up, but I can understand the passion. This is a project about scientific exploration of nasty things, and it's great when people and companies donate resources if they are able and willing. And before people question my personal commitment, I'll repeat again that Parkinson's disease runs in my family and is one of the things F@H is targeting for research, so I have my own selfish reasons to be engaged. Moreover, there are people at AMD (and Nvidia, and many other places) that have battled with cancer and other diseases being studied by F@H, and/or have family members that have. Now, having said that:

Yes, we are working with Stanford. It's not a trivial porting effort to move a large code base as well as bring up some new stuff at the same time. Stanford has carried a lot of this load with our help. It's not like we can do all the code on our own, since we frankly do not have the expertise that the researchers at Stanford have; however, we can suggest algorithm tradeoffs, high-level tuning advice, API/language clarifications, etc. Stanford and AMD have been engaged, and I'm sure Nvidia has been as well, and possibly others. OpenCL covers a larger range of devices, so debugging can be interesting, as things may work on some devices and not on others. There have been bugs on all sides (race conditions, code/logic errors, compiler/runtime bugs, API/language funkiness, etc.), but progress is being made. There will be tuning work to do as well, but I think everyone is hopeful about the doors this may open.

As far as the older client goes, my post explaining some of the differences in implementations was already quoted in this thread (soon to be a break-off thread?), so I won't rehash too much. No, the client isn't artificially limited to a certain number of stream processors. However, smaller work units will not fully utilize newer chips. In the GPU2 era, AMD and Nvidia run very different algorithms achieving the same result. As I have stated before, and the data is in the publications, for really large proteins the performance gap closes much more; however, really large proteins are also really hard on the system (the UI can get really laggy). Nvidia's implementation and hardware had an advantage on smaller proteins (a "narrower" machine with higher ALU clocks) and we were more competitive with bigger stuff (a "wider" machine with lower ALU clocks), despite an algorithmic disadvantage. We couldn't support some of the other algorithmic variants all that well on R6XX that may have performed better. R7XX was more flexible, but still had restrictions that made it tricky, and the Brook model was showing its limits.

The hope is that with OpenCL, there will be a more unified code base that all vendors can work on and Stanford can concentrate on the science and new algorithms for advancement of the field.
 
If the client is the same, the work units are the same, the output is the same, and the only *known* differences are that ATi GPUs have to redo some calculations (due to cache limitations) and are apparently arbitrarily limited to a rather low ALU utilization, how is that a hugely different algorithm?
I am not a programmer either, but it's apparent here that the algorithm must be different when two pieces of hardware use different approaches to the solution of a given problem. It's a bit like comparing a CPU and a GPU: they both can 'do folding', but do it quite differently. But since it has been made pretty clear how limited GPU2's use of current Radeon hardware capabilities is, performance comparisons based on it are somewhat moot and wouldn't belong in this topic anyway.

They do shed light on another point, though, which strengthens your view (I imagine): if you're early to market with a certain feature set, which may cost transistors, thus die space, and thus be an investment, chances are new programs or even generic algorithms will be tailored to the way you've implemented things in hardware.

And, so as not to be wholly off-topic here: I think one point regarding the architectural advantages of AMD hasn't been mentioned yet in this thread. With a large number of titles today being developed cross-platform between PC and Xbox 360 (more so than between PC and PS3 or Xbox 360 and PS3), programmers tend to have already used quite optimized tools to take advantage of a 5-way VLIW architecture at a basic level, though current Radeon GPUs are certainly much more capable, even on a per-ALU level, than the processor used in the Xbox 360. That's one of the reasons, btw, why I have my doubts that AMD will cut a lane from the 5-way design in one of the Islands generations.
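To sketch, purely hypothetically, what taking advantage of a 5-way VLIW architecture "at a basic level" can look like (an OpenCL-flavoured example of my own, nothing vendor-specific): expressing the math in float4 terms hands the compiler four naturally packable lanes per instruction slot, instead of leaving all the packing of scalar operations to it.

Code:
/* Hypothetical kernel, illustration only: the four component-wise MADs
 * per element map naturally onto four of the five VLIW slots, whereas a
 * purely scalar formulation relies entirely on the compiler to pack them. */
__kernel void scale_add(__global const float4 *a,
                        __global const float4 *b,
                        __global float4 *out,
                        const float s)
{
    size_t i = get_global_id(0);
    out[i] = a[i] * s + b[i];
}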


edit:
Mike, now that we've succeeded in grabbing your attention ;) how do you rate the chances that the CPU clients will also use OpenCL, and that both types of processors will work on the same kinds of problems?
 
edit:
Mike, now that we've succeeded in grabbing your attention ;) how do you rate the chances that the CPU clients will also use OpenCL, and that both types of processors will work on the same kinds of problems?

OpenCL gives you that type of portability. I'm not sure what Stanford's plans are here, as the SMP and uniprocessor clients are designed to be good at a different set of algorithms than the GPUs. I think it might be an interesting experiment to see how things can be tuned for multi-core as well, and of course mixing CPUs and GPUs instead of jamming all the compute into one or the other. And of course there is the IBM Cell OpenCL implementation, and I'm sure other hardware vendors will have CL implementations as well. The other thing to remember is that OpenCL runs on Windows, Linux, and OSX, as well as on x86 CPUs and GPUs. So it has the potential to simplify codebases and provide portability. Tuning will vary, especially when going for all-out performance (i.e. when the companies gear up for battle, and/or Scott and I (and/or others) get each other riled up). :devilish:
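For what it's worth, a minimal host-side sketch (illustrative only, error handling omitted) of that kind of portability: the same kernel source can be built for a CPU or a GPU device just by requesting a different device type from the platform.

Code:
/* Illustrative only: grab a CPU or GPU device from the first platform.
 * The same program source is then built with clBuildProgram() against
 * whichever context/device was created. */
#include <CL/cl.h>

static cl_device_id get_device(cl_device_type type)
{
    cl_platform_id platform;
    cl_device_id device;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, type, 1, &device, NULL);
    return device;
}

/* e.g. get_device(CL_DEVICE_TYPE_GPU) or get_device(CL_DEVICE_TYPE_CPU) */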
 
Thank you, M. Houston. You've cleared up all the problems in this thread, and it's good to see such a matter resolved in such a satisfying manner; so often are we left hanging by the vagaries of NDAs etc. Now I wait in anticipation for an improved OpenCL FAH client in order to grow my epeen to a size worthy of sporting an advanced ATI graphics card.

Just remember, Dave now owes you some cake, don't let him tell you otherwise! :D
 