|
|
#1 |
|
Resident Sasquatch
|
http://www.techpowerup.com/reviews/P...D_4830/20.html
Why is the GF9600 ahead of the 4870? This is odd, I'd say. |
|
|
|
|
|
#2 |
|
Naughty Boy!
|
Well, I've always had this theory that ATi's instruction set wasn't very efficient, and that graphics were actually a pretty good case for it; many GPGPU tasks would be worse.
Perhaps that's what we're seeing now, nVidia's scalar threading approach paying off. |
|
|
|
|
|
#3 |
|
Regular
|
I think it's nothing more than the very low performance of Brook+ since the compiler isn't "optimised" apparently.
Also, because ATI hardware is the reference for ATI and NVidia performance, any deficit in ATI performance below what the hardware is theoretically capable of is a multiplier for the performance of NVidia. In other words, if the ATI hardware when running optimally compiled F@H is taken as the reference, but Brook+ code can only achieve 50% of that, then NVidia hardware automatically gets a free 2x multiplier if comparing NVidia points against ATI points. It's then a matter of how efficiently coded the NVidia core is, in terms of what that hardware is theoretically capable of.
Jawed |
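The multiplier argument above can be put into numbers. This is only a minimal sketch: the function name and the sample efficiency values are illustrative assumptions, not anything taken from the F@H scoring code.

```python
def relative_multiplier(ati_efficiency, nv_efficiency=1.0):
    """Apparent NVidia advantage when ATI hardware is the scoring reference.

    Each efficiency is the fraction (0, 1] of the chip's theoretical
    capability that its F@H core actually achieves (hypothetical inputs).
    """
    if not 0 < ati_efficiency <= 1 or not 0 < nv_efficiency <= 1:
        raise ValueError("efficiencies must be in (0, 1]")
    return nv_efficiency / ati_efficiency


# Brook+ reaching only 50% of the optimum, NVidia core assumed ideal:
print(relative_multiplier(0.5))
# Same ATI deficit, but the NVidia core only 80% efficient (assumed value):
print(relative_multiplier(0.5, 0.8))
```

The second call shows the last point of the post: the "free" multiplier is scaled down by however far the NVidia core itself falls short of its hardware.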
|
|
|
|
|
#4 |
|
Member
|
I am under the impression that the ATI F@H client was just made to work with the 4800 series and isn't optimized. Am I at least partially correct?
|
|
|
|
|
|
#5 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,196
|
Mainly, things are single-thread CPU bound, so we're not using the graphics engine to the maximum. The latest core update (which would have been applicable in time for this review) is actually pushing through smaller proteins (hence smaller shaders) on our solution. The previous core was pushing through larger proteins (hence larger shaders), and these score better for us: we were still primarily CPU bound, but we used more of the engine, so we were doing harder WUs, and hence scoring higher, in the same time. There's a CAL update presently being qualified that should partially reduce the overhead, but it still won't get us to the maximum the engine can do.
__________________
Expand. Accelerate. Dominate. ATI Radeon HD 5800 Series Graphics Cards - Designed by the Community |
|
|
|
|
|
#6 | |
|
Naughty Boy!
|
Quote:
Quite a difference there. nVidia's approach is simpler, so you don't rely as much on compiler optimization in the first place. But I don't really think that the compiler doesn't do ANY optimization, so I doubt that they could make up for the large performance deficiency. At any rate the ball is in ATi's court, because nVidia already delivered a good SDK and compiler for CUDA. ATi needs to do the same if they want to compete in GPGPU. |
|
|
|
|
|
|
#7 |
|
Naughty Boy!
|
|
|
|
|
|
|
#8 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,196
|
They did something different.
|
|
|
|
|
#9 |
|
Member
Join Date: Mar 2008
Location: Jurong West
Posts: 544
|
I'm still waiting for the updated Brook+ and F@HCore to see if it works as I expected. If it does, there's probably going to be an undervolted RV770 doing some churning.
__________________
As a kid I thought the Cray-1 was a futuristic piece of furniture with computers inside. |
|
|
|
|
|
#10 |
|
Naughty Boy!
|
|
|
|
|
|
|
#11 | ||
|
Regular
|
Quote:
It's my impression that Brook+ -> IL is not tuned for ATI's memory architecture. Brook+, even for simple things like matrix multiply, is far from optimal. Of course, MM isn't as trivial as it first appears once you have to start programming for the cache architecture to get optimal performance.
http://ati.amd.com/technology/stream...08Tutorial.pdf
Since Brook+ doesn't expose any of the memory system (unless you consider explicit usage of up to 8 inputs and 8 outputs, "memimport" and "memexport" per kernel as explicit), it's all down to the compiler.
Quote:
Jawed |
||
|
|
|
|
|
#12 | |
|
Naughty Boy!
|
Quote:
|
|
|
|
|
|
|
#13 |
|
Regular
|
Did you read the PDF already?
Jawed |
|
|
|
|
|
#14 | |
|
Member
Join Date: Oct 2006
Location: Goettingen, Germany
Posts: 697
|
No folders here?
Quote:
If you have, for example, an HD3850 and get P4742 with 1254 atoms, your PPD will be ~2,000 at 548 points per WU; every frame takes ~4 minutes. With the NV-GPU2-Client you will still get "test proteins" with 480 points per WU (576 atoms), and with an 8800GT you will need 2 minutes or less for one frame. So if you want points only, a GeForce 8/9 is the better choice. But one month ago NV folders got larger projects with 1254 atoms and their PPD dropped sharply. For every WU they got 430 points (and not 548 points like ATi folders): http://foldingforum.org/viewtopic.php?f=52&t=5452
__________________
Hail Brothers and Sisters! Coranon Silaria, Ozoo Mahoke Eta Kooram Nah Smech! Find Chuck Norris. Last edited by Arnold Beckenbauer; 21-Dec-2008 at 15:19. |
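The figures quoted above can be cross-checked with a little arithmetic. This is a rough sketch that assumes 100 frames per work unit, which the post does not state; with that assumption the HD3850 numbers land close to the quoted ~2,000 PPD.

```python
MINUTES_PER_DAY = 24 * 60
FRAMES_PER_WU = 100  # assumption: not given in the post


def points_per_day(points_per_wu, minutes_per_frame, frames_per_wu=FRAMES_PER_WU):
    """Estimate PPD from the credit per work unit and the time per frame."""
    minutes_per_wu = minutes_per_frame * frames_per_wu
    return points_per_wu * MINUTES_PER_DAY / minutes_per_wu


# HD3850 on P4742: 548 points per WU, ~4 minutes per frame
print(round(points_per_day(548, 4)))
# 8800GT on a 480-point test protein, 2 minutes per frame (the post says "2 min or less")
print(round(points_per_day(480, 2)))
```

No estimate is given for the 430-point NV projects because the post does not quote a frame time for them.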
|
|
|
|
|
|
#15 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,196
|
Personally I don't know the ins and outs of what NVIDIA have done with their client, and I'm sure they don't want to tell us. But what I say is the simple fact of the matter, which is fairly well known to those who have followed GPU F@H.
Arnold has pointed out the trend of what happened with NVIDIA, but from our side, when the GPU2 client was introduced the smaller test proteins were very, very CPU bound. Then a new core came through along with larger proteins, and our score rose significantly because we were still executing them in a similar timeframe, just using more of the GPU processing power; conversely, NVIDIA went down because they were already GPU bound even on the smaller proteins. The latest core that came through is looking at mid-sized proteins, so our scores have decreased a little again. Yes, there are
|
|
|
|
|
#16 |
|
Naughty Boy!
|
|
|
|
|
|
|
#17 |
|
Senior Member
Join Date: Sep 2003
Posts: 1,670
|
The PPD discrepancy is because they benchmark with a 3850.
It's very notable that the client stats page shows ATI consistently doing slightly more 'Actual Teraflops' per active client. Divide the Actual Teraflops by the number of active processors and you get:
0.1100 for ATI
0.1099 for NV
If they are both doing nearly the same amount of actual work per processor (with ATI even slightly in the lead), the PPD difference is simply because the benchmark is on the ATI side. The benchmark machine goes faster when any ATI-side improvement happens, so there is no (or less) PPD increase on the ATI side. I feel the benchmark should really be a CPU running the same simulation model/work unit. Then, as long as the end result is the same, the GPU that does more actual work, i.e. finishes the same work unit faster, will be the one that gets the most PPD.
__________________
However, the above is the heart of the foreskin capacitance |
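The benchmarking argument above can be illustrated with a toy model. It assumes each work unit is credited in proportion to the time the reference machine needs for it, so a donor's PPD scales with reference time divided by donor time. The base value, the hour figures and the 1.5x speed-up are all hypothetical, and the real F@H scoring rules may differ.

```python
def ppd(base_ppd, reference_hours, donor_hours):
    """PPD under a reference-benchmarked scheme (simplified assumption).

    A WU is worth base_ppd * reference_hours / 24 points and the donor
    completes 24 / donor_hours of them per day, so the 24s cancel.
    """
    return base_ppd * reference_hours / donor_hours


BASE = 1000      # hypothetical PPD assigned to the reference machine
SPEEDUP = 1.5    # hypothetical gain from an improved ATI client

# Reference is an ATI GPU: the improvement speeds up reference and donor alike.
gpu_ref_before = ppd(BASE, reference_hours=6, donor_hours=3)
gpu_ref_after = ppd(BASE, reference_hours=6 / SPEEDUP, donor_hours=3 / SPEEDUP)

# Reference is a fixed CPU: only the donor GPU gets faster.
cpu_ref_before = ppd(BASE, reference_hours=60, donor_hours=3)
cpu_ref_after = ppd(BASE, reference_hours=60, donor_hours=3 / SPEEDUP)

print(gpu_ref_before, gpu_ref_after)   # unchanged: the speed-up cancels out
print(cpu_ref_before, cpu_ref_after)   # rises with the speed-up
```

In this model the ATI donor sees no PPD gain from a client improvement while the reference is also an ATI card, which is the effect described in the post; with a CPU reference the same improvement shows up directly in PPD.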
|
|
|
|
|
#18 | |
|
Member
Join Date: Aug 2003
Posts: 814
|
Quote:
And how G200 gets more points than anything out there, even quad cores running A2s. |
|
|
|
|
|
|
#19 | |
|
Naughty Boy!
|
Quote:
|
|
|
|
|
|
|
#20 |
|
Member
Join Date: Oct 2006
Location: Goettingen, Germany
Posts: 697
|
Because the NV-GPU2-Client was not as efficient as the reference machine.
|
|
|
|
|
#21 |
|
Regular
|
|
|
|
|
|
|
#22 | |
|
Naughty Boy!
|
Quote:
People measure performance in Points-Per-Day (PPD), right? So that would have to be a measure of the sum of workunits * weight per time unit. So why would they vary the weight of workunits that are the same size, depending on what processor is being used? If the system is not as efficient, it will not be able to complete as many workunits per day anyway, lowering its score. |
|
|
|
|
|
|
#23 |
|
Naughty Boy!
|
|
|
|
|
|
|
#24 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,196
|
And yet here we are, just explaining the simple fact of the matter that anyone who has followed folding knows, Scali.
|
|
|
|
|
#25 |
|
Naughty Boy!
|
"They did something different" isn't exactly an explanation.
And what if you haven't followed Folding, can't you ask some questions then? I couldn't care less about folding myself, to be perfectly honest with you. So no, I don't run the client myself, never have, and haven't kept up to date with its development. However, I am interested in the fact that it's one of the few applications where GPGPU is applied on both ATi and NV hardware, so we can more or less have some kind of comparison between the two approaches to GPGPU. Apparently one is CPU-limited, the other not. And for some reason the scoring is 'adjusted' per processor. |
|
|
|
|
|