Old 12-Nov-2007, 02:42   #51
OICAspork
Member
 
Join Date: May 2003
Location: Nara, The Land of the Rising Sun
Posts: 210
Default

Quote:
Originally Posted by Tim Murray View Post
Welp. D:
I think this may be appropriate reading. Don't forget the video!
OICAspork is offline   Reply With Quote
Old 12-Nov-2007, 04:00   #52
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277
Default

I think I can pedant myself out of this!
Tim Murray is offline   Reply With Quote
Old 12-Nov-2007, 12:36   #53
3vi1
Junior Member
 
Join Date: Jan 2007
Posts: 22
Default

Quote:
Originally Posted by Tim Murray View Post
I think I can pedant myself out of this!
pedant = not a verb.
3vi1 is offline   Reply With Quote
Old 12-Nov-2007, 20:13   #54
Lux_
Member
 
Join Date: Sep 2005
Posts: 206
Default

FH
Quote:
we have a next generation GPU core which should be faster, easier to install, and has much more accurate science (at first, comparable to the PS3, but hopefully soon, something even more accurate than the current PS3 client).
Related, I guess.
Lux_ is offline   Reply With Quote
Old 14-Nov-2007, 03:43   #55
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,169
Default

I spoke to some AMD people at SC07 (come by the RapidMind booth if you're there!). Some tidbits that I gleaned:

- the card is indeed RV670-based - graphics parts to come soon
- the graphics parts will also support double precision
- the FireGL offering will basically be a superset of this card (i.e. RV670, 2GB RAM but with display outputs naturally)
- double precision is 50% speed, but no fused MAD (seems in line with the ~40% figure quoted earlier in this thread)

So maybe some hat eating/video recording to come yet?

Anyways I can't stand 100% behind these facts since the AMD people may even have been in error, but they are probably fairly reliable, and nothing too crazy.
Andrew Lauritzen is offline   Reply With Quote
Old 14-Nov-2007, 04:28   #56
BRiT
...
 
Join Date: Feb 2002
Location: Cleveland
Posts: 5,201
Default

Quote:
Originally Posted by AndyTX View Post
- the card is indeed RV670-based - graphics parts to come soon
- double precision is 50% speed, but no fused MAD (seems in line with the ~40% figure quoted earlier in this thread)
Why is SP so slow then?




__________________
IBSL: 2835, 6541, 8531, 9299, 86985, 87130
FBSL: 7221, 9255, 15892, 20484, 89453
BRiT is offline   Reply With Quote
Old 14-Nov-2007, 04:46   #57
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

Quote:
Originally Posted by AndyTX View Post
- double precision is 50% speed, but no fused MAD (seems in line with the ~40% figure quoted earlier in this thread)
The "5th" special function scalar is actually not involved in DP calculations.
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 14-Nov-2007, 06:14   #58
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,169
Default

Quote:
Originally Posted by BRiT View Post
Why is SP so slow then?
I don't know much about the hardware, but I think 50% DP seems to imply a fairly full utilization of the silicon in both single and double cases, no?

Quote:
Originally Posted by Dave Baumann
The "5th" special function scalar is actually not involved in DP calculations.
Ah, that makes sense. The note about no MAD is interesting, though: if you're doing heavy MAD stuff (say, interpolation, evaluating Bernstein polynomials, etc.) you're getting 1/4 the speed in DP rather than 1/2. Still, it's probably not critical for most code, and 50% issue rates for other instructions are pretty good.
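
To make "heavy MAD stuff" concrete, here is a minimal Python sketch (not from anyone in this thread) of de Casteljau evaluation for a polynomial in Bernstein form. The names `de_casteljau` and `lerp_count` are purely illustrative. Every inner step is a linear interpolation, i.e. a subtract feeding a multiply-add, so a loop like this would be dominated by MADs and would land near the 1/4 figure if a DP MAD really has to be issued as a separate mul and add.

```python
def de_casteljau(coeffs, t):
    """Evaluate a polynomial given by its Bernstein coefficients at t in [0, 1]."""
    if not coeffs:
        raise ValueError("need at least one Bernstein coefficient")
    work = list(coeffs)
    n = len(work)
    for level in range(1, n):
        for i in range(n - level):
            # lerp: a + t * (b - a), one subtract feeding one multiply-add
            work[i] = work[i] + t * (work[i + 1] - work[i])
    return work[0]


def lerp_count(degree):
    """Number of lerps needed for one evaluation of a degree-n polynomial."""
    return degree * (degree + 1) // 2
```

A cubic needs `lerp_count(3)` = 6 lerps per evaluation, and each of those is a MAD candidate.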

Now I'm wondering whether transcendentals are also 50% and if so, are they accurate to more bits in DP, or the same as SP?
Andrew Lauritzen is offline   Reply With Quote
Old 14-Nov-2007, 15:32   #59
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

That puts RV670's DP throughput at less than 25% of the peak FLOPS figure given for the chip in the AMD PDF.
My math earlier was based on peak marketing numbers, which were going by the FMADD peak.

Actually, the 5th special function ALU being left out leaves my earlier math slightly optimistic when it comes to price/performance and performance/watt in the redundancy case.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 15:43   #60
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,948
Default

See this document

http://www.cs.berkeley.edu/~samw/pro...ore/sc2007.pdf

for an example of how "useless" x86 (particularly Intel) is, and why "peak" is such a meaningless concept. The applications covered, such as Proteins, FEM and circuit simulation look like reasonable evaluation candidates for GPGPU workloads...

Jawed
Jawed is offline   Reply With Quote
Old 14-Nov-2007, 16:30   #61
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

I've held off on discussing the difference between peak and sustained performance because it would have bloated my initial post with a discussion that is too complex to resolve for a general case.

Because at the time there was no confirmation of RV670's DP output, I only had AMD's peak SP numbers to compare with Intel's DP flops per cycle.

I did not think it fair to compare x86 sustained real-world flops to a GPU's unverified peak DP based off its overly optimistic SP FMADD peak.

It's not like we can't think up workloads where GPUs fall on their faces and cases where CPUs get near peak (or cases where the R600 has faltered in comparison to another GPU).
It's not something I wanted to derail the thread with.

My initial point is that a GPU is a clear price/performance winner at SP, even if it is running the same work on two cards.
I then went on to point out that at the (slightly optimistic) estimate of 25% DP throughput, that same tandem arrangement is not so clear a win.

Even if the individual card is an order of magnitude better at SP performance, cutting overall performance of the GPU by quartering its performance and then halving it again by doing tandem work removes that order of magnitude, particularly at the price point given.
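
As a back-of-the-envelope check on that paragraph, here is a tiny Python sketch. The function name and default values are assumptions chosen to mirror the wording above (an order of magnitude at SP, 25% DP throughput, two cards doing the same work); they are not measured figures.

```python
def tandem_dp_advantage(sp_advantage=10.0, dp_fraction=0.25, copies=2):
    """Remaining advantage over the CPU after going to DP and running in tandem."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The SP advantage shrinks by the DP fraction, then again by the number of
    # cards that are all running the same work unit.
    return sp_advantage * dp_fraction / copies
```

With the defaults this comes to 1.25x, so the order-of-magnitude lead is essentially gone before price enters the picture.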

Now, if it were a system of gamer RV670s, the price/performance is blasted out of the park, given it would cost an order of magnitude less.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 17:05   #62
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Assuming the bit about DP being exposed on the consumer version as well turns out to be true, I wonder what F@H will do with an X2? Treat it like two separate cards that you have to run separate instances on?

/me starts to get moderately interested in an X2
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Geo is offline   Reply With Quote
Old 14-Nov-2007, 17:15   #63
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277
Default

Quote:
Originally Posted by Geo View Post
Assuming the bit about DP being exposed on the consumer version as well turns out to be true, I wonder what F@H will do with an X2? Treat it like two separate cards that you have to run separate instances on?
At the risk of eating more hats, I'd be shocked if DP is available on consumer boards. Otherwise, why would you buy a FireStream? There may well be 1GB HD 3870s, and you'd be able to buy, what... at least 7 or 8 of those for the price of one FireStream?
Tim Murray is offline   Reply With Quote
Old 14-Nov-2007, 17:20   #64
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

Quote:
Originally Posted by 3dilettante View Post
That puts RV670's DP throughput at less than 25% of the peak FLOPS figure given for the chip in the AMD PDF.
It's between 20% and 40%, depending on the mix of operations.
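
One reading that happens to reproduce that range is sketched below in Python. It is a guess at the arithmetic, not AMD's stated model, and it assumes three things pieced together from earlier posts: SP issues on all five lanes while DP uses only four (the 5th special-function lane sits out), a plain DP op issues at half the SP rate, and a DP MAD has to be split into a separate mul and add. The function name `dp_vs_sp` is hypothetical.

```python
def dp_vs_sp(mad_fraction, dp_lanes=4, sp_lanes=5):
    """Relative DP throughput versus SP running the same instruction mix.

    mad_fraction is the share of instructions that are MADs (0.0 to 1.0).
    """
    if not 0.0 <= mad_fraction <= 1.0:
        raise ValueError("mad_fraction must be between 0 and 1")
    # Cost per DP instruction in SP issue slots:
    # 2 for a plain half-rate op, 4 for a MAD split into mul + add.
    cost = 2.0 * (1.0 - mad_fraction) + 4.0 * mad_fraction
    return (dp_lanes / sp_lanes) / cost
```

Under those assumptions, code with no MADs at all sits at 40% of SP and pure MAD code sits at 20%, with everything else in between.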
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 14-Nov-2007, 17:50   #65
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,521
Default

Quote:
Originally Posted by AndyTX View Post
- double precision is 50% speed, but no fused MAD (seems in line with the ~40% figure quoted earlier in this thread)
Dunno how you come to that conclusion ... hell I don't know how you let someone get away with saying "50% speed" to you without saying "Huh?". It's ambiguous to the point of not really meaning anything.
MfA is offline   Reply With Quote
Old 14-Nov-2007, 17:58   #66
Anarchist4000
Member
 
Join Date: May 2004
Location: Somewhere, IN USA
Posts: 313
Default

Quote:
Originally Posted by Tim Murray View Post
At the risk of eating more hats, I'd be shocked if DP is available on consumer boards. Otherwise, why would you buy a FireStream? There may well be 1GB HD 3870s, and you'd be able to buy, what... at least 7 or 8 of those for the price of one FireStream?
Keep in mind the original Firestream cards (1900XTs) had an MSRP of >$1k but were being sold at around normal 1900XT prices. Stick 2GB of RAM, a 5 year warranty, and a vacuum cleaner on it and the price is a little more justifiable.

Plus with folding and other GPGPU apps, which may want to use DP, it doesn't do a whole lot of good if it's not available on any of the consumer cards. Besides HDR with FP64 is the next best thing!

Anarchist4000 is offline   Reply With Quote
Old 14-Nov-2007, 18:27   #67
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

Quote:
Originally Posted by Dave Baumann View Post
It's between 20% and 40%, depending on the mix of operations.
With those numbers, the lower bound would make doubling up on firestream cards for redundant execution a weak proposition, while the upper bound would make the scheme incrementally price/performance-competitive at the suggested price.

I'm liking the idea of buying 8 (4 if I'm not buying another node to host the cards) HD3870s, if they are DP-capable as well.

At consumer card prices, a redundant execution scheme would be a worthwhile investment, unless Firestream comes with some very nifty order-of-magnitude exclusives.

Or I could just run 8 of them full-tilt if I couldn't care less about data integrity.
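
A rough Python sketch of that comparison, per FireStream-sized budget. It leans on two unconfirmed assumptions: that one budget buys about 8 consumer cards (the "7 or 8" estimate quoted above), and that a consumer card, if DP-capable at all, matches a FireStream's DP rate. The helper name `work_per_budget` is made up for illustration.

```python
def work_per_budget(cards_per_budget, redundancy=1, per_card_rate=1.0):
    """Unique work done per budget when each unit runs on `redundancy` cards."""
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    return cards_per_budget * per_card_rate / redundancy


options = {
    "FireStream, redundant pair": work_per_budget(1, redundancy=2),
    "8x consumer, redundant pairs": work_per_budget(8, redundancy=2),
    "8x consumer, unchecked": work_per_budget(8, redundancy=1),
}
```

On those assumptions the checked consumer setup does 8 times the unique work per dollar of the checked FireStream setup, which is the gap any FireStream exclusives would have to close.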
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 18:35   #68
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

You have to take efficiency for the particular task you are looking at into account as well. We already have 3rd party measurements for DP FFT calculations that are higher throughput than that of the theoretical rates of the CPUs you are talking about.
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 14-Nov-2007, 19:24   #69
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

I'm not disputing that GPUs can be more efficient.
In the case where we don't buy 2 cards to run the exact same work unit, I'd say it's an across the board win for anything GPUs don't fall on their face running.

It's only in DP with redundant cards that there is more argument.
I agree that Firestream can be a good buy, just not so amazingly good at that price point that buying multiples to run redundant work units isn't a waste.

Other factors such as rack space and the cost of surrounding hardware also go up when a system doubles the number of cards to run the same amount of work, though I've tried to simplify matters to avoid bogging down on the large number of possible configurations.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 19:53   #70
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

Excuse me, why are we buying redundant cards?
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 14-Nov-2007, 19:56   #71
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,321
Default

Quote:
Originally Posted by Dave Baumann View Post
Excuse me, why are we buying redundant cards?
cosmic rays ftw.
__________________
[twitter]
More samples, we need more samples! [Dean Calver]
First they ignore you, then they laugh at you, then they fight you, then you win. [Mahatma Gandhi]
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way
nAo is offline   Reply With Quote
Old 14-Nov-2007, 19:56   #72
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

Quote:
Originally Posted by Dave Baumann View Post
Excuse me, why are we buying redundant cards?
That was brought up earlier in the thread when dealing with very large clusters. At typical DRAM error rates, a system with terabytes of RAM would see on the order of one error every five minutes.
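
For anyone who missed the earlier discussion, redundant execution here just means running the same work unit more than once and trusting the answer only when the copies agree. A minimal Python sketch follows, with sequential calls standing in for separate cards. `run_redundant` and its parameters are hypothetical names, not part of any AMD or Folding@home API.

```python
def run_redundant(work, unit, copies=2, max_attempts=3):
    """Run work(unit) `copies` times; accept the result only if all copies agree.

    Returns (result, attempts_used). Raises RuntimeError if the copies never
    agree within max_attempts.
    """
    if copies < 2:
        raise ValueError("need at least 2 copies to detect a mismatch")
    for attempt in range(1, max_attempts + 1):
        results = [work(unit) for _ in range(copies)]
        if all(r == results[0] for r in results[1:]):
            return results[0], attempt
        # A mismatch means at least one copy was corrupted; redo the unit.
    raise RuntimeError("copies disagreed on every attempt")
```

Two copies can only detect an error, not say which copy was right, so every mismatch costs a full rerun on top of the doubled hardware.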
__________________
Dreaming of a .065 micron etch-a-sketch.

Last edited by 3dilettante; 14-Nov-2007 at 20:44. Reason: added quote for clarity
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 21:31   #73
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

I would, then, think about the applications for a single add-in card product such as has been announced here. Solutions for large clusters are going to be different.
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 14-Nov-2007, 22:19   #74
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222
Default

If such a demarcation exists or will exist, it would make sense.
Smaller systems would have a wider window between errors and would be less bogged down by the overhead of repeated calculation.
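
A quick Python sketch of how that window widens, assuming errors scale linearly with the amount of memory. The 1 TB baseline is my own placeholder, since the earlier figure only says "terabytes", and `minutes_between_errors` is an illustrative name.

```python
def minutes_between_errors(memory_gb, base_memory_gb=1024.0, base_minutes=5.0):
    """Expected minutes between memory errors, scaled from a baseline system."""
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    # Errors scale linearly with memory, so the interval scales inversely.
    return base_minutes * base_memory_gb / memory_gb


single_card = minutes_between_errors(2.0)    # one 2GB card
eight_cards = minutes_between_errors(16.0)   # eight 2GB cards
```

On that baseline a single 2GB card would go 2560 minutes (a bit under two days) between errors, and eight of them together about 320 minutes, so the case for redundancy does weaken a lot at small scale.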

I did not see a statement to the effect that the FireStream 9170 was a single add-in card product.

Considering the rise of cheap clusters in HPC, I interpreted the product's being touted for HPC to indicate it would match that portion of the target market.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 14-Nov-2007, 22:33   #75
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,304
Default

Take a look here:

http://ati.amd.com/technology/stream...-computing.pdf
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
