Old 14-Nov-2007, 22:51   #76
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222

Which statement should I interpret to the effect that 9170 is a single-card add-in?

The "supported in variety of server and desktop systems", the "AMD partners with system vendors and integrators to deliver system solutions", or did I overlook something?
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 15-Nov-2007, 02:46   #77
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,169

Quote:
Originally Posted by MfA View Post
Dunno how you come to that conclusion ... hell I don't know how you let someone get away with saying "50% speed" to you without saying "Huh?". It's ambiguous to the point of not really meaning anything.
It's 50% issue rate (40% I guess if you consider that the 5-th scalar unit is not involved in DP stuff as Dave mentioned) for non-MADD instructions... how ambiguous is that?

I've now heard similar things quoted from several AMD guys, including Dave above... what are you not believing here?
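To spell out where the 50% and 40% figures come from, here is a rough sketch of the arithmetic. It assumes one VLIW unit has 5 scalar lanes, that only 4 of them take part in DP work, and that each of those issues DP at half its SP rate; the function name and parameters are made up for illustration and are not from any AMD documentation.

```python
from fractions import Fraction

LANES_PER_UNIT = 5    # scalar lanes in one VLIW unit (as discussed in the thread)
DP_CAPABLE_LANES = 4  # assumption: the fifth lane takes no part in DP work


def dp_rate_relative_to_sp(dp_issue_fraction, sp_lanes_counted):
    """DP ops per clock divided by SP ops per clock for one VLIW unit.

    dp_issue_fraction: DP issue rate of each DP-capable lane relative to SP.
    sp_lanes_counted: how many lanes are counted on the SP side of the ratio.
    """
    dp_ops_per_clock = DP_CAPABLE_LANES * Fraction(dp_issue_fraction)
    return dp_ops_per_clock / sp_lanes_counted


half = Fraction(1, 2)
# Compared against the 4 DP-capable lanes only: the "50%" figure.
print(dp_rate_relative_to_sp(half, DP_CAPABLE_LANES))  # 1/2
# Compared against all 5 lanes: the "40%" figure.
print(dp_rate_relative_to_sp(half, LANES_PER_UNIT))    # 2/5
```

The only difference between the two numbers is which set of lanes the SP side of the comparison counts.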
Old 15-Nov-2007, 03:38   #78
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Quote:
Originally Posted by AndyTX View Post
It's 50% issue rate (40% I guess if you consider that the 5-th scalar unit is not involved in DP stuff as Dave mentioned) for non-MADD instructions... how ambiguous is that?

I've now heard similar things quoted from several AMD guys, including Dave above... what are you not believing here?
I don't think it's quite that simple--I've got some numbers that I need to look at again, but it's not as obvious as you'd think.
Old 15-Nov-2007, 06:22   #79
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,169

Quote:
Originally Posted by Tim Murray View Post
I don't think it's quite that simple--I've got some numbers that I need to look at again, but it's not as obvious as you'd think.
Oh, it rarely is, but I still don't see why the statement is ambiguous, nor why I should doubt the information that I've now heard from several sources.

I certainly agree with Dave though in that these sorts of performance figures are only rough indicators of real-world numbers. Particularly since most applications tend to be memory-bound nowadays (doubles only make this worse!), I'm not too concerned with what seem to be fairly decent instruction issue rates... only an extreme figure like 1/20 or something would be cause for potential alarm. Beyond that I'm unconvinced of the utility of crunching the numbers oblivious to a particular application domain.
Old 15-Nov-2007, 14:59   #80
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222

Perhaps the hint is that certain operations can be issued at greater than 1/2 the SP issue rate.

A strong argument against more than 1/2 SP issue for a math op like an ADD is that accessing and writing back the required number of bits for the operation is a limiting factor where improvements to one directly translate to improvements to both.

If a unit can pull in enough data to run DP at the same rate as SP issue, the question becomes why it can't be used to double SP issue.

On the other hand, operations that have issue rates of less than 1 per cycle may be waiting on something other than data movement, and it doesn't necessarily follow that a DP instruction has to wait twice as long in such circumstances.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 15-Nov-2007, 15:25   #81
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,321

Well, it seems that having an issue rate and throughput per clock for DP operations that is half of your SP rate is the best you can do before you have to start changing your datapaths.
__________________
[twitter]
More samples, we need more samples! [Dean Calver]
First they ignore you, then they laugh at you, then they fight you, then you win. [Mahatma Gandhi]
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way
Old 15-Nov-2007, 15:42   #82
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,222

That would be true if every instruction's issue rate is bottlenecked by either getting data from the registers or by computation that scales linearly with data type size.

There are instructions that cannot be issued every cycle, and any of them that are held up by other factors, such as not being fully pipelined or needing to arbitrate shared hardware, have a delay factor that does not scale with the number of bits in the operands.

In that case, while the data-limited portion of the instruction's issue rate might double, the remaining delay cycles may remain the same, which shifts the fraction to greater than 1/2.
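A toy model of that argument, as a sketch only: split the cycles between successive issues of an instruction into a data-limited part, assumed to double for DP, and a fixed part, assumed unchanged. The function and parameter names are hypothetical and the cycle counts are invented for illustration, not measured on any hardware.

```python
from fractions import Fraction


def dp_to_sp_issue_ratio(data_cycles, fixed_cycles):
    """Ratio of DP issue rate to SP issue rate for one instruction.

    data_cycles: cycles per SP issue spent on operand movement or on
                 computation that scales with operand width (doubles for DP).
    fixed_cycles: cycles per issue from width-independent delays, such as
                  arbitration or a unit that is not fully pipelined.
    """
    sp_cycles = data_cycles + fixed_cycles
    dp_cycles = 2 * data_cycles + fixed_cycles
    # Issue rate is the inverse of cycles per issue, so the rate ratio
    # DP/SP equals sp_cycles / dp_cycles.
    return Fraction(sp_cycles, dp_cycles)


print(dp_to_sp_issue_ratio(1, 0))  # 1/2: purely data-limited
print(dp_to_sp_issue_ratio(1, 1))  # 2/3: one fixed delay cycle
print(dp_to_sp_issue_ratio(1, 3))  # 4/5: mostly fixed delay
```

With no fixed delay the ratio is exactly 1/2; any fixed delay pushes it above 1/2, and the larger the fixed share, the closer it gets to 1.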
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 15-Nov-2007, 17:30   #83
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,521

Quote:
Originally Posted by AndyTX View Post
It's 50% issue rate (40% I guess if you consider that the 5-th scalar unit is not involved in DP stuff as Dave mentioned) for non-MADD instructions... how ambiguous is that?
Well for one you say it's issue rate, but you didn't say that was what you were told. Even if we assume for a moment it has 50% issue rate, it seems rather unlikely that goes for all instructions except FMADD. We can make a further assumption that there are no back to back multiplies at 50%. But if we have to string multiple assumptions together to get some idea of the actual speed then I'd say the original statement was ambiguous.

Last edited by MfA; 15-Nov-2007 at 17:51.
Old 16-Nov-2007, 01:37   #84
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,169

Quote:
Originally Posted by MfA View Post
But if we have to string multiple assumptions together to get some idea of the actual speed then I'd say the original statement was ambiguous.
I think the original statement (and those from Dave) give a good, unambiguous "idea of the actual speed". Sure, it didn't allow you to cycle-count some sequence of code, but as I mentioned earlier I severely doubt the utility of such a task.

Last edited by Andrew Lauritzen; 16-Nov-2007 at 04:18.
Old 24-Dec-2007, 05:47   #85
itaru
Member
 
Join Date: May 2007
Posts: 134

http://ati.amd.com/technology/stream.../register.html
Download AMD Stream Computing SDK
Old 24-Dec-2007, 09:13   #86
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

some questions for our good buddy mhouston, based on me quickly glancing through the documentation (it's 3AM on Christmas Eve! admittedly, there aren't very many things I'd rather be doing at 3AM on Christmas Eve, though...):

- gather and scatter are mentioned in a few places, but it seems to be "will be supported eventually?" are they in now or what? what's the timeframe for that if not?

- is this R600 only? some of the PDFs mentioned R580 but Brook+ requires R6x0. (well now I see that CAL requires R600 so I assume that this is the case, but I figure I'd get it from the horse's mouth)

- is there any way to run this on the CPU like you can with the CUDA simulator? maybe high-level language to IL to OpenMP/pthreads/some other silly CPU-side thread implementation that we should never talk about because it's bad?

- no way to do inter-thread synchronization in Brook+? Arun and I were debating the use of the mythical SMX as a global cache, but I guess nothing came of that
Old 24-Dec-2007, 09:20   #87
itaru
Member
 
Join Date: May 2007
Posts: 134

CAL: 2400-2900 OK,
but Windows XP (32-bit) only for now.
Old 24-Dec-2007, 09:24   #88
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Quote:
Originally Posted by itaru View Post
CAL: 2400-2900 OK,
but Windows XP (32-bit) only for now.
I assume that's a minor driver issue, since Brook has been Linux for a long time (and I think CTM was working on Linux as well)

ps: congrats for getting this out, AMD guys
Old 24-Dec-2007, 13:41   #89
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,948

Interestingly, the whitepaper:

http://ati.amd.com/technology/stream...whitepaper.pdf

shows GPUSA being able to translate Brook into intermediate language (pixel shader it would seem) code and then provide performance analysis. Though the code snippet shown appears to be doing a whole load of faffing with superfluous MOV instructions.

Jawed
Old 24-Dec-2007, 19:56   #90
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342

To answer questions for an Nvidian...

Quote:
Originally Posted by Tim Murray View Post
- gather and scatter are mentioned in a few places, but it seems to be "will be supported eventually?" are they in now or what? what's the timeframe for that if not?
CAL supports gather on all hardware and scatter on 670 (along with double precision). Brook supports gather currently and Brook+ will support scatter on some hardware in the near future.

Quote:
Originally Posted by Tim Murray View Post
- is this R600 only? some of the PDFs mentioned R580 but Brook+ requires R6x0. (well now I see that CAL requires R600 so I assume that this is the case, but I figure I'd get it from the horse's mouth)
R6XX and above through CAL and Brook+, although some features will only be supported on newer hardware (much like Nvidia).

Quote:
Originally Posted by Tim Murray View Post
- is there any way to run this on the CPU like you can with the CUDA simulator? maybe high-level language to IL to OpenMP/pthreads/some other silly CPU-side thread implementation that we should never talk about because it's bad?
Brook has a CPU backend that has OpenMP support. Niall did the work and it scales quite well. You can pick it up from his branch on the SourceForge tree. CAL itself does not currently have CPU emulation.

Quote:
Originally Posted by Tim Murray View Post
- no way to do inter-thread synchronization in Brook+? Arun and I were debating the use of the mythical SMX as a global cache, but I guess nothing came of that
Thread synchronization is not currently exposed in Brook+. Thread sync gets tricky since you can sync at different levels of granularity. Much like how CUDA allows you to sync warps in a block, but not within a warp or on the full grid. Synchronization primitives are being explored for future versions of Brook, but that is all I can say at this stage.
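For anyone unfamiliar with what a block-level sync buys you, here is a CPU-side analogy using Python threads. This is not Brook+, CAL, or CUDA code; the names (`run_block`, `BLOCK_SIZE`) are made up, and it only illustrates the general idea of a barrier shared by one group of threads.

```python
import threading

BLOCK_SIZE = 4  # hypothetical number of threads in one "block"


def run_block(block_size=BLOCK_SIZE):
    shared = [0] * block_size   # stands in for memory shared by the block
    result = [0] * block_size
    barrier = threading.Barrier(block_size)

    def worker(tid):
        # Phase 1: each thread writes only its own slot.
        shared[tid] = tid * 10
        # Block-wide sync: nobody continues until every write has happened.
        barrier.wait()
        # Phase 2: it is now safe to read a neighbour's slot.
        result[tid] = shared[(tid + 1) % block_size]

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(block_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


print(run_block())  # [10, 20, 30, 0]
```

Without the barrier, a thread could read its neighbour's slot before it was written. The barrier here covers exactly one group of threads, which is the granularity question mentioned above: syncing a whole grid would need a barrier spanning every group.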

Okay, back to holiday stuff.
Old 24-Dec-2007, 19:57   #91
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342

I should also mention that we are working on support for 32/64-bit XP, Vista, and Linux, and those should be up in the near future. They are all running in house; it's just a matter of getting the verification and testing work done.
Old 24-Dec-2007, 20:20   #92
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Thanks Mike
Old 22-Mar-2008, 11:03   #93
itaru
Member
 
Join Date: May 2007
Posts: 134

Update:
Version 1.0 (beta)
Newly supported: Windows XP 64-bit (Vista?)

http://ati.amd.com/technology/stream.../sdkdwnld.html

amd-cal-install-win.txt
------------------------------------------------------------------------------------------------------
AMD CAL INSTALLATION NOTES:


WINDOWS XP (32bit + x64):


This release of the AMD CAL Software Development Kit includes a single
install package (MSI), which must be installed as a user with
administrative privileges.

You'll need to have the .NET Framework v2.0 redistributables installed
prior to running the install package.

Once you have all of the necessary dependencies installed, run the
AMD CAL SDK MSI installer:

amd-cal-sdk-v1.00.0-beta.msi

By default, the AMD CAL SDK installs into:

"C:\Program Files\AMD\CAL SDK v1.00.0-beta"


WINDOWS Vista (32bit + x64 ):


For dual-GPU configurations, ATI CrossFire must be disabled.
To disable CrossFire, open the ATI Catalyst Control Center and,
in the Advanced View, select CrossFire. In the right-hand pane, uncheck
Enable CrossFire and click OK.
Old 22-Mar-2008, 17:07   #94
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,948

Quote:
[...]each of the processing units has 5 scalar processing units, each capable of doing integer and single precision floating point operations. One of the 5 is also capable of doing double precision floating point operations as well as transcendentals. When you issue a double-precision instruction, then it will be using the scalar processor that is capable of doing double precision and not the other scalar processors in the VLIW processing unit.
http://forums.amd.com/devforum/textt...ltmsgid=875404

From thread:

http://forums.amd.com/devforum/messa...&enterthread=y

So it seems AMD widened the T ALU for double precision. The question remains whether double-precision transcendentals are possible...

Jawed
Old 22-Mar-2008, 17:56   #95
Farhan
Member
 
Join Date: May 2005
Location: in the shade
Posts: 152

Sounds strange... the guy is saying only 1 out of the 5 units is capable of double precision ops? Which means, unless some magic is used, the peak throughput rate is only 1/5 of single precision. Sounds wrong, since some double precision operations are supposedly half speed.

Srsly, that guy is giving too much credit to the RysUnit
__________________
[03:44] <thefarhan> i have exactly 128 friends right now :D
[03:45] <Jollemi> you have to teach them to remember 1MB worth of data, and see if you can run Windows 9x or Linux
Old 22-Mar-2008, 18:13   #96
Rys
Tiled
 
Join Date: Oct 2003
Location: Abbots Langley, UK
Posts: 2,712

I wrote back in the RV670 intro: Out of the full 320 scalar shader units in RV670 -- the same number as R600, remember -- 4/5 of those, the thinner ones, can do double precision IEEE754-compliant math. The rates might not be massive compared to the SP performance (around half speed for ADD and a quarter for MUL, depending on other factors)...

99.9% sure I got that right.
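For comparison, here is a back-of-the-envelope sketch of what the two readings would imply for peak rates across the whole chip. It takes the 320-unit figure and the half/quarter rates at face value, reads those rates as relative to each thin unit's SP rate, and assumes the fifth unit would issue one DP op per clock under the other reading; all of that is assumption, not measurement, and the helper name is made up.

```python
from fractions import Fraction

TOTAL_SCALAR_UNITS = 320  # figure quoted in the post above
LANES_PER_VLIW = 5        # scalar lanes per VLIW unit, per the thread

vliw_units = TOTAL_SCALAR_UNITS // LANES_PER_VLIW  # 64 VLIW units
thin_lanes = vliw_units * 4                        # 256 "thin" lanes
fifth_lanes = vliw_units                           # 64 fifth lanes


def share_of_sp_peak(dp_lanes, dp_rate_vs_sp):
    """Peak DP ops per clock as a fraction of peak SP ops per clock."""
    return Fraction(dp_lanes) * dp_rate_vs_sp / TOTAL_SCALAR_UNITS


# Reading 1: the four thin lanes do DP, ADD at half rate, MUL at a quarter.
thin_add = share_of_sp_peak(thin_lanes, Fraction(1, 2))
thin_mul = share_of_sp_peak(thin_lanes, Fraction(1, 4))
# Reading 2: only the fifth lane does DP, assumed at one op per clock.
fifth_only = share_of_sp_peak(fifth_lanes, Fraction(1, 1))

print(thin_add, thin_mul, fifth_only)  # 2/5 1/5 1/5
```

Under these assumptions the two readings only disagree on ADD: a quarter-rate MUL on four lanes and a full-rate op on one lane both land at 1/5 of SP peak.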
__________________
Mr. Popples!
Old 22-Mar-2008, 18:24   #97
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,948

Well I guess someone can give this a play since the SDK now supports doubles.

Jawed
Old 25-Mar-2008, 00:43   #98
Rys
Tiled
 
Join Date: Oct 2003
Location: Abbots Langley, UK
Posts: 2,712

Mike, any chance of a Linux driver so I can go push some code around?
__________________
Mr. Popples!
Old 25-Mar-2008, 00:53   #99
AlexV
Heteroscedasticitate
 
Join Date: Mar 2005
Posts: 2,422

Quote:
Originally Posted by Rys View Post
Mike, any chance of a Linux driver so I can go push some code around?
I'm truly flattered to be part of your sig
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do.
Old 25-Mar-2008, 01:41   #100
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342

We are working on Linux support. Getting all the i's dotted and t's crossed with different kernel/driver/chipset combinations is more "interesting" on Linux than XP/Vista.

All times are GMT +1.