#76 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,263
|
Which statement should I interpret to the effect that 9170 is a single-card add-in?
The "supported in variety of server and desktop systems", the "AMD partners with system vendors and integrators to deliver system solutions", or did I overlook something?
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#77 | |
|
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885
|
Quote:
I've now heard similar things quoted from several AMD guys, including Dave above... what are you not believing here? |
|
|
|
|
|
|
#78 | |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
Quote:
|
|
|
|
|
|
|
#79 | |
|
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885
|
Quote:
I certainly agree with Dave though in that these sorts of performance figures are only rough indicators of real-world numbers. Particularly since most applications tend to be memory-bound nowadays (doubles only make this worse!), I'm not too concerned with what seem to be fairly decent instruction issue rates... only an extreme figure like 1/20 or something would be cause for potential alarm. Beyond that I'm unconvinced of the utility of crunching the numbers oblivious to a particular application domain. |
|
|
|
|
|
|
#80 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,263
|
Perhaps the hint is that certain operations can be issued at greater than 1/2 the SP issue rate.
A strong argument against more than 1/2 SP issue for a math op like an ADD is that accessing and writing back the required number of bits for the operation is a limiting factor where improvements to one directly translate to improvements to both. If a unit can pull in enough data to run DP at the same rate as SP issue, the question becomes why it can't be used to double SP issue. On the other hand, operations that have issue rates of less than 1 per cycle may be waiting on something other than data movement, and it doesn't necessarily follow that a DP instruction has to wait twice as long in such circumstances.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#81 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,308
|
Well.. it seems that having an issue rate and throughput per clock for DP operations that is half of your SP rate is the best thing you can do before starting to change your datapaths.
__________________
[twitter] More samples, we need more samples! [Dean Calver] First they ignore you, then they laugh at you, then they fight you, then you win. [Mahatma Gandhi] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
|
|
|
|
|
#82 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,263
|
That would be true if every instruction's issue rate is bottlenecked by either getting data from the registers or by computation that scales linearly with data type size.
There are instructions that cannot be issued every cycle, and any of them that lose cycles to other factors, such as not being fully pipelined or needing to arbitrate shared hardware, have a delay factor that does not scale with the number of bits in the operands. In that case, while the data-limited portion of the instruction's issue interval might double, the remaining delay cycles may stay the same, which shifts the DP-to-SP fraction to greater than 1/2.
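To put toy numbers on that, here is a minimal sketch. The split into a data-limited part and a fixed part, and the names `data_cycles` and `fixed_cycles`, are illustrative assumptions of mine, not figures for any real hardware.

```python
from fractions import Fraction


def dp_to_sp_issue_ratio(data_cycles: int, fixed_cycles: int) -> Fraction:
    """Ratio of DP issue rate to SP issue rate in a simple two-part model.

    Assumption (illustrative only): an SP instruction holds the issue slot
    for data_cycles + fixed_cycles, and moving to DP doubles only the
    data-limited part while the fixed delay stays the same.
    """
    if data_cycles <= 0 or fixed_cycles < 0:
        raise ValueError("data_cycles must be > 0 and fixed_cycles >= 0")
    sp_interval = data_cycles + fixed_cycles      # cycles between SP issues
    dp_interval = 2 * data_cycles + fixed_cycles  # cycles between DP issues
    # Issue rate is the inverse of the interval, so the rate ratio is SP/DP interval.
    return Fraction(sp_interval, dp_interval)


if __name__ == "__main__":
    for fixed in (0, 1, 3):
        print(fixed, dp_to_sp_issue_ratio(1, fixed))
```

With no fixed delay the ratio is exactly 1/2; with one fixed cycle it is 2/3, and with three it is 4/5. Under this model the ratio only stays at 1/2 when the instruction is purely data-limited.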
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#83 |
|
Regular
|
Well for one you say it's issue rate, but you didn't say that was what you were told. Even if we assume for a moment it has 50% issue rate, it seems rather unlikely that goes for all instructions except FMADD. We can make a further assumption that there are no back to back multiplies at 50%. But if we have to string multiple assumptions together to get some idea of the actual speed then I'd say the original statement was ambiguous.
Last edited by MfA; 15-Nov-2007 at 17:51. |
|
|
|
|
|
#84 |
|
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885
|
I think the original statement (and those from Dave) give a good, unambiguous "idea of the actual speed". Sure, it didn't allow you to cycle-count some sequence of code, but as I mentioned earlier I severely doubt the utility of such a task.
Last edited by Andrew Lauritzen; 16-Nov-2007 at 04:18. |
|
|
|
|
|
#85 |
|
Member
Join Date: May 2007
Posts: 103
|
http://ati.amd.com/technology/stream.../register.html
Download AMD Stream Computing SDK |
|
|
|
|
|
#86 |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
some questions for our good buddy mhouston, based on me quickly glancing through the documentation (it's 3AM on Christmas Eve! admittedly, there aren't very many things I'd rather be doing at 3AM on Christmas Eve, though...):
- gather and scatter are mentioned in a few places, but it seems to be "will be supported eventually?" are they in now or what? what's the timeframe for that if not?
- is this R600 only? some of the PDFs mentioned R580 but Brook+ requires R6x0. (well now I see that CAL requires R600 so I assume that this is the case, but I figure I'd get it from the horse's mouth)
- is there any way to run this on the CPU like you can with the CUDA simulator? maybe high-level language to IL to OpenMP/pthreads/some other silly CPU-side thread implementation that we should never talk about because it's bad?
- no way to do inter-thread synchronization in Brook+? Arun and I were debating the use of the mythical SMX as a global cache, but I guess nothing came of that |
|
|
|
|
|
#87 |
|
Member
Join Date: May 2007
Posts: 103
|
CAL: 2400-2900 OK,
but Windows XP (32-bit) only for now. |
|
|
|
|
|
#88 |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
|
|
|
|
|
|
#89 |
|
Regular
|
Interestingly, the whitepaper:
http://ati.amd.com/technology/stream...whitepaper.pdf
shows GPUSA being able to translate Brook into intermediate language (pixel shader it would seem) code and then provide performance analysis. Though the code snippet shown appears to be doing a whole load of faffing with superfluous MOV instructions.
Jawed |
|
|
|
|
|
#90 | ||||
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
To answer questions for an Nvidian...
Quote:
Quote:
Quote:
Quote:
Okay, back to holiday stuff. |
||||
|
|
|
|
|
#91 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
I should also mention that we are working on support for 32/64 XP, Vista, Linux and those should be up in the near future. They are all running in house, it's just getting the verification and testing work done
|
|
|
|
|
|
#92 |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
Thanks Mike
|
|
|
|
|
|
#93 |
|
Member
Join Date: May 2007
Posts: 103
|
update
Version 1.0 (beta) adds support for Windows XP 64-bit (Vista?)
http://ati.amd.com/technology/stream.../sdkdwnld.html
amd-cal-install-win.txt
------------------------------------------------------------
AMD CAL INSTALLATION NOTES:
WINDOWS XP (32bit + x64):
This release of the AMD CAL Software Development Kit includes a single install package (MSI), which must be installed as a user with administrative privileges. You'll need to have the .NET Framework v2.0 redistributables installed prior to running the install package.
Once you have all of the necessary dependencies installed, run the AMD CAL SDK MSI installer:
amd-cal-sdk-v1.00.0-beta.msi
By default, the AMD CAL SDK installs into:
"C:\Program Files\AMD\CAL SDK v1.00.0-beta"
WINDOWS Vista (32bit + x64):
For dual GPU configuration ATI CrossFire must be disabled. To disable CrossFire, open up the ATI Catalyst Control Center and in the Advanced View select CrossFire. In the right hand pane, uncheck Enable CrossFire and click OK. |
|
|
|
|
|
#94 | |
|
Regular
|
Quote:
From thread: http://forums.amd.com/devforum/messa...&enterthread=y
So it seems AMD widened the T ALU for double precision. The question remains whether double-precision transcendentals are possible...
Jawed |
|
|
|
|
|
|
#95 |
|
Member
Join Date: May 2005
Location: in the shade
Posts: 152
|
Sounds strange... the guy is saying only 1 out of the 5 units is capable of double precision ops? Which means, unless some magic is used, the peak throughput rate is only 1/5 of single precision. Sounds wrong, since some double precision operations are supposedly half speed.
Srsly, that guy is giving too much credit to the RysUnit
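The arithmetic behind that objection, as a rough sketch. The 5-wide grouping comes from the discussion above; the function name and the idea of a per-unit relative DP rate are illustrative assumptions of mine, not a description of how the hardware actually schedules doubles.

```python
from fractions import Fraction

UNITS_PER_GROUP = 5  # assumption: one 5-wide ALU group, as discussed in the thread


def peak_dp_to_sp(dp_capable_units: int, dp_rate_per_unit: Fraction) -> Fraction:
    """Peak DP throughput as a fraction of peak SP throughput for one group.

    dp_capable_units: how many of the 5 units can execute DP ops at all.
    dp_rate_per_unit: each capable unit's DP rate relative to its SP rate (0..1).
    """
    if not 0 <= dp_capable_units <= UNITS_PER_GROUP:
        raise ValueError("dp_capable_units must be between 0 and 5")
    if not 0 <= dp_rate_per_unit <= 1:
        raise ValueError("dp_rate_per_unit must be between 0 and 1")
    return Fraction(dp_capable_units, UNITS_PER_GROUP) * dp_rate_per_unit


# Reading "only 1 of 5 units does DP" literally, even at full per-unit rate:
one_unit_peak = peak_dp_to_sp(1, Fraction(1))
print(one_unit_peak, one_unit_peak >= Fraction(1, 2))
```

Under that literal reading the peak is 1/5 of SP, which cannot be squared with any operation running at half speed. That is why the claim looks off.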
__________________
[03:44] <thefarhan> i have exactly 128 friends right now :D [03:45] <Jollemi> you have to teach them to remember 1MB worth of data, and see if you can run Windows 9x or Linux |
|
|
|
|
|
#96 |
|
Tiled
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
|
I wrote back in the RV670 intro: Out of the full 320 scalar shader units in RV670 -- the same number as R600 remember - 4/5 of those, the thinner ones, can do double precision IEEE754-compliant math. The rates might not be massive compared to the SP performance (around half speed for ADD and a quarter for MUL, depending on other factors)...
99.9% sure I got that right.
__________________
A major redesign of the core ALU pineapple boomerang fortress. |
|
|
|
|
|
#97 |
|
Regular
|
Well I guess someone can give this a play since the SDK now supports doubles.
Jawed |
|
|
|
|
|
#98 |
|
Tiled
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
|
Mike, any chance of a Linux driver so I can go push some code around?
__________________
A major redesign of the core ALU pineapple boomerang fortress. |
|
|
|
|
|
#99 | |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,362
|
Quote:
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
|
#100 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
We are working on Linux support. Getting all the i's dotted and t's crossed with different kernel/driver/chipset combinations is more "interesting" on Linux than XP/Vista.
|
|
|
|
|