Old 12-Apr-2012, 09:01   #4076
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by UniversalTruth View Post
Also, it's not logical that in most European countries, which have a much lower standard of living than Canada or the USA, everything is so much more expensive.
You mean Eastern Europe, Middle East and North Africa?
Gipsel is offline   Reply With Quote
Old 12-Apr-2012, 09:09   #4077
UniversalTruth
Former Member
 
Join Date: Sep 2010
Posts: 1,529
Default

Quote:
Originally Posted by Gipsel View Post
You mean Eastern Europe, Middle East and North Africa?
Not only those, but also Central European countries, Germany (especially German engineers who go to the USA to work there), France, Italy, even Israel, Dubai...
Ok, some of these countries might be rich, but there is something special in USA which makes people wanna go there and not somewhere else.
UniversalTruth is offline   Reply With Quote
Old 12-Apr-2012, 11:25   #4078
Pressure
Member
 
Join Date: Mar 2004
Posts: 834
Default

Quote:
Originally Posted by UniversalTruth View Post
Yup, European market might be bigger but I think no one considers it as a whole. There are no Pan-European (r)e-tailers (like newegg in USA and Canada).
Maybe that's why all prices of electronic stuff are much lower in the USA, because of the union of the market in North America, and also because there are no small e-tailers (ok, there could exist some, but they are not the only ones) which have small turnover, so they look for bigger profit from a single unit sold.



Also, it's not logical that in most European countries, which have a much lower standard of living than Canada or the USA, everything is so much more expensive.
The distributors are the same, so yes, the entire European Market is basically one big pot of gold.

Regardless, this is getting off-topic

But anyone in Europe would really dispute the "poorer standard of life" claim, when we have public healthcare systems and safety nets for the weakest in society. Regardless, graphics cards cost the same before value-added tax is applied.
__________________
Never Argue With An Idiot. They'll Lower You To Their Level And Then Beat You With Experience!
Pressure is offline   Reply With Quote
Old 12-Apr-2012, 11:38   #4079
Davros
Naughty Boy!
 
Join Date: Jun 2004
Posts: 11,075
Default

Quote:
Originally Posted by rpg.314 View Post
Using C means shooting the compiler in the foot and taking charge like a Real Programmer™.
After years of C letting the programmer shoot himself in the foot, this sounds like justice to me
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Davros is offline   Reply With Quote
Old 12-Apr-2012, 12:11   #4080
CarstenS
Senior Member
 
Join Date: May 2002
Location: Germany
Posts: 2,965
Send a message via ICQ to CarstenS
Default

*SCNR*
Quote:
Originally Posted by UniversalTruth View Post
Ok, some of these countries might be rich, but there is something special in USA which makes people wanna go there and not somewhere else.
You mean like getting cheapo prices on hardware which in turn is barely available?
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts.
Work| Recreation
Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
CarstenS is offline   Reply With Quote
Old 12-Apr-2012, 15:34   #4081
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by UniversalTruth View Post
Ok, some of these countries might be rich, but there is something special in USA which makes people wanna go there and not somewhere else.
And I know a few physicists in the US who are looking for a job here so they can come back to Germany. Two years ago I could have gone to California too, but I didn't want to. I would not claim my sample size is representative.
Quote:
Originally Posted by CarstenS View Post
You mean like getting cheapo prices on hardware which in turn is barely available?
Gipsel is offline   Reply With Quote
Old 12-Apr-2012, 16:28   #4082
tekyfo
Registered
 
Join Date: Apr 2012
Posts: 4
Default

Quote:
Originally Posted by UniversalTruth View Post
but there is something special in USA which makes people wanna go there and not somewhere else.
Maybe it's humility?
tekyfo is offline   Reply With Quote
Old 12-Apr-2012, 16:57   #4083
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,966
Default

This is way OT but....

I'm from neither the US nor Europe, but in my travels people from Europe talk way more about wanting to go to the US than the other way around.
__________________
What the deuce!?
trinibwoy is online now   Reply With Quote
Old 12-Apr-2012, 18:21   #4084
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by trinibwoy View Post
This is way OT but....

I'm neither from the US or Europe but in my travels people from Europe talk way more about wanting to go to the US than the other way around.
I agree about the way OT, but I have to mention that we have (of course) official statistics about such stuff in Germany from the "Statistisches Bundesamt" (Federal Statistics Office). As a net effect, we had more immigration than emigration in the last few years, also from the US (but that is almost even). Only if you count just German citizens do slightly more leave than come back (and, as said, counting everyone irrespective of citizenship, the balance with the USA is almost net zero).
By the way, the most attractive target for German emigrants is Switzerland!

And now back to topic!

Does anybody have more insight into the question I asked here?
I mean, if nV takes the data locality issue seriously, they should pin warps to a certain vALU (or actually a set of vALUs, SFUs and L/S units) in roughly the same way as GCN pins its wavefronts to a certain vALU.
Gipsel is offline   Reply With Quote
Old 12-Apr-2012, 18:21   #4085
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,156
Default

Quote:
Originally Posted by trinibwoy View Post
This is way OT but....

I'm neither from the US or Europe but in my travels people from Europe talk way more about wanting to go to the US than the other way around.

Continuing way OT ...

I call it Hollywood effect


Back on topic:
I was wondering why GK104 is slower at Bitcoin mining than GF110. I know this workload is purely integer, yet it still seems odd that the new GPU is 20-30% slower in both OpenCL and CUDA miners (including a CUDA miner compiled with the 4.2 toolkit).

average numbers:
110 MH/s (GTX 680) vs 140 MH/s (GTX 580)
Lightman is offline   Reply With Quote
Old 12-Apr-2012, 19:09   #4086
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,014
Send a message via Skype™ to fellix
Default

Quote:
Originally Posted by Gipsel View Post
PS:
Unless those slides contain a mistake similar to the one in the Fermi presentation, the total register space is the same as with GF100/GF110 (2 MB), so really tiny compared to Tahiti (8 MB). I have a hard time believing that number, considering the similarity of the ALU count of GK104 and Tahiti. I would expect double the value given in that slide (4 MB), i.e. 512 kB per SMX or 128 kB per scheduler.
Tahiti can support a much larger number of concurrent threads. I don't think the RF size in GK104 is particularly lacking in that respect. The number of warps per SMX is more troubling, as are the consequences for hiding memory access latency -- which takes us back to the question of how the new SW scheduling will deal with data locality and dependencies.
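
For a rough sense of the numbers behind this point, here is a back-of-the-envelope sketch in Python. The per-unit figures (register file size per SMX/CU and maximum resident threads per unit) are not taken from the posts above; they are assumed from the commonly quoted spec sheets, and the helper name `register_budget` is made up for illustration.

```python
# Register budget per resident thread at full occupancy.
# All per-unit figures below are assumptions from public specs, not from this thread.
CHIPS = {
    # name: (units, register file per unit in KiB, max resident threads per unit)
    "GK104": (8, 256, 2048),    # 8 SMX, 64 warps x 32 threads each
    "Tahiti": (32, 256, 2560),  # 32 CUs, 40 wavefronts x 64 work-items each
}

def register_budget(units, rf_kib, max_threads):
    """Return (total RF in MiB, total resident threads, 32-bit registers per thread)."""
    total_mib = units * rf_kib / 1024
    total_threads = units * max_threads
    regs_per_unit = rf_kib * 1024 / 4          # 4 bytes per 32-bit register
    return total_mib, total_threads, regs_per_unit / max_threads

for name, spec in CHIPS.items():
    mib, threads, regs = register_budget(*spec)
    print(f"{name}: {mib:.0f} MiB, {threads} threads, {regs:.1f} regs/thread")
```

Under these assumptions the 2 MB vs 8 MB gap is matched by a similar gap in resident threads, so the register budget per thread at full occupancy ends up in the same ballpark on both chips.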
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
fellix is offline   Reply With Quote
Old 12-Apr-2012, 19:43   #4087
Man from Atlantis
Member
 
Join Date: Jul 2010
Location: Istanbul
Posts: 728
Default

Quote:
Originally Posted by Lightman View Post
I was wondering why GK104 is slower in BitCoin mining than GF110. I know this workload is purely integer, yet still it seems odd new GPU is 20-30% slower in both OpenCL and CUDA miners (including CUDA miner compiled using 4.2 toolkit).

average numbers:
110MH/s (GTX680) vs 140MH/s (GTX580)
+
it's slower on GPC OpenCL benchmark too

Code:
              GTX 580   GTX 680
SHA-1 Hash     571.0     471.9
__________________
SiS 6326 > Ti 4200 > 9800XT > 9800GT > GTX 460
Celeron 366 > Celeron 1700 > Athlon XP 2500+ > E6300 > Q9650
Man from Atlantis is offline   Reply With Quote
Old 12-Apr-2012, 20:50   #4088
tunafish
Member
 
Join Date: Aug 2011
Posts: 406
Default

Quote:
Originally Posted by Lightman View Post
I was wondering why GK104 is slower in BitCoin mining than GF110. I know this workload is purely integer, yet still it seems odd new GPU is 20-30% slower in both OpenCL and CUDA miners (including CUDA miner compiled using 4.2 toolkit).
Bitcoin is basically all shifts. Perhaps the shift hardware is not as good in Kepler? (It certainly has no reason to be as good for gaming loads.)
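
To illustrate why shifts dominate: Bitcoin mining is double SHA-256, and the SHA-256 mixing functions are built from 32-bit rotates and shifts. The sketch below models them in Python (the function names are my own); it assumes hardware with no native rotate, so each rotate costs two shifts plus an OR.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32): two shifts and an OR."""
    return ((x >> n) | (x << (32 - n))) & MASK32

# The four SHA-256 mixing functions (rotation amounts from the SHA-256 spec).
def big_sigma0(x):
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)

def big_sigma1(x):
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25)

def small_sigma0(x):
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)
```

Without a rotate instruction, each big sigma costs six shifts and each small sigma five, on top of the XORs, so the shift rate of the hardware matters a lot for this workload.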
tunafish is offline   Reply With Quote
Old 12-Apr-2012, 23:34   #4089
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,156
Default

Quote:
Originally Posted by tunafish View Post
Bitcoin is basically all shifts. Perhaps the shift hardware is not as good in Kepler? (It certainly has no reason to be as good for gaming loads.)
Granted, but the results are closer to the level of a GTX 560 Ti, which GK104 doubles in almost every respect.
It will be interesting to see whether big Kepler brings any improvement in these tasks.

Last edited by Lightman; 12-Apr-2012 at 23:39.
Lightman is offline   Reply With Quote
Old 13-Apr-2012, 21:18   #4090
Bludd
Eric the Half-a-bee
 
Join Date: Oct 2003
Location: The cat detector van from the Ministry of Housinge
Posts: 2,131
Default

Quote:
Originally Posted by Lightman View Post
Granted, but results are closer to the level of GTX560Ti which GK104 doubles in almost every aspect.
It will be interesting to see if big Kepler will bring any improvements in these tasks or not.
Why can't a new driver help with this?
Bludd is offline   Reply With Quote
Old 14-Apr-2012, 10:19   #4091
ahu
Junior Member
 
Join Date: Jul 2008
Posts: 51
Default

Quote:
Originally Posted by Bludd View Post
Why can't a new driver help with this?
A new driver won't help much here, as the integer performance is severely handicapped compared to GTX 580 and even to GTX 560. According to CUDA C Programming Guide version 4.2, 32-bit integer shifts and compares have only 1/24 of the throughput of the 32-bit FMA. That would put the GTX 680 at around 1/6 of the GTX 580 throughput in those operations. The other integer operations aren't quite that slow though.
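
The 1/6 estimate can be reproduced with a few lines of Python. This is only a sketch: the 1/24 rate is the figure cited above, while the unit counts, the reference clocks and the 16 shifts per clock per SM for the GTX 580 are assumptions on my part, and `shift_gops` is a made-up helper name.

```python
def shift_gops(units, shifts_per_clock, clock_mhz):
    """Peak 32-bit shift throughput in Gops/s = units * ops per clock * clock."""
    return units * shifts_per_clock * clock_mhz / 1000.0

# GTX 680: 8 SMX, 192 FMA lanes per SMX, shifts at 1/24 of that, 1006 MHz base clock.
gtx680 = shift_gops(8, 192 // 24, 1006)
# GTX 580: 16 SMs, assumed 16 shifts per clock per SM, 1544 MHz hot clock.
gtx580 = shift_gops(16, 16, 1544)

ratio = gtx680 / gtx580
print(f"GTX 680: {gtx680:.1f} Gops/s, GTX 580: {gtx580:.1f} Gops/s, ratio {ratio:.3f}")
```

With these inputs the ratio lands close to 1/6, which matches the estimate in the post.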
ahu is offline   Reply With Quote
Old 15-Apr-2012, 04:32   #4092
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,937
Default

Quote:
Originally Posted by ahu View Post
A new driver won't help much here, as the integer performance is severely handicapped compared to GTX 580 and even to GTX 560. According to CUDA C Programming Guide version 4.2, 32-bit integer shifts and compares have only 1/24 of the throughput of the 32-bit FMA. That would put the GTX 680 at around 1/6 of the GTX 580 throughput in those operations. The other integer operations aren't quite that slow though.
Seems like the compiler would then want to examine the potential to favor integer MADDs over left shifts, since they have 4x better throughput (and can pair with ADDs, being an alternative to a useful left shift + insert operation). It's not intuitive, but I've actually done that sort of thing in NEON code.
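
A minimal Python model of that substitution, assuming 32-bit wraparound arithmetic (the function names are hypothetical, and this says nothing about what the actual compiler does):

```python
MASK32 = 0xFFFFFFFF

def shl_via_mul(x, n):
    """Left shift by n expressed as a multiply by 2**n, truncated to 32 bits."""
    return (x * (1 << n)) & MASK32

def shift_insert_via_mad(x, n, low_bits):
    """(x << n) | low_bits as a single multiply-add.

    Only valid when low_bits < 2**n, so the add cannot carry into the
    shifted part and behaves exactly like an OR.
    """
    return (x * (1 << n) + low_bits) & MASK32
```

Note that this only covers left shifts: a right shift cannot be written as a plain low-half multiply, so it would need a multiply-high or a real shift.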
Exophase is offline   Reply With Quote
Old 17-Apr-2012, 00:06   #4093
Sxotty
Senior Member
 
Join Date: Dec 2002
Location: Under a Crushing Burden
Posts: 4,355
Default

Quote:
Originally Posted by trinibwoy View Post
This is way OT but....

I'm neither from the US or Europe but in my travels people from Europe talk way more about wanting to go to the US than the other way around.
I always assumed you were, guess it's true what they say.
__________________
You bought horse armor didn't you?
Sxotty is offline   Reply With Quote
Old 17-Apr-2012, 13:38   #4094
tunafish
Member
 
Join Date: Aug 2011
Posts: 406
Default

Quote:
Originally Posted by Exophase View Post
Seems like the compiler would then want to examine the potential to favor integer MADDs over left shifts, since they have 4x better throughput (and can pair with ADDs, being an alternative to a useful left shift + insert operation). It's not intuitive, but I've actually done that sort of thing in NEON code.
But the best int MADD with high throughput you can probably get is for 24-bit, and the problem that btc deals with really likes 32-bit int shifts.

The fastest implementation you could build is probably splitting the 32-bit words into 16-bit ones, but then you are going to at least quadruple the amount of ops, and double the amount of state per thread. The state is probably the bigger hit.
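
To make the cost of that split concrete, here is a hedged Python sketch of a 32-bit rotate on a word held as two 16-bit halves. It uses plain shifts on the halves purely to count operations; how each step would map to narrower multiplies on real hardware is left open, and all names are my own.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Reference: rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def split(x):
    """Split a 32-bit word into (high, low) 16-bit halves."""
    return (x >> 16) & MASK16, x & MASK16

def join(hi, lo):
    """Recombine two 16-bit halves into one 32-bit word."""
    return (hi << 16) | lo

def rotl32_split(hi, lo, n):
    """Rotate left by n (0 < n < 16) on the split representation.

    Each half is shifted up and receives the bits that fall out of the
    other half: four shifts, two ORs and two masks instead of one rotate.
    """
    new_hi = ((hi << n) | (lo >> (16 - n))) & MASK16
    new_lo = ((lo << n) | (hi >> (16 - n))) & MASK16
    return new_hi, new_lo
```

Every word now occupies two registers, and a single rotate turns into four shift-like operations, which is where the "quadruple the ops, double the state" estimate comes from.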

I have actually always been a bit puzzled as to exactly why AMD gpus are as good at 32-bit shifts as they are. There really isn't any use that justifies the expenditure outside crypto. Is AMD the main supplier to NSA or something?
tunafish is offline   Reply With Quote
Old 17-Apr-2012, 14:40   #4095
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by tunafish View Post
I have actually always been a bit puzzled as to exactly why AMD gpus are as good at 32-bit shifts as they are. There really isn't any use that justifies the expenditure outside crypto. Is AMD the main supplier to NSA or something?
With the bitalign instruction you can basically do bit shifts of 64-bit data (though it delivers only 32 bits of the result) at full rate on AMD GPUs. That has been the case since Cypress; the R700 generation had only the normal shifts at full rate, but that was already a huge jump over R600/RV670, where bit shifts executed at 1/5 rate [only in the t slot]. You can also use bitalign for full-speed rotates. AFAIK nVidia added this instruction with Fermi, too. But maybe it's slower (Executed on SFUs? Implemented only in a part of the vALUs? Only DP/iMUL32 throughput?) or they added it only as a macro consisting of multiple native instructions, no idea. [edit]nV GPUs have only the BFE instruction, not bitalign and also not BFI, as Man from Atlantis pointed out[/edit] [edit2]According to the documentation, starting with Fermi they have a BFI instruction; only bitalign is missing compared to AMD.[/edit2] And don't forget that the HD 5870 and HD 6970 had a higher peak arithmetic performance than GF100/110 either way.
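
A small Python model of what such an instruction does, as a sketch only: it follows the usual description of bitalign as "take the 64-bit value formed by two registers, shift it right, return the low 32 bits". The exact operand order and encoding on real hardware are not modelled here.

```python
MASK32 = 0xFFFFFFFF

def bitalign(hi, lo, n):
    """Low 32 bits of the 64-bit value (hi:lo) shifted right by n (taken mod 32)."""
    return (((hi << 32) | lo) >> (n & 31)) & MASK32

def rotr32(x, n):
    """Rotate right: feed the same word in as both halves."""
    return bitalign(x, x, n)

def rotl32(x, n):
    """Rotate left by n is a rotate right by 32 - n."""
    return bitalign(x, x, (32 - n) & 31)
```

Feeding the same register into both inputs is what turns the funnel shift into a single-instruction rotate.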

As to the reason, I always thought that bit-manipulating instructions are quite cheap, maybe save for the shifts. But AMD obviously thought it took little enough effort to put them in at full speed. Maybe someone can enlighten us: how much does a 32-bit shift unit cost compared to an FMA?

Last edited by Gipsel; 17-Apr-2012 at 15:13.
Gipsel is offline   Reply With Quote
Old 17-Apr-2012, 14:46   #4096
Man from Atlantis
Member
 
Join Date: Jul 2010
Location: Istanbul
Posts: 728
Default

There is some talk that Nvidia's lack of BFI_INT and integer rotate instructions makes their GPUs significantly slower than AMD's.
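
For context, a bitfield-insert (bit select) operation matters because the SHA-256 Ch and Maj functions each collapse into one such operation. The Python sketch below models a generic select, `bfi(mask, a, b)`, which takes bits from `a` where the mask is set and from `b` elsewhere; the operand order of the real instruction may differ, and the function names are my own.

```python
MASK32 = 0xFFFFFFFF

def bfi(mask, a, b):
    """Bit select: bits of a where mask is 1, bits of b where mask is 0."""
    return ((a & mask) | (b & ~mask)) & MASK32

# Textbook definitions: three and five logic operations respectively.
def ch_plain(x, y, z):
    return ((x & y) ^ (~x & z)) & MASK32

def maj_plain(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)

# The same functions with a single select each (plus one XOR for Maj).
def ch_bfi(x, y, z):
    return bfi(x, y, z)

def maj_bfi(x, y, z):
    # Where x and z differ, y decides the majority; where they agree, x does.
    return bfi(z ^ x, y, x)
```

Both functions run once per SHA-256 round, so saving a few logic operations on each adds up across the whole hash.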
__________________
SiS 6326 > Ti 4200 > 9800XT > 9800GT > GTX 460
Celeron 366 > Celeron 1700 > Athlon XP 2500+ > E6300 > Q9650
Man from Atlantis is offline   Reply With Quote
Old 17-Apr-2012, 15:30   #4097
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by Man from Atlantis View Post
There is some talk about nvidia's lack of BFI_INT and int rotate functions that makes them significantly slower than AMDs..
I guess GK104 will absolutely stink at bitcoin or cryptographic stuff if there is no hidden magic coming to the rescue. I've just seen in nV's documentation, that 32bit integer shifts are supposed to run only at a 1/24 rate (8 operations per clock cycle per SMX, same as double precision). For some extremely strange reason, it is slower than 32bit integer multiplication (1/6 rate), so one could try to exchange it with multiplications when possible.
Comparing GF100/110 : GF104/114 : GK104 per clock cycle (for the whole GPU, and using the hot clock for Fermi), the instruction issue rates relate as 4:2:1 for 32-bit integer shifts and as 2:1:2 for 32-bit integer multiplies. That makes this stuff really slow on GK104 (and you still have to consider the lower clock speed of Kepler): shifts run at only about 1/3 of the speed of a GF114! Only integer adds are significantly faster.
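
These ratios can be checked with a short Python sketch. The per-SM rates for GK104 are the ones quoted above (8 shifts and 32 multiplies per clock per SMX); the Fermi per-SM rates, the SM counts and the clocks are assumptions on my part, and the helper names are hypothetical.

```python
# Operations per clock per SM/SMX; Fermi entries are per hot clock (assumed values).
CHIPS = {
    "GF110": {"sms": 16, "shift": 16, "mul": 16},
    "GF114": {"sms": 8, "shift": 16, "mul": 16},
    "GK104": {"sms": 8, "shift": 8, "mul": 32},
}
ORDER = ("GF110", "GF114", "GK104")

def per_clock(chip, op):
    """Chip-wide operations per clock for the given operation."""
    return CHIPS[chip]["sms"] * CHIPS[chip][op]

def ratios(op):
    """Chip-wide per-clock rates, normalised to the slowest chip."""
    rates = [per_clock(chip, op) for chip in ORDER]
    base = min(rates)
    return tuple(rate // base for rate in rates)

# Absolute comparison with assumed clocks: GTX 680 at 1006 MHz, GTX 560 Ti hot clock 1644 MHz.
gk104_vs_gf114 = (per_clock("GK104", "shift") * 1006) / (per_clock("GF114", "shift") * 1644)

print("shift", ratios("shift"), "mul", ratios("mul"), f"GK104/GF114 shifts: {gk104_vs_gf114:.2f}")
```

With these inputs the per-clock ratios come out as stated, and the clock-adjusted shift rate of GK104 is roughly 0.3 of a GF114.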
Gipsel is offline   Reply With Quote
Old 17-Apr-2012, 18:49   #4098
tunafish
Member
 
Join Date: Aug 2011
Posts: 406
Default

Quote:
Originally Posted by Gipsel View Post
Maybe someone can enlighten us, how much a 32bit shift unit costs compared to a FMA?
Bit shift units, especially ones that can operate at 4-cycle latency, are really, really cheap compared to FMA. Basically, having a name for the instruction and the paths to send operands to it are going to be more expensive than the actual shift hardware.

The thing is, throughput loads that use shifts don't really exist outside crypto. Which makes the existence of the instruction strange. I find it entirely believable that AMD added it for a single client.
tunafish is offline   Reply With Quote
Old 17-Apr-2012, 19:15   #4099
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,450
Default

Quote:
Originally Posted by tunafish View Post
The thing is, throughput loads that use shifts don't really exist outside crypto. Which makes the existence of the instruction strange. I find it entirely believable that AMD added it for a single client.
As said above, AMD already made the shifts fast with the R700 generation (RV770 was doing shifts ~12 times as fast as RV670); Cypress only added to that by also enabling full-speed rotates and by simplifying shifts of wider data with the bitalign instruction. If there had been a specific customer, I guess they would have done it only for RV770, not for the entire line (DP was also RV770 only). I think they did it mainly because it was cheap and even simplifies some things, because everything (the ALUs) gets more symmetric.
Gipsel is offline   Reply With Quote
Old 17-Apr-2012, 20:45   #4100
DarthShader
Member
 
Join Date: Jul 2010
Location: Land of Mu
Posts: 350
Default

Quote:
Originally Posted by Gipsel View Post
I guess GK104 will absolutely stink at bitcoin or cryptographic stuff if there is no hidden magic coming to the rescue.
No need to guess, Kaotik posted some GPGPU benchmarks over in the 7970 thread, that includes results from a bitcoin miner:

http://muropaketti.com/artikkelit/na...md-vs-nvidia,2 (2nd benchmark)

It isn't at "absolute stink" level, but it is still disappointing. If GK110 adds the missing instruction at full speed, it should beat a 7970.
DarthShader is offline   Reply With Quote
