Linux GPU Compute Issues *fork*

dirtyb1t

Newcomer
I don't think there is a feasible way to differentiate miners from gamers and system builders - shortages at retail would affect all of these categories, as all of them can order boxed or OEM packages, in small or large quantities.

The problem is, shortages would further reduce AMD's installed base, as the only guaranteed way to get a new mid- or top-range Radeon graphics card would be ordering an expensive custom-built PC from a large OEM builder. Then developers will start defecting to the Nvidia camp, first making runtime optimizations for GeForce hardware, then switching to the proprietary CUDA and PhysX platforms.
I bought an RX Vega 56 for OpenCL compute. Given the horrible lack of communication as to when they will have proper, working drivers/dev tools for Linux and FP16 support, it's going up for sale on eBay as we speak. It will be a long time before I ever consider AMD's GPUs again.
 
Why would you ever start out development in Linux when the target platform for hardware seems to be Windows? Was there some event I missed, like the Year of the Linux Desktop, that would have changed the situation?
 
Why would you ever start out development in Linux when the target platform for hardware seems to be Windows? Was there some event I missed, like the Year of the Linux Desktop, that would have changed the situation?
I've never in my professional or personal life used Windows for any serious development, nor do I know anyone who does.
Most if not all serious GPU compute work is performed on Linux. Most frameworks/libraries aren't even available or supported on Windows. The event was GPU compute itself, which is Linux-centric. I'm not speaking of graphics rendering; I'm speaking of things like deep learning/data analytics. Most frameworks/libraries are strictly Linux-only. I had faith AMD would get their act together and bought a Vega, giving them 6 months to do so... time's up, so this card is getting sold. I wanted to avoid sticking to proprietary CUDA, but it simply works. With basic GPU drivers being unstable and advertised features unsupported, with no communication whatsoever as to when they will be (if at all), this is a dead investment for me beyond faith. I really find it deplorable that they sold this card on the basis of advertised features that still aren't functional many months later.
 
I bought an RX Vega 56 for OpenCL compute. Given the horrible lack of communication as to when they will have proper, working drivers/dev tools for Linux and FP16 support, it's going up for sale on eBay as we speak. It will be a long time before I ever consider AMD's GPUs again.


So... what you mean to say is that you made a purchase with little to no research into how well that product would cover your needs, and you blame AMD for it.

And despite the fact that you'll probably sell said product on eBay at a profit of at least 100% over the base price (or you could mine and pay off the card in 4-5 months), you're somehow saying you got the short end of the stick, so you won't buy more products from AMD.


I guess some people are just terribly hard to please.






EDIT:
And since you decided to spam the other thread with the same thing, it might be relevant to mention here as well that ROCm has supported dual-throughput FP16 on Vega since October.

It seems that the FP16 limitation might not be that urgent for you, since you managed to miss it by three months.
Did you spend more time complaining about it than a Google search would have taken?
 
First off, I had to hunt daily just to find the card in stock, and then I paid over MSRP for a card with a silly bundle that was supposed to sell at MSRP. I had no guarantee that I would get more for the card than what I paid. That situation was brought about by a completely unexpected cut in production of the card by AMD, whereby you literally can't purchase these cards anymore. So there's that.

Second, these features were marketed as functional, not as an in-development beta... and since I spent some time lurking on forums and refreshing release-update pages, I exhausted my patience waiting to see whether or not FP16 was supported.

Third, I googled for about an hour to see whether FP16 was enabled for OpenCL on Vega. I turned up nothing. Try doing it yourself.

Fourth, have you personally installed this ROCm release and verified its stability and the functionality of FP16? I've done that for a number of vendors' drivers/SDKs, only to find they still had flukes/issues after all that time invested.

So, I have every reason to be salty about patiently following up, over the course of months, on a card I paid over MSRP for, before having to dump it on eBay... You think it's good that someone has to post a question on your company's GitHub repository just to find out whether a feature literally stated on the spec sheet actually functions?
Oh yeah dude it was functional as of release 1.2.3.a.b.c ..
Where is the broad-based announcement that they finally got a prominent, marketed feature working?
* Buys car
* Q : Does the car finally do over 60mph like it was marketed?
* A : Oh yeah dude, we got that working in ECU update 1.13.467...


Come on man... I've seen a company forklift millions of dollars of hardware out of their data center and sever ties with a vendor for less... which is what I'm on the verge of doing (on a much more insignificant scale) in less than 24 hours. Also, I am going to post back as to whether or not this software installs without error and whether FP16 actually works at the 2x performance rating that was stated... I highly doubt it, which will ensure I accept my pending offer.
 
Why would you ever start out development in Linux when the target platform for hardware seems to be Windows? Was there some event I missed, like the Year of the Linux Desktop, that would have changed the situation?

You mean like Doom being developed on NeXTStep? :)
 
Why would you ever start out development in Linux when the target platform for hardware seems to be Windows?
Why would you ever start out development in Windows on compute when the target platform seems to be Linux?
 
Why would you ever start out development in Windows on compute when the target platform seems to be Linux?

Because you picked the wrong hardware. If you pick the AMD hardware, then you should follow it to where it has better support, which seems to be Windows?

To approach the problem logically, you need to pick a starting point. It can either be the development platform or the hardware platform. Pick one and then follow it through. If you pick the Linux development platform, then you'll likely end up with Nvidia. If you pick AMD hardware, then you'll likely end up with the Windows OS. To do it the other way around is illogical.
 
AMD are supposed to have way better Vega support from Linux 4.15 and onward but maybe that is not good enough for the original poster?
 
First off, I had to hunt daily just to find the card in stock, and then I paid over MSRP for a card with a silly bundle that was supposed to sell at MSRP.
And still you're probably off to sell it for some $400-500 more than what you paid for originally.
So please tell me how wronged you were here.


Second, these features were marketed as functional, not as an in-development beta...
Excuse me? Please do show an official statement from AMD saying "Please buy this $400 ERP gaming card so you can use it for deep learning, inference and other professional activities. And BTW forget everything about Mi25 that costs 10x more. Just go with this cheap gaming version instead #wink wink#"

Third, I googled for about an hour to see whether FP16 was enabled for OpenCL on Vega. I turned up nothing. Try doing it yourself.
Dude I literally did it myself in my response to your post. It's right there.
And it took me less than 5 minutes. What did you search on google for an entire hour?!


Fourth, have you personally installed this ROCm release and verified its stability and the functionality of FP16? I've done that for a number of vendors' drivers/SDKs, only to find they still had flukes/issues after all that time invested.
I didn't, nor do I intend to, but others have. Perhaps this isn't AMD's fault.



So, I have every reason to be salty
That's your opinion, yes. I don't think you do, so we'll just agree to disagree.



You think it's good that someone has to post a question on your company's GitHub repository just to find out whether a feature literally stated on the spec sheet actually functions?
You mean AMD has people providing support for enabling an expensive feature for a cheap gaming card, for which they have a much more expensive card (MI25) to cater to that market. They actually have someone helping you make them less money. And you think that's bad.
What else are you outraged about?


whether FP16 actually works at the 2x performance rating that was stated...
It's never going to be a fixed 2x performance gain. It's Rapid Packed Math within the same 32-bit ALU; they don't have a separate set of FP16 ALUs (unlike e.g. PowerVR solutions, AFAIK).
2xFP16 is only the maximum theoretical throughput, as both halves need to execute the exact same operation.
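For what it's worth, the packing requirement is easy to see at the kernel level. Here is a minimal OpenCL C sketch (a hypothetical kernel, not from any AMD sample) of the kind of code Rapid Packed Math can accelerate: both lanes of each half2 perform the same fused multiply-add, so the compiler can issue them as one packed 32-bit ALU operation. It assumes the driver exposes cl_khr_fp16.

#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: one packed FMA per element.
// Both FP16 lanes execute the identical operation, which is the
// precondition for Rapid Packed Math to reach its 2x throughput.
__kernel void fma_half2(__global const half2 *a,
                        __global const half2 *b,
                        __global half2 *out)
{
    size_t i = get_global_id(0);
    out[i] = fma(a[i], b[i], out[i]);
}

If the two lanes needed different operations, the pair could not be packed, which is why 2x is a best case rather than a guaranteed speedup.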
 
AMD are supposed to have way better Vega support from Linux 4.15 and onward but maybe that is not good enough for the original poster?
I have been paying close attention to the development of Vega since before it was released. I purchased it within a month of its availability and have been patiently waiting and watching for developments that would allow me to reasonably pursue development with it. I am aware of the upcoming offerings... Sadly, it takes subscribing to 5-8 third-party forums/websites to get a solid handle on what's coming/planned, and when (if ever) certain marketed features will be functional. This should be prominently detailed by AMD's Radeon group themselves, not a slew of third-party websites/forums/GitHub Q&A. https://www.phoronix.com/scan.php?page=home has had excellent coverage, so I am well aware of what's in the pipe. The thing is, it keeps slowly dripping out, and even then it ends up falling short of expectations/promises.

I still find things far too underdeveloped, unknown, and unstated for my development needs/timeline. I've invested a considerable amount of money in various hardware as of late, only to find spec-sheet features undeveloped, non-functioning, available only in special/ridiculous configs/conditions, and/or buggy. Even when that isn't the case, anyone who's doing considerable development on Linux with varied hardware/frameworks/libraries will tell you what an absolute nightmare it is to get everything installed, functioning, and stable, even for packages with years of development behind them. Broken kernel modules, segfaults, mismatched dependencies, you name it. Kernel 4.15 might break other frameworks/packages. Drivers, software, and hardware might be incompatible. The OS version might be unsupported by prominent frameworks. It's a nightmare trying to stabilize a bleeding-edge dev environment.

Dev work isn't done in a vacuum... it has hardware-decision cutoffs, etc., which is why you must get things functioning at or near launch, as promised. So I guess it hit a breaking point for me at the close of 2017. Sorry for the tone of my posts, but there was a great deal of time/emotion invested. I had high hopes for Vega and was going to extend to a considerable number of purchases beyond just an eval card. Then, out of nowhere, Radeon cuts production and there are no AIB cards:
> https://www.nowinstock.net/computers/videocards/amd/rxvega64/
> https://www.nowinstock.net/computers/videocards/amd/rxvega56/
You can't find a single card anywhere... So, on top of all of the software/driver issues, and without a single bit of communication or warning, you can't buy a single card.

I really wish AMD's Radeon group the best; I am loaded down with AMD products currently and have no ill will towards them. I suffered through months of the famed AMD Ryzen segfault issue. Same feeling as with Vega: no communication as to what the hell was going on, thousands of posts from the community debugging the issue, no broad announcement... you have to be keyed in to 5-8 third-party sites to even know about it. That resulted in a paperweight for months until it was RMA'd, with resources tied up and nothing resolved.

All of these factors combined made my head spin. I have no clue if Vega compute will be reasonable in 3, 6, or 9 months to a year... and even after that, you're talking about years of API/SDK catch-up. So, I thought it through deeply. I spent tons of time following this card and its developments. I also put my money where my mouth was and bought the card. There shouldn't be 254-page threads trying to figure out the status of a released product. This will have lasting effects on my consideration of RTG as a platform. I sold the card overnight after complications with ROCm. I am converting the entirety of my dev efforts to their competitor. Maybe this changes down the road when they get their act together and learn how to communicate better with their customers.
 
And still you're probably off to sell it for some $400-500 more than what you paid for originally.
So please tell me how wronged you were here.
Time is money... I've wasted far more time on this than the pennies gained because AMD, out of nowhere, cut production of the card. I'm not someone whose aim was to buy a single GPU to play with. I invested a considerable amount of time following up on and tracking an eval card of which I intended to purchase a considerable quantity for a medium/large-scale development effort.

$200 is peanuts, and that's about the extent of what you're going to get as markup. You won't be making $400-500 more off a Vega that hardly anyone got for MSRP. I understand you're attempting to belittle the magnitude of my complaint by centering on a trivial discussion about the peanuts made from a miraculous markup sale, possible only because the cards literally can't be purchased anywhere. This disaster could easily have gone the opposite way and I'd be in the red, plus out the time.

Excuse me? Please do show an official statement from AMD saying "Please buy this $400 ERP gaming card so you can use it for deep learning, inference and other professional activities. And BTW forget everything about Mi25 that costs 10x more. Just go with this cheap gaming version instead #wink wink#"
Maybe someone's trying to offer a wider audience something incredible, and it required testing their stack on consumer-level hardware that AMD made official statements about and marketed to high hell with regard to compute performance. You're taking quite the stance here with your obviously targeted comments... Sadly, your bold assumptions are horribly incorrect. I'm guessing principles and decency are only afforded to customers who pay 10x the cost? Duly noted.

Dude I literally did it myself in my response to your post. It's right there.
And it took me less than 5 minutes. What did you search on google for an entire hour?!
I'm not going to indulge your foolishness any further. There's a reason there are thousands of pages of rambling on their official dev forum, GitHub, and many other prominent sites about the state of Vega. "It's right here," and the card was sold "right there."

I didn't, nor do I intend to, but others have. Perhaps this isn't AMD's fault.
A horrible lack of communication about a product is always the company's fault.
I'm not sure what industry you work in but in the computing industry, development is already costly.
Time is money and patience wears thin with broken promises and an absolute lack of clear detailed communication.
One pissed-off individual who oversees purchasing decisions can, because of a company's lack of detailed communication, spell the end of your being considered for years to come. I like to evaluate how a company treats its 1/10th-cost customers, sometimes beyond the pro market, especially when I will potentially rely on that 1/10th-cost product to run my solution. I make the purchasing decisions, and I've just cut their whole product line from consideration and won't be re-evaluating it or developing for it for some time to come. Meanwhile, you can form your own opinion as to whose fault it is.


That's your opinion, yes. I don't think you do, so we'll just agree to disagree.
I created a post and stated mine. You stated yours. My decision has been made. Thanks for sharing your extended views.


You mean AMD has people providing support for enabling an expensive feature for a cheap gaming card, for which they have a much more expensive card (MI25) to cater to that market. They actually have someone helping you make them less money. And you think that's bad.
What else are you outraged about?
You have an opinion of the nature of my commentary. It's wrong. Thanks for sharing it.
Apologies for giving consideration to lowly consumers who have 1/10th my budget.

It's never going to be a fixed 2x performance gain. It's Rapid Packed Math within the same 32-bit ALU; they don't have a separate set of FP16 ALUs (unlike e.g. PowerVR solutions, AFAIK).
2xFP16 is only the maximum theoretical throughput, as both halves need to execute the exact same operation.
And yet there's no prominent real-world performance data detailed by them, while I can find it from their competitors.


Imagine for a second you're the company that makes these cards... someone was about to buy 10,000 cards from you but bought an eval card first. You don't know who they are. You don't know their budget. You don't know if they are one of your top pro-line card purchasers, and yet you talk to them the way you did, trying to deflect from your company's shortcomings and calling them a cheapskate...

Have a good one.
 
I have been paying close attention to the development of Vega since before it was released. I purchased it within a month of its availability and have been patiently waiting and watching for developments that would allow me to reasonably pursue development with it. I am aware of the upcoming offerings... Sadly, it takes subscribing to 5-8 third-party forums/websites to get a solid handle on what's coming/planned, and when (if ever) certain marketed features will be functional. This should be prominently detailed by AMD's Radeon group themselves, not a slew of third-party websites/forums/GitHub Q&A.

AMD does not decide what goes into Linux, Linus Torvalds does. The initial Vega driver did not fulfill his requirements and AMD had to do further work on it. AMD can't really plan; they can only contribute code and hope that it will be included (and it was, after 6 months of cleanup, in Linux 4.15).
 
OpenCL FP16 seems to be working with the current drivers under Linux (not under Windows), btw (I ran this a few days ago): https://pastebin.com/cBcwx4WJ

Edit: That's the output of FlopsCL, built with FP16 support.

And that's the output under Windows for comparison (FlopsCL doesn't normally support Windows, so I had to make a few changes to build it; I may have done something wrong, but I doubt it): https://pastebin.com/nVin9Jxd
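If you just want to check whether your runtime advertises FP16 at all before bothering with a benchmark, a quick host-side query is enough. A minimal sketch (my own illustration, not taken from FlopsCL; it assumes an OpenCL GPU device is present and simply looks for cl_khr_fp16 in the extension string):

/* Quick check for cl_khr_fp16 on the first GPU device found; link with -lOpenCL. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    size_t ext_size;
    char *extensions;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    /* query the size of the extension string, then fetch it */
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &ext_size);
    extensions = malloc(ext_size);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ext_size, extensions, NULL);

    printf("cl_khr_fp16: %s\n",
           strstr(extensions, "cl_khr_fp16") ? "supported" : "NOT supported");
    free(extensions);
    return 0;
}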
 
AMD does not decide what goes into Linux, Linus Torvalds does. The initial Vega driver did not fulfill his requirements and AMD had to do further work on it. AMD can't really plan; they can only contribute code and hope that it will be included (and it was, after 6 months of cleanup, in Linux 4.15).
I understand that, and I greatly appreciate what they're trying to do. At some point down the road, when things are at a stage where comfortable and stable development can be pursued, maybe I'll re-evaluate my decision. As of now it's wait and see, and I got tired of waiting. 4.14 becomes 4.15, and now the rumors are 4.16 and 4.17:
https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Linux-4.16-Round-2
With all sorts of questionable bits sliding even further out. Notably, when you're dealing with multiple pieces of hardware, software, and frameworks, you can't keep sliding your dev environment around while support remains questionable. There is a claim that they are seriously pursuing compute-related GPU tasks. Nvidia, for instance, has rock-solid drivers for compute plus tons of dev frameworks, and even then it can be a development nightmare at times... I didn't want to imagine what kind of hell I would be in for with this rolling goalpost.

OpenCL FP16 seems to be working with the current drivers under Linux (not under Windows), btw (I ran this a few days ago): https://pastebin.com/cBcwx4WJ

Edit: That's the output of FlopsCL, built with FP16 support.

And that's the output under Windows for comparison (FlopsCL doesn't normally support Windows, so I had to make a few changes to build it; I may have done something wrong, but I doubt it): https://pastebin.com/nVin9Jxd

[float ] Time: 0.004786s, 14357.77 GFLOP/s
[float16 ] Time: 0.076080s, 14452.10 GFLOP/s
Meanwhile the marketed numbers were :
It'll offer peak single-precision performance of ~13 TFLOPS and peak half-precision performance of 26 TFLOPS.
The above examples show that single and half precision are the same? Am I interpreting this wrong? So we're back to the enigma that has been present since June, which people dedicate pages of threads to discussing, yet with no official support/launch information confirming what's going on.
 
Look at the "half" numbers edit: under Linux. They seem in line with what to expect.
 
AMD does not decide what goes into Linux, Linus Torvalds does. The initial Vega driver did not fulfill his requirements and AMD had to do further work on it. AMD can't really plan; they can only contribute code and hope that it will be included (and it was, after 6 months of cleanup, in Linux 4.15).
What about AMD's proprietary driver?
Shouldn't it be faster, more stable and up-to-date compared to the opensource one?
Does it work well with the relevant compute and AI software stacks?
 
[float ] Time: 0.004786s, 14357.77 GFLOP/s
[float16 ] Time: 0.076080s, 14452.10 GFLOP/s

Float is float, double is double, and half is half. The number suffix is the count of components in the vector, so float16 is a 16-wide FP32 vector, not half precision.

Linux:

[half8 ] Time: 0.020113s, 27333.33 GFLOP/s
[float8 ] Time: 0.040722s, 13500.05 GFLOP/s
[double8 ] Time: 0.630167s, 872.40 GFLOP/s

Windows:

[half8 ] Time: 0.038200s, 14391.46 GFLOP/s
[float8 ] Time: 0.038084s, 14435.29 GFLOP/s
[double8 ] Time: 0.603230s, 911.35 GFLOP/s

So: Rapid Packed Math is working under Linux, but not under Windows, with OpenCL (the slight difference in FP32 performance may be due to the power profile, etc.).
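For anyone curious what such a throughput test boils down to: the kernel below is a sketch in the spirit of FlopsCL (illustrative, not its actual source) - a long dependent chain of FMAs on a wide vector type, so the result is bound by ALU throughput rather than memory. Swapping half8 for float8 gives the FP16 vs FP32 comparison above; on a Rapid Packed Math part the half8 chain can, at best, retire twice the FLOP/s.

#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Sketch of an ALU-bound throughput kernel (illustrative, not FlopsCL's code).
__kernel void fma_chain_half8(__global half8 *data, int iterations)
{
    size_t i = get_global_id(0);
    half8 x = data[i];
    half8 a = (half8)((half)0.999f);
    half8 b = (half8)((half)0.001f);

    for (int n = 0; n < iterations; ++n)
        x = fma(x, a, b);      // 8 muls + 8 adds = 16 FLOP per iteration

    data[i] = x;               // write back so the chain isn't optimized away
}

The marketing math also lines up with those numbers: roughly 4096 ALUs x 2 ops per FMA x ~1.6 GHz gives ~13 TFLOPS FP32 for a Vega 64-class part, doubled to ~26 TFLOPS when the FP16 operations pack, which is about where the Linux half8 figure lands.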
 
What about AMD's proprietary driver?
Shouldn't it be faster, more stable and up-to-date compared to the opensource one?
Does it work well with the relevant compute and AI software stacks?

I believe that AMD is phasing out its closed-source Linux driver and will only have the driver that is included in "standard" Linux.
 