Old 06-Feb-2012, 07:09   #1
pcchen
Moderator
 
Join Date: Feb 2002
Location: Taiwan
Posts: 2,348
Default Microsoft published C++ AMP spec

Microsoft published an open spec of C++ AMP (Accelerated Massive Parallelism), which is implemented in Visual Studio 11.

Spec here (PDF)
Old 06-Feb-2012, 13:07   #2
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by pcchen View Post
Microsoft published an open spec of C++ AMP (Accelerated Massive Parallelism), which is implemented in Visual Studio 11.
They really seem to be promoting an open standard:

Copyright License. Microsoft grants you a license under its copyrights in the specification to (a) make copies of this specification to develop your implementation of this specification, and (b) distribute portions of this specification in your implementation or your documentation of your implementation.

(even if the following part where the license talks about patents isn't totally clear to me).

Writing an implementation over OpenCL looks like a straightforward process.
Old 06-Feb-2012, 14:07   #3
pcchen
Moderator
 
Join Date: Feb 2002
Location: Taiwan
Posts: 2,348
Default

Quote:
Originally Posted by Dade View Post
(even if the following part where the license talks about patents isn't totally clear to me).
IANAL, but it seems to me that what they are saying is: we promise not to sue you for any patent infringement w.r.t. this spec if you don't sue us.

It's really important for this kind of thing to be open if it is really going to be successful.
Old 06-Feb-2012, 14:33   #4
AlexV
Heteroscedasticitate
 
Join Date: Mar 2005
Posts: 2,354
Default

That's how I read it too, under the same restriction. Pretty sure a lawyer would find a way to twist that, but...

On a separate note, I like AMP quite a bit more than I like OpenCL, in spite of it being currently more limited. It remains to be seen if any party other than Microsoft will pick up the compiler-writing mantle though. Some g++amp.exe would be useful...
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do.
Old 06-Feb-2012, 15:47   #5
Davros
Darlek ******
 
Join Date: Jun 2004
Posts: 9,497
Default

Is that like the patent protection you can buy from Microsoft?

"Now you have the option to acquire Xandros Desktop offerings together with Microsoft patent assurance. This assurance enables you to use Xandros Desktop software with confidence. This program is available for $50. Learn more by reading Microsoft's covenant."

http://www.microsoft.com/about/legal...t/xandros.aspx
__________________
Guardian of the Most holy Two Terabytes of Gaming Goodness™
Old 06-Feb-2012, 20:34   #6
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by AlexV View Post
On a separate note, I like AMP quite a bit more than I like OpenCL, in spite of it being currently more limited.
AMP looks promising; however, it doesn't seem to be exactly a direct OpenCL competitor. It is a bit like comparing Java vs. assembler. OK, maybe there isn't such a huge difference, but OpenCL exposes a lot of hardware details.

P.S. Does anyone remember the old days when C++ compilers were just front ends translating the code to C? Having something similar from C++ AMP to OpenCL would be quite useful.
Old 06-Feb-2012, 20:54   #7
AlexV
Heteroscedasticitate
 
Join Date: Mar 2005
Posts: 2,354
Default

Quote:
Originally Posted by Dade View Post
AMP looks promising; however, it doesn't seem to be exactly a direct OpenCL competitor. It is a bit like comparing Java vs. assembler. OK, maybe there isn't such a huge difference, but OpenCL exposes a lot of hardware details.

P.S. Does anyone remember the old days when C++ compilers were just front ends translating the code to C? Having something similar from C++ AMP to OpenCL would be quite useful.
That ties into the limited aspect. That being said, I'm not that keen on the way OpenCL ends up exposing things (oh look, we're really close to the metal... only we're not really that close once one looks at it), and to be honest I have no confidence in its evolutionary path being anything worthwhile.

The whole Khronos boardism means a never-ending tug of war between IHVs (just look at how nicely OpenGL did as a committee-driven effort). Apple had the potential to make things right by being the ultimate shepherd / vetoer, but they seem utterly incapable of doing so. What AMP has going for it is primarily the same thing that made DX succeed: whilst it is consultation-driven, MS ends up calling what happens, when and how. Once AMP actually ends up firmly matching the feature set exposed by DirectCompute, I'd be surprised if CL earns anything worthwhile on GPUs. For CPUs it's likely that you may end up getting better performance than whatever WARP gives you; however, to be honest, I'd rather use something like ISPC there, if one doesn't want to get down to intrinsics.

On the other hand, this is a question of taste on my part (IMHO most of the programming language / tool warfare falls into this category; ultimately work can be done with almost anything unless it's hugely bad), so I do apologize if I end up sounding like other programming tool nazis.
Old 07-Feb-2012, 01:23   #8
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Promise not to sue != License. If MS really wanted to push AMP as an open standard, then they would have given a license grant predicated on non-litigation.
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.
Old 07-Feb-2012, 08:05   #9
pcchen
Moderator
 
Join Date: Feb 2002
Location: Taiwan
Posts: 2,348
Default

Quote:
Originally Posted by rpg.314 View Post
Promise not to sue != License. If MS really wanted to push AMP as an open standard, then they would have given a license grant predicated on non-litigation.
Well, Microsoft already promised not to sue anyone for patent infringement over any compliant implementation of this spec. I think that's good enough if all you want is to make an AMP implementation. Providing a free license would probably be better, but I understand that it's probably too much for Microsoft to do (after all, a free license means anyone who has made an implementation of AMP might be allowed to use the patents freely for other projects, and that's certainly not what Microsoft means to allow).
Old 12-Feb-2012, 20:44   #10
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

I don't see how there's a future for this. Every time the hardware becomes less limited there will be a new version, further fragmenting the software ecosystem. That will only stop when eventually we're back where we started: C++.
Old 12-Feb-2012, 21:20   #11
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by Nick View Post
I don't see how there's a future for this. Every time the hardware becomes less limited there will be a new version, further fragmenting the software ecosystem. That will only stop when eventually we're back where we started: C++.
In my opinion, C++/C should have native vector types (e.g. float4) and a few other new features that we have seen pop up in OpenCL C, CUDA C, C++ AMP, etc.

It would also be useful for developing classic CPU software (e.g. with SSE or AVX).

So, maybe we will go back to square one, but I hope with some new features gained on the way.
Old 13-Feb-2012, 02:35   #12
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by Dade View Post
In my opinion, C++/C should have native vector types (e.g. float4)...
Why? Creating your own vector types is trivial.
Quote:
...and a few other new features that we have seen pop up in OpenCL C, CUDA C, C++ AMP, etc.
Such as?
Quote:
It would also be useful for developing classic CPU software (e.g. with SSE or AVX).
Any auto-vectorizing compiler worth its salt already uses vector instructions. Visual Studio 11 will finally join the ranks too.
Quote:
So, maybe we will go back to square one, but I hope with some new features gained on the way.
Aside from adding the 'restrict' keyword to the C++ standard, and perhaps adding a 'vectorize' pragma, I can't think of much that would be useful in the long run.
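For readers unfamiliar with the 'restrict' suggestion, here is a minimal sketch of what it buys an auto-vectorizer. The function name `saxpy` and the macro `NO_ALIAS` are made up for illustration, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC as far as I know), not standard C++, which is exactly the gap being pointed out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// __restrict is a vendor extension, not standard C++; fall back to nothing
// on compilers that are not known to accept it.
#if defined(_MSC_VER) || defined(__GNUC__) || defined(__clang__)
#define NO_ALIAS __restrict
#else
#define NO_ALIAS
#endif

// Hypothetical helper: out[i] = a * x[i] + y[i].
// The qualifiers promise that out, x and y never overlap, so the compiler
// does not have to assume a store to out[i] could change a later x[j] or y[j].
void saxpy(float a,
           const float* NO_ALIAS x,
           const float* NO_ALIAS y,
           float* NO_ALIAS out,
           std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

// Convenience wrapper that owns the output buffer.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y)
{
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    saxpy(a, x.data(), y.data(), out.data(), x.size());
    return out;
}
```

Whether a given compiler actually vectorizes this loop depends on its version and flags; the sketch only shows where the aliasing promise goes.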
Old 13-Feb-2012, 02:48   #13
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by Nick View Post
Why? Creating your own vector types is trivial.
Then it should be a part of C++ stdlib.

Quote:
Any auto-vectorizing compiler worth its salt already uses vector instructions. Visual Studio 11 will finally join the ranks too.
Autovectorization is fragile.

Quote:
Aside from adding the 'restrict' keyword to the C++ standard, and perhaps adding a 'vectorize' pragma, I can't think of much that would be useful in the long run.
Lambdas, perhaps...

There are a lot of things that C++ could use. If you look outside your own niche, you'll find them useful.
Old 13-Feb-2012, 05:33   #14
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by rpg.314 View Post
Then it should be a part of C++ stdlib.
Should? How do you determine which composite type should become a standard? And why exactly?
Quote:
Autovectorization is fragile.
It shouldn't be any more fragile than GPGPU.
Old 13-Feb-2012, 08:19   #15
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by rpg.314 View Post
Autovectorization is fragile.
Yup, in all my tests I have always seen compilers produce horrible, inefficient code compared to hand-written SSE code.

The only GPU compiler that really had to do some sort of autovectorization was the one for AMD GPUs, generating code for VLIW... and one of the reasons they dropped VLIW in the HD7xxx was that writing good compilers was really hard, expensive, time-consuming, etc.
Old 13-Feb-2012, 08:43   #16
hoho
Senior Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 1,218
Default

Quote:
Originally Posted by rpg.314 View Post
Then it should be a part of C++ stdlib.
Should all trivial things be part of standard library? I don't see a reason for it.
Quote:
Originally Posted by rpg.314 View Post
Lambdas, perhaps...
C++11 has them, and a metric TON more awesome stuff.
Old 13-Feb-2012, 13:59   #17
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by Dade View Post
Yup, in all my tests I have always seen compilers produce horrible, inefficient code compared to hand-written SSE code.
That's all going to change with AVX2. It has vector equivalents of every scalar instruction, so it can trivially parallelize any loop with independent iterations, identical to how GPUs do it.
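To make "a loop with independent iterations" concrete, here is a small sketch contrasting one with a loop that carries a dependency. The function names are invented for illustration, and the comments about which vector instructions a compiler might pick are expectations, not guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i] and a read-only table,
// so in principle lanes i, i+1, ... can be computed side by side.
std::vector<int> remap_clamped(const std::vector<int>& in,
                               const std::vector<int>& table)
{
    assert(!table.empty());
    std::vector<int> out(in.size());
    const int last = static_cast<int>(table.size()) - 1;
    for (std::size_t i = 0; i < in.size(); ++i) {
        int idx = in[i];
        if (idx < 0) idx = 0;        // a vectorizer can express this as max
        if (idx > last) idx = last;  // ... and this as min (or compare + blend)
        out[i] = table[idx] * 2 + 1; // indexed load: the case gather is meant for
    }
    return out;
}

// Loop-carried dependency: iteration i needs the result of iteration i - 1,
// so wider vector instructions alone do not make this loop parallel.
std::vector<int> running_sum(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    int acc = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}
```

The first loop is the kind the post is talking about; the second shows why "independent iterations" is a real precondition.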
Old 13-Feb-2012, 14:13   #18
hoho
Senior Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 1,218
Default

Quote:
Originally Posted by Dade View Post
Yup, in all my tests I have always seen compilers produce horrible, inefficient code compared to hand-written SSE code.
Could be a stupid question, but did you use intrinsics for the hand-written SSE or straight assembly calls? I know that, at least with later versions of GCC, when you write code with intrinsics it's hard to beat the compiler on the efficiency of the generated code. Obviously, hoping it can figure out how to use the SSE instructions by itself is a whole different matter.
Old 13-Feb-2012, 14:31   #19
Dade
Member
 
Join Date: Dec 2009
Posts: 171
Default

Quote:
Originally Posted by hoho View Post
Could be a stupid question but did you use intrinsics for the hand-written SSE or straight assembly calls?
Intrinsics, for instance to intersect a ray with 4 triangles in a single shot, a ray with 4 bounding boxes, etc. GCC cannot even begin to figure out how to auto-vectorize that code when it is written in plain C++. Indeed, it isn't really GCC's fault: you need a native float4 data type to write something where SSE/AVX can be used.

Side note: it is also noticeable how hard it is to read code written with intrinsics compared to something written with native float4 (for instance in OpenCL C). From my point of view, this alone could be seen as a good reason to introduce native vector data types.
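As an illustration of the "ray against 4 bounding boxes" idea, here is a sketch of the data layout only, in plain scalar C++. The `Ray` and `Box4` types and the function names are hypothetical and not taken from any poster's code; the ray is assumed to have no zero direction components, so the reciprocals are finite. A hand-written SSE version would replace the lane loop with one 4-wide instruction per arithmetic step.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration.
struct Ray {
    float ox, oy, oz;              // origin
    float inv_dx, inv_dy, inv_dz;  // 1 / direction, assumed finite
    float t_min, t_max;            // valid interval along the ray
};

// Four boxes stored structure-of-arrays: lane i of every array is box i.
struct Box4 {
    float min_x[4], min_y[4], min_z[4];
    float max_x[4], max_y[4], max_z[4];
};

// Narrow [t0, t1] to the part of the ray that lies inside one axis slab.
inline void clip_slab(float lo, float hi, float origin, float inv_dir,
                      float& t0, float& t1)
{
    const float a = (lo - origin) * inv_dir;
    const float b = (hi - origin) * inv_dir;
    t0 = std::max(t0, std::min(a, b));
    t1 = std::min(t1, std::max(a, b));
}

// Returns a 4-bit mask; bit i is set when the ray hits box i.
unsigned intersect_ray_box4(const Ray& r, const Box4& b)
{
    assert(r.t_min <= r.t_max);
    unsigned mask = 0;
    for (int lane = 0; lane < 4; ++lane) {  // every lane does identical work
        float t0 = r.t_min, t1 = r.t_max;
        clip_slab(b.min_x[lane], b.max_x[lane], r.ox, r.inv_dx, t0, t1);
        clip_slab(b.min_y[lane], b.max_y[lane], r.oy, r.inv_dy, t0, t1);
        clip_slab(b.min_z[lane], b.max_z[lane], r.oz, r.inv_dz, t0, t1);
        if (t0 <= t1) mask |= 1u << lane;
    }
    return mask;
}
```

The point of the layout is that the same operation is applied to four boxes at once, which is what makes the 4-wide intrinsics (or a native float4) a natural fit.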
Old 13-Feb-2012, 19:25   #20
RecessionCone
Member
 
Join Date: Feb 2010
Posts: 170
Default

Quote:
Originally Posted by Nick View Post
That's all going to change with AVX2. It has vector equivalents of every scalar instruction, so it can trivially parallelize any loop with independent iterations, identical to how GPUs do it.
What is the AVX2 vector equivalent of a store instruction? AFAIK, it doesn't exist.

AVX2 is a step forward, and will help vectorizing compilers, but it still isn't as good a compile target as either GPUs or Larrabee.
Old 13-Feb-2012, 21:05   #21
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by Dade View Post
Side note: it is also noticeable how hard it is to read code written with intrinsics compared to something written with native float4 (for instance in OpenCL C). From my point of view, this alone could be seen as a good reason to introduce native vector data types.
Nah, just write your own vector class and use inline operators with intrinsics. Same performance, much cleaner code.
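A minimal sketch of what such a class might look like, assuming SSE on x86 with a scalar fallback so it still builds elsewhere. The `float4` name, the `FLOAT4_SSE` macro and the `dot` helper are chosen for illustration; this is not anyone's production code, and the "same performance" claim is not something the sketch demonstrates.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define FLOAT4_SSE 1
#else
#define FLOAT4_SSE 0    // scalar fallback for non-x86 targets
#endif

// Hypothetical wrapper: four floats with operator syntax.
struct float4 {
#if FLOAT4_SSE
    __m128 v;
    explicit float4(__m128 m) : v(m) {}
    float4(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}
    void store(float* out) const { _mm_storeu_ps(out, v); }
#else
    float v[4];
    float4(float x, float y, float z, float w) : v{x, y, z, w} {}
    void store(float* out) const { for (int i = 0; i < 4; ++i) out[i] = v[i]; }
#endif
    // Element read-back; convenient for checks, not meant for hot loops.
    float operator[](int i) const {
        assert(i >= 0 && i < 4);
        float tmp[4];
        store(tmp);
        return tmp[i];
    }
};

inline float4 operator+(const float4& a, const float4& b) {
#if FLOAT4_SSE
    return float4(_mm_add_ps(a.v, b.v));
#else
    return float4(a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]);
#endif
}

inline float4 operator*(const float4& a, const float4& b) {
#if FLOAT4_SSE
    return float4(_mm_mul_ps(a.v, b.v));
#else
    return float4(a.v[0] * b.v[0], a.v[1] * b.v[1], a.v[2] * b.v[2], a.v[3] * b.v[3]);
#endif
}

// Calling code reads like ordinary arithmetic instead of _mm_* calls.
inline float dot(const float4& a, const float4& b) {
    const float4 p = a * b;
    return (p[0] + p[1]) + (p[2] + p[3]);
}
```

With this in place, user code can write `a * b + c` rather than nesting `_mm_add_ps(_mm_mul_ps(a, b), c)`, which is the readability gain being argued about.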
Old 13-Feb-2012, 21:12   #22
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375
Default

Quote:
Originally Posted by Dade View Post
Yup, in all my tests I have always seen compilers produce horrible, inefficient code compared to hand-written SSE code.

The only GPU compiler that really had to do some sort of autovectorization was the one for AMD GPUs, generating code for VLIW... and one of the reasons they dropped VLIW in the HD7xxx was that writing good compilers was really hard, expensive, time-consuming, etc.
The HLSL-to-assembler compiler is also auto-vectorizing, and it's not that complex. The complex piece in the mentioned equation is VLIW.
Of course HLSL is primitive in comparison to C++, and it's easier to get by with small pattern databases (peephole auto-vectorization, I guess that would be called). Making decisions about optimality isn't that straightforward in C++. I don't think the AV itself is really the problem.
Old 13-Feb-2012, 21:26   #23
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by RecessionCone View Post
What is the AVX2 vector equivalent of a store instruction? AFAIK, it doesn't exist.
Indeed, there's no scatter in AVX2, but that's not an issue in practice because it should be avoided anyway. For best performance you should store results linearly and read sparse data with gather. You can also use the new permute instructions. And you can always fall back to scalar extract and store instructions.
Quote:
AVX2 is a step forward, and will help vectorizing compilers, but it still isn't as good a compile target as either GPUs or Larrabee.
Why? I don't think any GPU has actual scatter support, and it might have been a macro in LRB. The problem is you can't achieve memory ordering consistency for scatter without blocking the load ports. So I don't think you lose anything from not actually supporting it.

Or were you thinking of something else that AVX2 is lacking?
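A small sketch of the "store linearly, read with gather" rewrite, in plain scalar C++ so it runs anywhere. The function names are invented for illustration. It assumes the destination indices form a permutation; when they do not (duplicates, or indices only known at run time inside the loop), this rewrite is not generally available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scatter form: sequential reads, data-dependent writes.
std::vector<float> scatter_copy(const std::vector<float>& in,
                                const std::vector<std::size_t>& dst)
{
    assert(in.size() == dst.size());
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(dst[i] < out.size());
        out[dst[i]] = in[i];
    }
    return out;
}

// Precompute, for each output slot, which input element lands there.
// Assumes dst is a permutation of 0..n-1. Note that this step is itself
// a scatter; the idea is that it is done once, outside the hot loop.
std::vector<std::size_t> invert_permutation(const std::vector<std::size_t>& dst)
{
    std::vector<std::size_t> src(dst.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        assert(dst[i] < src.size());
        src[dst[i]] = i;
    }
    return src;
}

// Gather form: data-dependent reads, sequential (linear) writes.
std::vector<float> gather_copy(const std::vector<float>& in,
                               const std::vector<std::size_t>& src)
{
    assert(in.size() == src.size());
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(src[i] < in.size());
        out[i] = in[src[i]];
    }
    return out;
}
```

For a permutation, `gather_copy(in, invert_permutation(dst))` produces the same result as `scatter_copy(in, dst)`, with all writes going to consecutive addresses.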
Old 13-Feb-2012, 23:25   #24
RecessionCone
Member
 
Join Date: Feb 2010
Posts: 170
Default

Quote:
Originally Posted by Nick View Post
Indeed, there's no scatter in AVX2, but that's not an issue in practice because it should be avoided anyway. For best performance you should store results linearly and read sparse data with gather. You can also use the new permute instructions. And you can always fall back to scalar extract and store instructions.

Why? I don't think any GPU has actual scatter support, and it might have been a macro in LRB. The problem is you can't achieve memory ordering consistency for scatter without blocking the load ports. So I don't think you lose anything from not actually supporting it.

Or were you thinking of something else that AVX2 is lacking?
GPUs have real scatter support, which can't be replaced by permute instructions or linear writes + gathers in the general case. Of course, GPUs don't have memory consistency problems either.
Old 14-Feb-2012, 01:45   #25
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Default

Quote:
Originally Posted by Nick View Post
Should? How do you determine which composite type should become a standard? And why exactly?
Because a standard library's job is to provide good defaults for code that is widely used. Like the STL.

Quote:
It shouldn't be any more fragile than GPGPU.
Let us agree to disagree.