Some info about SSE4

pcchen

Intel released some information about SSE4 at IDF. You can read it here:

https://intel.wingateweb.com/published/BMAS005/BMAS005_100Eng.pdf

The video-encoding instructions are quite crazy. There are two instructions designed for motion search. One computes eight 4-byte sums of absolute differences at once, and the other finds the minimum (and its position) among eight 16-bit numbers! IMHO that's quite crazy, although they do provide a nice speedup in video encoding applications (and probably others, such as motion detection in video).
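To make the motion search pair concrete, here is a rough sketch in C. The helper names `sad4x8` and `minpos8` are made up for illustration, and the intrinsic names (`_mm_mpsadbw_epu8`, `_mm_minpos_epu16` from `<smmintrin.h>`) are not in the IDF slides; they are the SSE4.1 intrinsics that compilers later shipped for these operations. When the compiler is not targeting SSE4.1, the code falls back to a plain loop that computes the same thing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Hypothetical helper: SAD of one 4-byte block `cur` against the 8
   overlapping 4-byte windows ref[0..3], ref[1..4], ..., ref[7..10].
   `ref` must point at 16 readable bytes so a full vector load is safe. */
void sad4x8(const uint8_t cur[4], const uint8_t ref[16], uint16_t sad[8])
{
    assert(cur != NULL && ref != NULL && sad != NULL);
#ifdef __SSE4_1__
    int32_t block;
    memcpy(&block, cur, 4);                       /* the 4-byte block */
    __m128i r = _mm_loadu_si128((const __m128i *)ref);
    __m128i c = _mm_cvtsi32_si128(block);
    /* imm8 = 0: windows start at byte 0 of r, block is bytes 0..3 of c */
    _mm_storeu_si128((__m128i *)sad, _mm_mpsadbw_epu8(r, c, 0));
#else
    for (int off = 0; off < 8; ++off) {
        unsigned sum = 0;
        for (int i = 0; i < 4; ++i)
            sum += (unsigned)abs(cur[i] - ref[off + i]);
        sad[off] = (uint16_t)sum;
    }
#endif
}

/* Hypothetical helper: smallest of 8 unsigned 16-bit values.
   Returns its index (lowest index wins on ties), writes the value. */
unsigned minpos8(const uint16_t v[8], uint16_t *min_out)
{
    assert(v != NULL && min_out != NULL);
#ifdef __SSE4_1__
    __m128i m = _mm_minpos_epu16(_mm_loadu_si128((const __m128i *)v));
    *min_out = (uint16_t)_mm_extract_epi16(m, 0); /* word 0: minimum */
    return (unsigned)_mm_extract_epi16(m, 1);     /* word 1: index   */
#else
    unsigned best = 0;
    for (unsigned i = 1; i < 8; ++i)
        if (v[i] < v[best])
            best = i;
    *min_out = v[best];
    return best;
#endif
}
```

Calling `sad4x8` and then `minpos8` on the result gives the best horizontal offset (0 to 7) for one 4-byte block, which is the inner step of a block-matching search. A real encoder would accumulate SADs over several rows before taking the minimum.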

Some other instructions are quite good, such as the "load and expand" instructions, which load bytes/words and automatically expand them to words/dwords.
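As a small illustration of "load and expand", the sketch below widens 8 bytes to 8 words, once with zero extension and once with sign extension. The helper names are hypothetical, and the intrinsics `_mm_cvtepu8_epi16` and `_mm_cvtepi8_epi16` are the names later compilers use for these operations, not something taken from the slides. A scalar loop stands in when SSE4.1 is not enabled at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Hypothetical helper: 8 unsigned bytes -> 8 unsigned words (zero extension). */
void widen_u8_to_u16(const uint8_t src[8], uint16_t dst[8])
{
    assert(src != NULL && dst != NULL);
#ifdef __SSE4_1__
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src); /* low 8 bytes */
    _mm_storeu_si128((__m128i *)dst, _mm_cvtepu8_epi16(bytes));
#else
    for (int i = 0; i < 8; ++i)
        dst[i] = src[i];
#endif
}

/* Hypothetical helper: 8 signed bytes -> 8 signed words (sign extension). */
void widen_i8_to_i16(const int8_t src[8], int16_t dst[8])
{
    assert(src != NULL && dst != NULL);
#ifdef __SSE4_1__
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_cvtepi8_epi16(bytes));
#else
    for (int i = 0; i < 8; ++i)
        dst[i] = src[i];
#endif
}
```

Before SSE4 the same widening usually took an unpack against a zero register (or a compare to generate the sign bits), so this mostly saves an instruction and a register per conversion.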

The new floating point instructions are also interesting. There are insertion/extraction instructions which let you put a 32-bit floating point number into a specific position of a 4D vector (or extract a number from one). That's quite handy. There are also the long-awaited dot product instructions, which can do DP2, DP3, and DP4. The blending instructions enable a limited form of flow control through predication, i.e. you can select individual components from two vectors. And finally, there are new instructions for rounding.
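Here is a minimal sketch of the dot product and blending ideas in C. The helper names (`dot3`, `dot4`, `select_lt`) are invented for this example, and the intrinsics `_mm_dp_ps` and `_mm_blendv_ps` are the names later compilers expose, not something from the slides. In `_mm_dp_ps` the high nibble of the immediate picks which components are multiplied and summed, and the low nibble picks where the sum is written. The scalar branch is used when SSE4.1 is not enabled at compile time.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Hypothetical helper: DP3, i.e. dot product of the first three components.
   Both inputs must still point at 4 readable floats. */
float dot3(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);
#ifdef __SSE4_1__
    /* 0x71: multiply components 0..2, write the sum to component 0 */
    return _mm_cvtss_f32(_mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0x71));
#else
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
#endif
}

/* Hypothetical helper: DP4, dot product of all four components. */
float dot4(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);
#ifdef __SSE4_1__
    /* 0xF1: multiply components 0..3, write the sum to component 0 */
    return _mm_cvtss_f32(_mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1));
#else
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}

/* Hypothetical helper showing predication with a blend:
   out[i] = (x[i] < t[i]) ? a[i] : b[i], with no branch per component. */
void select_lt(const float x[4], const float t[4],
               const float a[4], const float b[4], float out[4])
{
#ifdef __SSE4_1__
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(x), _mm_loadu_ps(t));
    /* blendv takes the second operand where the mask sign bit is set */
    _mm_storeu_ps(out, _mm_blendv_ps(_mm_loadu_ps(b), _mm_loadu_ps(a), mask));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (x[i] < t[i]) ? a[i] : b[i];
#endif
}
```

Both sides of the "branch" are computed for every component and the blend just picks one, which is why this only covers a limited form of flow control.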

There is also a new streaming load instruction which enables faster loading from co-processor memory (such as video card memory).