santyhammer
Newcomer
Some days ago Intel released a PDF describing the upcoming SSE4 instructions:
http://www.intel.com/technology/architecture/new_instructions.htm
ftp://download.intel.com/technology/architecture/new-instructions-paper.pdf
Notice that it includes a single-instruction SIMD dot product (and also an interesting instruction for regex-style string manipulation, which antivirus software and sprintf/sscanf could put to good use). Oh, btw, no CPU supports it at the moment... People think SSE4 is already present in the Core 2 Duo, but nope... what the C2D has is SSSE3 (Supplemental SSE3)...
And I wonder... when did CPU design get so bad that it lost the battle against GPUs, PPUs, etc.?
Why didn't Intel and AMD listen to us and put the 1-cycle shader instructions (DOT, MAD, ADD/SUB/MUL/DIV, SQRT, LOG and EXP) inside the CPU to create a REAL and USEFUL SSE implementation?
Why is SSE, in general, so poor, badly designed and slow? I need something like 4 shuffles, 2 adds and 2 muls to perform a simple dot product... With SSE3 it is a bit better (mul and hadd) but still bad... Oh, come on... Look at Xenon's VMX128 or Cell's VMX, for example... they are much better... http://arstechnica.com/articles/culture/mattlee.ars/3
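For reference, a 4-component dot product written with plain SSE1 intrinsics can be sketched roughly as below. This is only a minimal sketch: the function name dot4_sse is made up for illustration, and this variant gets by with one multiply, two shuffle-type operations and two adds, so the exact instruction count depends on how the data is laid out.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: dot product of two 4-float vectors using only SSE1. */
float dot4_sse(const float *a, const float *b)
{
    __m128 va   = _mm_loadu_ps(a);          /* unaligned loads of 4 floats */
    __m128 vb   = _mm_loadu_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);       /* [p0, p1, p2, p3] */

    /* Swap neighbours inside each pair: [p1, p0, p3, p2] */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(prod, shuf);   /* [p0+p1, p0+p1, p2+p3, p2+p3] */

    /* Bring the upper half of sums down so lane 0 holds p2+p3 */
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);          /* lane 0 = p0+p1+p2+p3 */

    return _mm_cvtss_f32(sums);             /* extract lane 0 */
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} this should return 70. With SSE3 the two shuffle/add pairs can be replaced by two _mm_hadd_ps calls, and the SSE4.1 intrinsics expose the new dot product instruction as _mm_dp_ps.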
Intel/AMD could perfectly well do this and recover part of the ground they lost to GPUs...
Now we can use CUDA and the ATI GPGPU SDKs... But I think the future (with Fusion kicking hard) is to integrate these 1-cycle CISC shader instructions into the CPU and forget about any GPU, PPU or NPU. And, seriously, I want somebody to tell me why the CPU companies didn't design SIMD well from the start, so they could have avoided the rise of the GPU and kept all the calculations inside the CPU.
So pls, if anybody at Intel is listening... rethink SSE4 and ADD all the DX10 shader instructions to it.